Network Working Group                                     Stephan Wenger
INTERNET-DRAFT                                             Umesh Chandra
Expires: March 2007                                                Nokia
                                                       Magnus Westerlund
                                                               Bo Burman
                                                                Ericsson
                                                      September 17, 2006

                     Codec Control Messages in the
               Audio-Visual Profile with Feedback (AVPF)
                   <draft-ietf-avt-avpf-ccm-01.txt>

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This document specifies a few extensions to the messages defined in
   the Audio-Visual Profile with Feedback (AVPF). They are helpful
   primarily in conversational multimedia scenarios where centralized
   multipoint functionalities are in use. However, some are also usable
   in smaller multicast environments and point-to-point calls. The
   extensions discussed are H.271 video back channel, Full Intra
   Request, Temporary Maximum Media Bit-rate, and Temporal Spatial
   Trade-off.

TABLE OF CONTENTS

   1. Introduction
   2. Definitions
      2.1. Glossary
      2.2. Terminology
      2.3. Topologies
   3. Motivation (Informative)
      3.1. Use Cases
      3.2. Using the Media Path
      3.3. Using AVPF
         3.3.1. Reliability
      3.4. Multicast
      3.5. Feedback Messages
         3.5.1. Full Intra Request Command
            3.5.1.1. Reliability
         3.5.2. Temporal Spatial Trade-off Request and Announcement
            3.5.2.1. Point-to-point
            3.5.2.2. Point-to-Multipoint using Multicast or Translators
            3.5.2.3. Point-to-Multipoint using RTP Mixer
            3.5.2.4. Reliability
         3.5.3. H.271 Video Back Channel Message conforming to ITU-T
                Rec. H.271
            3.5.3.1. Reliability
         3.5.4. Temporary Maximum Media Bit-rate Request
            3.5.4.1. MCU based Multi-point operation
            3.5.4.2. Point-to-Multipoint using Multicast or Translators
            3.5.4.3. Point-to-point operation
            3.5.4.4. Reliability
   4. RTCP Receiver Report Extensions
      4.1. Design Principles of the Extension Mechanism
      4.2. Transport Layer Feedback Messages
         4.2.1. Temporary Maximum Media Bit-rate Request (TMMBR)
            4.2.1.1. Semantics
            4.2.1.2. Message Format
            4.2.1.3. Timing Rules
         4.2.2. Temporary Maximum Media Bit-rate Notification (TMMBN)
            4.2.2.1. Semantics
            4.2.2.2. Message Format
            4.2.2.3. Timing Rules
      4.3. Payload Specific Feedback Messages
         4.3.1. Full Intra Request (FIR) command
            4.3.1.1. Semantics
            4.3.1.2. Message Format
            4.3.1.3. Timing Rules
            4.3.1.4. Remarks
         4.3.2. Temporal-Spatial Trade-off Request (TSTR)
            4.3.2.1. Semantics
            4.3.2.2. Message Format
            4.3.2.3. Timing Rules
            4.3.2.4. Remarks
         4.3.3. Temporal-Spatial Trade-off Announcement (TSTA)
            4.3.3.1. Semantics
            4.3.3.2. Message Format
            4.3.3.3. Timing Rules
            4.3.3.4. Remarks
         4.3.4. H.271 VideoBackChannelMessage (VBCM)
   5. Congestion Control
   6. Security Considerations
   7. SDP Definitions
      7.1. Extension of rtcp-fb attribute
      7.2. Offer-Answer
      7.3. Examples
   8. IANA Considerations
   9. Acknowledgements
   10. References
      10.1. Normative references
      10.2. Informative references
   11. Authors' Addresses
   12. List of Changes relative to previous drafts

1. Introduction

   When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was
   developed, the main emphasis lay in the efficient support of
   point-to-point and small multipoint scenarios without centralized
   multipoint control. However, in practice, many small multipoint
   conferences operate utilizing devices known as Multipoint Control
   Units (MCUs). Long-standing experience of the conversational video
   conferencing industry suggests that there is a need for a few
   additional feedback messages to efficiently support MCU-based
   multipoint conferencing. Some of the messages have applications
   beyond centralized multipoint, and this is indicated in the
   description of the message. This is especially true for the message
   intended to carry ITU-T Rec. H.271 [H.271] bitstrings for video back
   channel messages.

   In RTP [RFC3550] terminology, MCUs comprise mixers and translators.
   Most MCUs also include signalling support. During the development of
   this memo, it was noticed that there is considerable confusion in the
   community related to the use of terms such as "mixer",
   "translator", and "MCU".
   In response to these concerns, a number of topologies have been
   identified that are of practical relevance to the industry, but were
   not envisioned (or at least not documented in sufficient detail) in
   RTP. These topologies are documented in [Topologies], and
   understanding this memo requires previous or parallel study of
   [Topologies].

   Some of the messages defined here are forward only, in that they do
   not require an explicit acknowledgement. Other messages require
   acknowledgement, leading to a two-way communication model that some
   may consider useful for general control purposes. It is not the
   intention of this memo to open up RTCP as a generalized control
   protocol. All mentioned messages have relatively strict real-time
   constraints -- in the sense that their value diminishes with
   increased delay. This makes the use of more traditional control
   protocol means, such as SIP re-invites, undesirable. Furthermore,
   all messages are of a very simple format that can be easily processed
   by an RTP/RTCP sender/receiver. Finally, all messages refer only to
   the RTP stream they are related to, and not to any other property of
   a communication system.

   The Full Intra Request (FIR) Command requires the receiver of the
   message (and sender of the stream) to immediately insert a decoder
   refresh point. In video coding, one commonly used form of a decoder
   refresh point is an IDR or Intra picture. Other codecs may have
   other forms of decoder refresh points. In order to fulfil congestion
   control constraints, sending a decoder refresh point may imply a
   significant drop in frame rate, as decoder refresh points are
   commonly much larger than regular predicted content. The use of this
   message is restricted to cases where no other means of decoder
   refresh can be employed, e.g. during the join-phase of a new
   participant in a multipoint conference.
   Using the FIR command for error resilience purposes is explicitly
   disallowed. For that purpose, use the PLI message of AVPF
   [RFC4585], which reports lost pictures and has been included in AVPF
   for precisely that purpose. The FIR message does not require an
   acknowledgement, as the presence of a decoder refresh point can be
   easily derived from the media bit stream. Today, the FIR message
   appears to be useful primarily with video streams, but in the future
   it may also become helpful in conjunction with other media codecs
   that support prediction across RTP packets.

   The Temporary Maximum Media Bit-rate Request (TMMBR) Message allows a
   media receiver to signal to the media sender the current maximum
   supported media bit-rate for a given media stream. Once a bandwidth
   limitation is established by the media sender, that sender notifies
   the initiator of the request, and all other session participants, by
   sending a TMMBN notification message. One usage scenario is limiting
   media senders in multiparty conferencing to the slowest receiver's
   maximum media bandwidth reception/handling capability. Such a use is
   helpful, for example, because the receiver's situation may have
   changed due to computational load, or because the receiver has just
   joined the conference and it is helpful to inform media sender(s)
   about its constraints, without waiting for congestion-induced
   bandwidth reduction. Another application involves graceful bandwidth
   adaptation in scenarios where the upper limit of the connection
   bandwidth to a receiver changes, but is known in the interval between
   these dynamic changes. The TMMBR message is useful for all media
   types that are not inherently of constant bit rate.

   The Video Back Channel Message (VBCM) allows conveying bit streams
   conforming to ITU-T Rec. H.271 [H.271] from a video receiver to a
   video sender.
   This ITU-T Recommendation defines codepoints for a number of
   video-specific feedback messages. Examples include messages to
   signal:
   - the corruption of reference pictures or parts thereof,
   - the corruption of decoder state information, e.g. parameter sets,
   - the suggestion of using a reference picture other than the one
     typically used, e.g. to support the NEWPRED algorithm [NEWPRED].
   The ITU-T plans to add codepoints to H.271 every time a need arises,
   e.g. with the introduction of new video codecs or new tools into
   existing video codecs.

   There exists some overlap between H.271 messages and "native"
   messages specified in this memo and in AVPF. Examples include the
   PLI message of [RFC4585] and the FIR message specified herein. As a
   general rule, the "native" messages should be preferred over the
   sending of VBCM messages when all senders and receivers implement
   this memo. However, if gateways are in the picture, it may be more
   advisable to utilize VBCM. Similarly, for feedback message types
   that exist in H.271 but do not exist in this memo or AVPF, there is
   no other choice but to use VBCM.
   Video feedback channel messages according to H.271 do not require
   acknowledgements on a protocol level, because the appropriate
   reaction of the video encoder and sender can be derived from the
   forward video bit stream.

   Finally, the Temporal-Spatial Trade-off Request (TSTR) Message
   enables a video receiver to signal to the video sender its preference
   for spatial quality or high temporal resolution (frame rate). The
   receiver of the video stream typically generates this signal based on
   input from its user interface, so as to react to explicit requests of
   the user. However, some implicit use forms are also known. For
   example, the trade-offs commonly used for live video and document
   camera content are different.
   Obviously, this indication is relevant only with respect to video
   transmission. The message is acknowledged by an announcement message
   indicating the newly chosen trade-off, so as to allow immediate user
   feedback.

2. Definitions

2.1. Glossary

   ASM   - Asynchronous Multicast
   AVPF  - The Extended RTP Profile for RTCP-based Feedback
   FEC   - Forward Error Correction
   FIR   - Full Intra Request
   MCU   - Multipoint Control Unit
   MPEG  - Moving Picture Experts Group
   PtM   - Point to Multipoint
   PtP   - Point to Point
   TMMBN - Temporary Maximum Media Bit-rate Notification
   TMMBR - Temporary Maximum Media Bit-rate Request
   PLI   - Picture Loss Indication
   TSTA  - Temporal Spatial Trade-off Announcement
   TSTR  - Temporal Spatial Trade-off Request
   VBCM  - Video Back Channel Message

2.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Message:
         Codepoint defined by this specification, of one of the
         following types:

      Request:
            Message that requires Acknowledgement

      Acknowledgement:
            Message that answers a Request

      Command:
            Message that forces the receiver to an action

      Indication:
            Message that reports a situation

      Notification:
            See Indication.

      Note that, with the exception of "Notification", this terminology
      is in alignment with ITU-T Rec. H.245.

   Decoder Refresh Point:
         A bit string, packetised in one or more RTP packets, which
         completely resets the decoder to a known state. Typical
         examples of Decoder Refresh Points are H.261 Intra pictures
         and H.264 IDR pictures. However, there are also much more
         complex decoder refresh points.

         Typical examples for "hard" decoder refresh points are Intra
         pictures in H.261, H.263, MPEG 1, MPEG 2, and MPEG-4 part 2,
         and IDR pictures in H.264. "Gradual" decoder refresh points
         may also be used; see for example [AVC]. While both "hard"
         and "gradual" decoder refresh points are acceptable in the
         scope of this specification, in most cases the user
         experience will benefit from using a "hard" decoder refresh
         point.

         A decoder refresh point also contains all header information
         above the picture layer (or equivalent, depending on the
         video compression standard) that is conveyed in-band. In
         H.264, for example, a decoder refresh point contains
         parameter set NAL units that generate parameter sets
         necessary for the decoding of the following slice/data
         partition NAL units (and that are not conveyed out of band).
         To the best of the authors' knowledge, the term "Decoder
         Refresh Point" has been formally defined only in H.264; hence
         we are referring here to this video compression standard.

   Decoding:
         The operation of reconstructing the media stream.

   Rendering:
         The operation of presenting (parts of) the reconstructed
         media stream to the user.

   Stream thinning:
         The operation of removing some of the packets from a media
         stream. Stream thinning is preferably performed in a
         media-aware fashion, implying that media packets are removed
         in the order of their relevance to the reproductive quality.
         However, even when employing media-aware stream thinning,
         most media streams quickly lose quality when subject to
         increasing levels of thinning. Media-unaware stream thinning
         leads to even worse quality degradation.

2.3. Topologies

   Please refer to [Topologies] for an in-depth discussion. The
   topologies referred to throughout this memo are labeled (consistent
   with [Topologies]) as follows:

   Topo-Point-to-Point . . . . point-to-point communication
   Topo-Multicast  . . . . . . multicast communication as in RFC 3550
   Topo-Translator . . . . . . translator based as in RFC 3550
   Topo-Mixer  . . . . . . . . mixer based as in RFC 3550
   Topo-Video-switch-MCU . . . video switching MCU
   Topo-RTCP-terminating-MCU . mixer but terminating RTCP

3. Motivation (Informative)

   This section discusses the motivation and usage of the different
   video and media control messages. The video control messages have
   been under discussion for a long time, and a requirement draft was
   drawn up [Basso]. This draft has expired; however, we quote relevant
   sections of it to provide motivation and requirements.

3.1. Use Cases

   There are a number of possible usages for the proposed feedback
   messages. Let's begin by looking through the use cases Basso et al.
   [Basso] proposed. Some of the use cases have been reformulated and
   commented on:

   1. An RTP video mixer composes multiple encoded video sources into a
      single encoded video stream. Each time a video source is added,
      the RTP mixer needs to request a decoder refresh point from the
      video source, so as to start an uncorrupted prediction chain on
      the spatial area of the mixed picture occupied by the data from
      the new video source.

   2. An RTP video mixer receives multiple encoded RTP video streams
      from conference participants, and dynamically selects one of the
      streams to be included in its output RTP stream. At the time of a
      bit stream change (determined through means such as voice
      activation or the user interface), the mixer requests a decoder
      refresh point from the remote source, in order to avoid using
      unrelated content as reference data for inter picture prediction.
      After requesting the decoder refresh point, the video mixer stops
      the delivery of the current RTP stream and monitors the RTP stream
      from the new source until it detects data belonging to the decoder
      refresh point. At that time, the RTP mixer starts forwarding the
      newly selected stream to the receiver(s).

   3. An application needs to signal to the remote encoder a request to
      change the desired trade-off in temporal/spatial resolution. For
      example, one user may prefer a higher frame rate and a lower
      spatial quality, and another user may prefer the opposite. This
      choice is also highly content dependent. Many current video
      conferencing systems offer in the user interface a mechanism to
      make this selection, usually in the form of a slider. The
      mechanism is helpful in point-to-point, centralized multipoint and
      non-centralized multipoint uses.

   4. Use case 4 of the Basso draft applies only to AVPF's PLI [RFC4585]
      and is not reproduced here.

   5. Use case 5 of the Basso draft relates to a mechanism known as
      "freeze picture request". Sending freeze picture requests over a
      non-reliable forward RTCP channel has been identified as
      problematic. Therefore, no freeze picture request has been
      included in this memo, and the use case discussion is not
      reproduced here.

   6. A video mixer dynamically selects one of the received video
      streams to be sent out to participants and tries to provide the
      highest bit rate possible to all participants, while minimizing
      stream transrating. One way of achieving this is to set up
      sessions with endpoints using the maximum bit rate accepted by
      that endpoint, and by the call admission method used by the mixer.
      By means of commands that allow reducing the maximum media
      bit-rate below what has been negotiated during session setup, the
      mixer can then reduce the maximum bit rate sent by endpoints to
      the lowest common denominator of all received streams. As the
      lowest common denominator changes due to endpoints joining,
      leaving, or network congestion, the mixer can adjust the limits to
      which endpoints can send their streams to match the new limit. The
      mixer then would request a new maximum bit rate, which is equal to
      or less than the maximum bit-rate negotiated at session setup, for
      a specific media stream, and the remote endpoint can respond with
      the actual bit-rate that it can support.

   The picture Basso et al. draw up covers most applications we
   foresee. However, we would like to extend the list with two
   additional use cases:

   7. The congestion control algorithms in use (AIMD and TFRC) probe for
      more bandwidth as long as there is something to send. With
      congestion control using packet loss as the indication for
      congestion, this probing generally results in reduced media
      quality (often to a point where the distortion is large enough to
      make the media unusable), due to packet loss and increased delay.
      In a number of deployment scenarios, especially cellular ones, the
      bottleneck link is often the last hop link. That cellular link
      also commonly has some type of QoS negotiation enabling the
      cellular device to learn the maximal bit-rate available over this
      last hop. Thus, indicating the maximum available bit-rate to the
      transmitting party can be beneficial, as it prevents that party
      from even trying to exceed the known hard limit. For cellular or
      other mobile devices, the known available bit-rate can also change
      quickly due to handover to another transmission technology, QoS
      renegotiation due to congestion, etc.
      To minimize disruption of service, quick convergence is needed,
      especially in cases of reduced bandwidth; a media path signalling
      method is therefore desired.

   8. The use of reference picture selection as an error resilience tool
      was introduced in 1997 as NEWPRED [NEWPRED], and is now widely
      deployed. It operates by the receiver sending a feedback message
      to the sender, indicating a reference picture that should be used
      for future prediction. AVPF contains a mechanism for conveying
      such a message, but does not specify for which codec and according
      to which syntax the message is formed. Recently, the ITU-T
      finalized Rec. H.271 which (among other message types) also
      includes a feedback message. It is expected that this feedback
      message will enjoy wide support fairly quickly. Therefore, a
      mechanism to convey feedback messages according to H.271 appears
      to be desirable.

3.2. Using the Media Path

   There are multiple reasons why we propose to use the media path for
   the codec control messages. First, systems employing MCUs often
   separate the control and media processing parts. As these messages
   are intended for or generated by the media part rather than the
   signalling part of the MCU, having them on the media path avoids
   interfaces and unnecessary control traffic between signalling and
   processing. If the MCU is physically decomposed, the use of the
   media path avoids the need for media control protocol extensions
   (e.g. in MEGACO [RFC3525]).

   Secondly, the signalling path quite commonly contains several
   signalling entities, e.g. SIP proxies and application servers.
   Avoiding signalling entities avoids delay for several reasons.
   Proxies have less stringent delay requirements than media processing
   and, due to their complex and more generic nature, may result in
   significant processing delay.
   The topological locations of the signalling entities are also
   commonly not optimized for minimal delay, but rather for other
   architectural goals. Thus the signalling path can be significantly
   longer in both the geographical and the delay sense.

3.3. Using AVPF

   The AVPF feedback message framework [RFC4585] provides a simple way
   of implementing the new messages. Furthermore, AVPF implements rules
   controlling the timing of feedback messages so as to avoid congestion
   through network flooding. We re-use these rules by referencing AVPF.

   The signalling setup for AVPF allows each individual type of function
   to be configured or negotiated on an RTP session basis.

3.3.1. Reliability

   The use of RTCP messages implies that each message transfer is
   unreliable, unless the lower layer transport provides reliability.

   The different messages proposed in this specification have different
   requirements in terms of reliability. However, in all cases, the
   reaction to an (occasional) loss of a feedback message is specified.

3.4. Multicast

   The media related requests might be used with multicast. The RTCP
   timing rules specified in [RFC3550] and [RFC4585] ensure that the
   messages do not cause overload of the RTCP connection. The use of
   multicast may result in the reception of messages with inconsistent
   semantics. The reaction to inconsistencies depends on the message
   type, and is discussed for each message type separately.

3.5. Feedback Messages

   This section describes the semantics of the different feedback
   messages and how they apply to the different use cases.

3.5.1. Full Intra Request Command

   A Full Intra Request (FIR) command, when received by the designated
   media sender, requires that the media sender sends a "decoder refresh
   point" (see 2.2) at the earliest opportunity.
   The evaluation of such an opportunity includes the current encoder
   coding strategy and the currently available network resources.

   FIR is also known as an "instantaneous decoder refresh request" or
   "video fast update request".

   Using a decoder refresh point implies refraining from using any
   picture sent prior to that point as a reference for the encoding
   process of any subsequent picture sent in the stream. For predictive
   media types that are not video, the analogue applies. For example,
   if in MPEG-4 systems scene updates are used, the decoder refresh
   point consists of the full representation of the scene and is not
   delta-coded relative to previous updates.

   Decoder refresh points, especially Intra or IDR pictures, are in
   general several times larger in size than predicted pictures. Thus,
   in scenarios in which the available bandwidth is small, the use of a
   decoder refresh point implies a delay that is significantly longer
   than the typical picture duration.

   Usage in multicast is possible; however, aggregation of the commands
   is recommended. A media sender that receives a request shortly
   (within 2 times the longest known Round Trip Time (RTT)) after
   sending a decoder refresh point should await a second request message
   to ensure that the media receiver has not been served by the
   previously delivered decoder refresh point. The reason for delaying
   2 times the longest known RTT is to avoid sending unnecessary decoder
   refresh points. A session participant may have sent its own request
   while another participant's request was in flight to the media
   sender. Suppressing those requests that may have been sent without
   knowledge of the other request avoids this issue.

   Full Intra Request is applicable in use cases 1, 2, and 5.

3.5.1.1. Reliability

   The FIR message results in the delivery of a decoder refresh point,
   unless the message is lost.
Decoder refresh points are easily 553 identifiable from the bit stream. Therefore, there is no need for 554 protocol-level acknowledgement, and a simple command repetition 555 mechanism is sufficient for ensuring the level of reliability 556 required. However, the potential use of repetition does require a 557 mechanism to prevent the recipient from responding to messages 558 already received and responded to. 560 To ensure the best possible reliability, a sender of FIR may repeat 561 the FIR request until a response has been received. The repetition 562 interval is determined by the RTCP timing rules the session operates 563 under. Upon reception of a complete decoder refresh point or the 564 detection of an attempt to send a decoder refresh point (which got 565 damaged due to a packet loss) the repetition of the FIR must stop. If 566 another FIR is necessary, the request sequence number must be 567 increased. To combat loss of the decoder refresh points sent, the 568 sender that receives repetitions of the FIR 2*RTT after the 569 transmission of the decoder refresh point shall send a new decoder 570 refresh point. Two round trip times allow time for the request to 571 arrive at the media sender and the decoder refresh point to arrive 572 back to the requestor. A FIR sender shall not have more than one FIR 573 request (different request sequence number) outstanding at any time 574 per media sender in the session. 576 An RTP Mixer that receives an FIR from a media receiver is 577 responsible to ensure that a decoder refresh point is delivered to 578 the requesting receiver. It may be necessary to generate FIR commands 579 by the MCU. The two legs (FIR-requesting endpoint to MCU, and MCU to 580 decoder refresh point generating MCU) are handled independently from 581 each other from a reliability perspective. 583 3.5.2. 
Temporal Spatial Trade-off Request and Announcement 585 The Temporal Spatial Trade-off Request (TSTR) instructs the video 586 encoder to change its trade-off between temporal and spatial 587 resolution. Index values from 0 to 31 indicate monotonically a 588 desire for higher frame rate. In general the encoder reaction time 589 may be significantly longer than the typical picture duration. See 590 use case 3 for an example. The encoder decides if the request 591 results in a change of the trade off. An acknowledgement process has 592 been defined to provide feedback of the trade-off that is used 593 henceforth. 595 Informative note: TSTR and TSTA have been introduced primarily 596 because it is believed that control protocol mechanisms, e.g. a SIP 597 re-invite, are too heavyweight, and too slow to allow for a 598 reasonable user experience. Consider, for example, a user 599 interface where the remote user selects the temporal/spatial trade- 600 off with a slider (as it is common in state-of-the-art video 601 conferencing systems). An immediate feedback to any slider 602 movement is required for a reasonable user experience. A SIP re- 603 invite would require at least 2 round-trips more (compared to the 604 TSTR/TSTA mechanism) and may involve proxies and other complex 605 mechanisms. Even in a well-designed system, it may take a second 606 or so until finally the new trade-off is selected. 607 Furthermore the use of RTCP solves very efficiently the multicast 608 use case. 610 The use of TSTR and TSTA in multipoint scenarios is a non-trivial 611 subject, and can be solved in many implementation specific ways. 612 Problems are stemming from the fact that TSTRs will typically arrive 613 unsynchronized, and may request different trade-off values for the 614 same stream and/or endpoint encoder. 
This memo does not specify a 615 MCU's or endpoint's reaction to the reception of a suggested trade- 616 off as conveyed in the TSTR -- we only require the receiver of a TSTR 617 message to reply to it by sending a TSTA, carrying the new trade-off 618 chosen by its own criteria (which may or may not be based on the 619 trade-off conveyed by TSTR). In other words, the trade-off sent in 620 TSTR is a non-binding recommendation; nothing more. 622 With respect to TSTR/TSTA, four scenarios based on the topologies 623 described in [Topologies] need to be distinguished. The scenarios are 624 described in the following sub-clauses. 626 3.5.2.1. Point-to-point 628 In this most trivial case (Topo-Point-to-Point), the media sender 629 typically adjusts its temporal/spatial trade-off based on the 630 requested value in TSTR, and within its capabilities. The TSTA 631 message conveys back the new trade-off value (which may be identical 632 to the old one if, for example, the sender is not capable to adjust 633 its trade-off). 635 3.5.2.2. Point-to-Multipoint using Multicast or Translators 637 RTCP Multicast is used either with media multicast according to Topo- 638 Multicast, or following RFC 3550's translator model according to 639 Topo-Translator. In these cases, TSTR messages from different 640 receivers may be received unsynchronized, and possibly with different 641 requested trade-offs (because of different user preferences). This 642 memo does not specify how the media sender tunes its trade-off. 643 Possible strategies include selecting the mean, or median, of all 644 trade-off requests received, prioritize certain participants, or 645 continue using the previously selected trade-off (e.g. when the 646 sender is not capable of adjusting it). Again, all TSTR messages 647 need to be acknowledged by TSTA, and the value conveyed back has to 648 reflect the decision made. 650 3.5.2.3. 
Point-to-Multipoint using RTP Mixer 652 In this scenario (Topo-Mixer) the RTP Mixer receives all TSTR 653 messages, and has the opportunity to act on them based on its own 654 criteria. In most cases, the MCU should form a "consensus" of 655 potentially conflicting TSTR messages arriving from different 656 participants, and initiate its own TSTR message(s) to the media 657 sender(s). The strategy of forming this "consensus" is open for the 658 implementation, and can, for example, encompass averaging the 659 participant's request values, prioritizing certain participants, or 660 use session default values. If the Mixer changes its trade-off, it 661 needs to request from the media sender(s) the use of the new value, 662 by creating a TSTR of its own. Upon reaching a decision on the used 663 trade-off it includes that value in the acknowledgement. 665 Even if a Mixer or Translator performs transcoding, it is very 666 difficult to deliver media with the requested trade-off, unless the 667 content the MCU receives is already close to that trade-off. Only in 668 cases where the original source has substantially higher quality (and 669 bit-rate), it is likely that transcoding can result in the requested 670 trade-off. 672 3.5.2.4. Reliability 673 A request and reception acknowledgement mechanism is specified. The 674 Temporal Spatial Trade-off Announcement (TSTA) message informs the 675 request-sender that its request has been received, and what trade-off 676 is used henceforth. This acknowledgment mechanism is desirable for at 677 least the following reasons: 679 o A change in the trade-off cannot be directly identified from the 680 media bit stream, 681 o User feedback cannot be implemented without information of the 682 chosen trade-off value, according to the media sender's 683 constraints, 684 o Repetitive sending of messages requesting an unimplementable trade- 685 off can be avoided. 687 3.5.3. H.271 Video Back Channel Message 689 ITU-T Rec. 
H.271 defines syntax, semantics, and suggested encoder
690  reaction to a video back channel message. The codepoint defined in
691  this memo is used to transparently convey such a message from media
692  receiver to media sender.

694  We refrain from an in-depth discussion of the available codepoints
695  within H.271 in this memo for a number of reasons. Perhaps the most
696  important reason is that we expect backward-compatible additions of
697  codepoints to H.271 outside the update/maturity cycle of this memo.
698  Another reason lies in the complexity of the H.271 specification: it
699  is a dense document with currently 16 pages of content. It does not
700  make any sense to try to summarize its content in a few sentences of
701  IETF lingo -- oversimplification and misguidance would be inevitable.
702  Finally, please note that H.271 contains many statements of
703  applicability and interpretation of its various messages in
704  conjunction with specific video compression standards. This type of
705  discussion would overload the present memo.

707  In this respect, this memo follows the guidance of a decade of RTP
708  payload format specification work -- the details of the media format
709  carried are normally not described in any significant detail.

711  However, we note that some H.271 messages bear similarities with
712  native messages of AVPF and this memo. Furthermore, we note that
713  some H.271 messages are known to require caution in multicast
714  environments -- or are plainly not usable in multicast or multipoint
715  scenarios. Table xxx provides a brief, oversimplified overview of
716  the messages currently defined in H.271, their similar AVPF or CCM
717  messages (the latter as specified in this memo), and an indication of
718  our current knowledge of their multicast safety.
720  H.271 msg type             AVPF/CCM msg type  multicast-safe
721  0 (when used for reference
722    picture selection)       AVPF RPSI          No (positive ACK of pictures)
723  1                          AVPF PLI           Yes
724  2                          AVPF SLI           Yes
725  3                          N/A                Yes (no required sender action)
726  4                          N/A                Yes (no required sender action)

728  Note: H.271 message type 0 is not a strict equivalent to
729  AVPF's RPSI; it is an indication of known-as-correct reference
730  picture(s) at the decoder. It does not command an encoder to
731  use a defined reference picture (the form of control
732  information envisioned to be carried in RPSI). However, it is
733  believed and intended that H.271 message type 0 will be used
734  for the same purpose as AVPF's RPSI -- although other use
735  forms are also possible.

737  In response to the opaqueness of the H.271 messages, especially with
738  respect to multicast safety, the following guidelines MUST be
739  followed when an implementation wishes to employ the H.271 video back
740  channel message:

742  1. Implementations utilizing the H.271 feedback message MUST stay in
743     compliance with congestion control principles, as outlined in
744     section 5.
745  2. An implementation SHOULD utilize the native messages as defined in
746     [RFC4585] and in this memo instead of similar messages defined in
747     [H.271]. Our current understanding of similar messages is
748     documented in table xxx above. One good reason to deviate from the
749     SHOULD statement above would be if it is clearly understood that,
750     for a given application and video compression standard, the
751     aforementioned "similarity" is not given, in contrast to what
752     the table indicates.
753  3. It has been observed that some of the H.271 codepoints currently
754     in existence are not multicast-safe. Therefore, the sensible
755     thing to do is not to use the H.271 feedback message type in
756     multicast environments.
     It MAY be used only when all the issues
757     mentioned later are fully understood by the implementer, and
758     properly taken into account by all endpoints. In all other cases,
759     the H.271 message type MUST NOT be used in conjunction with
760     multicast.
761  4. It has been observed that even in centralized multipoint
762     environments, where the mixer should theoretically be able to
763     resolve issues as documented below, the implementation of such a
764     mixer and cooperative endpoints is a very difficult and tedious
765     task. Therefore, H.271 messages MUST NOT be used in centralized
766     multipoint scenarios, unless all the issues mentioned below are
767     fully understood by the implementer, and properly taken into
768     account by both mixer and endpoints.

770  Issues with point-to-multipoint:

772  1. Different state established on different receivers. One example is
773     the reference picture feedback message: when it is sent to receivers
774     whose video codecs are in different states due to previous
775     losses or stream switches, the results can be unpredictable and
776     annoying.
777  2. Combination of multiple messages/requests by a media sender into
778     an action and/or response.
779  3. Suppression of requests may need to go beyond the basic mechanism
780     described in AVPF. For example, forward messages may be needed to
781     suppress the generation of requests.

783  Issues with translators and mixers:
784  1. Combination of multiple messages or requests into an action or
785     response.
786  2.

788  3.5.3.1. Reliability

790  H.271 video back channel messages do not require reliable
791  transmission, and the reception of a message can be derived from the
792  forward video bit stream. Therefore, no specific reception
793  acknowledgement is specified.

795  With respect to re-sending rules, clause 3.5.1.1 applies.

797  3.5.4.
Temporary Maximum Media Bit-rate Request 799 A receiver, translator or mixer uses the Temporary Maximum Media Bit- 800 rate Request (TMMBR, "timber") to request a sender to limit the 801 maximum bit-rate for a media stream to, or below, the provided value. 802 The primary usage for this is a scenario with MCU (use case 6), 803 corresponding to Topo-Translator or Topo-Mixer, but also Topo-Point- 804 to-Point. 806 The temporary maximum media bit-rate messages are generic messages 807 that can be applied to any media. 809 The reasoning below assumes that the participants have negotiated a 810 session maximum bit-rate, using the signalling protocol. This value 811 can be global, for example in case of point-to-point, multicast, or 812 translators. It may also be local between the participant and the 813 peer or mixer. In both cases, the bit-rate negotiated in signalling 814 is the one that the participant guarantees to be able to handle 815 (encode and decode). In practice, the connectivity of the 816 participant also bears an influence to the negotiated value -- it 817 does not necessarily make much sense to negotiate a media bit rate 818 that one's network interface does not support. 820 An already established temporary bit-rate value may be changed at any 821 time (subject to the timing rules of the feedback message sending), 822 and to any value between zero and the session maximum, as negotiated 823 during signalling. Even if a sender has received a TMMBR message 824 increasing the bit-rate, all increases must be governed by a 825 congestion control algorithm. TMMBR only indicates known limitations, 826 usually in the local environment, and does not provide any 827 guarantees. 829 If it is likely that the new bit-rate indicated by TMMBR will be 830 valid for the remainder of the session, the TMMBR sender can perform 831 a renegotiation of the session upper limit using the session 832 signalling protocol. 834 3.5.4.1. 
MCU based Multi-point operation 836 Assume a small mixer-based multiparty conference is ongoing, as 837 depicted in Topo-Mixer of [Topologies]. All participants (A-D) have 838 negotiated a common maximum bit-rate that this session can use. The 839 conference operates over a number of unicast links between the 840 participants and the MCU. The congestion situation on each of these 841 links can easily be monitored by the participant in question and by 842 the MCU, utilizing, for example, RTCP Receiver Reports. However, any 843 given participant has no knowledge of the congestion situation of the 844 connections to the other participants. Worse, without mechanisms 845 similar to the ones discussed in this draft, the MCU (who is aware of 846 the congestion situation on all connections it manages) has no 847 standardized means to inform participants to slow down, short of 848 forging its own receiver reports (which is undesirable). In 849 principle, an MCU confronted with such a situation is obliged to thin 850 or transcode streams intended for connections that detected 851 congestion. 853 In practice, stream thinning - if performed media aware - is 854 unfortunately a very difficult and cumbersome operation and adds 855 undesirable delay. If done media unaware, it leads very quickly to 856 unacceptable reproduced media quality. Hence, means to slow down 857 senders even in the absence of congestion on their connections to the 858 MCU are desirable. 860 To allow the MCU to perform congestion control on the individual 861 links, without performing transcoding, there is a need for a 862 mechanism that enables the MCU to request the participant's media 863 encoders to limit their maximum media bit-rate currently used. The 864 MCU handles the detection of a congestion state between itself and a 865 participant as follows: 866 1. Start thinning the media traffic to the supported bit-rate. 867 2. 
     Use the TMMBR to request the media sender(s) to reduce the media
868     bit-rate sent by them to the MCU, to a value that is in compliance
869     with congestion control principles for the slowest link. Slow
870     refers here to the available bandwidth and packet rate after
871     congestion control.
872  3. As soon as the bit-rate has been reduced by the sending party, the
873     MCU stops stream thinning implicitly, because there is no need for
874     it any more as the stream is in compliance with congestion
875     control.

877  The above algorithm may suggest to some that there is no need for the
878  TMMBR - it should be sufficient to solely rely on stream thinning.
879  As much as this is desirable from a network protocol designer's
880  viewpoint, it has the disadvantage that it doesn't work very
881  well - the reproduced media quality quickly becomes unusable.

883  It appears to be a reasonable compromise to rely on stream thinning
884  as an immediate reaction tool to combat congestion, and have a quick
885  control mechanism that instructs the original sender to reduce its
886  bit-rate.

888  Note also that the standard RTCP receiver report cannot serve for the
889  purpose mentioned. In an environment with RTP Mixers, the RTCP RR is
890  being sent between the RTP receiver in the endpoint and the RTP
891  sender in the Mixer only - as there is no multicast transmission.
892  The stream that needs to be bandwidth-reduced, however, is the one
893  between the original sending endpoint and the Mixer. This endpoint
894  doesn't see the aforementioned RTCP RRs, and hence needs to be
895  explicitly informed about desired bandwidth adjustments.

897  In this topology it is the Mixer's responsibility to collect the
898  bit-rates that the different links can support and to combine them
899  into the bit-rate it requests.
This aggregation may also
900  take into account that the Mixer may contain certain transcoding
901  capabilities (as discussed in section 2.3.4 of [Topologies]), which
902  can be employed for those few of the session participants that have
903  the lowest available bit-rates.

905  3.5.4.2. Point-to-Multipoint using Multicast or Translators

907  In these topologies, corresponding to Topo-Multicast or Topo-
908  Translator, RTCP RRs are transmitted globally, which allows for the
909  detection of transmission problems such as congestion, on a medium
910  timescale. As all media senders are aware of the congestion
911  situation of all media receivers, the rationale of the use of TMMBR
912  of section 3.5.4.1 does not apply. However, even in this case the
913  congestion control response can be improved when the unicast links
914  are employing congestion controlled transport protocols (such as TCP
915  or DCCP). A peer may also report a local limitation to the media
916  sender.

918  3.5.4.3. Point-to-point operation

920  In use case 7 it is possible to use TMMBR to improve the performance
921  at times of changes in the known upper limit of the bit-rate. In
922  this use case the signalling protocol has established an upper limit
923  for the session and media bit-rates. However, at the time of a
924  transport link bit-rate reduction, a receiver could avoid serious
925  congestion by sending a TMMBR to the sending side.

927  3.5.4.4. Reliability

929  The reaction of a media sender to the reception of a TMMBR message is
930  not immediately identifiable through inspection of the media stream.
931  Therefore a more explicit mechanism is needed to avoid unnecessary
932  re-sending of TMMBR messages. Using a statistically based
933  retransmission scheme would only provide statistical guarantees of
934  the request being received. It would also not avoid the
935  retransmission of already received messages. In addition, it does not
936  allow for easy suppression of other participants' requests.
For the
937  reasons mentioned, a mechanism based on explicit notification is
938  used.

940  Upon the reception of a request, a media sender sends a notification
941  containing the currently applicable limitation of the bit-rate, and
942  the session participant that owns that limit. That allows all other
943  participants to suppress any request they may have with a limitation
944  value equal to or higher than the current one. The identity of the owner
945  allows for small message sizes and media sender states. A media
946  sender only keeps state for the SSRC of the current owner of the
947  limitation; all other requests and their sources are not saved. Only
948  the participant with the lowest value is allowed to remove or change
949  its limitation. Otherwise anyone that ever set a limitation would
950  need to remove it to allow the maximum bit-rate to be raised beyond
951  that value.

953  4. RTCP Receiver Report Extensions

955  This memo specifies six new feedback messages. The Full Intra Request
956  (FIR), Temporal-Spatial Trade-off Request (TSTR), Temporal-Spatial
957  Trade-off Announcement (TSTA), and Video Back Channel Message (VBCM)
958  are "Payload Specific Feedback Messages" in the sense of section 6.3
959  of AVPF [RFC4585]. The Temporary Maximum Media Bit-rate Request
960  (TMMBR) and Temporary Maximum Media Bit-rate Notification (TMMBN) are
961  "Transport Layer Feedback Messages" in the sense of section 6.2 of
962  AVPF.

964  In the following subsections, the new feedback messages are defined,
965  following a similar structure as in the AVPF specification's sections
966  6.2 and 6.3, respectively.

968  4.1. Design Principles of the Extension Mechanism

970  RTCP was originally introduced as a channel to convey presence,
971  reception quality statistics and hints on the desired media coding.
972  A limited set of media control mechanisms has been introduced in
973  early RTP payload formats for video formats, for example in RFC 2032
974  [RFC2032].
However, this specification, for the first time, suggests
975  a two-way handshake for one of its messages. There is a danger that
976  this introduction could be misunderstood as a precedent for the
977  use of RTCP as an RTP session control protocol. In order to prevent
978  these misunderstandings, this subsection attempts to clarify the
979  scope of the extensions specified in this memo, and strongly suggests
980  that future extensions follow the rationale spelled out here, or
981  compellingly explain why they deviate from the rationale.

983  In this memo, and in AVPF [RFC4585], only such messages have been
984  included which

986  a) have comparatively strict real-time constraints, which prevent the
987     use of mechanisms such as a SIP re-invite in most application
988     scenarios. The real-time constraints are explained separately for
989     each message where necessary
990  b) are multicast-safe in that the reaction to potentially
991     contradicting feedback messages is specified, as necessary for
992     each message
993  c) are directly related to activities of a certain media codec, class
994     of media codecs (e.g. video codecs), or the given media stream.

996  In this memo, a two-way handshake is only introduced for such
997  messages that
998  a) require a notification or acknowledgement due to their nature,
999     which is motivated separately for each message
1000  b) cannot have their notification or acknowledgement easily derived
1001     from the media bit stream.

1003  All messages in AVPF [RFC4585] and in this memo follow a number of
1004  common design principles. In particular:

1006  a) Media receivers are not always implementing higher control
1007     protocol functionalities (SDP, XML parsers and such) in their
1008     media path. Therefore, simple binary representations are used in
1009     the feedback messages and not an (otherwise desirable) flexible
1010     format such as, for example, XML.

1012  4.2.
Transport Layer Feedback Messages 1014 Transport Layer FB messages are identified by the value RTPFB (205) 1015 as RTCP packet type. 1017 In AVPF, one message of this category had been defined. This memo 1018 specifies two more messages for a total of three messages of this 1019 type. They are identified by means of the FMT parameter as follows: 1021 0: unassigned 1022 1: Generic NACK (as per AVPF) 1023 2: Temporary Maximum Media Bit-rate Request 1024 3: Temporary Maximum Media Bit-rate Notification 1025 4-30: unassigned 1026 31: reserved for future expansion of the identifier number space 1028 The following subsection defines the formats of the FCI field for 1029 this type of FB message. 1031 4.2.1. Temporary Maximum Media Bit-rate Request (TMMBR) 1033 The FCI field of a TMMBR Feedback message SHALL contain one or more 1034 FCI entries. 1036 4.2.1.1. Semantics 1038 The TMMBR is used to indicate the highest bit-rate per sender of a 1039 media, which the receiver currently supports in this RTP session. 1040 The media sender MAY use any lower bit-rate, as it may need to 1041 address a congestion situation or other limiting factors. See 1042 section 5 (congestion control) for more discussion. 1044 The "SSRC of the packet sender" field indicates the source of the 1045 request, and the "SSRC of media source" is not used and SHALL be set 1046 to 0. The SSRC of media sender in the FCI field denotes the media 1047 sender the message applies to. This is useful in the multicast or 1048 translator topologies where each media sender may be addressed in a 1049 single TMMBR message using multiple FCIs. 1051 A TMMBR FCI MAY be repeated in subsequent TMMBR messages if no 1052 applicable TMMBN FCI has been received at the time of transmission of 1053 the next RTCP packet. The bit-rate value of a TMMBR FCI MAY be 1054 changed from a previous TMMBR message and the next, regardless of the 1055 eventual reception of an applicable TMMBN FCI. 
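
As an illustration only, the header rules above can be combined with the FCI layout of section 4.2.1.2 into a minimal Python sketch that packs a TMMBR message. The function name build_tmmbr and the dictionary argument are hypothetical. The common feedback header layout (V=2, P=0, 5-bit FMT, 8-bit PT, 16-bit length, two SSRC words) is assumed from AVPF [RFC4585], the FMT value is the one listed in this memo, and rounding the bit-rate down to a multiple of 128 bit/s is an assumption of the sketch.

```python
import struct

RTPFB = 205       # RTCP packet type for transport layer FB messages
FMT_TMMBR = 2     # FMT value listed for TMMBR in this memo
UNIT_BPS = 128    # one unit of the bit-rate field is 128 bit/s


def build_tmmbr(sender_ssrc, limits_bps):
    """Pack a TMMBR message (hypothetical helper).

    limits_bps maps the SSRC of each addressed media sender to the
    requested maximum bit-rate in bit/s; each pair becomes one FCI entry.
    """
    if not limits_bps:
        raise ValueError("a TMMBR message needs at least one FCI entry")
    fci = b""
    for target_ssrc, bps in limits_bps.items():
        # Assumption: round down so the encoded value never exceeds the
        # limit the receiver asked for.
        fci += struct.pack("!II", target_ssrc, bps // UNIT_BPS)
    first_octet = (2 << 6) | FMT_TMMBR    # V=2, P=0, 5-bit FMT
    length = 2 + 2 * len(limits_bps)      # 32-bit words minus one
    # "SSRC of media source" is not used for TMMBR and is set to 0.
    header = struct.pack("!BBHII", first_octet, RTPFB, length, sender_ssrc, 0)
    return header + fci
```

A single call with several entries in limits_bps addresses several media senders in one message, which is the multicast or translator case mentioned above.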
1057  Please note that a TMMBN message SHALL be sent by the media sender at
1058  the earliest possible point in time, as a result of any TMMBR
1059  messages received since the last sending of TMMBN. The TMMBN message
1060  indicates the limit and the owner of that limit at the time of the
1061  transmission of the message. The limit is the lowest of the previous
1062  value and all values received in TMMBR FCIs since the last TMMBN was
1063  transmitted.

1065  A media receiver that is not the owner of the bandwidth limit SHOULD,
1066  when planning to send a TMMBR, either request a bandwidth lower than
1067  the currently established bandwidth limit it knows of for this media
1068  sender, or suppress its transmission of the TMMBR. The exception to
1069  the above rule is when a receiver either doesn't know the limit or
1070  is certain that its local representation of the value is in error.
1071  All received requests for bandwidth limits greater than or equal to
1072  the one currently established are ignored, except that they still
1073  result in the transmission of a TMMBN. A media receiver that is
1074  the owner of the current bandwidth limit MAY lower the value
1075  further, raise the value, or remove the restriction completely by
1076  setting the bandwidth limit equal to the session limit.

1078  Once a session participant receives the TMMBN in response to its
1079  TMMBR, with its own SSRC, it knows that it "owns" the bandwidth
1080  limitation. Only the "owner" of a bandwidth limitation can raise it
1081  or reset it to the session limit.

1083  Note that, due to the unreliable nature of the transport of TMMBR and
1084  TMMBN, TMMBR messages may be sent that do not obey the rules
1085  above. Furthermore, in multicast scenarios it
1086  can happen that more than one session participant believes it "owns"
1087  the current bandwidth limitation.
This is not critical for a number
1088  of reasons:
1089  a) If a TMMBR message is lost in transmission, the media sender does
1090  not learn about the restrictions imposed on it. However, it also
1091  does not send a TMMBN message notifying reception of a request it has
1092  never received. Therefore, no new limit is established, and the media
1093  receiver sending the more restrictive TMMBR is not the owner. Since
1094  this media receiver has not seen a notification corresponding to its
1095  request, it is free to re-send it.
1096  b) Similarly, if a TMMBN message gets lost, the media receiver that
1097  has sent the corresponding TMMBR request does not receive an
1098  acknowledgement. In that case, it is also not the "owner" of the
1099  restriction and is free to re-send the request.
1100  c) If multiple competing TMMBR messages are sent by different session
1101  participants, then the resulting TMMBN indicates the lowest bandwidth
1102  requested; the owner is set to the sender of the TMMBR with the
1103  lowest requested bandwidth value.

1105  TMMBR feedback SHOULD NOT be used if the underlying transport
1106  protocol is capable of providing similar feedback information from
1107  the receiver to the sender.

1109  It is also important to consider the security risks involved with faked
1110  TMMBRs. See security considerations in Section 6.

1112  The feedback messages may be used in both multicast and unicast
1113  sessions of any of the specified topologies.

1115  For sessions with a larger number of participants, using the lowest
1116  common denominator, as required by this mechanism, may not be the
1117  most suitable course of action. Larger sessions may need to consider
1118  other ways to support adapted bit-rates for participants, such as
1119  partitioning the session in different quality tiers, or using some
1120  other method of achieving bit-rate scalability.
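
The receiver-side rules stated earlier in this subsection (a non-owner only requests a lower limit, the owner may change its limit, and an unknown limit lifts the restriction) can be sketched as a small decision function. This is a hedged illustration, not normative text: the name should_send_tmmbr and its parameters are hypothetical, and the choice that an owner does not re-send an unchanged value is an assumption of the sketch.

```python
def should_send_tmmbr(my_ssrc, wanted_bps, known_limit_bps=None, known_owner=None):
    """Return True if a media receiver should send (not suppress) a TMMBR.

    known_limit_bps and known_owner reflect the last TMMBN this receiver
    has seen for the media sender; None means no TMMBN has been seen.
    """
    if known_limit_bps is None:
        # Exception in the text: the receiver does not know the limit.
        return True
    if known_owner == my_ssrc:
        # The owner may lower, raise or remove its own limitation.
        # Assumption: an unchanged value gives nothing new to request.
        return wanted_bps != known_limit_bps
    # A non-owner only requests values below the established limit.
    return wanted_bps < known_limit_bps
```

The case in which a receiver is certain that its local view of the limit is wrong can be handled by passing known_limit_bps=None.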
1122  If the value set by a TMMBR message is expected to be permanent, the
1123  TMMBR setting party is RECOMMENDED to renegotiate the session
1124  parameters to reflect that using the setup signalling.

1126  An SSRC may time out according to the default rules for RTP session
1127  participants, i.e. the media sender has not received any RTCP packet
1128  from the owner for the last five regular reporting intervals. An SSRC
1129  may also leave the session, indicating this through the transmission
1130  of an RTCP BYE packet or an external signalling channel. In all of
1131  these cases the entity is considered to have left the session. In the
1132  case the "owner" leaves the session, the value SHALL be set to the
1133  session maximum and the transmission of a TMMBN is scheduled.

1135  4.2.1.2. Message Format
1136  The Feedback control information (FCI) consists of one or more TMMBR
1137  FCI entries with the following syntax:

1139   0                   1                   2                   3
1140   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1141  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1142  |                              SSRC                             |
1143  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1144  |            Maximum bit-rate in units of 128 bits/s            |
1145  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1147  Figure 1 - Syntax for the TMMBR message

1149  SSRC: The SSRC value of the target of this specific maximum bit-
1150     rate request.

1152  Maximum bit-rate: The temporary maximum media bit-rate value in
1153     units of 128 bit/s. This provides a range from 0 to
1154     549755813760 bits/s (~550 Gbit/s) with a granularity of 128
1155     bits/s.

1157  The length of the FB message is set to 2+2*N where N is the number
1158  of TMMBR FCI entries.

1160  4.2.1.3. Timing Rules

1162  The first transmission of the request message MAY use early or
1163  immediate feedback in cases when timeliness is desirable. Any
1164  repetition of a request message SHOULD use regular RTCP mode for its
1165  transmission timing.

1167  4.2.2.
Temporary Maximum Media Bit-rate Notification (TMMBN)

The FCI field of the TMMBN Feedback message SHALL contain one TMMBN
FCI entry.

4.2.2.1. Semantics

This feedback message is used to notify the senders of any TMMBR
message that one or more TMMBR messages have been received. It
indicates to all participants the currently employed maximum bit-rate
value and the "owner" of the current limitation. The "owner" of a
limitation is the sender of the last (most restrictive) TMMBR message
received by the media sender.

The "SSRC of the packet sender" field indicates the source of the
notification. The "SSRC of media source" SHALL be set to the SSRC of
the media receiver that currently owns the bit-rate limitation.

A TMMBN message SHALL be scheduled for transmission after the
reception of a TMMBR message with an FCI including the session
participant's SSRC. Only a single TMMBN SHALL be sent, even if more
than one TMMBR message is received between the scheduling of the
transmission and the actual transmission of the TMMBN message. The
TMMBN message indicates the limit and the owner of that limit at the
time of transmitting the message. The limit SHALL be the lowest of
the existing value and all values received in TMMBR messages since
the last TMMBN was transmitted. The participant that sent that
request SHALL become the owner of the limit.

The reception of a TMMBR message with a transmission limit greater
than or equal to the current limit SHALL still result in the
transmission of a TMMBN message. However, the limit and owner are not
changed unless the message came from the current owner, and the
current limit and owner are indicated in the TMMBN message. This
procedure allows session participants that have not seen the last
TMMBN message to get a correct view of this media sender's state.
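The limit and ownership rules above can be sketched as a small piece
of media-sender state. This is a minimal illustration under stated
assumptions, not part of the specification: the names TmmbrState,
on_tmmbr, on_owner_left and send_tmmbn are hypothetical, bit-rates
are carried in the FCI unit of 128 bit/s, and clamping a request to
the session maximum is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TmmbrState:
    """Hypothetical per-stream state kept by a media sender."""
    session_max: int              # session maximum, units of 128 bit/s
    limit: int = -1               # current limit, units of 128 bit/s
    owner: Optional[int] = None   # SSRC of the current owner, if any
    tmmbn_pending: bool = False   # True while a TMMBN is scheduled

    def __post_init__(self) -> None:
        if self.limit < 0:
            self.limit = self.session_max

    def on_tmmbr(self, sender_ssrc: int, requested: int) -> None:
        """Handle one TMMBR FCI entry addressed to this media sender."""
        # A lower value always wins; an equal or higher value only
        # takes effect when it comes from the current owner.
        if requested < self.limit or sender_ssrc == self.owner:
            # Assumption of this sketch: never exceed the session maximum.
            self.limit = min(requested, self.session_max)
            self.owner = sender_ssrc
        # Every TMMBR schedules a TMMBN, but only one is kept pending.
        self.tmmbn_pending = True

    def on_owner_left(self, ssrc: int) -> None:
        """Owner timed out or sent BYE: fall back to the session maximum."""
        if ssrc == self.owner:
            self.limit = self.session_max
            self.owner = None
            self.tmmbn_pending = True

    def send_tmmbn(self) -> Tuple[Optional[int], int]:
        """Return (owner SSRC, limit) as valid at transmission time."""
        self.tmmbn_pending = False
        return self.owner, self.limit
```

Because the state is updated on every TMMBR and read only when the
TMMBN is actually transmitted, several requests arriving while a
notification is pending collapse into a single TMMBN that reports
the limit and owner valid at that moment.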

When a media sender determines that the "owner" of a limitation has
left the session, the current limitation is removed and the media
sender SHALL send a TMMBN message indicating the maximum session
bandwidth.

4.2.2.2. Message Format

The TMMBN Feedback control information (FCI) entry has the following
syntax:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Maximum bit-rate in units of 128 bits/s            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2 - Syntax for the TMMBN message

   Maximum bit-rate: The current temporary maximum media bit-rate
         value in units of 128 bit/s.

The length field value of the FB message SHALL be 3.

4.2.2.3. Timing Rules

The acknowledgement SHOULD be sent as soon as allowed by the applied
timing rules for the session. Immediate or early feedback mode SHOULD
be used for these messages.

4.3. Payload Specific Feedback Messages

Payload-Specific FB messages are identified by the value PT=PSFB
(206) as RTCP packet type.

AVPF defines three payload-specific FB messages and one application
layer FB message. This memo specifies four additional payload-
specific feedback messages. All are identified by means of the FMT
parameter as follows:

   0:     unassigned
   1:     Picture Loss Indication (PLI)
   2:     Slice Loss Indication (SLI)
   3:     Reference Picture Selection Indication (RPSI)
   4:     Full Intra Request Command (FIR)
   5:     Temporal-Spatial Trade-off Request (TSTR)
   6:     Temporal-Spatial Trade-off Announcement (TSTA)
   7:     Video Back Channel Message (VBCM)
   8-14:  unassigned
   15:    Application layer FB message
   16-30: unassigned
   31:    reserved for future expansion of the number space

The following subsections define the new FCI formats for the payload-
specific FB messages.
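
As an illustration of the FMT table above, the following sketch
classifies a payload-specific FB message from its first four octets.
It assumes the common feedback header of [RFC4585] (2-bit version,
padding bit, 5-bit FMT, 8-bit PT, 16-bit length); the function name
psfb_message_name and the dictionary PSFB_FMT_NAMES are hypothetical
and chosen for this example only.

```python
import struct

PSFB = 206  # RTCP packet type for payload-specific feedback

# FMT values as listed in the table above.
PSFB_FMT_NAMES = {
    1: "Picture Loss Indication (PLI)",
    2: "Slice Loss Indication (SLI)",
    3: "Reference Picture Selection Indication (RPSI)",
    4: "Full Intra Request Command (FIR)",
    5: "Temporal-Spatial Trade-off Request (TSTR)",
    6: "Temporal-Spatial Trade-off Announcement (TSTA)",
    7: "Video Back Channel Message (VBCM)",
    15: "Application layer FB message",
}


def psfb_message_name(header: bytes) -> str:
    """Map the first 4 octets of a PSFB packet to its message name."""
    if len(header) < 4:
        raise ValueError("need at least 4 octets of RTCP header")
    first, pt, _length = struct.unpack("!BBH", header[:4])
    if first >> 6 != 2:
        raise ValueError("not RTP/RTCP version 2")
    if pt != PSFB:
        raise ValueError("not a payload-specific FB message")
    fmt = first & 0x1F  # low 5 bits of the first octet carry the FMT
    if fmt == 31:
        return "reserved for future expansion of the number space"
    return PSFB_FMT_NAMES.get(fmt, "unassigned")
```

For example, a header starting with the octets 0x84 0xCE carries
version 2, FMT=4 and PT=206, which the sketch reports as a FIR
command.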

4.3.1. Full Intra Request (FIR) command

The FIR command FB message is identified by PT=PSFB and FMT=4.

There MUST be one or more FIR entries contained in the FCI field.

4.3.1.1. Semantics

Upon reception of a FIR message, an encoder MUST send a decoder
refresh point (see Section 2.2) as soon as possible.

   Note: Currently, video appears to be the only useful application
   for FIR, as it appears to be the only widely deployed RTP payload
   that relies heavily on media prediction across RTP packet
   boundaries. However, use of FIR could also reasonably be
   envisioned for other media types that share essential properties
   with compressed video, namely cross-frame prediction (whatever a
   frame may be for that media type). One possible example may be the
   dynamic updates of MPEG-4 scene descriptions. It is suggested that
   payload formats for such media types refer to FIR and other message
   types defined in this specification and in AVPF, instead of
   creating similar mechanisms in the payload specifications. The
   payload specifications may have to explain how the payload-specific
   terminologies map to the video-centric terminology used here.

   Note: In environments where the sender has no control over the
   codec (e.g. when streaming pre-recorded and pre-coded content), the
   reaction to this command cannot be specified. One suitable
   reaction of a sender would be to skip forward in the video bit
   stream to the next decoder refresh point. In other scenarios, it
   may be preferable not to react to the command at all, e.g. when
   streaming to a large multicast group. Other reactions may also be
   possible.
   When deciding on a strategy, a sender could take into
   account factors such as the size of the receiving group, the
   "importance" of the sender of the FIR message (however "importance"
   may be defined in this specific application), the frequency of
   decoder refresh points in the content, and others. However, a
   session that predominantly handles pre-coded content should not
   use FIR at all.

The sender MUST consider congestion control as outlined in section 5,
which MAY restrict its ability to send a decoder refresh point
quickly.

   Note: The relationship between the Picture Loss Indication and FIR
   is as follows. As discussed in section 6.3.1 of AVPF, a Picture
   Loss Indication informs the encoder about the loss of a picture and
   hence the likelihood of misalignment of the reference pictures in
   encoder and decoder. Such a scenario is normally related to losses
   in an ongoing connection. In point-to-point scenarios, and without
   the presence of advanced error resilience tools, one possible
   option an encoder has is to send a decoder refresh point. However,
   there are other options including ignoring the PLI, for example if
   only one receiver of many has sent a PLI or when the embedded
   stream redundancy is likely to clean up the reproduced picture
   within a reasonable amount of time. The FIR, in contrast, leaves a
   real-time encoder no choice but to send a decoder refresh point.
   It does not allow the encoder to take into account any
   considerations such as the ones mentioned above.

   Note: Mandating a maximum delay for completing the sending of a
   decoder refresh point would be desirable from an application
   viewpoint, but may be problematic from a congestion control point
   of view. "As soon as possible" as mentioned above appears to be a
   reasonable compromise.

FIR SHALL NOT be sent as a reaction to picture losses; it is
RECOMMENDED to use PLI instead. FIR SHOULD be used only in
situations where not sending a decoder refresh point would render the
video unusable for the users.

   Note: A typical example where sending FIR is adequate is when, in a
   multipoint conference, a new user joins the session and no regular
   decoder refresh point interval is established. Another example
   would be a video switching MCU that changes streams. Here,
   normally, the MCU issues a FIR to the new sender so as to force it
   to emit a decoder refresh point. The decoder refresh point normally
   includes a Freeze Picture Release (defined outside this
   specification), which re-starts the rendering process of the
   receivers. Both techniques mentioned are commonly used in MCU-
   based multipoint conferences.

Other RTP payload specifications such as RFC 2032 [RFC2032] already
define a feedback mechanism for certain codecs. An application
supporting both schemes MUST use the feedback mechanism defined in
this specification when sending feedback. For backward compatibility
reasons, such an application SHOULD also be capable of receiving and
reacting to the feedback scheme defined in the respective RTP payload
format, if this is required by that payload format.

The "SSRC of the packet sender" field indicates the source of the
request, and the "SSRC of media source" is not used and SHALL be set
to 0. The SSRC of the media sender to which the FIR command applies
is in the FCI.

4.3.1.2. Message Format

Full Intra Request uses one additional FCI field, the content of
which is depicted in Figure 3. The length of the FB message MUST be
set to 2+2*N, where N is the number of FCI entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3 - Syntax for the FIR message

   SSRC: The SSRC value of the media sender that is the target of
         this specific FIR command.

   Seq. nr: Command sequence number. The sequence number space is
         unique for each tuple consisting of the SSRC of the command
         source and the SSRC of the command target. The sequence
         number SHALL be increased by 1 modulo 256 for each new
         command. A repetition SHALL NOT increase the sequence
         number. Initial value is arbitrary.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
         reception.

The semantics of this FB message are independent of the RTP payload
type.

4.3.1.3. Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585]. FIR
commands MAY be used with early or immediate feedback. The FIR
feedback message MAY be repeated. If using immediate feedback mode,
the repetition SHOULD wait at least one RTT before being sent. In
early or regular RTCP mode the repetition is sent in the next regular
RTCP packet.

4.3.1.4. Remarks

FIR messages typically trigger the sending of full intra or IDR
pictures. Both are several times larger than predicted (inter)
pictures. Their size is independent of the time they are generated.
In most environments, especially when employing bandwidth-limited
links, the use of an intra picture implies an allowed delay that is a
significant multiple of the typical frame duration.
An example: If
the sending frame rate is 10 fps, and an intra picture is assumed to
be 10 times as big as an inter picture, then a full second of latency
has to be accepted. In such an environment there is no need for a
particularly short delay in sending the FIR message. Hence waiting for
the next possible time slot allowed by RTCP timing rules as per
[RFC4585] may not have an overly negative impact on the system
performance.

4.3.2. Temporal-Spatial Trade-off Request (TSTR)

The TSTR FB message is identified by PT=PSFB and FMT=5.

There MUST be one or more TSTR entries contained in the FCI field.

4.3.2.1. Semantics

A decoder can suggest the use of a temporal-spatial trade-off by
sending a TSTR message to an encoder. If the encoder is capable of
adjusting its temporal-spatial trade-off, it SHOULD take into account
the received TSTR message for future coding of pictures. A value of
0 suggests a high spatial quality and a value of 31 suggests a high
frame rate. The values from 0 to 31 indicate monotonically a desire
for higher frame rate. Actual values do not correspond to precise
values of spatial quality or frame rate.

The reaction to the reception of more than one TSTR message by a
media sender from different media receivers is left open to the
implementation. The selected trade-off SHALL be communicated to the
media receivers by means of the TSTA message.

The "SSRC of the packet sender" field indicates the source of the
request, and the "SSRC of media source" is not used and SHALL be set
to 0. The SSRC of the media sender to which the TSTR applies is in
the FCI entries.

A TSTR message may contain multiple requests to different media
senders, using multiple FCI entries.

4.3.2.2. Message Format

The Temporal-Spatial Trade-off Request uses one FCI field, the
content of which is depicted in Figure 4.
The length of the FB
message MUST be set to 2+2*N, where N is the number of FCI entries
included.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |              Reserved               |  Index  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4 - Syntax of the TSTR

   SSRC: The SSRC value of the target (or the media sender) of this
         specific TSTR request.

   Seq. nr: Request sequence number. The sequence number space is
         unique for each tuple consisting of the SSRC of the request
         source and the SSRC of the request target. The sequence
         number SHALL be increased by 1 modulo 256 for each new
         request. A repetition SHALL NOT increase the sequence
         number. Initial value is arbitrary.

   Index: An integer value between 0 and 31 that indicates the
         relative trade-off that is requested. An index value of 0
         indicates the highest possible spatial quality, while 31
         indicates the highest possible temporal resolution.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
         reception.

4.3.2.3. Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585].
This request message is not time critical and SHOULD be sent using
regular RTCP timing. Only if it is known that the user interface
requires quick feedback MAY the message be sent with early or
immediate feedback timing.

4.3.2.4. Remarks

The term "spatial quality" does not necessarily refer to the
resolution, measured by the number of pixels the reconstructed video
is using. In fact, in most scenarios the video resolution stays
constant during the lifetime of a session.
However, all video
compression standards have means to adjust the spatial quality at a
given resolution, often influenced by the Quantizer Parameter or QP.
A numerically low QP results in a good reconstructed picture quality,
whereas a numerically high QP yields a coarse picture. The typical
reaction of an encoder to this request is to change its rate control
parameters to use a lower frame rate and a numerically lower (on
average) QP, or vice versa. The precise mapping of Index, frame
rate, and QP is intentionally left open here, as it depends on
factors such as compression standard employed, spatial resolution,
content, bit rate, and many more.

4.3.3. Temporal-Spatial Trade-off Announcement (TSTA)

The TSTA FB message is identified by PT=PSFB and FMT=6.

There SHALL be one or more TSTA entries contained in the FCI field.

4.3.3.1. Semantics

This feedback message is used to acknowledge the reception of a TSTR.
A TSTA entry in a TSTA feedback message SHALL be sent for each TSTR
entry targeted to this session participant, i.e. each TSTR entry
received whose SSRC field contains the receiving entity's SSRC.
A single TSTA message MAY acknowledge multiple requests using
multiple FCI entries. The index value included SHALL be the same in
all FCI entries that are part of the TSTA message. Including an FCI
entry for each requestor allows each requesting entity to determine
that the targeted media sender has received the request. The
acknowledgement SHALL also be sent for repetitions received. If the
request receiver has received TSTRs with several different sequence
numbers from a single requestor, it SHALL respond only to the request
with the highest (modulo 256) sequence number.

The TSTA SHALL include the Temporal-Spatial Trade-off index that will
be used as a result of the request.
This is not necessarily the same
index as requested, as the media sender may need to aggregate
requests from several requesting session participants. It may also
have some other policies or rules that limit the selection.

4.3.3.2. Message Format

The Temporal-Spatial Trade-off Announcement uses one additional FCI
field, the content of which is depicted in Figure 5. The length of
the FB message MUST be set to 2+2*N, where N is the number of FCI
entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |              Reserved               |  Index  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5 - Syntax of the TSTA

   SSRC: The SSRC of the source of the TSTR request that is
         acknowledged.

   Seq. nr: The sequence number value from the TSTR request that is
         being acknowledged.

   Index: The trade-off value the media sender is using henceforth.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
         reception.

   Informative note: The returned trade-off value (Index) may differ
   from the requested one, for example in cases where a media encoder
   cannot tune its trade-off, or when pre-recorded content is used.

4.3.3.3. Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585].
This acknowledgement message is not extremely time critical and
SHOULD be sent using regular RTCP timing.

   Edt. Note: a comment from Magnus: We might like to expand on this
   in relation to certain applications

4.3.3.4. Remarks

None

4.3.4. H.271 VideoBackChannelMessage (VBCM)

The VBCM FB message is identified by PT=PSFB and FMT=7.

There MUST be one or more VBCM entries contained in the FCI field.

4.3.4.1.
Semantics

The "payload" of the VBCM indication carries different types of
codec-specific feedback information. The feedback information can be
classified either as a "status report" (such as reception of the bit
stream without errors, or loss of a partial or complete picture or
block) or as an "update request" (such as a complete refresh of the
bit stream).

   Note: There is possible overlap between the VBCM sub-messages
   and CCM/AVPF feedback messages, such as FIR. Please see section
   3.5.3 for further discussion.

The different types of feedback sub-messages carried in the VBCM are
indicated by the "payloadType" as defined in [VBCM]. The different
sub-message types as defined in [VBCM] are reproduced below for
convenience. "payloadType", in ITU-T Rec. H.271 terminology,
refers to the sub-type of the H.271 message and should not be
confused with an RTP payload type.

   Payload
   Type     Message Content
   ---------------------------------------------------------------
   0        One or more pictures without detected bitstream error
            mismatch
   1        One or more pictures that are entirely or partially lost
   2        A set of blocks of one picture that is entirely or
            partially lost
   3        CRC for one parameter set
   4        CRC for all parameter sets of a certain type
   5        A "reset" request indicating that the sender should
            completely refresh the video bitstream as if no prior
            bitstream data had been received
   > 5      Reserved for future use by ITU-T

The bit string or the "payload" of a VBCM message is of variable
length, is self-contained, and is coded in a variable-length binary
format. The media sender necessarily has to be able to parse this
optimized binary format to make use of VBCM messages.

Each of the different types of sub-messages (indicated by
payloadType) may have different semantics based on the codec used.

4.3.4.2. Message Format

The VBCM indication uses one FCI field and the syntax is depicted in
Figure 6.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |0| Payload Type|            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              VBCM Bit String....              |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6 - Syntax for VBCM Message

   SSRC: The SSRC value of the media sender that is the target of the
         message, i.e. the sender whose encoder should react to the
         VBCM message.

   Seq. nr: Command sequence number. The sequence number space is
         unique for each tuple consisting of the SSRC of the command
         source and the SSRC of the command target. The sequence
         number SHALL be increased by 1 modulo 256 for each new
         command. A repetition SHALL NOT increase the sequence
         number. Initial value is arbitrary.

   0: Must be set to 0 and should not be acted upon when received.

   Payload Type: The RTP payload type for which the VBCM bit string
         must be interpreted.

   Length: The length of the VBCM bit string in octets.

   VBCM Bit String: This is the bit string generated by the decoder
         carrying a specific feedback sub-message. It is of variable
         length.

   Padding: Bits set to 0 to make up a 32-bit boundary.

4.3.4.3. Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585].

4.3.4.4. Remarks

Please see section 3.5.3 for the applicability of the VBCM message
in relation to messages in both AVPF and this memo with similar
functionality.

   Edt. note: Between the authors there is an ongoing discussion
   whether we need the payload type field in this message.
   It would
   be needed if there were potentially more than one VBCM-capable
   payload type in the same session, and the semantics of a given
   VBCM message changed from PT to PT. This appears to be the case.
   For example, the picture identification mechanism in messages of
   H.271 type 0 is fundamentally different between H.263 and H.264
   (although both use the same syntax). So the payload field appears
   to be justified. It was further commented that for TSTR and FIR
   such a need may not exist, simply because the semantics of TSTR and
   FIR are either loosely enough defined, or generic enough, to apply
   to all video payloads currently in existence/envisioned. So that
   part of the draft seems ok.

   Edt. note: (related to SSRC field): Magnus commented [...]. There
   is also a need to define what the meaning of the fixed header SSRC
   values is.

5. Congestion Control

The correct application of the AVPF timing rules prevents feedback
messages from flooding the network. Hence, assuming a correct
implementation, the RTCP channel cannot break its bit-rate commitment
and introduce congestion.

The reception of some of the feedback messages modifies the behaviour
of the media senders or, more specifically, the media encoders. All
of these modifications MUST only be performed within the bandwidth
limits the applied congestion control provides. For example, when
reacting to a FIR, the unusually high number of packets that form the
decoder refresh point have to be paced in compliance with the
congestion control algorithm, even if the user experience suffers
from a slowly transmitted decoder refresh point.

A change of the Temporary Maximum Media Bit-rate value can only
mitigate congestion, but not cause congestion as long as congestion
control is also employed.
An increase of the value by a request
REQUIRES the media sender to use congestion control when increasing
its transmission rate to that value. A reduction of the value results
in a reduced transmission bit-rate, thus reducing the risk of
congestion.

6. Security Considerations

The defined messages have certain properties that have security
implications. These must be addressed and taken into account by users
of this protocol.

The defined setup signalling mechanism is sensitive to modification
attacks that can result in session creation with sub-optimal
configuration, and, in the worst case, session rejection. To prevent
this type of attack, authentication and integrity protection of the
setup signalling is required.

Spoofed or maliciously created feedback messages of the type defined
in this specification can have the following implications:

   a. Severely reduced media bit-rate due to false TMMBR messages
      that set the maximum to a very low value.
   b. The assignment of the ownership of a bit-rate limit with a
      TMMBN message to the wrong participant, potentially freezing
      the mechanism until a correct TMMBN message reaches the
      participants.
   c. Sending TSTR messages that result in a video quality different
      from the user's desire, rendering the session less useful.
   d. Frequent FIR commands that potentially reduce the frame rate,
      making the video jerky due to the frequent usage of decoder
      refresh points.

To prevent these attacks, authentication and integrity protection
need to be applied to the feedback messages. This can be
accomplished against group external threats using the RTP profile
that combines SRTP [SRTP] and AVPF into SAVPF [SAVPF]. In the MCU
cases, separate security contexts and filtering can be applied
between the MCU and the participants, thus protecting other MCU users
from a misbehaving participant.

7. SDP Definitions

Section 4 of [RFC4585] defines new SDP [RFC2327] attributes that are
used for the capability exchange of the AVPF commands and
indications, such as Reference Picture Selection, Picture Loss
Indication, etc. The defined SDP attribute is known as rtcp-fb and
its ABNF is described in section 4.2 of [RFC4585]. In this section we
extend the rtcp-fb attribute to include the commands and indications
that are described in this document for codec control protocol. We
also discuss the Offer/Answer implications for the codec control
commands and indications.

7.1. Extension of rtcp-fb attribute

As described in [RFC4585], the rtcp-fb attribute is defined to
indicate the capability of using RTCP feedback. As defined in AVPF,
the rtcp-fb attribute must only be used as a media level attribute
and must not be provided at session level.

All the rules described in [RFC4585] for the rtcp-fb attribute
relating to payload type and to multiple rtcp-fb attributes in a
session description hold for the new feedback messages for codec
control defined in this document.

The ABNF for the rtcp-fb attribute as defined in [RFC4585] is

   rtcp-fb-syntax = "a=rtcp-fb:" rtcp-fb-pt SP rtcp-fb-val CRLF

where rtcp-fb-pt is the payload type and rtcp-fb-val defines the type
of the feedback message such as ack, nack, trr-int and rtcp-fb-id.
For example, to indicate support for picture loss indication
feedback, the sender declares the following in SDP:

   v=0
   o=alice 3203093520 3203093520 IN IP4 host.example.com
   s=Media with feedback
   t=0 0
   c=IN IP4 host.example.com
   m=video 49170 RTP/AVPF 98
   a=rtpmap:98 H263-1998/90000
   a=rtcp-fb:98 nack pli

In this document we define a new feedback value type called "ccm"
which indicates the support of codec control using RTCP feedback
messages.
The "ccm" feedback value should be used with parameters
that indicate which codec commands the session may use. In this
draft we define four parameters, which can be used with the ccm
feedback value type.

   o "fir" indicates the support of Full Intra Request
   o "tmmbr" indicates the support of Temporary Maximum Media
     Bit-rate
   o "tstr" indicates the support of temporal spatial trade-off
     request.
   o "vbcm" indicates the support of H.271 video back channel
     messages.

In the ABNF for rtcp-fb-val defined in [RFC4585], there is a
placeholder called rtcp-fb-id to define new feedback types. The ccm
is defined as a new feedback type in this document and the ABNF for
the parameters for ccm is defined here (please refer to section 4.2
of [RFC4585] for the complete ABNF syntax).

   rtcp-fb-param     = SP "app" [SP byte-string]
                     / SP rtcp-fb-ccm-param
                     / ; empty

   rtcp-fb-ccm-param = "ccm" SP ccm-param

   ccm-param         = "fir"   ; Full Intra Request
                     / "tmmbr" ; Temporary max media bit rate
                     / "tstr"  ; Temporal Spatial Trade Off
                     / "vbcm" 1*[SP subMessageType] ; H.271 VBCM messages
                     / token [SP byte-string]
                               ; for future commands/indications
   subMessageType    = 1*[integer];
   byte-string       =

7.2. Offer-Answer

The Offer/Answer [RFC3264] implications for codec control protocol
feedback messages are similar to those described in [RFC4585]. The
offerer MAY indicate the capability to support selected codec
commands and indications. The answerer MUST remove all ccm
parameters that it does not understand or does not wish to use in
this particular media session. The answerer MUST NOT add new ccm
parameters in addition to what has been offered. The answer is
binding for the media session and both offerer and answerer MUST only
use feedback messages negotiated in this way.

7.3.
Examples

Example 1: The following SDP describes a point-to-point video call
with H.263 with the originator of the call declaring its capability
to support the codec control messages fir and tstr. The SDP is
carried in a high level signalling protocol like SIP.

   v=0
   o=alice 3203093520 3203093520 IN IP4 host.example.com
   s=Point-to-Point call
   c=IN IP4 192.0.2.124
   m=audio 49170 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 51372 RTP/AVPF 98
   a=rtpmap:98 H263-1998/90000
   a=rtcp-fb:98 ccm tstr
   a=rtcp-fb:98 ccm fir

In the above example the sender, when it receives a TSTR message from
the remote party, can adjust the trade-off as indicated in the RTCP
TSTA feedback message.

Example 2: The following SDP describes a SIP end point joining a
video MCU that is hosting a multiparty video conferencing session.
The participant supports only the FIR (Full Intra Request) codec
control command and it declares it in its session description. The
video MCU can send a FIR RTCP feedback message to this end point
when it needs to send this participant's video to other participants
of the conference.

   v=0
   o=alice 3203093520 3203093520 IN IP4 host.example.com
   s=Multiparty Video Call
   c=IN IP4 192.0.2.124
   m=audio 49170 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 51372 RTP/AVPF 98
   a=rtpmap:98 H263-1998/90000
   a=rtcp-fb:98 ccm fir

When the video MCU decides to route the video of this participant it
sends an RTCP FIR feedback message. Upon receiving this feedback
message the end point is mandated to generate a decoder refresh
point.

Example 3: The following example describes the Offer/Answer
implications for the codec control messages. The Offerer wishes to
support "tstr", "fir" and "tmmbr" messages.
   The offered SDP is

      -------------> Offer

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Offer/Answer
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm tstr
      a=rtcp-fb:98 ccm fir
      a=rtcp-fb:98 ccm tmmbr

   The answerer wishes to support only the FIR and TSTR messages as
   codec control messages, and the answer SDP is

      <---------------- Answer

      v=0
      o=alice 3203093520 3203093524 IN IP4 otherhost.example.com
      s=Offer/Answer
      c=IN IP4 189.13.1.37
      m=audio 47190 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 53273 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm tstr
      a=rtcp-fb:98 ccm fir

   Example 4: The following example describes the Offer/Answer
   implications for H.271 Video back channel messages (VBCM).  The
   Offerer wishes to support VBCM and the sub-messages of payloadType
   2 (a set of blocks of one picture that is entirely or partially
   lost), 3 (CRC for one parameter set), and 4 (CRC for all parameter
   sets of a certain type).

      -------------> Offer

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Offer/Answer
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm vbcm 2 3 4

   The answerer wishes to support only sub-messages 3 and 4.

      <---------------- Answer

      v=0
      o=alice 3203093520 3203093524 IN IP4 otherhost.example.com
      s=Offer/Answer
      c=IN IP4 189.13.1.37
      m=audio 47190 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 53273 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm vbcm 3 4

   So in the above example, only VBCM indications comprising
   "payloadType" 3 and 4 will be supported.

8.
IANA Considerations

   The new value of ccm for the rtcp-fb attribute needs to be
   registered with IANA.

      Value name: ccm
      Long name:  Codec Control Commands and Indications
      Reference:  RFC XXXX

   For use with "ccm", the following values also need to be registered.

      Value name:  fir
      Long name:   Full Intra Request Command
      Usable with: ccm
      Reference:   RFC XXXX

      Value name:  tmmbr
      Long name:   Temporary Maximum Media Bit-rate
      Usable with: ccm
      Reference:   RFC XXXX

      Value name:  tstr
      Long name:   Temporal Spatial Trade Off
      Usable with: ccm
      Reference:   RFC XXXX

      Value name:  vbcm
      Long name:   H.271 video back channel messages
      Usable with: ccm
      Reference:   RFC XXXX

9. Acknowledgements

   The authors would like to thank Andrea Basso, Orit Levin, and
   Nermeen Ismail for their work on the requirement and discussion
   draft [Basso].

10. References

10.1. Normative references

   [RFC4585]    Ott, J., Wenger, S., Sato, N., Burmeister, C., and J.
                Rey, "Extended RTP Profile for Real-Time Transport
                Control Protocol (RTCP)-Based Feedback (RTP/AVPF)",
                RFC 4585, July 2006.
   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.
   [RFC3550]    Schulzrinne, H., Casner, S., Frederick, R., and V.
                Jacobson, "RTP: A Transport Protocol for Real-Time
                Applications", STD 64, RFC 3550, July 2003.
   [RFC2327]    Handley, M. and V. Jacobson, "SDP: Session Description
                Protocol", RFC 2327, April 1998.
   [RFC3264]    Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
                Model with Session Description Protocol (SDP)", RFC
                3264, June 2002.
   [Topologies] Westerlund, M. and S. Wenger, "RTP Topologies",
                draft-ietf-avt-topologies-00, work in progress, August
                2006.

10.2. Informative references

   [Basso]      Basso, A., et
                al., "Requirements for transport of video control
                commands", draft-basso-avt-videoconreq-02.txt, expired
                Internet Draft, October 2004.
   [AVC]        Joint Video Team of ITU-T and ISO/IEC JTC 1, Draft
                ITU-T Recommendation and Final Draft International
                Standard of Joint Video Specification (ITU-T Rec.
                H.264 | ISO/IEC 14496-10 AVC), Joint Video Team (JVT)
                of ISO/IEC MPEG and ITU-T VCEG, JVT-G050, March 2003.
   [NEWPRED]    Fukunaga, S., Nakai, T., and H. Inoue, "Error Resilient
                Video Coding by Dynamic Replacing of Reference
                Pictures", in Proc. Globecom'96, vol. 3, pp. 1503 -
                1508, 1996.
   [SRTP]       Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
                K. Norrman, "The Secure Real-time Transport Protocol
                (SRTP)", RFC 3711, March 2004.
   [RFC2032]    Turletti, T. and C. Huitema, "RTP Payload Format for
                H.261 Video Streams", RFC 2032, October 1996.
   [SAVPF]      Ott, J. and E. Carrara, "Extended Secure RTP Profile
                for RTCP-based Feedback (RTP/SAVPF)",
                draft-ietf-avt-profile-savpf-02.txt, July 2005.
   [RFC3525]    Groves, C., Pantaleo, M., Anderson, T., and T. Taylor,
                "Gateway Control Protocol Version 1", RFC 3525, June
                2003.
   [VBCM]       ITU-T Rec. H.271, "Video Back Channel Messages", June
                2006.

11. Authors' Addresses

   Stephan Wenger
   Nokia Corporation
   P.O. Box 100
   FIN-33721 Tampere
   FINLAND

   Phone: +358-50-486-0637
   EMail: stewe@stewe.org

   Umesh Chandra
   Nokia Research Center
   975, Page Mill Road,
   Palo Alto, CA 94304
   USA

   Phone: +1-650-796-7502
   EMail: Umesh.Chandra@nokia.com

   Magnus Westerlund
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 7190000
   EMail: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 7190000
   EMail: bo.burman@ericsson.com

12.
List of Changes relative to previous drafts

   The following changes since draft-wenger-avt-avpf-ccm-01 have been
   made:

   - The topologies have been rewritten and clarified.
   - The TMMBR mechanism has been completely revised to use
     notification and suppress messages in deployments with large
     common SSRC spaces.

   The following changes since draft-wenger-avt-avpf-ccm-02 have been
   made:

   - Update of section 4.2.2.1 (TMMBN) as per discussions between
     Harikishan Desineni and Magnus Westerlund on the AVT list around
     Feb 21, 2006
   - Section 2.3.4 clarified as per email exchange between Colin
     Perkins and Magnus Westerlund around Feb 24
   - Section 3.5.2 and other occurrences throughout the draft,
     Temporal/Spatial Acknowledgement renamed to Temporal/Spatial
     Announcement

   Changes relative to draft-wenger-avt-avpf-ccm-03

   - Moved "topologies" out to another draft
   - Editorial improvements
   - Added new code point VBCM for H.271 Video back channel messages.
     Sections 3, 4 and 7 were modified in response to H.271
     introduction.
   - Removed Basso use case referring to forward Freeze command, added
     justification.

Full Copyright Statement

   Copyright (C) The Internet Society (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

RFC Editor Considerations

   The RFC editor is requested to replace all occurrences of XXXX with
   the RFC number this document receives.