Network Working Group                                     Stephan Wenger
INTERNET-DRAFT                                             Umesh Chandra
Expires: April 2007                                                Nokia
                                                       Magnus Westerlund
                                                               Bo Burman
                                                                Ericsson
                                                        October 20, 2006

                     Codec Control Messages in the
               Audio-Visual Profile with Feedback (AVPF)
                   <draft-ietf-avt-avpf-ccm-02.txt>

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This document specifies a few extensions to the messages defined in
the Audio-Visual Profile with Feedback (AVPF). They are helpful
primarily in conversational multimedia scenarios where centralized
multipoint functionalities are in use. However, some are also usable
in smaller multicast environments and point-to-point calls. The
extensions discussed are H.271 video back channel, Full Intra
Request, Temporary Maximum Media Bit-rate and Temporal Spatial
Trade-off.

TABLE OF CONTENTS

1. Introduction
2. Definitions
   2.1. Glossary
   2.2. Terminology
   2.3. Topologies
3. Motivation (Informative)
   3.1. Use Cases
   3.2. Using the Media Path
   3.3. Using AVPF
      3.3.1. Reliability
   3.4. Multicast
   3.5. Feedback Messages
      3.5.1. Full Intra Request Command
         3.5.1.1. Reliability
      3.5.2. Temporal Spatial Trade-off Request and Announcement
         3.5.2.1. Point-to-point
         3.5.2.2. Point-to-Multipoint using Multicast or Translators
         3.5.2.3. Point-to-Multipoint using RTP Mixer
         3.5.2.4. Reliability
      3.5.3. H.271 Video Back Channel Message conforming to ITU-T
             Rec. H.271
         3.5.3.1. Reliability
      3.5.4. Temporary Maximum Media Bit-rate Request
         3.5.4.1. MCU based Multi-point operation
         3.5.4.2. Point-to-Multipoint using Multicast or Translators
         3.5.4.3. Point-to-point operation
         3.5.4.4. Reliability
4. RTCP Receiver Report Extensions
   4.1. Design Principles of the Extension Mechanism
   4.2. Transport Layer Feedback Messages
      4.2.1. Temporary Maximum Media Bit-rate Request (TMMBR)
         4.2.1.1. Semantics
         4.2.1.2. Message Format
         4.2.1.3. Timing Rules
      4.2.2. Temporary Maximum Media Bit-rate Notification (TMMBN)
         4.2.2.1. Semantics
         4.2.2.2. Message Format
         4.2.2.3. Timing Rules
   4.3. Payload Specific Feedback Messages
      4.3.1. Full Intra Request (FIR) command
         4.3.1.1. Semantics
         4.3.1.2. Message Format
         4.3.1.3. Timing Rules
         4.3.1.4. Remarks
      4.3.2. Temporal-Spatial Trade-off Request (TSTR)
         4.3.2.1. Semantics
         4.3.2.2. Message Format
         4.3.2.3. Timing Rules
         4.3.2.4. Remarks
      4.3.3. Temporal-Spatial Trade-off Announcement (TSTA)
         4.3.3.1. Semantics
         4.3.3.2. Message Format
         4.3.3.3. Timing Rules
         4.3.3.4. Remarks
      4.3.4. H.271 VideoBackChannelMessage (VBCM)
5. Congestion Control
6. Security Considerations
7. SDP Definitions
   7.1. Extension of rtcp-fb attribute
   7.2. Offer-Answer
   7.3. Examples
8. IANA Considerations
9. Acknowledgements
10. References
   10.1. Normative references
   10.2. Informative references
11. Authors' Addresses
12. List of Changes relative to previous drafts

1. Introduction

When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was
developed, the main emphasis lay in the efficient support of point-
to-point and small multipoint scenarios without centralized
multipoint control. However, in practice, many small multipoint
conferences operate utilizing devices known as Multipoint Control
Units (MCUs). Long-standing experience of the conversational video
conferencing industry suggests that there is a need for a few
additional feedback messages, to efficiently support MCU-based
multipoint conferencing. Some of the messages have applications
beyond centralized multipoint, and this is indicated in the
description of the message. This is especially true for the message
intended to carry ITU-T Rec. H.271 [H.271] bitstrings for video back
channel messages.

In RTP [RFC3550] terminology, MCUs comprise mixers and translators.
Most MCUs also include signalling support. During the development of
this memo, it was noticed that there is considerable confusion in the
community related to the use of terms such as mixer, translator, and
MCU.
In response to these concerns, a number of topologies have been
identified that are of practical relevance to the industry, but were
not envisioned (or at least not documented in sufficient detail) in
RTP. These topologies are documented in [Topologies], and
understanding this memo requires previous or parallel study of
[Topologies].

Some of the messages defined here are forward only, in that they do
not require an explicit acknowledgement. Other messages require
acknowledgement, leading to a two-way communication model that some
might consider useful for control purposes. It is not the
intention of this memo to open up RTCP to a generalized control
protocol. All mentioned messages have relatively strict real-time
constraints, in the sense that their value diminishes with
increased delay. This makes the use of more traditional control
protocol means, such as SIP re-invites, undesirable. Furthermore,
all messages are of a very simple format that can be easily processed
by an RTP/RTCP sender/receiver. Finally, all messages refer only to
the RTP stream they are related to, and not to any other property of
a communication system.

The Full Intra Request (FIR) Command requires the receiver of the
message (and sender of the stream) to immediately insert a decoder
refresh point. In video coding, one commonly used form of a decoder
refresh point is an IDR or Intra picture. Other codecs may have
other forms of decoder refresh points. In order to fulfil congestion
control constraints, sending a decoder refresh point may imply a
significant drop in frame rate, as decoder refresh points are
commonly much larger than regular predicted content. The use of this
message is restricted to cases where no other means of decoder
refresh can be employed, e.g. during the join-phase of a new
participant in a multipoint conference.
Using the FIR command for error resilience purposes is explicitly
disallowed; AVPF's [RFC4585] PLI message, which reports lost pictures
and has been included in AVPF for precisely that purpose, should be
used instead. The message does not require an acknowledgement, as
the presence of a decoder refresh point can be easily derived from
the media bit stream. Today, the FIR message appears to be useful
primarily with video streams, but in the future it may also become
helpful in conjunction with other media codecs that support
prediction across RTP packets.

The Temporary Maximum Media Bit-rate Request (TMMBR) Message allows
a media receiver to signal to the media sender the current maximum
supported media bit-rate for a given media stream. Once a bandwidth
limitation is established by the media sender, that sender notifies
the initiator of the request, and all other session participants, by
sending a TMMBN notification message. One usage scenario is
limiting media senders in multiparty conferencing to the
slowest receiver's maximum media bandwidth reception/handling
capability. Such a use is helpful, for example, because the
receiver's situation may have changed due to computational load, or
because the receiver has just joined the conference and it is helpful
to inform media sender(s) about its constraints, without waiting for
congestion induced bandwidth reduction. Another application involves
graceful bandwidth adaptation in scenarios where the upper limit
connection bandwidth to a receiver changes, but is known in the
interval between these dynamic changes. The TMMBR message is useful
for all media types that are not inherently of constant bit rate.

The Video Back Channel Message (VBCM) allows conveying bit streams
conforming to ITU-T Rec. H.271 [H.271], from a video receiver to a
video sender.
This ITU-T Recommendation defines codepoints for a
number of video-specific feedback messages. Examples include
messages to signal:
- the corruption of reference pictures or parts thereof,
- the corruption of decoder state information, e.g. parameter sets,
- the suggestion of using a reference picture other than the one
  typically used, e.g. to support the NEWPRED algorithm [NEWPRED].
The ITU-T plans to add codepoints to H.271 every time a need arises,
e.g. with the introduction of new video codecs or new tools into
existing video codecs.

There exists some overlap between H.271 messages and "native"
messages specified in this memo and in AVPF. Examples include the
PLI message of [RFC4585] and the FIR message specified herein. As a
general rule, the "native" messages should be preferred over the
sending of VBCM messages when all senders and receivers implement
this memo. However, if gateways are in the picture, it may be more
advisable to utilize VBCM. Similarly, for feedback message types
that exist in H.271 but do not exist in this memo or AVPF, there is
no other choice but to use VBCM.
Video feedback channel messages according to H.271 do not require
acknowledgements on a protocol level, because the appropriate
reaction of the video encoder and sender can be derived from the
forward video bit stream.

Finally, the Temporal-Spatial Trade-off Request (TSTR) Message
enables a video receiver to signal to the video sender its preference
for spatial quality or high temporal resolution (frame rate). The
receiver of the video stream typically generates this signal based on
input from its user interface, so as to react to explicit requests of
the user. However, some implicit use forms are also known. For
example, the trade-offs commonly used for live video and document
camera content are different. Obviously, this indication is relevant
only with respect to video transmission. The message is acknowledged
by an announcement message indicating the newly chosen trade-off, so
as to allow immediate user feedback.

2. Definitions

2.1. Glossary

ASM   - Any Source Multicast
AVPF  - The Extended RTP Profile for RTCP-based Feedback
FEC   - Forward Error Correction
FIR   - Full Intra Request
MCU   - Multipoint Control Unit
MPEG  - Moving Picture Experts Group
PtM   - Point to Multipoint
PtP   - Point to Point
TMMBN - Temporary Maximum Media Bit-rate Notification
TMMBR - Temporary Maximum Media Bit-rate Request
PLI   - Picture Loss Indication
TSTA  - Temporal Spatial Trade-off Announcement
TSTR  - Temporal Spatial Trade-off Request
VBCM  - Video Back Channel Message indication.

2.2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

Message:
   Codepoint defined by this specification, of one of the
   following types:

   Request:
      Message that requires Acknowledgement

   Acknowledgement:
      Message that answers a Request

   Command:
      Message that forces the receiver to an action

   Indication:
      Message that reports a situation

   Notification:
      See Indication.

   Note that, with the exception of "Notification", this terminology
   is in alignment with ITU-T Rec. H.245.

Decoder Refresh Point:
   A bit string, packetised in one or more RTP packets, which
   completely resets the decoder to a known state. Typical
   examples of Decoder Refresh Points are H.261 Intra pictures
   and H.264 IDR pictures. However, there are also much more
   complex decoder refresh points.

   Typical examples for "hard" decoder refresh points are Intra
   pictures in H.261, H.263, MPEG 1, MPEG 2, and MPEG-4 part 2,
   and IDR pictures in H.264. "Gradual" decoder refresh points
   may also be used; see for example [AVC]. While both "hard"
   and "gradual" decoder refresh points are acceptable in the
   scope of this specification, in most cases the user
   experience will benefit from using a "hard" decoder refresh
   point.

   A decoder refresh point also contains all header information
   above the picture layer (or equivalent, depending on the
   video compression standard) that is conveyed in-band. In
   H.264, for example, a decoder refresh point contains
   parameter set NAL units that generate parameter sets
   necessary for the decoding of the following slice/data
   partition NAL units (and that are not conveyed out of band).
   To the best of the authors' knowledge, the term "Decoder
   Refresh Point" has been formally defined only in H.264; hence
   we are referring here to this video compression standard.

Decoding:
   The operation of reconstructing the media stream.

Rendering:
   The operation of presenting (parts of) the reconstructed
   media stream to the user.

Stream thinning:
   The operation of removing some of the packets from a media
   stream. Stream thinning is preferably performed in a
   media-aware manner, implying that media packets are removed in
   the order of their relevance to the reproductive quality.
   However, even when employing media-aware stream thinning, most
   media streams quickly lose quality when subject to increasing
   levels of thinning. Media-unaware stream thinning leads to
   even worse quality degradation.

2.3. Topologies

Please refer to [Topologies] for an in-depth discussion. The
topologies referred to throughout this memo are labeled (consistent
with [Topologies]) as follows:

Topo-Point-to-Point . . . . . point-to-point communication
Topo-Multicast  . . . . . . . multicast communication as in RFC 3550
Topo-Translator . . . . . . . translator based as in RFC 3550
Topo-Mixer  . . . . . . . . . mixer based as in RFC 3550
Topo-Video-switch-MCU . . . . video switching MCU
Topo-RTCP-terminating-MCU . . mixer but terminating RTCP

3. Motivation (Informative)

This section discusses the motivation and usage of the different
video and media control messages. The video control messages have
been under discussion for a long time, and a requirement draft was
drawn up [Basso]. This draft has expired; however, we do quote
relevant sections of it to provide motivation and requirements.

3.1. Use Cases

There are a number of possible usages for the proposed feedback
messages. Let's begin by looking through the use cases Basso et al.
[Basso] proposed. Some of the use cases have been reformulated and
commented:

1. An RTP video mixer composes multiple encoded video sources into a
   single encoded video stream. Each time a video source is added,
   the RTP mixer needs to request a decoder refresh point from the
   video source, so as to start an uncorrupted prediction chain on
   the spatial area of the mixed picture occupied by the data from
   the new video source.

2. An RTP video mixer receives multiple encoded RTP video
   streams from conference participants, and dynamically selects one
   of the streams to be included in its output RTP stream. At the
   time of a bit stream change (determined through means such as
   voice activation or the user interface), the mixer requests a
   decoder refresh point from the remote source, in order to avoid
   using unrelated content as reference data for inter picture
   prediction.
   After requesting the decoder refresh point, the video
   mixer stops the delivery of the current RTP stream and monitors
   the RTP stream from the new source until it detects data belonging
   to the decoder refresh point. At that time, the RTP mixer starts
   forwarding the newly selected stream to the receiver(s).

3. An application needs to signal to the remote encoder a request to
   change the desired trade-off in temporal/spatial resolution.
   For example, one user may prefer a higher frame rate and a lower
   spatial quality, and another user may prefer the opposite. This
   choice is also highly content dependent. Many current video
   conferencing systems offer in the user interface a mechanism to
   make this selection, usually in the form of a slider. The
   mechanism is helpful in point-to-point, centralized multipoint and
   non-centralized multipoint uses.

4. Use case 4 of the Basso draft applies only to AVPF's PLI [RFC4585]
   and is not reproduced here.

5. Use case 5 of the Basso draft relates to a mechanism known as
   "freeze picture request". Sending freeze picture requests
   over a non-reliable forward RTCP channel has been identified as
   problematic. Therefore, no freeze picture request has been
   included in this memo, and the use case discussion is not
   reproduced here.

6. A video mixer dynamically selects one of the received video
   streams to be sent out to participants and tries to provide the
   highest bit rate possible to all participants, while minimizing
   stream transrating. One way of achieving this is to set up sessions
   with endpoints using the maximum bit rate accepted by that
   endpoint, and by the call admission method used by the mixer. By
By 405 means of commands that allow reducing the maximum media bitrate 406 beyond what has been negotiated during session setup, the mixer 407 can then reduce the maximum bit rate sent by endpoints to the 408 lowest common denominator of all received streams. As the lowest 409 common denominator changes due to endpoints joining, leaving, or 410 network congestion, the mixer can adjust the limits to which 411 endpoints can send their streams to match the new limit. The mixer 412 then would request a new maximum bit rate, which is equal or less 413 than the maximum bit-rate negotiated at session setup, for a 414 specific media stream, and the remote endpoint can respond with 415 the actual bit-rate that it can support. 417 The picture Basso, et al draws up covers most applications we 418 foresee. However we would like to extend the list with two additional 419 use cases: 421 7. The used congestion control algorithms (AMID and TFRC) probe for 422 more bandwidth as long as there is something to send. With 423 congestion control using packet-loss as the indication for 424 congestion, this probing does generally result in reduced media 425 quality (often to a point where the distortion is large enough to 426 make the media unusable), due to packet loss and increased delay. 427 In a number of deployment scenarios, especially cellular ones, the 428 bottleneck link is often the last hop link. That cellular link 429 also commonly has some type of QoS negotiation enabling the 430 cellular device to learn the maximal bit-rate available over this 431 last hop. Thus indicating the maximum available bit-rate to the 432 transmitting part can be beneficial to prevent it from even trying 433 to exceed the known hard limit that exists. For cellular or other 434 mobile devices the available known bit-rate can also quickly 435 change due to handover to another transmission technology, QoS 436 renegotiation due to congestion, etc. 
To enable minimal disruption 437 of service a possibility for quick convergence, especially in 438 cases of reduced bandwidth, a media path signalling method is 439 desired. 441 8. The use of reference picture selection as an error resilience tool 442 has been introduced in 1997 as NEWPRED [NEWPRED], and is now 443 widely deployed. It operates the receiver sending a feedback 444 message to the sender, indicating a reference picture that should 445 be used for future prediction. AVPF contains a mechanism for 446 conveying such a message, but did not specify for which codec and 447 according to which syntax the message conforms to. Recently, the 448 ITU-T finalized Rec. H.271 which (among other message types) also 449 includes a feedback message. It is expected that this feedback 450 message will enjoy wide support and fairly quickly. Therefore, a 451 mechanism to convey feedback messages according to H.271 appears 452 to be desirable. 454 3.2. Using the Media Path 456 There are multiple reasons why we propose to use the media path for 457 the codec control messages. First, systems employing MCUs are often 458 separating the control and media processing parts. As these messages 459 are intended or generated by the media part rather than the 460 signalling part of the MCU, having them on the media path avoids 461 interfaces and unnecessary control traffic between signalling and 462 processing. If the MCU is physically decomposite, the use of the 463 media path avoids the need for media control protocol extensions 464 (e.g. in MEGACO [RFC3525]). 466 Secondly, the signalling path quite commonly contains several 467 signalling entities, e.g. SIP-proxies and application servers. 468 Avoiding signalling entities avoids delay for several reasons. 469 Proxies have less stringent delay requirements than media processing 470 and due to their complex and more generic nature may result in 471 significant processing delay. 
The topological locations of the 472 signalling entities are also commonly not optimized for minimal 473 delay, rather other architectural goals. Thus the signalling path can 474 be significantly longer in both geographical and delay sense. 476 3.3. Using AVPF 478 The AVPF feedback message framework [RFC4585] provides a simple way 479 of implementing the new messages. Furthermore, AVPF implements rules 480 controlling the timing of feedback messages so to avoid congestion 481 through network flooding. We re-use these rules by referencing to 482 AVPF. 484 The signalling setup for AVPF allows each individual type of function 485 to be configured or negotiated on a RTP session basis. 487 3.3.1. Reliability 489 The use of RTCP messages implies that each message transfer is 490 unreliable, unless the lower layer transport provides reliability. 492 The different messages proposed in this specification have different 493 requirements in terms of reliability. However, in all cases, the 494 reaction to an (occasional) loss of a feedback message is specified. 496 3.4. Multicast 498 The media related requests might be used with multicast. The RTCP 499 timing rules specified in [RFC3550] and [RFC4585] ensure that the 500 messages do not cause overload of the RTCP connection. The use of 501 multicast may result in the reception of messages with inconsistent 502 semantics. The reaction to inconsistencies depends on the message 503 type, and is discussed for each message type separately. 505 3.5. Feedback Messages 507 This section describes the semantics of the different feedback 508 messages and how they apply to the different use cases. 510 3.5.1. Full Intra Request Command 512 A Full Intra Request (FIR) command, when received by the designated 513 media sender, requires that the media sender sends a "decoder refresh 514 point" (see 2.2) at the earliest opportunity. 
The evaluation of such 515 opportunity includes the current encoder coding strategy and the 516 current available network resources. 518 FIR is also known as an "instantaneous decoder refresh request" or 519 "video fast update request". 521 Using a decoder refresh point implies refraining from using any 522 picture sent prior to that point as a reference for the encoding 523 process of any subsequent picture sent in the stream. For predictive 524 media types that are not video, the analogue applies. For example, 525 if in MPEG-4 systems scene updates are used, the decoder refresh 526 point consists of the full representation of the scene and is not 527 delta-coded relative to previous updates. 529 Decoder Refresh points, especially Intra or IDR pictures, are in 530 general several times larger in size than predicted pictures. Thus, 531 in scenarios in which the available bandwidth is small, the use of a 532 decoder refresh point implies a delay that is significantly longer 533 than the typical picture duration. 535 Usage in multicast is possible; however aggregation of the commands 536 is recommended. A receiver that receives a request closely (within 2 537 times the longest Round Trip Time (RTT) known) after sending a 538 decoder refresh point should await a second request message to ensure 539 that the media receiver has not been served by the previously 540 delivered decoder refresh point. The reason for delaying 2 times the 541 longest known RTT is to avoid sending unnecessary decoder refresh 542 points. A session participant may have sent its own request while 543 another participants request was in-flight to them. Thus suppressing 544 those requests that may have been sent without knowledge about the 545 other request avoids this issue. 547 Full Intra Request is applicable in use-case 1, 2, and 5. 549 3.5.1.1. Reliability 551 The FIR message results in the delivery of a decoder refresh point, 552 unless the message is lost. 
Decoder refresh points are easily
identifiable from the bit stream. Therefore, there is no need for
protocol-level acknowledgement, and a simple command repetition
mechanism is sufficient for ensuring the level of reliability
required. However, the potential use of repetition does require a
mechanism to prevent the recipient from responding to messages
already received and responded to.

To ensure the best possible reliability, a sender of FIR may repeat
the FIR request until a response has been received. The repetition
interval is determined by the RTCP timing rules the session operates
under. Upon reception of a complete decoder refresh point or the
detection of an attempt to send a decoder refresh point (which got
damaged due to a packet loss), the repetition of the FIR must stop. If
another FIR is necessary, the request sequence number must be
increased. To combat loss of the decoder refresh points sent, the
sender that receives repetitions of the FIR 2*RTT after the
transmission of the decoder refresh point shall send a new decoder
refresh point. Two round trip times allow time for the request to
arrive at the media sender and the decoder refresh point to arrive
back at the requestor. An FIR sender shall not have more than one FIR
request (different request sequence number) outstanding at any time
per media sender in the session.

An RTP Mixer that receives an FIR from a media receiver is
responsible for ensuring that a decoder refresh point is delivered to
the requesting receiver. It may be necessary for the MCU to generate
FIR commands. The two legs (FIR-requesting endpoint to MCU, and MCU to
decoder refresh point generating MCU) are handled independently from
each other from a reliability perspective.

3.5.2. Temporal Spatial Trade-off Request and Announcement

The Temporal Spatial Trade-off Request (TSTR) instructs the video
encoder to change its trade-off between temporal and spatial
resolution. Index values from 0 to 31 monotonically indicate a
desire for a higher frame rate. In general, the encoder reaction time
may be significantly longer than the typical picture duration. See
use case 3 for an example. The encoder decides whether the request
results in a change of the trade-off. An acknowledgement process has
been defined to provide feedback on the trade-off that is used
henceforth.

   Informative note: TSTR and TSTA have been introduced primarily
   because it is believed that control protocol mechanisms, e.g. a SIP
   re-invite, are too heavyweight, and too slow to allow for a
   reasonable user experience. Consider, for example, a user
   interface where the remote user selects the temporal/spatial
   trade-off with a slider (as is common in state-of-the-art video
   conferencing systems). Immediate feedback to any slider
   movement is required for a reasonable user experience. A SIP
   re-invite would require at least 2 round-trips more (compared to
   the TSTR/TSTA mechanism) and may involve proxies and other complex
   mechanisms. Even in a well-designed system, it may take a second
   or so until the new trade-off is finally selected.
   Furthermore, the use of RTCP solves the multicast use case very
   efficiently.

The use of TSTR and TSTA in multipoint scenarios is a non-trivial
subject, and can be solved in many implementation specific ways.
Problems stem from the fact that TSTRs will typically arrive
unsynchronized, and may request different trade-off values for the
same stream and/or endpoint encoder. This memo does not specify an
This memo does not specify a 615 MCU's or endpoint's reaction to the reception of a suggested trade- 616 off as conveyed in the TSTR -- we only require the receiver of a TSTR 617 message to reply to it by sending a TSTA, carrying the new trade-off 618 chosen by its own criteria (which may or may not be based on the 619 trade-off conveyed by TSTR). In other words, the trade-off sent in 620 TSTR is a non-binding recommendation; nothing more. 622 With respect to TSTR/TSTA, four scenarios based on the topologies 623 described in [Topologies] need to be distinguished. The scenarios are 624 described in the following sub-clauses. 626 3.5.2.1. Point-to-point 628 In this most trivial case (Topo-Point-to-Point), the media sender 629 typically adjusts its temporal/spatial trade-off based on the 630 requested value in TSTR, and within its capabilities. The TSTA 631 message conveys back the new trade-off value (which may be identical 632 to the old one if, for example, the sender is not capable to adjust 633 its trade-off). 635 3.5.2.2. Point-to-Multipoint using Multicast or Translators 637 RTCP Multicast is used either with media multicast according to Topo- 638 Multicast, or following RFC 3550's translator model according to 639 Topo-Translator. In these cases, TSTR messages from different 640 receivers may be received unsynchronized, and possibly with different 641 requested trade-offs (because of different user preferences). This 642 memo does not specify how the media sender tunes its trade-off. 643 Possible strategies include selecting the mean, or median, of all 644 trade-off requests received, prioritize certain participants, or 645 continue using the previously selected trade-off (e.g. when the 646 sender is not capable of adjusting it). Again, all TSTR messages 647 need to be acknowledged by TSTA, and the value conveyed back has to 648 reflect the decision made. 650 3.5.2.3. 
Point-to-Multipoint using RTP Mixer 652 In this scenario (Topo-Mixer) the RTP Mixer receives all TSTR 653 messages, and has the opportunity to act on them based on its own 654 criteria. In most cases, the MCU should form a "consensus" of 655 potentially conflicting TSTR messages arriving from different 656 participants, and initiate its own TSTR message(s) to the media 657 sender(s). The strategy of forming this "consensus" is open for the 658 implementation, and can, for example, encompass averaging the 659 participant's request values, prioritizing certain participants, or 660 use session default values. If the Mixer changes its trade-off, it 661 needs to request from the media sender(s) the use of the new value, 662 by creating a TSTR of its own. Upon reaching a decision on the used 663 trade-off it includes that value in the acknowledgement. 665 Even if a Mixer or Translator performs transcoding, it is very 666 difficult to deliver media with the requested trade-off, unless the 667 content the MCU receives is already close to that trade-off. Only in 668 cases where the original source has substantially higher quality (and 669 bit-rate), it is likely that transcoding can result in the requested 670 trade-off. 672 3.5.2.4. Reliability 673 A request and reception acknowledgement mechanism is specified. The 674 Temporal Spatial Trade-off Announcement (TSTA) message informs the 675 request-sender that its request has been received, and what trade-off 676 is used henceforth. This acknowledgment mechanism is desirable for at 677 least the following reasons: 679 o A change in the trade-off cannot be directly identified from the 680 media bit stream, 681 o User feedback cannot be implemented without information of the 682 chosen trade-off value, according to the media sender's 683 constraints, 684 o Repetitive sending of messages requesting an unimplementable trade- 685 off can be avoided. 687 3.5.3. H.271 Video Back Channel Message 689 ITU-T Rec. 
H.271 defines syntax, semantics, and suggested encoder 690 reaction to a video back channel message. The codepoint defined in 691 this memo is used to transparently convey such a message from media 692 receiver to media sender. 694 We refrain from an in-depth discussion of the available codepoints 695 within H.271 in this memo for a number of reasons. Perhaps the most 696 important reason is that we expect backward-compatible additions of 697 codepoints to H.271 outside the update/maturity cycle of this memo. 698 Another reason lies in the complexity of the H.271 specification: it 699 is a dense document with currently 16 pages of content. It does not 700 make any sense to try to summarize its content in a few sentences of 701 IETF lingo -- oversimplification and misguidance would be inevitable. 702 Finally, please note that H.271 contains many statements of 703 applicability and interpretation of its various messages in 704 conjunction with specific video compression standards. This type of 705 discussion would overload the present memo. 707 In this respect, this memo follows the guidance of a decade of RTP payload 708 format specification work -- the details of the media format carried 709 are normally not described in any significant detail. 711 However, we note that some H.271 messages bear similarities with 712 native messages of AVPF and this memo. Furthermore, we note that 713 some H.271 messages are known to require caution in multicast 714 environments -- or are plainly not usable in multicast or multipoint 715 scenarios. Table 1 provides a brief, oversimplified overview of the 716 messages currently defined in H.271, their similar AVPF or CCM 717 messages (the latter as specified in this memo), and an indication of 718 our current knowledge of their multicast safety.
720 H.271 msg type AVPF/CCM msg type multicast-safe 721 0 (when used for 722 reference picture 723 selection) AVPF RPSI No (positive ACK of pictures) 724 1 AVPF PLI Yes 725 2 AVPF SLI Yes 726 3 N/A Yes (no required sender action) 727 4 N/A Yes (no required sender action) 729 Table 1: H.271 messages and their AVPF/CCM equivalents 731 Note: H.271 message type 0 is not a strict equivalent to 732 AVPF's RPSI; it is an indication of known-as-correct reference 733 picture(s) at the decoder. It does not command an encoder to 734 use a defined reference picture (the form of control 735 information envisioned to be carried in RPSI). However, it is 736 believed and intended that H.271 message type 0 will be used 737 for the same purpose as AVPF's RPSI -- although other use 738 forms are also possible. 740 In response to the opaqueness of the H.271 messages especially with 741 respect to the multicast safety, the following guidelines MUST be 742 followed when an implementation wishes to employ the H.271 video back 743 channel message: 745 1. Implementations utilizing the H.271 feedback message MUST stay in 746 compliance with congestion control principles, as outlined in 747 section 5. 748 2. An implementation SHOULD utilize the native messages as defined in 749 [RFC4585] and in this memo instead of similar messages defined in 750 [H.271]. Our current understanding of similar messages is 751 documented in Table 1 above. One good reason to divert from the 752 SHOULD statement above would be if it is clearly understood that, 753 for a given application and video compression standard, the 754 aforementioned "similarity" is not given, in contrast to what 755 the table indicates. 756 3. It has been observed that some of the H.271 codepoints currently 757 in existence are not multicast-safe. Therefore, the sensible 758 thing to do is not to use the H.271 feedback message type in 759 multicast environments. 
It MAY be used only when all the issues 760 mentioned later are fully understood by the implementer, and 761 properly taken into account by all endpoints. In all other cases, 762 the H.271 message type MUST NOT be used in conjunction with 763 multicast. 764 4. It has been observed that even in centralized multipoint 765 environments, where the mixer should theoretically be able to 766 resolve issues as documented below, the implementation of such a 767 mixer and cooperative endpoints is a very difficult and tedious 768 task. Therefore, H.271 messages MUST NOT be used in centralized 769 multipoint scenarios, unless all the issues mentioned below are 770 fully understood by the implementer, and properly taken into 771 account by both mixer and endpoints. 773 Issues with point to Multi-point: 775 1. Different state established on different receivers. One example is 776 the reference picture feedback message: when it is sent to 777 receivers whose video codecs are in different states due to 778 previous losses or stream switches, the results can be 779 unpredictable and annoying. 780 2. Combination of multiple messages/requests by a media sender into 781 an action and/or response. 782 3. Suppression of requests may need to go beyond the basic mechanism 783 described in AVPF. For example, forward messages may be needed to 784 suppress the generation of requests. 786 Issues with translators and mixers 787 1. Combination of multiple messages or requests into an action or 788 response. 789 2. 791 3.5.3.1. Reliability 793 H.271 video back channel messages do not require reliable 794 transmission, and the reception of a message can be derived from the 795 forward video bit stream. Therefore, no specific reception 796 acknowledgement is specified. 798 With respect to re-sending rules, clause 3.5.1.1. applies. 800 3.5.4.
Temporary Maximum Media Bit-rate Request 802 A receiver, translator or mixer uses the Temporary Maximum Media Bit- 803 rate Request (TMMBR, "timber") to request a sender to limit the 804 maximum bit-rate for a media stream to, or below, the provided value. 805 The primary usage for this is a scenario with MCU (use case 6), 806 corresponding to Topo-Translator or Topo-Mixer, but also Topo-Point- 807 to-Point. 809 The temporary maximum media bit-rate messages are generic messages 810 that can be applied to any media. 812 The reasoning below assumes that the participants have negotiated a 813 session maximum bit-rate, using the signalling protocol. This value 814 can be global, for example in case of point-to-point, multicast, or 815 translators. It may also be local between the participant and the 816 peer or mixer. In both cases, the bit-rate negotiated in signalling 817 is the one that the participant guarantees to be able to handle 818 (encode and decode). In practice, the connectivity of the 819 participant also bears an influence to the negotiated value -- it 820 does not necessarily make much sense to negotiate a media bit rate 821 that one's network interface does not support. 823 An already established temporary bit-rate value may be changed at any 824 time (subject to the timing rules of the feedback message sending), 825 and to any value between zero and the session maximum, as negotiated 826 during signalling. Even if a sender has received a TMMBR message 827 increasing the bit-rate, all increases must be governed by a 828 congestion control algorithm. TMMBR only indicates known limitations, 829 usually in the local environment, and does not provide any 830 guarantees. 832 If it is likely that the new bit-rate indicated by TMMBR will be 833 valid for the remainder of the session, the TMMBR sender can perform 834 a renegotiation of the session upper limit using the session 835 signalling protocol. 837 3.5.4.1. 
MCU based Multi-point operation 839 Assume a small mixer-based multiparty conference is ongoing, as 840 depicted in Topo-Mixer of [Topologies]. All participants (A-D) have 841 negotiated a common maximum bit-rate that this session can use. The 842 conference operates over a number of unicast links between the 843 participants and the MCU. The congestion situation on each of these 844 links can easily be monitored by the participant in question and by 845 the MCU, utilizing, for example, RTCP Receiver Reports. However, any 846 given participant has no knowledge of the congestion situation of the 847 connections to the other participants. Worse, without mechanisms 848 similar to the ones discussed in this draft, the MCU (who is aware of 849 the congestion situation on all connections it manages) has no 850 standardized means to inform participants to slow down, short of 851 forging its own receiver reports (which is undesirable). In 852 principle, an MCU confronted with such a situation is obliged to thin 853 or transcode streams intended for connections that detected 854 congestion. 856 In practice, stream thinning - if performed media aware - is 857 unfortunately a very difficult and cumbersome operation and adds 858 undesirable delay. If done media unaware, it leads very quickly to 859 unacceptable reproduced media quality. Hence, means to slow down 860 senders even in the absence of congestion on their connections to the 861 MCU are desirable. 863 To allow the MCU to perform congestion control on the individual 864 links, without performing transcoding, there is a need for a 865 mechanism that enables the MCU to request the participant's media 866 encoders to limit their maximum media bit-rate currently used. The 867 MCU handles the detection of a congestion state between itself and a 868 participant as follows: 869 1. Start thinning the media traffic to the supported bit-rate. 870 2. 
Use the TMMBR to request the media sender(s) to reduce the media 871 bit-rate sent by them to the MCU, to a value that is in compliance 872 with congestion control principles for the slowest link. Slow 873 refers here to the available bandwidth and packet rate after 874 congestion control. 875 3. As soon as the bit-rate has been reduced by the sending party, the 876 MCU stops stream thinning implicitly, because there is no need for 877 it any more as the stream is in compliance with congestion 878 control. 880 The above algorithm may suggest to some that there is no need for the 881 TMMBR - it should be sufficient to solely rely on stream thinning. 882 As much as this is desirable from a network protocol designer's 883 viewpoint, it has the disadvantage that it doesn't work very 884 well - the reproduced media quality quickly becomes unusable. 886 It appears to be a reasonable compromise to rely on stream thinning 887 as an immediate reaction tool to combat congestion, and have a quick 888 control mechanism that instructs the original sender to reduce its 889 bitrate. 891 Note also that the standard RTCP receiver report cannot serve for the 892 purpose mentioned. In an environment with RTP Mixers, the RTCP RR is 893 being sent between the RTP receiver in the endpoint and the RTP 894 sender in the Mixer only - as there is no multicast transmission. 895 The stream that needs to be bandwidth-reduced, however, is the one 896 between the original sending endpoint and the Mixer. This endpoint 897 doesn't see the aforementioned RTCP RRs, and hence needs to be explicitly 898 informed about desired bandwidth adjustments. 900 In this topology it is the Mixer's responsibility to collect, and 901 consider jointly, the different bit-rates which the different links 902 may support, into the bit rate requested.
This aggregation may also 903 take into account that the Mixer may contain certain transcoding 904 capabilities (as discussed under Topo-Mixer in [Topologies]), 905 which can be employed for those few of the session participants that 906 have the lowest available bit-rates. 908 3.5.4.2. Point-to-Multipoint using Multicast or Translators 910 In these topologies, corresponding to Topo-Multicast or Topo- 911 Translator, RTCP RRs are transmitted globally, which allows for the 912 detection of transmission problems such as congestion, on a medium 913 timescale. As all media senders are aware of the congestion 914 situation of all media receivers, the rationale of the use of TMMBR 915 of section 3.5.4.1 does not apply. However, even in this case the 916 congestion control response can be improved when the unicast links 917 are employing congestion controlled transport protocols (such as TCP 918 or DCCP). A peer may also report a local limitation to the media 919 sender. 921 3.5.4.3. Point-to-point operation 923 In use case 7 it is possible to use TMMBR to improve the performance 924 at times of changes in the known upper limit of the bit-rate. In 925 this use case the signalling protocol has established an upper limit 926 for the session and media bit-rates. However, at the time of 927 transport link bit-rate reduction, a receiver could avoid serious 928 congestion by sending a TMMBR to the sending side. 930 3.5.4.4. Reliability 932 The reaction of a media sender to the reception of a TMMBR message is 933 not immediately identifiable through inspection of the media stream. 934 Therefore a more explicit mechanism is needed to avoid unnecessary 935 re-sending of TMMBR messages. Using a statistically based 936 retransmission scheme would only provide statistical guarantees of 937 the request being received. It would also not avoid the 938 retransmission of already received messages. In addition it does not 939 allow for easy suppression of other participants' requests.
For the 940 reasons mentioned, a mechanism based on explicit notification is 941 used. 943 Upon the reception of a request a media sender sends a notification 944 containing the current applicable limitation of the bit-rate, and 945 which session participant owns that limit. That allows all other 946 participants to suppress any request they may have, with limitation 947 value equal to or higher than the current one. The identity of the owner 948 allows for small message sizes and media sender states. A media 949 sender only keeps state for the SSRC of the current owner of the 950 limitation; all other requests and their sources are not saved. Only 951 the participant with the lowest value is allowed to remove or change 952 its limitation. Otherwise anyone that ever set a limitation would 953 need to remove it to allow the maximum bit-rate to be raised beyond 954 that value. 956 4. RTCP Receiver Report Extensions 958 This memo specifies six new feedback messages. The Full Intra Request 959 (FIR), Temporal-Spatial Trade-off Request (TSTR), Temporal-Spatial 960 Trade-off Announcement (TSTA), and Video Back Channel Message (VBCM) 961 are "Payload Specific Feedback Messages" in the sense of section 6.3 962 of AVPF [RFC4585]. The Temporary Maximum Media Bit-rate Request 963 (TMMBR) and Temporary Maximum Media Bit-rate Notification (TMMBN) are 964 "Transport Layer Feedback Messages" in the sense of section 6.2 of 965 AVPF. 967 In the following subsections, the new feedback messages are defined, 968 following a similar structure as in the AVPF specification's sections 969 6.2 and 6.3, respectively. 971 4.1. Design Principles of the Extension Mechanism 973 RTCP was originally introduced as a channel to convey presence, 974 reception quality statistics and hints on the desired media coding. 975 A limited set of media control mechanisms has been introduced in 976 early RTP payload formats for video formats, for example in RFC 2032 977 [RFC2032].
However, this specification, for the first time, suggests 978 a two-way handshake for one of its messages. There is danger that 979 this introduction could be misunderstood as a precedent for the 980 use of RTCP as an RTP session control protocol. In order to prevent 981 these misunderstandings, this subsection attempts to clarify the 982 scope of the extensions specified in this memo, and strongly suggests 983 that future extensions follow the rationale spelled out here, or 984 compellingly explain why they deviate from the rationale. 986 In this memo, and in AVPF [RFC4585], only such messages have been 987 included which 989 a) have comparatively strict real-time constraints, which prevent the 990 use of mechanisms such as a SIP re-invite in most application 991 scenarios. The real-time constraints are explained separately for 992 each message where necessary 993 b) are multicast-safe in that the reaction to potentially 994 contradicting feedback messages is specified, as necessary for 995 each message 996 c) are directly related to activities of a certain media codec, class 997 of media codecs (e.g. video codecs), or the given media stream. 999 In this memo, a two-way handshake is only introduced for such 1000 messages that 1001 a) require a notification or acknowledgement due to their nature, 1002 which is motivated separately for each message 1003 b) have a notification or acknowledgement that cannot be easily derived from 1004 the media bit stream. 1006 All messages in AVPF [RFC4585] and in this memo follow a number of 1007 common design principles. In particular: 1009 a) Media receivers are not always implementing higher control 1010 protocol functionalities (SDP, XML parsers and such) in their 1011 media path. Therefore, simple binary representations are used in 1012 the feedback messages and not an (otherwise desirable) flexible 1013 format such as, for example, XML. 1015 4.2.
Transport Layer Feedback Messages 1017 Transport Layer FB messages are identified by the value RTPFB (205) 1018 as RTCP packet type. 1020 In AVPF, one message of this category had been defined. This memo 1021 specifies two more messages for a total of three messages of this 1022 type. They are identified by means of the FMT parameter as follows: 1024 0: unassigned 1025 1: Generic NACK (as per AVPF) 1026 2: Temporary Maximum Media Bit-rate Request 1027 3: Temporary Maximum Media Bit-rate Notification 1028 4-30: unassigned 1029 31: reserved for future expansion of the identifier number space 1031 The following subsection defines the formats of the FCI field for 1032 this type of FB message. 1034 4.2.1. Temporary Maximum Media Bit-rate Request (TMMBR) 1036 The FCI field of a TMMBR Feedback message SHALL contain one or more 1037 FCI entries. 1039 4.2.1.1. Semantics 1041 The TMMBR is used to indicate the highest bit-rate per sender of a 1042 media, which the receiver currently supports in this RTP session. 1043 The media sender MAY use any lower bit-rate, as it may need to 1044 address a congestion situation or other limiting factors. See 1045 section 5 (congestion control) for more discussion. 1047 The "SSRC of the packet sender" field indicates the source of the 1048 request, and the "SSRC of media source" is not used and SHALL be set 1049 to 0. The SSRC of media sender in the FCI field denotes the media 1050 sender the message applies to. This is useful in the multicast or 1051 translator topologies where each media sender may be addressed in a 1052 single TMMBR message using multiple FCIs. 1054 A TMMBR FCI MAY be repeated in subsequent TMMBR messages if no 1055 applicable TMMBN FCI has been received at the time of transmission of 1056 the next RTCP packet. The bit-rate value of a TMMBR FCI MAY be 1057 changed from a previous TMMBR message and the next, regardless of the 1058 eventual reception of an applicable TMMBN FCI. 
1060 Please note that a TMMBN message SHALL be sent by the media sender at 1061 the earliest possible point in time, as a result of any TMMBR 1062 messages received since the last sending of TMMBN. The TMMBN message 1063 indicates the limit and the owner of that limit at the time of the 1064 transmission of the message. The limit is the lowest of the previous 1065 value and all values received in TMMBR FCIs since the last TMMBN was 1066 transmitted. 1068 A media receiver that is not the owner of the bandwidth limit SHOULD, when 1069 planning to send a TMMBR, request a bandwidth lower than its 1070 knowledge of the currently established bandwidth limit for this media 1071 sender, or suppress its transmission of the TMMBR. The exception to 1072 the above rule is when a receiver either doesn't know the limit or 1073 is certain that its local representation of the value is in error. 1074 All received requests for bandwidth limits greater than or equal to the 1075 one currently established are ignored, with the exception of them 1076 resulting in the transmission of a TMMBN. A media receiver that is 1077 the owner of the current bandwidth limit MAY lower the value 1078 further, raise the value or remove the restriction completely by 1079 setting the bandwidth limit equal to the session limit. 1081 Once a session participant receives the TMMBN in response to its 1082 TMMBR, with its own SSRC, it knows that it "owns" the bandwidth 1083 limitation. Only the "owner" of a bandwidth limitation can raise it 1084 or reset it to the session limit. 1086 Note that, due to the unreliable nature of transport of TMMBR and 1087 TMMBN, the above rules may lead to the sending of TMMBR messages 1088 disobeying the rules above. Furthermore, in multicast scenarios it 1089 can happen that more than one session participant believes it "owns" 1090 the current bandwidth limitation.
This is not critical for a number 1091 of reasons: 1092 a) If a TMMBR message is lost in transmission, the media sender does 1093 not learn about the restrictions imposed on it. However, it also 1094 does not send a TMMBN message notifying reception of a request it 1095 has never received. Therefore, no new limit is established, and the 1096 media receiver sending the more restrictive TMMBR is not the 1097 owner. Since this media receiver has not seen a notification 1098 corresponding to its request, it is free to re-send it. 1099 b) Similarly, if a TMMBN message gets lost, the media receiver that 1100 has sent the corresponding TMMBR request does not receive 1101 acknowledgement. In that case, it is also not the "owner" of the 1102 restriction and is free to re-send the request. 1103 c) If multiple competing TMMBR messages are sent by different session 1104 participants, then the resulting TMMBN indicates the lowest 1105 bandwidth requested; the owner is set to the sender of the TMMBR 1106 with the lowest requested bandwidth value. 1108 TMMBR feedback SHOULD NOT be used if the underlying transport 1109 protocol is capable of providing similar feedback information from 1110 the receiver to the sender. 1112 It is also important to consider the security risks involved with faked 1113 TMMBRs. See security considerations in Section 6. 1115 The feedback messages may be used in both multicast and unicast 1116 sessions of any of the specified topologies. 1118 For sessions with a larger number of participants, using the lowest 1119 common denominator, as required by this mechanism, may not be the 1120 most suitable course of action. Larger sessions may need to consider 1121 other ways to support adapted bit-rate to participants, such as 1122 partitioning the session in different quality tiers, or using some 1123 other method of achieving bit-rate scalability.
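The suppression rules for a media receiver described above can be sketched roughly as follows. This is an illustrative sketch only, not part of the specification: the function name `should_send_tmmbr` and its parameters are hypothetical, and treating an owner's request for the unchanged value as a no-op is an assumption made here for brevity.

```python
from typing import Optional


def should_send_tmmbr(my_ssrc: int,
                      desired_bps: int,
                      known_limit_bps: Optional[int],
                      known_owner: Optional[int]) -> bool:
    """Decide whether a TMMBR carrying desired_bps is worth sending.

    known_limit_bps / known_owner reflect the last TMMBN seen from the
    media sender; None means no TMMBN has been seen (limit unknown).
    """
    if known_limit_bps is None:
        # Exception in the text: the receiver does not know the limit.
        return True
    if known_owner == my_ssrc:
        # The owner may lower, raise or remove the limit. Skipping a
        # request for the unchanged value is an assumption of this sketch.
        return desired_bps != known_limit_bps
    # A non-owner requests only values below the established limit;
    # anything else would be ignored by the media sender anyway.
    return desired_bps < known_limit_bps
```

A receiver that is certain its local view of the limit is wrong could simply pass `None` for `known_limit_bps`, which corresponds to the second exception named in the text.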
1125 If the value set by a TMMBR message is expected to be permanent the 1126 TMMBR setting party is RECOMMENDED to renegotiate the session 1127 parameters to reflect that using the setup signalling. 1129 An SSRC may time out according to the default rules for RTP session 1130 participants, i.e. the media sender has not received any RTCP packet 1131 from the owner for the last five regular reporting intervals. An SSRC 1132 may also leave the session, indicating this through the transmission 1133 of an RTCP BYE packet or an external signalling channel. In all of 1134 these cases the entity is considered to have left the session. In the 1135 case the "owner" leaves the session, the value SHALL be set to the 1136 session maximum and the transmission of a TMMBN is scheduled. 1138 4.2.1.2. Message Format 1139 The Feedback control information (FCI) consists of one or more TMMBR 1140 FCI entries with the following syntax: 1142 0 1 2 3 1143 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1144 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1145 | SSRC | 1146 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1147 | Maximum bit-rate in units of 128 bits/s | 1148 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1150 Figure 1 - Syntax for the TMMBR message 1152 SSRC: The SSRC value of the target of this specific maximum bit- 1153 rate request. 1155 Maximum bit-rate: The temporary maximum media bit-rate value in 1156 units of 128 bit/s. This provides a range from 0 to 1157 549755813888 bits/s (~550 Gbit/s) with a granularity of 128 1158 bits/s. 1160 The length of the FB message is set to 2+2*N where N is the number 1161 of TMMBR FCI entries. 1163 4.2.1.3. Timing Rules 1165 The first transmission of the request message MAY use early or 1166 immediate feedback in cases when timeliness is desirable. Any 1167 repetition of a request message SHOULD use regular RTCP mode for its 1168 transmission timing. 1170 4.2.2.
Temporary Maximum Media Bit-rate Notification (TMMBN) 1172 The FCI field of the TMMBN Feedback message SHALL contain one TMMBN 1173 FCI entry. 1175 4.2.2.1. Semantics 1177 This feedback message is used to notify the senders of any TMMBR 1178 message that one or more TMMBR messages have been received. It 1179 indicates to all participants the currently employed maximum bit-rate 1180 value and the "owner" of the current limitation. The "owner" of a 1181 limitation is the sender of the last (most restrictive) TMMBR message 1182 received by the media sender. 1184 The "SSRC of the packet sender" field indicates the source of the 1185 notification. The "SSRC of media source" SHALL be set to the SSRC of 1186 the media receiver that currently owns the bit-rate limitation. 1188 A TMMBN message SHALL be scheduled for transmission after the 1189 reception of a TMMBR message with an FCI including the session 1190 participant's SSRC. Only a single TMMBN SHALL be sent, even if more 1191 than one TMMBR message is received between the scheduling of the 1192 transmission and the actual transmission of the TMMBN message. The 1193 TMMBN message indicates the limit and the owner of that limit at the 1194 time of transmitting the message. The limit SHALL be the lowest of 1195 the existing value and all values received in TMMBR messages since the last 1196 TMMBN was transmitted. The one sending that request SHALL become the 1197 owner of the limit. 1199 The reception of a TMMBR message with a transmission limit greater than or 1200 equal to the current limit SHALL still result in the transmission 1201 of a TMMBN message. However, the limit and owner are not changed, 1202 unless the message was from the current owner, and the current limit and owner are 1203 indicated in the TMMBN message. This procedure allows session 1204 participants that haven't seen the last TMMBN message to get a 1205 correct view of this media sender's state.
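As an illustration of the media sender behaviour described above, the following is a minimal sketch of the per-sender state: one limit, one owner SSRC, and a flag saying that a TMMBN is due. The class and method names are hypothetical. Clamping requests to the session maximum and representing "no owner" as `None` are assumptions of this sketch, not statements of the memo.

```python
from typing import Optional, Tuple


class TmmbrLimitState:
    """State a media sender keeps for the temporary maximum bit-rate."""

    def __init__(self, session_max_bps: int) -> None:
        self.session_max_bps = session_max_bps
        self.limit_bps = session_max_bps
        self.owner_ssrc: Optional[int] = None
        self.tmmbn_pending = False

    def on_tmmbr(self, requester_ssrc: int, requested_bps: int) -> None:
        # Assumption: a request above the session maximum is clamped.
        requested_bps = min(requested_bps, self.session_max_bps)
        # A lower value always wins; only the current owner may raise
        # or otherwise change the established limit.
        if requester_ssrc == self.owner_ssrc or requested_bps < self.limit_bps:
            self.limit_bps = requested_bps
            self.owner_ssrc = requester_ssrc
        # Every TMMBR, even an ignored one, results in a TMMBN.
        self.tmmbn_pending = True

    def on_participant_left(self, ssrc: int) -> None:
        # If the owner times out or sends BYE, revert to the session maximum.
        if ssrc == self.owner_ssrc:
            self.limit_bps = self.session_max_bps
            self.owner_ssrc = None
            self.tmmbn_pending = True

    def build_tmmbn(self) -> Tuple[int, Optional[int]]:
        # One TMMBN reports the state at transmission time, no matter how
        # many TMMBR messages arrived since it was scheduled.
        self.tmmbn_pending = False
        return self.limit_bps, self.owner_ssrc
```

Because the state is read only when the TMMBN is actually built, several TMMBR messages arriving between scheduling and transmission collapse into a single notification, which matches the "only a single TMMBN" rule.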

   When a media sender determines that the "owner" of a limitation has
   left the session, the current limitation is removed, and the media
   sender SHALL send a TMMBN message indicating the maximum session
   bandwidth.

4.2.2.2. Message Format

   The TMMBN Feedback control information (FCI) entry has the following
   syntax:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Maximum bit-rate in units of 128 bits/s            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2 - Syntax for the TMMBN message

   Maximum bit-rate: The current temporary maximum media bit-rate
         value in units of 128 bits/s.

   The length field value of the FB message SHALL be 3.

4.2.2.3. Timing Rules

   The acknowledgement SHOULD be sent as soon as allowed by the applied
   timing rules for the session. Immediate or early feedback mode
   SHOULD be used for these messages.

4.3. Payload Specific Feedback Messages

   Payload-Specific FB messages are identified by the value PT=PSFB
   (206) as RTCP packet type.

   AVPF defines three payload-specific FB messages and one application
   layer FB message. This memo specifies four additional payload
   specific feedback messages. All are identified by means of the FMT
   parameter as follows:

      0:     unassigned
      1:     Picture Loss Indication (PLI)
      2:     Slice Loss Indication (SLI)
      3:     Reference Picture Selection Indication (RPSI)
      4:     Full Intra Request Command (FIR)
      5:     Temporal-Spatial Trade-off Request (TSTR)
      6:     Temporal-Spatial Trade-off Announcement (TSTA)
      7:     Video Back Channel Message (VBCM)
      8-14:  unassigned
      15:    Application layer FB message
      16-30: unassigned
      31:    reserved for future expansion of the number space

   The following subsections define the new FCI formats for the
   payload-specific FB messages.
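
   As an illustration only, the following sketch packs the common
   feedback message header of [RFC4585] (V=2, P=0, FMT, PT, length,
   "SSRC of the packet sender", "SSRC of media source") for a
   payload-specific FB message and maps a received FMT value back to
   the names listed above. The function names and the short name "AFB"
   for FMT 15 are choices of this sketch, not definitions of this memo.

```python
import struct

RTCP_PT_PSFB = 206  # payload-specific feedback packet type

# FMT values for PT=PSFB as listed in Section 4.3.
PSFB_FMT_NAMES = {
    1: "PLI", 2: "SLI", 3: "RPSI",
    4: "FIR", 5: "TSTR", 6: "TSTA", 7: "VBCM",
    15: "AFB",
}


def pack_psfb_header(fmt: int, sender_ssrc: int, media_ssrc: int,
                     fci_words: int) -> bytes:
    """Pack the 12-octet FB header; fci_words is the FCI size in 32-bit words."""
    if not 0 <= fmt <= 31:
        raise ValueError("FMT is a 5-bit field")
    first_octet = (2 << 6) | fmt      # V=2, P=0, FMT
    length = 2 + fci_words            # 32-bit words minus one
    return struct.pack("!BBHII", first_octet, RTCP_PT_PSFB, length,
                       sender_ssrc, media_ssrc)


def psfb_fmt_name(packet: bytes) -> str:
    """Return the message name for the FMT carried in a PSFB packet."""
    if packet[1] != RTCP_PT_PSFB:
        raise ValueError("not a payload-specific FB message")
    fmt = packet[0] & 0x1F
    if fmt == 31:
        return "reserved"
    return PSFB_FMT_NAMES.get(fmt, "unassigned")
```

   With two 32-bit words per FCI entry, the length computed here equals
   the 2+2*N rule used by the message formats in this memo.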

4.3.1. Full Intra Request (FIR) command

   The FIR command FB message is identified by PT=PSFB and FMT=4.

   There MUST be one or more FIR entries contained in the FCI field.

4.3.1.1. Semantics

   Upon reception of a FIR message, an encoder MUST send a decoder
   refresh point (see Section 2.2) as soon as possible.

      Note: Currently, video appears to be the only useful application
      for FIR, as it appears to be the only widely deployed RTP payload
      that relies heavily on media prediction across RTP packet
      boundaries. However, use of FIR could also reasonably be
      envisioned for other media types that share essential properties
      with compressed video, namely cross-frame prediction (whatever a
      frame may be for that media type). One possible example may be
      the dynamic updates of MPEG-4 scene descriptions. It is suggested
      that payload formats for such media types refer to FIR and other
      message types defined in this specification and in AVPF, instead
      of creating similar mechanisms in the payload specifications. The
      payload specifications may have to explain how the payload
      specific terminologies map to the video-centric terminology used
      here.

      Note: In environments where the sender has no control over the
      codec (e.g. when streaming pre-recorded and pre-coded content),
      the reaction to this command cannot be specified. One suitable
      reaction of a sender would be to skip forward in the video bit
      stream to the next decoder refresh point. In other scenarios, it
      may be preferable not to react to the command at all, e.g. when
      streaming to a large multicast group. Other reactions may also be
      possible.
      When deciding on a strategy, a sender could take into account
      factors such as the size of the receiving group, the "importance"
      of the sender of the FIR message (however "importance" may be
      defined in this specific application), the frequency of decoder
      refresh points in the content, and others. However, a session
      which predominantly handles pre-coded content shouldn't use the
      FIR at all.

   The sender MUST consider congestion control as outlined in section
   5, which MAY restrict its ability to send a decoder refresh point
   quickly.

      Note: The relationship between the Picture Loss Indication and
      FIR is as follows. As discussed in section 6.3.1 of AVPF, a
      Picture Loss Indication informs the encoder about the loss of a
      picture and hence the likelihood of misalignment of the reference
      pictures in encoder and decoder. Such a scenario is normally
      related to losses in an ongoing connection. In point-to-point
      scenarios, and without the presence of advanced error resilience
      tools, one possible option an encoder has is to send a decoder
      refresh point. However, there are other options including
      ignoring the PLI, for example if only one receiver of many has
      sent a PLI or when the embedded stream redundancy is likely to
      clean up the reproduced picture within a reasonable amount of
      time. The FIR, in contrast, leaves a real-time encoder no choice
      but to send a decoder refresh point. It does not allow the
      encoder to take into account any considerations such as the ones
      mentioned above.

      Note: Mandating a maximum delay for completing the sending of a
      decoder refresh point would be desirable from an application
      viewpoint, but may be problematic from a congestion control point
      of view. "As soon as possible" as mentioned above appears to be a
      reasonable compromise.

   FIR SHALL NOT be sent as a reaction to picture losses - it is
   RECOMMENDED to use PLI instead. FIR SHOULD be used only in such
   situations where not sending a decoder refresh point would render
   the video unusable for the users.

      Note: A typical example where sending FIR is adequate is when, in
      a multipoint conference, a new user joins the session and no
      regular decoder refresh point interval is established. Another
      example would be a video switching MCU that changes streams.
      Here, normally, the MCU issues a FIR to the new sender so as to
      force it to emit a decoder refresh point. The decoder refresh
      point normally includes a Freeze Picture Release (defined outside
      this specification), which re-starts the rendering process of the
      receivers. Both techniques mentioned are commonly used in
      MCU-based multipoint conferences.

   Other RTP payload specifications such as RFC 2032 [RFC2032] already
   define a feedback mechanism for certain codecs. An application
   supporting both schemes MUST use the feedback mechanism defined in
   this specification when sending feedback. For backward compatibility
   reasons, such an application SHOULD also be capable of receiving and
   reacting to the feedback scheme defined in the respective RTP
   payload format, if this is required by that payload format.

   The "SSRC of the packet sender" field indicates the source of the
   request, and the "SSRC of media source" is not used and SHALL be set
   to 0. The SSRC of the media sender to which the FIR command applies
   is in the FCI.

4.3.1.2. Message Format

   Full Intra Request uses one additional FCI field, the content of
   which is depicted in Figure 3. The length of the FB message MUST be
   set to 2+2*N, where N is the number of FCI entries.
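
   The FIR FCI entry defined in this section (a 32-bit SSRC, an 8-bit
   sequence number and 24 reserved bits) and the length rule can be
   sketched as follows. This is an illustration under assumptions: the
   class and function names are hypothetical, and the sequence number
   starts at 0 here although the initial value is arbitrary.

```python
import struct
from typing import Dict, List, Tuple


class FirRequester:
    """Per-target FIR sequence numbers for one command source (hypothetical)."""

    def __init__(self) -> None:
        self._last_seq: Dict[int, int] = {}

    def new_command(self, target_ssrc: int) -> int:
        # A new command increases the number by 1 modulo 256. A repetition
        # must reuse the previous number and therefore must not call this.
        seq = (self._last_seq.get(target_ssrc, -1) + 1) % 256
        self._last_seq[target_ssrc] = seq
        return seq


def pack_fir_fci(entries: List[Tuple[int, int]]) -> bytes:
    """Pack (target SSRC, sequence number) pairs into FIR FCI entries."""
    # "!IB3x": 32-bit SSRC, 8-bit sequence number, 24 reserved zero bits.
    return b"".join(struct.pack("!IB3x", ssrc, seq) for ssrc, seq in entries)


def fir_length_field(num_entries: int) -> int:
    """Value of the FB message length field for N FIR FCI entries."""
    return 2 + 2 * num_entries
```

   Keeping one counter per target SSRC reflects the rule that the
   sequence number space is unique for each tuple of command source
   and command target.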

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |                   Reserved                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3 - Syntax for the FIR message

   SSRC: The SSRC value of the media sender that is the target of this
         specific FIR command.

   Seq. nr: Command sequence number. The sequence number space is
         unique for each tuple consisting of the SSRC of the command
         source and the SSRC of the command target. The sequence
         number SHALL be increased by 1 modulo 256 for each new
         command. A repetition SHALL NOT increase the sequence
         number. Initial value is arbitrary.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
         reception.

   The semantics of this FB message is independent of the RTP payload
   type.

4.3.1.3. Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585]. FIR
   commands MAY be used with early or immediate feedback. The FIR
   feedback message MAY be repeated. If using immediate feedback mode,
   the repetition SHOULD wait at least one RTT before being sent. In
   early or regular RTCP mode the repetition is sent in the next
   regular RTCP packet.

4.3.1.4. Remarks

   FIR messages typically trigger the sending of full intra or IDR
   pictures. Both are several times larger than predicted (inter)
   pictures. Their size is independent of the time they are generated.
   In most environments, especially when employing bandwidth-limited
   links, the use of an intra picture implies an allowed delay that is
   a significant multiple of the typical frame duration.
   An example: If the sending frame rate is 10 fps, and an intra
   picture is assumed to be 10 times as big as an inter picture, then a
   full second of latency has to be accepted. In such an environment
   there is no need for a particularly short delay in sending the FIR
   message. Hence waiting for the next possible time slot allowed by
   RTCP timing rules as per [RFC4585] may not have an overly negative
   impact on the system performance.

4.3.2. Temporal-Spatial Trade-off Request (TSTR)

   The TSTR FB message is identified by PT=PSFB and FMT=5.

   There MUST be one or more TSTR entries contained in the FCI field.

4.3.2.1. Semantics

   A decoder can suggest the use of a temporal-spatial trade-off by
   sending a TSTR message to an encoder. If the encoder is capable of
   adjusting its temporal-spatial trade-off, it SHOULD take into
   account the received TSTR message for future coding of pictures. A
   value of 0 suggests a high spatial quality and a value of 31
   suggests a high frame rate. The values from 0 to 31 indicate
   monotonically a desire for higher frame rate. Actual values do not
   correspond to precise values of spatial quality or frame rate.

   The reaction to the reception of more than one TSTR message by a
   media sender from different media receivers is left open to the
   implementation. The selected trade-off SHALL be communicated to the
   media receivers by means of the TSTA message.

   The "SSRC of the packet sender" field indicates the source of the
   request, and the "SSRC of media source" is not used and SHALL be set
   to 0. The SSRC of the media sender to which the TSTR applies is in
   the FCI entries.

   A TSTR message may contain multiple requests to different media
   senders, using multiple FCI entries.

4.3.2.2. Message Format

   The Temporal-Spatial Trade-off Request uses one FCI field, the
   content of which is depicted in Figure 4.
   The length of the FB message MUST be set to 2+2*N, where N is the
   number of FCI entries included.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq nr.    |              Reserved               |  Index  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4 - Syntax of the TSTR

   SSRC: The SSRC value of the target (i.e. the media sender) of this
         specific TSTR request.

   Seq. nr: Request sequence number. The sequence number space is
         unique for each tuple consisting of the SSRC of the request
         source and the SSRC of the request target. The sequence
         number SHALL be increased by 1 modulo 256 for each new
         request. A repetition SHALL NOT increase the sequence
         number. Initial value is arbitrary.

   Index: An integer value between 0 and 31 that indicates the
         relative trade-off that is requested. An index value of 0
         indicates the highest possible spatial quality, while 31
         indicates the highest possible temporal resolution.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
         reception.

4.3.2.3. Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585].
   This request message is not time critical and SHOULD be sent using
   regular RTCP timing. Only if it is known that the user interface
   requires quick feedback MAY the message be sent with early or
   immediate feedback timing.

4.3.2.4. Remarks

   The term "spatial quality" does not necessarily refer to the
   resolution, measured by the number of pixels the reconstructed video
   is using. In fact, in most scenarios the video resolution stays
   constant during the lifetime of a session.
   However, all video compression standards have means to adjust the
   spatial quality at a given resolution, often influenced by the
   Quantizer Parameter or QP. A numerically low QP results in a good
   reconstructed picture quality, whereas a numerically high QP yields
   a coarse picture. The typical reaction of an encoder to this request
   is to change its rate control parameters to use a lower frame rate
   and a numerically lower (on average) QP, or vice versa. The precise
   mapping of Index, frame rate, and QP is intentionally left open
   here, as it depends on factors such as compression standard
   employed, spatial resolution, content, bit rate, and many more.

4.3.3. Temporal-Spatial Trade-off Announcement (TSTA)

   The TSTA FB message is identified by PT=PSFB and FMT=6.

   There SHALL be one or more TSTA entries contained in the FCI field.

4.3.3.1. Semantics

   This feedback message is used to acknowledge the reception of a
   TSTR. A TSTA entry in a TSTA feedback message SHALL be sent for each
   TSTR entry targeted to this session participant, i.e. each received
   TSTR entry whose SSRC field contains the receiving entity's SSRC. A
   single TSTA message MAY acknowledge multiple requests using multiple
   FCI entries. The index value included SHALL be the same in all FCI
   entries that are part of the TSTA message. Including an FCI entry
   for each requestor allows each requesting entity to determine that
   the targeted media sender has received the request. The announcement
   SHALL be sent also for repetitions received. If the request receiver
   has received TSTR entries with several different sequence numbers
   from a single requestor, it SHALL only respond to the request with
   the highest (modulo 256) sequence number.

   The TSTA SHALL include the Temporal-Spatial Trade-off index that
   will be used as a result of the request.
   This is not necessarily the same index as requested, as the media
   sender may need to aggregate requests from several requesting
   session participants. It may also have some other policies or rules
   that limit the selection.

   The "SSRC of the packet sender" field indicates the source of the
   announcement, and the "SSRC of media source" is not used and SHALL
   be set to 0. The SSRC of the requesting entity to which the
   announcement applies is in the FCI.

4.3.3.2. Message Format

   The Temporal-Spatial Trade-off Announcement uses one additional FCI
   field, the content of which is depicted in Figure 5. The length of
   the FB message MUST be set to 2+2*N, where N is the number of FCI
   entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq nr.    |              Reserved               |  Index  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5 - Syntax of the TSTA

   SSRC: The SSRC of the source of the TSTR request that is
         acknowledged.

   Seq. nr: The sequence number value from the TSTR request that is
         being acknowledged.

   Index: The trade-off value the media sender is using henceforth.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
         reception.

      Informative note: The returned trade-off value (Index) may differ
      from the requested one, for example in cases where a media
      encoder cannot tune its trade-off, or when pre-recorded content
      is used.

4.3.3.3. Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585].
   This acknowledgement message is not extremely time critical and
   SHOULD be sent using regular RTCP timing.

4.3.3.4. Remarks

   None

4.3.4. H.271 VideoBackChannelMessage (VBCM)

   The VBCM FB message is identified by PT=PSFB and FMT=7.

   There MUST be one or more VBCM entries contained in the FCI field.

4.3.4.1. Semantics

   The "payload" of the VBCM indication carries different types of
   codec specific feedback information. The type of feedback
   information can be classified as "status reports" (such as receiving
   the bit stream without errors, or the loss of a partial or complete
   picture or block) or "update requests" (such as a complete refresh
   of the bit stream).

      Note: There is possible overlap between the VBCM sub-messages
      and CCM/AVPF feedback messages, such as FIR. Please see section
      3.5.3 for further discussion.

   The different types of feedback sub-messages carried in the VBCM are
   indicated by the "payloadType" as defined in [VBCM]. The different
   sub-message types as defined in [VBCM] are reproduced below for
   convenience. "payloadType", in ITU-T Rec. H.271 terminology, refers
   to the sub-type of the H.271 message and should not be confused with
   an RTP payload type.

   Payload
   Type      Message Content
   ---------------------------------------------------------------
   0         One or more pictures without detected bitstream error
             mismatch
   1         One or more pictures that are entirely or partially
             lost
   2         A set of blocks of one picture that is entirely or
             partially lost
   3         CRC for one parameter set
   4         CRC for all parameter sets of a certain type
   5         A "reset" request indicating that the sender should
             completely refresh the video bitstream as if no prior
             bitstream data had been received
   > 5       Reserved for future use by ITU-T

   Table 2: H.271 message types

   The bit string or the "payload" of a VBCM message is of variable
   length and is self-contained and coded in a variable length, binary
   format.
   The media sender necessarily has to be able to parse this optimized
   binary format to make use of VBCM messages.

   Each of the different types of sub-messages (indicated by
   payloadType) may have different semantics based on the codec used.

   The "SSRC of the packet sender" field indicates the source of the
   request, and the "SSRC of media source" is not used and SHALL be set
   to 0. The SSRC of the media sender to which the VBCM message applies
   is in the FCI.

4.3.4.2. Message Format

   The VBCM indication uses one FCI field and the syntax is depicted in
   Figure 6.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |0| Payload Type|            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             VBCM Octet String....             |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6 - Syntax for VBCM Message

   SSRC: The SSRC value of the media sender that is the target of the
         message, i.e. the media sender whose encoder should react to
         the VBCM message.

   Seq. nr: Command sequence number. The sequence number space is
         unique for each tuple consisting of the SSRC of the command
         source and the SSRC of the command target. The sequence
         number SHALL be increased by 1 modulo 256 for each new
         command. A repetition SHALL NOT increase the sequence
         number. Initial value is arbitrary.

   0:    Must be set to 0 and should be ignored on reception.

   Payload: The RTP payload type for which the VBCM bit stream must be
         interpreted.

   Length: The length of the VBCM octet string in octets, exclusive of
         any padding octets.

   VBCM Octet String: This is the octet string generated by the
         decoder carrying a specific feedback sub-message.
         It is of variable length.

   Padding: Bytes set to 0 to make up a 32 bit boundary.

4.3.4.3. Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585].
   The different sub-message types may have different properties with
   regard to the timing of messages that should be used. If several
   different types are included in the same feedback packet, then the
   sub-message type with the most stringent requirements should be
   followed.

4.3.4.4. Remarks

   Please see section 3.5.3 for the applicability of the VBCM message
   in relation to messages in both AVPF and this memo with similar
   functionality.

      Note: There has been some discussion whether the payload type
      field in this message is needed. It would be needed if there were
      potentially more than one VBCM-capable RTP payload type in the
      same session, and if the semantics of a given VBCM message
      changed from PT to PT. This appears to be the case. For example,
      the picture identification mechanism in messages of H.271 type 0
      is fundamentally different between H.263 and H.264 (although both
      use the same syntax). Therefore, the payload field is justified
      here. It was further commented that for TSTR and FIR such a need
      does not exist, because the semantics of TSTR and FIR are either
      loosely enough defined, or generic enough, to apply to all video
      payloads currently in existence/envisioned.

5. Congestion Control

   The correct application of the AVPF timing rules prevents the
   network from being flooded by feedback messages. Hence, assuming a
   correct implementation, the RTCP channel cannot break its bit-rate
   commitment and introduce congestion.

   The reception of some of the feedback messages modifies the
   behaviour of the media senders or, more specifically, the media
   encoders.
   All of these modifications MUST only be performed within the
   bandwidth limits the applied congestion control provides. For
   example, when reacting to a FIR, the unusually high number of
   packets that form the decoder refresh point have to be paced in
   compliance with the congestion control algorithm, even if the user
   experience suffers from a slowly transmitted decoder refresh point.

   A change of the Temporary Maximum Media Bit-rate value can only
   mitigate congestion, but not cause congestion as long as congestion
   control is also employed. An increase of the value by a request
   REQUIRES the media sender to use congestion control when increasing
   its transmission rate to that value. A reduction of the value
   results in a reduced transmission bit-rate, thus reducing the risk
   for congestion.

6. Security Considerations

   The defined messages have certain properties that have security
   implications. These must be addressed and taken into account by
   users of this protocol.

   The defined setup signalling mechanism is sensitive to modification
   attacks that can result in session creation with sub-optimal
   configuration, and, in the worst case, session rejection. To prevent
   this type of attack, authentication and integrity protection of the
   setup signalling is required.

   Spoofed or maliciously created feedback messages of the type defined
   in this specification can have the following implications:

      a. Severely reduced media bit-rate due to false TMMBR messages
         that set the maximum to a very low value.
      b. The assignment of the ownership of a bit-rate limit with a
         TMMBN message to the wrong participant, thus potentially
         freezing the mechanism until a correct TMMBN message reaches
         the participants.
      c. Sending TSTR messages that result in a video quality different
         from the user's desire, rendering the session less useful.
      d. Frequent FIR commands will potentially reduce the frame-rate,
         making the video jerky due to the frequent usage of decoder
         refresh points.

   To prevent these attacks there is a need to apply authentication and
   integrity protection of the feedback messages. This can be
   accomplished against group external threats using the RTP profile
   that combines SRTP [SRTP] and AVPF into SAVPF [SAVPF]. In the MCU
   cases, separate security contexts and filtering can be applied
   between the MCU and the participants, thus protecting other MCU
   users from a misbehaving participant.

7. SDP Definitions

   Section 4 of [RFC4585] defines new SDP [RFC2327] attributes that are
   used for the capability exchange of the AVPF commands and
   indications, such as Reference Picture Selection, Picture Loss
   Indication, etc. The defined SDP attribute is known as rtcp-fb and
   its ABNF is described in section 4.2 of [RFC4585]. In this section
   we extend the rtcp-fb attribute to include the commands and
   indications that are described in this document for codec control
   protocol. We also discuss the Offer/Answer implications for the
   codec control commands and indications.

7.1. Extension of rtcp-fb attribute

   As described in [RFC4585], the rtcp-fb attribute is defined to
   indicate the capability of using RTCP feedback. As defined in AVPF,
   the rtcp-fb attribute must only be used as a media level attribute
   and must not be provided at session level.

   All the rules described in [RFC4585] for the rtcp-fb attribute
   relating to payload type and to multiple rtcp-fb attributes in a
   session description hold for the new feedback messages for codec
   control defined in this document.
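
   The media-level restriction can be checked mechanically. The
   following sketch is illustrative only; the function name is
   hypothetical, and it assumes that everything before the first "m="
   line of a session description is session level.

```python
from typing import List


def session_level_rtcp_fb(sdp: str) -> List[str]:
    """Return the a=rtcp-fb lines that appear before the first m= line."""
    offending = []
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            break                      # media level starts here
        if line.startswith("a=rtcp-fb:"):
            offending.append(line)     # not allowed at session level
    return offending
```

   An empty result means that no rtcp-fb attribute was found at
   session level; it does not validate the attribute values.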

   The ABNF for the rtcp-fb attribute as defined in [RFC4585] is

      rtcp-fb-syntax = "a=rtcp-fb:" rtcp-fb-pt SP rtcp-fb-val CRLF

   where rtcp-fb-pt is the payload type and rtcp-fb-val defines the
   type of the feedback message such as ack, nack, trr-int and
   rtcp-fb-id. For example, to indicate the support of feedback of
   picture loss indication, the sender declares the following in SDP

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Media with feedback
      t=0 0
      c=IN IP4 host.example.com
      m=video 49170 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 nack pli

   In this document we define a new feedback value type called "ccm",
   which indicates the support of codec control using RTCP feedback
   messages. The "ccm" feedback value should be used with parameters
   that indicate which codec commands the session may use. In this
   draft we define four parameters, which can be used with the ccm
   feedback value type.

   o  "fir" indicates the support of Full Intra Request
   o  "tmmbr" indicates the support of Temporary Maximum Media Bit-rate
   o  "tstr" indicates the support of temporal spatial trade-off
      request.
   o  "vbcm" indicates the support of H.271 video back channel
      messages.

   In the ABNF for rtcp-fb-val defined in [RFC4585], there is a
   placeholder called rtcp-fb-id to define new feedback types. The ccm
   is defined as a new feedback type in this document and the ABNF for
   the parameters for ccm is defined here (please refer to section 4.2
   of [RFC4585] for the complete ABNF syntax).

   rtcp-fb-param     = SP "app" [SP byte-string]
                     / SP rtcp-fb-ccm-param
                     /    ; empty

   rtcp-fb-ccm-param = "ccm" SP ccm-param

   ccm-param         = "fir"    ; Full Intra Request
                     / "tmmbr"  ; Temporary max media bit rate
                     / "tstr"   ; Temporal Spatial Trade Off
                     / "vbcm" 1*[SP subMessageType] ; H.271 VBCM messages
                     / token [SP byte-string]
                                ; for future commands/indications
   subMessageType    = 1*[integer];
   byte-string       = <as defined in section 4.2 of [RFC4585]>

7.2. Offer-Answer

   The Offer/Answer [RFC3264] implications for codec control protocol
   feedback messages are similar to those described in [RFC4585]. The
   offerer MAY indicate the capability to support selected codec
   commands and indications. The answerer MUST remove all ccm
   parameters which it does not understand or does not wish to use in
   this particular media session. The answerer MUST NOT add new ccm
   parameters in addition to what has been offered. The answer is
   binding for the media session and both offerer and answerer MUST
   only use feedback messages negotiated in this way.

7.3. Examples

   Example 1: The following SDP describes a point-to-point video call
   with H.263, with the originator of the call declaring its capability
   to support the codec control messages fir and tstr. The SDP is
   carried in a high level signalling protocol like SIP.

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Point-to-Point call
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm tstr
      a=rtcp-fb:98 ccm fir

   In the above example, when the sender receives a TSTR message from
   the remote party, it can adjust the trade-off and indicate the
   result in the RTCP TSTA feedback message.

   Example 2: The following SDP describes a SIP end point joining a
   video MCU that is hosting a multiparty video conferencing session.
   The participant supports only the FIR (Full Intra Request) codec
   control command and it declares it in its session description. The
   video MCU can send a FIR RTCP feedback message to this end point
   when it needs to send this participant's video to other participants
   of the conference.

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Multiparty Video Call
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm fir

   When the video MCU decides to route the video of this participant it
   sends an RTCP FIR feedback message. Upon receiving this feedback
   message the end point is mandated to generate a decoder refresh
   point.

   Example 3: The following example describes the Offer/Answer
   implications for the codec control messages. The Offerer wishes to
   support "tstr", "fir" and "tmmbr" messages. The offered SDP is

   -------------> Offer

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Offer/Answer
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm tstr
      a=rtcp-fb:98 ccm fir
      a=rtcp-fb:98 ccm tmmbr

   The answerer only wishes to support FIR and TSTR messages as the
   codec control messages and the answerer SDP is

   <---------------- Answer

      v=0
      o=alice 3203093520 3203093524 IN IP4 otherhost.example.com
      s=Offer/Answer
      c=IN IP4 189.13.1.37
      m=audio 47190 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 53273 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm tstr
      a=rtcp-fb:98 ccm fir

   Example 4: The following example describes the Offer/Answer
   implications for H.271 Video back channel messages (VBCM).
   The Offerer wishes to support VBCM and the sub-messages of
   payloadType 2 (a set of blocks of one picture that is entirely or
   partially lost), 3 (CRC for one parameter set) and 4 (CRC for all
   parameter sets of a certain type).

   -------------> Offer

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Offer/Answer
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm vbcm 2 3 4

   The answerer wishes to support sub-messages 3 and 4 only

   <---------------- Answer

      v=0
      o=alice 3203093520 3203093524 IN IP4 otherhost.example.com
      s=Offer/Answer
      c=IN IP4 189.13.1.37
      m=audio 47190 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 53273 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm vbcm 3 4

   So in the above example only VBCM indications of "payloadType" 3
   and 4 will be supported.

8. IANA Considerations

   The new value of ccm for the rtcp-fb attribute needs to be
   registered with IANA.

      Value name:  ccm
      Long Name:   Codec Control Commands and Indications
      Reference:   RFC XXXX

   For use with "ccm" the following values also need to be registered.

      Value name:  fir
      Long name:   Full Intra Request Command
      Usable with: ccm
      Reference:   RFC XXXX

      Value name:  tmmbr
      Long name:   Temporary Maximum Media Bit-rate
      Usable with: ccm
      Reference:   RFC XXXX

      Value name:  tstr
      Long name:   Temporal Spatial Trade Off
      Usable with: ccm
      Reference:   RFC XXXX

      Value name:  vbcm
      Long name:   H.271 video back channel messages
      Usable with: ccm
      Reference:   RFC XXXX

9. Acknowledgements

   The authors would like to thank Andrea Basso, Orit Levin, and
   Nermeen Ismail for their work on the requirement and discussion
   draft [Basso].
1997     Funding for the RFC Editor function is currently provided by the
1998     Internet Society.

2000  10. References

2002  10.1. Normative references

2004     [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
2005               "Extended RTP Profile for Real-Time Transport Control
2006               Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
2007               July 2006.
2008     [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
2009               Requirement Levels", BCP 14, RFC 2119, March 1997.
2010     [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
2011               Jacobson, "RTP: A Transport Protocol for Real-Time
2012               Applications", STD 64, RFC 3550, July 2003.
2013     [RFC2327] Handley, M. and V. Jacobson, "SDP: Session Description
2014               Protocol", RFC 2327, April 1998.
2015     [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
2016               with Session Description Protocol (SDP)", RFC 3264, June
2017               2002.
2018     [Topologies] Westerlund, M. and S. Wenger, "RTP Topologies", draft-
2019               ietf-avt-topologies-00, work in progress, August 2006.

2021  10.2. Informative references

2023     [Basso]   Basso, A., et al., "Requirements for transport of video
2024               control commands", draft-basso-avt-videoconreq-02.txt,
2025               expired Internet Draft, October 2004.
2026     [AVC]     Joint Video Team of ITU-T and ISO/IEC JTC 1, Draft ITU-T
2027               Recommendation and Final Draft International Standard of
2028               Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC
2029               14496-10 AVC), Joint Video Team (JVT) of ISO/IEC MPEG and
2030               ITU-T VCEG, JVT-G050, March 2003.
2031     [NEWPRED] Fukunaga, S., Nakai, T., and H. Inoue, "Error Resilient
2032               Video Coding by Dynamic Replacing of Reference Pictures",
2033               in Proc. Globecom'96, vol. 3, pp. 1503-1508, 1996.
2034     [SRTP]    Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
2035               Norrman, "The Secure Real-time Transport Protocol
2036               (SRTP)", RFC 3711, March 2004.
2037     [RFC2032] Turletti, T. and C. Huitema, "RTP Payload Format for
2038               H.261 Video Streams", RFC 2032, October 1996.
2039     [SAVPF]   Ott, J. and E. Carrara, "Extended Secure RTP Profile for
2040               RTCP-based Feedback (RTP/SAVPF)", draft-ietf-avt-profile-
2041               savpf-02.txt, July 2005.
2042     [RFC3525] Groves, C., Pantaleo, M., Anderson, T., and T. Taylor,
2043               "Gateway Control Protocol Version 1", RFC 3525, June
2044               2003.
2045     [VBCM]    ITU-T Rec. H.271, "Video Back Channel Messages", June
2046               2006.

2048  11. Authors' Addresses

2050     Stephan Wenger
2051     Nokia Corporation
2052     P.O. Box 100
2053     FIN-33721 Tampere
2054     FINLAND

2056     Phone: +358-50-486-0637
2057     EMail: stewe@stewe.org

2059     Umesh Chandra
2060     Nokia Research Center
2061     975, Page Mill Road,
2062     Palo Alto, CA 94304
2063     USA

2065     Phone: +1-650-796-7502
2066     EMail: Umesh.Chandra@nokia.com

2068     Magnus Westerlund
2069     Ericsson Research
2070     Ericsson AB
2071     SE-164 80 Stockholm, SWEDEN

2073     Phone: +46 8 7190000
2074     EMail: magnus.westerlund@ericsson.com

2076     Bo Burman
2077     Ericsson Research
2078     Ericsson AB
2079     SE-164 80 Stockholm, SWEDEN

2081     Phone: +46 8 7190000
2082     EMail: bo.burman@ericsson.com

2084  12. List of Changes relative to previous drafts

2086     The following changes since draft-wenger-avt-avpf-ccm-01 have been
2087     made:

2089     - The topologies have been rewritten and clarified.
2090     - The TMMBR mechanism has been completely revised to use notification
2091       and suppress messages in deployments with large common SSRC spaces.
2093     The following changes since draft-wenger-avt-avpf-ccm-02 have been
2094     made:

2096     - Update of section 4.2.2.1 (TMMBN) as per discussions between
2097       Harikishan Desineni and Magnus Westerlund on the AVT list around
2098       Feb 21, 2006
2099     - Section 2.3.4 clarified as per email exchange between Colin Perkins
2100       and Magnus Westerlund around Feb 24
2101     - Section 3.5.2 and other occurrences throughout the draft,
2102       Temporal/Spatial Acknowledgement renamed to Temporal/Spatial
2103       Announcement

2105     Changes relative to draft-wenger-avt-avpf-ccm-03:

2107     - Moved "topologies" out to another draft
2108     - Editorial improvements
2109     - Added new code point VBCM for H.271 Video back channel messages.
2110       Sections 3, 4 and 7 were modified in response to H.271 introduction.
2111     - Removed Basso use case referring to forward Freeze command, added
2112       justification.

2114  Full Copyright Statement

2116     Copyright (C) The Internet Society (2006).

2118     This document is subject to the rights, licenses and restrictions
2119     contained in BCP 78, and except as set forth therein, the authors
2120     retain all their rights.

2122     This document and the information contained herein are provided on an
2123     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2124     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2125     ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2126     INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2127     INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2128     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
2130  Intellectual Property Statement

2132     The IETF takes no position regarding the validity or scope of any
2133     Intellectual Property Rights or other rights that might be claimed to
2134     pertain to the implementation or use of the technology described in
2135     this document or the extent to which any license under such rights
2136     might or might not be available; nor does it represent that it has
2137     made any independent effort to identify any such rights. Information
2138     on the procedures with respect to rights in RFC documents can be
2139     found in BCP 78 and BCP 79.

2141     Copies of IPR disclosures made to the IETF Secretariat and any
2142     assurances of licenses to be made available, or the result of an
2143     attempt made to obtain a general license or permission for the use of
2144     such proprietary rights by implementers or users of this
2145     specification can be obtained from the IETF on-line IPR repository at
2146     http://www.ietf.org/ipr.

2148     The IETF invites any interested party to bring to its attention any
2149     copyrights, patents or patent applications, or other proprietary
2150     rights that may cover technology that may be required to implement
2151     this standard. Please address the information to the IETF at
2152     ietf-ipr@ietf.org.

2154  RFC Editor Considerations

2156     The RFC editor is requested to replace all occurrences of XXXX with
2157     the RFC number this document receives.