Network Working Group                                     Stephan Wenger
INTERNET-DRAFT                                             Umesh Chandra
Expires: May 2007                                                  Nokia
                                                       Magnus Westerlund
                                                               Bo Burman
                                                                Ericsson
                                                           March 5, 2007

                     Codec Control Messages in the
             RTP Audio-Visual Profile with Feedback (AVPF)
                    <draft-ietf-avt-avpf-ccm-04.txt>

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document specifies a few extensions to the messages defined in
   the Audio-Visual Profile with Feedback (AVPF). They are helpful
   primarily in conversational multimedia scenarios where centralized
   multipoint functionalities are in use. However, some are also usable
   in smaller multicast environments and point-to-point calls. The
   extensions discussed are messages related to the ITU-T H.271 Video
   Back Channel, Full Intra Request, Temporary Maximum Media Stream
   Bit-rate and Temporal Spatial Trade-off.

TABLE OF CONTENTS

   1.
   Introduction
   2. Definitions
      2.1. Glossary
      2.2. Terminology
      2.3. Topologies
   3. Motivation (Informative)
      3.1. Use Cases
      3.2. Using the Media Path
      3.3. Using AVPF
         3.3.1. Reliability
      3.4. Multicast
      3.5. Feedback Messages
         3.5.1. Full Intra Request Command
            3.5.1.1. Reliability
         3.5.2. Temporal Spatial Trade-off Request and Announcement
            3.5.2.1. Point-to-point
            3.5.2.2. Point-to-Multipoint using Multicast or Translators
            3.5.2.3. Point-to-Multipoint using RTP Mixer
            3.5.2.4. Reliability
         3.5.3. H.271 Video Back Channel Message conforming to ITU-T
                Rec. H.271
            3.5.3.1. Reliability
         3.5.4. Temporary Maximum Media Bit-rate Request
            3.5.4.1. MCU based Multi-point operation
            3.5.4.2. Point-to-Multipoint using Multicast or Translators
            3.5.4.3. Point-to-point operation
            3.5.4.4. Reliability
   4. RTCP Receiver Report Extensions
      4.1. Design Principles of the Extension Mechanism
      4.2.
      Transport Layer Feedback Messages
         4.2.1. Temporary Maximum Media Bit-rate Request (TMMBR)
            4.2.1.1. Semantics
            4.2.1.2. Message Format
            4.2.1.3. Timing Rules
         4.2.2. Temporary Maximum Media Bit-rate Notification (TMMBN)
            4.2.2.1. Semantics
            4.2.2.2. Message Format
            4.2.2.3. Timing Rules
      4.3. Payload Specific Feedback Messages
         4.3.1. Full Intra Request (FIR) command
            4.3.1.1. Semantics
            4.3.1.2. Message Format
            4.3.1.3. Timing Rules
            4.3.1.4. Remarks
         4.3.2. Temporal-Spatial Trade-off Request (TSTR)
            4.3.2.1. Semantics
            4.3.2.2. Message Format
            4.3.2.3. Timing Rules
            4.3.2.4. Remarks
         4.3.3. Temporal-Spatial Trade-off Announcement (TSTA)
            4.3.3.1. Semantics
            4.3.3.2. Message Format
            4.3.3.3. Timing Rules
            4.3.3.4. Remarks
         4.3.4. H.271 VideoBackChannelMessage (VBCM)
   5. Congestion Control
   6. Security Considerations
   7. SDP Definitions
      7.1. Extension of rtcp-fb attribute
      7.2.
      Offer-Answer
      7.3. Examples
   8. IANA Considerations
   9. Acknowledgements
   10. References
      10.1. Normative references
      10.2. Informative references
   11. Authors' Addresses
   12. List of Changes relative to previous drafts

1. Introduction

   When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was
   developed, the main emphasis lay in the efficient support of
   point-to-point and small multipoint scenarios without centralized
   multipoint control. However, in practice, many small multipoint
   conferences operate utilizing devices known as Multipoint Control
   Units (MCUs). Long-standing experience of the conversational video
   conferencing industry suggests that there is a need for a few
   additional feedback messages to efficiently support centralized
   multipoint conferencing. Some of the messages have applications
   beyond centralized multipoint, and this is indicated in the
   description of the message. This is especially true for the message
   intended to carry ITU-T Rec. H.271 [H.271] bitstrings for Video Back
   Channel messages.

   In RTP [RFC3550] terminology, MCUs comprise mixers and translators.
   Most MCUs also include signaling support. During the development of
   this memo, it was noticed that there is considerable confusion in the
   community related to the use of terms such as mixer, translator, and
   MCU. In response to these concerns, a number of topologies have been
   identified that are of practical relevance to the industry, but not
   documented in sufficient detail in RTP.
   These topologies are documented in [Topologies], and understanding
   this memo requires previous or parallel study of [Topologies].

   Some of the messages defined here are forward only, in that they do
   not require an explicit notification to the message emitter
   indicating their reception and/or the message receiver's actions.
   Other messages require notification, leading to a two-way
   communication model that some may consider useful for control
   purposes. It is not the intention of this memo to open up RTCP to a
   generalized control protocol. All mentioned messages have
   relatively strict real-time constraints -- in the sense that their
   value diminishes with increased delay. This makes the use of more
   traditional control protocol means, such as SIP re-invites [RFC3261],
   undesirable. Furthermore, all messages are of a very simple format
   that can be easily processed by an RTP/RTCP sender/receiver.
   Finally, all messages refer only to the RTP stream they are related
   to, and not to any other property of a communication system.

   The Full Intra Request (FIR) requires the receiver of the message
   (and sender of the stream) to immediately insert a decoder refresh
   point. In video coding, one commonly used form of a decoder refresh
   point is an IDR or Intra picture, depending on the video compression
   technology in use. Other codecs may have other forms of decoder
   refresh points. In order to fulfill congestion control constraints,
   sending a decoder refresh point may imply a significant drop in frame
   rate, as decoder refresh points are commonly much larger than regular
   predicted content. The use of this message is restricted to cases
   where no other means of decoder refresh can be employed, e.g. during
   the join-phase of a new participant in a multipoint conference.
   It is explicitly disallowed to use the FIR command for error
   resilience purposes; instead, implementers are referred to AVPF's
   [RFC4585] PLI message, which reports lost pictures and has been
   included in AVPF for precisely that purpose. The message does not
   require a reception notification, as the presence of a decoder
   refresh point can be easily derived from the media bit stream.
   Today, the FIR message appears to be useful primarily with video
   streams, but in the future it may also prove helpful in conjunction
   with other media codecs that support prediction across RTP packets.

   The Temporary Maximum Media Stream Bitrate Request (TMMBR) allows a
   media receiver to signal to the media sender the current maximum
   media stream bit-rate for a given media stream. The maximum media
   stream bit-rate is defined as a tuple. The first value is the
   bit-rate available for the packet stream at the layer reported on.
   The second value is the measured header sizes between the start of
   the header for the layer reported on and the beginning of the RTP
   payload. Once the media sender has received the TMMBR request on
   the bitrate limitation, it notifies the initiator of the request, and
   all other session participants, by sending a Temporary Maximum Media
   Stream Bitrate Notification (TMMBN). The TMMBN contains a list of
   the currently applicable restrictions to help the participants
   suppress TMMBR requests that wouldn't result in further restrictions
   for the sender. One usage scenario can be seen as limiting media
   senders in multiparty conferencing to the slowest receiver's Maximum
   Media Stream bitrate reception/handling capability.
   Such a use is helpful, for example, because the receiver's situation
   may have changed due to computational load, or because the receiver
   has just joined the conference, and considers it helpful to inform
   media sender(s) about its constraints, without waiting for
   congestion-induced bitrate reduction. Another application involves
   graceful bitrate adaptation in scenarios where the upper limit
   connection bitrate to a receiver changes, but is known in the
   interval between these dynamic changes. The TMMBR/TMMBN messages are
   useful for all media types that are not inherently of constant bit
   rate. However, TMMBR is not a congestion control mechanism and can't
   replace the need to implement one.

   The Video Back Channel Message (VBCM) allows conveying bit streams
   conforming to ITU-T Rec. H.271 [H.271], from a video receiver to
   video sender. This ITU-T Recommendation defines codepoints for a
   number of video-specific feedback messages. Examples include
   messages to signal:
   - the corruption of reference pictures or parts thereof,
   - the corruption of decoder state information, e.g. parameter sets,
   - the suggestion of using a reference picture other than the one
     typically used, e.g. to support the NEWPRED algorithm [NEWPRED].
   The ITU-T has the authority to add codepoints to H.271 every time a
   need arises, e.g. with the introduction of new video codecs or new
   tools into existing video codecs.

   There exists some overlap between VBCM messages and native messages
   specified in this memo and in AVPF. Examples include the PLI message
   of [RFC4585] and the FIR message specified herein. As a general
   rule, the native messages should be preferred over the sending of
   VBCM messages when all senders and receivers implement this memo.
   However, if gateways are in the picture, it may be more advisable to
   utilize VBCM.
   Similarly, for feedback message types that exist in H.271 but do not
   exist in this memo or AVPF, there is no other choice but using VBCM.

   Video Back Channel Messages according to H.271 do not require a
   notification on a protocol level, because the appropriate reaction of
   the video encoder and sender can be derived from the forward video
   bit stream.

   Finally, the Temporal-Spatial Trade-off Request (TSTR) enables a
   video receiver to signal to the video sender its preference for
   spatial quality or high temporal resolution (frame rate). Typically,
   the receiver of the video stream generates this signal based on input
   from its user interface, in reaction to explicit requests of the
   user. However, some implicit use forms are also known. For example,
   the trade-offs commonly used for live video and document camera
   content are different. Obviously, this indication is relevant only
   with respect to video transmission. The message is acknowledged by a
   notification message indicating the newly chosen trade-off, so as to
   allow immediate user feedback.

2. Definitions

2.1. Glossary

   AIMD  - Additive Increase Multiplicative Decrease
   ASM   - Any-Source Multicast
   AVPF  - The Extended RTP Profile for RTCP-based Feedback
   FEC   - Forward Error Correction
   FIR   - Full Intra Request
   MCU   - Multipoint Control Unit
   MPEG  - Moving Picture Experts Group
   PtM   - Point to Multipoint
   PtP   - Point to Point
   TMMBN - Temporary Maximum Media Stream Bitrate Notification
   TMMBR - Temporary Maximum Media Stream Bitrate Request
   PLI   - Picture Loss Indication
   TSTN  - Temporal Spatial Trade-off Notification
   TSTR  - Temporal Spatial Trade-off Request
   VBCM  - Video Back Channel Message indication.

2.2.
Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   Message:
        Codepoint defined by this specification, of one of the
        following types:

        Request:
             Message that requires Acknowledgement

        Command:
             Message that forces the receiver to an action

        Indication:
             Message that reports a situation

        Notification:
             See Indication.

        Note that, with the exception of "Notification", this
        terminology is in alignment with ITU-T Rec. H.245.

   Decoder Refresh Point:
        A bit string, packetised in one or more RTP packets, which
        completely resets the decoder to a known state. Typical
        examples of Decoder Refresh Points are H.261 Intra pictures
        and H.264 IDR pictures. However, there are also much more
        complex decoder refresh points, as discussed below.

        Examples of "hard" decoder refresh points are Intra pictures
        in H.261, H.263, MPEG 1, MPEG 2, and MPEG-4 part 2, and IDR
        pictures in H.264. "Gradual" decoder refresh points may also
        be used; see for example [AVC]. While both "hard" and
        "gradual" decoder refresh points are acceptable in the scope
        of this specification, in most cases the user experience will
        benefit from using a "hard" decoder refresh point.

        A decoder refresh point also contains all header information
        above the picture layer (or equivalent, depending on the
        video compression standard) that is conveyed in-band. In
        H.264, for example, a decoder refresh point contains
        parameter set NAL units that generate parameter sets
        necessary for the decoding of the following slice/data
        partition NAL units (and that are not conveyed out of band).

   Decoding:
        The operation of reconstructing the media stream.

   Rendering:
        The operation of presenting (parts of) the reconstructed
        media stream to the user.

   Stream thinning:
        The operation of removing some of the packets from a media
        stream. Stream thinning, preferably, is media-aware,
        implying that media packets are removed in the order of their
        relevance to the reproductive quality. However, even when
        employing media-aware stream thinning, most media streams
        quickly lose quality when subject to increasing levels of
        thinning. Media-unaware stream thinning leads to even worse
        quality degradation. In contrast to transcoding, stream
        thinning is typically seen as a computationally lightweight
        operation.

   Media:
        Often used (sometimes in conjunction with terms like
        bitrate, stream, sender, ...) to identify the content of the
        forward RTP packet stream carrying the codec data to which
        the codec control message applies.

   Media Stream:
        The stream of packets carrying the media (and in some
        cases also repair information such as retransmission or
        Forward Error Correction (FEC) information). We further
        include within this specification the RTP packetization and
        the usage of additional protocol headers on these packets to
        carry them from sender to receiver.

2.3. Topologies

   Please refer to [Topologies] for an in-depth discussion. The
   topologies referred to throughout this memo are labeled (consistent
   with [Topologies]) as follows:

   Topo-Point-to-Point . . . . . point-to-point communication
   Topo-Multicast  . . . . . . . multicast communication as in RFC 3550
   Topo-Translator . . . . . . . translator based as in RFC 3550
   Topo-Mixer  . . . . . . . . . mixer based as in RFC 3550
   Topo-Video-switch-MCU . . . . video switching MCU
   Topo-RTCP-terminating-MCU . . mixer but terminating RTCP

3.
Motivation (Informative)

   This section discusses the motivation and usage of the different
   video and media control messages. The video control messages have
   been under discussion for a long time, and a requirements draft was
   drawn up [Basso]. This draft has expired; however, we do quote
   relevant sections of it to provide motivation and requirements.

3.1. Use Cases

   There are a number of possible usages for the proposed feedback
   messages. Let's begin by looking through the use cases Basso et al.
   [Basso] proposed. Some of the use cases have been reformulated and
   commented:

   1. An RTP video mixer composes multiple encoded video sources into a
      single encoded video stream. Each time a video source is added,
      the RTP mixer needs to request a decoder refresh point from the
      video source, so as to start an uncorrupted prediction chain on
      the spatial area of the mixed picture occupied by the data from
      the new video source.

   2. An RTP video mixer receives multiple encoded RTP video streams
      from conference participants and dynamically selects one of the
      streams to be included in its output RTP stream. At the time of
      a bit stream change (determined through means such as voice
      activation or the user interface), the mixer requests a decoder
      refresh point from the remote source, in order to avoid using
      unrelated content as reference data for inter picture
      prediction. After requesting the decoder refresh point, the video
      mixer stops the delivery of the current RTP stream and monitors
      the RTP stream from the new source until it detects data belonging
      to the decoder refresh point. At that time, the RTP mixer starts
      forwarding the newly selected stream to the receiver(s).

   3. An application needs to signal to the remote encoder a request to
      change the desired trade-off in temporal/spatial resolution.
      For example, one user may prefer a higher frame rate and a lower
      spatial quality, and another user may prefer the opposite. This
      choice is also highly content dependent. Many current video
      conferencing systems offer in the user interface a mechanism to
      make this selection, usually in the form of a slider. The
      mechanism is helpful in point-to-point, centralized multipoint and
      non-centralized multipoint uses.

   4. Use case 4 of the Basso draft applies only to AVPF's PLI [RFC4585]
      and is not reproduced here.

   5. Use case 5 of the Basso draft relates to a mechanism known as
      "freeze picture request". Sending freeze picture requests
      over a non-reliable forward RTCP channel has been identified as
      problematic. Therefore, no freeze picture request has been
      included in this memo, and the use case discussion is not
      reproduced here.

   6. A video mixer dynamically selects one of the received video
      streams to be sent out to participants and tries to provide the
      highest bit rate possible to all participants, while minimizing
      stream transrating. One way of achieving this is to set up
      sessions with endpoints using the maximum bit rate accepted by
      that endpoint, and by the call admission method used by the mixer.
      By means of commands that allow reducing the Maximum Media Stream
      bitrate below what has been negotiated during session setup, the
      mixer can then reduce the maximum bit rate sent by endpoints to
      the lowest common denominator of all received streams. As the
      lowest common denominator changes due to endpoints joining,
      leaving, or network congestion, the mixer can adjust the limits to
      which endpoints can send their streams to match the new limit.
      The mixer then would request a new maximum bit rate, which is
      equal to or less than the maximum bit-rate negotiated at session
      setup, for a specific media stream, and the remote endpoint can
      respond with the actual bit-rate that it can support.

   The picture that Basso et al. draw up covers most applications we
   foresee. However, we would like to extend the list with two
   additional use cases:

   7. The congestion control algorithms in use (AIMD and TFRC [RFC3448])
      probe for more available capacity as long as there is something to
      send. With congestion control using packet loss as the indication
      for congestion, this probing generally results in reduced
      media quality (often to a point where the distortion is large
      enough to make the media unusable), due to packet loss and
      increased delay. In a number of deployment scenarios, especially
      cellular ones, the bottleneck link is often the last hop link.
      That cellular link also commonly has some type of QoS negotiation
      enabling the cellular device to learn the maximal bit-rate
      available over this last hop. Thus, indicating the maximum
      available bit-rate to the transmitting party can be beneficial to
      prevent it from even trying to exceed the known hard limit that
      exists. For cellular or other mobile devices the available known
      bit-rate can also quickly change due to handover to another
      transmission technology, QoS renegotiation due to congestion, etc.
      To enable minimal disruption of service, quick convergence is
      necessary, and therefore media path signaling is desirable.

   8. The use of reference picture selection (RPS) as an error
      resilience tool was introduced in 1997 as NEWPRED [NEWPRED],
      and is now widely deployed. When RPS is in use, simplistically
      put, the receiver can send a feedback message to the sender,
      indicating a reference picture that should be used for future
      prediction.
      ([NEWPRED] mentions other forms of feedback as well.) AVPF
      contains a mechanism for conveying such a message, but does not
      specify for which codec and according to which syntax the message
      is formed. Recently, the ITU-T finalized Rec. H.271 which
      (among other message types) also includes a feedback message. It
      is expected that this feedback message will enjoy wide support
      fairly quickly. Therefore, a mechanism to convey feedback
      messages according to H.271 appears to be desirable.

3.2. Using the Media Path

   There are multiple reasons why we use the media path for the codec
   control messages.

   First, systems employing MCUs often separate the control and
   media processing parts. As these messages are intended for or
   generated by the media part rather than the signaling part of the
   MCU, having them on the media path avoids interfaces and unnecessary
   control traffic between signaling and processing. If the MCU is
   physically decomposed, the use of the media path avoids the need for
   media control protocol extensions (e.g. in MEGACO [RFC3525]).

   Secondly, the signaling path quite commonly contains several
   signaling entities, e.g. SIP proxies and application servers.
   Avoiding going through signaling entities avoids delay for several
   reasons. Proxies have less stringent delay requirements than media
   processing and, due to their complex and more generic nature, may
   introduce significant processing delay. The topological locations of
   the signaling entities are also commonly not optimized for minimal
   delay, but rather towards other architectural goals. Thus the
   signaling path can be significantly longer in both the geographical
   and the delay sense.

3.3. Using AVPF

   The AVPF feedback message framework [RFC4585] provides a simple way
   of implementing the new messages.
   Furthermore, AVPF implements rules controlling the timing of
   feedback messages so as to avoid congestion through network flooding
   by RTCP traffic. We re-use these rules by referencing AVPF.

   The signaling setup for AVPF allows each individual type of function
   to be configured or negotiated on an RTP session basis.

3.3.1. Reliability

   The use of RTCP messages implies that each message transfer is
   unreliable, unless the lower layer transport provides reliability.
   The different messages proposed in this specification have different
   requirements in terms of reliability. However, in all cases, the
   reaction to an (occasional) loss of a feedback message is specified.

3.4. Multicast

   The codec control messages might be used with multicast. The RTCP
   timing rules specified in [RFC3550] and [RFC4585] ensure that the
   messages do not cause overload of the RTCP connection. The use of
   multicast may result in the reception of messages with inconsistent
   semantics. The reaction to inconsistencies depends on the message
   type, and is discussed for each message type separately.

3.5. Feedback Messages

   This section describes the semantics of the different feedback
   messages and how they apply to the different use cases.

3.5.1. Full Intra Request Command

   A Full Intra Request (FIR) Command, when received by the designated
   media sender, requires that the media sender sends a Decoder Refresh
   Point (see Section 2.2) at the earliest opportunity. The evaluation
   of such opportunity includes the current encoder coding strategy and
   the current available network resources.

   FIR is also known as an "instantaneous decoder refresh request"
   or "video fast update request".

   Using a decoder refresh point implies refraining from using any
   picture sent prior to that point as a reference for the encoding
   process of any subsequent picture sent in the stream.
   For predictive
   media types that are not video, the analogue applies.  For example,
   if in MPEG-4 systems scene updates are used, the decoder refresh
   point consists of the full representation of the scene and is not
   delta-coded relative to previous updates.

   Decoder Refresh Points, especially Intra or IDR pictures, are in
   general several times larger than predicted pictures.  Thus, in
   scenarios in which the available bit-rate is small, the use of a
   Decoder Refresh Point implies a delay that is significantly longer
   than the typical picture duration.

   Usage in multicast is possible; however, aggregation of the commands
   is recommended.  A media sender that receives a request shortly
   (within 2 times the longest known Round Trip Time (RTT)) after
   sending a Decoder Refresh Point should await a second request
   message to ensure that the media receiver has not been served by the
   previously delivered Decoder Refresh Point.  The reason for delaying
   2 times the longest known RTT is to avoid sending unnecessary
   Decoder Refresh Points.  A session participant may have sent its own
   request while another participant's request was in flight to it.
   Suppressing those requests that may have been sent without knowledge
   of the other request avoids this issue.

   Full Intra Request is applicable in use cases 1, 2, and 5.

3.5.1.1.  Reliability

   The FIR message results in the delivery of a Decoder Refresh Point,
   unless the message is lost.  Decoder Refresh Points are easily
   identifiable from the bit stream.  Therefore, there is no need for
   protocol-level notification, and a simple command repetition
   mechanism is sufficient for ensuring the level of reliability
   required.  However, the potential use of repetition does require a
   mechanism to prevent the recipient from responding to messages
   already received and responded to.
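   As an illustration only, the duplicate-suppression requirement can
   be sketched in Python for the point-to-point case.  The sketch
   assumes that each FIR carries the request sequence number mentioned
   in this section and that the media sender knows an RTT estimate.
   The class name FirResponder and its fields are hypothetical and not
   part of this memo, and multicast aggregation is deliberately left
   out.

```python
class FirResponder:
    """Media-sender side FIR handling (illustrative sketch only)."""

    def __init__(self, rtt: float) -> None:
        self.rtt = rtt                 # longest known RTT, in seconds
        self.last_refresh_time = None  # when the last refresh point was sent
        self.answered = {}             # requester id -> last answered sequence number

    def on_fir(self, requester: str, seq: int, now: float) -> bool:
        """Return True if a new Decoder Refresh Point should be sent."""
        is_repetition = self.answered.get(requester) == seq
        recently_sent = (
            self.last_refresh_time is not None
            and now - self.last_refresh_time < 2 * self.rtt
        )
        if is_repetition and recently_sent:
            # Already answered; the refresh point may still be in flight.
            return False
        # Either a new request, or a repetition arriving 2*RTT or more
        # after the last refresh point, which suggests that it was lost.
        self.answered[requester] = seq
        self.last_refresh_time = now
        return True
```

   A repetition of an already answered request is ignored while the
   refresh point may still be in flight, and triggers a new refresh
   point once 2*RTT has elapsed.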
   To ensure the best possible reliability, a sender of FIR may repeat
   the FIR request until a response has been received.  The repetition
   interval is determined by the RTCP timing rules applicable to the
   session.  Upon reception of a complete Decoder Refresh Point or the
   detection of an attempt to send a Decoder Refresh Point (which got
   damaged due to a packet loss), the repetition of the FIR must stop.
   If another FIR is necessary, the request sequence number must be
   increased.  To combat loss of the Decoder Refresh Points sent, a
   media sender that receives repetitions of the FIR 2*RTT after the
   transmission of the Decoder Refresh Point shall send a new Decoder
   Refresh Point.  Two round trip times allow time for the request to
   arrive at the media sender and the Decoder Refresh Point to arrive
   back at the requestor.  A FIR sender shall not have more than one
   FIR request (different request sequence number) outstanding at any
   time per media sender in the session.

   An RTP Mixer that receives an FIR from a media receiver is
   responsible for ensuring that a Decoder Refresh Point is delivered
   to the requesting receiver.  It may be necessary for the mixer to
   generate FIR commands.  The two legs (FIR-requesting endpoint to
   mixer, and mixer to Decoder Refresh Point generating endpoint) are
   handled independently from each other from a reliability
   perspective.

3.5.2.  Temporal Spatial Trade-off Request and Notification

   The Temporal Spatial Trade-off Request (TSTR) instructs the video
   encoder to change its trade-off between temporal and spatial
   resolution.  Index values from 0 to 31 indicate monotonically a
   desire for higher frame rate.  That is, a requester asking for an
   index of 0 prefers high quality and is willing to accept a low frame
   rate, whereas a requester asking for 31 wishes a high frame rate,
   potentially at the cost of low spatial quality.
   In general the encoder reaction time may be significantly longer
   than the typical picture duration.  See use case 3 for an example.
   The encoder decides if the request results in a change of the
   trade-off.  The Temporal Spatial Trade-Off Notification message
   (TSTN) has been defined to provide feedback of the trade-off that is
   used henceforth.

      Informative note: TSTR and TSTN have been introduced primarily
      because it is believed that control protocol mechanisms, e.g. a
      SIP re-invite, are too heavyweight and too slow to allow for a
      reasonable user experience.  Consider, for example, a user
      interface where the remote user selects the temporal/spatial
      trade-off with a slider (as is common in state-of-the-art video
      conferencing systems).  Immediate feedback to any slider movement
      is required for a reasonable user experience.  A SIP re-invite
      [RFC3261] would require at least 2 round-trips more (compared to
      the TSTR/TSTN mechanism) and may involve proxies and other
      complex mechanisms.  Even in a well-designed system, it may take
      a second or so until the new trade-off is finally selected.
      Furthermore, the use of RTCP solves the multicast use case very
      efficiently.

   The use of TSTR and TSTN in multipoint scenarios is a non-trivial
   subject, and can be solved in many implementation-specific ways.
   Problems stem from the fact that TSTRs will typically arrive
   unsynchronized, and may request different trade-off values for the
   same stream and/or endpoint encoder.  This memo does not specify a
   translator's, mixer's, or endpoint's reaction to the reception of a
   suggested trade-off as conveyed in the TSTR -- we only require the
   receiver of a TSTR message to reply to it by sending a TSTN,
   carrying the new trade-off chosen by its own criteria (which may or
   may not be based on the trade-off conveyed by TSTR).
   In other words, the trade-off
   sent in TSTR is a non-binding recommendation; nothing more.

   With respect to TSTR/TSTN, four scenarios based on the topologies
   described in [Topologies] need to be distinguished.  The scenarios
   are described in the following sub-clauses.

3.5.2.1.  Point-to-point

   In this most trivial case (Topo-Point-to-Point), the media sender
   typically adjusts its temporal/spatial trade-off based on the
   requested value in TSTR, and within its capabilities.  The TSTN
   message conveys back the new trade-off value (which may be identical
   to the old one if, for example, the sender is not capable of
   adjusting its trade-off).

3.5.2.2.  Point-to-Multipoint using Multicast or Translators

   RTCP Multicast is used either with media multicast according to
   Topo-Multicast, or following RFC 3550's translator model according
   to Topo-Translator.  In these cases, TSTR messages from different
   receivers may be received unsynchronized, and possibly with
   different requested trade-offs (because of different user
   preferences).  This memo does not specify how the media sender tunes
   its trade-off.  Possible strategies include selecting the mean, or
   median, of all trade-off requests received, prioritizing certain
   participants, or continuing to use the previously selected trade-off
   (e.g. when the sender is not capable of adjusting it).  Again, all
   TSTR messages need to be acknowledged by TSTN, and the value
   conveyed back has to reflect the decision made.

3.5.2.3.  Point-to-Multipoint using RTP Mixer

   In this scenario (Topo-Mixer) the RTP Mixer receives all TSTR
   messages, and has the opportunity to act on them based on its own
   criteria.  In most cases, the Mixer should form a "consensus" of
   potentially conflicting TSTR messages arriving from different
   participants, and initiate its own TSTR message(s) to the media
   sender(s).
   The strategy of forming this "consensus" is open for
   the implementation, and can, for example, encompass averaging the
   participants' request values, prioritizing certain participants, or
   using session default values.  If the Mixer changes its trade-off,
   it needs to request from the media sender(s) the use of the new
   value, by creating a TSTR of its own.  Upon reaching a decision on
   the trade-off to be used, it includes that value in the
   acknowledgement.

   Even if a Mixer or Translator performs transcoding, it is very
   difficult to deliver media with the requested trade-off, unless the
   content the Mixer or Translator receives is already close to that
   trade-off.  Only in cases where the original source has
   substantially higher quality (and bit-rate) is it likely that
   transcoding can result in the requested trade-off.

3.5.2.4.  Reliability

   A request and reception acknowledgement mechanism is specified.  The
   Temporal Spatial Trade-off Notification (TSTN) message informs the
   request-sender that its request has been received, and what
   trade-off is used henceforth.  This acknowledgment mechanism is
   desirable for at least the following reasons:

   o  A change in the trade-off cannot be directly identified from the
      media bit stream,
   o  User feedback cannot be implemented without information of the
      chosen trade-off value, according to the media sender's
      constraints,
   o  Repetitive sending of messages requesting an unimplementable
      trade-off can be avoided.

3.5.3.  H.271 Video Back Channel Message

   ITU-T Rec. H.271 defines syntax, semantics, and suggested encoder
   reaction to a video back channel message.  The codepoint defined in
   this memo is used to transparently convey such a message from media
   receiver to media sender.  In this memo, we refrain from an in-depth
   discussion of the available codepoints within H.271 and refer to the
   specification text instead [H.271].
   However, we note that some H.271 messages bear similarities with
   native messages of AVPF and this memo.  Furthermore, we note that
   some H.271 messages are known to require caution in multicast
   environments -- or are plainly not usable in multicast or multipoint
   scenarios.  Table 1 provides a brief, oversimplifying overview of
   the messages currently defined in H.271, their similar AVPF or CCM
   messages (the latter as specified in this memo), and an indication
   of our current knowledge of their multicast safety.

   H.271 msg type       AVPF/CCM msg type   multicast-safe
   ---------------------------------------------------------------------
   0 (when used for
     reference picture
     selection)         AVPF RPSI           No (positive ACK of pictures)
   1                    AVPF PLI            Yes
   2                    AVPF SLI            Yes
   3                    N/A                 Yes (no required sender action)
   4                    N/A                 Yes (no required sender action)

   Table 1: H.271 messages and their AVPF/CCM equivalents

      Note: H.271 message type 0 is not a strict equivalent to AVPF's
      RPSI; it is an indication of known-as-correct reference
      picture(s) at the decoder.  It does not command an encoder to use
      a defined reference picture (the form of control information
      envisioned to be carried in RPSI).  However, it is believed and
      intended that H.271 message type 0 will be used for the same
      purpose as AVPF's RPSI -- although other use forms are also
      possible.

   In response to the opaqueness of the H.271 messages, especially with
   respect to multicast safety, the following guidelines MUST be
   followed when an implementation wishes to employ the H.271 video
   back channel message:

   1. Implementations utilizing the H.271 feedback message MUST stay in
      compliance with congestion control principles, as outlined in
      section 5.
   2. An implementation SHOULD utilize the native messages as defined
      in [RFC4585] and in this memo instead of similar messages defined
      in [H.271].
      Our current understanding of similar messages is
      documented in Table 1 above.  One good reason to deviate from the
      SHOULD statement above would be if it is clearly understood that,
      for a given application and video compression standard, the
      aforementioned "similarity" is not given, in contrast to what the
      table indicates.
   3. It has been observed that some of the H.271 codepoints currently
      in existence are not multicast-safe.  Therefore, the sensible
      thing to do is not to use the H.271 feedback message type in
      multicast environments.  It MAY be used only when all the issues
      mentioned later are fully understood by the implementer, and
      properly taken into account by all endpoints.  In all other
      cases, the H.271 message type MUST NOT be used in conjunction
      with multicast.
   4. It has been observed that even in centralized multipoint
      environments, where the mixer should theoretically be able to
      resolve issues as documented below, the implementation of such a
      mixer and cooperative endpoints is a very difficult and tedious
      task.  Therefore, H.271 messages MUST NOT be used in centralized
      multipoint scenarios, unless all the issues mentioned below are
      fully understood by the implementer, and properly taken into
      account by both mixer and endpoints.

   Issues to be taken into account when considering the use of H.271 in
   multipoint environments:

   1. Different state on different receivers.  In many environments it
      cannot be guaranteed that the decoder state of all media
      receivers is identical at any given point in time.  The most
      obvious reason for such a possible misalignment of state is a
      loss that occurs on the link to only one of many media receivers.
      However, there are other not so obvious reasons, such as recent
      joins to the multipoint conference (be it by joining the
      multicast group or through additional mixer output).
      Different states can lead the
      media receivers to issue potentially contradicting H.271 messages
      (or one media receiver issuing an H.271 message that, when
      observed by the media sender, is not helpful for the other media
      receivers).  A naive reaction of the media sender to these
      contradicting messages can lead to unpredictable and annoying
      results.
   2. Combining messages from different media receivers in a media
      sender is a non-trivial task.  As reasons, we note that these
      messages may be contradicting each other, and that their
      transport is unreliable (there may well be other reasons).  In
      case of many H.271 messages (i.e. types 0, 2, 3, and 4), the
      algorithm for combining must be aware both of the
      network/protocol environment (i.e. with respect to congestion)
      and of the media codec employed, as H.271 messages of a given
      type can have different semantics for different media codecs.
   3. The suppression of requests may need to go beyond the basic
      mechanisms described in AVPF (which are driven exclusively by
      timing and transport considerations on the protocol level).  For
      example, a receiver is often required to refrain from (or delay)
      generating requests, based on information it receives from the
      media stream.  For instance, it makes no sense for a receiver to
      issue a FIR when a transmission of an Intra/IDR picture is
      ongoing.
   4. When using the non-multicast-safe messages (e.g. H.271 type 0
      positive ACK of received pictures/slices) in larger multicast
      groups, the media receiver will likely be forced to delay or even
      omit sending these messages.  To the media sender this looks as
      if data has not been properly received (although it was received
      properly), and a naively implemented media sender reacts to these
      perceived problems where it shouldn't.

3.5.3.1.  Reliability

   H.271 Video Back Channel messages do not require reliable
   transmission, and the reception of a message can be derived from the
   forward video bit stream.  Therefore, no specific reception
   acknowledgement is specified.

   With respect to re-sending rules, clause 3.5.1.1 applies.

3.5.4.  Temporary Maximum Media Stream Bit-rate Request and
        Notification

   A receiver, translator or mixer uses the Temporary Maximum Media
   Stream Bit-rate Request (TMMBR, "timber") to request a sender to
   limit the maximum bit-rate for a media stream to, or below, the
   provided value.  The Temporary Maximum Media Stream Bit-rate
   Notification (TMMBN) advises the media receiver(s) of the changed
   bitrate that the sender is not going to exceed henceforth.  The
   primary usage for this is a scenario with an MCU or Mixer (use case
   6), corresponding to Topo-Translator or Topo-Mixer, but also
   Topo-Point-to-Point.

   The temporary limitation on the media stream is expressed as a
   tuple.  One value limits the bit-rate at the layer down to which the
   overhead is calculated.  A second value provides the per-packet
   header overhead between the layer for which the bit-rate is reported
   and the start of the RTP payload.  By having both values the media
   stream sender can determine the effect of changing the packet rate
   for the media stream in an environment which contains translators or
   mixers that affect the amount of per-packet overhead.  For example,
   a gateway that converts between IPv4 and IPv6 would commonly change
   the per-packet overhead by 20 bytes.  There also exist other
   mechanisms, like tunnels, that change the amount of headers that are
   present at a particular bottleneck of which the TMMBR sending entity
   has knowledge.  The problem with varying overhead is also discussed
   in [RFC3890].
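   The effect of the overhead value can be illustrated with a small
   Python sketch.  It assumes the relation used in section 3.5.4.2,
   namely that the bit-rate left for RTP payloads is the reported
   bit-rate minus the packet rate times the per-packet overhead in
   bits.  The function name payload_bitrate, the 64 kbps limit and the
   packet rate of 50 are illustrative choices; the overheads of 40 and
   60 bytes correspond to IPv4/UDP/RTP and IPv6/UDP/RTP headers
   without options or extensions.

```python
def payload_bitrate(tmmbr_br_bps, overhead_bytes, packet_rate):
    """Bit-rate left for RTP payloads under one TMMBR tuple.

    tmmbr_br_bps   -- bit-rate limit of the tuple, in bits per second
    overhead_bytes -- per-packet overhead of the tuple, in bytes
    packet_rate    -- packets per second the sender intends to use
    """
    return tmmbr_br_bps - packet_rate * overhead_bytes * 8


# Same 64 kbps limit seen on either side of an IPv4/IPv6 gateway.
ipv4_payload = payload_bitrate(64000, 40, 50)   # 20 + 8 + 12 header bytes
ipv6_payload = payload_bitrate(64000, 60, 50)   # 40 + 8 + 12 header bytes
print(ipv4_payload, ipv6_payload)               # prints "48000 40000"
```

   In this example the 20 additional header bytes cost 8000 bps of
   payload bit-rate at 50 packets per second; halving the packet rate
   would halve that cost.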
   The above way of measuring allows one to provide bit-rate and
   overhead values for different protocol layers, for example at the IP
   level, the outer part of a tunnel protocol, or the link layer.  The
   level a peer reports on depends fully on the level of integration
   the peer has, as it needs to be able to extract the information from
   that level.  It is expected that peers will be able to report values
   at least for the IP layer, but in certain implementations link layer
   information may be available to allow for more precise information.

   The temporary maximum media stream bit-rate messages are generic
   messages that can be applied to any RTP packet stream.  This sets
   them somewhat apart from the other codec control messages defined in
   this specification, which apply only to specific media types or
   payload formats.  The TMMBR functionality applies to the transport
   and the requirements it places on the media encoding.

   The reasoning below assumes that the participants have negotiated a
   session maximum bit-rate, using a signaling protocol.  This value
   can be global, for example in case of point-to-point, multicast, or
   translators.  It may also be local between the participant and the
   peer or mixer.  In both cases, the bit-rate negotiated in signaling
   is the one that the participant guarantees to be able to handle
   (encode and decode).  In practice, the connectivity of the
   participant also has an influence on the negotiated value -- it does
   not necessarily make much sense to negotiate a media bit rate that
   one's network interface does not support.

   It is also beneficial to have negotiated a maximum packet rate for
   the session or sender.  RFC 3890 provides such an SDP [RFC4566]
   attribute; however, it is not usable in RTP sessions established
   using offer/answer [RFC3264].  Therefore a max packet rate signaling
   parameter is specified.
   An already established temporary limit may be changed at any time
   (subject to the timing rules of the feedback message sending), and
   to any values between zero and the session maximum, as negotiated
   during session establishment signaling.  Even if a sender has
   received a TMMBR message allowing an increase in the bit-rate, all
   increases must be governed by a congestion control mechanism.  TMMBR
   only indicates known limitations, usually in the local environment,
   and does not provide any guarantees about the full path.

   If it is likely that the new value indicated by TMMBR will be valid
   for the remainder of the session, the TMMBR sender can perform a
   renegotiation of the session upper limit using the session signaling
   protocol.

3.5.4.1.  Behavior for media receivers using TMMBR

   In multiparty scenarios, different receivers likely have different
   limits for receiving bitrate.  Therefore, an algorithm to identify
   the most restrictive TMMBR requests is specified in section 4.2.2.1.
   The general behavior is explained in this section, and the gist of
   the algorithm to determine the most restrictive values is explained
   informally in the next section.

   Immediately after session setup, the bitrate limit is set to the
   session limit as established by the session setup signaling (or
   equivalent).  The overhead value is set to 0.  When the session
   setup signaling does not specify a limit, then unlimited bitrate is
   assumed.  Note that many codecs specify their own limits, e.g.
   through H.264's level concept.

   At any given time, a media receiver can send a TMMBR with a limit
   that is lower than the current limit.  The media receiver uses the
   algorithm outlined in Section 3.5.4.2 below to determine if its
   limit is stricter than already existing ones.
   The media sender, upon
   receiving the TMMBR request, will also exercise the algorithm to
   determine the set of most restrictive limitations and then send a
   TMMBN containing that set.  Once the media sender has sent the TMMBN
   message, the receivers indicated in that message become "owners" of
   the limitations.  Most likely, the owner is the original sender of
   the TMMBR -- for the handling of corner-cases (i.e. concurrent
   TMMBRs from different receivers, lost TMMBRs and sender side
   optimisations) please see the formal specification.  "Owners" and
   limits are usually known session wide, as both TMMBR and TMMBN are
   forwarded to all in the session unless a Mixer or Translator
   separates the session from an RTCP handling point of view.

   Only an "owner" is allowed to raise the bitrate limit to a value
   higher than the session has been notified of, but not higher than
   the session limit negotiated by the session setup signaling (see
   above).  An "owner" does not need to take into account TMMBR
   messages sent by anyone else (although that may well be a desirable
   optimization).  If an "owner" sets a new session limit that is too
   high for someone else's liking, other media receivers can react to
   the situation by emitting their own TMMBR message (and, in the
   process, become an "owner").  Limitations belonging to "owners"
   timing out from the session are removed by the media sender, who
   notifies the session about the event by sending a TMMBN.

   Obviously, when there is only one media receiver, this receiver
   becomes "owner" once it receives the first TMMBN in response to its
   own TMMBR, and stays "owner" for the rest of the session.
   Therefore, when it is known that there will always be only a single
   media receiver, the above algorithm is not required.
   Media receivers
   that are aware they are the only ones in a session can send TMMBR
   messages with bitrate limits both higher and lower than the
   previously notified limit at any time (subject to AVPF's RTCP RR
   send timing rules).  However, it may be difficult for a session
   participant to determine if it is the only receiver in the session.
   For this reason, anyone implementing TMMBR is required to implement
   this algorithm.

3.5.4.2.  Algorithm for establishing current limitations

   First it is important to consider the implications of using a tuple
   for limiting the media sender's behavior.  The bit-rate and the
   overhead value result in a 2-dimensional solution space for possible
   media streams.  Fortunately the two variables are linked.  The
   bit-rate available for RTP payloads will be equal to the TMMBR
   reported bit-rate minus the packet rate used times the TMMBR
   reported overhead.  As a result, in a session in which two different
   participants have set limitations, the packet rate used determines
   which of the two applies.

   Example:

      Receiver A: TMMBR_BR = 35 kbps, TMMBR_OH = 40
      Receiver B: TMMBR_BR = 40 kbps, TMMBR_OH = 60

   For a given packet rate (PR) the bit-rate available for media
   payloads in RTP will be:

      Max_media_BR_A = TMMBR_BR_A - PR * TMMBR_OH_A * 8
      Max_media_BR_B = TMMBR_BR_B - PR * TMMBR_OH_B * 8

   For a PR = 20 these calculations yield Max_media_BR_A = 28600 bps
   and Max_media_BR_B = 30400 bps, which shows that receiver A is the
   limiting one for this packet rate.  However, there is a PR at which
   the difference in bit-rate restriction equals the difference in
   packet overheads.
   This can be found by setting
   Max_media_BR_A equal to Max_media_BR_B and solving for PR:

             TMMBR_BR_A - TMMBR_BR_B
      PR = ---------------------------
           8*(TMMBR_OH_A - TMMBR_OH_B)

   which, for the numbers above, yields 31.25 as the intersection point
   between the two limits.  The implications of this have to be
   considered by application implementors that are going to control
   media encoding and its packetization.  As exemplified above, there
   might be multiple TMMBR limits that apply to the trade-off between
   media bit-rate and packet rate.  Which limitation applies depends on
   the packet rate considered.

   This also has implications for how the TMMBR mechanism needs to
   work.  First, there is the possibility that multiple TMMBR tuples
   are providing limitations on the media sender.  Secondly, there is a
   need for any session participant (media sender and receivers) to be
   able to determine if a given tuple will become a limitation upon the
   media sender, or if the set of already given limitations is stricter
   than the given values.  Otherwise the suppression of TMMBR requests
   would not work.

   Thus any session participant needs to be able to determine, from a
   given set X of tuples, the minimal set needed to express the
   limitations for all packet rates from 0 to the highest possible.
   The highest possible packet rate is either application limited and
   indicated through session setup signaling, or results from the given
   limitations when the available bit-rate is fully consumed by
   headers.

   First determine what the highest possible packet rate given all the
   limitations is.  If a session maximum packet rate (SMAXPR) is
   provided, then this can be used.  In addition one needs to calculate
   for each tuple in the set what its maximum is, by calculating
   bit-rate (BR) divided by overhead (OH) per packet converted to bits.
      MaxPR = SMAXPR
      For i=1 to size(X) {
         tmp_pr = X(i).BR / (8*X(i).OH);
         If (tmp_pr < MaxPR) then MaxPR = tmp_pr
      }

   For a zero packet rate the TMMBR signaled bit-rate will be the only
   limiting factor; thus the tuple with the smallest available bit-rate
   is a limitation at this point of the range and functions as a start
   value in the algorithm.

   Start by finding the element X(l) in X with the lowest bit-rate
   value, and the highest overhead if there are multiple with the same
   bit-rate.  The set Y, the minimal set of tuples that provide
   restrictions, initially contains only X(l).  Then, for each other
   tuple X(i), calculate whether an intersection exists with the
   currently selected tuple X(s) (initially s=l), and determine which
   of the tuples has this intersection at the lowest packet rate.
   Having found the lowest packet rate, compare it with the session's
   maximum packet rate.  If it is lower than that limit, this tuple
   provides a session limit and is added to Y.  Update the value of s
   to the found tuple and repeat the search for the tuple that has the
   intersection at the lowest packet rate that is still higher than the
   previous intersection.  The algorithm has finished when it cannot
   find any new tuple with an intersection at a packet rate lower than
   the session maximum.
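   The procedure described above can be sketched as runnable Python.
   This is an illustration under stated assumptions, not a normative
   definition.  The names Limit, max_packet_rate and minimal_limit_set
   are hypothetical.  The factor 8 converts the per-packet overhead
   from bytes to bits, as in the intersection formula of this section.
   Tuples with zero overhead are skipped when computing the maximum
   packet rate, since they never exhaust the bit-rate through headers.

```python
from typing import List, NamedTuple


class Limit(NamedTuple):
    br: float  # TMMBR bit-rate limit, bits per second
    oh: float  # TMMBR per-packet overhead, bytes


def max_packet_rate(limits: List[Limit], smaxpr: float) -> float:
    """Highest usable packet rate given SMAXPR and all tuples."""
    max_pr = smaxpr
    for lim in limits:
        if lim.oh > 0:  # assumption: zero overhead imposes no packet-rate bound
            max_pr = min(max_pr, lim.br / (8 * lim.oh))
    return max_pr


def minimal_limit_set(limits: List[Limit], smaxpr: float) -> List[Limit]:
    """Tuples that are the limiting one for some packet rate in [0, MaxPR)."""
    max_pr = max_packet_rate(limits, smaxpr)
    # Start tuple: lowest bit-rate, highest overhead on equal bit-rate.
    current = min(limits, key=lambda t: (t.br, -t.oh))
    result = [current]
    start_pr = 0.0
    while True:
        best_pr, best = max_pr, None
        for cand in limits:
            if cand.oh == current.oh:
                continue  # equal overhead: the two limits never intersect
            pr = (cand.br - current.br) / (8 * (cand.oh - current.oh))
            if start_pr < pr < best_pr:
                best_pr, best = pr, cand
        if best is None:
            return result  # no further intersection below MaxPR
        result.append(best)
        current, start_pr = best, best_pr
```

   With the example tuples A = Limit(35000, 40) and B = Limit(40000,
   60), both are returned when the session maximum packet rate exceeds
   the intersection at 31.25, and only A is returned when it does not.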
      // Find the element with the lowest bit-rate in X; on equal
      // bit-rate, prefer the one with the highest overhead
      l = 1;
      for (i = 2:size(X)) {
         if (X(i).BR < X(l).BR) ||
            (X(i).BR == X(l).BR && X(i).OH > X(l).OH) then
            l = i;
      }

      tuple_index = l;  // The lowest bit-rate tuple
      Y = X(l);         // Initialize Y to X(l)
      start_pr = 0;     // Start from zero packet rate
      do {
         current_low = MaxPR;          // Reset packet rate
         current_index = tuple_index;  // To allow for no intersection
         for i = each element in X {
            // Tuples with equal overhead never intersect
            if (X(i).OH == X(tuple_index).OH) then continue;
            // Packet rate at which element i intersects the current
            // tuple; the factor 8 converts overhead bytes to bits
            pr = (X(i).BR - X(tuple_index).BR) /
                 (8 * (X(i).OH - X(tuple_index).OH));
            if (pr < current_low && pr > start_pr) then {
               // Update lowest intersection packet rate
               current_low = pr;
               current_index = i;
            }
         }
         if (current_index != tuple_index) {
            // A tuple intersecting below the maximum packet rate
            Y(size(Y)+1) = X(current_index);  // Add to Y
            tuple_index = current_index;  // Update which to compare with
            start_pr = current_low;       // Update packet rate to seek from
         }
      } while (current_low < MaxPR)

   The above algorithm yields the set of applicable restrictions Y.

3.5.4.3.  Use of TMMBR in a Mixer-based Multi-point operation

   Assume a small mixer-based multiparty conference is ongoing, as
   depicted in Topo-Mixer of [Topologies].  All participants have
   negotiated a common maximum bit-rate that this session can use.  The
   conference operates over a number of unicast paths between the
   participants and the mixer.  The congestion situation on each of
   these paths can be monitored by the participant in question and by
   the mixer, utilizing, for example, RTCP Receiver Reports or the
   transport protocol, e.g. DCCP [RFC4340].  However, any given
   participant has no knowledge of the congestion situation of the
   connections to the other participants.
   Worse, without mechanisms
   similar to the ones discussed in this draft, the mixer (which is
   aware of the congestion situation on all connections it manages) has
   no standardized means to inform media senders to slow down, short of
   forging its own receiver reports (which is undesirable).  In
   principle, a mixer confronted with such a situation is obliged to
   thin or transcode streams intended for connections that detected
   congestion.

   In practice, media-aware stream thinning is unfortunately a very
   difficult and cumbersome operation and adds undesirable delay.  If
   media-unaware, it leads very quickly to unacceptable reproduced
   media quality.  Hence, means to slow down senders even in the
   absence of congestion on their connections to the mixer are
   desirable.

   To allow the mixer to perform congestion control on the individual
   links, without performing transcoding, there is a need for a
   mechanism that enables the mixer to request the participants' media
   encoders to limit the maximum media stream bit-rate they currently
   use.  The mixer handles the detection of a congestion state between
   itself and a participant as follows:

   1. Start thinning the media traffic to the supported bit-rate.
   2. Use the TMMBR to request the media sender(s) to reduce the media
      bit-rate sent by them to the mixer, to a value that is in
      compliance with congestion control principles for the slowest
      link.  Slow refers here to the available
      bandwidth/bitrate/capacity and packet rate after congestion
      control.
   3. As soon as the bit-rate has been reduced by the sending part, the
      mixer stops stream thinning implicitly, because there is no need
      for it any more as the stream is in compliance with congestion
      control.

   The above algorithm may suggest to some that there is no need for
   the TMMBR - it should be sufficient to solely rely on stream
   thinning.
   As much as this is desirable from a network protocol designer's
   viewpoint, it has the disadvantage that it doesn't work very well:
   the reproduced media quality quickly becomes unusable.

   It appears to be a reasonable compromise to rely on stream thinning
   as an immediate reaction tool to combat congestion, and have a quick
   control mechanism that instructs the original sender to reduce its
   bitrate.

   Note also that the standard RTCP receiver report cannot serve the
   purpose mentioned. In an environment with RTP mixers, the RTCP RR is
   sent only between the RTP receiver in the endpoint and the RTP
   sender in the mixer, as there is no multicast transmission. The
   stream that needs to be bitrate-reduced, however, is the one between
   the original sending endpoint and the mixer. This endpoint doesn't
   see the aforementioned RTCP RRs, and hence needs to be explicitly
   informed about desired bitrate adjustments.

   In this topology it is the mixer's responsibility to collect, and
   consider jointly, the different bit-rates which the different links
   may support, and to combine them into the bit-rate requested. This
   aggregation may also take into account that the mixer may contain
   certain transcoding capabilities (as discussed under Topo-Mixer in
   [Topologies]), which can be employed for those few of the session
   participants that have the lowest available bit-rates.

3.5.4.4.  Use of TMMBR in Point-to-Multipoint using Multicast or
          Translators

   In these topologies, corresponding to Topo-Multicast or
   Topo-Translator, RTCP RRs are transmitted globally, which allows for
   the detection of transmission problems such as congestion on a
   medium timescale. As all media senders are aware of the congestion
   situation of all media receivers, the rationale for the use of TMMBR
   given in section 3.5.4.3 does not apply.
   However, even in this case the congestion control response can be
   improved when the unicast links are employing congestion controlled
   transport protocols (such as TCP or DCCP). A peer may also report a
   local limitation to the media sender.

3.5.4.5.  Use of TMMBR in Point-to-point operation

   In use case 7 it is possible to use TMMBR to improve the performance
   at times of changes in the known upper limit of the bit-rate. In
   this use case the signaling protocol has established an upper limit
   for the session and media bit-rates. However, at the time of
   transport link bit-rate reduction, a receiver could avoid serious
   congestion by sending a TMMBR to the sending side. Thus TMMBR is
   useful for putting restrictions on the application and thus placing
   the congestion control mechanism in the right ballpark. However,
   TMMBR is usually unable to provide the continuously quick feedback
   loop required for real congestion control. Its semantics are also
   not a match for congestion control due to its different purpose. For
   these reasons TMMBR SHALL NOT be used for congestion control.

3.5.4.6.  Reliability

   The reaction of a media sender to the reception of a TMMBR message
   is not immediately identifiable through inspection of the media
   stream. Therefore, a more explicit mechanism is needed to avoid
   unnecessary re-sending of TMMBR messages. Using a statistically
   based retransmission scheme would only provide statistical
   guarantees of the request being received. It would also not avoid
   the retransmission of already received messages. In addition, it
   does not allow for easy suppression of other participants' requests.
   For the reasons mentioned, a mechanism based on explicit
   notification is used, as discussed already in section 3.5.4.1.
   Upon the reception of a request a media sender sends a notification
   containing the currently applicable limitation of the bit-rate, and
   which session participants own that limit. In multicast scenarios,
   that allows all other participants to suppress any request they may
   have with limitation values less strict than the current ones. The
   identity of the owners allows for small message sizes and media
   sender states. A media sender only keeps state for the SSRCs of the
   current owners of the limitations; all other requests and their
   sources are not saved. Only an owner is allowed to remove or change
   its limitation. Otherwise, anyone that ever set a limitation would
   need to remove it to allow the maximum bit-rate to be raised beyond
   that value.

4.  RTCP Receiver Report Extensions

   This memo specifies six new feedback messages. The Full Intra
   Request (FIR), Temporal-Spatial Trade-off Request (TSTR),
   Temporal-Spatial Trade-off Notification (TSTN), and Video Back
   Channel Message (VBCM) are "Payload Specific Feedback Messages" as
   defined in Section 6.3 of AVPF [RFC4585]. The Temporary Maximum
   Media Stream Bit-rate Request (TMMBR) and Temporary Maximum Media
   Stream Bit-rate Notification (TMMBN) are "Transport Layer Feedback
   Messages" as defined in Section 6.2 of AVPF.

   In the following subsections, the new feedback messages are defined,
   following a similar structure as in the AVPF specification's
   sections 6.2 and 6.3, respectively.

4.1.  Design Principles of the Extension Mechanism

   RTCP was originally introduced as a channel to convey presence,
   reception quality statistics and hints on the desired media coding.
   A limited set of media control mechanisms has been introduced in
   early RTP payload formats for video formats, for example in RFC 2032
   [RFC2032].
   However, this specification, for the first time, suggests a two-way
   handshake for some of its messages. There is a danger that this
   introduction could be misunderstood as a precedent for the use of
   RTCP as an RTP session control protocol. In order to prevent these
   misunderstandings, this subsection attempts to clarify the scope of
   the extensions specified in this memo, and strongly suggests that
   future extensions follow the rationale spelled out here, or
   compellingly explain why they deviate from the rationale.

   In this memo, and in AVPF [RFC4585], only such messages have been
   included which

   a) have comparatively strict real-time constraints, which prevent
      the use of mechanisms such as a SIP re-invite in most application
      scenarios. The real-time constraints are explained separately for
      each message where necessary
   b) are multicast-safe in that the reaction to potentially
      contradicting feedback messages is specified, as necessary for
      each message
   c) are directly related to activities of a certain media codec,
      class of media codecs (e.g. video codecs), or a given RTP packet
      stream.

   In this memo, a two-way handshake is only introduced for such
   messages that

   a) require a notification or acknowledgement due to their nature,
      which is motivated separately for each message
   b) have a notification or acknowledgement that cannot be easily
      derived from the media bit stream.

   All messages in AVPF [RFC4585] and in this memo implement their
   codepoints in a simple, fixed binary format. The reason behind this
   design principle is that media receivers do not always implement
   higher control protocol functionalities (SDP, XML parsers and such)
   in their media path. Therefore, simple binary representations are
   used in the feedback messages and not an (otherwise desirable)
   flexible format such as, for example, XML.

4.2.
Transport Layer Feedback Messages

   Transport Layer FB messages are identified by the value RTPFB (205)
   as RTCP packet type (see section 6.1 of RFC 4585 [RFC4585]).

   In AVPF, one message of this category had been defined. This memo
   specifies two more messages, for a total of three messages of this
   type. They are identified by means of the FMT parameter as follows:

      0:     unassigned
      1:     Generic NACK (as per AVPF)
      2:     reserved (see note below)
      3:     Temporary Maximum Media Stream Bit-rate Request (TMMBR)
      4:     Temporary Maximum Media Stream Bit-rate Notification
             (TMMBN)
      5-30:  unassigned
      31:    reserved for future expansion of the identifier number
             space

      Note: early drafts of AVPF [RFC4585] reserved FMT=2 for a
      codepoint that was later removed. It has been pointed out that
      there may be implementations in the field using this value
      according to the expired draft. As there is sufficient numbering
      space available, we mark FMT=2 as reserved so as to avoid
      possible interoperability problems with implementations that are
      non-compliant with RFC 4585 in this very point.

   The following subsection defines the formats of the FCI field for
   this type of FB message.

4.2.1.  Temporary Maximum Media Stream Bit-rate Request and
        Notification

   The FCI field of a Temporary Maximum Media Stream Bit-Rate Request
   (TMMBR) message SHALL contain one or more FCI entries.

4.2.1.1.  Semantics

   TMMBR is used to indicate the transport related limitation in the
   form of a tuple. The first value is the highest bit-rate per sender
   of a media, which the receiver currently supports in this RTP
   session, observed at a particular protocol layer. The second value
   is the measured header overhead in bytes on the packets received for
   the stream.
   The overhead is counted from the start of the header of the protocol
   layer for which the bit-rate is reported up to the start of the RTP
   payload. The measurement of the overhead is a running average that
   is updated for each packet received for this particular media source
   (SSRC). For each packet received the overhead is calculated
   (pckt_OH) and then added to the average overhead (avg_OH) by
   calculating:

      avg_OH = 15/16*avg_OH + 1/16*pckt_OH

   The bit-rate values used in this format are averaged out over a
   reasonable timescale. What a reasonable timescale is depends on the
   application. However, the goal is to be able to ignore any
   burstiness on very short timescales, for example below 100 ms,
   introduced by scheduling or link layer packetization effects.

   The media sender MAY use any combination of packet rate and RTP
   payload bit-rate to produce a lower media stream bit-rate, as it may
   need to address a congestion situation or other limiting factors.
   See section 5 (congestion control) for more discussion.

   The "SSRC of the packet sender" field indicates the source of the
   request, and the "SSRC of media source" is not used and SHALL be set
   to 0. The SSRC of media sender in the FCI field denotes the media
   sender the message applies to. This is useful in the multicast or
   translator topologies where each media sender may be addressed in a
   single TMMBR message using multiple FCIs.

   A TMMBR FCI MAY be repeated in subsequent TMMBR messages if no
   applicable Temporary Maximum Media Stream Bit-Rate Notification
   (TMMBN) FCI has been received at the time of transmission of the
   next RTCP packet. A TMMBN is applicable if it either indicates the
   sender of the TMMBR as an owner, or contains limitations that are
   stricter than the one sent in the TMMBR message.
   The bit-rate value of a TMMBR FCI MAY be changed between one TMMBR
   message and the next, regardless of the eventual reception of an
   applicable TMMBN FCI. The overhead measurement SHALL be updated to
   the current value of avg_OH.

   A TMMBN message SHALL be sent by the media sender at the earliest
   possible point in time, as a result of any TMMBR messages received
   since the last sending of TMMBN. The TMMBN message indicates the
   limits and the owners of those limits at the time of the
   transmission of the message. The limits SHALL be set to the set of
   the strictest limits among the previous limits and all limits
   received in TMMBR FCIs since the last TMMBN was transmitted.

   A media receiver considering sending a TMMBR, who is not an "owner"
   of a limitation, SHOULD request a limitation stricter than their
   knowledge of the currently established limits for this media sender,
   or suppress their transmission of the TMMBR. The exception to the
   above rule is when a receiver either doesn't know the limit or is
   certain that their local representation of the set of limitations is
   in error. All received requests for limits equally or less strict
   compared to the ones currently established MUST be ignored, with the
   exception that they result in the transmission of a TMMBN containing
   the current set of limitations. A media receiver who is the owner of
   a current limitation MAY lower the value further, raise the value or
   remove the restriction completely by setting the bit-rate part of
   the limit equal to the session bit-rate limit.

   Whether a limitation tuple LT is stricter than the current set of
   limitations can be determined by checking whether LT is part of the
   set Y produced by the algorithm described in Section 3.5.4.2.
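
   The membership test described in the last paragraph can be sketched
   in Python as shown below. This is a minimal, non-normative sketch of
   the intersection-walking loop of Section 3.5.4.2 and does not replace
   it. The function names bounding_set and is_stricter, the
   (bit_rate, overhead) tuple representation, the tie-break on equal
   bit-rates, and the skipping of tuples with equal overhead (to avoid a
   division by zero) are assumptions made for illustration. Units follow
   the pseudocode; no byte-to-bit conversion is applied to the overhead.

```python
def bounding_set(limits, max_pr):
    """Return the limits that bound the bit-rate for some packet rate
    in the range 0..max_pr. Each limit is a (bit_rate, overhead) tuple.
    """
    # Start with the lowest bit-rate; on a tie prefer the higher overhead
    # (assumption made for this sketch).
    current = min(limits, key=lambda t: (t[0], -t[1]))
    result = [current]
    start_pr = 0.0
    while True:
        lowest_pr, next_limit = max_pr, None
        for cand in limits:
            if cand[1] == current[1]:
                continue  # assumption: equal overhead, no intersection
            # Packet rate at which cand and current allow the same bit-rate
            pr = (cand[0] - current[0]) / (cand[1] - current[1])
            if start_pr < pr < lowest_pr:
                lowest_pr, next_limit = pr, cand
        if next_limit is None:
            return result  # no further intersection below max_pr
        result.append(next_limit)
        current, start_pr = next_limit, lowest_pr


def is_stricter(lt, current_limits, max_pr):
    """True if lt would join the bounding set of the current limits."""
    if lt in current_limits:
        return False  # an equal limit is not stricter
    return lt in bounding_set(list(current_limits) + [lt], max_pr)
```

   With the current limits (900, 1) and (1500, 5) and a maximum packet
   rate of 1000, the tuple (1000, 2) is reported as stricter because it
   is the bounding limit between the intersections at packet rates 100
   and roughly 167, whereas (1200, 2) never bounds the bit-rate and is
   rejected.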

   Once a session participant receives the TMMBN in response to its
   TMMBR, with its own SSRC, it knows that it "owns" the bitrate
   limitation. Only the "owner" of a bitrate limitation can raise it or
   reset it to the session limit.

   Note that, due to the unreliable nature of transport of TMMBR and
   TMMBN, the above rules may lead to the sending of TMMBR messages
   disobeying the rules above. Furthermore, in multicast scenarios it
   can happen that more than one session participant believes it "owns"
   the current bitrate limitation. This is not critical for a number of
   reasons:

   a) If a TMMBR message is lost in transmission, the media sender does
      not learn about the restrictions imposed on it. However, it also
      does not send a TMMBN message notifying reception of a request it
      has never received. Therefore, no new limit is established, and
      the media receiver sending the more restrictive TMMBR is not the
      owner. Since this media receiver has not seen a notification
      corresponding to its request, it is free to re-send it.
   b) Similarly, if a TMMBN message gets lost, the media receiver that
      has sent the corresponding TMMBR request does not receive the
      Notification. In that case, it is also not the "owner" of the
      restriction and is free to re-send the request.
   c) If multiple competing TMMBR messages are sent by different
      session participants, then the resulting TMMBN indicates the most
      restrictive limits requested, including their owners.
   d) If more than one session participant coincidentally sends TMMBR
      messages at the same time and with the same limit, the media
      sender selects one of them and addresses it as the "owner".
      Session-wide, the correct limit is thereby established.

   It is also important to consider the security risks involved with
   faked TMMBRs. See security considerations in Section 6.

   The feedback messages may be used in both multicast and unicast
   sessions of any of the specified topologies.

   For sessions with a large number of participants, using the lowest
   common denominator, as required by this mechanism, may not be the
   most suitable course of action. Large sessions may need to consider
   other ways to support adapted bit-rates for participants, such as
   partitioning the session into different quality tiers, or using some
   other method of achieving bit-rate scalability.

   If the value set by a TMMBR message is expected to be permanent, the
   TMMBR setting party is RECOMMENDED to renegotiate the session
   parameters to reflect that using session setup signaling, e.g. a SIP
   re-invite.

   An SSRC may time out according to the default rules for RTP session
   participants, i.e. the media sender has not received any RTCP packet
   from the owner for the last five regular reporting intervals. An
   SSRC may also leave the session, indicating this through the
   transmission of an RTCP BYE packet or an external signaling channel.
   In all of these cases the entity is considered to have left the
   session. In the case the "owner" leaves the session, the limit SHALL
   be removed and the transmission of a TMMBN is scheduled indicating
   the remaining limitations.

4.2.1.2.
Message Format

   The Feedback Control Information (FCI) consists of one or more TMMBR
   FCI entries with the following syntax:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MMBR Exp  |          MMBR Mantissa          |Measured Overhead|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 1 - Syntax for the TMMBR message

   SSRC: The SSRC value of the media sender that is requested to
         obey the new maximum bit-rate.

   MMBR Exp (6 bits): The exponential scaling of the mantissa for the
         Maximum Media Stream bit-rate value. The value is an
         unsigned integer in the range [0..63].

   MMBR Mantissa (17 bits): The mantissa of the Maximum Media Stream
         Bit-rate value as an unsigned integer.

   Measured Overhead (9 bits): The measured average packet overhead
         value in bytes. The measurement SHALL be done according to
         the description in Section 4.2.1.1.

   The maximum media stream bit-rate (MMBR) value in bits per second is
   calculated from the MMBR exponent (exp) and mantissa in the
   following way:

      MMBR = mantissa * 2^exp

   This allows for 17 bits of resolution in the range 0 to 131071*2^63
   (approximately 1.2*10^24).

   The length of the FB message is to be set to 2+2*N where N is the
   number of TMMBR FCI entries.

4.2.1.3.  Timing Rules

   The first transmission of the request message MAY use early or
   immediate feedback in cases when timeliness is desirable. Any
   repetition of a request message SHOULD use regular RTCP mode for its
   transmission timing.

4.2.1.4.  Handling in Translator and Mixers

   Media Translators and Mixers will need to receive and respond to
   TMMBR messages as they are part of the chain that provides a certain
   media stream to the receiver.
   The mixer or translator may act locally on the TMMBR request and
   thus generate a TMMBN to indicate that it has done so. Alternatively
   it can forward the request in the case of a media translator, or
   generate one itself in the case of the mixer. In case it generates a
   TMMBR, it will need to send a TMMBN back to the original requestor
   to indicate that it is handling the request.

4.2.2.  Temporary Maximum Media Stream Bit-rate Notification (TMMBN)

   The FCI field of the TMMBN Feedback message may contain zero, one or
   more TMMBN FCI entries.

4.2.2.1.  Semantics

   This feedback message is used to notify the senders of any TMMBR
   message that one or more TMMBR messages have been received or that
   an owner has left the session. It indicates to all participants the
   set of currently employed limitations and the "owners" of those.

   The "SSRC of the packet sender" field indicates the source of the
   notification. The "SSRC of media source" is not used and SHALL be
   set to 0.

   A TMMBN message SHALL be scheduled for transmission after the
   reception of a TMMBR message with an FCI identifying this media
   sender. Only a single TMMBN SHALL be sent, even if more than one
   TMMBR message is received between the scheduling of the transmission
   and the actual transmission of the TMMBN message. The TMMBN message
   indicates the limits and their owners at the time of transmitting
   the message. The limits included SHALL be the set of most
   restrictive values in the previously established set and the TMMBR
   messages received since the last TMMBN was transmitted.

   The reception of a TMMBR message with a transmission limit equally
   or less restrictive than the set of current limits SHALL still
   result in the transmission of a TMMBN message.
   However, the limits and their owners are not changed, unless the
   request came from an owner of a limit within the current set of
   limitations. This procedure allows session participants that haven't
   seen the last TMMBN message to get a correct view of this media
   sender's state.

   When a media sender determines that an "owner" of a limitation has
   left the session, then that limitation is removed, and the media
   sender SHALL send a TMMBN message indicating the remaining
   limitations. In case there are no remaining limitations a TMMBN
   without any FCI SHALL be sent to indicate this.

   In unicast scenarios (i.e. where a single sender talks to a single
   receiver), the aforementioned algorithm to determine ownership
   degenerates to the media receiver becoming the "owner" as soon as
   the media receiver has issued the first TMMBR message.

4.2.2.2.  Message Format

   The Feedback Control Information (FCI) consists of zero, one or more
   TMMBN FCI entries with the following syntax:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | MMBR Exp  |          MMBR Mantissa          |Measured Overhead|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2 - Syntax for the TMMBN message

   SSRC: The SSRC value of the "owner" of this limitation.

   MMBR Exp (6 bits): The exponential scaling of the mantissa for the
         Maximum Media Stream bit-rate value. The value is an
         unsigned integer in the range [0..63].

   MMBR Mantissa (17 bits): The mantissa of the Maximum Media Stream
         Bit-rate value as an unsigned integer.

   Measured Overhead (9 bits): The measured average packet overhead
         value in bytes represented as an unsigned integer.
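
   The FCI entry layout above, which TMMBR and TMMBN share, can be
   packed and parsed as sketched below in Python. This is an
   illustrative sketch rather than a reference implementation. The
   function names (encode_mmbr, pack_fci, unpack_fci, mmbr_bps) are
   hypothetical, and rounding the bit-rate down when it does not fit
   the 17-bit mantissa is an assumption of this sketch, chosen so the
   encoded limit never exceeds the intended one. Fields are written in
   network byte order, as is usual for RTCP.

```python
import struct


def encode_mmbr(bps):
    """Split a bit-rate in bits/s into (exp, mantissa), rounding down."""
    exp = 0
    while bps >= 1 << 17:  # the mantissa must fit in 17 bits
        bps >>= 1
        exp += 1
    if exp > 63:
        raise ValueError("bit-rate too large for a 6-bit exponent")
    return exp, bps


def mmbr_bps(exp, mantissa):
    """MMBR = mantissa * 2^exp, in bits per second."""
    return mantissa << exp


def pack_fci(ssrc, exp, mantissa, overhead):
    """Build one 8-byte FCI entry: SSRC, then exp/mantissa/overhead."""
    if not (0 <= exp < 64 and 0 <= mantissa < 1 << 17
            and 0 <= overhead < 1 << 9):
        raise ValueError("field out of range")
    word = (exp << 26) | (mantissa << 9) | overhead
    return struct.pack("!II", ssrc, word)


def unpack_fci(entry):
    """Return (ssrc, exp, mantissa, overhead) from an 8-byte FCI entry."""
    ssrc, word = struct.unpack("!II", entry)
    return ssrc, word >> 26, (word >> 9) & 0x1FFFF, word & 0x1FF
```

   For example, a limit of 400000 bits per second is encoded as
   exponent 2 and mantissa 100000. Each entry occupies two 32-bit
   words, which is where the 2*N term of the FB message length comes
   from.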

   Each FCI entry thus indicates one applicable limitation: the SSRC of
   the owner followed by the applicable maximum media stream bit-rate
   and overhead value.

   The length of the FB message is to be set to 2+2*N where N is the
   number of TMMBN FCI entries.

4.2.2.3.  Timing Rules

   The acknowledgement SHOULD be sent as soon as allowed by the applied
   timing rules for the session. Immediate or early feedback mode
   SHOULD be used for these messages.

4.2.2.4.  Handling by Translators and Mixers

   As discussed in Section 4.2.1.4, mixers or translators may need to
   issue TMMBN messages in response to TMMBR messages handled by the
   mixer or translator.

4.3.  Payload Specific Feedback Messages

   Payload-Specific FB messages are identified by the value PT=PSFB
   (206) as RTCP packet type (see section 6.1 of RFC 4585 [RFC4585]).

   AVPF defines three payload-specific FB messages and one application
   layer FB message. This memo specifies four additional
   payload-specific feedback messages. All are identified by means of
   the FMT parameter as follows:

      0:      unassigned
      1:      Picture Loss Indication (PLI)
      2:      Slice Loss Indication (SLI)
      3:      Reference Picture Selection Indication (RPSI)
      4:      Full Intra Request Command (FIR)
      5:      Temporal-Spatial Trade-off Request (TSTR)
      6:      Temporal-Spatial Trade-off Notification (TSTN)
      7:      Video Back Channel Message (VBCM)
      8-14:   unassigned
      15:     Application layer FB message
      16-30:  unassigned
      31:     reserved for future expansion of the number space

   The following subsections define the new FCI formats for the
   payload-specific FB messages.

4.3.1.  Full Intra Request (FIR)

   The FIR message is identified by PT=PSFB and FMT=4.

   There MUST be one or more FIR entries contained in the FCI field.

4.3.1.1.
Semantics

   Upon reception of FIR, the encoder MUST send a Decoder Refresh Point
   (see Section 2.2) as soon as possible.

      Note: Currently, video appears to be the only useful application
      for FIR, as video payloads appear to be the only widely deployed
      RTP payloads that rely heavily on media prediction across RTP
      packet boundaries. However, use of FIR could also reasonably be
      envisioned for other media types that share essential properties
      with compressed video, namely cross-frame prediction (whatever a
      frame may be for that media type). One possible example may be
      the dynamic updates of MPEG-4 scene descriptions. It is suggested
      that payload formats for such media types refer to FIR and other
      message types defined in this specification and in AVPF, instead
      of creating similar mechanisms in the payload specifications. The
      payload specifications may have to explain how the
      payload-specific terminologies map to the video-centric
      terminology used herein.

      Note: In environments where the sender has no control over the
      codec (e.g. when streaming pre-recorded and pre-coded content),
      the reaction to this command cannot be specified. One suitable
      reaction of a sender would be to skip forward in the video bit
      stream to the next decoder refresh point. In other scenarios, it
      may be preferable not to react to the command at all, e.g. when
      streaming to a large multicast group. Other reactions may also be
      possible. When deciding on a strategy, a sender could take into
      account factors such as the size of the receiving group, the
      "importance" of the sender of the FIR message (however
      "importance" may be defined in this specific application), the
      frequency of Decoder Refresh Points in the content, and others.
      However, a session which predominantly handles pre-coded content
      is not expected to use FIR at all.

   The sender MUST consider congestion control as outlined in
   section 5, which MAY restrict its ability to send a decoder refresh
   point quickly.

      Note: The relationship between the Picture Loss Indication and
      FIR is as follows. As discussed in section 6.3.1 of AVPF, a
      Picture Loss Indication informs the encoder about the loss of a
      picture and hence the likelihood of misalignment of the reference
      pictures in the encoder and decoder. Such a scenario is normally
      related to losses in an ongoing connection. In point-to-point
      scenarios, and without the presence of advanced error resilience
      tools, one option for an encoder is to send a Decoder Refresh
      Point. However, there are other options. One example is that the
      media sender ignores the PLI, because the embedded stream
      redundancy is likely to clean up the reproduced picture within a
      reasonable amount of time. The FIR, in contrast, leaves a
      (real-time) encoder no choice but to send a Decoder Refresh
      Point. It does not allow the encoder to take into account any
      considerations such as the ones mentioned above.

      Note: Mandating a maximum delay for completing the sending of a
      Decoder Refresh Point would be desirable from an application
      viewpoint, but may be problematic from a congestion control point
      of view. "As soon as possible" as mentioned above appears to be a
      reasonable compromise.

   FIR SHALL NOT be sent as a reaction to picture losses; it is
   RECOMMENDED to use PLI instead. FIR SHOULD be used only in such
   situations where not sending a decoder refresh point would render
   the video unusable for the users.

      Note: A typical example where sending FIR is appropriate is when,
      in a multipoint conference, a new user joins the session and no
      regular Decoder Refresh Point interval is established.
      Another example would be a video switching MCU that changes
      streams. Here, normally, the MCU issues a FIR to the new sender
      so as to force it to emit a Decoder Refresh Point. The Decoder
      Refresh Point normally includes a Freeze Picture Release (defined
      outside this specification), which re-starts the rendering
      process of the receivers. Both techniques mentioned are commonly
      used in MCU-based multipoint conferences.

   Other RTP payload specifications such as RFC 2032 [RFC2032] already
   define a feedback mechanism for certain codecs. An application
   supporting both schemes MUST use the feedback mechanism defined in
   this specification when sending feedback. For backward compatibility
   reasons, such an application SHOULD also be capable of receiving and
   reacting to the feedback scheme defined in the respective RTP
   payload format, if this is required by that payload format.

   The "SSRC of the packet sender" field indicates the source of the
   request, and the "SSRC of media source" is not used and SHALL be set
   to 0. The SSRC of the media sender to which the FIR command applies
   is in the FCI.

4.3.1.2.  Message Format

   Full Intra Request uses one additional FCI field, the content of
   which is depicted in Figure 3. The length of the FB message MUST be
   set to 2+2*N, where N is the number of FCI entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Seq. nr    |                    Reserved                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3 - Syntax for the FIR message

   SSRC: The SSRC value of the media sender which is requested to
         send a Decoder Refresh Point.

   Seq. nr: Command sequence number.
         The sequence number space is unique for each tuple consisting
         of the SSRC of the command source and the SSRC of the command
         target. The sequence number SHALL be increased by 1 modulo 256
         for each new command. A repetition SHALL NOT increase the
         sequence number. The initial value is arbitrary.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
         reception.

   The semantics of this FB message are independent of the RTP payload
   type.

4.3.1.3.  Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585]. FIR
   commands MAY be used with early or immediate feedback. The FIR
   feedback message MAY be repeated. If using immediate feedback mode
   the repetition SHOULD wait at least one RTT before being sent. In
   early or regular RTCP mode the repetition is sent in the next
   regular RTCP packet.

4.3.1.4.  Handling of message in Mixer and Translators

   A media translator or a mixer performing media encoding of the
   content for which the session participant has issued a FIR is
   responsible for acting upon it. A mixer acting upon a FIR SHOULD NOT
   forward the message unaltered; instead it SHOULD issue a FIR itself.

4.3.1.5.  Remarks

   In conjunction with video codecs, FIR messages typically trigger the
   sending of full intra or IDR pictures. Both are several times larger
   than predicted (inter) pictures. Their size is independent of the
   time they are generated. In most environments, especially when
   employing bandwidth-limited links, the use of an intra picture
   implies an allowed delay that is a significant multiple of the
   typical frame duration. An example: If the sending frame rate is 10
   fps, and an intra picture is assumed to be 10 times as big as an
   inter picture, then a full second of latency has to be accepted.
   In such an environment there is no need for a particularly short
   delay in sending the FIR message.  Hence waiting for the next
   possible time slot allowed by RTCP timing rules as per [RFC4585] may
   not have an overly negative impact on the system performance.

4.3.2.  Temporal-Spatial Trade-off Request (TSTR)

   The TSTR FB message is identified by PT=PSFB and FMT=5.

   There MUST be one or more TSTR entries contained in the FCI field.

4.3.2.1.  Semantics

   A decoder can suggest the use of a temporal-spatial trade-off by
   sending a TSTR message to an encoder.  If the encoder is capable of
   adjusting its temporal-spatial trade-off, it SHOULD take into
   account the received TSTR message for future coding of pictures.  A
   value of 0 suggests a high spatial quality and a value of 31
   suggests a high frame rate.  The values from 0 to 31 indicate
   monotonically a desire for higher frame rate.  Actual values do not
   correspond to precise values of spatial quality or frame rate.

   The reaction to the reception of more than one TSTR message by a
   media sender from different media receivers is left open to the
   implementation.  The selected trade-off SHALL be communicated to the
   media receivers by means of the TSTN message.

   The "SSRC of the packet sender" field indicates the source of the
   request, and the "SSRC of media source" is not used and SHALL be set
   to 0.  The SSRC of the media sender to which the TSTR applies is in
   the FCI entries.

   A TSTR message may contain multiple requests to different media
   senders, using multiple FCI entries.

4.3.2.2.  Message Format

   The Temporal-Spatial Trade-off Request uses one FCI field, the
   content of which is depicted in Figure 4.  The length of the FB
   message MUST be set to 2+2*N, where N is the number of FCI entries
   included.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Seq. nr      |  Reserved                           | Index   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 4 - Syntax of the TSTR

   SSRC:     The SSRC of the media sender which is requested to apply
             the trade-off value in Index.

   Seq. nr:  Request sequence number.  The sequence number space is
             unique for each tuple consisting of the SSRC of request
             source and the SSRC of the request target.  The sequence
             number SHALL be increased by 1 modulo 256 for each new
             command.  A repetition SHALL NOT increase the sequence
             number.  The initial value is arbitrary.

   Index:    An integer value between 0 and 31 that indicates the
             relative trade-off that is requested.  An index value of 0
             indicates the highest possible spatial quality, while 31
             indicates the highest possible temporal resolution.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
             reception.

4.3.2.3.  Timing Rules

   The timing follows the rules outlined in Section 3 of [RFC4585].
   This request message is not time critical and SHOULD be sent using
   regular RTCP timing.  Only if it is known that the user interface
   requires quick feedback, the message MAY be sent with early or
   immediate feedback timing.

4.3.2.4.  Handling of Message in Mixers and Translators

   A mixer or media translator that encodes content sent to the session
   participant issuing the TSTR SHALL consider the request to determine
   if it can fulfill it by changing its own encoding parameters.  A
   media translator unable to fulfill the request MAY forward the
   request unaltered towards the media sender.
   A mixer encoding for multiple session participants will need to
   consider the joint needs before generating a TSTR for itself towards
   the media sender.  See also the discussion in Section 3.5.2.

4.3.2.5.  Remarks

   The term "spatial quality" does not necessarily refer to the
   resolution, measured by the number of pixels the reconstructed video
   is using.  In fact, in most scenarios the video resolution stays
   constant during the lifetime of a session.  However, all video
   compression standards have means to adjust the spatial quality at a
   given resolution, often influenced by the Quantizer Parameter or QP.
   A numerically low QP results in a good reconstructed picture
   quality, whereas a numerically high QP yields a coarse picture.  The
   typical reaction of an encoder to this request is to change its rate
   control parameters to use a lower frame rate and a numerically lower
   (on average) QP, or vice versa.  The precise mapping of Index, frame
   rate, and QP is intentionally left open here, as it depends on
   factors such as compression standard employed, spatial resolution,
   content, bit rate, and many more.

4.3.3.  Temporal-Spatial Trade-off Notification (TSTN)

   The TSTN message is identified by PT=PSFB and FMT=6.

   There SHALL be one or more TSTN entries contained in the FCI field.

4.3.3.1.  Semantics

   This feedback message is used to acknowledge the reception of a
   TSTR.  A TSTN entry in a TSTN feedback message SHALL be sent for
   each TSTR entry targeted to this session participant, i.e., each
   received TSTR entry whose SSRC field contains the receiving entity's
   SSRC.  A single TSTN message MAY acknowledge multiple requests using
   multiple FCI entries.  The index value included SHALL be the same in
   all FCI entries that are part of the TSTN message.
   Including an FCI entry for each requestor allows each requesting
   entity to determine that the targeted media sender has received the
   request.  The Notification SHALL also be sent for repetitions
   received.  If the request receiver has received TSTRs with several
   different sequence numbers from a single requestor, it SHALL only
   respond to the request with the highest (modulo 256) sequence
   number.

   The TSTN SHALL include the Temporal-Spatial Trade-off index that
   will be used as a result of the request.  This is not necessarily
   the same index as requested, as the media sender may need to
   aggregate requests from several requesting session participants.  It
   may also have some other policies or rules that limit the selection.

   The "SSRC of the packet sender" field indicates the source of the
   Notification, and the "SSRC of media source" is not used and SHALL
   be set to 0.  The SSRC of the requesting entity to which the
   Notification applies is in the FCI.

4.3.3.2.  Message Format

   The Temporal-Spatial Trade-off Notification uses one additional FCI
   field, the content of which is depicted in Figure 5.  The length of
   the FB message MUST be set to 2+2*N, where N is the number of FCI
   entries.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Seq. nr      |  Reserved                           | Index   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 5 - Syntax of the TSTN

   SSRC:     The SSRC of the source of the TSTR which resulted in this
             Notification.

   Seq. nr:  The sequence number value from the TSTR that is being
             acknowledged.

   Index:    The trade-off value the media sender is using henceforth.

   Reserved: All bits SHALL be set to 0 and SHALL be ignored on
             reception.

      Informative note: The returned trade-off value (Index) may differ
      from the requested one, for example in cases where a media
      encoder cannot tune its trade-off, or when pre-recorded content
      is used.

4.3.3.3.  Timing Rules

   The timing follows the rules outlined in Section 3 of [RFC4585].
   This acknowledgement message is not extremely time critical and
   SHOULD be sent using regular RTCP timing.

4.3.3.4.  Handling of Message in Mixers and Translators

   A mixer or translator that acts upon a TSTR SHALL also send the
   corresponding TSTN.  In cases where it needs to forward a TSTR
   itself, the notification message MAY need to be delayed until that
   request has been responded to.

4.3.3.5.  Remarks

   None.

4.3.4.  H.271 Video Back Channel Message (VBCM)

   The VBCM is identified by PT=PSFB and FMT=7.

   There MUST be one or more VBCM entries contained in the FCI field.

4.3.4.1.  Semantics

   The "payload" of the VBCM indication carries different types of
   codec-specific feedback information.  The type of feedback
   information can be classified as a 'status report' (such as
   receiving bit stream without errors, or loss of a partial or
   complete picture or block) or 'update requests' (such as complete
   refresh of the bit stream).

      Note: There are possible overlaps between the VBCM sub-messages
      and CCM/AVPF feedback messages, such as FIR.  Please see Section
      3.5.3 for further discussion.

   The different types of feedback sub-messages carried in the VBCM are
   indicated by the "payloadType" as defined in [VBCM].  The different
   sub-message types as defined in [VBCM] are reproduced below for
   convenience.  "payloadType", in ITU-T Rec. H.271 terminology, refers
   to the sub-type of the H.271 message and should not be confused with
   an RTP payload type.

   Payload
   Type     Message Content
   ------------------------------------------------------------------
   0        One or more pictures without detected bitstream error
            mismatch
   1        One or more pictures that are entirely or partially lost
   2        A set of blocks of one picture that is entirely or
            partially lost
   3        CRC for one parameter set
   4        CRC for all parameter sets of a certain type
   5        A "reset" request indicating that the sender should
            completely refresh the video bitstream as if no prior
            bitstream data had been received
   > 5      Reserved for future use by ITU-T

                    Table 2: H.271 message types

   The bit string or the "payload" of a VBCM message is of variable
   length and is self-contained and coded in a variable length, binary
   format.  The media sender necessarily has to be able to parse this
   optimized binary format to make use of VBCM messages.

   Each of the different types of sub-messages (indicated by
   payloadType) may have different semantics based on the codec used.

   The "SSRC of the packet sender" field indicates the source of the
   request, and the "SSRC of media source" is not used and SHALL be set
   to 0.  The SSRC of the media sender to which the VBCM message
   applies is in the FCI.

4.3.4.2.  Message Format

   The VBCM indication uses one FCI field and the syntax is depicted in
   Figure 6.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              SSRC                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Seq. nr       |0| Payload Type| Length                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    VBCM Octet String....      |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 6 - Syntax for VBCM Message

   SSRC:     The SSRC value of the media sender that is requested to
             instruct its encoder to react to the VBCM message.

   Seq. nr:  Command sequence number.  The sequence number space is
             unique for each tuple consisting of the SSRC of command
             source and the SSRC of the command target.  The sequence
             number SHALL be increased by 1 modulo 256 for each new
             command.  A repetition SHALL NOT increase the sequence
             number.  The initial value is arbitrary.

   0:        Must be set to 0 and should not be acted upon when
             received.

   Payload Type:
             The RTP payload type for which the VBCM bit stream must be
             interpreted.

   Length:   The length of the VBCM octet string in octets, exclusive
             of any padding octets.

   VBCM Octet String:
             This is the octet string generated by the decoder carrying
             a specific feedback sub-message.  It is of variable
             length.

   Padding:  Bytes set to 0 to make up a 32-bit boundary.

4.3.4.3.  Timing Rules

   The timing follows the rules outlined in Section 3 of [RFC4585].
   The different sub-message types may have different properties with
   regard to the timing of messages that should be used.  If several
   different types are included in the same feedback packet, then the
   sub-message type with the most stringent requirements should be
   followed.

4.3.4.4.  Handling of Message in Mixers or Translators

   The handling of VBCM in a mixer or translator is sub-message type
   dependent.

4.3.4.5.  Remarks

   Please see Section 3.5.3 for the applicability of the VBCM message
   in relation to messages in both AVPF and this memo with similar
   functionality.

   Note: There has been some discussion whether the payload type field
   in this message is needed.
   It would be needed if there were potentially more than one
   VBCM-capable RTP payload type in the same session, and if the
   semantics of a given VBCM message changed from PT to PT.  This
   appears to be the case.  For example, the picture identification
   mechanism in messages of H.271 type 0 is fundamentally different
   between H.263 and H.264 (although both use the same syntax).
   Therefore, the payload type field is justified here.  It was further
   commented that for TSTR and FIR such a need does not exist, because
   the semantics of TSTR and FIR are either loosely enough defined, or
   generic enough, to apply to all video payloads currently in
   existence/envisioned.

5.  Congestion Control

   The correct application of the AVPF timing rules prevents the
   network from being flooded by feedback messages.  Hence, assuming a
   correct implementation, the RTCP channel cannot break its bit-rate
   commitment and introduce congestion.

   The reception of some of the feedback messages modifies the
   behaviour of the media senders or, more specifically, the media
   encoders.  All of these modifications MUST only be performed within
   the bandwidth limits the applied congestion control provides.  For
   example, when reacting to a FIR, the unusually high number of
   packets that form the decoder refresh point have to be paced in
   compliance with the congestion control algorithm, even if the user
   experience suffers from a slowly transmitted decoder refresh point.

   A change of the Temporary Maximum Media Stream Bit-rate value can
   only mitigate congestion, but not cause congestion as long as
   congestion control is also employed.  An increase of the value by a
   request REQUIRES the media sender to use congestion control when
   increasing its transmission rate to that value.  A reduction of the
   value results in a reduced transmission bit-rate, thus reducing the
   risk for congestion.
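
   As a non-normative illustration of the pacing requirement for
   decoder refresh points described in this section, the following
   Python sketch spreads the packets of one refresh point over time so
   that the sending rate stays within a bit-rate limit.  The function
   name pace_packets, its parameters, and the constant-rate model are
   assumptions made for this example only; this memo does not define
   them, and a real congestion control algorithm would supply (and
   keep updating) the allowed rate.

```python
def pace_packets(packet_sizes_bytes, allowed_bps, start_time=0.0):
    """Return the earliest send time, in seconds, for each packet.

    Hypothetical helper (not defined by this memo): a packet may leave
    only after the previous packet has used up its share of the
    bit-rate budget given by allowed_bps.
    """
    if allowed_bps <= 0:
        raise ValueError("allowed_bps must be positive")
    send_times = []
    next_time = start_time
    for size in packet_sizes_bytes:
        send_times.append(next_time)
        # Time this packet occupies at the allowed rate.
        next_time += size * 8 / allowed_bps
    return send_times


# Example: a refresh point made of ten 1000-octet packets under an
# assumed limit of 400 kbit/s.  Each packet occupies
# 8000 / 400000 = 0.02 s of the budget.
times = pace_packets([1000] * 10, allowed_bps=400_000)
for index, send_time in enumerate(times):
    print(f"packet {index}: send at {send_time:.2f} s")
```

   With these assumed numbers the last packet leaves roughly 0.18 s
   after the first, which shows how a large refresh point translates
   into added latency when the sender respects the rate limit.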

6.  Security Considerations

   The defined messages have certain properties that have security
   implications.  These must be addressed and taken into account by
   users of this protocol.

   The defined setup signaling mechanism is sensitive to modification
   attacks that can result in session creation with sub-optimal
   configuration, and, in the worst case, session rejection.  To
   prevent this type of attack, authentication and integrity protection
   of the setup signaling is required.

   Spoofed or maliciously created feedback messages of the type defined
   in this specification can have the following implications:

   a.  Severely reduced media bit-rate due to false TMMBR messages that
       set the maximum to a very low value.

   b.  The assignment of the ownership of a bit-rate limit with a TMMBN
       message to the wrong participant, thus potentially freezing the
       mechanism until a correct TMMBN message reaches the
       participants.

   c.  Sending TSTRs that result in a video quality different from the
       user's desire, rendering the session less useful.

   d.  Frequent FIR commands will potentially reduce the frame-rate,
       making the video jerky due to the frequent usage of decoder
       refresh points.

   To prevent these attacks there is a need to apply authentication and
   integrity protection of the feedback messages.  This can be
   accomplished against threats external to the current RTP session
   using the RTP profile that combines SRTP [SRTP] and AVPF into SAVPF
   [SAVPF].  In the Mixer cases, separate security contexts and
   filtering can be applied between the Mixer and the participants,
   thus protecting other users on the Mixer from a misbehaving
   participant.

7.  SDP Definitions

   Section 4 of [RFC4585] defines new SDP [RFC4566] attributes that are
   used for the capability exchange of the AVPF commands and
   indications, such as Reference Picture selection, Picture loss
   indication, etc.  The defined SDP attribute is known as rtcp-fb and
   its ABNF is described in Section 4.2 of [RFC4585].  In this section
   we extend the rtcp-fb attribute to include the commands and
   indications that are described in this document for codec control
   protocol.  We also discuss the Offer/Answer implications for the
   codec control commands and indications.

7.1.  Extension of rtcp-fb attribute

   As described in [RFC4585], the rtcp-fb attribute is defined to
   indicate the capability of using RTCP feedback.  As defined in AVPF,
   the rtcp-fb attribute must only be used as a media level attribute
   and must not be provided at session level.  All the rules described
   in [RFC4585] for the rtcp-fb attribute relating to payload type and
   to multiple rtcp-fb attributes in a session description also apply
   to the new feedback messages defined in this memo.

   The ABNF for rtcp-fb as defined in [RFC4585] is

      rtcp-fb-syntax = "a=rtcp-fb:" rtcp-fb-pt SP rtcp-fb-val CRLF

   where rtcp-fb-pt is the payload type and rtcp-fb-val defines the
   type of the feedback message such as ack, nack, trr-int and
   rtcp-fb-id.  For example, to indicate the support of feedback of
   picture loss indication, the sender declares the following in SDP:

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Media with feedback
      t=0 0
      c=IN IP4 host.example.com
      m=video 49170 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 nack pli

   In this document we define a new feedback value type called "ccm"
   which indicates the support of codec control using RTCP feedback
   messages.
   The "ccm" feedback value should be used with parameters that
   indicate which codec commands the session may use.  In this draft we
   define four parameters, which can be used with the ccm feedback
   value type.

   o  "fir" indicates the support of Full Intra Request.

   o  "tmmbr" indicates the support of Temporary Maximum Media Stream
      Bit-rate.  It has an optional sub-parameter to indicate the
      session maximum packet rate to be used.  If not included, it
      defaults to infinity.

   o  "tstr" indicates the support of temporal spatial trade-off
      request.

   o  "vbcm" indicates the support of H.271 video back channel
      messages.

   In the ABNF for rtcp-fb-val defined in [RFC4585], there is a
   placeholder called rtcp-fb-id to define new feedback types.  The ccm
   is defined as a new feedback type in this document and the ABNF for
   the parameters for ccm is defined here (please refer to Section 4.2
   of [RFC4585] for the complete ABNF syntax).

   rtcp-fb-param      = SP "app" [SP byte-string]
                      / SP rtcp-fb-ccm-param
                      / ; empty

   rtcp-fb-ccm-param  = "ccm" SP ccm-param

   ccm-param          = "fir"   ; Full Intra Request
                      / "tmmbr" [SP "smaxpr=" MaxPacketRateValue]
                                ; Temporary max media bit rate
                      / "tstr"  ; Temporal Spatial Trade Off
                      / "vbcm" *(SP subMessageType)
                                ; H.271 VBCM messages
                      / token [SP byte-string]
                                ; for future commands/indications
   subMessageType     = 1*8DIGIT
   byte-string        = ; defined in Section 4.2 of [RFC4585]
   MaxPacketRateValue = 1*15DIGIT

7.2.  Offer-Answer

   The Offer/Answer [RFC3264] implications to codec control protocol
   feedback messages are similar to those described in [RFC4585].  The
   offerer MAY indicate the capability to support selected codec
   commands and indications.  The answerer MUST remove all ccm
   parameters which it does not understand or does not wish to use in
   this particular media session.
   The answerer MUST NOT add new ccm parameters in addition to what has
   been offered.  The answer is binding for the media session and both
   offerer and answerer MUST only use feedback messages negotiated in
   this way.

   The session maximum packet rate parameter part of the TMMBR
   indication is declarative and everyone shall use the highest value
   indicated in a response.  If not present in an offer, it SHALL NOT
   be included by the answerer.

7.3.  Examples

   Example 1: The following SDP describes a point-to-point video call
   with H.263 with the originator of the call declaring its capability
   to support codec control messages - fir, tstr.  The SDP is carried
   in a high level signaling protocol like SIP.

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Point-to-Point call
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm tstr
      a=rtcp-fb:98 ccm fir

   In the above example, when the sender receives a TSTR message from
   the remote party, it can adjust the trade-off as indicated in the
   RTCP TSTN feedback message.

   Example 2: The following SDP describes a SIP end point joining a
   video Mixer that is hosting a multiparty video conferencing session.
   The participant supports only the FIR (Full Intra Request) codec
   control command and it declares it in its session description.  The
   video Mixer can send a FIR RTCP feedback message to this end point
   when it needs to send this participant's video to other participants
   of the conference.

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Multiparty Video Call
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm fir

   When the video MCU decides to route the video of this participant,
   it sends an RTCP FIR feedback message.  Upon receiving this feedback
   message, the end point is mandated to generate a Decoder Refresh
   Point (for example, a full intra picture).

   Example 3: The following example describes the Offer/Answer
   implications for the codec control messages.  The Offerer wishes to
   support "tstr", "fir" and "tmmbr" messages.  The offered SDP is

      -------------> Offer

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Offer/Answer
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm tstr
      a=rtcp-fb:98 ccm fir
      a=rtcp-fb:* ccm tmmbr smaxpr=120

   The answerer only wishes to support FIR and TSTR messages as the
   codec control messages and the answerer SDP is

      <---------------- Answer

      v=0
      o=alice 3203093520 3203093524 IN IP4 otherhost.example.com
      s=Offer/Answer
      c=IN IP4 189.13.1.37
      m=audio 47190 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 53273 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm tstr
      a=rtcp-fb:98 ccm fir

   Example 4: The following example describes the Offer/Answer
   implications for H.271 Video Back Channel Messages (VBCM).  The
   Offerer wishes to support VBCM and the sub-messages of payloadType 1
   (one or more pictures that are entirely or partially lost) and 2 (a
   set of blocks of one picture that is entirely or partially lost).

      -------------> Offer

      v=0
      o=alice 3203093520 3203093520 IN IP4 host.example.com
      s=Offer/Answer
      c=IN IP4 172.11.1.124
      m=audio 49170 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 51372 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm vbcm 1 2

   The answerer wishes to support sub-message 1 only.

      <---------------- Answer

      v=0
      o=alice 3203093520 3203093524 IN IP4 otherhost.example.com
      s=Offer/Answer
      c=IN IP4 189.13.1.37
      m=audio 47190 RTP/AVP 0
      a=rtpmap:0 PCMU/8000
      m=video 53273 RTP/AVPF 98
      a=rtpmap:98 H263-1998/90000
      a=rtcp-fb:98 ccm vbcm 1

   So in the above example only VBCM indications with "payloadType" 1
   will be supported.

8.  IANA Considerations

   The new value of ccm for the rtcp-fb attribute needs to be
   registered with IANA.

   Value name:  ccm
   Long name:   Codec Control Commands and Indications
   Reference:   RFC XXXX

   For use with "ccm" the following values also need to be registered.

   Value name:  fir
   Long name:   Full Intra Request Command
   Usable with: ccm
   Reference:   RFC XXXX

   Value name:  tmmbr
   Long name:   Temporary Maximum Media Stream Bit-rate
   Usable with: ccm
   Reference:   RFC XXXX

   Value name:  tstr
   Long name:   Temporal Spatial Trade Off
   Usable with: ccm
   Reference:   RFC XXXX

   Value name:  vbcm
   Long name:   H.271 video back channel messages
   Usable with: ccm
   Reference:   RFC XXXX

9.  Acknowledgements

   The authors would like to thank Andrea Basso, Orit Levin, and
   Nermeen Ismail for their work on the requirement and discussion
   draft [Basso].

   Drafts of this memo were reviewed and extensively commented on by
   Roni Even, Colin Perkins, Randell Jesup, Keith Lantz, Harikishan
   Desineni, Guido Franceschini and others.  The authors appreciate
   these reviews.

   Funding for the RFC Editor function is currently provided by the
   Internet Society.

10.  References

10.1.  Normative references

   [RFC4585]    Ott, J., Wenger, S., Sato, N., Burmeister, C., and J.
                Rey, "Extended RTP Profile for Real-Time Transport
                Control Protocol (RTCP)-Based Feedback (RTP/AVPF)",
                RFC 4585, July 2006.

   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
                Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]    Schulzrinne, H., Casner, S., Frederick, R., and V.
                Jacobson, "RTP: A Transport Protocol for Real-Time
                Applications", STD 64, RFC 3550, July 2003.

   [RFC2327]    Handley, M. and V. Jacobson, "SDP: Session Description
                Protocol", RFC 2327, April 1998.

   [RFC3264]    Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
                Model with Session Description Protocol (SDP)", RFC
                3264, June 2002.

   [Topologies] Westerlund, M. and S. Wenger, "RTP Topologies",
                draft-ietf-avt-topologies-00, work in progress, August
                2006.

10.2.  Informative references

   [Basso]      Basso, A., et al., "Requirements for transport of
                video control commands",
                draft-basso-avt-videoconreq-02.txt, expired Internet
                Draft, October 2004.

   [AVC]        Joint Video Team of ITU-T and ISO/IEC JTC 1, Draft
                ITU-T Recommendation and Final Draft International
                Standard of Joint Video Specification (ITU-T Rec.
                H.264 | ISO/IEC 14496-10 AVC), Joint Video Team (JVT)
                of ISO/IEC MPEG and ITU-T VCEG, JVT-G050, March 2003.

   [NEWPRED]    Fukunaga, S., Nakai, T., and H. Inoue, "Error
                Resilient Video Coding by Dynamic Replacing of
                Reference Pictures", in Proc. Globcom'96, vol. 3, pp.
                1503 - 1508, 1996.

   [SRTP]       Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
                K. Norrman, "The Secure Real-time Transport Protocol
                (SRTP)", RFC 3711, March 2004.

   [RFC2032]    Turletti, T. and C.
                Huitema, "RTP Payload Format for H.261 Video Streams",
                RFC 2032, October 1996.

   [SAVPF]      Ott, J. and E. Carrara, "Extended Secure RTP Profile
                for RTCP-based Feedback (RTP/SAVPF)",
                draft-ietf-avt-profile-savpf-02.txt, July 2005.

   [RFC3525]    Groves, C., Pantaleo, M., Anderson, T., and T. Taylor,
                "Gateway Control Protocol Version 1", RFC 3525, June
                2003.

   [RFC3448]    Handley, M., Floyd, S., Padhye, J., and J. Widmer,
                "TCP Friendly Rate Control (TFRC): Protocol
                Specification", RFC 3448, January 2003.

   [VBCM]       ITU-T Rec. H.271, "Video Back Channel Messages", June
                2006.

   [RFC3890]    Westerlund, M., "A Transport Independent Bandwidth
                Modifier for the Session Description Protocol (SDP)",
                RFC 3890, September 2004.

   [RFC4340]    Kohler, E., Handley, M., and S. Floyd, "Datagram
                Congestion Control Protocol (DCCP)", RFC 4340, March
                2006.

   [RFC4566]    Handley, M., Jacobson, V., and C. Perkins, "SDP:
                Session Description Protocol", RFC 4566, July 2006.

   [RFC3261]    Rosenberg, J., Schulzrinne, H., Camarillo, G.,
                Johnston, A., Peterson, J., Sparks, R., Handley, M.,
                and E. Schooler, "SIP: Session Initiation Protocol",
                RFC 3261, June 2002.

11.  Authors' Addresses

   Stephan Wenger
   Nokia Corporation
   P.O. Box 100
   FIN-33721 Tampere
   FINLAND

   Phone: +358-50-486-0637
   EMail: stewe@stewe.org

   Umesh Chandra
   Nokia Research Center
   975, Page Mill Road,
   Palo Alto, CA 94304
   USA

   Phone: +1-650-796-7502
   EMail: Umesh.Chandra@nokia.com

   Magnus Westerlund
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 7190000
   EMail: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson Research
   Ericsson AB
   SE-164 80 Stockholm, SWEDEN

   Phone: +46 8 7190000
   EMail: bo.burman@ericsson.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology described
   in this document or the extent to which any license under such
   rights might or might not be available; nor does it represent that
   it has made any independent effort to identify any such rights.
   Information on the procedures with respect to rights in RFC
   documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgement

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).

RFC Editor Considerations

   The RFC editor is requested to replace all occurrences of XXXX with
   the RFC number this document receives.