draft-ietf-avt-avpf-ccm-10.txt
Network Working Group                                     Stephan Wenger
INTERNET-DRAFT                                             Umesh Chandra
Expires: April 2008                                                Nokia
Intended Status: Proposed Standard                     Magnus Westerlund
                                                               Bo Burman
                                                                Ericsson
                                                        October 26, 2007

                     Codec Control Messages in the
             RTP Audio-Visual Profile with Feedback (AVPF)

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document specifies a few extensions to the messages defined in
   the Audio-Visual Profile with Feedback (AVPF). They are helpful
   primarily in conversational multimedia scenarios where centralized
   multipoint functionalities are in use. However, some are also
   usable in smaller multicast environments and point-to-point calls.

   The extensions discussed are messages related to the ITU-T H.271
   Video Back Channel, Full Intra Request, Temporary Maximum Media
   Stream Bit Rate and Temporal Spatial Trade-off.

TABLE OF CONTENTS

   1. Introduction
   2. Definitions
      2.1. Glossary
      2.2. Terminology
      2.3. Topologies
   3. Motivation
      3.1. Use Cases
      3.2. Using the Media Path
      3.3. Using AVPF
         3.3.1. Reliability
      3.4. Multicast
      3.5. Feedback Messages
         3.5.1. Full Intra Request Command
            3.5.1.1. Reliability
         3.5.2. Temporal Spatial Trade-off Request and Notification
            3.5.2.1. Point-to-Point
            3.5.2.2. Point-to-Multipoint Using Multicast or Translators
            3.5.2.3. Point-to-Multipoint Using RTP Mixer
            3.5.2.4. Reliability
         3.5.3. H.271 Video Back Channel Message
            3.5.3.1. Reliability
         3.5.4. Temporary Maximum Media Stream Bit Rate Request and
                Notification
            3.5.4.1. Behavior for media receivers using TMMBR
            3.5.4.2. Algorithm for establishing current limitations
            3.5.4.3. Use of TMMBR in a Mixer Based Multipoint Operation
            3.5.4.4. Use of TMMBR in Point-to-Multipoint Using
                     Multicast or Translators
            3.5.4.5. Use of TMMBR in Point-to-point operation
            3.5.4.6. Reliability
   4. RTCP Receiver Report Extensions
      4.1. Design Principles of the Extension Mechanism
      4.2. Transport Layer Feedback Messages
         4.2.1. Temporary Maximum Media Stream Bit Rate Request (TMMBR)
            4.2.1.1. Message Format
            4.2.1.2. Semantics
            4.2.1.3. Timing Rules
            4.2.1.4. Handling in Translator and Mixers
         4.2.2. Temporary Maximum Media Stream Bit Rate Notification
                (TMMBN)
            4.2.2.1. Message Format
            4.2.2.2. Semantics
            4.2.2.3. Timing Rules
            4.2.2.4. Handling by Translators and Mixers
      4.3. Payload Specific Feedback Messages
         4.3.1. Full Intra Request (FIR)
            4.3.1.1. Message Format
            4.3.1.2. Semantics
            4.3.1.3. Timing Rules
            4.3.1.4. Handling of FIR Message in Mixer and Translators
            4.3.1.5. Remarks
         4.3.2. Temporal-Spatial Trade-off Request (TSTR)
            4.3.2.1. Message Format
            4.3.2.2. Semantics
            4.3.2.3. Timing Rules
            4.3.2.4. Handling of message in Mixers and Translators
            4.3.2.5. Remarks
         4.3.3. Temporal-Spatial Trade-off Notification (TSTN)
            4.3.3.1. Message Format
            4.3.3.2. Semantics
            4.3.3.3. Timing Rules
            4.3.3.4. Handling of TSTN in Mixer and Translators
            4.3.3.5. Remarks
         4.3.4. H.271 Video Back Channel Message (VBCM)
            4.3.4.1. Message Format
            4.3.4.2. Semantics
            4.3.4.3. Timing Rules
            4.3.4.4. Handling of message in Mixer or Translator
            4.3.4.5. Remarks
   5. Congestion Control
   6. Security Considerations
   7. SDP Definitions
      7.1. Extension of the rtcp-fb Attribute
      7.2. Offer-Answer
      7.3. Examples
   8. IANA Considerations
   9. Contributors
   10. Acknowledgements
   11. References
      11.1. Normative references
      11.2. Informative references
   12. Authors' Addresses

1. Introduction

   When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was
   developed, the main emphasis lay in the efficient support of
   point-to-point and small multipoint scenarios without centralized
   multipoint control. However, in practice, many small multipoint
   conferences operate utilizing devices known as Multipoint Control
   Units (MCUs).
   Long-standing experience of the conversational video
   conferencing industry suggests that there is a need for a few
   additional feedback messages, to support centralized multipoint
   conferencing efficiently. Some of the messages have applications
   beyond centralized multipoint, and this is indicated in the
   description of the message. This is especially true for the message
   intended to carry ITU-T Rec. H.271 [H.271] bit strings for Video
   Back Channel messages.

   In Real-time Transport Protocol (RTP) [RFC3550] terminology, MCUs
   comprise mixers and translators. Most MCUs also include signaling
   support. During the development of this memo, it was noticed that
   there is considerable confusion in the community related to the use
   of terms such as mixer, translator, and MCU. In response to these
   concerns, a number of topologies have been identified that are of
   practical relevance to the industry, but are not documented in
   sufficient detail in [RFC3550]. These topologies are documented in
   [Topologies], and understanding this memo requires previous or
   parallel study of [Topologies].

   Some of the messages defined here are forward only, in that they do
   not require an explicit notification to the message emitter that
   they have been received and/or indicating the message receiver's
   actions. Other messages require a response, leading to a two way
   communication model that one could view as useful for control
   purposes. However, it is not the intention of this memo to open up
   RTP Control Protocol (RTCP) to a generalized control protocol. All
   mentioned messages have relatively strict real-time constraints, in
   the sense that their value diminishes with increased delay. This
   makes the use of more traditional control protocol means, such as
   Session Initiation Protocol (SIP) [RFC3261], undesirable when used
   for the same purpose.
   That is why this solution is recommended
   instead of "XML Schema for Media Control" [XML-MC], which uses SIP
   Info to transfer XML messages with semantics similar to those
   defined in this memo. Furthermore, all messages are of a very
   simple format that can be easily processed by an RTP/RTCP
   sender/receiver. Finally, and most importantly, all messages relate
   only to the RTP stream with which they are associated, and not to
   any other property of a communication system. In particular, none
   of them relate to the properties of the access links traversed by
   the session.

2. Definitions

2.1. Glossary

   AIMD  - Additive Increase Multiplicative Decrease
   AVPF  - The extended RTP profile for RTCP-based feedback
   FEC   - Forward Error Correction
   FCI   - Feedback Control Information [RFC4585]
   FIR   - Full Intra Request
   MCU   - Multipoint Control Unit
   MPEG  - Moving Picture Experts Group
   TMMBN - Temporary Maximum Media Stream Bit Rate Notification
   TMMBR - Temporary Maximum Media Stream Bit Rate Request
   PLI   - Picture Loss Indication
   PR    - Packet rate
   QP    - Quantizer Parameter
   RTT   - Round trip time
   SSRC  - Synchronization Source
   TSTN  - Temporal Spatial Trade-off Notification
   TSTR  - Temporal Spatial Trade-off Request
   VBCM  - Video Back Channel Message indication.

2.2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119].

   Message:
          An RTCP feedback message [RFC4585] defined by this
          specification, of one of the following types:

          Request:
                 Message that requires acknowledgement

          Command:
                 Message that forces the receiver to an action

          Indication:
                 Message that reports a situation

          Notification:
                 Message that provides a notification that an event has
                 occurred.
                 Notifications are commonly generated in
                 response to a Request.

          Note that, with the exception of "Notification", this
          terminology is in alignment with ITU-T Rec. H.245 [H245].

   Decoder Refresh Point:
          A bit string, packetized in one or more RTP packets, which
          completely resets the decoder to a known state.

          Examples of "hard" decoder refresh points are Intra pictures
          in H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 part 2, and
          Instantaneous Decoder Refresh (IDR) pictures in H.264.
          "Gradual" decoder refresh points may also be used; see for
          example [AVC]. While both "hard" and "gradual" decoder
          refresh points are acceptable in the scope of this
          specification, in most cases the user experience will benefit
          from using a "hard" decoder refresh point.

          A decoder refresh point also contains all header information
          above the picture layer (or equivalent, depending on the
          video compression standard) that is conveyed in-band. In
          H.264, for example, a decoder refresh point contains
          parameter set Network Abstraction Layer (NAL) units that
          generate parameter sets necessary for the decoding of the
          following slice/data partition NAL units (and that are not
          conveyed out of band).

   Decoding:
          The operation of reconstructing the media stream.

   Rendering:
          The operation of presenting (parts of) the reconstructed
          media stream to the user.

   Stream thinning:
          The operation of removing some of the packets from a media
          stream. Stream thinning, preferably, is media-aware,
          implying that media packets are removed in the order of
          increasing relevance to the reproductive quality. However,
          even when employing media-aware stream thinning, most media
          streams quickly lose quality when subjected to increasing
          levels of thinning. Media-unaware stream thinning leads to
          even worse quality degradation.
          In contrast to transcoding,
          stream thinning is typically seen as a computationally
          lightweight operation.

   Media:
          Often used (sometimes in conjunction with terms like bit
          rate, stream, sender ...) to identify the content of the
          forward RTP packet stream (carrying the codec data), to which
          the codec control message applies.

   Media Stream:
          The stream of RTP packets labeled with a single
          Synchronization Source (SSRC) carrying the media (and also in
          some cases repair information such as retransmission or
          Forward Error Correction (FEC) information).

   Total media bit rate:
          The total bits per second transferred in a media stream,
          measured at an observer-selected protocol layer and averaged
          over a reasonable timescale, the length of which depends on
          the application. In general, a media sender and a media
          receiver will observe different total media bit rates for the
          same stream, first because they may have selected different
          reference protocol layers, and second because of changes in
          per-packet overhead along the transmission path. The goal
          with bit rate averaging is to be able to ignore any
          burstiness on very short timescales (below, for example,
          100 ms) introduced by scheduling or link layer packetization
          effects.

   Maximum total media bit rate:
          The upper limit on total media bit rate for a given media
          stream at a particular receiver and for its selected protocol
          layer. Note that this value cannot be measured on the
          received media stream; instead, it needs to be calculated or
          determined through other means, such as QoS negotiations or
          local resource limitations. Also note that this value is an
          average (on a timescale that is reasonable for the
          application) and that it may be different from the
          instantaneous bit-rate seen by packets in the media stream.

   Overhead:
          All protocol header information required to convey a packet
          with media data from sender to receiver, from the application
          layer down to a pre-defined protocol level (for example down
          to, and including, the IP header). Overhead may include, for
          example, IP, UDP, and RTP headers, any layer 2 headers, any
          Contributing Sources (CSRCs), RTP-Padding, and RTP header
          extensions. Overhead excludes any RTP payload headers and
          the payload itself.

   Net media bit rate:
          The bit rate carried by a media stream, net of overhead.
          That is, the bits per second accounted for by encoded media,
          any applicable payload headers, and any directly associated
          meta payload information placed in the RTP packet. A typical
          example of the latter is redundancy data provided by the use
          of RFC 2198 [RFC2198]. Note that, unlike the total media bit
          rate, the net media bit rate will have the same value at the
          media sender and at the media receiver unless any mixing or
          translating of the media has occurred.

          For a given observer, the total media bit rate for a media
          stream is equal to the net media bit rate plus the product
          of the per-packet overhead (as defined above) and the packet
          rate.

   Feasible region:
          The set of all combinations of packet rate and net media bit
          rate that do not exceed the restrictions in maximum media bit
          rate placed on a given media sender by the Temporary Maximum
          Media Stream Bit-rate Request (TMMBR) messages it has
          received. The feasible region will change as new TMMBR
          messages are received.

   Bounding set:
          The set of TMMBR tuples, selected from all those received at
          a given media sender, that define the feasible region for
          that media sender.
          The media sender uses an algorithm such
          as that in section 3.5.4.2 to determine or iteratively
          approximate the current bounding set, and reports that set
          back to the media receivers in a Temporary Maximum Media
          Stream Bit-rate Notification (TMMBN) message.

2.3. Topologies

   Please refer to [Topologies] for an in-depth discussion. The
   topologies referred to throughout this memo are labeled
   (consistently with [Topologies]) as follows:

   Topo-Point-to-Point . . . . . Point-to-point communication
   Topo-Multicast  . . . . . . . Multicast communication
   Topo-Translator . . . . . . . Translator based
   Topo-Mixer  . . . . . . . . . Mixer based
   Topo-RTP-switch-MCU . . . . . RTP stream switching MCU
   Topo-RTCP-terminating-MCU . . Mixer but terminating RTCP

3. Motivation

   This section discusses the motivation and usage of the different
   video and media control messages. The video control messages have
   been under discussion for a long time, and a requirements draft was
   drawn up [Basso]. This draft has expired; however, we quote relevant
   sections of it to provide motivation and requirements.

3.1. Use Cases

   There are a number of possible usages for the proposed feedback
   messages. Let us begin by looking through the use cases Basso et
   al. [Basso] proposed. Some of the use cases have been reformulated
   and comments have been added.

   1. An RTP video mixer composes multiple encoded video sources into a
      single encoded video stream. Each time a video source is added,
      the RTP mixer needs to request a decoder refresh point from the
      video source, so as to start an uncorrupted prediction chain on
      the spatial area of the mixed picture occupied by the data from
      the new video source.

   2. An RTP video mixer receives multiple encoded RTP video streams
      from conference participants, and dynamically selects one of the
      streams to be included in its output RTP stream.
      At the time of a bit stream change (determined through means such
      as voice activation or the user interface), the mixer requests a
      decoder refresh point from the remote source, in order to avoid
      using unrelated content as reference data for inter picture
      prediction. After requesting the decoder refresh point, the video
      mixer stops the delivery of the current RTP stream and monitors
      the RTP stream from the new source until it detects data
      belonging to the decoder refresh point. At that time, the RTP
      mixer starts forwarding the newly selected stream to the
      receiver(s).

   3. An application needs to signal to the remote encoder that the
      desired trade-off between temporal and spatial resolution has
      changed. For example, one user may prefer a higher frame rate
      and a lower spatial quality, and another user may prefer the
      opposite. This choice is also highly content dependent. Many
      current video conferencing systems offer in the user interface a
      mechanism to make this selection, usually in the form of a
      slider. The mechanism is helpful in point-to-point, centralized
      multipoint and non-centralized multipoint uses.

   4. Use case 4 of the Basso draft applies only to Picture Loss
      Indication (PLI) as defined in AVPF [RFC4585] and is not
      reproduced here.

   5. Use case 5 of the Basso draft relates to a mechanism known as
      "freeze picture request". Sending freeze picture requests
      over a non-reliable forward RTCP channel has been identified as
      problematic. Therefore, no freeze picture request has been
      included in this memo, and the use case discussion is not
      reproduced here.

   6. A video mixer dynamically selects one of the received video
      streams to be sent out to participants and tries to provide the
      highest bit rate possible to all participants, while minimizing
      stream trans-rating.
      One way of achieving this is to set up
      sessions with endpoints using the maximum bit rate accepted by
      each endpoint, and accepted by the call admission method used by
      the mixer. By means of commands that reduce the maximum media
      stream bit rate below what has been negotiated during session
      setup, the mixer can reduce the maximum bit rate sent by endpoints
      to the lowest of all the accepted bit rates. As the lowest
      accepted bit rate changes due to endpoints joining and leaving or
      due to network congestion, the mixer can adjust the limits at
      which endpoints can send their streams to match the new value.
      The mixer then requests a new maximum bit rate, which is equal to
      or less than the maximum bit rate negotiated at session setup for
      a specific media stream, and the remote endpoint can respond with
      the actual bit rate that it can support.

   The picture Basso et al. draw up covers most applications we
   foresee. However, we would like to extend the list with two
   additional use cases:

   7. Currently deployed congestion control algorithms (AIMD and TFRC
      [RFC3448]) probe for additional available capacity as long as
      there is something to send. With congestion control algorithms
      using packet loss as the indication for congestion, this probing
      generally results in reduced media quality (often to a point
      where the distortion is large enough to make the media unusable),
      due to packet loss and increased delay.

      In a number of deployment scenarios, especially cellular ones,
      the bottleneck link is often the last hop link. That cellular
      link also commonly has some type of QoS negotiation enabling the
      cellular device to learn the maximal bit rate available over this
      last hop. A media receiver behind this link can, in most (if not
      all) cases, calculate at least an upper bound for the bit rate
      available for each media stream it presently receives.
      How this
      is done is an implementation detail and not discussed herein.
      Indicating the maximum available bit rate to the transmitting
      party for the various media streams can be beneficial to prevent
      that party from probing for bandwidth for this stream in excess
      of a known hard limit. For cellular or other mobile devices, the
      known available bit rate for each stream (deduced from the link
      bit rate) can change quickly, due to handover to another
      transmission technology, QoS renegotiation due to congestion,
      etc. To enable minimal disruption of service, quick convergence
      is necessary, and therefore media path signaling is desirable.

   8. The use of reference picture selection (RPS) as an error
      resilience tool was introduced in 1997 as NEWPRED [NEWPRED],
      and is now widely deployed. When RPS is in use, simplistically
      put, the receiver can send a feedback message to the sender,
      indicating a reference picture that should be used for future
      prediction. ([NEWPRED] mentions other forms of feedback as
      well.) AVPF contains a mechanism for conveying such a message,
      but does not specify the codec and the syntax to which the
      message should conform. Recently, the ITU-T finalized Rec.
      H.271, which (among other message types) also includes a feedback
      message. It is expected that this feedback message will fairly
      quickly enjoy wide support. Therefore, a mechanism to convey
      feedback messages according to H.271 appears to be desirable.

3.2. Using the Media Path

   There are two reasons why we use the media path for the codec
   control messages.

   First, systems employing MCUs often separate the control and media
   processing parts.
   As these messages are intended for or generated
   by the media part rather than the signaling part of the MCU, having
   them on the media path avoids transmission across interfaces and
   unnecessary control traffic between signaling and processing. If
   the MCU is physically decomposed, the use of the media path avoids
   the need for media control protocol extensions (e.g. in MEGACO
   [RFC3525]).

   Second, the signaling path quite commonly contains several
   signaling entities, e.g. SIP proxies and application servers.
   Avoiding going through signaling entities avoids delay for several
   reasons. Proxies have less stringent delay requirements than media
   processing and due to their complex and more generic nature may
   result in significant processing delay. The topological locations
   of the signaling entities are also commonly not optimized for
   minimal delay, but rather towards other architectural goals. Thus,
   the signaling path can be significantly longer in both the
   geographical and the delay sense.

3.3. Using AVPF

   The AVPF feedback message framework [RFC4585] provides the
   appropriate framework to implement the new messages. AVPF
   implements rules controlling the timing of feedback messages to
   avoid congestion through network flooding by RTCP traffic. We
   reuse these rules by referencing AVPF.

   The signaling setup for AVPF allows each individual type of function
   to be configured or negotiated on an RTP session basis.

3.3.1. Reliability

   The use of RTCP messages implies that each message transfer is
   unreliable, unless the lower layer transport provides reliability.
   The different messages proposed in this specification have different
   requirements in terms of reliability. However, in all cases, the
   reaction to an (occasional) loss of a feedback message is specified.

3.4. Multicast

   The codec control messages might be used with multicast.
   The RTCP
   timing rules specified in [RFC3550] and [RFC4585] ensure that the
   messages do not cause overload of the RTCP connection. The use of
   multicast may result in the reception of messages with inconsistent
   semantics. The reaction to inconsistencies depends on the message
   type, and is discussed for each message type separately.

3.5. Feedback Messages

   This section describes the semantics of the different feedback
   messages and how they apply to the different use cases.

3.5.1. Full Intra Request Command

   A Full Intra Request (FIR) Command, when received by the designated
   media sender, requires that the media sender send a Decoder Refresh
   Point (see Section 2.2) at the earliest opportunity. The evaluation
   of such opportunity includes the current encoder coding strategy and
   the current available network resources.

   FIR is also known as an "instantaneous decoder refresh request",
   "fast video update request" or "video fast update request".

   Using a decoder refresh point implies refraining from using any
   picture sent prior to that point as a reference for the encoding
   process of any subsequent picture sent in the stream. For
   predictive media types that are not video, the analogue applies.
   For example, if in MPEG-4 systems scene updates are used, the
   decoder refresh point consists of the full representation of the
   scene and is not delta-coded relative to previous updates.

   Decoder refresh points, especially Intra or IDR pictures, are in
   general several times larger in size than predicted pictures. Thus,
   in scenarios in which the available bit rate is small, the use of a
   decoder refresh point implies a delay that is significantly longer
   than the typical picture duration.

   Usage in multicast is possible; however, aggregation of the commands
   is recommended.
A receiver of FIR (i.e. the media sender) that receives a request
closely after sending a decoder refresh point -- within 2 times the
longest Round Trip Time (RTT) known, plus any AVPF-induced RTCP
packet sending delays -- should await a second request message to
ensure that the media receiver has not been served by the previously
delivered decoder refresh point. The reason for the specified delay
is to avoid sending unnecessary decoder refresh points. A session
participant may have sent its own request while another
participant's request was in-flight to them. Suppressing those
requests that may have been sent without knowledge about the other
request avoids this issue.

Using the FIR command to recover from errors is explicitly
disallowed, and instead the PLI message defined in AVPF [RFC4585]
should be used. The PLI message reports lost pictures and has been
included in AVPF for precisely that purpose.

Full Intra Request is applicable in use-cases 1 and 2.

3.5.1.1. Reliability

The FIR message results in the delivery of a decoder refresh point,
unless the message is lost. Decoder refresh points are easily
identifiable from the bit stream. Therefore, there is no need for
protocol-level notification, and a simple command repetition
mechanism is sufficient for ensuring the level of reliability
required. However, the potential use of repetition does require a
mechanism to prevent the recipient from responding to messages
already received and responded to.

To ensure the best possible reliability, a sender of FIR may repeat
the FIR request until the desired content has been received. The
repetition interval is determined by the RTCP timing rules
applicable to the session. Upon reception of a complete decoder
refresh point or the detection of an attempt to send a decoder
refresh point (which got damaged due to a packet loss), the
repetition of the FIR must stop.
If another FIR is necessary, the request sequence number must be
increased. A FIR sender shall not have more than one FIR request
(different request sequence number) outstanding at any time per
media sender in the session.

The receiver of FIR (i.e. the media sender) behaves in complementary
fashion to ensure delivery of a decoder refresh point. If it
receives repetitions of the FIR more than 2*RTT after it has sent a
decoder refresh point, it shall send a new decoder refresh point.
Two round trip times allow time for the decoder refresh point to
arrive back to the requestor and for the end of repetitions of FIR
to reach and be detected by the media sender.

An RTP mixer or RTP switching MCU that receives a FIR from a media
receiver is responsible for ensuring that a decoder refresh point is
delivered to the requesting receiver. It may be necessary for the
mixer/MCU to generate FIR commands. From a reliability perspective,
the two legs (FIR-requesting endpoint to mixer/MCU, and mixer/MCU to
decoder refresh point generating endpoint) are handled independently
from each other.

3.5.2. Temporal Spatial Trade-off Request and Notification

The Temporal Spatial Trade-off Request (TSTR) instructs the video
encoder to change its trade-off between temporal and spatial
resolution. Index values from 0 to 31 indicate monotonically a
desire for higher frame rate. That is, a requester asking for an
index of 0 prefers a high quality and is willing to accept a low
frame rate, whereas a requester asking for 31 wishes a high frame
rate, potentially at the cost of low spatial quality.

In general the encoder reaction time may be significantly longer
than the typical picture duration. See use case 3 for an example.
The encoder decides whether and to what extent the request results
in a change of the trade-off.
It returns a Temporal Spatial Trade-off Notification (TSTN) message
to indicate the trade-off that it will use henceforth.

TSTR and TSTN have been introduced primarily because it is believed
that control protocol mechanisms, e.g. a SIP re-INVITE, are too
heavyweight and too slow to allow for a reasonable user experience.
Consider, for example, a user interface where the remote user
selects the temporal/spatial trade-off with a slider. An immediate
feedback to any slider movement is required for a reasonable user
experience. A SIP re-INVITE [RFC3261] would require at least two
round-trips more (compared to the TSTR/TSTN mechanism) and may
involve proxies and other complex mechanisms. Even in a
well-designed system, it could take a second or so until the new
trade-off is finally selected. Furthermore the use of RTCP solves
the multicast use case very efficiently.

The use of TSTR and TSTN in multipoint scenarios is a non-trivial
subject, and can be achieved in many implementation-specific ways.
Problems stem from the fact that TSTRs will typically arrive
unsynchronized, and may request different trade-off values for the
same stream and/or endpoint encoder. This memo does not specify a
translator's, mixer's or endpoint's reaction to the reception of a
suggested trade-off as conveyed in the TSTR. We only require the
receiver of a TSTR message to reply to it by sending a TSTN,
carrying the new trade-off chosen by its own criteria (which may or
may not be based on the trade-off conveyed by the TSTR). In other
words, the trade-off sent in TSTR is a non-binding recommendation,
nothing more.

Three TSTR/TSTN scenarios need to be distinguished, based on the
topologies described in [Topologies]. The scenarios are described
in the following sub-clauses.

3.5.2.1. Point-to-Point

In this most trivial case (Topo-Point-to-Point), the media sender
typically adjusts its temporal/spatial trade-off based on the
requested value in TSTR, subject to its own capabilities. The TSTN
message conveys back the new trade-off value (which may be identical
to the old one if, for example, the sender is not capable of
adjusting its trade-off).

3.5.2.2. Point-to-Multipoint Using Multicast or Translators

RTCP Multicast is used either with media multicast according to
Topo-Multicast, or following RFC 3550's translator model according
to Topo-Translator. In these cases, unsynchronized TSTR messages
from different receivers may be received, possibly with different
requested trade-offs (because of different user preferences). This
memo does not specify how the media sender tunes its trade-off.
Possible strategies include selecting the mean or median of all
trade-off requests received, giving priority to certain
participants, or continuing to use the previously selected trade-off
(e.g. when the sender is not capable of adjusting it). Again, all
TSTR messages need to be acknowledged by TSTN, and the value
conveyed back has to reflect the decision made.

3.5.2.3. Point-to-Multipoint Using RTP Mixer

In this scenario (Topo-Mixer) the RTP mixer receives all TSTR
messages, and has the opportunity to act on them based on its own
criteria. In most cases, the mixer should form a "consensus" of
potentially conflicting TSTR messages arriving from different
participants, and initiate its own TSTR message(s) to the media
sender(s). As in the previous scenario, the strategy for forming
this "consensus" is up to the implementation, and can, for example,
encompass averaging the participants' request values, giving
priority to certain participants, or using session default values.
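
As a non-normative illustration of one such strategy, the Python
sketch below picks the lower median of the requested index values as
the "consensus". The function name, the SSRC-keyed dictionary and
the fallback to the previously used trade-off are assumptions made
for this example only; this memo does not mandate any particular
strategy.

```python
from statistics import median_low


def consensus_tradeoff(requests, previous_index):
    """Pick one trade-off index (0..31) from possibly conflicting TSTRs.

    requests: hypothetical dict mapping requester SSRC -> requested index.
    previous_index: the trade-off in use before these requests arrived.
    """
    # Ignore values outside the 0..31 index range used by TSTR.
    valid = [idx for idx in requests.values() if 0 <= idx <= 31]
    if not valid:
        # Nothing usable was requested: keep the current trade-off.
        return previous_index
    # The lower median is always a value that some participant asked for.
    return median_low(valid)


print(consensus_tradeoff({0x1001: 4, 0x1002: 20, 0x1003: 31}, 10))  # 20
```

A weighted choice that favours certain participants, or a fixed
session default, would fit the same interface.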

Even if a mixer or translator performs transcoding, it is very
difficult to deliver media with the requested trade-off, unless the
content the mixer or translator receives is already close to that
trade-off. Thus, if the mixer changes its trade-off, it needs to
request the media sender(s) to use the new value, by creating a TSTR
of its own. Upon reaching a decision on the used trade-off it
includes that value in the acknowledgement to the downstream
requestors. Only in cases where the original source has
substantially higher quality (and bit rate) is it likely that
transcoding alone can result in the requested trade-off.

3.5.2.4. Reliability

A request and reception acknowledgement mechanism is specified. The
Temporal Spatial Trade-off Notification (TSTN) message informs the
requester that its request has been received, and what trade-off is
used henceforth. This acknowledgement mechanism is desirable for at
least the following reasons:

o  A change in the trade-off cannot be directly identified from the
   media bit stream.
o  User feedback cannot be implemented without knowing the chosen
   trade-off value, according to the media sender's constraints.
o  Repetitive sending of messages requesting an unimplementable
   trade-off can be avoided.

3.5.3. H.271 Video Back Channel Message

ITU-T Rec. H.271 defines syntax, semantics, and suggested encoder
reaction to a video back channel message. The structure defined in
this memo is used to transparently convey such a message from media
receiver to media sender. In this memo, we refrain from an in-depth
discussion of the available code points within H.271 and refer to
the specification text [H.271] instead.

However, we note that some H.271 messages bear similarities with
native messages of AVPF and this memo.
Furthermore, we note that some H.271 messages are known to require
caution in multicast environments -- or are plainly not usable in
multicast or multipoint scenarios. Table 1 provides a brief,
oversimplified overview of the messages currently defined in H.271,
their roughly corresponding AVPF or CCM messages (the latter as
specified in this memo), and an indication of our current knowledge
of their multicast safety.

   H.271 msg type        AVPF/CCM msg type   multicast-safe
   --------------------------------------------------------------------
   0 (when used for
     reference picture
     selection)          AVPF RPSI           No (positive ACK of pictures)
   1 picture loss        AVPF PLI            Yes
   2 partial loss        AVPF SLI            Yes
   3 one parameter CRC   N/A                 Yes (no required sender action)
   4 all parameter CRC   N/A                 Yes (no required sender action)
   5 refresh point       CCM FIR             Yes

   Table 1: H.271 messages and their AVPF/CCM equivalents

   Note: H.271 message type 0 is not a strict equivalent to
   AVPF's Reference Picture Selection Indication (RPSI); it is
   an indication of known-as-correct reference picture(s) at the
   decoder. It does not command an encoder to use a defined
   reference picture (the form of control information envisioned
   to be carried in RPSI). However, it is believed and intended
   that H.271 message type 0 will be used for the same purpose
   as AVPF's RPSI -- although other use forms are also possible.

In response to the opaqueness of the H.271 messages, especially with
respect to the multicast safety, the following guidelines MUST be
followed when an implementation wishes to employ the H.271 video
back channel message:

1. Implementations utilizing the H.271 feedback message MUST stay in
   compliance with congestion control principles, as outlined in
   section 5.

2. An implementation SHOULD utilize the IETF-native messages as
   defined in [RFC4585] and in this memo instead of similar messages
   defined in [H.271]. Our current understanding of similar
   messages is documented in Table 1 above. One good reason to
   deviate from the SHOULD statement above would be if it is clearly
   understood that, for a given application and video compression
   standard, the aforementioned "similarity" is not given, in
   contrast to what the table indicates.

3. It has been observed that some of the H.271 code points currently
   in existence are not multicast-safe. Therefore, the sensible
   thing to do is not to use the H.271 feedback message type in
   multicast environments. It MAY be used only when all the issues
   mentioned later are fully understood by the implementer, and
   properly taken into account by all endpoints. In all other
   cases, the H.271 message type MUST NOT be used in conjunction
   with multicast.

4. It has been observed that even in centralized multipoint
   environments, where the mixer should theoretically be able to
   resolve issues as documented below, the implementation of such a
   mixer and cooperative endpoints is a very difficult and tedious
   task. Therefore, H.271 messages MUST NOT be used in centralized
   multipoint scenarios, unless all the issues mentioned below are
   fully understood by the implementer, and properly taken into
   account by both mixer and endpoints.

Issues to be taken into account when considering the use of H.271 in
multipoint environments:

1. Different state on different receivers. In many environments it
   cannot be guaranteed that the decoder state of all media
   receivers is identical at any given point in time. The most
   obvious reason for such a possible misalignment of state is a
   loss that occurs on the path to only one of many media receivers.
   However, there are other not so obvious reasons, such as recent
   joins to the multipoint conference (be it by joining the
   multicast group or through additional mixer output). Different
   states can lead the media receivers to issue potentially
   contradicting H.271 messages (or one media receiver issuing an
   H.271 message that, when observed by the media sender, is not
   helpful for the other media receivers). A naive reaction of the
   media sender to these contradicting messages can lead to
   unpredictable and annoying results.

2. Combining messages from different media receivers in a media
   sender is a non-trivial task. As reasons, we note that these
   messages may be contradicting each other, and that their
   transport is unreliable (there may well be other reasons). In
   case of many H.271 messages (i.e. types 0, 2, 3, and 4), the
   algorithm for combining must be aware both of the
   network/protocol environment (i.e. with respect to congestion)
   and of the media codec employed, as H.271 messages of a given
   type can have different semantics for different media codecs.

3. The suppression of requests may need to go beyond the basic
   mechanisms described in AVPF (which are driven exclusively by
   timing and transport considerations on the protocol level). For
   example, a receiver is often required to refrain from (or delay)
   generating requests, based on information it receives from the
   media stream. For instance, it makes no sense for a receiver to
   issue a FIR when a transmission of an Intra/IDR picture is
   ongoing.

4. When using the non-multicast-safe messages (e.g. H.271 type 0
   positive ACK of received pictures/slices) in larger multicast
   groups, the media receiver will likely be forced to delay or even
   omit sending these messages.
   For the media sender this looks like data has not been properly
   received (although it was received properly), and a naively
   implemented media sender reacts to these perceived problems where
   it should not.

3.5.3.1. Reliability

H.271 Video Back Channel messages do not require reliable
transmission, and confirmation of the reception of a message can be
derived from the forward video bit stream. Therefore, no specific
reception acknowledgement is specified.

With respect to re-sending rules, clause 3.5.1.1 applies.

3.5.4. Temporary Maximum Media Stream Bit Rate Request and Notification

A receiver, translator or mixer uses the Temporary Maximum Media
Stream Bit Rate Request (TMMBR, "timber") to request a sender to
limit the maximum bit rate for a media stream (see 2.2) to, or
below, the provided value. The Temporary Maximum Media Stream Bit
Rate Notification (TMMBN) contains the media sender's current view
of the most limiting subset of the TMMBR-defined limits it has
received, to help the participants to suppress TMMBR requests that
would not further restrict the media sender. The primary usage for
the TMMBR/TMMBN messages is in a scenario with an MCU or mixer (use
case 6), corresponding to Topo-Translator or Topo-Mixer, but also to
Topo-Point-to-Point.

Each temporary limitation on the media stream is expressed as a
tuple. The first component of the tuple is the maximum total media
bit rate (as defined in section 2.2) that the media receiver is
currently prepared to accept for this media stream. The second
component is the per-packet overhead that the media receiver has
observed for this media stream at its chosen reference protocol
layer.

As indicated in section 2.2, the overhead as observed by the sender
of the TMMBR (i.e. the media receiver) may differ from the overhead
observed at the receiver of the TMMBR (i.e.
the media sender) due to use of a different reference protocol layer
at the other end or due to the intervention of translators or mixers
that affect the amount of per-packet overhead. For example, a
gateway in between the two that converts between IPv4 and IPv6
changes the per-packet overhead by 20 bytes. Other mechanisms that
change the overhead include tunnels. The problem with varying
overhead is also discussed in [RFC3890]. As will be seen in the
description of the algorithm for use of TMMBR, the difference in
perceived overhead between the sending and receiving ends presents
no difficulty because calculations are carried out in terms of
variables that have the same value at the sender as at the receiver
-- for example, packet rate and net media rate.

Reporting both maximum total media bit rate and per-packet overhead
allows different receivers to provide bit rate and overhead values
for different protocol layers, for example at the IP level, at the
outer part of a tunnel protocol, or at the link layer. The protocol
level a peer reports on depends on the level of integration the peer
has, as it needs to be able to extract the information from that
protocol level. For example, an application with no knowledge of
the IP version it is running over cannot meaningfully determine the
overhead of the IP header, and hence will not want to include IP
overhead in the overhead or maximum total media bit rate
calculation.

It is expected that most peers will be able to report values at
least for the IP layer. In certain implementations it may be
advantageous to also include information pertaining to the link
layer, which in turn allows for a more precise overhead calculation
and a better optimization of connectivity resources.

The Temporary Maximum Media Stream Bit Rate messages are generic
messages that can be applied to any RTP packet stream.
This separates them from the other codec control messages defined in
this specification, which apply only to specific media types or
payload formats. The TMMBR functionality applies to the transport,
and the requirements the transport places on the media encoding.

The reasoning below assumes that the participants have negotiated a
session maximum bit rate, using a signaling protocol. This value
can be global, for example in case of point-to-point, multicast, or
translators. It may also be local between the participant and the
peer or mixer. In either case, the bit rate negotiated in signaling
is the one that the participant guarantees to be able to handle
(depacketize and decode). In practice, the connectivity of the
participant also influences the negotiated value -- it does not make
much sense to negotiate a total media bit rate that one's network
interface does not support.

It is also beneficial to have negotiated a maximum packet rate for
the session or sender. RFC 3890 provides an SDP [RFC4566] attribute
that can be used for this purpose; however, that attribute is not
usable in RTP sessions established using offer/answer [RFC3264].
Therefore an optional maximum packet rate signaling parameter is
specified in this memo.

An already established maximum total media bit rate may be changed
at any time, subject to the timing rules governing the sending of
feedback messages. The limit may change to any value between zero
and the session maximum, as negotiated during session establishment
signaling. However, even if a sender has received a TMMBR message
allowing an increase in the bit rate, all increases must be governed
by a congestion control mechanism. TMMBR indicates known
limitations only, usually in the local environment, and does not
provide any guarantees about the full path.
Furthermore, any increases in TMMBR-established bit rate limits are
to be executed only after a certain delay from the sending of the
TMMBN message that notifies the world about the increase in limit.
The delay is specified as at least twice the longest RTT as known by
the media sender, plus the media sender's calculation of the
required wait time for the sending of another TMMBR message for this
session based on AVPF timing rules. This delay is introduced to
allow other session participants to make known their bit rate limit
requirements, which may be lower.

If it is likely that the new value indicated by TMMBR will be valid
for the remainder of the session, the TMMBR sender is expected to
perform a renegotiation of the session upper limit using the session
signaling protocol.

3.5.4.1. Behavior for media receivers using TMMBR

This section is an informal description of behavior described more
precisely in section 4.2.

A media sender begins the session limited by the maximum media bit
rate and maximum packet rate negotiated in session signaling, if
any. Note that this value may be negotiated for a different
protocol layer than the one the participant uses in its TMMBR
messages. Each media receiver selects a reference protocol layer,
forms an estimate of the overhead it is observing at that reference
level (or estimates it if no packets have been seen yet), and
determines the maximum total media bit rate it can accept, taking
into account its own limitations and any transport path limitations
of which it may be aware. In case the current limitations are more
restrictive than what was agreed on in the session signaling, the
media receiver reports its initial estimate of these two quantities
to the media sender using a TMMBR message.
Overall message traffic is reduced by the possibility of including
tuples for multiple media senders in the same TMMBR message.

The media sender applies an algorithm such as that specified in
section 3.5.4.2 to select which of the tuples it has received are
most limiting (i.e. the bounding set as defined in section 2.2). It
modifies its operation to stay within the feasible region (as
defined in section 2.2), and also sends out a TMMBN notification to
the media receivers indicating the selected bounding set. That
notification also indicates who was responsible for the tuples in
the bounding set, i.e. the "owner"(s) of the limitation. A session
participant that owns no tuple in the bounding set is called a
"non-owner".

If a media receiver does not own one of the tuples in the bounding
set reported by the TMMBN, it applies the same algorithm as the
media sender to determine if its current estimated (maximum total
media bit rate, overhead) tuple would enter the bounding set if
known to the media sender. If so, it issues a TMMBR request
reporting the tuple value to the sender. Otherwise it takes no
action for the moment. Periodically, its estimated tuple values may
change or it may receive a new TMMBN. If so, it reapplies the
algorithm to decide whether it needs to issue a TMMBR request.

If, alternatively, a media receiver owns one of the tuples in the
reported bounding set, it takes no action until such time as its
estimate of its own tuple values changes. At that time it sends a
TMMBR request to the media sender to report the changed values.

A media receiver may change status between owner and non-owner of a
bounding tuple between one TMMBN message and the next. Thus, it
must check the contents of each TMMBN to determine its subsequent
actions.
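
The owner/non-owner decision described above can be sketched in
Python as follows. This is an illustration under stated
assumptions, not the procedure this memo specifies: instead of the
list-based algorithm, it uses a direct geometric test (is the
receiver's line below the reported bounding set at some feasible
packet rate?). All function and parameter names are hypothetical,
tuples are (maximum total bit rate in bps, overhead in bytes), and
every overhead in the bounding set is assumed to be greater than
zero.

```python
from itertools import combinations
from math import inf


def net_rate(tmmbr_tuple, packet_rate):
    """Net media bit rate (bps) a tuple allows at a given packet rate."""
    max_total_bps, overhead_bytes = tmmbr_tuple
    return max_total_bps - 8 * overhead_bytes * packet_rate


def would_enter_bounding_set(candidate, bounding_set, smaxpr=inf):
    """True if `candidate` is stricter than the set at some feasible rate."""
    if not bounding_set or any(oh <= 0 for _, oh in bounding_set):
        raise ValueError("sketch needs a non-empty set with overheads > 0")
    # Highest feasible packet rate: X-axis crossing of the set, or SMAXPR.
    x_max = min([smaxpr] + [br / (8 * oh) for br, oh in bounding_set])
    # The gap between the lower envelope and the candidate line is concave
    # and piecewise linear, so checking the end points and the corners of
    # the envelope (pairwise intersections) is sufficient.
    points = [0.0, x_max]
    for (br1, oh1), (br2, oh2) in combinations(bounding_set, 2):
        if oh1 != oh2:
            x = (br1 - br2) / (8 * (oh1 - oh2))
            if 0 < x < x_max:
                points.append(x)
    return any(
        net_rate(candidate, x) < min(net_rate(t, x) for t in bounding_set)
        for x in points
    )


def should_send_tmmbr(my_tuple, tmmbn_set, is_owner, estimate_changed,
                      smaxpr=inf):
    """Decide, on receipt of a TMMBN, whether to issue a TMMBR."""
    if is_owner:
        # Owners stay silent until their own estimate changes.
        return estimate_changed
    return would_enter_bounding_set(my_tuple, tmmbn_set, smaxpr)
```

With a reported bounding set of (35000, 40) and (40000, 60), a
non-owner holding (37000, 50) would send a TMMBR, because its line
cuts off the corner where the two reported limits meet, whereas a
non-owner holding (38000, 40) would stay silent.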

Implementations may use other algorithms of their choosing, as long
as the bit rate limitations resulting from the exchange of TMMBR and
TMMBN messages are at least as strict (at least as low, in the bit
rate dimension) as the ones resulting from the use of the
aforementioned algorithm.

Obviously, in point-to-point cases, when there is only one media
receiver, this receiver becomes "owner" once it receives the first
TMMBN in response to its own TMMBR, and stays "owner" for the rest
of the session. Therefore, when it is known that there will always
be only a single media receiver, the above algorithm is not
required. Media receivers that are aware they are the only ones in
a session can send TMMBR messages with bit rate limits both higher
and lower than the previously notified limit, at any time (subject
to the AVPF [RFC4585] RTCP RR send timing rules). However, it may
be difficult for a session participant to determine if it is the
only receiver in the session. Because of this any implementation of
TMMBR is required to include the algorithm described in the next
section or a stricter equivalent.

3.5.4.2. Algorithm for establishing current limitations

This section introduces an example algorithm for the calculation of
a session limit. Other algorithms can be employed, as long as the
result of the calculation is at least as restrictive as the result
that is obtained by this algorithm.

First, it is important to consider the implications of using a tuple
for limiting the media sender's behavior. The bit rate and the
overhead value result in a two-dimensional solution space for the
calculation of the bit rate of media streams. Fortunately, the two
variables are linked.
Specifically, the bit rate available for RTP payloads is equal to
the TMMBR reported bit rate minus the product of the packet rate
used and the TMMBR reported overhead converted to bits. As a
result, when different bit rate/overhead combinations need to be
considered, the packet rate determines the correct limitation. This
is perhaps best explained by an example:

Example:

   Receiver A: TMMBR_max total BR = 35 kbps, TMMBR_OH = 40 bytes
   Receiver B: TMMBR_max total BR = 40 kbps, TMMBR_OH = 60 bytes

For a given packet rate (PR) the bit rate available for media
payloads in RTP will be:

   Max_net media_BR_A =
      TMMBR_max total BR_A - PR * TMMBR_OH_A * 8           ... (1)

   Max_net media_BR_B =
      TMMBR_max total BR_B - PR * TMMBR_OH_B * 8           ... (2)

For a PR = 20 these calculations will yield a Max_net media_BR_A =
28600 bps and Max_net media_BR_B = 30400 bps, which suggests that
receiver A is the limiting one for this packet rate. However, at a
certain PR there is a switchover point at which receiver B becomes
the limiting one. The switchover point can be identified by setting
Max_net media_BR_A equal to Max_net media_BR_B and solving for PR:

        TMMBR_max total BR_A - TMMBR_max total BR_B
   PR = -------------------------------------------        ... (3)
                8*(TMMBR_OH_A - TMMBR_OH_B)

which, for the numbers above, yields 31.25 as the switchover point
between the two limits. That is, for packet rates below 31.25 per
second, receiver A is the limiting receiver, and for higher packet
rates, receiver B is more limiting. The implications of this
behavior have to be considered by implementations that are going to
control media encoding and its packetization. As exemplified above,
multiple TMMBR limits may apply to the trade-off between net media
bit rate and packet rate. Which limitation applies depends on the
packet rate being considered.
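
The arithmetic of equations (1) to (3) can be checked with a few
lines of Python. This is only a sketch of the example above; the
function names are chosen for this illustration, and tuples are
written as (maximum total bit rate in bps, overhead in bytes).

```python
def max_net_media_rate(max_total_bps, overhead_bytes, packet_rate):
    """Equations (1)/(2): bit rate left for RTP payloads, in bps."""
    return max_total_bps - packet_rate * overhead_bytes * 8


def switchover_packet_rate(tuple_a, tuple_b):
    """Equation (3): packet rate at which two tuples are equally limiting.

    Returns None for equal overheads, since parallel lines never cross.
    """
    (br_a, oh_a), (br_b, oh_b) = tuple_a, tuple_b
    if oh_a == oh_b:
        return None
    return (br_a - br_b) / (8 * (oh_a - oh_b))


receiver_a = (35_000, 40)
receiver_b = (40_000, 60)

print(max_net_media_rate(*receiver_a, 20))              # 28600
print(max_net_media_rate(*receiver_b, 20))              # 30400
print(switchover_packet_rate(receiver_a, receiver_b))   # 31.25
```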

This also has implications for how the TMMBR mechanism needs to
work. First, there is the possibility that multiple TMMBR tuples
are providing limitations on the media sender. Secondly there is a
need for any session participant (media sender and receivers) to be
able to determine if a given tuple will become a limitation upon the
media sender, or if the set of already given limitations is stricter
than the given values. In the absence of the ability to make this
determination the suppression of TMMBR requests would not work.

The basic idea of the algorithm is as follows. Each TMMBR tuple can
be viewed as the equation of a straight line (cf. equations (1) and
(2)) in a space where packet rate lies along the X-axis and maximum
bit rate lies along the Y-axis. The lower envelope of the set of
lines corresponding to the complete set of TMMBR tuples, together
with the X and Y axes, defines a polygon. Points lying within this
polygon are combinations of packet rate and bit rate that meet all
of the TMMBR constraints. The highest feasible packet rate within
this region is the minimum of the rate at which the bounding polygon
meets the X-axis or the session maximum packet rate (SMAXPR,
measured in packets per second) provided by signaling, if any.
Typically a media sender will prefer to operate at a lower packet
rate than this theoretical maximum, so as to increase the rate at
which actual media content reaches the receivers. The purpose of
the algorithm is to distinguish the TMMBR tuples constituting the
bounding set and thus delineate the feasible region, so that the
media sender can select its preferred operating point within that
region.

Figure 1 below shows a bounding polygon formed by TMMBR tuples A and
B.
A third tuple C lies outside the bounding polygon and is therefore
irrelevant in determining feasible tradeoffs between media rate and
packet rate. The line labeled ss..s represents the limit on packet
rate imposed by the session maximum packet rate (SMAXPR) obtained by
signaling during session setup. In Figure 1 the limit determined by
tuple B happens to be more restrictive than SMAXPR. The situation
could easily be the reverse, meaning that the bounding polygon is
terminated on the right by the vertical line representing the SMAXPR
constraint.

   Net    ^
   Media  |a   c   b             s
   Bit    |  a   c  b            s
   Rate   |    a   c b           s
          |      a   cb          s
          |        a   c         s
          |          a  bc       s
          |            a b c     s
          |              ab  c   s
          |  Feasible      b   c s
          |   region        ba   s
          |                  b a s c
          |                   b  s   c
          |                    b s a
          |_____________________bs________
          +------------------------------>

                     Packet rate

   Figure 1 - Geometric Interpretation of TMMBR Tuples

Note that the slopes of the lines making up the bounding polygon are
increasingly negative as one moves in the direction of increasing
packet rate. Note also that with slight rearrangement, equations
(1) and (2) have the canonical form:

   y = mx + b

where
   m is the slope and has value equal to the negative of the tuple
   overhead (in bits),
and
   b is the y-intercept and has value equal to the tuple maximum
   total media bit rate.

These observations lead to the conclusion that when processing the
TMMBR tuples to select the initial bounding set, one should sort and
process the tuples by order of increasing overhead. Once a
particular tuple has been added to the bounding set, all tuples not
already selected and having lower overhead can be eliminated,
because the next side of the bounding polygon has to be steeper
(i.e. the corresponding TMMBR must have higher overhead) than the
latest added tuple.
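
The slope/intercept view can be made concrete with a short Python
sketch. It maps each tuple to its line, orders the tuples by
overhead, and reports which tuple is limiting at a given packet
rate. The helper names are hypothetical, tuples A and B reuse the
values of the example for equations (1) to (3), and the values
chosen for tuple C are invented for this illustration only.

```python
def to_line(tmmbr_tuple):
    """Return (m, b) of y = m*x + b for a (max_total_bps, overhead_bytes) tuple."""
    max_total_bps, overhead_bytes = tmmbr_tuple
    # Slope is the negative overhead in bits; intercept is the bit rate.
    return (-8 * overhead_bytes, max_total_bps)


def limiting_tuple(tuples, packet_rate):
    """Return the tuple whose line is lowest at the given packet rate."""
    def net_rate(t):
        m, b = to_line(t)
        return m * packet_rate + b
    return min(tuples, key=net_rate)


tuples = {"A": (35_000, 40), "B": (40_000, 60), "C": (38_000, 40)}

# Increasing overhead means increasingly negative slope.
by_overhead = sorted(tuples, key=lambda name: tuples[name][1])
print(by_overhead)                                  # ['A', 'C', 'B']

print(limiting_tuple(list(tuples.values()), 20))    # (35000, 40)
print(limiting_tuple(list(tuples.values()), 40))    # (40000, 60)
```

Tuple C is never returned at any packet rate, since tuple A has the
same overhead and a lower bit rate.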
1179 Line cc..c in Figure 1 illustrates another principle. This line is 1180 parallel to line aa..a, but has a higher Y-intercept. That is, the 1181 corresponding TMMBR tuple contains a higher maximum total media bit 1182 rate value. Since line cc..c is outside the bounding polygon, it 1183 illustrates the conclusion that if two TMMBR tuples have the same 1184 overhead value, the one with higher maximum total media bit rate 1185 value cannot be part of the bounding set and can be set aside. 1187 Two further observations complete the algorithm. Obviously, moving 1188 from the left, the successive corners of the bounding polygon (i.e. 1189 the intersection points between successive pairs of sides) lie at 1190 successively higher packet rates. On the other hand, again moving 1191 from the left, each successive line making up the bounding set 1192 crosses the X-axis at a lower packet rate. 1194 The complete algorithm can now be specified. The algorithm works 1195 with two lists of TMMBR tuples, the candidate list X and the 1196 selected list Y, both ordered by increasing overhead value. The 1197 algorithm terminates when all members of X have been discarded or 1198 removed for processing. Membership of the selected list Y is 1199 probationary until the algorithm is complete. Each member of the 1200 selected list is associated with an intersection value, which is the 1201 packet rate at which the line corresponding to that TMMBR tuple 1202 intersects with the line corresponding to the previous TMMBR tuple 1203 in the selected list. Each member of the selected list is also 1204 associated with a maximum packet rate value, which is the lesser of 1205 the session maximum packet rate SMAXPR (if any) and the packet rate 1206 at which the line corresponding to that tuple crosses the X-axis. 1208 When the algorithm terminates, the selected list is equal to the 1209 bounding set as defined in section 2.2. 
1211 Initial Algorithm 1213 This algorithm is used by the media sender when it has received one 1214 or more TMMBR requests and before it has determined a bounding set 1215 for the first time. 1217 1. Sort the TMMBR tuples by order of increasing overhead. This is 1218 the initial candidate list X. 1220 2. When multiple tuples in the candidate list have the same overhead 1221 value, discard all but the one with the lowest maximum total media 1222 bit rate value. 1224 3. Select and remove from the candidate list the TMMBR tuple with the 1225 lowest maximum total media bit rate value. If there is more than 1226 one tuple with that value, choose the one with the highest 1227 overhead value. This is the first member of the selected list Y. 1228 Set its intersection value equal to zero. Calculate its maximum 1229 packet rate as the minimum of SMAXPR (if available) and the value 1230 obtained from the following formula, which is the packet rate at 1231 which the corresponding line crosses the X-axis. 1233 Max PR = TMMBR max total BR / (8 * TMMBR OH) ... (4) 1235 4. Discard from the candidate list all tuples with a lower overhead 1236 value than the selected tuple. 1238 5. Remove the first remaining tuple from the candidate list for 1239 processing. Call this the current candidate. 1241 6. Calculate the packet rate PR at the intersection of the line 1242 generated by the current candidate with the line generated by the 1243 last tuple in the selected list Y, using equation (3). 1245 7. If the calculated value PR is equal to or lower than the 1246 intersection value stored for the last tuple of the selected list, 1247 discard the last tuple of the selected list and go back to step 6 1248 (retaining the same current candidate). 
1250 Note that the choice of the initial member of the selected list Y 1251 in step 3 guarantees that the selected list will never be emptied 1252 by this process, meaning that the algorithm must eventually (if 1253 not immediately) fall through to step 8. 1255 8. (This step is reached when the calculated PR value of the current 1256 candidate is greater than the intersection value of the current 1257 last member of the selected list Y.) If the calculated value PR 1258 of the current candidate is lower than the maximum packet rate 1259 associated with the last tuple in the selected list, add the 1260 current candidate tuple to the end of the selected list. Store PR 1261 as its intersection value. Calculate its maximum packet rate as 1262 the lesser of SMAXPR (if available) and the maximum packet rate 1263 calculated using equation (4). Otherwise, discard the current candidate. 1265 9. If any tuples remain in the candidate list, go back to step 5. 1267 Incremental Algorithm 1269 The previous algorithm covered the initial case, where no selected 1270 list had previously been created. It also applied only to the media 1271 sender. When a previously-created selected list is available at 1272 either the media sender or media receiver, two other cases can be 1273 considered: 1275 o when a TMMBR tuple not currently in the selected list is a 1276 candidate for addition; 1278 o when the values change in a TMMBR tuple currently in the 1279 selected list. 1281 At the media receiver these cases correspond respectively to those 1282 of the non-owner and owner of a tuple in the TMMBN-reported bounding 1283 set. 1285 In either case, the process of updating the selected list to take 1286 account of the new/changed tuple can use the basic algorithm 1287 described above, with the modification that the initial candidate 1288 set consists only of the existing selected list and the new or 1289 changed tuple.
Some further optimization is possible (beyond 1290 starting with a reduced candidate set) by taking advantage of the 1291 following observations. 1293 The first observation is that if the new/changed candidate becomes 1294 part of the new selected list, the result may be to cause zero or 1295 more other tuples to be dropped from the list. However, if more 1296 than one other tuple is dropped, the dropped tuples will be 1297 consecutive. This can be confirmed geometrically by visualizing a 1298 new line that cuts off a series of segments from the previously- 1299 existing bounding polygon. The cut-off segments are connected one 1300 to the next, the geometric equivalent of consecutive tuples in a 1301 list ordered by overhead value. Beyond the dropped set in either 1302 direction all of the tuples that were in the earlier selected list 1303 will be in the updated one. The second observation is that, leaving 1304 aside the new candidate, the order of tuples remaining in the 1305 updated selected list is unchanged because their overhead values 1306 have not changed. 1308 The consequence of these two observations is that, once the 1309 placement of the new candidate and the extent of the dropped set of 1310 tuples (if any) has been determined, the remaining tuples can be 1311 copied directly from the candidate list into the selected list, 1312 preserving their order. This conclusion suggests the following 1313 modified algorithm: 1315 o Run steps 1-4 of the basic algorithm. 1317 o If the new candidate has survived steps 2 and 4 and has become 1318 the new first member of the selected list, run steps 5-9 on 1319 subsequent candidates until another candidate is added to the 1320 selected list. Then move all remaining candidates to the 1321 selected list, preserving their order. 
1323 o If the new candidate has survived steps 2 and 4 and has not 1324 become the new first member of the selected list, start by 1325 moving all tuples in the candidate list with lower overhead 1326 values than that of the new candidate to the selected list, 1327 preserving their order. Run steps 5 through 9 for the new 1328 candidate, with the modification that the intersection values 1329 and maximum packet rates for the tuples on the selected list 1330 have to be calculated on the fly because they were not 1331 previously stored. Continue processing only until a 1332 subsequent tuple has been added to the selected list, then 1333 move all remaining candidates to the selected list, preserving 1334 their order. 1336 Note that the new candidate could be added to the selected 1337 list only to be dropped again when the next tuple is 1338 processed. It can easily be seen that in this case the new 1339 candidate does not displace any of the earlier tuples in the 1340 selected list. The limitations of ASCII art make this 1341 difficult to show in a figure. Line cc..c in Figure 1 would 1342 be an example if it had a steeper slope (tuple C had a higher 1343 overhead value), but still intersected line aa..a beyond where 1344 line aa..a intersects line bb..b. 1346 The algorithm just described is approximate, because it does not 1347 take account of tuples outside the selected list. To see how such 1348 tuples can become relevant, consider Figure 1 and suppose that the 1349 maximum total media bit rate in tuple A increases to the point that 1350 line aa..a moves outside line cc..c. Tuple A will remain in the 1351 bounding set calculated by the media sender. However, once it 1352 issues a new TMMBN, media receiver C will apply the algorithm and 1353 discover that its tuple C should now enter the bounding set. It 1354 will issue a TMMBR request to the media sender, which will repeat 1355 its calculation and come to the appropriate conclusion. 
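Steps 1 through 9 of the initial algorithm can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: the class and function names are hypothetical, equation (3) is not reproduced in this section and is assumed here to be the crossing point obtained by equating two lines of the form "y = mx + b", and a tuple with zero overhead is treated as never meeting the X-axis. The optimizations of the incremental algorithm are not shown.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Tmmbr:
    owner: int                   # SSRC of the requesting participant
    mxtbr: float                 # maximum total media bit rate, bit/s
    overhead: int                # measured overhead, bytes
    intersection: float = 0.0    # packet rate where it meets its predecessor
    max_pr: float = math.inf     # min(SMAXPR, X-axis crossing)


def x_intercept(t: Tmmbr, smaxpr: Optional[float]) -> float:
    # Equation (4); zero overhead is an assumed special case (no crossing).
    pr = math.inf if t.overhead == 0 else t.mxtbr / (8.0 * t.overhead)
    return pr if smaxpr is None else min(pr, smaxpr)


def crossing(a: Tmmbr, b: Tmmbr) -> float:
    # Assumed form of equation (3): solve
    # a.mxtbr - 8*a.overhead*x == b.mxtbr - 8*b.overhead*x for x.
    # Overheads always differ here because of step 2.
    return (b.mxtbr - a.mxtbr) / (8.0 * (b.overhead - a.overhead))


def initial_bounding_set(tuples: List[Tmmbr],
                         smaxpr: Optional[float] = None) -> List[Tmmbr]:
    if not tuples:
        return []
    # Steps 1 and 2: keep the lowest bit rate per overhead value, then
    # sort by increasing overhead. Copies leave the inputs untouched.
    best = {}
    for t in tuples:
        if t.overhead not in best or t.mxtbr < best[t.overhead].mxtbr:
            best[t.overhead] = t
    x = [replace(best[oh]) for oh in sorted(best)]
    # Step 3: lowest bit rate wins; ties go to the highest overhead.
    first = min(x, key=lambda t: (t.mxtbr, -t.overhead))
    first.intersection = 0.0
    first.max_pr = x_intercept(first, smaxpr)
    y = [first]
    # Step 4: drop lower-overhead candidates (this also removes `first`).
    x = [t for t in x if t.overhead > first.overhead]
    # Steps 5 to 9.
    for cand in x:
        pr = crossing(y[-1], cand)                       # step 6
        # Step 7; the length guard is defensive, since the note after
        # step 7 says the first member is never removed.
        while pr <= y[-1].intersection and len(y) > 1:
            y.pop()
            pr = crossing(y[-1], cand)
        if pr < y[-1].max_pr:                            # step 8
            cand.intersection = pr
            cand.max_pr = x_intercept(cand, smaxpr)
            y.append(cand)
    return y
```

For example, with tuples A (800000 bit/s, 40 bytes), B (1200000 bit/s, 100 bytes) and C (1000000 bit/s, 40 bytes), the function returns A followed by B, which matches the bounding polygon described for Figure 1.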
1357 The rules of section 4.2 require that the media sender refrain from 1358 raising its sending rate until media receivers have had a chance to 1359 respond to the TMMBN. In the example just given, this delay ensures 1360 that the relaxation of tuple A does not actually result in an 1361 attempt to send media at a rate exceeding the capacity at C. 1363 3.5.4.3. Use of TMMBR in a Mixer Based Multipoint Operation 1365 Assume a small mixer-based multiparty conference is ongoing, as 1366 depicted in Topo-Mixer of [Topologies]. All participants have 1367 negotiated a common maximum bit rate that this session can use. The 1368 conference operates over a number of unicast paths between the 1369 participants and the mixer. The congestion situation on each of 1370 these paths can be monitored by the participant in question and by 1371 the mixer, utilizing, for example, RTCP receiver reports (RR) or the 1372 transport protocol, e.g. DCCP [RFC4340]. However, any given 1373 participant has no knowledge of the congestion situation of the 1374 connections to the other participants. Worse, without mechanisms 1375 similar to the ones discussed in this draft, the mixer (which is 1376 aware of the congestion situation on all connections it manages) has 1377 no standardized means to inform media senders to slow down, short of 1378 forging its own receiver reports (which is undesirable). In 1379 principle, a mixer confronted with such a situation is obliged to 1380 thin or transcode streams intended for connections that detected 1381 congestion. 1383 In practice, unfortunately, media-aware stream thinning is a very 1384 difficult and cumbersome operation and adds undesirable delay. If 1385 media-unaware, it leads very quickly to unacceptable reproduced 1386 media quality. Hence, a means to slow down senders even in the 1387 absence of congestion on their connections to the mixer is 1388 desirable.
1390 To allow the mixer to throttle traffic on the individual links, 1391 without performing transcoding, there is a need for a mechanism that 1392 enables the mixer to ask a participant's media encoders to limit the 1393 media stream bit rate they are currently generating. TMMBR provides 1394 the required mechanism. When the mixer detects congestion between 1395 itself and a given participant, it executes the following procedure: 1397 1. It starts thinning the media traffic to the congested participant 1398 to the supported bit rate. 1400 2. It uses TMMBR to request the media sender(s) to reduce the total 1401 media bit rate sent by them to the mixer, to a value that is in 1402 compliance with congestion control principles for the slowest 1403 link. Slow refers here to the available bandwidth / bit rate / 1404 capacity and packet rate after congestion control. 1406 3. As soon as the bit rate has been reduced by the sending party, the 1407 mixer stops stream thinning implicitly, because there is no need 1408 for it once the stream is in compliance with congestion control. 1410 This use of stream thinning as an immediate reaction tool followed 1411 up by a quick control mechanism appears to be a reasonable 1412 compromise between media quality and the need to combat congestion. 1414 3.5.4.4. Use of TMMBR in Point-to-Multipoint Using Multicast or 1415 Translators 1417 In these topologies, corresponding to Topo-Multicast or Topo- 1418 Translator, RTCP RRs are transmitted globally. This allows all 1419 participants to detect transmission problems such as congestion, on 1420 a medium timescale. As all media senders are aware of the 1421 congestion situation of all media receivers, the rationale for the 1422 use of TMMBR in the previous section does not apply. However, even 1423 in this case the congestion control response can be improved when 1424 the unicast links are using congestion controlled transport 1425 protocols (such as TCP or DCCP).
A peer may also report local 1426 limitations to the media sender. 1428 3.5.4.5. Use of TMMBR in Point-to-point operation 1430 In use case 7 it is possible to use TMMBR to improve the performance 1431 when the known upper limit of the bit rate changes. In this use 1432 case the signaling protocol has established an upper limit for the 1433 session and total media bit rates. However, at the time of 1434 transport link bit rate reduction, a receiver can avoid serious 1435 congestion by sending a TMMBR to the sending side. Thus, TMMBR is 1436 useful for putting restrictions on the application and thus placing 1437 the congestion control mechanism in the right ballpark. However, 1438 TMMBR is usually unable to provide the continuously quick feedback 1439 loop required for real congestion control. Nor do its semantics 1440 match those of congestion control given its different purpose. For 1441 these reasons TMMBR SHALL NOT be used as a substitute for congestion 1442 control. 1444 3.5.4.6. Reliability 1446 The reaction of a media sender to the reception of a TMMBR message 1447 is not immediately identifiable through inspection of the media 1448 stream. Therefore, a more explicit mechanism is needed to avoid 1449 unnecessary re-sending of TMMBR messages. Using a statistically 1450 based retransmission scheme would only provide statistical 1451 guarantees of the request being received. It would also not avoid 1452 the retransmission of already received messages. In addition, it 1453 would not allow for easy suppression of other participants' 1454 requests. For these reasons, a mechanism based on explicit 1455 notification is used. 1457 Upon the reception of a request a media sender sends a TMMBN 1458 notification containing the current bounding set, and indicating 1459 which session participants own that limit. 
In multicast scenarios, 1460 that allows all other participants to suppress any request they may 1461 have, if their limitations are less strict than the current ones 1462 (i.e. define lines lying outside the feasible region as defined in 1463 section 2.2). Keeping and notifying only the bounding set of tuples 1464 allows for small message sizes and media sender states. A media 1465 sender only keeps state for the SSRCs of the current owners of the 1466 bounding set of tuples; all other requests and their sources are not 1467 saved. Once the bounding set has been established, new TMMBR 1468 messages should be generated only by owners of the bounding tuples 1469 and by other entities that determine (by applying the algorithm of 1470 section 3.5.4.2 or its equivalent) that their limitations should now 1471 be part of the bounding set. 1473 4. RTCP Receiver Report Extensions 1475 This memo specifies six new feedback messages. The Full Intra 1476 Request (FIR), Temporal-Spatial Trade-off Request (TSTR), Temporal- 1477 Spatial Trade-off Notification (TSTN), and Video Back Channel 1478 Message (VBCM) are "Payload Specific Feedback Messages" as defined 1479 in Section 6.3 of AVPF [RFC4585]. The Temporary Maximum Media 1480 Stream Bit Rate Request (TMMBR) and Temporary Maximum Media Stream 1481 Bit Rate Notification (TMMBN) are "Transport Layer Feedback 1482 Messages" as defined in Section 6.2 of AVPF. 1484 The new feedback messages are defined in the following subsections, 1485 following a similar structure to that in sections 6.2 and 6.3 of the 1486 AVPF specification [RFC4585]. 1488 4.1. Design Principles of the Extension Mechanism 1490 RTCP was originally introduced as a channel to convey presence, 1491 reception quality statistics and hints on the desired media coding. 1492 A limited set of media control mechanisms were introduced in early 1493 RTP payload formats for video formats, for example in RFC 2032 1494 [RFC2032]. 
However, this specification, for the first time, 1495 suggests a two-way handshake for some of its messages. There is 1496 danger that this introduction could be misunderstood as a precedent 1497 for the use of RTCP as an RTP session control protocol. To prevent 1498 such a misunderstanding, this subsection attempts to clarify the 1499 scope of the extensions specified in this memo, and strongly 1500 suggests that future extensions follow the rationale spelled out 1501 here, or compellingly explain why they divert from the rationale. 1503 In this memo, and in AVPF [RFC4585], only such messages have been 1504 included as: 1506 a) have comparatively strict real-time constraints, which prevent 1507 the use of mechanisms such as a SIP re-invite in most application 1508 scenarios. The real-time constraints are explained separately 1509 for each message where necessary. 1511 b) are multicast-safe in that the reaction to potentially 1512 contradicting feedback messages is specified, as necessary for 1513 each message; and 1515 c) are directly related to activities of a certain media codec, 1516 class of media codecs (e.g. video codecs), or a given RTP packet 1517 stream. 1519 In this memo, a two-way handshake is introduced only for messages 1520 for which: 1522 a) a notification or acknowledgement is required due to their 1523 nature. An analysis to determine whether this requirement exists 1524 has been performed separately for each message. 1526 b) the notification or acknowledgement cannot be easily derived from 1527 the media bit stream. 1529 All messages in AVPF [RFC4585] and in this memo present their 1530 contents in a simple, fixed binary format. This accommodates media 1531 receivers which have not implemented higher control protocol 1532 functionalities (SDP, XML parsers and such) in their media path. 
1534 Messages that do not conform to the design principles just described 1535 are not an appropriate use of RTCP or of the Codec Control Framework 1536 defined in this document. 1538 4.2. Transport Layer Feedback Messages 1540 As specified in section 6.1 of RFC 4585 [RFC4585], Transport Layer 1541 Feedback messages are identified by the RTCP packet type value RTPFB 1542 (205). 1544 In AVPF, one message of this category was defined. This memo 1545 specifies two more such messages. They are identified by means of 1546 the FMT parameter as follows: 1548 Assigned in AVPF [RFC4585]: 1550 1: Generic NACK 1551 31: reserved for future expansion of the identifier number 1552 space 1554 Assigned in this memo: 1556 2: reserved (see note below) 1557 3: Temporary Maximum Media Stream Bit Rate Request (TMMBR) 1558 4: Temporary Maximum Media Stream Bit Rate Notification 1559 (TMMBN) 1561 Note: early drafts of AVPF [RFC4585] reserved FMT=2 for a 1562 code point that was later removed. It has been pointed 1563 out that there may be implementations in the field using this 1564 value in accordance with the expired draft. As there is 1565 sufficient numbering space available, we mark FMT=2 as 1566 reserved so as to avoid possible interoperability problems with 1567 any such early implementations. 1569 Available for assignment: 1571 0: unassigned 1572 5-30: unassigned 1574 The following subsections define the formats of the Feedback Control 1575 Information (FCI) entries for the TMMBR and TMMBN messages 1576 respectively and specify the associated behaviour at the media 1577 sender and receiver. 1579 4.2.1. Temporary Maximum Media Stream Bit Rate Request (TMMBR) 1581 The Temporary Maximum Media Stream Bit Rate Request is identified by 1582 RTCP packet type value PT=RTPFB and FMT=3. 1584 The FCI field of a Temporary Maximum Media Stream Bit-Rate Request 1585 (TMMBR) message SHALL contain one or more FCI entries. 1587 4.2.1.1.
Message Format 1589 The Feedback Control Information (FCI) consists of one or more TMMBR 1590 FCI entries with the following syntax: 1592 0 1 2 3 1593 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1594 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1595 | SSRC | 1596 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1597 | MxTBR Exp | MxTBR Mantissa |Measured Overhead| 1598 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1600 Figure 2 - Syntax of an FCI entry in the TMMBR message 1602 SSRC (32 bits): The SSRC value of the media sender that is 1603 requested to obey the new maximum bit rate. 1605 MxTBR Exp (6 bits): The exponential scaling of the mantissa for 1606 the maximum total media bit rate value. The value is an 1607 unsigned integer [0..63]. 1609 MxTBR Mantissa (17 bits): The mantissa of the maximum total media 1610 bit rate value as an unsigned integer. 1612 Measured Overhead (9 bits): The measured average packet overhead 1613 value in bytes. The measurement SHALL be done according 1614 to the description in section 4.2.1.2. The value is an 1615 unsigned integer [0..511]. 1617 The maximum total media bit rate (MxTBR) value in bits per second is 1618 calculated from the MxTBR exponent (exp) and mantissa in the 1619 following way: 1621 MxTBR = mantissa * 2^exp 1623 This allows for 17 bits of resolution in the range 0 to 131072*2^63 1624 (approximately 1.2*10^24). 1626 The length of the TMMBR feedback message SHALL be set to 2+2*N where 1627 N is the number of TMMBR FCI entries. 1629 4.2.1.2. Semantics 1631 Behaviour at the Media Receiver (Sender of the TMMBR) 1633 TMMBR is used to indicate a transport related limitation at the 1634 reporting entity acting as a media receiver. TMMBR has the form of 1635 a tuple containing two components. 
The first value is the highest 1636 bit rate per sender of a media stream, available at a receiver- 1637 chosen protocol layer, which the receiver currently supports in this 1638 RTP session. The second value is the measured header overhead in 1639 bytes as defined in section 2.2 and measured at the chosen protocol 1640 layer in the packets received for the stream. The measurement of 1641 the overhead is a running average that is updated for each packet 1642 received for this particular media source (SSRC), using the 1643 following formula: 1645 avg_OH (new) = 15/16*avg_OH (old) + 1/16*pckt_OH, 1647 where avg_OH is the running (exponentially smoothed) average and 1648 pckt_OH is the overhead observed in the latest packet. 1650 If a maximum bit rate has been negotiated through signaling, the 1651 maximum total media bit rate that the receiver reports in a TMMBR 1652 message MUST NOT exceed the negotiated value converted to a common 1653 basis (i.e. with overheads adjusted to bring it to the same 1654 reference protocol layer). 1656 Within the common packet header for feedback messages (as defined in 1657 section 6.1 of [RFC4585]), the "SSRC of the packet sender" field 1658 indicates the source of the request, and the "SSRC of media source" 1659 is not used and SHALL be set to 0. Within a particular TMMBR FCI 1660 entry, the "SSRC of media sender" in the FCI field denotes the media 1661 sender the tuple applies to. This is useful in the multicast or 1662 translator topologies where the reporting entity may address all of 1663 the media senders in a single TMMBR message using multiple FCI 1664 entries. 1666 The media receiver SHALL save the contents of the latest TMMBN 1667 message received from each media sender. 
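The running average of the measured overhead can be sketched as follows. This Python fragment is illustrative only. The function names are hypothetical, and two details are assumptions because the text above does not specify them: the average is seeded with the overhead of the first packet, and the value written into the 9-bit Measured Overhead field is rounded to the nearest integer and clamped to [0..511].

```python
from typing import Iterable


def update_avg_overhead(avg_oh: float, pckt_oh: float) -> float:
    """avg_OH (new) = 15/16 * avg_OH (old) + 1/16 * pckt_OH."""
    return (15.0 / 16.0) * avg_oh + (1.0 / 16.0) * pckt_oh


def average_overhead(packet_overheads: Iterable[float]) -> float:
    """Fold the update over the per-packet overheads of one SSRC.

    Assumption: the first packet's overhead seeds the average.
    At least one packet is required.
    """
    it = iter(packet_overheads)
    avg = float(next(it))
    for oh in it:
        avg = update_avg_overhead(avg, oh)
    return avg


def overhead_field(avg_oh: float) -> int:
    """Value for the 9-bit field; rounding and clamping are assumptions."""
    return max(0, min(511, round(avg_oh)))
```

For instance, an average of 32 bytes followed by one packet with 48 bytes of overhead yields a new average of 33 bytes.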
1669 The media receiver MAY send a TMMBR FCI entry to a particular media 1670 sender under the following circumstances: 1672 o before any TMMBN message has been received from that media 1673 sender; 1675 o when the media receiver has been identified as the source of a 1676 bounding tuple within the latest TMMBN message received from 1677 that media sender, and the value of the maximum total media 1678 bit rate or the overhead relating to that media sender has 1679 changed; 1681 o when the media receiver has not been identified as the source 1682 of a bounding tuple within the latest TMMBN message received 1683 from that media sender, and, after the media receiver applies 1684 the incremental algorithm from section 3.5.4.2 or a stricter 1685 equivalent, the media receiver's tuple relating to that media 1686 sender is determined to belong to the bounding set. 1688 A TMMBR FCI entry MAY be repeated in subsequent TMMBR messages if no 1689 Temporary Maximum Media Stream Bit-Rate Notification (TMMBN) FCI has 1690 been received from the media sender at the time of transmission of 1691 the next RTCP packet. The bit rate value of a TMMBR FCI entry MAY 1692 be changed from one TMMBR message to the next. The overhead 1693 measurement SHALL be updated to the current value of avg_OH each 1694 time the entry is sent. 1696 If the value set by a TMMBR message is expected to be permanent, the 1697 TMMBR setting party SHOULD renegotiate the session parameters to 1698 reflect that using session setup signaling, e.g. a SIP re-invite. 1700 Behaviour at the Media Sender (Receiver of the TMMBR) 1702 When it receives a TMMBR message containing an FCI entry relating to 1703 it, the media sender SHALL use an initial or incremental algorithm 1704 as applicable to determine the bounding set of tuples based on the 1705 new information. The algorithm used SHALL be at least as strict as 1706 the corresponding algorithm defined in section 3.5.4.2. 
The media 1707 sender MAY accumulate TMMBR requests over a small interval (relative 1708 to the RTCP sending interval) before making this calculation. 1710 Once it has determined the bounding set of tuples, the media sender 1711 MAY use any combination of packet rate and net media bit rate within 1712 the feasible region that these tuples describe to produce a lower 1713 total media stream bit rate, as it may need to address a congestion 1714 situation or other limiting factors. See section 5 (congestion 1715 control) for more discussion. 1717 If the media sender concludes that it can increase the maximum total 1718 media bit rate value, it SHALL wait before actually doing so, for a 1719 period long enough to allow a media receiver to respond to the TMMBN 1720 if it determines that its tuple belongs in the bounding set. This 1721 delay period is estimated by the formula: 1723 2 * RTT + T_Dither_Max, 1725 where RTT is the longest round trip time known to the media sender 1726 and T_Dither_Max is defined in section 3.4 of [RFC4585]. Even in 1727 point-to-point sessions a media sender MUST obey the 1728 aforementioned rule, as it is not guaranteed that a participant is 1729 able to determine correctly whether all the sources are co-located 1730 in a single node, and are coordinated. 1732 A TMMBN message SHALL be sent by the media sender at the earliest 1733 possible point in time, in response to any TMMBR messages received 1734 since the last sending of TMMBN. The TMMBN message indicates the 1735 calculated set of bounding tuples and the owners of those tuples at 1736 the time of the transmission of the message. 1738 An SSRC may time out according to the default rules for RTP session 1739 participants, i.e. the media sender has not received any RTP or RTCP 1740 packets from the owner for the last five regular reporting 1741 intervals.
An SSRC may also explicitly leave the session, with the 1742 participant indicating this through the transmission of an RTCP BYE 1743 packet or using an external signaling channel. If the media sender 1744 determines that the owner of a tuple in the bounding set has left 1745 the session, the media sender SHALL transmit a new TMMBN containing 1746 the previously-determined set of bounding tuples but with the tuple 1747 belonging to the departed owner removed. 1749 A media sender MAY proactively initiate the equivalent to a TMMBR 1750 message to itself, when it is aware that its transmission path is 1751 more restrictive than the current limitations. As a result, a TMMBN 1752 indicating the media source itself as the owner of a tuple is being 1753 sent, thereby avoiding unnecessary TMMBR messages from other 1754 participants. However, like any other participant, when the media 1755 sender becomes aware of changed limitations, it is required to 1756 change the tuple, and to send a corresponding TMMBN. 1758 Discussion 1760 Due to the unreliable nature of transport of TMMBR and TMMBN, the 1761 above rules may lead to the sending of TMMBR messages which appear 1762 to disobey those rules. Furthermore, in multicast scenarios it can 1763 happen that more than one "non-owning" session participant may 1764 determine, rightly or wrongly, that its tuple belongs in the 1765 bounding set. This is not critical for a number of reasons: 1767 a) If a TMMBR message is lost in transmission, either the media 1768 sender sends a new TMMBN message in response to some other media 1769 receiver or it does not send a new TMMBN message at all. In the 1770 first case, the media receiver applies the incremental algorithm 1771 and, if it determines that its tuple should be part of the 1772 bounding set, sends out another TMMBR. In the second case, it 1773 repeats the sending of a TMMBR unconditionally. Either way, the 1774 media sender eventually gets the information it needs. 
1776 b) Similarly, if a TMMBN message gets lost, the media receiver that 1777 has sent the corresponding TMMBR request does not receive the 1778 notification and is expected to re-send the request and trigger 1779 the transmission of another TMMBN. 1781 c) If multiple competing TMMBR messages are sent by different 1782 session participants, then the algorithm can be applied taking 1783 all of these messages into account, and the resulting TMMBN 1784 provides the participants with an updated view of how their 1785 tuples compare with the bounding set. 1787 d) If more than one session participant happens to send TMMBR 1788 messages at the same time and with the same tuple component 1789 values, it does not matter which of those tuples is taken into 1790 the bounding set. The losing session participant will determine, 1791 after applying the algorithm, that its tuple does not enter the 1792 bounding set, and will therefore stop sending its TMMBR request. 1794 It is important to consider the security risks involved with faked 1795 TMMBRs. See the security considerations in Section 6. 1797 As indicated already, the feedback messages may be used in both 1798 multicast and unicast sessions in any of the specified topologies. 1799 However, for sessions with a large number of participants, using the 1800 lowest common denominator, as required by this mechanism, may not be 1801 the most suitable course of action. Large sessions may need to 1802 consider other ways to adapt the bit rate to participants' 1803 capabilities, such as partitioning the session into different 1804 quality tiers, or using some other method of achieving bit rate 1805 scalability. 1807 4.2.1.3. Timing Rules 1809 The first transmission of the TMMBR request message MAY use early or 1810 immediate feedback in cases when timeliness is desirable. Any 1811 repetition of a request message SHOULD use regular RTCP mode for its 1812 transmission timing. 1814 4.2.1.4.
Handling in Translator and Mixers 1816 Media translators and mixers will need to receive and respond to 1817 TMMBR messages as they are part of the chain that provides a certain 1818 media stream to the receiver. The mixer or translator may act 1819 locally on the TMMBR request and thus generate a TMMBN to indicate 1820 that it has done so. Alternatively, in the case of a media 1821 translator it can forward the request, or in the case of a mixer 1822 generate one of its own and pass it forward. In the latter case, 1823 the mixer will need to send a TMMBN back to the original requestor 1824 to indicate that it is handling the request. 1826 4.2.2. Temporary Maximum Media Stream Bit Rate Notification (TMMBN) 1828 The Temporary Maximum Media Stream Bit Rate Notification is 1829 identified by RTCP packet type value PT=RTPFB and FMT=4. 1831 The FCI field of the TMMBN Feedback message may contain zero, one or 1832 more TMMBN FCI entries. 1834 4.2.2.1. Message Format 1836 The Feedback Control Information (FCI) consists of zero, one or more 1837 TMMBN FCI entries with the following syntax: 1839 0 1 2 3 1840 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1841 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1842 | SSRC | 1843 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1844 | MxTBR Exp | MxTBR Mantissa |Measured Overhead| 1845 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1846 Figure 3 - Syntax of an FCI entry in the TMMBN message 1848 SSRC (32 bits): The SSRC value of the "owner" of this tuple. 1850 MxTBR Exp (6 bits): The exponential scaling of the mantissa for 1851 the maximum total media bit rate value. The value is an 1852 unsigned integer [0..63]. 1854 MxTBR Mantissa (17 bits): The mantissa of the maximum total media 1855 bit rate value as an unsigned integer. 

Measured Overhead (9 bits): The measured average packet overhead
   value in bytes represented as an unsigned integer
   [0..511].

Thus, the FCI within the TMMBN message contains entries indicating
the bounding tuples. For each tuple, the entry gives the owner by
the SSRC, followed by the applicable maximum total media bit rate
and overhead value.

The length of the TMMBN message SHALL be set to 2+2*N where N is the
number of TMMBN FCI entries.

4.2.2.2. Semantics

This feedback message is used to notify the senders of any TMMBR
message that one or more TMMBR messages have been received or that
an owner has left the session. It indicates to all participants the
current set of bounding tuples and the "owners" of those tuples.

Within the common packet header for feedback messages (as defined in
section 6.1 of [RFC4585]), the "SSRC of the packet sender" field
indicates the source of the notification. The "SSRC of media
source" is not used and SHALL be set to 0.

A TMMBN message SHALL be scheduled for transmission after the
reception of a TMMBR message with an FCI entry identifying this
media sender. Only a single TMMBN SHALL be sent, even if more than
one TMMBR message is received between the scheduling of the
transmission and the actual transmission of the TMMBN message. The
TMMBN message indicates the bounding tuples and their owners at the
time of transmitting the message. The bounding tuples included
SHALL be the set arrived at through application of the applicable
algorithm of section 3.5.4.2 or an equivalent, applied to the
previous bounding set, if any, and tuples received in TMMBR messages
since the last TMMBN was transmitted.
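
As an illustration only, the FCI entry layout of Figure 3 and the
length rule of section 4.2.2.1 can be sketched in Python. This is a
minimal sketch, not a normative encoder: all function names are
hypothetical, and the rate calculation (mantissa * 2^exp) is an
assumption drawn from the description of MxTBR Exp as an exponential
scaling of the mantissa.

```python
import struct

def pack_tmmbn_entry(ssrc, exp, mantissa, overhead):
    """Pack one TMMBN FCI entry (two 32-bit words, network byte order)."""
    if not (0 <= exp < 64 and 0 <= mantissa < (1 << 17)
            and 0 <= overhead < 512):
        raise ValueError("field out of range")
    # 6 bits exponent | 17 bits mantissa | 9 bits measured overhead
    word = (exp << 26) | (mantissa << 9) | overhead
    return struct.pack("!II", ssrc, word)

def unpack_tmmbn_entry(data):
    """Return (ssrc, exp, mantissa, overhead) from the first 8 octets."""
    ssrc, word = struct.unpack("!II", data[:8])
    return ssrc, word >> 26, (word >> 9) & 0x1FFFF, word & 0x1FF

def mxtbr_bits_per_second(exp, mantissa):
    # Assumption: the bit rate is the mantissa scaled by 2^exp.
    return mantissa << exp

def tmmbn_length_field(n_entries):
    # Length field of the feedback message: 2+2*N, N = number of entries.
    return 2 + 2 * n_entries
```

An empty TMMBN (no bounding tuples) therefore carries a length of 2,
and each additional bounding tuple adds two 32-bit words.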

The reception of a TMMBR message SHALL still result in the
transmission of a TMMBN message even if, after application of the
algorithm, the newly reported TMMBR tuple is not accepted into the
bounding set. In such a case the bounding tuples and their owners
are not changed, unless the TMMBR was from an owner of a tuple
within the previously calculated bounding set. This procedure
allows session participants that did not see the last TMMBN message
to get a correct view of this media sender's state.

As indicated in section 4.2.1.2, when a media sender determines that
an "owner" of a bounding tuple has left the session, then that tuple
is removed from the bounding set, and the media sender SHALL send a
TMMBN message indicating the remaining bounding tuples. If there
are no remaining bounding tuples a TMMBN without any FCI SHALL be
sent to indicate this. Without a remaining bounding tuple, the
maximum media bit rate and maximum packet rate negotiated in session
signaling, if any, apply.

   Note: if any media receivers remain in the session, this last will
   be a temporary situation. The empty TMMBN will cause every
   remaining media receiver to determine that its limitation belongs
   in the bounding set and send a TMMBR in consequence.

In unicast scenarios (i.e. where a single sender talks to a single
receiver), the aforementioned algorithm to determine ownership
degenerates to the media receiver becoming the "owner" of the one
bounding tuple as soon as the media receiver has issued the first
TMMBR message.

4.2.2.3. Timing Rules

The TMMBN acknowledgement SHOULD be sent as soon as allowed by the
applied timing rules for the session. Immediate or early feedback
mode SHOULD be used for these messages.

4.2.2.4. Handling by Translators and Mixers

As discussed in Section 4.2.1.4, mixers or translators may need to
issue TMMBN messages as responses to TMMBR messages for SSRCs
handled by them.

4.3. Payload Specific Feedback Messages

As specified by section 6.1 of RFC 4585 [RFC4585], Payload-Specific
FB messages are identified by the RTCP packet type value PSFB (206).

AVPF [RFC4585] defines three payload-specific feedback messages and
one application layer feedback message. This memo specifies four
additional payload-specific feedback messages. All are identified
by means of the FMT parameter as follows:

Assigned in [RFC4585]:

   1:     Picture Loss Indication (PLI)
   2:     Slice Loss Indication (SLI)
   3:     Reference Picture Selection Indication (RPSI)
   15:    Application layer FB message
   31:    reserved for future expansion of the number space

Assigned in this memo:

   4:     Full Intra Request Command (FIR)
   5:     Temporal-Spatial Trade-off Request (TSTR)
   6:     Temporal-Spatial Trade-off Notification (TSTN)
   7:     Video Back Channel Message (VBCM)

Unassigned:

   0:     unassigned
   8-14:  unassigned
   16-30: unassigned

The following subsections define the new FCI formats for the
payload-specific feedback messages.

4.3.1. Full Intra Request (FIR)

The FIR message is identified by RTCP packet type value PT=PSFB and
FMT=4.

The FCI field MUST contain one or more FIR entries. Each entry
applies to a different media sender, identified by its SSRC.

4.3.1.1. Message Format

The Feedback Control Information (FCI) for the Full Intra Request
consists of one or more FCI entries, the content of which is
depicted in Figure 4. The length of the FIR feedback message MUST
be set to 2+2*N, where N is the number of FCI entries.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              SSRC                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seq. nr       |    Reserved                                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4 - Syntax of an FCI entry in the FIR message

SSRC (32 bits): The SSRC value of the media sender which is
   requested to send a decoder refresh point.

Seq. nr (8 bits): Command sequence number. The sequence number
   space is unique for each pairing of the SSRC of command
   source and the SSRC of the command target. The sequence
   number SHALL be increased by 1 modulo 256 for each new
   command. A repetition SHALL NOT increase the sequence
   number. The initial value is arbitrary.

Reserved (24 bits): All bits SHALL be set to 0 by the sender and
   SHALL be ignored on reception.

The semantics of this feedback message is independent of the RTP
payload type.

4.3.1.2. Semantics

Within the common packet header for feedback messages (as defined in
section 6.1 of [RFC4585]), the "SSRC of the packet sender" field
indicates the source of the request, and the "SSRC of media source"
is not used and SHALL be set to 0. The SSRCs of the media senders
to which the FIR command applies are in the corresponding FCI
entries. A FIR message MAY contain requests to multiple media
senders, using one FCI entry per target media sender.

Upon reception of FIR, the encoder MUST send a decoder refresh point
(see section 2.2) as soon as possible.

The sender MUST consider congestion control as outlined in section
5, which MAY restrict its ability to send a decoder refresh point
quickly.

FIR SHALL NOT be sent as a reaction to picture losses -- it is
RECOMMENDED to use PLI [RFC4585] instead.
FIR SHOULD be used only in situations where not sending a decoder
refresh point would render the video unusable for the users.

A typical example where sending FIR is appropriate is when, in a
multipoint conference, a new user joins the session and no regular
decoder refresh point interval is established. Another example
would be a video switching MCU that changes streams. Here,
normally, the MCU issues a FIR to the new sender so as to force it
to emit a decoder refresh point. The decoder refresh point normally
includes a Freeze Picture Release (defined outside this
specification), which re-starts the rendering process of the
receivers. Both techniques mentioned are commonly used in MCU-based
multipoint conferences.

Other RTP payload specifications such as RFC 2032 [RFC2032] already
define a feedback mechanism for certain codecs. An application
supporting both schemes MUST use the feedback mechanism defined in
this specification when sending feedback. For backward
compatibility reasons such an application SHOULD also be capable of
receiving and reacting to the feedback scheme defined in the
respective RTP payload format, if this is required by that payload
format.

4.3.1.3. Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585].
FIR commands MAY be used with early or immediate feedback. The FIR
feedback message MAY be repeated. If using immediate feedback mode
the repetition SHOULD wait at least one RTT before being sent. In
early or regular RTCP mode the repetition is sent in the next
regular RTCP packet.

4.3.1.4. Handling of FIR Message in Mixers and Translators

A media translator or a mixer performing media encoding of the
content for which the session participant has issued a FIR is
responsible for acting upon it.
A mixer acting upon a FIR SHOULD NOT forward the message unaltered;
instead it SHOULD issue a FIR itself.

4.3.1.5. Remarks

Currently, video appears to be the only useful application for FIR,
as it appears to be the only RTP payload widely deployed that relies
heavily on media prediction across RTP packet boundaries. However,
use of FIR could also reasonably be envisioned for other media types
that share essential properties with compressed video, namely
cross-frame prediction (whatever a frame may be for that media
type). One possible example may be the dynamic updates of MPEG-4
scene descriptions. It is suggested that payload formats for such
media types refer to FIR and other message types defined in this
specification and in AVPF [RFC4585], instead of creating similar
mechanisms in the payload specifications. The payload
specifications may have to explain how the payload-specific
terminologies map to the video-centric terminology used herein.

In conjunction with video codecs, FIR messages typically trigger the
sending of full intra or IDR pictures. Both are several times
larger than predicted (inter) pictures. Their size is independent
of the time they are generated. In most environments, especially
when employing bandwidth-limited links, the use of an intra picture
implies an allowed delay that is a significant multiple of the
typical frame duration. An example: if the sending frame rate is 10
fps, and an intra picture is assumed to be 10 times as big as an
inter picture, then a full second of latency has to be accepted. In
such an environment there is no need for a particularly short delay
in sending the FIR message. Hence, waiting for the next possible
time slot allowed by RTCP timing rules as per [RFC4585] should not
have an overly negative impact on the system performance.
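
The arithmetic behind the 10 fps example can be made explicit. The
sketch below rests on an assumption not stated in the text, namely
that the link is dimensioned so that exactly one inter picture is
delivered per frame interval; the function name is hypothetical.

```python
def intra_picture_delay_s(frame_rate_fps, intra_to_inter_ratio):
    """Time to transmit one intra picture, assuming the link carries
    exactly one inter picture per frame interval (an assumption)."""
    # An intra picture occupies as many frame intervals as its size
    # ratio, and each interval lasts 1 / frame_rate seconds.
    return intra_to_inter_ratio / frame_rate_fps

# The example from the text: 10 fps, intra picture 10 times as big.
print(intra_picture_delay_s(10, 10))
```

Under the same assumption, a 30 fps stream with the same size ratio
would need a third of a second, which is still many frame durations.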

Mandating a maximum delay for completing the sending of a decoder
refresh point would be desirable from an application viewpoint, but
is problematic from a congestion control point of view. "As soon as
possible" as mentioned above appears to be a reasonable compromise.

In environments where the sender has no control over the codec (e.g.
when streaming pre-recorded and pre-coded content), the reaction to
this command cannot be specified. One suitable reaction of a sender
would be to skip forward in the video bit stream to the next decoder
refresh point. In other scenarios, it may be preferable not to
react to the command at all, e.g. when streaming to a large
multicast group. Other reactions may also be possible. When
deciding on a strategy, a sender could take into account factors
such as the size of the receiving group, the "importance" of the
sender of the FIR message (however "importance" may be defined in
this specific application), the frequency of decoder refresh points
in the content, and so on. However, a session which predominantly
handles pre-coded content is not expected to use FIR at all.

The relationship between the Picture Loss Indication and FIR is as
follows. As discussed in section 6.3.1 of AVPF [RFC4585], a Picture
Loss Indication informs the encoder about the loss of a picture and
hence the likelihood of misalignment of the reference pictures
between the encoder and decoder. Such a scenario is normally
related to losses in an ongoing connection. In point-to-point
scenarios, and without the presence of advanced error resilience
tools, one possible option for an encoder consists in sending a
decoder refresh point. However, there are other options.
One example is that the media sender ignores the PLI, because the
embedded stream redundancy is likely to clean up the reproduced
picture within a reasonable amount of time. The FIR, in contrast,
leaves a (real-time) encoder no choice but to send a decoder refresh
point. It does not allow the encoder to take into account any
considerations such as the ones mentioned above.

4.3.2. Temporal-Spatial Trade-off Request (TSTR)

The TSTR feedback message is identified by RTCP packet type value
PT=PSFB and FMT=5.

The FCI field MUST contain one or more TSTR FCI entries.

4.3.2.1. Message Format

The content of the FCI entry for the Temporal-Spatial Trade-off
Request is depicted in Figure 5. The length of the feedback message
MUST be set to 2+2*N, where N is the number of FCI entries included.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              SSRC                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seq. nr       |  Reserved                           | Index   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5 - Syntax of an FCI Entry in the TSTR Message

SSRC (32 bits): The SSRC of the media sender which is requested to
   apply the tradeoff value given in Index.

Seq. nr (8 bits): Request sequence number. The sequence number
   space is unique for each pairing of the SSRC of request
   source and the SSRC of the request target. The sequence
   number SHALL be increased by 1 modulo 256 for each new
   command. A repetition SHALL NOT increase the sequence
   number. The initial value is arbitrary.

Reserved (19 bits): All bits SHALL be set to 0 by the sender and
   SHALL be ignored on reception.

Index (5 bits): An integer value between 0 and 31 that indicates
   the relative trade-off that is requested.
   An index value of 0 indicates highest possible spatial quality,
   while 31 indicates highest possible temporal resolution.

4.3.2.2. Semantics

A decoder can suggest a temporal-spatial trade-off level by sending
a TSTR message to an encoder. If the encoder is capable of
adjusting its temporal-spatial trade-off, it SHOULD take into
account the received TSTR message for future coding of pictures. A
value of 0 suggests a high spatial quality and a value of 31
suggests a high frame rate. The progression of values from 0 to 31
indicates monotonically a desire for higher frame rate. The index
values do not correspond to precise values of spatial quality or
frame rate.

The reaction to the reception of more than one TSTR message by a
media sender from different media receivers is left open to the
implementation. The selected trade-off SHALL be communicated to the
media receivers by means of the TSTN message.

Within the common packet header for feedback messages (as defined in
section 6.1 of [RFC4585]), the "SSRC of the packet sender" field
indicates the source of the request, and the "SSRC of media source"
is not used and SHALL be set to 0. The SSRCs of the media senders
to which the TSTR applies are in the corresponding FCI entries.

A TSTR message MAY contain requests to multiple media senders, using
one FCI entry per target media sender.

4.3.2.3. Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585].
This request message is not time critical and SHOULD be sent using
regular RTCP timing. Only if it is known that the user interface
requires quick feedback, the message MAY be sent with early or
immediate feedback timing.

4.3.2.4. Handling of Message in Mixers and Translators

A mixer or media translator that encodes content sent to the session
participant issuing the TSTR SHALL consider the request to determine
if it can fulfill it by changing its own encoding parameters. A
media translator unable to fulfill the request MAY forward the
request unaltered towards the media sender. A mixer encoding for
multiple session participants will need to consider the joint needs
of these participants before generating a TSTR on its own behalf
towards the media sender. See also the discussion in Section 3.5.2.

4.3.2.5. Remarks

The term "spatial quality" does not necessarily refer to the
resolution as measured by the number of pixels the reconstructed
video is using. In fact, in most scenarios the video resolution
stays constant during the lifetime of a session. However, all video
compression standards have means to adjust the spatial quality at a
given resolution, often influenced by the Quantizer Parameter or QP.
A numerically low QP results in a good reconstructed picture
quality, whereas a numerically high QP yields a coarse picture. The
typical reaction of an encoder to this request is to change its rate
control parameters to use a lower frame rate and a numerically lower
(on average) QP, or vice versa. The precise mapping of Index value
to frame rate and QP is intentionally left open here, as it depends
on factors such as the compression standard employed, spatial
resolution, content, bit rate, and so on.

4.3.3. Temporal-Spatial Trade-off Notification (TSTN)

The TSTN message is identified by RTCP packet type value PT=PSFB and
FMT=6.

The FCI field SHALL contain one or more TSTN FCI entries.

4.3.3.1. Message Format

The content of an FCI entry for the Temporal-Spatial Trade-off
Notification is depicted in Figure 6.
The length of the TSTN message MUST be set to 2+2*N, where N is the
number of FCI entries.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              SSRC                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seq. nr       |  Reserved                           | Index   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 6 - Syntax of the TSTN

SSRC (32 bits): The SSRC of the source of the TSTR request which
   resulted in this Notification.

Seq. nr (8 bits): The sequence number value from the TSTR request
   that is being acknowledged.

Reserved (19 bits): All bits SHALL be set to 0 by the sender and
   SHALL be ignored on reception.

Index (5 bits): The trade-off value the media sender is using
   henceforth.

   Informative note: The returned trade-off value (Index) may differ
   from the requested one, for example in cases where a media encoder
   cannot tune its trade-off, or when pre-recorded content is used.

4.3.3.2. Semantics

This feedback message is used to acknowledge the reception of a
TSTR. For each TSTR received targeted at the session participant, a
TSTN entry SHALL be included in a TSTN feedback message. A single
TSTN message MAY acknowledge multiple requests using multiple FCI
entries. The index value included SHALL be the same in all FCI
entries of the TSTN message. Including an FCI entry for each
requestor allows each requesting entity to determine that the media
sender received the request. The Notification SHALL also be sent in
response to TSTR repetitions received. If the request receiver has
received TSTR with several different sequence numbers from a single
requestor it SHALL only respond to the request with the highest
(modulo 256) sequence number.
Note that the highest sequence number may be a smaller integer value
due to the wrapping of the field. Section A.1 of [RFC3550] has an
algorithm for keeping track of the highest received sequence number
for RTP packets; this could be adapted for this usage.

The TSTN SHALL include the Temporal-Spatial Trade-off index that
will be used as a result of the request. This is not necessarily
the same index as requested, as the media sender may need to
aggregate requests from several requesting session participants. It
may also have some other policies or rules that limit the selection.

Within the common packet header for feedback messages (as defined in
section 6.1 of [RFC4585]), the "SSRC of the packet sender" field
indicates the source of the Notification, and the "SSRC of media
source" is not used and SHALL be set to 0. The SSRCs of the
requesting entities to which the Notification applies are in the
corresponding FCI entries.

4.3.3.3. Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585].
This acknowledgement message is not extremely time critical and
SHOULD be sent using regular RTCP timing.

4.3.3.4. Handling of TSTN in Mixers and Translators

A mixer or translator that acts upon a TSTR SHALL also send the
corresponding TSTN. In cases where it needs to forward a TSTR
itself the notification message MAY need to be delayed until the
TSTR has been responded to.

4.3.3.5. Remarks

None

4.3.4. H.271 Video Back Channel Message (VBCM)

The VBCM is identified by RTCP packet type value PT=PSFB and FMT=7.

The FCI field MUST contain one or more VBCM FCI entries.

4.3.4.1. Message Format

The syntax of an FCI entry within the VBCM indication is depicted in
Figure 7.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              SSRC                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seq. nr       |0| Payload Type|    Length                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    VBCM Octet String....      |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7 - Syntax of an FCI Entry in the VBCM Message

SSRC (32 bits): The SSRC value of the media sender that is requested
   to instruct its encoder to react to the VBCM message.

Seq. nr (8 bits): Command sequence number. The sequence number
   space is unique for each pairing of the SSRC of command source
   and the SSRC of the command target. The sequence number SHALL be
   increased by 1 modulo 256 for each new command. A repetition
   SHALL NOT increase the sequence number. The initial value is
   arbitrary.

0: Must be set to 0 by the sender and should not be acted upon by
   the message receiver.

Payload Type (7 bits): The RTP payload type for which the VBCM bit
   stream must be interpreted.

Length (16 bits): The length of the VBCM octet string in octets
   exclusive of any padding octets.

VBCM Octet String (Variable length): This is the octet string
   generated by the decoder carrying a specific feedback
   sub-message.

Padding (Variable length): Bits set to 0 to make up a 32 bit
   boundary.

4.3.4.2. Semantics

The "payload" of the VBCM indication carries different types of
codec-specific feedback information. The type of feedback
information can be classified as a 'status report' (such as an
indication that a bit stream was received without errors, or that a
partial or complete picture or block was lost) or 'update requests'
(such as complete refresh of the bit stream).
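
As an illustration, the FCI entry layout of Figure 7, including the
padding to a 32-bit boundary, can be sketched in Python. This is a
minimal sketch under stated assumptions, not a reference encoder:
the function names are hypothetical and the octet string is treated
as opaque bytes produced elsewhere.

```python
import struct

def next_seq_nr(seq_nr):
    # New commands increase the sequence number by 1 modulo 256;
    # repetitions of a command reuse the old value.
    return (seq_nr + 1) % 256

def pack_vbcm_entry(ssrc, seq_nr, payload_type, octets):
    """Pack one VBCM FCI entry in network byte order."""
    if not 0 <= seq_nr < 256:
        raise ValueError("sequence number must fit in 8 bits")
    if not 0 <= payload_type < 128:
        raise ValueError("payload type must fit in 7 bits")
    if len(octets) > 0xFFFF:
        raise ValueError("octet string too long for 16-bit Length")
    # The bit in front of the 7-bit payload type is always 0, so the
    # payload type can be written as a whole octet.
    header = struct.pack("!IBBH", ssrc, seq_nr, payload_type, len(octets))
    # Length excludes padding; pad with zero octets to 32 bits.
    padding = b"\x00" * (-len(octets) % 4)
    return header + octets + padding
```

A five-octet string, for example, yields a 16-octet entry: eight
octets of header, the five octets themselves, and three octets of
zero padding.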

   Note: There are possible overlaps between the VBCM sub-messages
   and CCM/AVPF feedback messages, such as FIR. Please see section
   3.5.3 for further discussion.

The different types of feedback sub-messages carried in the VBCM are
indicated by the "payloadType" as defined in [VBCM]. These
sub-message types are reproduced below for convenience.
"payloadType", in ITU-T Rec. H.271 terminology, refers to the
sub-type of the H.271 message and should not be confused with an RTP
payload type.

Payload  Message Content
Type
--------------------------------------------------------------------
0        One or more pictures without detected bit stream error
         mismatch
1        One or more pictures that are entirely or partially lost
2        A set of blocks of one picture that is entirely or partially
         lost
3        CRC for one parameter set
4        CRC for all parameter sets of a certain type
5        A "reset" request indicating that the sender should
         completely refresh the video bit stream as if no prior bit
         stream data had been received
> 5      Reserved for future use by ITU-T

Table 2: H.271 message types ("payloadTypes")

The bit string or the "payload" of a VBCM message is of variable
length and is self-contained and coded in a variable length, binary
format. The media sender necessarily has to be able to parse this
optimized binary format to make use of VBCM messages.

Each of the different types of sub-messages (indicated by
payloadType) may have different semantics depending on the codec
used.

Within the common packet header for feedback messages (as defined in
section 6.1 of [RFC4585]), the "SSRC of the packet sender" field
indicates the source of the request, and the "SSRC of media source"
is not used and SHALL be set to 0. The SSRCs of the media senders
to which the VBCM message applies are in the corresponding FCI
entries.
The sender of the VBCM message MAY send H.271 messages to multiple
media senders and MAY send more than one H.271 message to the same
media sender within the same VBCM message.

4.3.4.3. Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585].
The different sub-message types may have different properties with
regard to the timing of messages that should be used. If several
different types are included in the same feedback packet then the
requirements for the sub-message type with the most stringent
requirements should be followed.

4.3.4.4. Handling of Message in Mixer or Translator

The handling of VBCM in a mixer or translator is sub-message type
dependent.

4.3.4.5. Remarks

Please see section 3.5.3 for a discussion of the usage of H.271
messages and messages defined in AVPF [RFC4585] and this memo with
similar functionality.

   Note: There has been some discussion whether the RTP payload type
   field in this message is needed. It will be needed if there is
   potentially more than one VBCM-capable RTP payload type in the
   same session, and the semantics of a given VBCM message changes
   between payload types. For example, the picture identification
   mechanism in messages of H.271 type 0 is fundamentally different
   between H.263 and H.264 (although both use the same syntax).
   Therefore, the payload type field is justified here. There was a
   further comment that for TSTR and FIR such a need does not exist,
   because the semantics of TSTR and FIR are either loosely enough
   defined, or generic enough, to apply to all video payloads
   currently in existence/envisioned.

5. Congestion Control

The correct application of the AVPF [RFC4585] timing rules prevents
the network from being flooded by feedback messages.
Hence, assuming a correct implementation and configuration, the RTCP
channel cannot break its bit rate commitment and introduce
congestion.

The reception of some of the feedback messages modifies the
behaviour of the media senders or, more specifically, the media
encoders. Thus, modified behaviour MUST respect the bandwidth
limits that the application of congestion control provides. For
example, when a media sender is reacting to a FIR, the unusually
high number of packets that form the decoder refresh point have to
be paced in compliance with the congestion control algorithm, even
if the user experience suffers from a slowly transmitted decoder
refresh point.

A change of the Temporary Maximum Media Stream Bit Rate value can
only mitigate congestion, but not cause congestion as long as
congestion control is also employed. An increase of the value by a
request REQUIRES the media sender to use congestion control when
increasing its transmission rate to that value. A reduction of the
value results in a reduced transmission bit rate, thus reducing the
risk for congestion.

6. Security Considerations

The defined messages have certain properties that have security
implications. These must be addressed and taken into account by
users of this protocol.

The defined setup signaling mechanism is sensitive to modification
attacks that can result in session creation with sub-optimal
configuration, and, in the worst case, session rejection. To
prevent this type of attack, authentication and integrity protection
of the setup signaling is required.

Spoofed or maliciously created feedback messages of the type defined
in this specification can have the following implications:

a. severely reduced media bit rate due to false TMMBR messages
   that set the maximum to a very low value;

b. assignment of the ownership of a bounding tuple to the wrong
   participant within a TMMBN message, potentially causing
   unnecessary oscillation in the bounding set as the mistakenly
   identified owner reports a change in its tuple and the true
   owner possibly holds back on changes until a correct TMMBN
   message reaches the participants;

c. sending TSTR requests that result in a video quality
   different from the user's desire, rendering the session less
   useful;

d. sending multiple FIR commands to reduce the frame-rate, and
   make the video jerky, due to the frequent usage of decoder
   refresh points.

To prevent these attacks there is a need to apply authentication and
integrity protection of the feedback messages. This can be
accomplished against threats external to the current RTP session
using the RTP profile that combines SRTP [SRTP] and AVPF into SAVPF
[SAVPF]. In the mixer cases, separate security contexts and
filtering can be applied between the mixer and the participants,
thus protecting other users on the mixer from a misbehaving
participant.

7. SDP Definitions

Section 4 of [RFC4585] defines a new SDP [RFC4566] attribute,
rtcp-fb, that may be used to negotiate the capability to handle
specific AVPF commands and indications, such as Reference Picture
Selection, Picture Loss Indication etc. The ABNF for rtcp-fb is
described in section 4.2 of [RFC4585]. In this section we extend
the rtcp-fb attribute to include the commands and indications that
are described for codec control in the present document. We also
discuss the Offer/Answer implications for the codec control commands
and indications.

7.1. Extension of the rtcp-fb Attribute

As described in AVPF [RFC4585], the rtcp-fb attribute indicates the
capability of using RTCP feedback.
         AVPF specifies that the rtcp-fb
2543     attribute must only be used as a media level attribute and must not
2544     be provided at session level. All the rules described in [RFC4585]
2545     for the rtcp-fb attribute relating to payload type and to multiple
2546     rtcp-fb attributes in a session description also apply to the new
2547     feedback messages defined in this memo.

2549     The ABNF [RFC4234] for rtcp-fb as defined in [RFC4585] is

2551        "a=rtcp-fb:" rtcp-fb-pt SP rtcp-fb-val CRLF

2553     where rtcp-fb-pt is the payload type and rtcp-fb-val defines the
2554     type of the feedback message such as ack, nack, trr-int and
2555     rtcp-fb-id. For example, to indicate the support of feedback of
2556     picture loss indication, the sender declares the following in SDP

2558        v=0
2559        o=alice 3203093520 3203093520 IN IP4 host.example.com
2560        s=Media with feedback
2561        t=0 0
2562        c=IN IP4 host.example.com
2563        m=video 49170 RTP/AVPF 98
2564        a=rtpmap:98 H263-1998/90000
2565        a=rtcp-fb:98 nack pli

2567     In this document we define a new feedback value "ccm" which
2568     indicates the support of codec control using RTCP feedback messages.
2569     The "ccm" feedback value SHOULD be used with parameters that
2570     indicate the specific codec control commands supported. In this
2571     draft we define four such parameters, namely:

2573        o "fir" indicates support of the Full Intra Request (FIR).
2574        o "tmmbr" indicates support of the Temporary Maximum Media Stream
2575          Bit Rate Request/Notification (TMMBR/TMMBN). It has an
2576          optional subparameter to indicate the session maximum packet
2577          rate (measured in packets per second) to be used. If not
2578          included, this defaults to infinity.
2579        o "tstr" indicates support of the Temporal-Spatial Trade-off
2580          Request/Notification (TSTR/TSTN).
2581        o "vbcm" indicates support of H.271 video back channel messages
2582          (VBCM). It has zero or more subparameters identifying the
2583          supported H.271 "payloadType" values.
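
As an informal illustration only (it is not part of this specification), the following Python sketch shows how an implementation might compose "a=rtcp-fb" lines for the four "ccm" parameters listed above. The function name ccm_attribute and its arguments are assumptions made for this example; the output layout simply follows the SDP examples given in this memo.

```python
def ccm_attribute(payload_type, command, smaxpr=None, vbcm_types=()):
    """Compose one a=rtcp-fb line carrying a "ccm" parameter.

    Hypothetical helper for illustration; not defined by this memo.
    """
    if command not in ("fir", "tmmbr", "tstr", "vbcm"):
        raise ValueError("unknown ccm parameter: " + command)
    parts = ["a=rtcp-fb:%s" % payload_type, "ccm", command]
    if command == "tmmbr" and smaxpr is not None:
        # Optional session maximum packet rate, in packets per second.
        parts.append("smaxpr=%d" % smaxpr)
    if command == "vbcm":
        # Zero or more supported H.271 "payloadType" values.
        parts.extend(str(t) for t in vbcm_types)
    return " ".join(parts)


print(ccm_attribute(98, "fir"))
print(ccm_attribute("*", "tmmbr", smaxpr=120))
print(ccm_attribute(98, "vbcm", vbcm_types=(1, 2)))
```

Leaving smaxpr unset produces a "tmmbr" line without the subparameter, which corresponds to the default of an unbounded packet rate described above.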
2585     In the ABNF for rtcp-fb-val defined in [RFC4585], there is a
2586     placeholder called rtcp-fb-id to define new feedback types. "ccm"
2587     is defined as a new feedback type in this document and the ABNF for
2588     the parameters for ccm is defined here (please refer to section 4.2
2589     of [RFC4585] for complete ABNF syntax).

2591        rtcp-fb-param     = SP "app" [SP byte-string]
2592                          / SP rtcp-fb-ccm-param
2593                          / ; empty

2595        rtcp-fb-ccm-param = "ccm" SP ccm-param

2597        ccm-param = "fir"   ; Full Intra Request
2598                  / "tmmbr" [SP "smaxpr=" MaxPacketRateValue]
2599                            ; Temporary max media bit rate
2600                  / "tstr"  ; Temporal Spatial Trade Off
2601                  / "vbcm" *(SP subMessageType) ; H.271 VBCM messages
2602                  / token [SP byte-string]
2603                            ; for future commands/indications
2604        subMessageType = 1*8DIGIT
2605        byte-string = <as defined in section 4.2 of [RFC4585]>
2606        MaxPacketRateValue = 1*15DIGIT

2608  7.2. Offer-Answer

2610     The Offer/Answer [RFC3264] implications for codec control protocol
2611     feedback messages are similar to those described in [RFC4585]. The
2612     offerer MAY indicate the capability to support selected codec
2613     commands and indications. The answerer MUST remove all ccm
2614     parameters corresponding to the CCM messages that it does not wish
2615     to support in this particular media session (for example because it
2616     does not implement the message in question, or because its
2617     application logic suggests the support of the message adds no
2618     value). The answerer MUST NOT add new ccm parameters in addition to
2619     what has been offered. The answer is binding for the media session
2620     and both offerer and answerer MUST NOT use any feedback messages
2621     other than what both sides have explicitly indicated as being
2622     supported. In other words, only the joint subset of CCM parameters
2623     from the offer and answer may be used.
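
The answerer rule above can be sketched informally in Python as follows. This is a minimal illustration under stated assumptions, not a normative algorithm: the names CCM_LINE, parse_ccm and answer_ccm are hypothetical, the regular expression is a simplification of the ABNF (subparameters are kept as opaque tokens), and filtering is done per ccm parameter name only.

```python
import re

# Simplified match for lines such as "a=rtcp-fb:98 ccm tmmbr smaxpr=120".
# Assumption: one SP between tokens; subparameters are not validated.
CCM_LINE = re.compile(r"^a=rtcp-fb:(\*|\d+) ccm (\S+)((?: \S+)*)$")


def parse_ccm(sdp_lines):
    """Return {(payload_type, parameter): [subparameters]} for ccm lines."""
    found = {}
    for line in sdp_lines:
        match = CCM_LINE.match(line.strip())
        if match:
            payload_type, parameter, rest = match.groups()
            found[(payload_type, parameter)] = rest.split()
    return found


def answer_ccm(offer_lines, supported):
    """Keep only the offered ccm lines whose parameter the answerer supports.

    Nothing is ever added, so the result is a subset of the offer.
    """
    kept = []
    for line in offer_lines:
        match = CCM_LINE.match(line.strip())
        if match and match.group(2) in supported:
            kept.append(line.strip())
    return kept


offer = [
    "m=video 51372 RTP/AVPF 98",
    "a=rtcp-fb:98 ccm tstr",
    "a=rtcp-fb:98 ccm fir",
    "a=rtcp-fb:* ccm tmmbr smaxpr=120",
]
# The answerer also implements "vbcm", but it was not offered, so it
# does not appear in the answer.
print(answer_ccm(offer, {"fir", "tstr", "vbcm"}))
```

Because the answer is built only by filtering the offered lines, the sketch cannot introduce a ccm parameter that the offerer did not propose.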
2625     Note that including a CCM parameter in an offer or answer indicates
2626     that the party (offerer or answerer) is at least capable of
2627     receiving the corresponding CCM message(s) and acting upon them. In
2628     cases when the reception of a negotiated CCM message mandates the
2629     party to respond with another CCM message, it must also have that
2630     capability. Although it is not mandated to initiate CCM messages of
2631     any negotiated type, it is generally expected that a party will
2632     initiate CCM messages when appropriate.

2634     The session maximum packet rate parameter part of the TMMBR
2635     indication is declarative and everyone SHALL use the highest value
2636     indicated in a response. If the session maximum packet rate
2637     parameter is not present in an offer, it SHALL NOT be included by
2638     the answerer.

2640  7.3. Examples

2642     Example 1: The following SDP describes a point-to-point video call
2643     with H.263, with the originator of the call declaring its capability
2644     to support the FIR and TSTR/TSTN codec control messages. The SDP is
2645     carried in a high level signaling protocol like SIP.

2647        v=0
2648        o=alice 3203093520 3203093520 IN IP4 host.example.com
2649        s=Point-to-Point call
2650        c=IN IP4 192.0.2.124
2651        m=audio 49170 RTP/AVP 0
2652        a=rtpmap:0 PCMU/8000
2653        m=video 51372 RTP/AVPF 98
2654        a=rtpmap:98 H263-1998/90000
2655        a=rtcp-fb:98 ccm tstr
2656        a=rtcp-fb:98 ccm fir

2658     In the above example, when the sender receives a TSTR message from
2659     the remote party, it is capable of adjusting the trade off as
2660     indicated in the RTCP TSTN feedback message.

2662     Example 2: The following SDP describes a SIP end point joining a
2663     video mixer that is hosting a multiparty video conferencing session.
2664     The participant supports only the FIR (Full Intra Request) codec
2665     control command and it declares it in its session description.
2667        v=0
2668        o=alice 3203093520 3203093520 IN IP4 host.example.com
2669        s=Multiparty Video Call
2670        c=IN IP4 192.0.2.124
2671        m=audio 49170 RTP/AVP 0
2672        a=rtpmap:0 PCMU/8000
2673        m=video 51372 RTP/AVPF 98
2674        a=rtpmap:98 H263-1998/90000
2675        a=rtcp-fb:98 ccm fir

2677     When the video MCU decides to route the video of this participant, it
2678     sends an RTCP FIR feedback message. Upon receiving this feedback
2679     message, the end point is required to generate a decoder refresh
         point.

2681     Example 3: The following example describes the Offer/Answer
2682     implications for the codec control messages. The Offerer wishes to
2683     support "tstr", "fir" and "tmmbr". The offered SDP is

2685     -------------> Offer
2686        v=0
2687        o=alice 3203093520 3203093520 IN IP4 host.example.com
2688        s=Offer/Answer
2689        c=IN IP4 192.0.2.124
2690        m=audio 49170 RTP/AVP 0
2691        a=rtpmap:0 PCMU/8000
2692        m=video 51372 RTP/AVPF 98
2693        a=rtpmap:98 H263-1998/90000
2694        a=rtcp-fb:98 ccm tstr
2695        a=rtcp-fb:98 ccm fir
2696        a=rtcp-fb:* ccm tmmbr smaxpr=120

2698     The answerer wishes to support only the FIR and TSTR/TSTN messages
2699     and the answerer SDP is

2701     <---------------- Answer

2703        v=0
2704        o=alice 3203093520 3203093524 IN IP4 otherhost.example.com
2705        s=Offer/Answer
2706        c=IN IP4 192.0.2.37
2707        m=audio 47190 RTP/AVP 0
2708        a=rtpmap:0 PCMU/8000
2709        m=video 53273 RTP/AVPF 98
2710        a=rtpmap:98 H263-1998/90000
2711        a=rtcp-fb:98 ccm tstr
2712        a=rtcp-fb:98 ccm fir

2714     Example 4: The following example describes the Offer/Answer
2715     implications for H.271 Video back channel messages (VBCM). The
2716     Offerer wishes to support VBCM and the sub-messages of payloadType 1
2717     (one or more pictures that are entirely or partially lost) and 2 (a
2718     set of blocks of one picture that are entirely or partially lost).
2720     -------------> Offer
2721        v=0
2722        o=alice 3203093520 3203093520 IN IP4 host.example.com
2723        s=Offer/Answer
2724        c=IN IP4 192.0.2.124
2725        m=audio 49170 RTP/AVP 0
2726        a=rtpmap:0 PCMU/8000
2727        m=video 51372 RTP/AVPF 98
2728        a=rtpmap:98 H263-1998/90000
2729        a=rtcp-fb:98 ccm vbcm 1 2

2731     The answerer wishes to support sub-messages of type 1 only

2733     <---------------- Answer

2735        v=0
2736        o=alice 3203093520 3203093524 IN IP4 otherhost.example.com
2737        s=Offer/Answer
2738        c=IN IP4 192.0.2.37
2739        m=audio 47190 RTP/AVP 0
2740        a=rtpmap:0 PCMU/8000
2741        m=video 53273 RTP/AVPF 98
2742        a=rtpmap:98 H263-1998/90000
2743        a=rtcp-fb:98 ccm vbcm 1

2745     So, in the above example, only VBCM indications comprised of
2746     "payloadType" 1 will be supported.

2748  8. IANA Considerations

2750     The new value "ccm" needs to be registered with IANA in the
2751     "rtcp-fb" Attribute Values registry located at the time of
2752     publication at: http://www.iana.org/assignments/sdp-parameters

2754        Value name: ccm
2755        Long Name:  Codec Control Commands and Indications
2756        Reference:  RFC XXXX

2758     A new registry "Codec Control Messages" needs to be created to hold
2759     "ccm" parameters located at time of publication at:
2760     http://www.iana.org/assignments/sdp-parameters

2762     New registrations in this registry follow the "Specification
2763     required" policy as defined by [RFC2434]. In addition, they are
2764     required to indicate any additional RTCP feedback types,
2765     such as "nack" and "ack".
2767     The initial content of the registry is the following values:

2769        Value name:  fir
2770        Long name:   Full Intra Request Command
2771        Usable with: ccm
2772        Reference:   RFC XXXX

2774        Value name:  tmmbr
2775        Long name:   Temporary Maximum Media Stream Bit Rate
2776        Usable with: ccm
2777        Reference:   RFC XXXX

2779        Value name:  tstr
2780        Long name:   Temporal-Spatial Trade-off
2781        Usable with: ccm
2782        Reference:   RFC XXXX

2784        Value name:  vbcm
2785        Long name:   H.271 video back channel messages
2786        Usable with: ccm
2787        Reference:   RFC XXXX

2789     The following values need to be registered as FMT values in the "FMT
2790     Values for RTPFB Payload Types" registry located at the time of
2791     publication at: http://www.iana.org/assignments/rtp-parameters

2792     RTPFB range
2793     Name     Long Name                                Value Reference
2794     -------- ---------------------------------------- ----- ---------
2795     Reserved                                          2     [RFCxxxx]
2796     TMMBR    Temporary Maximum Media Stream Bit       3     [RFCxxxx]
2797              Rate Request
2798     TMMBN    Temporary Maximum Media Stream Bit       4     [RFCxxxx]
2799              Rate Notification

2801     The following values need to be registered as FMT values in the "FMT
2802     Values for PSFB Payload Types" registry located at the time of
2803     publication at: http://www.iana.org/assignments/rtp-parameters

2805     PSFB range
2806     Name     Long Name                                Value Reference
2807     -------- ---------------------------------------- ----- ---------
2808     FIR      Full Intra Request Command               4     [RFCxxxx]
2809     TSTR     Temporal-Spatial Trade-off Request       5     [RFCxxxx]
2810     TSTN     Temporal-Spatial Trade-off Notification  6     [RFCxxxx]
2811     VBCM     Video Back Channel Message               7     [RFCxxxx]

2813  9. Contributors

2815     Tom Taylor has made a very significant contribution, for which the
2816     authors are very grateful, to this specification by helping rewrite
2817     the specification. Especially the parts regarding the algorithm for
2818     determining bounding sets for TMMBR have benefited.

2820  10. Acknowledgements

2822     The authors would like to thank Andrea Basso, Orit Levin, and Nermeen
2823     Ismail for their work on the requirement and discussion draft
2824     [Basso].

2826     Drafts of this memo were reviewed and extensively commented on by
2827     Roni Even, Colin Perkins, Randell Jesup, Keith Lantz, Harikishan
2828     Desineni, Guido Franceschini and others. The authors appreciate
2829     these reviews.

2831     Funding for the RFC Editor function is currently provided by the
2832     Internet Society.

2834  11. References

2836  11.1. Normative references

2838     [RFC4585]     Ott, J., Wenger, S., Sato, N., Burmeister, C., Rey, J.,
2839                   "Extended RTP Profile for Real-Time Transport Control
2840                   Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
2841                   July 2006.
2842     [RFC2119]     Bradner, S., "Key words for use in RFCs to Indicate
2843                   Requirement Levels", BCP 14, RFC 2119, March 1997.
2844     [RFC3550]     Schulzrinne, H., Casner, S., Frederick, R., and V.
2845                   Jacobson, "RTP: A Transport Protocol for Real-Time
2846                   Applications", STD 64, RFC 3550, July 2003.
2847     [RFC4566]     Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
2848                   Description Protocol", RFC 4566, July 2006.
2849     [RFC3264]     Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
2850                   with Session Description Protocol (SDP)", RFC 3264, June
2851                   2002.
2852     [RFC2434]     Narten, T. and H. Alvestrand, "Guidelines for Writing an
2853                   IANA Considerations Section in RFCs", BCP 26, RFC 2434,
2854                   October 1998.
2855     [RFC4234]     Crocker, D. and P. Overell, "Augmented BNF for Syntax
2856                   Specifications: ABNF", RFC 4234, October 2005.

2858  11.2. Informative references

2860     [Basso]       A. Basso, et al., "Requirements for transport of video
2861                   control commands", draft-basso-avt-videoconreq-02.txt,
2862                   expired Internet Draft, October 2004.
2863     [AVC]         Joint Video Team of ITU-T and ISO/IEC JTC 1, Draft ITU-T
2864                   Recommendation and Final Draft International Standard of
2865                   Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC
2866                   14496-10 AVC), Joint Video Team (JVT) of ISO/IEC MPEG
2867                   and ITU-T VCEG, JVT-G050, March 2003.
2868     [H245]        ITU-T Rec. H.245, "Control protocol for multimedia
2869                   communication", May 2006.
2870     [NEWPRED]     S. Fukunaga, T. Nakai, and H. Inoue, "Error Resilient
2871                   Video Coding by Dynamic Replacing of Reference
2872                   Pictures," in Proc. Globecom'96, vol. 3, pp. 1503 - 1508,
2873                   1996.
2874     [SRTP]        Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
2875                   K. Norrman, "The Secure Real-time Transport Protocol
2876                   (SRTP)", RFC 3711, March 2004.
2877     [RFC2032]     Turletti, T. and C. Huitema, "RTP Payload Format for
2878                   H.261 Video Streams", RFC 2032, October 1996.

2880     [SAVPF]       J. Ott, E. Carrara, "Extended Secure RTP Profile for
2881                   RTCP-based Feedback (RTP/SAVPF),"
2882                   draft-ietf-avt-profile-savpf-11.txt, February, 2007.
2883     [RFC3525]     Groves, C., Pantaleo, M., Anderson, T., and T. Taylor,
2884                   "Gateway Control Protocol Version 1", RFC 3525, June
2885                   2003.
2886     [RFC3448]     M. Handley, S. Floyd, J. Padhye, J. Widmer, "TCP
2887                   Friendly Rate Control (TFRC): Protocol Specification",
2888                   RFC 3448, Jan 2003.
2889     [VBCM]        ITU-T Rec. H.271, "Video Back Channel Messages", June
2890                   2006.
2891     [RFC3890]     Westerlund, M., "A Transport Independent Bandwidth
2892                   Modifier for the Session Description Protocol (SDP)",
2893                   RFC 3890, September 2004.
2894     [RFC4340]     Kohler, E., Handley, M., and S. Floyd, "Datagram
2895                   Congestion Control Protocol (DCCP)", RFC 4340, March
2896                   2006.
2897     [RFC3261]     Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
2898                   A., Peterson, J., Sparks, R., Handley, M., and E.
2899                   Schooler, "SIP: Session Initiation Protocol", RFC 3261,
2900                   June 2002.
2901     [RFC2198]     Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
2902                   Handley, M., Bolot, J., Vega-Garcia, A., and S.
2903                   Fosse-Parisis, "RTP Payload for Redundant Audio Data",
2904                   RFC 2198, September 1997.
2905     [Topologies]  M. Westerlund, and S.
                       Wenger, "RTP Topologies",
2906                   draft-ietf-avt-topologies-06, work in progress, Aug 2007.
2907     [XML-MC]      O. Levin, R. Even, P. Hagendorf, "XML Schema for Media
2908                   Control," draft-levin-mmusic-xml-media-control-11, work
2909                   in progress, July 2007.

2911  12. Authors' Addresses

2913     Stephan Wenger
2914     Nokia Corporation
2915     975, Page Mill Road,
2916     Palo Alto, CA 94304
2917     USA

2919     Phone: +1-650-862-7368
2920     EMail: stewe@stewe.org

2922     Umesh Chandra
2923     Nokia Research Center
2924     975, Page Mill Road,
2925     Palo Alto, CA 94304
2926     USA

2928     Phone: +1-650-796-7502
2929     EMail: Umesh.1.Chandra@nokia.com

2931     Magnus Westerlund
2932     Ericsson Research
2933     Ericsson AB
2934     SE-164 80 Stockholm, SWEDEN

2936     Phone: +46 8 7190000
2937     EMail: magnus.westerlund@ericsson.com

2939     Bo Burman
2940     Ericsson Research
2941     Ericsson AB
2942     SE-164 80 Stockholm, SWEDEN

2944     Phone: +46 8 7190000
2945     EMail: bo.burman@ericsson.com

2947  Full Copyright Statement

2949     Copyright (C) The IETF Trust (2007).

2951     This document is subject to the rights, licenses and restrictions
2952     contained in BCP 78, and except as set forth therein, the authors
2953     retain all their rights.

2955     This document and the information contained herein are provided on an
2956     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2957     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST
2958     AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
2959     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
2960     THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
2961     IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
2962     PURPOSE.
2964  Intellectual Property

2966     The IETF takes no position regarding the validity or scope of any
2967     Intellectual Property Rights or other rights that might be claimed to
2968     pertain to the implementation or use of the technology described in
2969     this document or the extent to which any license under such rights
2970     might or might not be available; nor does it represent that it has
2971     made any independent effort to identify any such rights. Information
2972     on the procedures with respect to rights in RFC documents can be
2973     found in BCP 78 and BCP 79.

2975     Copies of IPR disclosures made to the IETF Secretariat and any
2976     assurances of licenses to be made available, or the result of an
2977     attempt made to obtain a general license or permission for the use of
2978     such proprietary rights by implementers or users of this
2979     specification can be obtained from the IETF on-line IPR repository at
2980     http://www.ietf.org/ipr.

2982     The IETF invites any interested party to bring to its attention any
2983     copyrights, patents or patent applications, or other proprietary
2984     rights that may cover technology that may be required to implement
2985     this standard. Please address the information to the IETF at
2986     ietf-ipr@ietf.org.

2988  Acknowledgement

2990     Funding for the RFC Editor function is provided by the IETF
2991     Administrative Support Activity (IASA).

2993  RFC Editor Considerations

2995     The RFC editor is requested to replace all occurrences of XXXX with
2996     the RFC number this document receives.