Network Working Group                                      M. Westerlund
Internet-Draft                                                        B.
Burman
Intended status: Standards Track                                 L. Hamm
Expires: September 6, 2012                                      Ericsson
                                                           March 5, 2012

                  Codec Operation Point RTCP Extension
            draft-westerlund-avtext-codec-operation-point-00

Abstract

   The Audio-Visual Profile with Feedback (AVPF) specification defines a
   framework and messages for fast feedback and media control over RTCP.
   The Codec Control Messages (CCM) specification defines an extension
   to AVPF, by specifying additional messages for codec control and
   feedback.  This specification extends CCM, by specifying messages
   that let participants dynamically communicate a set of codec
   configuration parameters, which enables better optimization of
   resource efficiency and quality of media transmission.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 6, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Terminology
     2.2.  Abbreviations
     2.3.  Requirements Language
   3.  Motivation
     3.1.  Problem Description
     3.2.  Legacy Methods
       3.2.1.  Relation to SDP
       3.2.2.  Relation to RTCP
   4.  Use Cases for COP
     4.1.  Point to Point
     4.2.  Media Receiver to RTP Mixer
     4.3.  RTP Mixer to Media Sender
     4.4.  Media Receiver in Multicast or with RTP Transport Translator
   5.  Requirements
   6.  Solution Overview
     6.1.  Message Structure
     6.2.  Codec Configuration Parameter Use
     6.3.  Operation Point
     6.4.  Request
     6.5.  Notification
     6.6.  Status Report
     6.7.  Adding and Removing Operation Points
   7.
       Codec Control Message Extension
     7.1.  COP Message
     7.2.  FCI Format
       7.2.1.  Message Item Format
       7.2.2.  Message Item Types
       7.2.3.  Operation Point Identification
     7.3.  Codec Operation Point Notification
       7.3.1.  Message Format
       7.3.2.  Semantics
       7.3.3.  Timing Rules
       7.3.4.  Handling in Mixers and Translators
     7.4.  Codec Operation Point Request
       7.4.1.  Message Format
       7.4.2.  Semantics
       7.4.3.  Timing Rules
       7.4.4.  Handling in Mixers and Translators
     7.5.  Codec Operation Point Status
       7.5.1.  Message Format
       7.5.2.  Semantics
       7.5.3.  Timing Rules
       7.5.4.  Handling in Mixers and Translators
   8.  Parameter Types
     8.1.  Parameter Format
     8.2.  ALT
     8.3.  ID
     8.4.  Payload Type
     8.5.  Bitrate
     8.6.  Token Bucket Size
     8.7.  Framerate
     8.8.  Horizontal Pixels
     8.9.  Vertical Pixels
     8.10. Channels
     8.11. Sampling Rate
     8.12. Maximum RTP Packet Size
     8.13. Maximum RTP Packet Rate
     8.14. Application Data Unit Aggregation
   9.  SDP Extensions
     9.1.  Extension of the rtcp-fb Attribute
     9.2.  Offer/Answer Usage
     9.3.  Declarative Usage
   10. Codec Sub-Stream Identification
     10.1. H.264 AVC
     10.2. H.264 SVC
   11. Examples
     11.1. SDP Offer/Answer
     11.2. Dynamic Video Re-sizing
     11.3. Illegal Request
     11.4. Reference Response to Modification of Scalable Layer
     11.5. Successful Request to Add Codec Operation Point
   12. IANA Considerations
   13. Security Considerations
   14. Open Issues
   15. Acknowledgements
   16. References
     16.1. Normative References
     16.2. Informative References
   Authors' Addresses

1.  Introduction

   Multimedia real-time communication services, such as video telephony
   and videoconferencing, use the Real-time Transport Protocol (RTP)
   and its control protocol (RTCP) [RFC3550] to transmit media streams,
   such as audio and video.  A session establishment protocol, such as
   SIP [RFC3261], in combination with a capability negotiation
   protocol, such as SDP offer/answer [RFC3264], is normally used to
   establish the session and negotiate media capabilities.  In some
   cases, a set of codec parameters is negotiated that does not express
   any specific limit or capability, but just describes a certain codec
   configuration.

   During session establishment, the participating endpoints normally
   have limited knowledge about the session environment, e.g. whether
   the session will be point-to-point or contain some multi-party
   scenario, how users will interact with the application, how network
   conditions will vary during the session, etc.  To take those
   variations into account, the participants can re-negotiate session
   parameters to better suit the communication environment.  When
   variations or changes are frequent, the required reaction time is
   short, which may make repeated session re-negotiation inefficient,
   too slow, or both.  In addition, variations may not even affect
   negotiated session parameters, if the variations occur within the
   negotiated boundaries.

   The above scenario can become critical especially in cases where a
   given media stream is transmitted towards, and received by, multiple
   receivers.  In multi-party environments, scalable encoding or
   simulcast can be used to make the system more efficient and provide
   better quality to participants that are capable of receiving and
   utilizing the higher quality.  In these use cases, a sending party
   is requested to deliver multiple encoder operation points.

   The Audio-Visual Profile with Feedback (AVPF) specification [RFC4585]
   defines a framework and messages for fast feedback and media control
   over RTCP.  The Codec Control Messages (CCM) specification [RFC5104]
   defines an extension to AVPF, by specifying additional messages for
   codec control and feedback.  This specification extends CCM, by
   specifying messages that let participants dynamically communicate a
   set of codec configuration parameters, which enables better
   optimization of resource efficiency and quality of media
   transmission.

   The codec configuration parameters specified in this document focus
   on some basic audio and video properties, such as video resolution,
   video frame rate, media stream bit-rate, audio sampling rate, number
   of audio channels, and maximum RTP packet size and rate.  Additional
   parameters can be standardized in the future.

   The codec control messages are not meant to replace configuration
   performed using e.g. SDP.  Instead, the messages can be used to
   communicate dynamic and frequent changes that take place within
   boundaries that have been negotiated as part of the session
   establishment.

2.  Definitions

2.1.  Terminology

   The following terms and abbreviations are used in this document:

   Bandwidth:  The network resource needed to transport a certain
      bitrate and any transport overhead, measured in bits per second.
      There will be spare network bandwidth when the (media) data
      bitrate plus overhead is less than the available bandwidth.
      Similarly, data will have to be buffered when the available
      bandwidth excluding transport overhead is less than the bitrate
      used by the sender, or the excess data will be lost.  The
      available bandwidth typically varies dynamically over time.

   Bitrate:  The amount of (media) data transmitted per time unit,
      measured in bits per second, utilizing some amount of the
      available network bandwidth resource.
      In the context of this
      specification and unless otherwise specified, it excludes IP/UDP/
      RTP overhead.  Depending on the (media) data source, the bitrate
      can either be constant or vary dynamically over time.

   Codec Configuration Parameter:  The configurable value describing a
      certain codec property, which may impact user-perceived media
      fidelity, encoded media stream characteristics, or both.  The
      parameter has a type (Codec Parameter Type, see below) and a
      value, where the type describes what kind of codec property is
      controlled, and the value describes the property setting as well
      as how the value should be used in comparison operations.  A
      single Parameter Value can express one specific value or an open-
      ended range.  A pair of Parameter Values with different comparison
      types can describe a value range.  Such a value range can also be
      combined with a third, target value within that range.

   Codec Operation Point:  Also denoted just Operation Point.  A set of
      Codec Configuration Parameter values, describing the
      characteristics of one single encoding.  For scalable encoding, it
      describes the resulting characteristics from combining a set of
      dependent sub-streams.

   Codec Parameter Type:  The specific type of a Codec Configuration
      Parameter.  Each parameter type defines what unit the value has.
      This specification defines a number of generally useful parameter
      types in Section 8 that can be used to control codec operation.

   Encoding:  A particular encoding is the resulting media stream from
      applying a certain choice of Codec Configuration Parameters to the
      encoder.  The media stream will have a certain fidelity (quality)
      from that encoding through the choice of sampling, bit-rate and
      other configuration parameters.

   Endpoint:  A host or node that has a presence in the RTP session
      with one or more Synchronization Sources (SSRCs).

   Mixer:  An RTP session centralized node that generates media streams
      based on incoming media streams from other endpoints.  See Topo-
      Mixer in RTP Topologies [RFC5117].

   RTP Session:  An association among a set of participants
      communicating with RTP.  The distinguishing feature of an RTP
      session (defined in [RFC3550]) is that each RTP session maintains
      a full, separate space of SSRC identifiers.  Each participant in
      the RTP session can see SSRC or CSRC identifiers from the other
      participants, either by RTP, RTCP, or both.

   Sub-Stream:  An individually decodable part of a scalable media
      stream, including all dependent sub-streams.  The characteristics
      of a certain sub-stream can be described by a Codec Operation
      Point.

   Translator:  An RTP session centralized node that forwards all media
      streams from other endpoints, modified to some extent, e.g.
      addressing, encoding, fidelity.  See Topo-Translator in RTP
      Topologies [RFC5117].

2.2.  Abbreviations

   AVC:  Advanced Video Coding

   AVPF:  Extended RTP Profile for RTCP-Based Feedback

   CCP:  Codec Configuration Parameter

   COP:  Codec Operation Point

   COPN:  Codec Operation Point Notification

   COPR:  Codec Operation Point Request

   COPS:  Codec Operation Point Status

   CPT:  Codec Parameter Type

   FCI:  Feedback Control Information

   FMT:  Feedback Message Type

   GUI:  Graphical User Interface

   MST:  Multi-Session Transmission

   MVC:  Multiview Video Coding

   OP:  Operation Point

   OPID:  Operation Point Identification number

   PPS:  Picture Parameter Set

   SPS:  Sequence Parameter Set

   SST:  Single-Session Transmission

   SVC:  Scalable Video Coding

   TLV:  Type-Length-Value

2.3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Motivation

3.1.
Problem Description

   Networks can contain endpoints with different capabilities, including
   CPU power, capture and render device fidelity (e.g. image
   resolution), and codecs.  In addition, the characteristics and
   properties of networks can vary, which endpoints have to cope with.
   For example, in videoconferencing and telepresence services, a large
   number of endpoints may participate, and there may be a large number
   of media streams associated with the session.  Such multi-party
   scenarios typically use entities for media mixing, switching and
   transcoding.  The aim is generally to provide the best possible
   quality to each endpoint, taking endpoint and network capabilities
   into consideration.

   Many communication services today use codecs that can be configured
   in a number of different ways.  Often, the codecs have multiple
   properties that can be configured and those properties may also be
   inter-related, often in complex ways.  One example is the H.264 (AVC)
   [H264] video codec and its scalable (SVC) and multi-view (MVC)
   versions.  Most other video codecs, and codecs for many other types
   of media, also have multiple configurable properties.  Such
   configurable properties will be referred to as "Codec Configuration
   Parameters" in this specification.

   There can be several reasons to change the media rate or other
   encoding or packetization properties during an ongoing communication
   session.  One reason can be that the available network bandwidth
   varies.  Another reason can be that other network properties change,
   such as effective MTU or packet rate limitations.  Other reasons can
   be that the quality or representation of the media rendered to the
   end user changes, maybe as a direct result of the user manipulating
   the GUI (e.g. changing window position or size), the relative
   importance of the received media stream changes (e.g.
   active or non-active speaker in a conferencing scenario), or the
   user selects to show some other content source that is available
   among the advertised media streams.

   The codec changes above can be made directly between endpoints in a
   point-to-point scenario, or they may involve, and be acted upon by,
   media-aware intermediaries (e.g. RTP mixers).  An RTP mixer can do
   transcoding to provide each receiver with media streams of adapted
   quality, but transcoding has drawbacks as it always consumes
   processing power, typically impacts media quality in a negative way,
   and often introduces additional delays.

   In order to avoid separate transcoding towards each endpoint, an RTP
   Mixer can, by taking the capabilities of the endpoints into account,
   decide to request specific codec configurations from endpoints, which
   will minimize the need for transcoding.  Also, in scenarios where no
   RTP Mixers are used and transmitted media reaches multiple endpoints,
   the sender will have to take into account that each endpoint may have
   different capabilities.  The use cases section (Section 4) shows
   different use cases, with and without RTP Mixers.
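
   As an informal illustration of what "requesting a specific codec
   configuration" can mean, the sketch below models the value range
   with optional target that Section 2.1 describes for a Codec
   Configuration Parameter, and shows a sender picking a value it
   supports inside that range.  It is not the COP wire format.  The
   names ParameterRange and choose_value are hypothetical, and the
   fallback of preferring the largest acceptable value when no target
   is given is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ParameterRange:
    """Hypothetical in-memory view of one requested parameter.

    minimum/maximum of None model an open-ended range; target is the
    optional preferred value inside the range.
    """
    param_type: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    target: Optional[float] = None

    def accepts(self, value: float) -> bool:
        """True if value lies inside the (possibly open-ended) range."""
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


def choose_value(rng: ParameterRange,
                 supported: Sequence[float]) -> Optional[float]:
    """Pick a sender-supported value that the receiver accepts.

    With a target, the acceptable value closest to it wins.  Without
    one, the largest acceptable value is used (an assumption of this
    sketch).  None means no supported value fits the range.
    """
    acceptable = [v for v in supported if rng.accepts(v)]
    if not acceptable:
        return None
    if rng.target is None:
        return max(acceptable)
    return min(acceptable, key=lambda v: abs(v - rng.target))


# A receiver accepts 10 to 30 frames per second and prefers 25.
framerate = ParameterRange("framerate", minimum=10, maximum=30, target=25)
encoder_rates = [7.5, 15, 24, 30, 60]   # rates this encoder can produce
print(choose_value(framerate, encoder_rates))
```

   Because the receiver states a range rather than one value, the
   sender in this example can settle on a rate that its encoder
   actually supports instead of failing on an exact-match request.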

   Resource optimization involving bandwidth is expected to be one of
   the major reasons for changing encoding properties, since it is in
   general desirable to avoid using more bandwidth than absolutely
   necessary, especially considering that

   o  the expectation for high media quality will likely continue to
      increase;

   o  the bitrate required to transmit the media, despite increasingly
      efficient media coding, can, due to the above, also be expected
      to increase;

   o  the relation between media bitrate and media codec configuration,
      the used set of media codec property values, is typically complex
      and the mapping between each individual codec property and bitrate
      is in general not linear;

   o  the used media bitrate does not uniquely identify the media codec
      configuration, but there are in general multiple codec
      configurations that can generate the same media bitrate;

   o  the media receiver's preferences for how the codec property
      values should be set for a certain media bitrate will typically
      vary with the specific end-user service requirements (for
      example, but not limited to, users with special needs) and the
      current media stream role in the application;

   o  the communication scenarios will not be limited to point-to-point,
      potentially involving multiple and at least partly conflicting
      constraints from different receivers; and

   o  the available bandwidth is commonly a scarce and/or costly
      resource and will likely continue to be so also in the future.

   Other resources that may be desirable to optimize include, but are
   not limited to, endpoint and middle node processing (CPU)
   utilization, and transport quality (QoS).

   A media receiver cannot be assumed to know exactly what codec
   configuration will be best for the media sender to use, given that
   the sender needs to take multiple aspects into account, including
   implementation limitations in the actual encoder.
   It should be more
   likely to find a value acceptable to both sender and receiver if the
   receiver can indicate an acceptable range instead of just a single
   value.

   When an RTP Mixer distributes streams to multiple receivers with
   different media quality requirements, it is sometimes possible to
   avoid targeted transcoding for every single receiver.  That can be
   accomplished if the media sender has the ability to produce multiple
   media versions, for example through scalable encoding or simulcast.
   Thus there is a need to both address specific media versions and
   describe the fact that multiple media versions with different
   configurations should be used.

3.2.  Legacy Methods

3.2.1.  Relation to SDP

   The Session Description Protocol (SDP) [RFC4566] is commonly used to
   negotiate and configure codecs and establish RTP/RTCP session
   parameters during session establishment, and during sessions, e.g. by
   using it in conjunction with SIP [RFC3261] and SDP Offer/Answer
   [RFC3264].

   As described in Section 3.1 above, many of the underlying reasons
   that make media receivers desire certain codec encoding properties
   are highly dynamic in nature, and using SIP/SDP to re-negotiate the
   session will in many cases be too slow to be useful.  SIP messages
   containing an SDP may become quite large for sessions containing many
   media, and since there is no defined way to send a partial SDP, even
   very small changes require sending the entire SDP.  Most of the
   currently defined properties in SDP are also oriented to be common
   for all media streams in the same RTP session, rather than be
   specific to one media stream.

   The mechanism in this specification does not replace SDP, or the SDP
   Offer/Answer mechanism.
   It is expected that SDP is used in order to
   negotiate and configure boundary values for codec properties, and COP
   can then be used to communicate specific values within those
   boundaries, as long as there is no impact on the values negotiated
   using SDP.  It is possible to establish communication sessions even
   if one or more endpoints do not support COP.

3.2.2.  Relation to RTCP

   As discussed in CCM, regular RTCP reporting or extended reports
   [RFC3611] can to some extent be used to re-configure an encoder, but
   the reported measures seldom map directly back to encoding properties
   and they typically cannot express an unwanted situation in terms of
   encoding properties and what the receiver would like to receive
   instead.  Communicating codec properties indirectly as a set of
   network properties will require interpretation by both sender and
   receiver and will thus risk misinterpretations and ambiguity.  Since
   it is likely that a decoder is able to identify unwanted
   characteristics of the media stream in terms of encoding properties,
   the most straightforward approach is to convey those properties
   directly to the encoder.

   Responsive techniques to control encoding are already available, e.g.
   Codec Control Messages (CCM) [RFC5104].  Although highly applicable,
   the possibilities to control encoding are, however, not explicit
   enough, both in terms of the number of available parameters to
   control, and the fact that they may be inter-related, alternative,
   or both.

   Some codecs define codec-specific methods to enable receiver control
   of some encoding aspects, but it should be beneficial for
   interoperability to use codec-agnostic signaling instead.

4.  Use Cases for COP

   This section discusses a number of use cases for Codec Operation
   Points.

4.1.
Point to Point

   These use cases all assume that communication is directly point to
   point between a media sender and a receiver.  There is no need for
   further forwarding of the media streams.  Thus, the goal should be
   to produce a media stream and transport it to the media receiver,
   where it is consumed as optimally as possible for the application.
   Thanks to this one-to-one mapping between encoder and decoder, great
   flexibility exists to produce a media stream tailored to the
   receiver's needs, given the constraints that exist from media
   sender, transport network and the receiver.

   Some constraints will be static (and thus suitable for session
   configuration signalling), but a number of these are highly dynamic
   and thus desirable to adapt to during the session:

   Video Resolution in GUI:  In a video communication application,
      including WebRTC-based ones, the window where the media sender's
      media stream is presented may change, for example due to the user
      modifying the size of the window.  It might also be due to other
      application related actions, like selecting to show a
      collaborative work space and thus reducing the area used to show
      the remote video in.  In both of these cases it is the receiver
      side that knows how big the actual screen area is and what the
      most suitable resolution would be.  It thus appears suitable to
      let the receiver request the media sender to send a media stream
      conforming to the displayed video size.

   Network Bit-rate Limitations:  If the receiver discovers a network
      bandwidth limitation, it can choose to meet it by requesting media
      stream bit-rate limitations.  Especially in cases where a media
      sender provides multiple media streams, the relative distribution
      of available bit-rate could help the application provide the most
      suitable experience in a constrained situation.

   CPU Constraint:  A media receiver may become constrained in the
      amount of available processing resources.  This may occur in the
      middle of a session for example due to the user selecting a power
      saving mode, or starting additional applications requiring
      resources.  When this occurs, the receiving application can select
      which codec parameters to constrain, and by how much, to best
      suit the needs of the application.  For example, a lower
      framerate may be a better constraint than a lower resolution.

4.2.  Media Receiver to RTP Mixer

   This section considers a multiparty session with a centralized media
   intermediary, like an RTP mixer, where the media receiver uses COP to
   affect the delivered media.

              +------------+        +---+
              |            |--RTP-->| B |
              |            |<--COP--|   |
              |            |        +---+
              |            |
   +---+      |            |        +---+
   | A |-RTP->|   Mixer    |--RTP-->| C |
   +---+      |            |        +---+
              |            |
              |            |        +---+
              |            |--RTP-->| D |
              +------------+        +---+

         Figure 1: Receiver (B) using COP to adapt media stream

   In Figure 1 above we focus on the possible usages of COP by a
   media receiver, like B.  Here the functional role of the intermediary
   becomes important.  An RTP mixer uses its own SSRC(s) to channel
   selected media streams to B from other participants like A.  If the
   intermediary is instead a translator, Receiver B sees A's SSRC(s)
   directly, rather than possibly as CSRC(s).  We will in this section
   focus on the Mixer case.  The RTP translator case is further
   discussed in Section 4.4.

   The RTP mixer's usage of its own SSRC allows particular
   mixer-to-receiver media flows to be associated with a particular
   role or purpose in the application rather than a given media source.
   When
   there exist multiple RTP streams from the mixer to a receiver, the
   receiver can use COP to request an operation point that better suits
   the receiver's needs on each particular stream, and possibly the
   role of the media stream.  It also allows the receiver to select its
   desired trade-off in properties and quality between multiple
   delivered media streams.

   There are several reasons why B would need to indicate changes in
   its capabilities to receive a particular media stream:

   Network Path:  The receiver detects changes in the network that in
      the mid to long term will result in a new capability regarding
      the maximum bit-rate that can be supported.

   Bandwidth trade-off:  In an application receiving multiple media
      streams, the receiving application may want to change the
      relative bit-rate trade-off between the streams.

   Presentation or Graphical User Interface Changes:  Changes to the
      presentation or graphical user interface (GUI) on the receiving
      side can result in other requirements or needs on the media
      streams.  For example, if the application window is re-sized by
      the user, the amount of screen real estate available to present
      the different video elements changes.  To optimize the video
      quality in relation to bit-rate, the receiver indicates the new
      preferred video resolution.

   In all the above cases the receiver sends a COP request to the mixer
   for new codec operation points on mixer controlled media stream(s).
   It then becomes the mixer's responsibility to determine if and how
   the requested COPs can be supported, for example by requesting new
   operation points from the media source as discussed in Section 4.3.
   The selection of another media source to deliver in a media stream
   may require the mixer to update the receiver on the properties of
   the operation point.

4.3.
RTP Mixer to Media Sender 603 This section looks at the usage of COP in multiparty sessions where a 604 centralized media intermediary, like an RTP mixer, selects and 605 requests the tailored media stream or streams that a media sender delivers 606 to the intermediary for further forwarding or manipulation. This 607 usage can be simplified down to looking at the media streams from one 608 media sender (A), which are currently being delivered to multiple 609 receivers (B-D) as depicted in Figure 2. 611 +------------+ +---+ 612 | |--RTP-->| B | 613 | | +---+ 614 +---+ | | 615 | A |<-COP-| | +---+ 616 | |-RTP->| Mixer |--RTP-->| C | 617 +---+ | | +---+ 618 | | 619 | | +---+ 620 | |--RTP-->| D | 621 +------------+ +---+ 623 Figure 2: Mixer using COP to adapt media streams to multiple 624 receivers 626 The media paths from the Mixer to B, C and D are different and thus 627 the available resources may vary between them. In addition B, C and 628 D may have different capabilities when it comes to handling media 629 streams. These limitations can be learned by the Mixer through 630 session configuration signalling, media transmission feedback (e.g. 631 RTCP), or usage of COP by the receivers (see Section 4.2). 632 Limitations are also expected to be updated during the session 633 lifetime. 635 The media sender (A) has certain capabilities and what is possible to 636 do will depend on A's capabilities and what has been configured 637 between A and the Mixer. Let's look at a few different cases of the 638 capabilities A may have and how they influence how the Mixer can use 639 COP to affect the media stream(s) delivered to the Mixer. 641 Single Media Encoding: If A can only provide a single media encoding 642 of a particular media source, then the Mixer has to make a choice 643 on what properties it would like to request for that media stream. 644 The most basic choice is to request the lowest common denominator 645 across the receiver population.
If the mixer has certain 646 capabilities for media transcoding it could choose to request 647 another operation point for the media encoding with higher quality 648 and then transcode for a few receivers. That enables a higher 649 quality to several receivers while still being able to serve end- 650 points with the least capabilities. In these cases the Mixer has 651 to make COP requests that indicate only a single operation point 652 with parameters that best match the restrictions. 654 Scalable Media Encoding: If A is capable of producing a scalable 655 media stream encoding, the Mixer can request multiple operation 656 points for the same media stream. For example, if A is capable of 657 producing three different operation points, the Mixer in the above 658 Figure 2 would potentially be able to request scalability layers 659 that would allow it to match the capabilities of all three 660 receivers B, C and D. If several receivers are close in 661 capabilities, the mixer may choose to request fewer operation 662 points. Something that arises in this use case, which wasn't 663 present in the single media encoding case above, is that the mixer must 664 determine which packets or parts of packets are to be sent to 665 each receiver based on their capabilities. This requires that the 666 mixer is capable of identifying in the media stream which 667 scalability layers match a given requested operation point. 668 Thus it is desirable that the media sender can indicate to the Mixer 669 what layer(s) match a given operation point. 671 Simulcast Media: If A and the Mixer have negotiated the usage of 672 simulcast media encoding of the media source, then the Mixer can 673 adopt several operation points to best suit the receiver set, just 674 like for scalable encoding. When simulcasting, the mixer will 675 however have to send one COP request per media stream it actually 676 wants to affect.
Some consideration is necessary to ensure that 677 configuration changes over multiple media streams from the same 678 media source take place. Compared to scalable media, the mixer 679 need not strip away any layers to get at a particular operation 680 point but can forward entirely self-contained media streams. 682 The use of COP as described above can be triggered by a multitude of 683 reasons. We will here discuss some of them. We already mentioned 684 that bit-rate adaptation (congestion control) on the Mixer to 685 receiver path can indicate a need to change an operation point. 686 Another reason is when a new session participant with 687 certain receiver capabilities joins (decoding or other hardware, as 688 well as network path related), thus potentially changing the optimal 689 set of operation points. There also exist a number of different 690 cases where the desired application behavior results in changes in 691 desired operation points, like change of active speakers, 692 reconfiguration of the display layout, etc. 694 It is also important to remember that Figure 2 only presents the view 695 of a single media sender. In most communication sessions there are 696 multiple media senders, and the mixer will need to take the 697 combination of media streams from multiple media senders into account 698 when choosing what is to be sent to a given receiver. Thus changes 699 at one media sender can result in related changes of the operation 700 points at the other media senders. 702 4.4. Media Receiver in Multicast or with RTP Transport Translator 704 This section covers usage of COP in multicast transported RTP 705 sessions, as well as when transport translators [RFC5117] are used. 706 Transport translators can be used to emulate Any-Source Multicast 707 (ASM) over unicast.
Multicast usages also include Source Specific 708 Multicast (SSM) [RFC4607], which according to "RTP Control Protocol 709 (RTCP) Extensions for Single-Source Multicast Sessions with Unicast 710 Feedback" [RFC5760] has two main modes; simple mode and summary 711 feedback mode, affecting the usage of functionality that COP 712 provides. 714 +---+ +------------+ +---+ 715 | A |<---->| |<---->| B | 716 +---+ | | +---+ 717 | Translator | 718 +---+ | | +---+ 719 | C |<---->| |<---->| D | 720 +---+ +------------+ +---+ 722 Figure 3: Transport Translator 724 A transport translator [RFC5117] , which main purpose is to forward 725 any incoming packets to all the other session participants, emulates 726 an ASM session. As anyone can send to all other in both cases, there 727 are some properties in these sessions that can make use in large 728 scale sessions with many participants require some extra 729 consideration. 731 +-----+ +-----+ +-----+ 732 | MS1 | | MS2 | .... | MSm | 733 +-----+ +-----+ +-----+ 734 ^ ^ ^ 735 | | | 736 V V V 737 +---------------------------------+ 738 | Distribution Source | 739 +--------+ | 740 | FT Agg | | 741 +--------+------------------------+ 742 ^ ^ | 743 : . | 744 : +...................+ 745 : | . 746 : / \ . 747 +------+ / \ +-----+ 748 | FT1 |<----+ +----->| FT2 | 749 +------+ / \ +-----+ 750 ^ ^ / \ ^ ^ 751 : : / \ : : 752 : : / \ : : 753 : : / \ : : 754 : ./\ /\. : 755 : /. \ / .\ : 756 : V . V V . V : 757 +----+ +----+ +----+ +----+ 758 | R1 | | R2 | ... |Rn-1| | Rn | 759 +----+ +----+ +----+ +----+ 761 Figure 4: SSM based RTP session 763 In the above Figure 4, the media senders (MS1 .., MSm) send their 764 media streams and RTCP traffic to the distribution source (DS). The 765 DS forwards the RTP and RTCP traffic from the media senders to the 766 SSM group. Using the RTCP extension for unicast RTCP feedback 767 [RFC5760], the receivers (R1...Rn) send their RTCP traffic to their 768 configured feedback target. 
This sample session has two feedback 769 targets to scale with the number of receivers. RTCP messages that 770 need to go to a media sender are forwarded to the FT aggregator part 771 of the distribution source for further forwarding over the unicast 772 paths between the distribution source and the media senders. The 773 feedback target and the feedback aggregator also forward all RTCP 774 messages from receivers in simple mode, and aggregate them in summary 775 mode. Some RTCP messages from a receiver may still have to be 776 forwarded over the SSM group. 778 COP needs to support some reasonable functionality over the different 779 multiparty topologies described above and it is also important that 780 COP does not cause significant issues in any of the environments. 782 In the basic case, where only a single multicast group exists, there 783 is a well known problem associated with adapting content and bit-rate 784 to the receiver population. The more receivers, the larger the 785 potential for non-matching requirements in requests from the 786 different receivers. One strategy for meeting this is to use the 787 lowest common denominator among the requests from the receiver 788 population. This normally results in sub-optimal quality for a 789 significant part of the session participants, the main benefit being 790 that all participants will be able to receive some content. 792 Because of the above limitations of operation within a single group, 793 usage of COP in larger groups becomes difficult unless the parameters 794 that can be adapted and affected by COP requests are such that a 795 limited set of participants is expected to request them, and the 796 impact on the others is limited or acceptable. The authors 797 therefore expect the usage of COP in large groups to be limited and 798 this specification focuses on operation in smaller groups.
However, 799 as it is not possible to define the threshold when a group changes 800 from being small to being too large to work well with COP in the generic 801 case, it is important that COP can operate safely in a large group, 802 although the possibilities to satisfy the request may be severely 803 limited. 805 There also exist use cases for COP where the media application uses 806 multiple multicast groups to enable multiple operation points and 807 allows each receiver to join the multicast groups that suit the 808 participant's capabilities. An example of such usage would be 809 Scalable Video Coding (SVC) using the Multi-Session Transport (MST) 810 mode of the SVC RTP payload format [RFC6190]. The SVC MST RTP 811 streams that are sent in each group can still contain multiple 812 scalability layers; one could combine coarse-grained control on the 813 operation points by having the receiver join a particular session 814 with a more fine-grained control using COP to adjust the included 815 scalability layers to suit the receiver's needs, such as lower CPU 816 load. 818 5. Requirements 820 The solution outlined in this specification should fulfill the 821 following requirements: 823 REQ-1: Enable dynamic control of possibly inter-related codec 824 properties during an ongoing media session. 826 REQ-2: Be media type agnostic, to the furthest extent possible, and 827 at least cover audio and video media. 829 REQ-3: Be codec agnostic (within the same media type), to the 830 furthest extent possible. 832 REQ-4: Work with different media transmission types, i.e. single- 833 stream, simulcast, single-stream scalable, and multi-stream 834 scalable transmission. 836 REQ-5: Work with un-encrypted as well as encrypted media. 838 REQ-6: Be extensible, making it simple to add control and 839 description of new codec properties. 841 REQ-7: Complement rather than conflict with other codec 842 configuration methods such as other RTCP based techniques and 843 SDP.
845 REQ-8: Support configurable parameters that are directly visible in 846 the media stream as well as those that are not visible in the 847 media stream. 849 In addition, Guidelines for Extending RTCP [RFC5968] should be 850 followed to the furthest extent possible. 852 6. Solution Overview 854 The mechanism described in this specification especially targets 855 heterogeneous multi-party scenarios where different endpoints require 856 differently encoded media from the same source, but its use in other 857 situations is not precluded. In fact, point-to-point scenarios are 858 considered to be of equal importance but no more demanding than the 859 multiparty case. In the targeted scenario, the media stream from one 860 encoder is sent to multiple decoders, and hence the encoder must 861 possibly provide an encoding with multiple operation points, suitable 862 for the receivers. This is typically only possible with so-called 863 scalable codecs, but some codecs may have inherent scalability 864 features without being generally considered as scalable (e.g. H.264/ 865 AVC temporal scalability through non-reference frames). Multi-party 866 services often involve a media mixer (Topo-Mixer) [RFC5117] as a 867 central network node. 869 +---+ 870 | S | 871 +---+ 872 | 873 v 874 +-------+ 875 | Mixer | 876 +-------+ 877 / | \ 878 v v v 879 +---+ +---+ +---+ 880 | A | | B | | C | 881 +---+ +---+ +---+ 883 Figure 5: Sample Mixer Topology 885 The solution defined in this specification can be used during an 886 active session to quickly adapt to changes in media receiver 887 available bandwidth and/or preferences for one or more other codec 888 properties, while still conforming to the session configuration, like 889 SDP offer/answer negotiated minimum or maximum limits (depending on 890 individual SDP property semantics).
Some needed or wanted codec 891 property changes will also motivate re-negotiation of the SDP, but 892 this specification intends to cover only changes that lie 893 within the SDP negotiated set and thus do not impact the SDP. 895 Three message types are defined to support the solution: a request, a 896 notification, and a status report: 898 Request: A media receiver requesting a media sender to adjust one or 899 more of its media encoding parameters for a certain media stream. 900 The request is normally based on a specific set of media encoding 901 parameters that the media sender has explicitly notified the media 902 receiver about in a notification. 904 Notification: A media sender notifying a media receiver of the 905 currently used media encoding parameters for a certain 906 (identified) media stream. The notification is initiated by the 907 media sender, typically whenever the media encoding parameters 908 change significantly from what was previously used. The reason 909 for the change can either be local to the media sender (user, end- 910 point or network), or it can be the result of one or more requests 911 from remote end-points. 913 Status Report: A media sender reporting to a request sender (media 914 receiver) on request reception status: which specific request from 915 the media receiver was received and considered in setting 916 current media encoding parameters, and the identification of the 917 media stream that is considered to fulfill the request. The 918 status report can also indicate various error conditions, such as 919 reception of invalid or failing requests. 921 More details about the individual messages, but still on an overview 922 level, can be found in sub-sections below. To do that, some other 923 aspects need to be described first. 925 6.1. Message Structure 927 A COP message is sent from an RTP session participant in its role 928 either as media receiver or media sender.
Each message can contain 929 one or more message items of one or more message types, all 930 originating from a single media source. 932 The individual message items each relate only to a single operation 933 point, describing part of an atomic notification or request. 935 The general structure is outlined below: 937 +--------------------------------------+ 938 | AVPF PSFB FMT="COP" | 939 | SSRC of Packet Sender | 940 | SSRC of Media Source | 941 | +----------------------------------+ | 942 | | COP Message Item 0 | | 943 | +----------------------------------+ | 944 | | (Codec Configuration Parameters) | | 945 | +----------------------------------+ | 946 | +----------------------------------+ | 947 | | COP Message Item 1 | | 948 | +----------------------------------+ | 949 | | (Codec Configuration Parameters) | | 950 | +----------------------------------+ | 951 | ... | 952 +--------------------------------------+ 954 Figure 6: COP Message Structure 956 Note that the Request is the only COP Message Item defined in this 957 specification that is sent in the media receiver role and makes use 958 of "SSRC of Media Source" as the targeted media stream for the 959 Request. Both the Notification and the Status Report Message Items 960 are sent in the media sender role, reporting on the message sender's 961 own configuration and thus relate only to the "SSRC of Packet 962 Sender", being agnostic to the "SSRC of Media Source" field. 964 It is thus for example possible to co-locate COPS and COPN messages 965 for the same media source in the same COP FCI. It is also possible 966 to co-locate one or more COPR referring to a single "SSRC of Media 967 Source" with one or more COPN and/or COPS relating to a single "SSRC 968 of Packet Sender" within a single COP message. 970 Multiple Message Items of the same type in the same COP Message are 971 used to describe a notification, status or request for a media stream 972 containing multiple Operation Points (Section 6.3). 
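As a non-normative illustration of the co-location rules above, the following Python sketch groups the Message Items that one packet sender wants to transmit into COP messages, one per targeted "SSRC of Media Source". The names Item, CopMessage and group_items are hypothetical and are not defined by this specification. Attaching the COPN/COPS items to the first message is an arbitrary choice made for the sketch, and the value 0 for "SSRC of Media Source" when no COPR is present follows the rule stated in Section 7.1.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for one COP Message Item."""
    kind: str             # "COPR", "COPN" or "COPS"
    opid: int
    target_ssrc: int = 0  # media source addressed by a COPR; unused otherwise


@dataclass
class CopMessage:
    """Hypothetical stand-in for one COP feedback message."""
    sender_ssrc: int      # "SSRC of Packet Sender"
    media_ssrc: int       # "SSRC of Media Source"
    items: list = field(default_factory=list)


def group_items(sender_ssrc, items):
    """Group items into COP messages, one per targeted media source."""
    requests = {}         # target SSRC -> COPR items, in first-seen order
    own = []              # COPN/COPS items describing the sender itself
    for item in items:
        if item.kind == "COPR":
            requests.setdefault(item.target_ssrc, []).append(item)
        else:
            own.append(item)

    messages = [CopMessage(sender_ssrc, ssrc, reqs)
                for ssrc, reqs in requests.items()]
    if own:
        if messages:
            # COPN/COPS ignore "SSRC of Media Source", so they can share
            # a message with the COPR items for any single media source.
            messages[0].items.extend(own)
        else:
            # No COPR present: nothing uses the media source field (assumed 0).
            messages.append(CopMessage(sender_ssrc, 0, own))
    return messages
```

In this sketch, requests towards two different media sources always yield two COP messages, which matches the statement that multiple COP messages are needed to refer to multiple different "SSRC of Media Source" values.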
974 Multiple COP messages are needed to be able to refer to multiple 975 different "SSRC of Packet Sender" and/or "SSRC of Media Source". 977 6.2. Codec Configuration Parameter Use 979 The Codec Configuration Parameters that are applicable to a certain 980 codec may be specific to the media type (audio, video, ...), but may 981 also be codec-specific. Some codec properties (described by Codec 982 Configuration Parameters) have to be explicitly enabled by (non-RTCP 983 based) capability signaling to be possible or permitted to use. 985 An end-point implementing this specification need not support all 986 available Codec Configuration Parameters defined herein or in 987 extensions to this specification. A certain parameter could also be 988 uninteresting for a certain codec or media stream, even if it is 989 generally supported by the end-point. This specification therefore 990 defines capability signaling that allows a COP receiver to declare 991 explicit support per parameter type on a per-codec level. The set of 992 Codec Configuration Parameters that can be used for a certain media 993 stream by a COP sender is thus restricted by the combination of 994 applicability, capability signaling and explicit receiver parameter 995 support signaling. 997 Any Codec Configuration Parameter that is applicable and feasible to 998 use, but is not included as part of an Operation Point, has a default 999 value. This default is defined for each Parameter Type, but should 1000 preferably whenever possible be taken from capability signaling. It 1001 is not necessary to use all defined Parameter Types in a media stream 1002 description. Some Parameter Types can, depending on media type or 1003 codec, either be un-interesting or not possible to describe or 1004 control in detail, in which case they can be left out, meaning that 1005 the effective value is "undefined" within the limits set by 1006 capability signaling (outside the scope of this specification). 1008 6.3. 
Operation Point 1010 The Codec Configuration Parameters contained in a single Message Item 1011 jointly constitute a description of an Operation Point for a 1012 specific media stream from a media sender. 1014 For the purpose of COP signaling, each such Operation Point is 1015 identified with an ID number, OPID, which is scoped by the media 1016 sender's RTP SSRC identification, and can be chosen freely by the 1017 media sender. The need for this media sub-stream identification 1018 basically only appears with scalable coding or other media encoding 1019 methods that introduce separable and configurable sub-streams within 1020 the same SSRC. An OPID thus refers to such a configurable sub-stream, 1021 described by a set of related Codec Configuration Parameters. 1023 +--RTP Session 1 ---------------------+ 1024 Media Source 1----+-+-> SSRC1 --> Sub-Stream 1 -> OPID1 | 1025 (MIC, Camera) | \-> Sub-Stream 2 -> OPID2 | 1026 | | 1027 Media Source 2-+--+---> SSRC2 --> Sub-Stream 1 -> OPID3 | 1028 | | \-> Sub-Stream 2 -> OPID4 | 1029 | | \-> Sub-Stream 3 -> OPID5 | 1030 | +-------------------------------------+ 1031 | 1032 | +--RTP Session 2 ---------------------+ 1033 +--+---> SSRC3 --> Sub-Stream 1 -> OPID6 | 1034 | \-> Sub-Stream 2 -> OPID7 | 1035 +-------------------------------------+ 1037 Figure 7: Relation of OPID to Media Source, RTP session and SSRC 1039 The above Figure 7 depicts the possible relations between media 1040 sources, RTP sessions, RTP streams (SSRCs) and their sub-streams and 1041 the OPID. 1043 For example, a single video camera may be encoded using SVC for a 1044 combined SST and MST transmission configuration. In that case some 1045 subset of scalability layers is sent as SST in the first RTP session 1046 using SSRC2. Another set of scalability layers is transported in 1047 the second RTP session as another SST using SSRC3. The RTP packet 1048 stream from each SSRC can thus contain several sub-streams, each 1049 identified with its own OPID.
As a result, a single media source is 1050 present in two RTP sessions, using two different SSRCs (2 and 3) 1051 containing a total of five sub-streams (OPID 3 to 7). 1053 Since an Operation Point can be expected to change over time, as a 1054 result of media receiver requests (Section 6.4), local 1055 media sender considerations (Section 6.5), or both, the Operation 1056 Point (OPID) is version-handled. The version is scoped by SSRC and 1057 OPID. 1059 It is expected that all encoders dividing a media stream into sub- 1060 streams will include some means to identify those sub-streams in the 1061 media stream. However, it is also expected that such identification 1062 is in general codec-specific. There is thus at times a need to map 1063 the codec agnostic COP OPID identification to codec specific 1064 identification, and this specification therefore includes a method 1065 for such mapping (Section 10). 1067 6.4. Request 1069 The request is sent by a media receiver, which can be either an end- 1070 point or a middle node such as an RTP Mixer. The receiver of the 1071 request may similarly be either the original media sender or an RTP 1072 Mixer. Included in the request is a description of the desired codec 1073 configuration for a specific media (sub-)stream. The parameter 1074 values communicated in a notification (Section 6.5) of that 1075 (sub-)stream are taken as a starting point when deciding what 1076 parameters and parameter values to choose for the request, and only 1077 parameters with changed values need to be in the request. The media 1078 receiver can of course also use other sources of information when 1079 choosing parameters and values, such as observation of 1080 the received media stream and capability signaling. 1082 It is not an absolute requirement to have received a notification to 1083 be able to create a meaningful request.
The request can include a 1084 set of changed properties for existing streams, but it can also 1085 request the addition or removal of one or more media sub-streams 1086 having certain properties, in which case there will be no 1087 notification to base the request on. A media receiver may also want 1088 to send a request prior to having received any notifications for 1089 existing streams, and can then base the request on other information 1090 such as observation of the media stream or information 1091 from the capability signaling. In case there is no existing stream 1092 and OPID to refer to in the request, a "provisional" OPID MUST be chosen 1093 in the request, which will have to be mapped back to an existing 1094 (sub-)stream and "real" OPID through methods defined in this 1095 specification (Section 10). 1097 The media sender receiving a specific request is not required to re- 1098 configure the encoder accordingly, even if it should try to do so, 1099 but is allowed to take other (previous or concurrent) requests and 1100 any local considerations into account, possibly modifying some of the 1101 parameter values, or even totally rejecting the request if it is not 1102 seen as feasible. It is thus not possible for a media receiver to 1103 determine unambiguously from the media stream or even from a notification if the 1104 media sender received the request or if the request was lost and 1105 needs to be re-sent. 1107 A request should typically be based on a certain notification, but 1108 there may be situations where a request is sent approximately 1109 simultaneously with a new notification for the same stream. In that 1110 case, there is a risk that the request is based on the wrong set of 1111 codec properties compared to the new notification. It is therefore 1112 necessary to have the set of codec properties, identified by an OPID, 1113 be version controlled.
If a notification announces a specific 1114 version of the operation point, where the version is updated every 1115 time the operation point is changed, the request can refer to that specific version 1116 and any mis-reference can be clearly identified and resolved. In 1117 addition, it allows for easy identification of repeated notifications 1118 and requests, simply by checking the operation point identification 1119 and the version, and without having to parse through all of the codec 1120 properties to see if any of them changed. 1122 6.5. Notification 1124 The notification is sent by a media sender and describes a media 1125 stream or sub-stream in terms of a defined, finite set of codec 1126 properties. That same set of codec properties can also be used in a 1127 request (Section 6.4). The notification and a common set of defined 1128 properties are important to a media receiver since it is rarely 1129 possible to see from the media stream itself what controllable 1130 properties were used to generate the stream. The set of codec 1131 properties and their values used to describe a certain media stream 1132 at a certain point in time is henceforth called a codec 1133 configuration. Each Operation Point in this codec configuration is 1134 implemented using a certain RTP Payload Type, defined by capability 1135 signaling outside the scope of this specification. 1137 It must be possible for a media sender to change codec configuration 1138 not only based on requests from media receivers, but also based on 1139 local limitations, considerations or user actions. This implies that 1140 it must be possible to send the notification standalone and not only as 1141 a response to a request. To avoid media receivers having to guess 1142 what codec configuration is used, a media sender should always send 1143 notifications whenever codec configuration for a stream changes.
1144 Loss of a notification should anyway not be critical since a media 1145 receiver could either fall back to inferring an approximate codec 1146 configuration from the media stream itself, or simply wait with a 1147 request until the next notification is sent. 1149 A notification can potentially contain a large number of codec 1150 properties. However, parameters that are not enabled by codec and 1151 COP capability signaling, or inherently not part of the used codec, 1152 will not be included. The notification only describes the currently 1153 used codec configuration, and each parameter in an operation point 1154 will thus be described by a single value. To further limit the 1155 number of properties that need to be sent, it is possible to rely on 1156 parameter defaults (listed by individual parameter type definitions) 1157 whenever those values are acceptable. 1159 The media receiver could want to take some local action at the time 1160 when the codec configuration in the media stream changes. Using the 1161 same reasoning as above, this may not be possible to see from the 1162 media stream itself. This functionality is explicitly enabled by 1163 inclusion of an RTP Time Stamp in the notification, where the Time 1164 Stamp describes a time (possibly in the future) when the media stream 1165 codec configuration is (estimated to be) effective. 1167 6.6. Status Report 1169 The status report is sent by a media sender and is needed to confirm 1170 reception of a specific request OPID to avoid unnecessary 1171 retransmission of requests. Loss of a status report will likely 1172 trigger a request retransmission, except when the request sender can 1173 infer from the media stream or a notification that the stream is now 1174 acceptable. 1176 The status report is not a required acknowledgement of every request, 1177 but instead reports on the last received request, identified by a 1178 request sequence number in addition to the OPID.
That de-coupling of 1179 request and status report reduces the number of status reports needed 1180 in case of frequently updated requests and/or lack of resources to 1181 send status reports. 1183 If a request is somehow not acceptable to a media sender, the status 1184 report can also indicate failure and a reason for that failure. 1186 In case the OPID in the request is a "provisional" OPID 1187 (Section 6.4), the status report responds with that exact OPID, but 1188 also includes a reference to a "real" media (sub-)stream 1189 identification or OPID that the media sender considers appropriate 1190 for the request. 1192 No description of any codec configuration is included in a status 1193 report, even if the corresponding request was successful. The codec 1194 configuration in use is only carried in the notification (Section 6.5) 1195 message. Multiple status reports targeted for multiple request 1196 senders can through media (sub-)stream identification and OPID point 1197 to the same notification message, reducing the need to repeat 1198 applicable codec configuration parameters with every accepted 1199 request. 1201 6.7. Adding and Removing Operation Points 1203 A media sender can unilaterally create a new Operation Point by 1204 simply selecting a free OPID identifier and using COPN to announce it. 1206 To remove an Operation Point, the media sender simply stops 1207 announcing it in COPN. This procedure can be used both for entire 1208 media streams containing a single Operation Point and to add/remove 1209 sub-streams in media streams containing multiple Operation Points. 1211 The media receiver can request a new Operation Point to be created by 1212 using a COPR with an unused identifier and by setting a flag to 1213 indicate that this requests a new OPID. The media sender then 1214 decides if it honors the request or not, and announces the new OPID 1215 as described above.
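The interplay between OPID, version and the "new OPID" flag described in Sections 6.4 to 6.7 can be illustrated with a small, non-normative Python sketch of a media-sender-side table for one SSRC. The class name, method names and status strings are hypothetical. The sketch also makes assumptions that this specification does not state at this point: a request carrying an outdated version is reported rather than applied, a request for a new Operation Point is always honored, and the version counter wraps within 7 bits.

```python
class OperationPoints:
    """Hypothetical media-sender-side table of Operation Points for one SSRC."""

    def __init__(self):
        # real OPID -> {"version": int, "params": dict}
        self.points = {}

    def announce(self, params):
        """Create an Operation Point on a free OPID (to be announced in COPN)."""
        opid = next(i for i in range(256) if i not in self.points)
        self.points[opid] = {"version": 0, "params": dict(params)}
        return opid

    def remove(self, opid):
        """Stop announcing an Operation Point."""
        self.points.pop(opid, None)

    def handle_request(self, opid, new_flag, version, params):
        """Process one COPR; return (status, real OPID to report in COPS)."""
        if new_flag:
            # The OPID in the request is provisional; map it to a real one.
            return "created", self.announce(params)
        point = self.points.get(opid)
        if point is None:
            return "unknown-opid", None
        if version != point["version"]:
            # Request was based on an outdated notification (assumed rejected).
            return "stale-version", opid
        if params:
            # An empty parameter set means "no change"; the version is kept.
            point["params"].update(params)
            point["version"] = (point["version"] + 1) % 128  # assumed 7-bit wrap
        return "ok", opid
```

A receiver that gets "stale-version" back in such a design would typically wait for, or re-read, the latest notification before repeating its request.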
1217 The media receiver can indicate that it is no longer interested in 1218 receiving an Operation Point corresponding to a media sub-stream by 1219 not including any COPR Message Item for it in a single COP Message. 1220 The media receiver can indicate a wish to continue to receive an 1221 unmodified Operation Point using a COPR without any codec properties 1222 (no change). 1224 7. Codec Control Message Extension 1226 This specification specifies a new feedback message, COP, for codec 1227 control of real-time media, as an extension to the AVPF [RFC4585] and 1228 CCM [RFC5104] specifications. The AVPF specification outlines a 1229 mechanism for fast feedback messages over RTCP, which is applicable 1230 for IP based real-time media transport and communication services. 1231 It defines both transport layer and payload-specific feedback 1232 messages. This specification targets the payload-specific type, 1233 since a certain codec is typically described by a payload type. 1235 AVPF defines three and CCM defines four payload-specific feedback 1236 messages (PSFB). All AVPF and CCM messages are identified by means 1237 of the feedback message type (FMT) parameter. This specification 1238 specifies one additional payload-specific feedback message. 1240 One new PSFB FMT value is assigned in this specification: 1242 TBA1: Codec Operation Point (COP) 1244 This section defines the feedback message structure, message items 1245 and their semantics with the exception of the actual codec 1246 configuration parameters, which are defined in the next section 1247 (Section 8). 1249 7.1. COP Message 1251 The COP message is a payload-specific AVPF CCM message identified by 1252 the PSFB FMT value listed above. It carries one or more COP Message 1253 Items, each with either a request for or a description of a certain 1254 "Operation Point" (a set of codec parameters), or a request status 1255 indication.

   Not all Message Items make use of the "SSRC of media source" in the
   common packet header.  "SSRC of media source" SHALL be set to 0 if no
   Message Item that makes use of it is included in the FCI.

7.2.  FCI Format

   The COP FCI MUST contain one or more Codec Operation Point Message
   Items.  The maximum number of COP Message Items in a COP message is
   limited by the [RFC4585] Common Packet Format 'length' field.

   The definition of the AVPF feedback message format mandates that the
   FCI part is a multiple of 32-bit words.  The message items defined
   below are not necessarily 32-bit word aligned.  Therefore it is
   sometimes necessary to insert one to three padding bytes at the end
   of the FCI.  The number of padding bytes is determined by a receiver
   by comparing the sum of the message item lengths with the feedback
   message length field.  Padding bytes MUST be set to zero (0) and
   ignored on reception.

7.2.1.  Message Item Format

   All Codec Operation Point Message Items share a common header format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Type |     Payload Length      |     OPID      |N|   Version   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   :                    (Message Item Payload)                     :

                Figure 8: COP Message Item Header Format

   The message header fields are:

   Type (3 bits):  Message Item Type.  Three item types are defined in
      this specification, COPR, COPN and COPS, with values as listed in
      Table 1 below.  More item types MAY be defined in extensions to
      this specification.  Message items with a type field that has an
      unknown value SHALL be ignored by the receiver.

   Payload Length (13 bits):  The total length in bytes of all data
      belonging to this Message Item, following the Message Item Header,
      i.e. anything following the Version field.

   OPID (8 bits):  Operation Point ID.  Some (typically scalable) codecs
      are capable of encoding into multiple simultaneous operation
      points using the same SSRC, and each operation point can then be
      referenced by OPID.  MUST be unique within the scope of an SSRC
      when the N flag is not set.  MUST be set to 0 for message items
      not using the field.  See also Section 7.2.3.

   N (1 bit):  A "New OPID" flag, indicating that the OPID value is
      chosen arbitrarily and is not meant to refer to any existing
      Operation Point.  The message sender SHOULD NOT use an already
      known OPID in combination with the N flag.  See also individual
      Message Item definitions.

   Version (7 bits):  Referencing a specific version of the Codec
      Configuration identified by the OPID.

7.2.2.  Message Item Types

   The Message Item Types defined in this specification are:

   +-------+-------------------------------------------+
   | Value | Message Item Type                         |
   +-------+-------------------------------------------+
   | 0     | Codec Operation Point Notification (COPN) |
   | 1     | Codec Operation Point Request (COPR)      |
   | 2     | Codec Operation Point Status (COPS)       |
   | 3-6   | Unassigned                                |
   | 7     | Reserved for future extensions            |
   +-------+-------------------------------------------+

                   Table 1: Message Item Type Values

   Each Message Item Type defined in this specification is described in
   detail in subsequent sections.

7.2.3.  Operation Point Identification

   All RTP media streams belonging to the same session can per
   definition be identified by the SSRC.  However, identification of any
   sub-streams contained in the same RTP media stream (SSRC) needs to
   use some other identification method, scoped by the SSRC.  This is
   the case for a media stream containing more than one Operation Point,
   for example SVC [RFC6190] streams being sent using Single Stream
   Transport (SST) RTP packetization.
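
   As an informal illustration of the common header in Section 7.2.1
   and the padding rule in Section 7.2, the following Python sketch
   packs and parses the 32-bit Message Item header.  It is not part of
   this specification: the function names are hypothetical, and network
   byte order is assumed, as is usual for RTCP.

```python
import struct

# Message Item Type values from Table 1.
COPN, COPR, COPS = 0, 1, 2


def pack_item_header(item_type, payload_length, opid, n_flag, version):
    """Pack the 32-bit COP Message Item header (Figure 8)."""
    if not (0 <= item_type < 8 and 0 <= payload_length < 2 ** 13):
        raise ValueError("Type is 3 bits, Payload Length is 13 bits")
    if not (0 <= opid < 256 and n_flag in (0, 1) and 0 <= version < 128):
        raise ValueError("OPID is 8 bits, N is 1 bit, Version is 7 bits")
    word = ((item_type << 29) | (payload_length << 16) | (opid << 8)
            | (n_flag << 7) | version)
    return struct.pack("!I", word)  # "!" selects network byte order


def unpack_item_header(data):
    """Return (type, payload_length, opid, n_flag, version)."""
    (word,) = struct.unpack("!I", data[:4])
    return (word >> 29, (word >> 16) & 0x1FFF, (word >> 8) & 0xFF,
            (word >> 7) & 0x1, word & 0x7F)


def pad_fci(fci):
    """Append zero bytes so the FCI length is a multiple of 4 bytes."""
    return fci + b"\x00" * (-len(fci) % 4)
```

   A receiver would read Payload Length from each header to find the
   next Message Item, and treat any bytes left over at the end of the
   FCI as padding.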

   The encoding of and restrictions for such sub-stream (Operation
   Point) identification will in general be codec specific.  Therefore,
   the OPID used in this specification is merely an SSRC-unique
   identification number.  It is however necessary to create a mapping
   between this generic number and the codec specific sub-stream
   identification that can be found in the media stream.  This mapping
   is achieved by including the ID Parameter (Section 8.3) in a Message
   Item carrying a certain OPID.

   In Section 10, codec specific ID Parameter formats are defined for a
   few of the most common codecs that support scalability.

7.3.  Codec Operation Point Notification

7.3.1.  Message Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Type |     Payload Length      |     OPID      |N|   Version   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Transition Time Stamp                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|Payload Type |        Codec Configuration Parameters         :
   +-+-+-+-+-+-+-+-+                                               :
   :                                                               :

                         Figure 9: COPN Format

   The COPN-specific message fields are (see also Message Item Format
   (Section 7.2.1)):

   Type (3 bits):  Set to 0, as listed in Table 1.

   OPID (8 bits):  The OPID which is described by the Codec
      Configuration Parameters.

   N (1 bit):  Not used by COPN and SHALL be set to 0 by senders.

   Version (7 bits):  Referencing a specific version of the Codec
      Configuration identified by the OPID.  SHALL be increased by 1
      modulo 2^7 (the field is 7 bits wide) whenever the used Codec
      Configuration referenced by the OPID is changed.  A repeated
      message SHALL NOT increase the Version.  The initial value SHOULD
      be chosen randomly.

   Transition Time Stamp (32 bits):  The RTP Time Stamp value when the
      listed Codec Configuration Parameters will be effective in the
      media stream, using the same time line as RTP packets for the
      referenced SSRC (media sender SSRC).  The Time Stamp value MAY
      express either a time in the past or in the future, and need not
      map exactly to an actual RTP Time Stamp present in an RTP packet
      for that SSRC.  The same timestamp value SHOULD be used for
      subsequent transmissions of the identical set of Codec
      Configuration Parameters for the same OPID and version.

   R (1 bit):  Reserved.  MUST be set to 0 by senders and MUST be
      ignored by receivers implementing this specification.  MAY be
      defined differently by extensions to this specification.

   Payload Type (7 bits):  SHALL be identical to the RTP header Payload
      Type valid for the (sub-)stream described by this OPID.

   Codec Configuration Parameters (variable length):  Contains zero or
      more TLVs carrying Codec Configuration Parameters as defined in
      Parameter Types (Section 8).

7.3.2.  Semantics

   This message is used to inform the media receiver(s) about the Codec
   Configuration Parameters in use at the media sender.  The available
   Codec Parameter Types that can be used to describe the Codec
   Configuration are defined in Section 8.

   Some codecs may have clear inband indications in the encoded media
   stream of how one or more of the Codec Configuration Parameters are
   configured.  For those codecs and Codec Configuration Parameters,
   COPN is not strictly necessary.  Still, for some codecs and/or for
   some Codec Configuration Parameters, it is not unambiguously possible
   to see individual Codec Configuration Parameter Values from the
   encoded media stream, or even possible to see some Codec
   Configuration Parameters at all, motivating use of COPN.
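
   The COPN-specific fields of Figure 9 can be illustrated with a short
   Python sketch that splits a COPN payload (the bytes following the
   common item header) into its parts.  The function names are
   hypothetical.  The half-range comparison used to decide whether a
   Transition Time Stamp lies in the future is an assumption borrowed
   from common RTP timestamp handling; this specification does not
   define such a rule.

```python
import struct


def parse_copn_payload(payload):
    """Split a COPN payload into (transition_ts, payload_type, params)."""
    if len(payload) < 5:
        raise ValueError("COPN payload needs timestamp and payload type")
    (transition_ts,) = struct.unpack("!I", payload[:4])
    payload_type = payload[4] & 0x7F  # the R bit (top bit) is ignored
    return transition_ts, payload_type, payload[5:]


def transition_is_future(transition_ts, current_rtp_ts):
    """True if transition_ts is ahead of current_rtp_ts, modulo 2^32.

    Assumption: values less than half the timestamp range ahead count
    as "future"; everything else counts as "now or past".
    """
    diff = (transition_ts - current_rtp_ts) & 0xFFFFFFFF
    return 0 < diff < 0x80000000
```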

   COPN SHOULD be scheduled for transmission when it becomes known that
   there are media receivers in the RTP session that did not yet receive
   any Codec Configuration Parameters for an active Operation Point, or
   whenever the effective Codec Configuration Parameters have changed
   significantly, but MAY be scheduled for transmission at any time.
   The media sender decides what amount of change is required to be
   considered significant.

   The reason for a Codec Configuration Parameter change can either be
   local to the sending terminal, for example as a result of user
   interaction or some algorithmic decision, or resulting from reception
   of one or more COPR messages (Section 7.4).

   If a media sender can no longer fulfill the established Codec
   Configuration Parameter restrictions of an Operation Point that was
   previously described by a COPN, it MAY change any Codec Configuration
   Parameter or even remove the entire Operation Point, and SHOULD then
   signal this at the earliest opportunity by sending an updated COPN to
   the media receiver(s).

   An OPID can implicitly be indicated as no longer being used by
   omitting that OPID from the set of COPN message items in the COP PSFB
   message.  All OPIDs that the media sender intends to use at the
   latest time indicated by any transition timestamp value in the set of
   COPN present in the COP PSFB message MUST be included in that COP
   message.

   All Operation Points referred to by a COPS (Section 7.5) SHOULD also
   be detailed by a COPN message contained in the same or in a
   subsequent COP feedback message, even if the Operation Point did not
   change significantly from the previous COPN.

   Note that the OPID Version of that COPN, subsequent to COPS, will be
   equal to or larger than the Version indicated in the COPS.
   The Version difference may be larger than one (taking field
   wraparound into account) depending on the number of updated COPN sent
   since the COPR that triggered the COPS.  See also the description of
   those messages below.

   Note: COPN may be seen as a more explicit and elaborate version of
   the TSTN message of [RFC5104] and most of the considerations detailed
   there for TSTN also apply to COPN.

7.3.2.1.  Parameters

   The media sender decides what Codec Configuration Parameters to use
   in the COPN to describe an Operation Point.  It is RECOMMENDED that
   all Codec Configuration Parameters that were accepted as restrictions
   based on received COPR messages are included.  All Codec
   Configuration Parameters significantly more restrictive than implicit
   or explicit restrictions set by capability signaling (outside the
   scope of this specification) SHOULD also be included.  Any Codec
   Configuration Parameters that are either not applicable to the
   Payload Type or not enabled by capability signaling MUST NOT be
   included.  All Codec Configuration Parameters not covered by the
   above restrictions MAY be included.

   When the Operation Point depends on other Operation Points (such as
   in scalable coding), the values to use for Codec Configuration
   Parameters MUST describe the result when all dependencies are
   utilized.  For example, assume an Operation Point describing a base
   layer with 15 Hz framerate, and a dependent Operation Point
   describing an enhancement layer adding another 15 Hz to the base
   layer, resulting in 30 Hz framerate when both layers are combined.
   The correct Parameter value to use for that latter, dependent
   "enhancement" Operation Point is 30 Hz, not the 15 Hz difference.

   The value of a Codec Configuration Parameter that was not included in
   a COPN message SHOULD either be inferred from other signaling, e.g.
   session setup or capability negotiation, outside the scope of this
   specification, or if such signaling is not available or not
   applicable, use the default value as defined per Parameter Type
   (Section 8).

   An Operation Point describes one specific setting of Codec
   Parameters, and a COPN Message therefore MUST NOT include the ALT
   Parameter Type (Section 8.2) in the Codec Parameters describing the
   Operation Point.

7.3.2.2.  Relation to COPR

   To limit RTCP bandwidth and avoid bandwidth expansion, COPN is not
   mandated as response to every received COPR (Section 7.4).

   A media sender implementing this specification SHOULD take requested
   Operation Points from COPR messages into account for future encoding,
   but MAY decide to use other Codec Configuration Parameter Values than
   those requested, e.g. as a result of multiple (possibly
   contradicting) COPR messages from different media receivers, or any
   media sender policies, rules or limitations.  Thus, a COPN message
   Operation Point MAY use other Codec Configuration Parameters and
   other values than those requested in a COPR.

   The media sender SHOULD try to maintain OPIDs between COPR and COPN
   when the COPR sender suggests a new OPID value (N flag is set) in the
   COPR, but MAY use another OPID in COPN.  Examples where other OPID
   values have to be chosen are when the suggested OPID conflicts with
   an already existing OPID, or when the media sender decides that the
   suggested new OPID can be fulfilled by an already existing OPID.

   Even if a COPR references an existing OPID (N flag cleared), the
   media sender may have to take other aspects than a specific COPR into
   account when choosing how many Operation Points to use, and the exact
   contents of those Operation Points.
   See the description of COPS (Section 7.5) for how to achieve the
   mapping between a suggested new OPID and the OPID that will actually
   be used.

   When OPID cannot be kept the same between COPN and COPR, the mapping
   SHALL be done using identical ID Parameters (Section 8.3) in the COPS
   and COPN resulting from the COPR.  Further details are described in
   the section on COPS (Section 7.5).

   Since COPR references a certain COPN OPID and Version, and COPN is
   sent unreliably and may be lost, COPN senders MUST keep at least the
   two last COPN Versions for each (SSRC, OPID) tuple and SHOULD keep at
   least four.

7.3.3.  Timing Rules

   The timing follows the rules outlined in section 3 of AVPF [RFC4585].
   This notification message may be time critical and SHOULD be sent
   using early or immediate feedback RTCP timing, but MAY be sent using
   regular RTCP timing.

   A typical example when regular RTCP timing can be appropriate is when
   the sent media stream is further restricted from what was described
   by the most recent COPN, which should not cause any problems in the
   media receivers.  Similarly, it is likely appropriate to use early or
   immediate timing when effective media stream restrictions urgently
   need to be removed, which may require media receivers to increase
   their resource usage.

7.3.4.  Handling in Mixers and Translators

   Any media sender, including Mixers and Translators, that sends RTP
   media marked with its own SSRC and that implements this specification
   SHALL also be prepared to send COPN, even if it is not the
   originating media source.  As a result of that, such a media sender
   may have to send updated COPN whenever the included media sources
   (CSRC) change, subject to rules laid out above (Section 7.3.2).
   Note that this can be achieved in different ways, for example by
   forwarding (possibly cached) COPN from the included CSRC when the
   Mixer is not performing transcoding.

   In cases where a Mixer or Translator needs to forward a COPR from one
   side (A) to the other (B) (as described in Section 7.4.4), the COPN
   sent to the A side MAY need to be delayed until the Mixer or
   Translator has received a corresponding COPN from the B side, as
   indicated in Figure 10 below.

   +-------+ 1. COPR +-------+ 2. COPR +-------+
   |       |-------->|       |-------->|       |
   |   A   | 4. COPN | Mixer | 3. COPN |   B   |
   |       |<--------|       |<--------|       |
   +-------+         +-------+         +-------+

                    Figure 10: Mixer Delay of COPN

   If a Mixer or Translator has decided to act partially (modify the
   media stream with respect to some Parameter Types, but not all) on a
   received COPR from the A side, and a COPN is received from the B side
   indicating that the current media modifications are no longer
   necessary, the Mixer or Translator SHOULD cease its own actions that
   are no longer needed.  It SHOULD then also issue a COPN describing
   the new situation to the A side, as indicated in Figure 11 below.

   +-------+ 1. COPR +-------+         +-------+
   |       |-------->|       | 2. COPR |       |
   |       | 3. COPN |       |-------->|       |
   |   A   |<--------| Mixer | 4. COPN |   B   |
   |       | 5. COPN |       |<--------|       |
   |       |<--------|       |         |       |
   +-------+         +-------+         +-------+

                    Figure 11: Mixer Update of COPN

7.4.  Codec Operation Point Request

7.4.1.  Message Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Type |     Payload Length      |     OPID      |N|   Version   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sequence No  |        Codec Configuration Parameters         :
   +-+-+-+-+-+-+-+-+                                               :
   :                                                               :

                         Figure 12: COPR Format

   The COPR-specific message fields are:

   Type (3 bits):  Set to 1, as listed in Table 1.

   OPID (8 bits):  The OPID this request refers to for an existing OPID,
      and an arbitrarily chosen but unique value in requests for new
      operation points, i.e. with the N flag set.

   N (1 bit):  MUST be set to 0 when OPID references an existing OPID
      announced in a COPN received from the targeted media sender, and
      MUST be set to 1 otherwise.

   Version (7 bits):  When the N flag is not set (0), referencing a
      specific version of the Codec Configuration identified by the OPID
      in a COPN received from the targeted media sender.  Not used and
      MUST be set to 0 when the N flag is set (1).

   Sequence No (8 bits):  Sequence Number.  SHALL be incremented by 1
      modulo 2^8 for every COPR that includes an updated set of
      requested Codec Configuration Parameters described by the same
      OPID and Version as was used with the previous Sequence Number.
      Sequence Number SHALL be kept unchanged in repetitions of this
      message.  Initial value SHOULD be chosen randomly.

   Codec Configuration Parameters (variable length):  Contains zero or
      more TLVs carrying Codec Configuration Parameters as defined in
      Parameter Types (Section 8).

7.4.2.  Semantics

   This Message Item is sent by a media receiver wanting to control one
   or more Codec Configuration Parameters of the targeted media sender.
   The requested values MUST stay within the media capability negotiated
   by other means than this specification.
   The available Codec Configuration Parameters that can be controlled
   are listed in Section 8.

   Note: COPR may be seen as a more explicit and elaborate version of
   the TSTR message of [RFC5104] and most of the considerations detailed
   there for TSTR also apply to COPR.

7.4.2.1.  Sender Behavior

   If at least one COPN (Section 7.3) has been received for the targeted
   stream, the Codec Configuration Parameters for that stream (SSRC)
   with defined OPID and Version are known to the COPR sender.  The COPR
   MUST refer to the OPID and Version of the most recently received COPN
   (if any) for the targeted stream.  Since it references a defined set
   of Codec Configuration Parameters from a COPN, the COPR SHOULD only
   include the Codec Configuration Parameters it wishes to change in the
   message, but it MAY also include unchanged Codec Configuration
   Parameters.

   If no COPN has been received for the targeted stream, the COPR sender
   MUST choose an arbitrary OPID and set the N flag to indicate that the
   OPID does not refer to any existing Operation Point.  In this case
   the Version field is not used and MUST be set to 0.  The OPID value
   SHALL NOT be identical to any OPID from the same media source that
   the media receiver is aware of and has received COPN for.  Since in
   this case no COPN reference exists, the COPR sender SHOULD include
   all Codec Configuration Parameters that it wishes to include a
   specific restriction for (other than the default).  Note that for
   some codecs, some Codec Configuration Parameters may be possible to
   infer from the media stream, but if the wanted restriction also
   includes those, and a describing COPN is lacking, they SHOULD
   nevertheless be included explicitly in the COPR.

   Any Codec Configuration Parameters that are not enabled by capability
   signaling MUST NOT be included.
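
   The choice of OPID, N flag and Version described above can be
   sketched as follows.  This is only an illustration under stated
   assumptions: the function name and the "known_copns" mapping (OPID to
   the Version of the most recently received COPN for one media source)
   are hypothetical, and picking the lowest unused OPID is just one
   possible way of making the "arbitrary" choice.

```python
def choose_copr_reference(known_copns, target_opid=None):
    """Return (opid, n_flag, version) to place in a COPR header.

    known_copns maps OPID -> Version of the latest COPN received from
    the targeted media source (hypothetical bookkeeping structure).
    """
    if target_opid is not None and target_opid in known_copns:
        # Existing Operation Point: N cleared, reference latest Version.
        return target_opid, 0, known_copns[target_opid]
    # No describing COPN: pick an OPID not known for this source,
    # set N, and leave Version at 0 because it is not used.
    for candidate in range(256):
        if candidate not in known_copns:
            return candidate, 1, 0
    raise RuntimeError("no unused OPID available for this media source")
```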

   A COPR sender MUST increment the SN field modulo 2^8 with every new
   COPR that includes any update to the Codec Configuration Parameters
   (referring to a specific version of an OPID) compared to the
   previously sent SN, as long as it does not receive any COPS
   (Section 7.5) with the same OPID, Version, and SN as was used in the
   most recently sent COPR.  A COPR having a later SN MUST be
   interpreted as replacing any COPR with identical OPID and Version but
   with lower SN, taking field wrap into account.

   A COPR sender that did not receive any corresponding COPS, but did
   receive a COPN with the same OPID and with a higher Version than was
   used in the last COPR, SHALL re-consider the COPR and MAY send an
   updated COPR referencing the new Version.

   If the capability negotiation has established that a codec supporting
   scalable operation is used, and if the media receiver wishes to
   request that scalability is used, it MAY do so by sending multiple
   COPR with different OPID to the same media sender.  The OPID and
   Version used in such a request MAY be based on an existing Operation
   Point, but it MAY also indicate a desire to introduce scalability
   into a previously non-scalable stream by choosing a new OPID
   (indicated by setting the N flag).  In any case, the resulting OPIDs
   and sub-streams are identified through use of the ID Parameter
   (Section 8.3) in subsequent COPS and COPN.  See also the description
   of COPS (Section 7.5).

   An Operation Point without any Codec Configuration Parameters MAY be
   used and MUST be interpreted as a request to keep the Operation Point
   unchanged.  This is especially useful when modifying some but not all
   in a set of sub-streams.
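
   The Sequence Number handling described earlier in this section
   (increment modulo 2^8, and "later" taking field wrap into account)
   can be illustrated with the small Python sketch below.  The text does
   not say how wraparound is to be resolved; the half-range comparison
   used here is an assumption modeled on common serial number
   arithmetic, and the function names are hypothetical.

```python
SN_MODULUS = 2 ** 8  # the Sequence No field is 8 bits wide


def next_sn(sn):
    """Sequence Number for a COPR carrying updated parameters."""
    return (sn + 1) % SN_MODULUS


def sn_is_later(a, b):
    """True if SN a is later than SN b, taking field wrap into account.

    Assumption: a is "later" when it is ahead of b by less than half
    the number space.
    """
    return 0 < ((a - b) % SN_MODULUS) < SN_MODULUS // 2
```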

   When a COPR sender is receiving multiple Operation Points and wants
   to continue to do so, it MUST include all Operation Points it still
   wishes to receive in the COPR, also those that can be left unchanged.

   A COPR MAY also describe alternative Operation Points that the media
   sender can choose from, through use of one or more ALT Parameters
   (Section 8.2).

   Since COPR references a specific COPN using SSRC, OPID and Version, a
   COPR sender typically needs to keep the latest Version of received
   COPN for each SSRC and OPID, also including the Codec Configuration
   Parameters.

7.4.2.2.  Media Sender Behavior

   A media sender receiving a COPR SHOULD take the request into account
   for future encoding, but MAY also take COPR from other media
   receivers and other information available to the media sender into
   account when deciding how to change encoding properties.

   A media receiver sending COPR thus cannot always expect that all
   Parameter Values of the request are fully honored, or even honored at
   all.  It can only know that the COPR was taken into account when
   receiving a COPS (Section 7.5) from the media sender with a matching
   OPID, Version and SN.

   To what extent a COPR is honored is described by the chosen Codec
   Configuration Parameter values contained in a subsequent COPN message
   (Section 7.3) with a later (taking wraparound into account) Version
   than the one referred to by the COPR.

7.4.3.  Timing Rules

   The timing follows the rules outlined in section 3 of [RFC4585].
   This request message MAY be sent using Immediate, Early or Regular
   timing depending on the application's needs.

   A COPR sender that did not receive a corresponding COPS MAY choose to
   re-transmit the COPR, without increasing the SN.
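
   One possible way for a COPR sender to keep track of requests that
   still await a COPS, and to repeat them with an unchanged SN, is
   sketched below.  The class and method names are hypothetical, and
   the sketch deliberately leaves out timers and RTCP scheduling.

```python
class CoprTracker:
    """Tracks COPR items awaiting COPS, keyed by (SSRC, OPID, Version)."""

    def __init__(self):
        self.pending = {}

    def sent(self, ssrc, opid, version, sn):
        """Record a COPR; a newer SN replaces the older pending one."""
        self.pending[(ssrc, opid, version)] = sn

    def on_cops(self, ssrc, opid, version, sn):
        """Return True if this COPS matches the pending COPR."""
        key = (ssrc, opid, version)
        if self.pending.get(key) == sn:
            del self.pending[key]
            return True
        return False

    def to_retransmit(self):
        """COPR items to repeat as-is, i.e. with the same SN."""
        return [key + (sn,) for key, sn in sorted(self.pending.items())]
```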

   When an RTP media receiver (SSRC) times out or leaves (BYE received)
   the RTP session, this SHALL imply that all COPR restrictions imposed
   by that media receiver are removed.

7.4.4.  Handling in Mixers and Translators

   A Mixer or media Translator that implements this specification and
   encodes content sent to the media receiver issuing the COPR SHALL
   consider the request to determine if it can fulfill it by changing
   its own encoding parameters.  A Mixer encoding for multiple session
   participants will need to consider the joint needs of all
   participants when generating a COPR on its own behalf towards the
   media sender.

   A Mixer or Translator able to fulfill the COPR partially MAY act on
   the parts it can fulfill (and SHALL then send COPS and COPN
   accordingly), but SHOULD nevertheless forward the unaltered COPR
   towards the media sender, since it is likely most efficient to make
   the necessary Codec Configuration Parameter changes directly at the
   original media source.

   A media Translator that does not act on COP messages will forward
   them unaltered, according to normal Translator rules.

7.5.  Codec Operation Point Status

7.5.1.  Message Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Type |     Payload Length      |     OPID      |N|   Version   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      SSRC of COPR sender                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Sequence No  | RC  | Reason  |Codec Configuration Parameters :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
   :                                                               :

                         Figure 13: COPS Format

   The COPS-specific message fields are:

   Type (3 bits):  Set to 2, as listed in Table 1.

   OPID (8 bits):  MUST be set identical to the same field in the COPR
      being reported on.

   N (1 bit):  MUST be set identical to the same field in the COPR being
      reported on.

   Version (7 bits):  MUST be set identical to the same field in the
      COPR being reported on.

   SSRC of COPR sender (32 bits):  MUST be set identical to the SSRC of
      packet sender field in the common AVPF header part of the COPR
      being reported on.

   Sequence No (8 bits):  MUST be set identical to the same field in the
      COPR being reported on.

   RC (3 bits):  Return Code.  Indicates degree of success or failure of
      the COPR being reported on, as described in Table 2.

   Reason (5 bits):  Contains more detailed information on the reason
      for success or failure, as described in Table 3 or extensions to
      this specification.

   Codec Configuration Parameters (variable):  MAY contain an ID Codec
      Configuration Parameter providing codec specific media
      identification of the OPID, subject to conditions outlined in the
      text below, or MAY be empty.

7.5.2.  Semantics

   The COPS Message Item indicates the request status related to a
   certain (SSRC, OPID) tuple by listing the latest received COPR
   (Section 7.4) SN.  It effectively informs the COPR sender that it no
   longer needs to re-send that COPR SN (or any previous SN).

   COPS indicates that the specified COPR was successfully received by
   the media sender targeted in the request.  If the Codec Configuration
   Parameters suggested in the COPR could be understood (Table 2), they
   may be taken into account, possibly together with COPR messages from
   other receivers and other aspects applicable to the specific media
   sender.  The Return Code carries an indication of the extent to which
   the COPR could be honored.

   +-------+-------------------------------+
   | Value | Meaning                       |
   +-------+-------------------------------+
   | 0     | Success                       |
   | 1     | Partial success               |
   | 2     | Failure                       |
   | 3-6   | Unassigned                    |
   | 7     | Reserved for future extension |
   +-------+-------------------------------+

                      Table 2: Return Code Values

   A Success Return Code indicates that the resulting media
   configuration is fully in line with the COPR.

   A Partial Success Return Code indicates that the resulting media
   configuration is not fully in line with the COPR, but that the media
   sender regards the COPR to be sufficiently well represented by one or
   more of the existing Operation Points.

   A Failure Return Code indicates that the media sender failed to take
   the COPR into account, either due to some error condition or because
   no media stream could be created or changed to comply.

   The Reason Values defined below are independent of Return Code, but
   not all reasons are meaningful with all return codes.  More reasons
   MAY be defined in extensions to this specification.

   +-------+----------------------------------------------------------+
   | Value | Meaning                                                  |
   +-------+----------------------------------------------------------+
   | 0     | Success                                                  |
   | 1     | Unknown OPID                                             |
   | 2     | Too many Operation Points                                |
   | 3     | Request violates capability limits                       |
   | 4     | Too old Operation Point Version                          |
   | 5     | Unknown Parameter Type                                   |
   | 6     | Parameter Value too long                                 |
   | 7     | Invalid Comparison Type                                  |
   | 8     | One or more parameter values in the request were changed |
   | 9-31  | Unassigned                                               |
   +-------+----------------------------------------------------------+

                         Table 3: Reason Values

   COPS is typically sent without any Codec Configuration Parameters.
   When the N flag was set in the related COPR, a non-failing COPS MUST
   include an ID Parameter (Section 8.3) identifying the actual
   sub-stream that the media sender considers applicable to the COPR.
   The OPID used by that sub-stream can be found by examining the ID
   Parameters of subsequent COPN from the same media source for ID
   values matching the one in COPS.

   Senders implementing this specification MUST NOT use any other Codec
   Configuration Parameter Types than ID in a COPS message.  The
   contained ID Parameter points to the specific media (sub-)stream that
   the media sender regards as applicable to the COPR.

   When a COPR receiver has received multiple COPR messages from a
   single COPR source with the same OPID but with several different
   values of Version and/or SN, and for which it has not yet sent a
   COPS, it SHALL only send COPS for the COPR with the highest SN,
   taking field wrap of those two fields into account.

7.5.3.  Timing Rules

   COPS SHALL be sent at the earliest opportunity after having received
   a COPR, with the following exception:

      A media sender that receives a COPR with a previously received
      OPID, Version, and SN closely after sending a COPS for that same
      OPID, Version, and SN (within 2 times the longest observed round
      trip time, plus any AVPF-induced packet sending delays), SHOULD
      await a repeated COPR before scheduling another COPS transmission
      for that OPID, Version, and SN.

   The exception is introduced to avoid unnecessary COPS transmission
   when there is a chance that an already sent COPS or COPN may satisfy
   or invalidate the COPR.

7.5.4.  Handling in Mixers and Translators

   A Mixer or media Translator that implements this specification,
   encodes content sent to media receivers and acts on COPR SHALL
   also report using COPS, just like any other media sender.
   An RTP Translator not knowing or acting on COPR will forward all COP
   messages unaltered, according to normal RTP Translator rules.

8.  Parameter Types

   This section defines the general Codec Configuration Parameter (CCP)
   TLV format.  Then a number of different parameter formats are
   defined.  It is expected that a number of additional CCPs will be
   defined in the future as the needs of different codecs are explored
   or developed.

8.1.  Parameter Format

   COP Message Items MAY contain one or more Codec Configuration
   Parameters, encoded in TLV (Type-Length-Value) format, which SHOULD
   then be interpreted as simultaneously applicable to the defined
   Operation Point.  Parameter Values MUST be byte-aligned.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   ParamType   | C |  Length   |                               |
   +---------------+---+-----------+                               |
   |                                                               |
   /                        Parameter Value                        /
   /                                                +--------------+
   |                                                |
   +------------------------------------------------+

                   Figure 14: Codec Parameter Format

   ParamType (8 bits):  The Codec Configuration Parameter Type, encoded
      as defined in Table 4 and possible extensions to this
      specification.  A parameter with an unknown ParamType SHALL be
      ignored on reception in a COPN and SHALL either be reported as
      unknown in COPS or be ignored when received in COPR.

   C (2 bits):  Comparison Type, encoded as defined in Table 5, unless
      specified otherwise by individual ParamType definitions.  The
      Comparison Type specifies what type of restriction the Codec
      Configuration Parameter Value expresses and how it should be
      compared to other Codec Configuration Parameter Values of the same
      ParamType.

      Exact:  The Parameter Value is an exact value, and no other values
         are acceptable.
         MUST NOT be used together with any other Comparison Types for
         the same ParamType.

      Minimum: The Parameter Value is an inclusive minimum restriction.
         MAY be used together with Maximum and/or Target Comparison
         Types for the same ParamType. If no minimum restriction is
         specified, no specific minimum restriction exists.

      Maximum: The Parameter Value is an inclusive maximum restriction.
         MAY be used together with Minimum and/or Target Comparison
         Types for the same ParamType. If no maximum restriction is
         specified, no specific maximum restriction exists.

      Target: The Parameter Value is a preferred target value, but
         other values within a specified range are acceptable. This
         type MUST be used together with at least one of Minimum and
         Maximum Comparison Types for the same ParamType. If no target
         is specified, no specific preference exists.

   Length (6 bits): The Parameter Value Length in bytes, excluding the
      ParamType and the Length field itself. A Length of 0 indicates
      that the parameter has no value, effectively constituting a
      wildcarded parameter that can take on any value (expresses no
      specific restriction). This is also the RECOMMENDED way to
      explicitly remove a previously effective restriction.

   Parameter Value (variable length): The actual parameter value,
      encoded in a format defined by the specific ParamType definition.

   The meaning of multiple Codec Configuration Parameters with the same
   ParamType and the same Comparison Type included as part of the same
   Operation Point is undefined, and such combinations SHALL NOT be
   used.

   A Codec Configuration Parameter that is encoded in a way (including
   incorrectly) that cannot be interpreted by the receiver SHALL be
   ignored.

   The parameters below that are encoded as signed or unsigned integers
   use a variable size representation in the value field.
   It is RECOMMENDED to only include the minimal number of bytes
   necessary to represent the value that is to be included in the
   parameter TLV. The length field in the parameter TLV will explicitly
   indicate how many bytes are present in the value field. All
   parameters using a variable size representation of their value MUST
   define the maximum number of bytes possible to include in the value
   field.

   The ParamType values and the SDP tags (see Section 9) for the Codec
   Configuration Parameter Types defined in this specification are
   listed below.

   +--------+-------------------------------+--------------+
   | Value  | Meaning                       | Tag          |
   +--------+-------------------------------+--------------+
   | 0      | ALT                           | alt          |
   | 1      | ID                            | id           |
   | 2      | Payload Type                  | pt           |
   | 3      | Bitrate                       | bitrate      |
   | 4      | Token Bucket Size             | token-bucket |
   | 5      | Framerate                     | framerate    |
   | 6      | Horizontal Pixels             | hor-size     |
   | 7      | Vertical Pixels               | ver-size     |
   | 8      | Channels                      | channels     |
   | 9      | Sampling Rate                 | sampling     |
   | 10     | Maximum RTP Packet Size       | max-rtp-size |
   | 11     | Maximum RTP Packet Rate       | max-rtp-rate |
   | 12     | Frame Aggregation             | aggregate    |
   | 13-254 | Undefined                     |              |
   | 255    | Reserved for future extension |              |
   +--------+-------------------------------+--------------+

                     Table 4: Parameter Type Values

   The values of the defined Parameter Value Comparison Type are listed
   below.

   +-------+---------+
   | Value | Meaning |
   +-------+---------+
   | 0     | Exact   |
   | 1     | Minimum |
   | 2     | Maximum |
   | 3     | Target  |
   +-------+---------+

                     Table 5: Comparison Type Values

   The following sub-sections describe the syntax and semantics of the
   different Codec Configuration Parameter Types defined in this
   specification.
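
   The TLV layout of Figure 14 and the code points of Tables 4 and 5 can
   be illustrated with a short, non-normative Python sketch. The names
   encode_ccp, decode_ccp, PARAM_TYPES, and COMPARISON_TYPES are
   hypothetical and not part of this specification. The sketch assumes
   unsigned integer values sent most significant byte first with the
   minimal number of bytes, and it does not attempt any alignment or
   padding of the enclosing COP Message Item.

```python
# Non-normative sketch of one Codec Configuration Parameter (Figure 14).
# Table 4 (ParamType) and Table 5 (Comparison Type) code points.
PARAM_TYPES = {
    "alt": 0, "id": 1, "pt": 2, "bitrate": 3, "token-bucket": 4,
    "framerate": 5, "hor-size": 6, "ver-size": 7, "channels": 8,
    "sampling": 9, "max-rtp-size": 10, "max-rtp-rate": 11, "aggregate": 12,
}
COMPARISON_TYPES = {"exact": 0, "minimum": 1, "maximum": 2, "target": 3}
MAX_LENGTH = 63  # largest value that fits the 6-bit Length field


def encode_ccp(tag, comparison, value=None):
    """Encode one parameter; value=None gives Length 0 (wildcard)."""
    if value is None:
        payload = b""
    else:
        if value < 0:
            raise ValueError("this sketch only handles unsigned integers")
        # Minimal number of bytes, most significant byte first.
        payload = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")
    if len(payload) > MAX_LENGTH:
        raise ValueError("value does not fit the 6-bit Length field")
    # Second byte: C in the two most significant bits, Length in the rest.
    second = (COMPARISON_TYPES[comparison] << 6) | len(payload)
    return bytes([PARAM_TYPES[tag], second]) + payload


def decode_ccp(buf, offset=0):
    """Return (param_type, comparison, value, next_offset)."""
    if offset + 2 > len(buf):
        raise ValueError("truncated parameter header")
    param_type = buf[offset]
    comparison = buf[offset + 1] >> 6
    length = buf[offset + 1] & 0x3F
    end = offset + 2 + length
    if end > len(buf):
        raise ValueError("truncated parameter value")
    value = int.from_bytes(buf[offset + 2:end], "big") if length else None
    return param_type, comparison, value, end
```

   For example, under these assumptions a Framerate of 30 Hz with
   Comparison Type Exact (value 3000 in units of 1/100 Hz) occupies four
   bytes: ParamType 5, a second byte carrying C=0 and Length=2, and the
   two value bytes.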

   Unless explicitly specified in the sub-sections below, or in
   extensions to this specification, all Parameter Type values are
   binary encoded unsigned integers, most significant byte first (for
   multi-byte values).

8.2. ALT

   This Codec Parameter Type is a special parameter, separating the
   Codec Configuration Parameters preceding it from the ones that follow
   into two separate, alternative Operation Points.

   Type Value: 0

   Tag: alt

   Unit: Not applicable.

   Semantics: A special parameter expressing an "alternative" relation
      between the parameters preceding it and the parameters following
      it. This SHOULD be interpreted as describing two alternate
      Operation Points where one and only one SHALL be chosen, with the
      Operation Point preceding ALT in the parameter list being
      preferred. Multiple ALT parameters MAY be used in the same
      parameter list, in which case each set of parameters to evaluate
      can be either before the first ALT parameter, between two ALT
      parameters, or after the last ALT parameter. Evaluating from the
      top of the list and obeying the above preference rule, the first
      acceptable set of parameters (not containing any ALT parameter) is
      the one to choose.

   Encoding: Not applicable.

   Media Types: All.

   Value Restrictions: MUST be used with the Length field set to 0.
      Two ALT parameters MUST be separated by at least one parameter
      other than ALT.

   Default Value: Not applicable.

   Comparison Types: MUST be set to 0.

   Note:

8.3. ID

   This Codec Parameter Type is a special parameter that enables codec
   specific identification of sub-streams, for example when there are
   multiple sub-streams in a single SSRC. It can also be used to
   reference OPID, when the used codec does not support or use
   sub-streams. When used, it SHALL be listed first among the Codec
   Parameters used to describe the sub-stream.

   Type Value: 1

   Tag: id

   Unit: Not applicable.

   Semantics: A special parameter describing the (possibly
      codec-specific) media identification for the OPID.

   Encoding: If used with non-scalable encoding, it MUST contain an
      OPID (Section 7.2.1). If used with scalable encoding, this codec
      specific encoding MUST be defined by Section 10. It MUST be
      defined to occupy an integer number of bytes, where all bits in
      the bytes are defined as part of the format.

   Media Types: All.

   Value Restrictions: If used with non-scalable encoding, any OPID
      restrictions apply. If used with scalable encoding, any
      restrictions MUST be defined by the codec-specific sub-stream
      identification definition (Section 10).

   Default Value: Not set.

   Comparison Types: MUST be set to 0.

   Note: MAY be used whenever there is a need to identify an Operation
      Point in codec native format, or when there is a need to map that
      against an OPID.

8.4. Payload Type

   Type Value: 2

   Tag: pt

   Unit: Not applicable.

   Semantics: References the RTP Payload Type to use for the OPID.

   Encoding: The least significant 7 bits MUST use the same encoding as
      the RTP Payload Type field in the RTP header. The most
      significant bit MUST be set to 0.

   Media Types: All.

   Value Restrictions: The same restrictions valid for RTP Payload Type
      apply, i.e. 7-bit values 0-127. MUST be represented by a single
      byte in the value field.

   Default Value: Not set.

   Comparison Types: MUST be set to 0.

   Note: MAY be used whenever there is a need to specify Codec
      Configuration Parameters valid only for a certain RTP Payload
      Type. Which media type, codec, and possible parameters are
      described by the RTP Payload Type is outside the scope of this
      specification, but is typically defined in capability or call
      setup signaling, for example SDP.

8.5. Bitrate

   Type Value: 3

   Tag: bitrate

   Unit: Bits per second.

   Semantics: Media level average media bitrate per second, excluding
      IP/UDP/RTP overhead, but including RTP payload headers (similar to
      b=TIAS from SDP signaling [RFC3890]), rounded up to the closest
      integer.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: A value of 0 MAY be used. The largest value
      allowed is what is possible to represent in a 64-bit unsigned
      integer value, i.e. a value between 0 and
      18,446,744,073,709,551,615.

   Default Value: Maximum value computed from capability or call setup
      signaling, e.g. b= parameter from SDP. Note that it is often not
      possible to achieve more than a rough estimation from such
      computation.

   Comparison Types: All.

   Note: This parameter used with a maximum comparison type parameter
      is significantly similar to CCM Temporary Maximum Media Bit Rate
      (TMMBR). When being used with a maximum comparison type value of
      0, it is also significantly similar to PAUSE
      [I-D.westerlund-avtext-rtp-stream-pause]. Compared to those, this
      parameter conveys significant extra information through the
      relation to other parameters applied to the same Operation Point,
      as well as the possibility to express other restrictions than a
      maximum limit. When CCM TMMBR is supported in addition to this
      specification, the Bitrate parameters from all Operation Points
      within each SSRC should be considered and CCM TMMBR messages
      SHOULD be sent for those SSRC that are found to be in the bounding
      set (see CCM [RFC5104], section 3.5.4.2).
      When PAUSE is supported in addition to this specification, the
      Bitrate parameters from all Operation Points within each SSRC
      should be considered and CCM PAUSE messages SHOULD be sent for
      those SSRC that contain only Operation Points that are limited by
      a Bitrate maximum value of 0.

8.6. Token Bucket Size

   Type Value: 4

   Tag: token-bucket

   Unit: Bytes.

   Semantics: Media level token bucket [RFC2212] size excluding
      IP/UDP/RTP overhead, but including RTP payload headers, describing
      the bitrate variability over time as described in
      [I-D.westerlund-mmusic-sdp-bw-attribute]. This parameter can be
      combined with the Bitrate parameter (Section 8.5) to provide token
      bucket fill rate plus bucket size for a complete token bucket
      model.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: A value of 0 is generally not meaningful and
      SHOULD NOT be used. Allowed values are those that can be
      represented using a 32-bit unsigned integer, i.e. 0 to
      4,294,967,295.

   Default Value: 4096 bytes.

   Comparison Types: Maximum, Target.

   Note: Changing the token bucket size does not imply changing the
      average bitrate; it just changes the acceptable average bitrate
      variation over time.

8.7. Framerate

   Type Value: 5

   Tag: framerate

   Unit: 100th of a Hz. This definition allows e.g. distinguishing
      between video encoded at 30 Hz (two-byte value 3000) and 29.97 Hz
      (two-byte value 2997). It also allows for high speed video
      cameras, like 1000 Hz (three-byte value 100000), and slow-scan
      down to one frame every 100 seconds (one-byte value 1).

   Semantics: The number of media frames to render per second.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Mainly intended for video and timed image media types,
      but MAY be used also for other media types.

   Value Restrictions: A value of 0 MAY be used, meaning single-frame,
      request based encoding (request procedure is out of scope for this
      specification). Allowed values are those that can be represented
      using a 32-bit unsigned integer, i.e. 0 to 42,949,672.95 Hz.

   Default Value: Maximum allowed by call setup and/or capability
      signaling, e.g. a=framerate parameter from SDP [RFC4566], or
      codec-specific configuration.

   Comparison Types: All.

   Note: A media frame is typically a set of semantically grouped
      samples, e.g. the relation that a video image has to its
      individual pixels, or the relation that an audio frame has to
      individual audio samples. The value applies to encoded media
      framerate, not the packet rate (Section 8.13) that may also be
      changed as a result of different Frame Aggregation (Section 8.14).

8.8. Horizontal Pixels

   Type Value: 6

   Tag: hor-size

   Unit: Pixels.

   Semantics: Horizontal image size.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Video and image.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Allowed values are those that can be
      represented using a 32-bit unsigned integer, i.e. 1 to
      4,294,967,295.

   Default Value: Maximum allowed by call setup and/or capability
      signaling.

   Comparison Types: All.

   Note: The pixel and picture aspect ratios cannot be changed with
      this parameter. Video encoders can typically describe both pixel
      and picture aspect ratios as part of the encoded media stream.

8.9. Vertical Pixels

   Type Value: 7

   Tag: ver-size

   Unit: Pixels.

   Semantics: Vertical image size.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Video and image.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Allowed values are those that can be
      represented using a 32-bit unsigned integer, i.e. 1 to
      4,294,967,295.

   Default Value: Maximum allowed by call setup and/or capability
      signaling.

   Comparison Types: All.

   Note: See Note in Section 8.8.

8.10. Channels

   Type Value: 8

   Tag: channels

   Unit: Unit-less.

   Semantics: The number of media channels.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Allowed values are those that can be
      represented using a 16-bit unsigned integer, i.e. 1 to 65,535.

   Default Value: Taken from call setup or capability signaling, or 1
      if no other value is available.

   Comparison Types: All.

   Note: This Codec Configuration Parameter SHOULD NOT be used if the
      capability negotiation did not establish that suitable
      multi-channel coding is supported by both ends. For audio, the
      interpretation and spatial mapping SHALL follow the one for the
      indicated payload format. For video, it SHALL be interpreted as
      the number of views in multi-view coding, where the number 2
      SHOULD represent stereo (3D) coding, unless negotiated otherwise
      by means outside of this specification, e.g. SDP.

8.11. Sampling Rate

   Type Value: 9

   Tag: sampling

   Unit: Hz.

   Semantics: Frequency of the media sampling clock in Hz, as input to
      the codec, per channel (Section 8.10).

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Mainly intended for audio media, but MAY be used for
      other media types.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Allowed values are those that can be
      represented using a 32-bit unsigned integer, i.e. 1 to
      4,294,967,295.

   Default Value: Taken from call setup or capability signaling, e.g.
      RTP TS rate from SDP m-line.

   Comparison Types: All.

   Note: The value refers to the media sample clock, not the media
      Framerate (Section 8.7). It does not specify any codec-internal
      up- or down-sampling that may take place as part of the encoding
      process. If multiple channels (Section 8.10) are used and
      different channels use different sampling rates, then this
      parameter MUST NOT be used unless there is a known sampling rate
      relationship and an ordering between the channels, in which case
      the specified sampling rate value SHALL be taken as applicable to
      the first channel of the ordered set. The relationship may e.g.
      be known implicitly by each party through some specification, or
      be negotiated using other means than this specification.
      Typically only a limited subset of sampling frequencies makes
      sense to the media encoder, and sometimes it is not possible to
      change at all. For video, the sampling rate is very closely
      connected to the image horizontal (Section 8.8) and vertical
      (Section 8.9) resolution and to the framerate (Section 8.7), which
      are more explicit and meaningful and SHOULD therefore be used
      instead. For audio, changing sampling rate may require changing
      codec and thus changing RTP payload type. The actual media
      sampling rate may not be identical to the sampling rate specified
      for RTP Time Stamps for that RTP Payload Type. E.g. almost all
      video codecs use only 90 000 Hz sampling clock for RTP Time
      Stamps, while the actual pixel sampling clock is typically in the
      range from a few to several hundred MHz. Also some recent audio
      codecs use an RTP Time Stamp rate that differs from the actual
      media sampling rate. Aspects related to mid-stream changes of RTP
      Time Stamp rate are described in
      [I-D.ietf-avtext-multiple-clock-rates].

8.12. Maximum RTP Packet Size

   Type Value: 10

   Tag: max-rtp-size

   Unit: Bytes.

   Semantics: The maximum size of an RTP packet, including the RTP
      header but excluding lower layers.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: The meaning of a value less than the size of the
      RTP header (12 bytes for current RTP specification [RFC3550]) is
      not defined and SHOULD NOT be used. Allowed values are those that
      can be represented using a 32-bit unsigned integer, i.e. 0 to
      4,294,967,295.

   Default Value: 1400 bytes for IPv4, 1280 bytes for IPv6 or if IP
      version cannot be determined.

   Comparison Types: Maximum.

   Note: The parameter should typically be used to adapt encoding to a
      known or assumed MTU limitation, and MAY be used to assist MTU
      path discovery in point-to-point as well as in RTP Mixer or
      Translator topologies.

8.13. Maximum RTP Packet Rate

   Type Value: 11

   Tag: max-rtp-rate

   Unit: RTP packets per second.

   Semantics: Maximum number of RTP packets per second, calculated or
      estimated as the largest value appearing during a one-second
      sliding window, similar to the definition of "maxprate" [RFC3890].

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Allowed values are those that can be
      represented using a 32-bit unsigned integer, i.e. 1 to
      4,294,967,295.

   Default Value: Not set.

   Comparison Types: Maximum.

   Note: The parameter should typically be used to adapt encoding on a
      network that is packet rate rather than bitrate limited, if such
      property is known. This Codec Configuration Parameter MUST NOT
      exceed any negotiated "maxprate" [RFC3890] value, if present.

8.14. Application Data Unit Aggregation

   Type Value: 12

   Tag: aggregate

   Unit: Milliseconds.

   Semantics: The amount of non-redundant application data units (ADUs)
      representing different RTP Time Stamps that should be included in
      the RTP payload, henceforth in this specification called an "ADU
      aggregate". An ADU aggregation value of 1 is equivalent to no
      aggregation.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Mainly intended for audio, but MAY be used also for
      other media, e.g. Real-Time Text [RFC4103].

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Allowed values are those that can be
      represented using a 16-bit unsigned integer, i.e. 1 to 65,535.

   Default Value: 1.

   Comparison Types: All.

   Note: To use this parameter, there MUST exist a defined way of
      including multiple ADUs into the same RTP payload for the used RTP
      Payload Type. There MUST also exist a known internal timing
      relationship between individual ADUs within the RTP payload for
      the used RTP Payload Type. Some payload formats (typically video)
      do not allow multiple ADUs (representing different sampling times)
      in the RTP payload. This Codec Configuration Parameter SHOULD NOT
      be used unless the "maxprate" [RFC3890] and/or "ptime" parameters
      are included in the SDP. The requested ADU aggregation level MUST
      NOT cause the negotiated "maxprate" value to be exceeded, if
      present, and SHOULD NOT exceed the negotiated "ptime" value, if
      present. The requested frame aggregation level MUST NOT be in
      conflict with any Maximum RTP Packet Size (Section 8.12) or
      Maximum RTP Packet Rate (Section 8.13) parameters. The packet
      rate that may result from different frame aggregation values is
      related to, but semantically not the same as, media Framerate
      (Section 8.7).

9. SDP Extensions

   As described in [RFC4585] and [RFC5104], the rtcp-fb attribute may be
   used to negotiate capability to handle specific AVPF commands and
   indications, and specifically the "ccm" feedback value is used for
   codec control. All rules defined there related to use of "rtcp-fb"
   and "ccm" also apply to the new feedback message defined in this
   specification.

9.1. Extension of the rtcp-fb Attribute

   In this document, a new "ccm" rtcp-fb-ccm-param is defined, according
   to the method of extension described in [RFC5104]:

   o  "cop" indicates support for all COP Message Items defined in this
      specification, and one or more of the Codec Configuration
      Parameters defined in this specification

   The ABNF [RFC5234] for the new rtcp-fb-ccm-param is:

   rtcp-fb-ccm-param =/ SP "cop" 1*rtcp-fb-ccm-cop-param
                        ; rtcp-fb-ccm-param defined in [RFC5104]

   rtcp-fb-ccm-cop-param = SP "alt"
                         / SP "id"
                         / SP "pt"
                         / SP "bitrate"
                         / SP "token-bucket"
                         / SP "framerate"
                         / SP "hor-size"
                         / SP "ver-size"
                         / SP "channels"
                         / SP "sampling"
                         / SP "max-rtp-size"
                         / SP "max-rtp-rate"
                         / SP "aggregate"
                         / SP token ; for future extensions
                                    ; token defined in [RFC4566]

                         Figure 15: ABNF for cop

   Token values for rtcp-fb-ccm-cop-param are defined in Table 4. Their
   semantics are described in Section 8.

   Supported Parameter Types are indicated by including one or more
   rtcp-fb-ccm-cop-param.

9.2. Offer/Answer Usage

   The usage of Offer/Answer [RFC3264] in this specification inherits
   all applicable usage defined in [RFC5104].

   In order to announce support for, and willingness to use, the CCM
   "cop" feedback message, an offerer or answerer SHALL indicate that
   capability through the extended SDP rtcp-fb attribute, defined in
   Section 9.1.
   The offerer or answerer MUST include a list of the Parameter Types
   that it is willing to receive.

   If an SDP offer does not indicate support of the CCM "cop" feedback
   message, the answerer MUST NOT indicate support in the associated SDP
   answer.

   The answerer MAY remove Parameter Types that were present in the
   associated SDP offer, and MAY add Parameter Types that were not
   present in it. If the answerer adds Parameter Types to the SDP
   answer, it MUST be able to receive such messages, but the answerer
   MUST NOT send such messages towards the offerer.

   If an SDP answer does not indicate support of the CCM "cop" feedback
   message, the offerer MUST NOT send such messages towards the
   answerer.

   The offerer and the answerer SHOULD NOT send any Parameter Types that
   the remote party did not indicate receive support for. As described
   in Section 8, a parameter with an unknown ParamType SHALL be ignored
   on reception in a COPN and SHALL either be reported as unknown in
   COPS or be ignored when received in COPR.

   Entities MUST list all supported Parameter Types in every subsequent
   SDP offer or answer associated with the session. If a Parameter Type
   is not listed, it is an indication that the offerer or answerer is no
   longer willing to receive such messages within the session.

9.3. Declarative Usage

   Declarative use of the CCM "cop" does not differ from the
   Offer/Answer usage.

10. Codec Sub-Stream Identification

   The defined mechanism is not bound to a specific codec. It uses the
   main characteristics of a chosen set of media types, including audio
   and video. To what extent this mechanism can be applied depends on
   which specific codec is used.

   When using a codec that can produce separate sub-streams within a
   single SSRC, those sub-streams can only be referred to with a COP
   OPID if there is a defined relation to the codec-specific sub-stream
   identification.
   This is accomplished in this specification by defining an ID
   Parameter format using codec-specific sub-stream identification for
   each such codec.

   If such sub-streams have dependencies, the OPID describes the
   characteristics of the sub-stream including all its dependencies,
   but excluding any sub-streams that are dependent on this sub-stream.
   The sub-stream identification describes a single, payload specific
   node in a dependency tree, and does not in general include any
   identification of the sub-streams it depends on, or the dependency
   structure between sub-streams. Any dependency structure must thus be
   described by the media stream payload format and is out of scope for
   this specification.

   This section contains ID Parameter format definitions for a few
   selected codecs. The format definitions MUST use an integer number
   of bytes and MUST define all bits in those bytes. Note that the ID
   parameter is interpreted in the context of a given SSRC and a
   specific RTP payload type.

   Extensions to this specification MAY add more codec-specific
   definitions than the ones described in the sub-sections below. Such
   definitions made in extensions to this specification SHOULD be
   considered as an integrated part of this section, with respect to
   usage with other mechanisms defined in this specification.

10.1. H.264 AVC

   Some non-scalable video codecs such as H.264 AVC [H264] and
   corresponding RTP payload format [RFC6184] can accomplish
   simultaneous encoding of multiple operation points. H.264 AVC can
   encode a video stream using limited-reference and non-reference
   frames such that it enables limited temporal scalability, by use of
   the nal_ref_idc syntax element.

   The ID Parameter Type is defined below:

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | Reserved  | N |
   +-+-+-+-+-+-+-+-+

                    Figure 16: ID Definition for AVC

   Reserved (6 bits): Reserved. SHALL be set to 0 by senders and SHALL
      be ignored by receivers implementing this specification. MAY be
      defined differently by extensions to this specification.

   N (2 bits): SHALL be identical to the highest value of the
      nal_ref_idc H.264 NAL header syntax element valid for the
      sub-bitstream described by this OPID, with the exception of
      nal_ref_idc value 3 that is valid for and is part of all
      sub-bitstreams.

10.2. H.264 SVC

   This document specifies the usage of multiple, simultaneous codec
   operation points and therefore maps well to scalable video coding.
   Scalable video coding such as H.264 SVC (Annex G) [H264] uses three
   scalability dimensions: temporal, spatial, and quality. It also
   includes the possibility to use redundant encodings and priority
   among sub-streams.

   The ID SHALL be considered as describing an SVC sub-bitstream, which
   is defined in G.3.59 of H.264 [H264] and corresponding RTP payload
   format [RFC6190]. For use with H.264 SVC, ID SHALL be constructed as
   defined below:

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|    PID    |     RPC     | DID |  QID  | TID |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 17: ID Definition for SVC

   R (1 bit): Reserved. SHALL be set to 0 by senders and SHALL be
      ignored by receivers implementing this specification. MAY be
      defined differently by extensions to this specification.

   PID (6 bits): SHALL be identical to an unsigned binary integer
      representation of the priority_id H.264 syntax element valid for
      the sub-bitstream described by this OPID. SHALL be set to 0 if no
      priority_id is available.

   RPC (7 bits): SHALL be identical to an unsigned binary integer
      representation of the redundant_pic_cnt H.264 syntax element valid
      for the sub-bitstream described by this OPID. SHALL be set to 0
      if no redundant_pic_cnt is available.

   DID (3 bits): SHALL be identical to the dependency_id H.264 syntax
      element valid for the sub-bitstream described by this OPID.

   QID (4 bits): SHALL be identical to the quality_id H.264 syntax
      element valid for the sub-bitstream described by this OPID.

   TID (3 bits): SHALL be identical to the temporal_id H.264 syntax
      element valid for the sub-bitstream described by this OPID.

11. Examples

   COP messages are binary encoded. However, in the following examples,
   all COP messages are for clarity listed in symbolic, pseudo-code
   form, where only COP message fields of interest to the example are
   included, along with the COP Parameters.

11.1. SDP Offer/Answer

   The SDP capabilities for COP are defined as receiver capabilities,
   meaning that there is no explicit indication what COP messages an
   end-point will use in the send direction. It is however reasonable
   to expect that an end-point can also send the same messages that it
   can understand and act on when received. This is assumed in all the
   SDP examples below, but note that symmetric COP capabilities are not
   a requirement.

   The example below shows an SDP Offer, where support of CCM "cop"
   message is announced for the video codecs.

   v=0
   o=alice 2890844526 2890844526 IN IP4 host.atlanta.example
   s=-
   c=IN IP4 host.atlanta.example
   t=0 0
   m=audio 50000 RTP/AVP 0 8 97
   b=AS:80
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   m=video 50010 RTP/AVPF 31 32
   b=AS:600
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   a=rtcp-fb:31 ccm cop framerate bitrate token-rate
   a=rtcp-fb:32 ccm cop hor-size ver-size framerate bitrate \
      token-rate

             Figure 18: SDP Offer (COP support indicated)

   Note that the offer contains two different video payload types, and
   that the COP Parameters differ between them, meaning that the
   possibility for codec configuration also differs. In this case, the
   MPEG-1 codec can control both framerate and image size, but for H.261
   only the framerate can be controlled.

   In the SDP Answer below, responding to the above offer, the answerer
   supports CCM "cop" messages.

   v=0
   o=bob 2808844564 2808844564 IN IP4 host.biloxi.example
   s=-
   c=IN IP4 host.biloxi.example
   t=0 0
   m=audio 52000 RTP/AVP 0
   b=AS:80
   a=rtpmap:0 PCMU/8000
   m=video 52100 RTP/AVPF 32
   b=AS:600
   a=rtpmap:32 MPV/90000
   a=rtcp-fb:32 ccm cop hor-size ver-size framerate bitrate \
      token-rate packet-size

             Figure 19: SDP Answer (COP support indicated)

   Note that the answerer indicates support for more parameter types
   than the offerer.

   Below is another SDP Answer, also responding to the same offer above,
   where the answerer does not support "cop".

   v=0
   o=bob 2808844564 2808844564 IN IP4 host.biloxi.example
   s=-
   c=IN IP4 host.biloxi.example
   t=0 0
   m=audio 52000 RTP/AVP 0
   b=AS:80
   a=rtpmap:0 PCMU/8000
   m=video 52100 RTP/AVPF 32
   b=AS:600
   a=rtpmap:32 MPV/90000

           Figure 20: SDP Answer (COP support not indicated)

11.2. Dynamic Video Re-sizing

   In this example, two COP-enabled end-points communicate in an
   audio/video session. The receiving end-point has a graphical user
   interface that can be dynamically changed by the user. This user
   interaction includes the ability to change the size of the receiving
   video window, which is also indicated in the previous SDP example
   (Section 11.1).

   At some point during the established communication, a notification
   about current video stream Codec Operation Point is sent to the
   re-sizable window end-point that receives the video stream.

   COPN {SSRC:123456, OPID:123, Version:5,
         bitrate(max):325000,
         token-bucket(exact):1000,
         framerate(exact):15,
         hor-size(exact):320,
         ver-size(exact):240}

                    Figure 21: COPN for QVGA 15 Hz

   Some time later the user of the re-sizable window end-point reduces
   the size of the video window. As a result of the re-size operation,
   the video window can no longer make full use of the received video
   resolution, wasting bandwidth and decoder processing resources. The
   re-sizable window end-point thus decides to notify the video stream
   sender about the changed conditions by sending a request for a video
   stream of smaller size:

   COPR {SSRC:123456, OPID:123, Version:5,
         hor-size(target):243,
         ver-size(target):185}

                     Figure 22: COPR for 243x185

   The COPR refers to the previously received COPN with the same OPID
   and Version, and thus need only list parameters that need to be
   changed. The request could arguably also contain other parameters
   that are potentially affected by the spatial resolution, such as the
   bitrate, but these can be omitted since the media sender is not
   slaved to the request but is allowed to make its own decisions based
   on the request.
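
   As a minimal, non-normative sketch of this referencing rule, the
   Python fragment below overlays the parameters listed in a COPR on
   those of the COPN with the same SSRC, OPID and Version. The
   CopMessage class, the effective_request() function and the
   dictionary representation are hypothetical names chosen for
   illustration; they are not defined by this specification and do not
   reflect the binary encoding. The sketch also assumes at most one
   entry per parameter name.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CopMessage:
    """Symbolic stand-in for a COPN or COPR (hypothetical, not the wire format)."""
    ssrc: int
    opid: int
    version: int
    # Maps parameter name -> (comparison type, value),
    # for example "hor-size": ("exact", 320).
    params: dict = field(default_factory=dict)


def effective_request(copn: CopMessage, copr: CopMessage) -> dict:
    """Overlay the parameters listed in a COPR on the COPN it refers to."""
    if (copr.ssrc, copr.opid, copr.version) != (copn.ssrc, copn.opid, copn.version):
        raise ValueError("COPR does not refer to this COPN")
    merged = dict(copn.params)   # start from the notified operation point
    merged.update(copr.params)   # parameters listed in the request take over
    return merged


# Values taken from Figure 21 (COPN) and Figure 22 (COPR).
copn = CopMessage(123456, 123, 5, {
    "bitrate": ("max", 325000),
    "token-bucket": ("exact", 1000),
    "framerate": ("exact", 15),
    "hor-size": ("exact", 320),
    "ver-size": ("exact", 240),
})
copr = CopMessage(123456, 123, 5, {
    "hor-size": ("target", 243),
    "ver-size": ("target", 185),
})
wanted = effective_request(copn, copr)
```

   In this sketch, wanted keeps the notified bitrate, token-bucket and
   framerate entries, while the two size entries carry the requested
   target values. The media sender remains free to choose other values
   when it acts on the request.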

   The request sender has chosen to use target type values instead of
   exact values for the horizontal and vertical sizes, which can be
   interpreted as "anything sufficiently similar is acceptable". The
   target values are in this example chosen to correspond exactly to the
   re-sized video display area. Many video coding algorithms operate
   most efficiently when the image size is an integer multiple of some
   block size, and this way of expressing the request explicitly leaves
   room for the media sender to take such aspects into account.

   The media sender (COPR receiver) responds with the following:

   COPS {SSRC:123456, OPID:123, Version:5,
         Partial Success,
         One or more parameter values in the request were changed}

   COPN {SSRC:123456, OPID:123, Version:6,
         bitrate(max):240000,
         token-bucket(exact):1000,
         framerate(exact):15,
         hor-size(exact):240,
         ver-size(exact):176}

             Figure 23: COPS and COPN for Partial Success

   It can be noted that the updated COPN (version 6) indicates that the
   media sender has, in addition to reducing the video horizontal and
   vertical size, chosen to also reduce the bitrate. This bitrate
   reduction was not in the request, but is a reasonable decision taken
   by the media sender. It can also be seen that the horizontal and
   vertical sizes are not identical to the request, but are in fact
   adjusted to be integer multiples of 16, which is a local restriction
   of the fictitious video encoder in this example. To handle the
   mismatch between the request and the resulting video stream, the
   video receiver can perform some local action, for example automatic
   re-adjustment of the re-sized window, image scaling (possibly
   combined with cropping), or padding.

11.3. Illegal Request

   In this example, the sent request is asking the media sender to go
   beyond what is negotiated in the SDP. The SDP Offer below indicates
   use of H.264 video with Constrained Baseline Profile at level 1.1.

   v=0
   o=alice 2893746526 2893746526 IN IP4 host.atlanta.example
   s=-
   c=IN IP4 host.atlanta.example
   t=0 0
   m=audio 49160 RTP/AVP 96
   b=AS:80
   a=rtpmap:96 G722/16000
   m=video 51920 RTP/AVPF 97
   b=AS:200
   a=rtpmap:97 H264/90000
   a=fmtp:97 profile-level-id=42e00b
   a=rtcp-fb:97 ccm cop framerate bitrate token-rate

              Figure 24: SDP Offer With H.264 Level 1.1

   Assuming this offer is accepted and that the answerer also supports
   COP, further assume that this COP message exchange occurs at some
   time during the established communication:

   Media Sender                      Media Receiver
   ------------                      --------------

   COPN {SSRC:9876, OPID:67,     ->
         Version:2,
         bitrate(exact):190000,
         token-bucket(exact):500,
         framerate(exact):10,
         hor-size(exact):320,
         ver-size(exact):240}

                                 <-  COPR {SSRC:9876, OPID:67,
                                           Version:2,
                                           framerate(exact):10,
                                           hor-size(exact):352,
                                           ver-size(exact):288}

   COPS {SSRC:9876, OPID:67,     ->
         Version:2,
         Failure,
         Request violates capability limits}

          Figure 25: COP Message Exchange Indicating Failure

   The failure above is due to a combination of frame size and frame
   rate that exceeds H.264 level 1.1, which would thus exceed the limits
   established by SDP Offer/Answer. The maximum permitted framerate for
   352x288 pixels (CIF) is 7.6 Hz for H.264 level 1.1, as defined in
   Annex A of [H264]. (A CIF frame holds 22x18 = 396 macroblocks and
   level 1.1 permits at most 3000 macroblocks per second, so 3000/396
   is approximately 7.6 Hz.)

11.4. Reference Response to Modification of Scalable Layer

   When scalable coding is used, each layer corresponds to a Codec
   Operation Point. A media receiver can thus target a request towards
   a single layer. Assume a video encoding with three framerate layers,
   announced in a (multiple operation point) notification as:

   COPN {SSRC:9876, OPID:67, Version:2, ID:2,
         bitrate(exact):190000,
         token-bucket(exact):500,
         framerate(exact):10,
         hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:9876, OPID:73, Version:1, ID:1,
         bitrate(exact):350000,
         token-bucket(exact):600,
         framerate(exact):30,
         hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:9876, OPID:95, Version:5, ID:0,
         bitrate(exact):400000,
         token-bucket(exact):800,
         framerate(exact):60,
         hor-size(exact):320,
         ver-size(exact):240}

          Figure 26: COPN Indicating Three Framerate Layers

   Assume further that the media receiver is not pleased with the low
   framerate of OPID 67, wanting to increase it from 10 Hz to 25-30 Hz.
   Note that the media receiver still wants to receive the other layers
   unchanged, not remove them, and thus has to explicitly indicate this
   by including them without parameters.

   COPR {SSRC:9876, OPID:67, Version:2,
         framerate(greater):25,
         framerate(less):30}

   COPR {SSRC:9876, OPID:73, Version:1}

   COPR {SSRC:9876, OPID:95, Version:5}

            Figure 27: COPR Requesting to Change One Layer

   The media sender decides it cannot meet the request for OPID 67, but
   instead considers (an unmodified) OPID 73 (with ID 1) to be a
   sufficiently good match:

   COPS {SSRC:9876, OPID:67, Version:2,
         Partial Success,
         One or more parameter values in the request were changed,
         ID:1}

   (COPN for the other two OPIDs omitted here for brevity)

   COPN {SSRC:9876, OPID:73, Version:1, ID:1,
         bitrate(exact):350000,
         token-bucket(exact):600,
         framerate(exact):30,
         hor-size(exact):320,
         ver-size(exact):240}

    Figure 28: COPS and COPN With Layer Modification Partial Success

   The COPS indicates partial success and uses the ID number to refer to
   another OPID, describing the best compromise that can currently be
   used to meet the request. COPS does not contain the referred OPID,
   but ID should be defined in a codec-specific way that makes it
   possible to identify the layer directly in the media stream. If the
   corresponding OPID is needed, for example to attempt another request
   targeting it, it can be found by searching the active set of COPN
   for matching ID values.

11.5. Successful Request to Add Codec Operation Point

   In this example, the media receiver is receiving a non-scalable
   stream from a codec that can support scalability, and wishes to add a
   scalability layer. Assume the existing OPID from the media sender is
   announced as:

   COPN {SSRC:3492, OPID:4, Version:2,
         bitrate(exact):350000,
         token-bucket(exact):600,
         framerate(exact):30,
         hor-size(exact):320,
         ver-size(exact):240}

             Figure 29: COPN With Single Operation Point

   The media receiver constructs a request for multiple streams by
   including multiple requests for different OPIDs. Since the new stream
   does not exist, it has no OPID from the media sender and the receiver
   chooses a random value as reference and indicates that it is a new,
   temporary OPID. The request for the new stream includes all
   parameters that the media receiver has an opinion on, and leaves the
   other parameters to be chosen by the media sender. In this case it
   is a request for identical frame size and doubled framerate.

   COPR {SSRC:3492, OPID:4, Version:2}

   COPR {SSRC:3492, OPID:237, New, Version:0,
         framerate(exact):60,
         hor-size(exact):320,
         ver-size(exact):240}

          Figure 30: COPR Requesting to Add Operation Point

   The media sender decides it can start layered encoding with the
   requested parameters. The status response to the new OPID contains a
   reference to an ID that is included as part of the matching,
   subsequent COPN. Note that since both the original and the new
   streams are now part of a scalable set, they must both be identified
   with ID parameters to be able to distinguish between them. The media
   sender has chosen an OPID for the new stream in the COPN, which need
   not be identical to the temporary one in the request, but the new
   stream can anyway be uniquely identified through the ID that is
   announced in both the COPS and COPN.

   Note that since the ID has a defined relation to the media sub-stream
   identification, decoding of that new sub-stream can start immediately
   after receiving the COPS.
   It may however not be possible to describe the new stream in COP
   parameter terms until the COPN is received (depending on COP
   parameter visibility directly in the media stream).

   COPS {SSRC:3492, OPID:4, Version:2,
         Success, Success,
         ID:1}

   COPS {SSRC:3492, OPID:237, New, Version:0,
         Success, Success,
         ID:0}

   COPN {SSRC:3492, OPID:4, Version:2, ID:1,
         bitrate(exact):350000,
         token-bucket(exact):600,
         framerate(exact):30,
         hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:3492, OPID:9, Version:0, ID:0,
         bitrate(exact):390000,
         token-bucket(exact):600,
         framerate(exact):60,
         hor-size(exact):320,
         ver-size(exact):240}

      Figure 31: COPS and COPN Indicating Operation Point Added

12. IANA Considerations

   Following the guidelines in [RFC4566], in [RFC4585], and in
   [RFC3550], the IANA is requested to register:

   1. The 'cop' tag to be used with ccm under rtcp-fb AVPF attribute in
      SDP.

   2. The FMT number TBA1 to be allocated to the COP feedback message
      from this specification.

   3. A registry listing registered values for 'cop' Message Item Type,
      with initial values from Table 1.

   4. A registry listing registered values and tag names for 'cop'
      Parameter Type, with initial values from Table 4.

13. Security Considerations

   Editor's Note: Security considerations must be added.

14. Open Issues

   There is currently no defined way for a media receiver to indicate
   that it wants to release the restrictions it previously had on an
   Operation Point, if the media stream contains only a single Operation
   Point.

15. Acknowledgements

   The authors would like to thank Prof. Dr.-Ing. Markus Kampmann at
   Fachhochschule Koblenz University of Applied Sciences and Prof.
   Dr.-Ing. Frank Hartung at Multimediatechnik, Audio- und Videotechnik
   at Fachhochschule Aachen for fruitful contributions and discussions
   during the initial stages of writing this specification. The authors
   would also like to thank Christer Holmberg for feedback on the
   specification.

16. References

16.1. Normative References

   [H264]     ITU-T Recommendation H.264, "Advanced video coding for
              generic audiovisual services", March 2010.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3890]  Westerlund, M., "A Transport Independent Bandwidth
              Modifier for the Session Description Protocol (SDP)",
              RFC 3890, September 2004.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              July 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184, May 2011.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.

16.2. Informative References

   [I-D.ietf-avtext-multiple-clock-rates]
              Petit-Huguenin, M., "Support for multiple clock rates in
              an RTP session", draft-ietf-avtext-multiple-clock-rates-02
              (work in progress), January 2012.

   [I-D.westerlund-avtext-rtp-stream-pause]
              Akram, A., Burman, B., Grondal, D., and M. Westerlund,
              "RTP Media Stream Pause and Resume",
              draft-westerlund-avtext-rtp-stream-pause-00 (work in
              progress), October 2011.

   [I-D.westerlund-mmusic-sdp-bw-attribute]
              Frankkila, T., Westerlund, M., and B. Burman, "Extensible
              Bandwidth Attribute for SDP",
              draft-westerlund-mmusic-sdp-bw-attribute-00 (work in
              progress), October 2011.

   [RFC2212]  Shenker, S., Partridge, C., and R. Guerin, "Specification
              of Guaranteed Quality of Service", RFC 2212,
              September 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3611]  Friedman, T., Caceres, R., and A. Clark, "RTP Control
              Protocol Extended Reports (RTCP XR)", RFC 3611,
              November 2003.

   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
              Conversation", RFC 4103, June 2005.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, August 2006.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5760]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control
              Protocol (RTCP) Extensions for Single-Source Multicast
              Sessions with Unicast Feedback", RFC 5760, February 2010.

   [RFC5968]  Ott, J. and C. Perkins, "Guidelines for Extending the RTP
              Control Protocol (RTCP)", RFC 5968, September 2010.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com

   Laurits Hamm
   Ericsson
   Ericsson Allee 1
   DE-52134 Herzogenrath
   Germany

   Phone: +49 2407 575 6779
   Email: laurits.hamm@ericsson.com