Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Standards Track                                 L. Hamm
Expires: April 25, 2013                                         Ericsson
                                                        October 22, 2012


                  Codec Operation Point RTCP Extension
            draft-westerlund-avtext-codec-operation-point-01

Abstract

   The Audio-visual Profile with Feedback (AVPF) specification defines a
   framework and messages for fast feedback and media control over RTCP.
   The Codec Control Messages (CCM) specification defines an extension
   to AVPF, by specifying additional messages for codec control and
   feedback. This specification extends CCM, by specifying messages
   that let participants dynamically communicate a set of codec
   configuration parameters, which enables better optimization of
   resource efficiency and quality of media transmission.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document. Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Terminology
     2.2.  Abbreviations
     2.3.  Requirements Language
   3.  Motivation
     3.1.  Problem Description
     3.2.  Legacy Methods
       3.2.1.  Relation to SDP
       3.2.2.  Relation to RTCP
   4.  Use Cases for COP
     4.1.  Point to Point
     4.2.  Media Receiver to RTP Mixer
     4.3.  RTP Mixer to Media Sender
     4.4.  Media Receiver in Multicast or with RTP Transport Translator
   5.  Requirements
   6.  Solution Overview
     6.1.  Message Structure
     6.2.  Codec Configuration Parameter Use
     6.3.  Operation Point
     6.4.  Request
     6.5.  Notification
     6.6.  Status Report
     6.7.  Adding and Removing Operation Points
   7.  Codec Control Message Extension
     7.1.  COP Message
     7.2.  FCI Format
       7.2.1.  Message Item Format
       7.2.2.  Message Item Types
       7.2.3.  Operation Point Identification
     7.3.  Codec Operation Point Notification
       7.3.1.  Message Format
       7.3.2.  Semantics
       7.3.3.  Timing Rules
     7.4.  Codec Operation Point Request
       7.4.1.  Message Format
       7.4.2.  Semantics
       7.4.3.  Timing Rules
     7.5.  Codec Operation Point Status
       7.5.1.  Message Format
       7.5.2.  Semantics
       7.5.3.  Timing Rules
     7.6.  Handling in Mixers and Translators
       7.6.1.  COPN
       7.6.2.  COPR
       7.6.3.  COPS
   8.  Parameter Types
     8.1.  Parameter Format
     8.2.  ALT
     8.3.  ID
     8.4.  Payload Type
     8.5.  Bitrate
     8.6.  Token Bucket Size
     8.7.  Framerate
     8.8.  Horizontal Pixels
     8.9.  Vertical Pixels
     8.10. Sample Aspect Ratio
     8.11. Picture Aspect Ratio
     8.12. Channels
     8.13. Sampling Rate
     8.14. Maximum RTP Packet Size
     8.15. Maximum RTP Packet Rate
     8.16. Application Data Unit Aggregation
   9.  SDP Extensions
     9.1.  Extension of the rtcp-fb Attribute
     9.2.  Offer/Answer Usage
     9.3.  Declarative Usage
   10. Codec Sub-Stream Identification
     10.1. H.264 AVC
     10.2. H.264 SVC
   11. Examples
     11.1. SDP Offer/Answer
     11.2. Dynamic Video Re-sizing
     11.3. Illegal Request
     11.4. Reference Response to Modification of Scalable Layer
     11.5. Successful Request to Add Codec Operation Point
   12. IANA Considerations
   13. Security Considerations
   14. Open Issues
   15. Acknowledgements
   16. References
     16.1. Normative References
     16.2. Informative References
   Authors' Addresses

1.  Introduction

   Multimedia real-time communication services, such as video telephony
   and videoconferencing, use the real-time transport (RTP/RTCP)
   [RFC3550] protocol to transmit media streams, such as audio and
   video. A session establishment protocol, such as SIP [RFC3261], in
   combination with a capability negotiation protocol, such as SDP
   offer/answer [RFC3264], is normally used to establish the session
   and negotiate media capabilities. In some cases, a set of codec
   parameters is negotiated that does not express any specific limit or
   capability, but just describes a certain codec configuration.

   During session establishment, the participating endpoints normally
   have limited knowledge about the session environment, e.g. whether
   the session will be point-to-point or contain some multiparty
   scenario, how users will interact with the application, how network
   conditions will vary during the session, etc. To take those
   variations into account, the participants can renegotiate session
   parameters to better suit the communication environment. When
   variations or changes are frequent, the required reaction time is
   short, which may make repeated session renegotiation inefficient,
   too slow, or both. In addition, variations may not even affect
   negotiated session parameters, if the variations occur within the
   negotiated boundaries.

   The above scenario can become critical especially in cases where a
   given media stream is transmitted towards, and received by, multiple
   receivers.
   In multiparty environments, scalable encoding or
   simulcast can be used to make the system more efficient and provide
   better quality to participants that are capable of receiving and
   utilizing the higher quality. In these use cases, a sending party
   is requested to deliver multiple encoder operation points.

   The Audio-Visual Profile with Feedback (AVPF) specification [RFC4585]
   defines a framework and messages for fast feedback and media control
   over RTCP. The Codec Control Messages (CCM) specification [RFC5104]
   defines an extension to AVPF, by specifying additional messages for
   codec control and feedback. This specification extends CCM, by
   specifying messages that let participants dynamically communicate a
   set of codec configuration parameters, which enables better
   optimization of resource usage and quality of media transmission.

   The codec configuration parameters specified in this document focus
   on some basic audio and video properties, such as video resolution,
   video frame rate, media stream bit-rate, audio sampling rate, number
   of audio channels, maximum RTP packet size and rate. Additional
   parameters can be standardized in the future.

   The codec control messages are not meant to replace the configuration
   performed using e.g. SDP. Instead, the messages can be used to
   communicate dynamic and frequent changes that take place within
   boundaries that have been negotiated as part of the session
   establishment.

2.  Definitions

2.1.  Terminology

   The following terms and abbreviations are used in this document:

   Bandwidth: The network resource needed to transport a certain
      bitrate and any transport overhead, measured in bits per second.
      There will be spare network bandwidth when the (media) data
      bitrate and overhead are less than the available bandwidth.
      Similarly, data will have to be buffered when the available
      bandwidth excluding transport overhead is less than the bitrate
      used by the sender, or the excess data will be lost. The
      available bandwidth typically varies dynamically over time.

   Bitrate: The amount of (media) data transmitted per time unit,
      measured in bits per second, utilizing some amount of the
      available network bandwidth resource. In the context of this
      specification and unless otherwise specified, it excludes
      IP/UDP/RTP overhead. Depending on the (media) data source, the
      bitrate can either be constant or vary dynamically over time.

   Codec Configuration Parameter: The configurable value describing a
      certain codec property, which may impact user-perceived media
      fidelity, encoded media stream characteristics, or both. The
      parameter has a type (codec parameter type, see below) and a
      value, where the type describes what kind of codec property is
      controlled, and the value describes the property setting as well
      as how the value should be used in comparison operations. A
      single parameter value can express one specific value or an
      open-ended range. A pair of parameter values with different
      comparison types can describe a value range. Such a value range
      can also be combined with a third, target value within that
      range.

   Codec Operation Point: Also denoted just operation point. A set of
      codec configuration parameter values, describing the
      characteristics of one single encoding. For scalable encoding, it
      describes the resulting characteristics from combining a set of
      dependent sub-streams.

   Codec Parameter Type: The specific type of a codec configuration
      parameter. Each parameter type defines what unit the value has.
      This specification defines a number of generally useful parameter
      types in Section 8 that can be used to control codec operation.

   Encoding: A particular encoding is the resulting media stream from
      applying a certain choice of codec configuration parameters to the
      encoder. The media stream will have a certain fidelity (quality)
      from that encoding through the choice of sampling, bit-rate and
      other configuration parameters.

   Endpoint: A host or node that has a presence in the RTP session with
      one or more Synchronization Sources (SSRCs).

   Mixer: An RTP session centralized node that generates media streams
      based on incoming media streams from other endpoints. See
      Topo-Mixer in RTP Topologies [RFC5117].

   RTP Session: An association among a set of participants
      communicating with RTP. The distinguishing feature of an RTP
      session (defined in [RFC3550]) is that each RTP session maintains
      a full, separate space of SSRC identifiers. Each participant in
      the RTP session can see SSRC or CSRC identifiers from the other
      participants, either by RTP, RTCP, or both.

   Sub-Stream: An individually decodable part of a scalable media
      stream, including all dependent sub-streams. The characteristics
      of a certain sub-stream can be described by a codec operation
      point.

   Translator: An RTP session centralized node that forwards all media
      streams from other endpoints, modified to some extent, e.g.
      addressing, encoding, fidelity. See Topo-Translator in RTP
      Topologies [RFC5117].

2.2.  Abbreviations

   AVC: Advanced Video Coding

   AVPF: Extended RTP Profile for RTCP-Based Feedback

   CCP: Codec Configuration Parameter

   COP: Codec Operation Point

   COPN: Codec Operation Point Notification

   COPR: Codec Operation Point Request

   COPS: Codec Operation Point Status

   CPT: Codec Parameter Type

   FCI: Feedback Control Information

   FMT: Feedback Message Type

   GUI: Graphical User Interface

   MST: Multi-Session Transmission

   MVC: Multiview Video Coding

   OP: Operation Point

   OPID: Operation Point Identification number

   PPS: Picture Parameter Set

   SPS: Sequence Parameter Set

   SST: Single-Session Transmission

   SVC: Scalable Video Coding

   TLV: Type-Length-Value

2.3.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Motivation

3.1.  Problem Description

   Networks can contain endpoints with different capabilities, including
   CPU power, capture and render device fidelity (e.g. image
   resolution), and codecs. In addition, the characteristics and
   properties of networks can vary, which endpoints have to cope with.
   For example, in videoconferencing and telepresence services, a large
   number of endpoints may participate, and there may be a large number
   of media streams associated with the session. Such multiparty
   scenarios typically use entities for media mixing, switching and
   transcoding. The aim is to provide the best possible quality to each
   endpoint, taking endpoint and network capabilities into
   consideration.

   Many communication services today use codecs that can be configured
   in a number of different ways. Often, the codecs have multiple
   properties that can be configured and those properties may also be
   inter-related, often in complex ways.
   One example is the H.264 (AVC)
   [H264] video codec and its scalable (SVC) and multiview (MVC)
   versions. Most other video codecs, and codecs for many other types
   of media, also have multiple configurable properties. Such
   configurable properties will be referred to as "codec configuration
   parameters" in this specification.

   There can be several reasons to change the media rate or other
   encoding or packetization properties during an ongoing communication
   session. Reasons can be that the available network bandwidth varies,
   or that other network properties change, such as effective MTU or
   packet rate limitations. Other reasons can be that the quality or
   representation of the media rendered to the end user changes, maybe
   as a direct result of the user manipulating the GUI (e.g. changing
   window position or size), that the relative importance of the
   received media stream changes (e.g. active or non-active speaker in
   a conferencing scenario), or that the user chooses to show some
   other content source that is available among the advertised media
   streams.

   The codec changes above can be made directly between endpoints in a
   point-to-point scenario, or they may involve, and be acted upon by,
   media-aware intermediaries (e.g. RTP mixers). An RTP mixer can do
   transcoding to provide each receiver with media streams of adapted
   quality, but transcoding has drawbacks as it always consumes
   processing power, typically impacts media quality in a negative way,
   and often introduces additional delays.

   In order to avoid separate transcoding towards each endpoint, an RTP
   mixer can, by taking the capabilities of the endpoints into account,
   decide to request specific codec configurations from sending
   endpoints, which will minimize the need for transcoding.
   Also, in
   scenarios where no RTP mixers are used and transmitted media reaches
   multiple endpoints, the sender will have to take into account that
   each endpoint may have different capabilities. The use cases section
   (Section 4) shows different use cases, with and without RTP mixers.

   Resource optimization involving bandwidth is expected to be one of
   the major reasons for changing encoding properties, since it is
   desirable to avoid using more bandwidth than absolutely necessary,
   especially considering that

   o  the expectation for high media quality will continue to increase;

   o  the bitrate required to transmit the media, despite increasingly
      efficient media coding, can, due to the above, also be expected
      to increase;

   o  the available bandwidth is commonly a scarce and/or costly
      resource and will continue to be in the future;

   o  the relation between media bitrate and media codec configuration,
      the used set of media codec property values, is typically complex
      and the mapping between each individual codec property and bitrate
      is not linear;

   o  the used media bitrate does not uniquely identify the media codec
      configuration, but there are multiple codec configurations that
      can generate the same media bitrate;

   o  the media receiver's preferences for how the codec property
      values should be set for a certain media bitrate will vary with
      the specific end-user service requirements (for example, but not
      limited to, users with special needs) and the current media
      stream role in the application;

   o  the communication scenarios will not be limited to point-to-point,
      potentially involving multiple and at least partly conflicting
      constraints from different receivers.

   Other resources that may be desirable to optimize include, but are
   not limited to, endpoint and middle node processing (CPU)
   utilization, and transport quality (QoS).
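
   As an illustration of the point in the list above that the media
   bitrate does not uniquely identify the codec configuration, the
   following Python sketch uses a deliberately simplified model in
   which video bitrate is the product of width, height, frame rate and
   a constant number of bits per pixel. The model, the function name
   approx_bitrate_bps and the 0.1 bits-per-pixel figure are assumptions
   made for illustration only and are not part of this specification;
   as noted above, the real mapping is complex and not linear.

```python
from fractions import Fraction

# Assumed average number of coded bits per pixel; purely illustrative.
BITS_PER_PIXEL = Fraction(1, 10)


def approx_bitrate_bps(width, height, framerate, bits_per_pixel=BITS_PER_PIXEL):
    """Toy estimate: pixels per second times an assumed bits-per-pixel factor."""
    return int(width * height * framerate * bits_per_pixel)


# Two clearly different codec configurations (hypothetical values):
# one favours resolution, the other favours frame rate.
high_resolution = approx_bitrate_bps(1280, 720, 15)
high_framerate = approx_bitrate_bps(640, 360, 60)

# Under this toy model both configurations need the same bitrate, so a
# bitrate figure alone cannot tell the sender which one the receiver wants.
same_bitrate = high_resolution == high_framerate
```

   Under this toy model both configurations come out at 1382400 bits
   per second, so a receiver that only signals a bitrate limit leaves
   the choice between resolution and frame rate to the sender.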

   A media receiver cannot be assumed to know exactly what codec
   configuration will be best for the media sender to use, given that
   the sender needs to take multiple aspects into account, including
   implementation limitations in the actual encoder. It should be more
   likely to find a value acceptable to both sender and receiver if the
   receiver can indicate an acceptable range instead of just a single
   value.

   When an RTP mixer distributes streams to multiple receivers with
   different media quality requirements, it is sometimes possible to
   avoid targeted transcoding for every single receiver. That can be
   accomplished if the media sender has the ability to produce multiple
   media versions, for example through scalable encoding or simulcast.

   Thus, there is a need to both address specific media versions and
   describe the fact that multiple media versions with different
   configurations should be used.

3.2.  Legacy Methods

3.2.1.  Relation to SDP

   The Session Description Protocol (SDP) [RFC4566] is commonly used to
   negotiate and configure codecs, as well as to establish RTP/RTCP
   session parameters during session establishment and ongoing sessions,
   e.g. by using it in conjunction with SIP [RFC3261] and SDP
   Offer/Answer [RFC3264].

   As described in Section 3.1, many of the underlying reasons which
   make media receivers desire certain codec encoding properties are
   highly dynamic in nature and using SIP/SDP to renegotiate the session
   will in many cases be too slow to be useful. SIP messages containing
   an SDP may become quite large for sessions containing many media
   types, and since there is no defined way to send a partial SDP, even
   very small changes require sending the entire SDP.
   Most of the
   currently defined properties in SDP are oriented to be common for
   all media streams in the same RTP session, at least the ones sharing
   the same RTP Payload Type, rather than being specific to one media
   stream (e.g. "a=fmtp:98 profile-level-id=42C00C").

   The mechanism in this specification does not replace SDP, or the SDP
   Offer/Answer mechanism. It is expected that SDP is used in order to
   negotiate and configure boundary values for codec properties, and COP
   can then be used to communicate specific values within those
   boundaries, as long as there is no impact on the values negotiated
   using SDP. It is possible to establish communication sessions even
   if one or more endpoints do not support COP.

3.2.2.  Relation to RTCP

   As discussed in CCM, regular RTCP reporting or extended reports
   [RFC3611] can to some extent be used to reconfigure an encoder, but
   the reported measures seldom map directly back to encoding properties
   and they typically cannot express an unwanted situation in terms of
   encoding properties and what the receiver would like to receive
   instead. Communicating codec properties indirectly as a set of
   network properties will require interpretation by both sender and
   receiver and will thus risk misinterpretations and ambiguity. Since
   it is likely that a decoder is able to identify unwanted
   characteristics of the media stream in terms of encoding properties,
   the most straightforward approach is to convey those properties
   directly to the encoder.

   Responsive techniques to control encoding are already available, e.g.
   Codec Control Messages (CCM) [RFC5104]. Although highly applicable,
   the possibilities to control encoding are, however, not explicit
   enough, both in terms of the number of available parameters to
   control, and the fact that they may be inter-related, alternative,
   or both.

   Some codecs define codec-specific methods to enable receiver control
   of some encoding aspects, but it should be beneficial for
   interoperability to use codec-agnostic signaling instead.

4.  Use Cases for COP

   This section discusses a number of use cases for codec operation
   points.

4.1.  Point to Point

   This set of use cases focuses on communication that is directly
   point to point between a media sender and a receiver. There is no
   need for further forwarding of the media streams. Thus, the goal
   should be to produce a media stream and transport it to the media
   receiver, where it is consumed as optimally as possible for the
   application. Thanks to this one-to-one mapping between encoder and
   decoder, great flexibility exists to produce a media stream tailored
   to the receiver's needs, given the constraints that exist from media
   sender, transport network and the receiver.

   Some constraints are static (and thus suitable for session
   configuration signalling), but others are highly dynamic and
   desirable to adapt to during the session:

   Video Resolution in GUI: In a video communication application,
      including WebRTC-based ones, the window where the media sender's
      media stream is presented may change, for example due to the user
      modifying the size of the window. It might also be due to other
      application related actions, like selecting to show a
      collaborative work space and thus reducing the area used to show
      the remote video. In both of these cases it is the receiver side
      that knows how big the actual screen area is and what the most
      suitable resolution would be. It appears suitable to let the
      receiver request the media sender to send a media stream
      conforming to the displayed video size.

   Network Bit-rate Limitations: If the receiver discovers a network
      bandwidth limitation, it can choose to meet it by requesting media
      stream bit-rate limitations.
      Especially in cases where a media
      sender provides multiple media streams, the relative distribution
      of available bit-rate can help the application to provide the most
      suitable experience in a constrained situation.

   CPU Constraint: A media receiver may become constrained in the
      amount of available processing resources. This may occur in the
      middle of a session, for example due to the user selecting a power
      saving mode, or starting additional applications requiring
      resources. When this occurs, the receiving application can select
      which codec parameters to constrain, and by how much, to best
      suit the needs of the application. For example, it can decide
      whether a lower framerate is a better constraint than a lower
      resolution.

4.2.  Media Receiver to RTP Mixer

   This section considers a multiparty session with a centralized media
   intermediary, like an RTP mixer, where the media receiver uses COP to
   affect the delivered media.

              +------------+        +---+
              |            |--RTP-->| B |
              |            |<--COP--|   |
              |            |        +---+
              |            |
   +---+      |            |        +---+
   | A |-RTP->|   Mixer    |--RTP-->| C |
   +---+      |            |        +---+
              |            |
              |            |        +---+
              |            |--RTP-->| D |
              +------------+        +---+

      Figure 1: Receiver (B) using COP to adapt a media stream

   In Figure 1 above we focus on the possible usages of COP by a
   media receiver, like B. Here the functional role of the intermediary
   becomes important (Topo-Mixer) [RFC5117]. An RTP mixer uses its own
   SSRC(s) to channel selected media streams to B from other
   participants like A. If the intermediary is instead a translator,
   receiver B can see A's SSRC(s) directly, instead of them possibly
   showing up as CSRC. We will in this section focus on the mixer case.
   The RTP translator case is further discussed in Section 4.4.

   The RTP mixer's usage of its own SSRC allows mixer-to-receiver media
   flows to be associated with a role or purpose in the application
   rather than a given media source.
   Based on the assumption that the
   set of available stream roles is connected to the specific use case
   or application, it is likely that the set of stream roles (for
   example most active speaker) provided from a mixer will change less
   often than the original media source representing that role is
   changed. It is further assumed that the desirable media
   characteristics related to a specific role will be fairly constant.
   To minimize the amount of signaling needed to modify stream
   characteristics, it could thus be appropriate to let a stream
   represent a role rather than limiting it to represent the original
   source. When there exist multiple RTP streams from the mixer to a
   receiver, the receiver can use COP to request an operation point
   that better suits the receiver's needs on each particular stream
   (role) of the media stream. COP also allows the receiver to select
   its desired trade-off in properties and quality between multiple
   delivered media streams.

   There exist different reasons why B would need to indicate changes in
   its capabilities to receive a particular media stream:

   Network Path: The receiver detects changes in the network that in
      the mid to long term will result in a new capability regarding
      the maximum bit-rate that can be supported.

   Bandwidth Trade-off: In an application receiving multiple media
      streams, the receiving application may want to change the
      relative bit-rate trade-off between the streams.

   Presentation or GUI Changes: If the presentation or graphical user
      interface (GUI) changes on the receiving side, this results in
      other requirements or needs on the media streams. For example, if
      the application window is resized by the user, the amount of
      screen real estate to present the different video elements
      changes. To optimize the video quality in relation to bit-rate
      the receiver indicates the new preferred video resolution.
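
   The "Presentation or GUI Changes" case above can be sketched in
   Python as follows. The receiver derives a preferred resolution from
   the resized window and expresses it as a range with a target value,
   in the spirit of the codec configuration parameter description in
   Section 2.1. The names ParameterRange, preferred_resolution and
   resolution_request, the rounding to even pixel counts, and the
   choice of half the target as lower bound are all hypothetical. The
   sketch does not represent the COP message encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterRange:
    """A value range with a target inside it (hypothetical model)."""
    minimum: int
    target: int
    maximum: int

    def accepts(self, value: int) -> bool:
        # True when the value lies within the inclusive range.
        return self.minimum <= value <= self.maximum


def preferred_resolution(window_w, window_h, aspect_w, aspect_h):
    """Largest picture of roughly the given aspect ratio that fits the window."""
    # The tighter of the two window dimensions limits the scale.
    if window_w * aspect_h <= window_h * aspect_w:
        width = window_w
        height = window_w * aspect_h // aspect_w
    else:
        height = window_h
        width = window_h * aspect_w // aspect_h
    # Round down to even values (assumption made for this sketch).
    return width - width % 2, height - height % 2


def resolution_request(window_w, window_h, aspect_w=16, aspect_h=9):
    """Build horizontal and vertical pixel ranges for a resized window."""
    width, height = preferred_resolution(window_w, window_h, aspect_w, aspect_h)
    # Accept anything from half the target size up to the target itself.
    horizontal = ParameterRange(width // 2, width, width)
    vertical = ParameterRange(height // 2, height, height)
    return horizontal, vertical
```

   Expressing the wish as a range rather than a single value leaves
   room for the sender or mixer to pick a resolution it can actually
   produce, while the target records what the receiver would like best.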

   In all the above cases the receiver sends a COP request to the mixer
   for new codec operation points on mixer controlled media stream(s).
   It then becomes the mixer's responsibility to determine if and how
   the requested operation points can be supported, for example by
   requesting new operation points from the media source as discussed
   in Section 4.3. If the mixer selects another media source to deliver
   in a media stream, it may have to update the receiver on the
   properties of the operation point.

4.3. RTP Mixer to Media Sender

   This section looks at the usage of COP in multiparty sessions where
   a centralized media intermediary, like an RTP mixer, selects and
   requests the tailored media stream or streams that a media sender
   delivers to the intermediary for further forwarding or manipulation.
   This usage can be simplified to the media streams from one media
   sender (A), which are currently being delivered to multiple
   receivers (B-D) as depicted in Figure 2.

              +------------+        +---+
              |            |--RTP-->| B |
              |            |        +---+
   +---+      |            |
   | A |<-COP-|            |        +---+
   |   |-RTP->|   Mixer    |--RTP-->| C |
   +---+      |            |        +---+
              |            |
              |            |        +---+
              |            |--RTP-->| D |
              +------------+        +---+

      Figure 2: Mixer using COP to adapt media streams to multiple
                               receivers

   The media paths from the mixer to B, C and D are different and thus
   the available resources may vary between them. In addition B, C and
   D may have different capabilities when it comes to handling media
   streams. These limitations can be learned by the mixer through
   session configuration signalling, media transmission feedback (e.g.
   RTCP), or usage of COP by the receivers (see Section 4.2).
   Limitations are also expected to be updated during the session
   lifetime.

   The media sender (A) has certain capabilities, and what is possible
   to do will depend on those capabilities and on what has been
   configured between A and the mixer.
   Let's consider different capabilities of A
   and how they influence the usage of COP to affect the media stream(s)
   delivered to the mixer.

   Single Media Encoding: If A can only provide a single media encoding
      of a particular media source, the mixer has to make a choice on
      what properties it would like to request for that media stream.
      The most basic choice is to request the lowest common denominator
      across the receiver population. If the mixer has certain
      capabilities for media transcoding, it could choose to request
      another operation point for the media encoding with higher quality
      and then transcode for a few receivers. That enables a higher
      quality to several receivers while still being able to serve
      endpoints with the least capabilities. In these cases the mixer
      has to send COP requests that indicate only a single operation
      point with parameters matching the restrictions in the best
      possible way.

   Scalable Media Encoding: If A is capable of producing a scalable
      media stream encoding, the mixer can request multiple operation
      points for the same media stream. For example, if A is capable of
      producing three different operation points, the mixer in
      Figure 2 would be able to request scalability layers that match
      the capabilities of all three receivers B, C and D. If several
      receivers have similar capabilities, the mixer may choose to
      request fewer operation points. In this case, unlike in the
      single media encoding case, the mixer must determine which
      packets or parts of packets to send to each receiver based on
      their capabilities. This requires that the mixer is capable of
      identifying in the media stream which scalability layer matches a
      requested operation point. Thus, it is desirable that the media
      sender can indicate to the mixer which layer matches a given
      operation point.

   Simulcast Media: If A and the mixer have negotiated the usage of
      simulcasted media encoding of the media source, then the mixer can
      adopt several operation points to best suit the receivers, just
      like for scalable encoding. When simulcasting, the mixer will
      however have to send one COP request per media stream it actually
      wants to affect. It is necessary to ensure that configuration
      changes over multiple media streams from the same media source
      take place. Compared to scalable media, the mixer does not need
      to strip away layers to match a particular operation point but
      can forward entirely self-contained media streams.

   The use of COP as described above can be triggered by a multitude of
   reasons. We will here discuss some of them. We already mentioned
   that bit-rate adaptation (congestion control) on the mixer to
   receiver path can indicate a need to change an operation point.
   Another reason is when a new session participant joins that has
   certain receiver capabilities (decoding or other hardware, as
   well as network path related), thus potentially changing the optimal
   set of operation points. There also exist a number of different
   cases where the desired application behavior results in changes in
   desired operation points, like change of active speakers,
   reconfiguration of the display layout, etc.

   It is important to remember that Figure 2 only presents the view of a
   single media sender. In most communication sessions there are
   multiple media senders, and the mixer will need to take the
   combination of media streams from multiple media senders into account
   when choosing what is to be sent to a given receiver. Thus changes
   at one media sender can result in related changes of the operation
   points at the other media senders.

4.4.
Media Receiver in Multicast or with RTP Transport Translator

   This section covers the usage of COP in multicast transported RTP
   sessions, as well as when transport translators (Topo-Translator)
   [RFC5117] are used. Transport translators can be used to emulate Any
   Source Multicast (ASM) over unicast. Multicast usages also include
   Source Specific Multicast (SSM) [RFC4607], which according to "RTP
   Control Protocol (RTCP) Extensions for Single-Source Multicast
   Sessions with Unicast Feedback" [RFC5760] has two main modes: simple
   mode, and summary feedback mode. These SSM modes affect the usage of
   COP functionality.

   +---+      +------------+      +---+
   | A |<---->|            |<---->| B |
   +---+      |            |      +---+
              | Translator |
   +---+      |            |      +---+
   | C |<---->|            |<---->| D |
   +---+      +------------+      +---+

                   Figure 3: RTP translator topology

   A transport translator [RFC5117], whose main purpose is to forward
   any incoming packets to all the other session participants, emulates
   an ASM session (see Figure 3). As anyone can send to all others in
   both cases, there are some properties in these large scale sessions
   with many participants which require extra consideration.

   +-----+  +-----+            +-----+
   | MS1 |  | MS2 |    ....    | MSm |
   +-----+  +-----+            +-----+
      ^        ^                  ^
      |        |                  |
      V        V                  V
   +---------------------------------+
   |       Distribution Source       |
   +--------+                        |
   | FT Agg |                        |
   +--------+------------------------+
     ^ ^           |
     :  .          |
     :   +...................+
     :             |          .
     :            / \          .
   +------+      /   \       +-----+
   | FT1  |<----+     +----->| FT2 |
   +------+    /       \     +-----+
     ^  ^     /         \     ^  ^
     :  :    /           \    :  :
     :  :   /             \   :  :
     :  :  /               \  :  :
     :   ./\               /\.   :
     :   /. \             / .\   :
     :  V  . V           V .  V  :
   +----+ +----+       +----+ +----+
   | R1 | | R2 |  ...  |Rn-1| | Rn |
   +----+ +----+       +----+ +----+

                   Figure 4: SSM based RTP session

   In Figure 4 above, the media senders (MS1 ...
   MSm) send their
   media streams and RTCP traffic to the distribution source (DS). The
   DS forwards the RTP and RTCP traffic from the media senders to the
   SSM group. Using the RTCP extension for unicast RTCP feedback
   [RFC5760], the receivers (R1...Rn) send their RTCP traffic to their
   configured feedback target. This sample session has two feedback
   targets to scale with the number of receivers. RTCP messages that
   need to go to a media sender are forwarded to the FT aggregator part
   of the distribution source for further forwarding over the unicast
   paths between the distribution source and the media senders. The
   feedback target and the feedback aggregator also forward all RTCP
   messages from receivers in simple mode, and aggregate them in summary
   mode. Some RTCP messages from a receiver may still have to be
   forwarded over the SSM group.

   COP needs to support some reasonable functionality over the different
   multiparty topologies described above and it is important that COP
   does not cause significant issues in any of the environments.

   In the basic case, where only a single multicast group exists, there
   is a well-known problem associated with adapting content and bit-rate
   to the receiver population. The more receivers, the larger the
   potential for non-matching requirements in requests from the
   different receivers. One strategy for meeting this is to use the
   lowest common denominator among the requests from the receiver
   population. This normally results in sub-optimal quality for a
   significant part of the session participants, the main benefit being
   that all participants will be able to receive some content.
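
   The lowest common denominator strategy can be illustrated with a
   minimal sketch. It assumes that every requested parameter is an
   upper bound that the receiver can handle, so that the smallest value
   is acceptable to everyone. The parameter names used here are
   hypothetical and are not defined by this specification.

```python
from typing import Dict, Iterable

# Hypothetical parameter names; each value is treated as an upper bound
# that the requesting receiver is able to handle.
Request = Dict[str, int]


def lowest_common_denominator(requests: Iterable[Request]) -> Request:
    """Return, per parameter, the smallest upper bound asked for by any receiver."""
    result: Request = {}
    for request in requests:
        for name, limit in request.items():
            # Keep the most restrictive (smallest) limit seen so far.
            result[name] = min(limit, result.get(name, limit))
    return result


requests = [
    {"max_bitrate_kbps": 2000, "max_framerate": 30},
    {"max_bitrate_kbps": 512, "max_framerate": 30},
    {"max_bitrate_kbps": 1000, "max_framerate": 15},
]
common = lowest_common_denominator(requests)
```

   In this example the resulting operation point is limited by the
   second receiver's bit-rate and the third receiver's frame rate,
   which shows why receivers with better capabilities get sub-optimal
   quality under this strategy.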

   Because of the above limitations of operation within a single group,
   the usage of COP in larger groups becomes difficult unless the
   parameters that can be adopted and affected by COP requests are such
   that a limited set of participants is expected to request them, and
   the impact on the others is limited or acceptable. The authors
   therefore expect the usage of COP in large groups to be limited and
   this specification focuses on operation in smaller groups. However,
   as it is not possible to define the threshold when a group changes
   from being small to being too large to work well with COP in the
   generic case, it is important that COP can operate safely in a large
   group, although the possibilities to satisfy the request may be
   severely limited.

   There also exist use cases for COP where the media application uses
   multiple multicast groups to enable multiple operation points and
   allows each receiver to join the multicast groups that suit the
   participant's capabilities. An example of such usage would be
   Scalable Video Coding (SVC) using the Multi-Session Transport (MST)
   mode of the SVC RTP payload format [RFC6190]. The SVC MST RTP
   streams that are sent in each group can still contain multiple
   scalability layers. One could combine coarse-grained control on the
   operation points by having the receiver join a particular session
   with a more fine-grained control using COP to adjust the included
   scalability layers to the receiver's needs, such as lower CPU load.

5. Requirements

   The solution outlined in this specification should fulfill the
   following requirements:

   REQ-1: Enable dynamic control of possibly inter-related codec
      properties during an ongoing media session.

   REQ-2: Be media type agnostic, to the furthest extent possible, and
      at least cover audio and video media.

   REQ-3: Be codec agnostic (within the same media type), to the
      furthest extent possible.

   REQ-4: Work with different media transmission types, i.e. single-
      stream, simulcast, single-stream scalable, and multi-stream
      scalable transmission.

   REQ-5: Work with un-encrypted as well as encrypted media.

   REQ-6: Be extensible, making it simple to add control and
      description of new codec properties.

   REQ-7: Complement rather than conflict with other codec
      configuration methods such as other RTCP based techniques and SDP.

   REQ-8: Support configurable parameters that are directly visible in
      the media stream as well as those that are not visible in the
      media stream.

   In addition, Guidelines for Extending RTCP [RFC5968] should be
   followed.

6. Solution Overview

   The mechanism described in this specification especially targets
   heterogeneous multiparty scenarios where different endpoints require
   differently encoded media from the same source, but its use in other
   situations is not precluded. In fact, point-to-point scenarios are
   considered to be of equal importance but not more demanding than the
   multiparty case. In the targeted scenario, the media stream from one
   encoder is sent to multiple decoders. Hence, the encoder must
   possibly provide an encoding with multiple operation points, suitable
   for the receivers. This is only possible with so-called scalable
   codecs, but some codecs may have inherent scalability features
   without being generally considered as scalable (e.g. H.264/AVC
   temporal scalability through non-reference frames). Multiparty
   services often involve a media mixer (Topo-Mixer) [RFC5117] as a
   central network node.

         +---+
         | S |
         +---+
           |
           v
       +-------+
       | Mixer |
       +-------+
        /  |  \
      v    v    v
   +---+ +---+ +---+
   | A | | B | | C |
   +---+ +---+ +---+

                     Figure 5: RTP mixer topology

   The solution defined in this specification is targeted for automatic
   control of codec parameters, not as a direct result of user
   interaction, although the automatic control can in turn be triggered
   by user interaction. It can be used during an active session to
   quickly adapt to changes in media receiver available bandwidth and/or
   preferences for one or more codec properties, while still conforming
   to the session configuration, like SDP offer/answer negotiated
   minimum or maximum limits (depending on individual SDP property
   semantics). Some codec property changes will also motivate
   renegotiating the SDP, but this specification is intended to
   cover only changes that lie within the SDP negotiated set and thus do
   not impact the SDP.

   Three message types are defined to support the solution: a request, a
   notification, and a status report:

   Request (COPR): A media receiver requesting a media sender to adjust
      one or more of its media encoding parameters for a media stream.
      The request is normally based on a specific set of media encoding
      parameters that the media sender has explicitly notified the media
      receiver about in a notification.

   Notification (COPN): A media sender notifying a media receiver of
      the currently used media encoding parameters for a media stream.
      The notification is initiated by the media sender, typically
      whenever the media encoding parameters change significantly from
      what was previously used. The reason for the change can either be
      local to the media sender (user, endpoint or network), or it can
      be the result of one or more requests from remote endpoints.

   Status Report (COPA): A media sender reporting to a request sender
      (media receiver) on request reception status, which specific
      request from the media receiver was received and considered
      in setting current media encoding parameters, and the
      identification of the media stream that is considered to fulfill
      the request. The status report can also indicate various error
      conditions, such as reception of invalid or failing requests.

   More details about the individual messages are found in the following
   sub-sections.

6.1. Message Structure

   A COP message is sent from an RTP session participant in its role
   either as media receiver or media sender. Each message can contain
   one or more message items of one or more message types, all
   originating from a single media source.

   The individual message items each relate only to a single operation
   point, describing part of an atomic notification or request.

   The general structure is outlined below:

   +--------------------------------------+
   | AVPF PSFB FMT="COP"                  |
   | SSRC of Packet Sender                |
   | SSRC of Media Source                 |
   | +----------------------------------+ |
   | | COP Message Item 0               | |
   | +----------------------------------+ |
   | | (Codec Configuration Parameters) | |
   | +----------------------------------+ |
   | +----------------------------------+ |
   | | COP Message Item 1               | |
   | +----------------------------------+ |
   | | (Codec Configuration Parameters) | |
   | +----------------------------------+ |
   | ...                                  |
   +--------------------------------------+

                    Figure 6: COP message structure

   Note that the request is the only COP message item defined in this
   specification that is sent in the media receiver role and makes use
   of "SSRC of media source" as the targeted media stream for the
   request.
   Both the notification and the status report message items
   are sent in the media sender role, reporting on the message sender's
   own configuration and thus relate only to the "SSRC of packet
   sender", being agnostic to the "SSRC of media source" field.

   It is for example possible to co-locate COPS and COPN messages for
   the same media source in the same COP FCI. It is also possible to
   co-locate one or more COPR referring to a single "SSRC of media
   source" with one or more COPN and/or COPS relating to a single "SSRC
   of packet sender" within a single COP message.

   Multiple message items of the same type in the same COP message are
   used to describe a notification, status or request for a media stream
   containing multiple operation points (see Section 6.3).

   Multiple COP messages are needed to be able to refer to multiple
   different "SSRC of packet sender" and/or "SSRC of media source".

6.2. Codec Configuration Parameter Use

   The codec configuration parameters that are applicable to a certain
   codec may be specific to the media type (audio, video, ...), and may
   also be codec specific. Some codec properties (described by codec
   configuration parameters) have to be explicitly enabled by (non-RTCP
   based) capability signaling before their use is possible or
   permitted.

   An endpoint implementing this specification does not need to support
   all available codec configuration parameters defined herein or in
   extensions to this specification. A certain parameter could be
   unnecessary for a certain codec or media stream, even if it is
   generally supported by the endpoint. This specification therefore
   defines capability signaling that allows a COP receiver to declare
   explicit support per parameter type on a per codec level.
   The set of
   codec configuration parameters that can be used for a certain media
   stream by a COP sender is thus restricted by the combination of
   applicability, capability signaling and explicit receiver parameter
   support signaling.

   Any codec configuration parameter that is applicable and feasible to
   use, but is not included as part of an operation point, has a default
   value. This default value is defined for each parameter type, but
   should preferably, whenever possible, be taken from capability
   signaling. It is not necessary to use all defined parameter types in
   a media stream description. Some parameter types can, depending on
   media type or codec, either be unnecessary, or not possible to
   describe or control in detail, in which case they can be left out.
   This means that the effective value is "undefined" within the limits
   set by capability signaling (outside the scope of this
   specification).

6.3. Operation Point

   The codec configuration parameters contained in a single message item
   jointly constitute a description of an operation point for a specific
   media stream from a media sender.

   For the purpose of COP signaling, each operation point is identified
   with an identity number (OPID), which is scoped by the media sender's
   RTP SSRC identification, and can be chosen freely by the media
   sender. The need for this media sub-stream identification only
   appears with scalable coding or other media encoding methods that
   introduce separable and configurable sub-streams within the same
   SSRC. An OPID thus refers to such a configurable sub-stream,
   described by a set of related codec configuration parameters.
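
   Because an OPID is scoped by SSRC, an implementation has to key
   operation points on the (SSRC, OPID) pair rather than on the OPID
   alone. The following is a minimal sketch of such a lookup table; the
   class, method and parameter names are hypothetical illustrations and
   are not defined by this specification.

```python
from typing import Dict, Optional, Tuple

# An operation point is only unique in combination with the sender's SSRC.
OperationPointKey = Tuple[int, int]  # (SSRC, OPID)
Parameters = Dict[str, int]          # hypothetical parameter name -> value


class OperationPointTable:
    """Stores the parameters announced for each (SSRC, OPID) pair."""

    def __init__(self) -> None:
        self._points: Dict[OperationPointKey, Parameters] = {}

    def announce(self, ssrc: int, opid: int, params: Parameters) -> None:
        # A copy is stored so later changes by the caller have no effect.
        self._points[(ssrc, opid)] = dict(params)

    def lookup(self, ssrc: int, opid: int) -> Optional[Parameters]:
        return self._points.get((ssrc, opid))


table = OperationPointTable()
# The same OPID value under two different SSRCs names two different
# operation points.
table.announce(0x1111, 1, {"max_framerate": 30})
table.announce(0x2222, 1, {"max_framerate": 15})
```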

                     +--RTP Session 1 ---------------------+
   Media Source 1----+-+-> SSRC1 --> Sub-Stream 1 -> OPID1 |
   (MIC, Camera)     |           \-> Sub-Stream 2 -> OPID2 |
                     |                                     |
   Media Source 2-+--+---> SSRC2 --> Sub-Stream 1 -> OPID3 |
                  |  |           \-> Sub-Stream 2 -> OPID4 |
                  |  |           \-> Sub-Stream 3 -> OPID5 |
                  |  +-------------------------------------+
                  |
                  |  +--RTP Session 2 ---------------------+
                  +--+---> SSRC3 --> Sub-Stream 1 -> OPID6 |
                     |           \-> Sub-Stream 2 -> OPID7 |
                     +-------------------------------------+

    Figure 7: Relation of OPID to media source, RTP session and SSRC

   Figure 7 depicts the possible relations between media sources, RTP
   sessions, RTP streams (SSRCs), RTP sub-streams, and the OPID.

   For example, a single video camera may be encoded using SVC for a
   combined SST and MST transmission configuration. In that case a
   subset of scalability layers is sent as SST in the first RTP session
   using SSRC2. Another set of scalability layers is transported in the
   second RTP session as another SST using SSRC3. The RTP packet stream
   from each SSRC can thus contain several sub-streams, each identified
   with its own OPID. As a result, a single media source is present in
   two RTP sessions, using two different SSRCs (2 and 3) containing a
   total of five sub-streams (OPID 3 to 7).

   Since an operation point is expected to change over time, as a result
   of media receiver requests (Section 6.4), local media sender
   considerations (Section 6.5), or both, the operation point
   (OPID) is versioned. The version is scoped by SSRC and OPID.

   It is expected that all encoders dividing a media stream into sub-
   streams will include some means to identify those sub-streams in the
   media stream. However, it is also expected that such identification
   is in general codec specific.
   There is thus a need to map the codec
   agnostic COP OPID identification to codec specific identification,
   and this specification therefore includes a method for such mapping
   (Section 10).

6.4. Request

   The request is sent by a media receiver, which can be either an
   endpoint or a middle node such as an RTP mixer. The receiver of the
   request may similarly be either the original media sender or an RTP
   mixer. Included in the request is a description of the desired codec
   configuration for a specific media (sub-)stream. The parameter
   values communicated in a notification (Section 6.5) of that
   (sub-)stream are taken as a starting point when deciding what
   parameters and parameter values to choose for the request, and only
   parameters with changed values need to be included in the request.
   The media receiver can of course use other sources of information
   when choosing parameters and values, for example observation of the
   received media stream and capability signaling.

   It is not required to receive a notification beforehand to be able to
   create a meaningful request. The request can include a set of
   changed properties for existing streams, but it can also request the
   addition or removal of one or more media sub-streams having certain
   properties, in which case there will be no notification to base the
   request on. A media receiver may also want to send a request prior
   to having received any notifications for existing streams, and can
   then base the request on other information, for example by
   observing the media stream or using information from the capability
   signaling. In case there is no existing stream and OPID to refer to
   in the request, a "provisional" OPID MUST be chosen in the request,
   which will have to be mapped back to an existing (sub-)stream and
   "real" OPID through methods defined in this specification
   (Section 10).
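
   The rule stated earlier in this section, that only parameters with
   changed values need to be included in a request, can be sketched as
   a simple difference between the last notified configuration and the
   desired one. The function and parameter names below are hypothetical
   and only serve as illustration.

```python
from typing import Dict

Parameters = Dict[str, int]  # hypothetical parameter name -> value


def build_request_params(notified: Parameters, desired: Parameters) -> Parameters:
    """Return only those desired parameters whose value differs from the
    value in the last notification, or that were not notified at all."""
    return {
        name: value
        for name, value in desired.items()
        if notified.get(name) != value
    }


notified = {"max_bitrate_kbps": 1000, "max_framerate": 30, "max_width": 1280}
desired = {"max_bitrate_kbps": 512, "max_framerate": 30}
request_params = build_request_params(notified, desired)
```

   Here only the bit-rate parameter ends up in the request, since the
   desired frame rate equals the notified one and the width is not
   mentioned at all.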

   The media sender receiving a specific request is not required to
   reconfigure the encoder accordingly, although it should try to do so.
   The media sender is allowed to take other (previous or concurrent)
   requests and any local considerations into account, possibly
   modifying some of the parameter values, or even to reject the request
   completely if it is not seen as feasible. It is thus not possible
   for a media receiver to unambiguously determine from the media stream
   or even from a notification if the media sender received the request
   or if the request was lost and needs to be resent.

   A request should be based on a notification, but there may be
   situations where a request is sent approximately simultaneously with
   a new notification for the same stream. In that case, there is a
   risk that the request is based on the wrong set of codec properties
   compared to the new notification. It is therefore necessary to have
   the set of codec properties version controlled, identified by an
   OPID. If a notification announces a specific version of the
   operation point, where the version is updated every time it is
   changed, the request can refer to that specific version and any mis-
   reference can be clearly identified and resolved. In addition, it
   allows for easy identification of repeated notifications and requests
   by checking the operation point identification and the version,
   without the need to parse through all codec properties for changes.

6.5. Notification

   The notification is sent by a media sender and describes a media
   stream or sub-stream in terms of a defined, finite set of codec
   properties. That same set of codec properties can also be used in a
   request (Section 6.4).
   It is important that the media receiver knows the notification and
   the set of defined properties, since it is
   rarely possible to see from the media stream itself what controllable
   properties were used to generate the stream. The set of codec
   properties and their values used to describe a certain media stream
   at a certain point in time is henceforth called a codec
   configuration. Each operation point in this codec configuration is
   implemented using an RTP payload type, defined by capability
   signaling outside the scope of this specification.

   It must be possible for a media sender to change the codec
   configuration not only based on requests from media receivers, but
   also based on local limitations, considerations, or user actions.
   This implies that the notification can be sent standalone and not
   only as a response to a request (compare TMMBR and TMMBN [RFC5104]).
   So that media receivers do not have to guess what codec configuration
   is used, a media sender should always send a notification when the
   codec configuration for a stream changes. Loss of a notification
   message should not be critical since a media receiver could either
   fall back to inferring the approximate codec configuration from the
   media stream itself, or simply wait with a request until the next
   notification is sent.

   A notification can potentially contain a large number of codec
   properties. However, parameters that are not enabled by codec and
   COP capability signaling, or inherently are not part of the used
   codec, will not be included. The notification only describes the
   currently used codec configuration, and each parameter of an
   operation point will be described by a single value. To further
   limit the number of properties to be sent, it is possible to rely on
   parameter defaults (listed by individual parameter type definitions)
   whenever those values are acceptable.
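
   Relying on parameter defaults means that a receiver of a
   notification has to combine the explicitly signaled values with the
   defaults of the omitted parameters. A minimal sketch of that
   combination follows. The default values and parameter names are
   hypothetical placeholders; the actual defaults are those listed with
   each parameter type definition or taken from capability signaling.

```python
from typing import Dict

Parameters = Dict[str, int]  # hypothetical parameter name -> value


def effective_configuration(defaults: Parameters, notified: Parameters) -> Parameters:
    """Overlay the values carried in a notification on top of the defaults.
    Parameters absent from the notification keep their default value."""
    effective = dict(defaults)
    effective.update(notified)
    return effective


# Placeholder defaults, for illustration only.
defaults = {"max_framerate": 30, "max_width": 1920, "max_height": 1080}
# The notification only carries the parameter that deviates from the default.
notified = {"max_framerate": 15}
config = effective_configuration(defaults, notified)
```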

   The media receiver may want to take local action at the time when
   the codec configuration in the media stream changes. Using the same
   reasoning as above, this may not be possible to see from the media
   stream itself. This functionality is explicitly enabled by including
   the RTP time stamp in the notification, where the time stamp
   describes a time (possibly in the future) when the codec
   configuration is (estimated to be) effective.

   It is not required that a media sender sends notifications for all
   media streams or sub-streams. However, the non-announced streams or
   sub-streams will then not be accessible to media receiver control
   (Section 6.4). Any media or transport resources occupied by those
   non-announced streams (in COP terms) must be excluded from the total
   amount of available resources when deciding feasible parameter value
   ranges for the announced streams.

6.6. Status Report

   The status report is sent by a media sender and is needed to confirm
   reception of a request OPID to avoid unnecessary retransmission of
   requests. Loss of a status report will likely trigger a request
   retransmission, except when the request sender can infer from the
   media stream or a notification that the stream is now acceptable.

   The status report is not a required acknowledgement of every request,
   but instead reports on the last received request, identified by a
   request sequence number in addition to the OPID. This decoupling of
   requests and status reports reduces the number of status reports
   needed in case of frequently updated requests and/or lack of
   resources to send status reports.

   If a request is somehow not acceptable to a media sender, the status
   report can also indicate failure and a reason for failure.

   In case the OPID in the request is a "provisional" OPID
   (Section 6.4), the status report responds with that exact OPID, but
   also includes a reference to a "real" media (sub-)stream
   identification or OPID that the media sender considers appropriate
   for the request.

   No description of any codec configuration is included in a status
   report, even if the corresponding request was successful. The codec
   configuration is only carried in the notification (Section 6.5)
   message. Multiple status reports targeted for multiple request
   senders can through media (sub-)stream identification and OPID point
   to the same notification message, reducing the need to repeat
   applicable codec configuration parameters with every accepted
   request.

6.7. Adding and Removing Operation Points

   A media sender can unilaterally create a new operation point by
   selecting a free OPID and using COPN to announce it.

   To remove an operation point, the media sender simply stops
   announcing it in COPN. This procedure can be used both for entire
   media streams containing a single operation point and to add/remove
   sub-streams in media streams containing multiple operation points.

   The media receiver can request a new operation point to be created by
   using a COPR with an unused identifier and by setting a flag to
   indicate that this requests a new OPID. The media sender then
   decides whether it honors the request or not, and announces the new
   OPID as described above.

   The media receiver can indicate that it is no longer interested in
   receiving an operation point corresponding to a media sub-stream by
   not including any COPR message item for it in a single COP message.
   The media receiver can indicate a wish to continue to receive an
   unmodified operation point using a COPR without any codec properties
   (no change).

7.
Codec Control Message Extension 1234 This specification specifies a new feedback message, COP, for codec 1235 control of real-time media, as an extension to the AVPF [RFC4585] and 1236 CCM [RFC5104] specifications. The AVPF specification outlines a 1237 mechanism for fast feedback messages over RTCP, which is applicable 1238 for IP based real-time media transport and communication services. 1239 It defines both transport layer and payload-specific feedback 1240 messages. This specification targets the payload-specific type, 1241 since a certain codec is typically described by a payload type. 1243 AVPF defines three and CCM defines four payload-specific feedback 1244 messages (PSFB). All AVPF and CCM messages are identified by means 1245 of the feedback message type (FMT) parameter. This specification 1246 specifies one additional payload-specific feedback message. 1248 One new PSFB FMT value is assigned in this specification: 1250 TBA1: Codec Operation Point (COP) 1252 This section defines the feedback message structure, message items 1253 and their semantics with the exception of the actual codec 1254 configuration parameters which are defined in the next section 1255 (Section 8). 1257 7.1. COP Message 1259 The COP message is a payload-specific AVPF CCM message identified by 1260 the PSFB FMT value listed above. It carries one or more COP message 1261 items, each with either a request for, a description of a certain 1262 "operation point"; a set of codec parameters, or a request status 1263 indication. 1265 Not all message items makes use of the "SSRC of media source" in the 1266 common packet header. "SSRC of media source" SHALL be set to 0 if no 1267 message item that makes use of it is included in the FCI. 1269 7.2. FCI Format 1271 The COP FCI MUST contain one or more codec operation point message 1272 items. The maximum number of COP message items in a COP message is 1273 limited by the [RFC4585] Common Packet Format 'length' field. 
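
As a rough illustration of how the 'length' field bounds the FCI, the following Python sketch wraps already-encoded COP message items in the common feedback packet header. It is a minimal sketch, not part of this specification: the function name build_cop_packet is hypothetical, the FMT value is a caller-supplied placeholder because TBA1 is not yet assigned, and the zero padding to a 32-bit boundary follows the AVPF requirement on the FCI.

```python
import struct

RTCP_PT_PSFB = 206  # payload-specific feedback packet type (PT=206)


def build_cop_packet(fmt, sender_ssrc, media_ssrc, fci):
    """Wrap encoded COP message items (fci) in the common feedback header.

    fmt is a placeholder for the still unassigned TBA1 value (assumption).
    """
    if not 0 <= fmt <= 31:
        raise ValueError("FMT is a 5-bit field")
    if not fci:
        raise ValueError("the COP FCI must contain at least one message item")
    # The FCI must be a multiple of 32-bit words; pad with zero bytes.
    fci = fci + b"\x00" * (-len(fci) % 4)
    total_words = 3 + len(fci) // 4   # 3 header words plus the FCI
    length = total_words - 1          # RTCP length: 32-bit words minus one
    if length > 0xFFFF:
        raise ValueError("FCI does not fit the 16-bit length field")
    first_byte = (2 << 6) | fmt       # V=2, P=0, FMT
    return struct.pack("!BBHII", first_byte, RTCP_PT_PSFB, length,
                       sender_ssrc, media_ssrc) + fci
```

Under these assumptions the 16-bit length field caps the FCI at 65533 words (262132 bytes); in practice the path MTU and the RTCP bandwidth budget are far more restrictive.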

The definition of the AVPF feedback message format mandates that the FCI part is a multiple of 32-bit words. The message items defined below are not necessarily 32-bit word aligned. Therefore it is sometimes necessary to insert one to three padding bytes at the end of the FCI. The number of padding bytes is determined by a receiver by comparing the sum of the message item lengths with the feedback message length field. Padding bytes MUST be set to zero (0) and ignored on reception.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V=2|P|FMT=TBA1 |    PT=206     |            length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     SSRC of packet sender                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     SSRC of media source                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  COP message item header #1                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  COP message item payload #1                  :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :               |          COP message item header #2           :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :               |          COP message item payload #2          :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :                              ...                              :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :          COP message item payload #N          |  Padding (0)  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 8: COP RTCP Message Structure

7.2.1.  Message Item Format

All codec operation point message items share a common header format:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Type |     Payload Length      |     OPID      |N|   Version   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    :                    (Message Item Payload)                     :

               Figure 9: COP message item header format

The message header fields are:

Type (3 bits):  Message item type. Three item types are defined in this specification, COPR, COPN and COPS, with values as listed in Table 1 below. More item types MAY be defined in extensions to this specification. Message items with a type field that has an unknown value SHALL be ignored by the receiver.

Payload Length (13 bits):  The total length in bytes of all data belonging to this message item that follows the message item header, i.e. anything following the Version field.

OPID (8 bits):  Operation point ID. Some (typically scalable) codecs are capable of encoding into multiple simultaneous operation points using the same SSRC, and each operation point can then be referenced by OPID. MUST be unique within the scope of an SSRC when the N flag is not set. MUST be set to 0 for message items not using the field. See also Section 7.2.3.

N (1 bit):  A "New OPID" flag, indicating that the OPID value is chosen arbitrarily and is not meant to refer to any existing operation point. The message sender SHOULD NOT use an already known OPID in combination with the N flag. See also individual message item definitions.

Version (7 bits):  Referencing a specific version of the codec configuration identified by the OPID.

7.2.2.  Message Item Types

The message types defined in this specification are:

    +-------+-------------------------------------------+
    | Value | Message Item Type                         |
    +-------+-------------------------------------------+
    | 0     | Codec Operation Point Notification (COPN) |
    | 1     | Codec Operation Point Request (COPR)      |
    | 2     | Codec Operation Point Status (COPS)       |
    | 3-6   | Unassigned                                |
    | 7     | Reserved for future extensions            |
    +-------+-------------------------------------------+

                 Table 1: Message Item Type Values

Each message type defined in this specification is described in detail in subsequent sections.

7.2.3.  Operation Point Identification

All RTP media streams belonging to the same session can by definition be identified by the SSRC. However, identification of any sub-streams contained in the same RTP media stream (SSRC) needs to use some other identification method, scoped by the SSRC. This is the case for a media stream containing more than one operation point, for example SVC [RFC6190] streams being sent using Single Stream Transport (SST) RTP packetization.

The encoding of and restrictions for such sub-stream (operation point) identification will in general be codec specific. Therefore, the OPID used in this specification is merely an SSRC-unique identification number. It is however necessary to create a mapping between this generic number and the codec specific sub-stream identification that can be found in the media stream. This mapping is achieved by including the ID parameter (Section 8.3) in a message item carrying a certain OPID.

In Section 10, codec specific ID parameter formats are defined for a few of the most common codecs that support scalability.

7.3.  Codec Operation Point Notification

7.3.1.  Message Format

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Type |     Payload Length      |     OPID      |N|   Version   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Transition Time Stamp                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |R|Payload Type | Codec Configuration Parameters                :
    +-+-+-+-+-+-+-+-+                                               :
    :                                                               :

                       Figure 10: COPN format

The COPN-specific message fields are (see also message item format (Section 7.2.1)):

Type (3 bits):  Set to 0, as listed in Table 1.

OPID (8 bits):  The OPID which is described by the codec configuration parameters.

N (1 bit):  Not used by COPN and SHALL be set to 0 by senders.

Version (7 bits):  Referencing a specific version of the codec configuration identified by the OPID. SHALL be increased by 1 modulo 2^7 whenever the used codec configuration referenced by the OPID is changed. A repeated message SHALL NOT increase the Version. The initial value SHOULD be chosen randomly.

Transition Time Stamp (32 bits):  The RTP Time Stamp value when the listed codec configuration parameters will be effective in the media stream, using the same time line as RTP packets for the referenced SSRC (media sender SSRC). The Time Stamp value MAY express either a time in the past or in the future, and need not map exactly to an actual RTP Time Stamp present in an RTP packet for that SSRC. The same timestamp value SHOULD be used for subsequent transmissions of the identical set of codec configuration parameters for the same OPID and version.

R (1 bit):  Reserved. MUST be set to 0 by senders and MUST be ignored by receivers implementing this specification. MAY be defined differently by extensions to this specification.

Payload Type (7 bits):  SHALL be identical to the RTP header Payload Type valid for the (sub-)stream described by this OPID.

Codec Configuration Parameters (variable length):  Contains zero or more TLVs carrying codec configuration parameters, as defined under parameter types (Section 8).

7.3.2.  Semantics

This message is used to inform the media receiver(s) about used codec configuration parameters at the media sender. The available codec parameter types that can be used to describe the codec configuration are defined in Section 8.

Some codecs may have clear inband indications in the encoded media stream of how one or more of the codec configuration parameters are configured. For those codecs and codec configuration parameters, COPN is not strictly necessary. Still, for some codecs and/or for some codec configuration parameters, it is not unambiguously possible to see individual codec configuration parameter values from the encoded media stream, or even possible to see some codec configuration parameters at all, motivating use of COPN.

COPN SHOULD be scheduled for transmission when it becomes known that there are media receivers in the RTP session that did not yet receive any codec configuration parameters for an active operation point, or whenever the effective codec configuration parameters have changed significantly, but MAY be scheduled for transmission at any time. The media sender decides what amount of change is required to be considered significant.

The reason for a codec configuration parameter change can either be local to the sending terminal, for example as a result of user interaction or some algorithmic decision, or resulting from reception of one or more COPR messages (Section 7.4).
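
The bit layout of the common message item header (Figure 9) and of the COPN item (Figure 10) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions only: the function names are hypothetical, and the codec configuration parameters are passed in as an opaque, already-encoded byte string because the TLV encoding is defined separately in Section 8.

```python
import struct

COPN, COPR, COPS = 0, 1, 2  # message item types from Table 1


def pack_item_header(item_type, payload_len, opid, n_flag, version):
    """Pack the 32-bit COP message item header (Figure 9)."""
    if not (0 <= item_type < 8 and 0 <= payload_len < 2 ** 13
            and 0 <= opid < 256 and n_flag in (0, 1)
            and 0 <= version < 128):
        raise ValueError("field out of range")
    word = ((item_type << 29) | (payload_len << 16) | (opid << 8)
            | (n_flag << 7) | version)
    return struct.pack("!I", word)


def unpack_item_header(data):
    """Return (type, payload_len, opid, n_flag, version)."""
    (word,) = struct.unpack("!I", data[:4])
    return (word >> 29, (word >> 16) & 0x1FFF, (word >> 8) & 0xFF,
            (word >> 7) & 0x1, word & 0x7F)


def pack_copn(opid, version, transition_ts, payload_type, params=b""):
    """Pack a COPN item; params is an already-encoded TLV byte string."""
    if not 0 <= payload_type < 128:
        raise ValueError("Payload Type is a 7-bit field")
    # The R bit is the top bit of the Payload Type byte and stays 0.
    body = struct.pack("!IB", transition_ts & 0xFFFFFFFF, payload_type)
    body += params
    # N is not used by COPN and is set to 0. Payload Length counts
    # every byte that follows the 4-byte item header.
    return pack_item_header(COPN, len(body), opid, 0, version) + body
```

A COPN without any parameters is thus 9 bytes long (4 header bytes, 4 time stamp bytes and the R/Payload Type byte), which is one reason the FCI is generally not 32-bit aligned without padding.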

If a media sender can no longer fulfill the established codec configuration parameter restrictions of an operation point that was previously described by a COPN, it MAY change any codec configuration parameter or even remove the entire operation point, and SHOULD then signal this at the earliest opportunity by sending an updated COPN to the media receiver(s).

An OPID can implicitly be indicated as no longer being used by omitting that OPID from the set of COPN message items in the COP PSFB message. All OPIDs that the media sender intends to use at the latest time indicated by any transition timestamp value in the set of COPN present in the COP PSFB message MUST be included in that COP message.

All operation points referred to by a COPS (Section 7.5) SHOULD also be detailed by a COPN message contained in the same or in a subsequent COP feedback message, even if the operation point did not change significantly from the previous COPN.

Note that the OPID Version of that COPN, subsequent to COPS, will be equal to or larger than the Version indicated in the COPS. The Version difference may be larger than one (taking field wraparound into account) depending on the number of updated COPN sent since the COPR that triggered the COPS. See also the description of those messages below.

Note: COPN may be seen as a more explicit and elaborate version of the TSTN message of [RFC5104] and most of the considerations detailed there for TSTN also apply to COPN.

7.3.2.1.  Parameters

The media sender decides what codec configuration parameters to use in the COPN to describe an operation point. It is RECOMMENDED that all codec configuration parameters that were accepted as restrictions based on received COPR messages are included. All codec configuration parameters significantly more restrictive than implicit or explicit restrictions set by capability signaling (outside the scope of this specification) SHOULD also be included. Any codec configuration parameters that are either not applicable to the Payload Type or not enabled by capability signaling MUST NOT be included. All codec configuration parameters not covered by the above restrictions MAY be included.

When the operation point depends on other operation points (such as in scalable coding), the values to use for codec configuration parameters MUST describe the result when all dependencies are utilized. For example, assume an operation point describing a base layer with 15 Hz framerate, and a dependent operation point describing an enhancement layer adding another 15 Hz to the base layer, resulting in 30 Hz framerate when both layers are combined. The correct parameter value to use for that latter, dependent "enhancement" operation point is 30 Hz, not the 15 Hz difference.

The value of a codec configuration parameter that was not included in a COPN message SHOULD either be inferred from other signaling, e.g. session setup or capability negotiation, outside the scope of this specification, or if such signaling is not available or not applicable, use the default value as defined per parameter type (Section 8).

An operation point describes one specific setting of codec parameters, and a COPN message therefore MUST NOT include the ALT parameter type (Section 8.2) in the codec parameters describing the operation point.

7.3.2.2.  Relation to COPR

To limit RTCP bandwidth and avoid bandwidth expansion, COPN is not mandated as response to every received COPR (Section 7.4).

A media sender implementing this specification SHOULD take requested operation points from COPR messages into account for future encoding, but MAY decide to use other codec configuration parameter values than those requested, e.g. as a result of multiple (possibly contradicting) COPR messages from different media receivers, or any media sender policies, rules or limitations. Thus, a COPN message operation point MAY use other codec configuration parameters and other values than those requested in a COPR.

The media sender SHOULD try to maintain OPIDs between COPR and COPN when the COPR sender suggests a new OPID value (N flag is set) in the COPR, but MAY use another OPID in COPN. Another OPID value has to be chosen, for example, when the suggested OPID conflicts with an already existing OPID, or when the media sender decides that the suggested new OPID can be fulfilled by an already existing OPID.

Even if a COPR references an existing OPID (N flag cleared), the media sender may have to take other aspects than a specific COPR into account when choosing how many operation points to use, and the exact contents of those operation points. See the description of COPS (Section 7.5) on how to achieve mapping between a suggested new OPID and what OPID will actually be used.

When OPID cannot be kept the same between COPN and COPR, the mapping SHALL be done using identical ID parameters (Section 8.3) in the COPS and COPN resulting from the COPR. Further details are described in the section on COPS (Section 7.5).

Since COPR references a certain COPN OPID and Version, and COPN is sent unreliably and may be lost, COPN senders MUST keep at least the two last COPN Versions for each (SSRC, OPID) tuple and SHOULD keep at least four.

7.3.3.  Timing Rules

The timing follows the rules outlined in section 3 of AVPF [RFC4585]. This notification message may be time critical and SHOULD be sent using early or immediate feedback RTCP timing, but MAY be sent using regular RTCP timing.

A typical example when regular RTCP timing can be appropriate is when the sent media stream is further restricted from what was described by the most recent COPN, which should not cause any problems in the media receivers. Similarly, it is likely appropriate to use early or immediate timing when effective media stream restrictions urgently need to be removed, which may require media receivers to increase their resource usage.

7.4.  Codec Operation Point Request

7.4.1.  Message Format

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Type |     Payload Length      |     OPID      |N|   Version   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Sequence No  | Codec Configuration Parameters                :
    +-+-+-+-+-+-+-+-+                                               :
    :                                                               :

                       Figure 11: COPR format

The COPR-specific message fields are:

Type (3 bits):  Set to 1, as listed in Table 1.

OPID (8 bits):  For a request that refers to an existing operation point, the OPID of that operation point. For a request for a new operation point, i.e. with the N flag set, an arbitrarily chosen but unique value.

N (1 bit):  MUST be set to 0 when OPID references an existing OPID announced in a COPN received from the targeted media sender, and MUST be set to 1 otherwise.

Version (7 bits):  When the N flag is not set (0), referencing a specific version of the codec configuration identified by the OPID in a COPN received from the targeted media sender. Not used and MUST be set to 0 when the N flag is set (1).

Sequence No (8 bits):  Sequence Number. SHALL be incremented by 1 modulo 2^8 for every COPR that includes an updated set of requested codec configuration parameters described by the same OPID and Version as was used with the previous Sequence Number. Sequence Number SHALL be kept unchanged in repetitions of this message. Initial value SHOULD be chosen randomly.

Codec Configuration Parameters (variable length):  Contains zero or more TLVs carrying codec configuration parameters, as defined under parameter types (Section 8).

7.4.2.  Semantics

This message item is sent by a media receiver wanting to control one or more codec configuration parameters of the targeted media sender. The requested values MUST stay within the media capability negotiated by other means than this specification. The available codec configuration parameters that can be controlled are listed in Section 8.

Note: COPR may be seen as a more explicit and elaborate version of the TSTR message of [RFC5104] and most of the considerations detailed there for TSTR also apply to COPR.

7.4.2.1.  Sender Behavior

If at least one COPN (Section 7.3) is received for the targeted stream, the codec configuration parameters for that stream (SSRC) with defined OPID and Version are known to the COPR sender. The COPR MUST refer to the OPID and Version of the most recently received COPN (if any) for the targeted stream. Since it references a defined set of codec configuration parameters from a COPN, the COPR SHOULD only include the codec configuration parameters it wishes to change in the message, but it MAY also include unchanged codec configuration parameters.

If no COPN is received for the targeted stream, the COPR sender MUST choose an arbitrary OPID and set the N flag to indicate that the OPID does not refer to any existing operation point. In this case the Version field is not used and MUST be set to 0. The OPID value SHALL NOT be identical to any OPID from the same media source that the media receiver is aware of and has received COPN for. Since in this case no COPN reference exists, the COPR sender SHOULD include all codec configuration parameters that it wishes to include a specific restriction for (other than the default). Note that for some codecs, some codec configuration parameters may be possible to infer from the media stream, but if the wanted restriction also includes those and a describing COPN is lacking, they SHOULD anyway be included explicitly in the COPR.

Any codec configuration parameters that are not enabled by capability signaling MUST NOT be included.

A COPR sender MUST increment the Sequence No (SN) field modulo 2^8 with every new COPR that includes any update to the codec configuration parameters (referring to a specific version of an OPID) compared to the previously sent SN, as long as it does not receive any COPS (Section 7.5) with the same OPID, Version, and SN as was used in the most recently sent COPR. COPR having a later SN MUST be interpreted as replacing any COPR with identical OPID and Version but with lower SN, taking field wrap into account.

A COPR sender that did not receive any corresponding COPS, but did receive a COPN with the same OPID and with a higher Version than was used in the last COPR, SHALL reconsider the COPR and MAY send an updated COPR referencing the new Version.

If the capability negotiation has established that a codec supporting scalable operation is used, and if the media receiver wishes to request that scalability is used, it MAY do so by sending multiple COPR with different OPID to the same media sender. The OPID and Version used in such a request MAY be based on an existing operation point, but it MAY also indicate a desire to introduce scalability into a previously non-scalable stream by choosing a new OPID (indicated by setting the N flag). In any case, the resulting OPIDs and sub-streams are identified through use of the ID parameter (Section 8.3) in subsequent COPS and COPN. See also the description of COPS (Section 7.5).

An operation point without any codec configuration parameters MAY be used and MUST be interpreted as a request to keep the operation point unchanged. This is especially useful when modifying some but not all in a set of sub-streams.

When a COPR sender is receiving multiple operation points and wants to continue to do so, it MUST include all operation points it still wishes to receive in the COPR, also those that can be left unchanged.

A COPR MAY also describe alternative operation points that the media sender can choose from, through use of one or more ALT parameters (Section 8.2).

Since COPR references a specific COPN using SSRC, OPID and Version, a COPR sender typically needs to keep the latest Version of received COPN for each SSRC and OPID, also including the codec configuration parameters.

7.4.2.2.  Media Sender Behavior

A media sender receiving a COPR SHOULD take the request into account for future encoding, but MAY also take COPR from other media receivers and other information available to the media sender into account when deciding how to change encoding properties.

A media receiver sending COPR thus cannot always expect that all parameter values of the request are fully honored, or even honored at all. It can only know that the COPR was taken into account when receiving a COPS (Section 7.5) from the media sender with a matching OPID, Version and SN.
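
The rule that a COPR with a later SN replaces one with a lower SN, "taking field wrap into account", can be sketched in Python as follows. This specification does not spell out the comparison, so the half-range rule used here (in the style of serial number arithmetic) is an assumption, and the names sn_is_later and CoprState are hypothetical.

```python
def sn_is_later(new_sn, old_sn, bits=8):
    """True if new_sn is later than old_sn in wrapping (modular) order.

    Assumption: values less than half the number space ahead count as
    later, so that 1 is later than 255 for an 8-bit field.
    """
    mod = 1 << bits
    diff = (new_sn - old_sn) % mod
    return 0 < diff < mod // 2


class CoprState:
    """Latest COPR kept per (COPR sender SSRC, OPID, Version)."""

    def __init__(self):
        self._latest = {}

    def accept(self, ssrc, opid, version, sn, params):
        """Store the COPR if it is new or supersedes the one held."""
        key = (ssrc, opid, version)
        held = self._latest.get(key)
        if held is None or sn_is_later(sn, held[0]):
            self._latest[key] = (sn, params)
            return True
        return False  # repetition of, or older than, the COPR held
```

The same helper could be applied to the 7-bit Version field by passing bits=7, for example when checking whether a received COPN carries a later Version than the one a COPR referred to.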

To what extent a COPR is honored is described by the chosen codec configuration parameter values contained in a subsequent COPN message (Section 7.3) with a later (taking wraparound into account) Version than the one referred to by the COPR.

7.4.3.  Timing Rules

The timing follows the rules outlined in section 3 of [RFC4585]. This request message MAY be sent using Immediate, Early or Regular timing depending on the application's needs.

A COPR sender that did not receive a corresponding COPS MAY choose to retransmit the COPR, without increasing the SN.

When an RTP media receiver (SSRC) times out or leaves the RTP session (BYE received), this SHALL imply that all COPR restrictions put by that media receiver are removed.

7.5.  Codec Operation Point Status

7.5.1.  Message Format

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Type |     Payload Length      |     OPID      |N|   Version   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      SSRC of COPR sender                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Sequence No  | RC  | Reason  |Codec Configuration Parameters :
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               :
    :                                                               :

                       Figure 12: COPS format

The COPS-specific message fields are:

Type (3 bits):  Set to 2, as listed in Table 1.

OPID (8 bits):  MUST be set identical to the same field in the COPR being reported on.

N (1 bit):  MUST be set identical to the same field in the COPR being reported on.

Version (7 bits):  MUST be set identical to the same field in the COPR being reported on.

SSRC of COPR sender (32 bits):  MUST be set identical to the SSRC of packet sender field in the common AVPF header part of the COPR being reported on.

Sequence No (8 bits):  MUST be set identical to the same field in the COPR being reported on.

RC (3 bits):  Return Code. Indicates degree of success or failure of the COPR being reported on, as described in Table 2.

Reason (5 bits):  Contains more detailed information on the reason for success or failure, as described in Table 3 or extensions to this specification.

Codec Configuration Parameters (variable):  MAY contain an ID codec configuration parameter providing codec specific media identification of the OPID, subject to conditions outlined in the text below, or MAY be empty.

7.5.2.  Semantics

The COPS message item indicates the request status related to a certain (SSRC, OPID) tuple by listing the latest received COPR (Section 7.4) SN. It effectively informs the COPR sender that it no longer needs to resend that COPR SN (or any previous SN).

COPS indicates that the specified COPR was successfully received by the media sender targeted in the request. If the codec configuration parameters suggested in the COPR could be understood (Table 2), they may be taken into account, possibly together with COPR messages from other receivers and other aspects applicable to the specific media sender. The Return Code indicates to what extent the COPR could be honored.

    +-------+-------------------------------+
    | Value | Meaning                       |
    +-------+-------------------------------+
    | 0     | Success                       |
    | 1     | Partial success               |
    | 2     | Failure                       |
    | 3-6   | Unassigned                    |
    | 7     | Reserved for future extension |
    +-------+-------------------------------+

                   Table 2: Return Code Values

A Success Return Code indicates that the resulting media configuration is fully in line with the COPR.

A Partial Success Return Code indicates that the resulting media configuration is not fully in line with the COPR, but that the media sender regards the COPR to be sufficiently well represented by one or more of the existing operation points.

A Failure Return Code indicates that the media sender failed to take the COPR into account, either due to some error condition or because no media stream could be created or changed to comply.

The Reason Values defined below are independent of Return Code, but not all reasons are meaningful with all return codes. More reasons MAY be defined in extensions to this specification.

    +-------+----------------------------------------------------------+
    | Value | Meaning                                                  |
    +-------+----------------------------------------------------------+
    | 0     | Success                                                  |
    | 1     | Unknown OPID                                             |
    | 2     | Too many operation points                                |
    | 3     | Request violates capability limits                       |
    | 4     | Too old operation point version                          |
    | 5     | Unknown parameter type                                   |
    | 6     | Parameter value too long                                 |
    | 7     | Invalid comparison type                                  |
    | 8     | One or more parameter values in the request were changed |
    | 9-31  | Unassigned                                               |
    +-------+----------------------------------------------------------+

                       Table 3: Reason Values

COPS is typically sent without any codec configuration parameters. When the N flag was set in the related COPR, a non-failing COPS MUST include an ID parameter (Section 8.3) identifying the actual sub-stream that the media sender considers applicable to the COPR. The OPID used by that sub-stream can be found through examining ID parameters of subsequent COPN from the same media source for ID values matching the one in COPS.

Senders implementing this specification MUST NOT use any other codec configuration parameter types than ID in a COPS message. The contained ID parameter points to the specific media (sub-)stream that the media sender regards as applicable to the COPR.

When a COPR receiver has received multiple COPR messages from a single COPR source with the same OPID but with several different values of Version and/or SN, and for which it has not yet sent a COPS, it SHALL only send COPS for the COPR with the highest SN, taking field wrap of those two fields into account.

7.5.3.  Timing Rules

COPS SHALL be sent at the earliest opportunity after having received a COPR, with the following exception:

   A media sender that receives a COPR with a previously received OPID, Version, and SN closely after sending a COPS for that same OPID, Version, and SN (within 2 times the longest observed round trip time, plus any AVPF-induced packet sending delays), SHOULD await a repeated COPR before scheduling another COPS transmission for that OPID, Version, and SN.

The exception is introduced to avoid unnecessary COPS transmission when there is a chance that already sent COPS or COPN may satisfy or invalidate the COPR.

7.6.  Handling in Mixers and Translators

7.6.1.  COPN

Any media sender, including mixers and translators, that sends RTP media marked with its own SSRC and that implements this specification SHALL also be prepared to send COPN, even if it is not the originating media source. As a result of that, such a media sender may have to send updated COPN whenever the included media sources (CSRC) change, subject to rules laid out above (Section 7.3.2). Note that this can be achieved in different ways, for example by forwarding (possibly cached) COPN from the included CSRC when the mixer is not performing transcoding.

   In cases where a mixer or translator needs to forward a COPR from one
   side (A) to the other (B) (as described in Section 7.6.2), the COPN
   sent to the A side MAY need to be delayed until the mixer or
   translator has received a corresponding COPN from the B side, as
   indicated in Figure 13 below.

   +-------+ 1. COPR +-------+ 2. COPR +-------+
   |       |-------->|       |-------->|       |
   |   A   | 4. COPN | Mixer | 3. COPN |   B   |
   |       |<--------|       |<--------|       |
   +-------+         +-------+         +-------+

                      Figure 13: Mixer delay of COPN

   If a mixer or translator has decided to act partially (modify the
   media stream with respect to some parameter types, but not all) on a
   received COPR from the A side, and a COPN is received from the B side
   indicating that the current media modifications are no longer
   necessary, the mixer or translator SHOULD cease its own actions that
   are no longer needed. It SHOULD then also issue a COPN describing
   the new situation to the A side, as indicated in Figure 14 below.

   +-------+ 1. COPR +-------+         +-------+
   |       |-------->|       | 2. COPR |       |
   |       | 3. COPN |       |-------->|       |
   |   A   |<--------| Mixer | 4. COPN |   B   |
   |       | 5. COPN |       |<--------|       |
   |       |<--------|       |         |       |
   +-------+         +-------+         +-------+

                     Figure 14: Mixer update of COPN

7.6.2. COPR

   A mixer or media translator that implements this specification and
   encodes content sent to the media receiver issuing the COPR SHALL
   consider the request to determine if it can fulfill it by changing
   its own encoding parameters. A mixer encoding for multiple session
   participants will need to consider the joint needs of all
   participants when generating a COPR on its own behalf towards the
   media sender.

   A mixer or translator able to fulfill the COPR partially MAY act on
   the parts it can fulfill (and SHALL then send COPS and COPN
   accordingly), but SHOULD anyway forward the unaltered COPR towards
   the media sender, since it is likely most efficient to make the
   necessary codec configuration parameter changes directly at the
   original media source.

   A media translator that does not act on COP messages will forward
   them unaltered, according to normal translator rules.

7.6.3. COPS

   A mixer or media translator that implements this specification,
   encoding content sent to media receivers and that acts on COPR SHALL
   also report using COPS, just like any other media sender. An RTP
   translator not knowing or acting on COPR will forward all COP
   messages unaltered, according to normal RTP translator rules.

8. Parameter Types

   This section defines the general codec configuration parameter (CCP)
   TLV format. Then a number of different parameter formats are
   defined. It is expected that a number of additional CCPs will be
   defined in the future as the needs of different codecs are explored
   or developed.

8.1. Parameter Format

   COP message items MAY contain one or more codec configuration
   parameters, encoded in TLV (Type-Length-Value) format, which SHOULD
   then be interpreted as simultaneously applicable to the defined
   operation point. Parameter values MUST be byte-aligned.
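
   As a non-normative illustration of the TLV layout defined in this
   section (8-bit ParamType, 2-bit Comparison Type, 6-bit Length,
   followed by Length bytes of value), the following Python sketch packs
   and unpacks a single parameter. The function names are hypothetical,
   and the sketch assumes that consecutive parameters follow each other
   without padding, which this section does not state explicitly.

```python
def encode_uint_minimal(value: int, max_bytes: int) -> bytes:
    """Big-endian unsigned integer in as few bytes as possible (at least 1)."""
    if value < 0:
        raise ValueError("value must be unsigned")
    size = max(1, (value.bit_length() + 7) // 8)
    if size > max_bytes:
        raise ValueError("value does not fit in max_bytes")
    return value.to_bytes(size, "big")


def pack_param(param_type: int, comparison: int, value: bytes = b"") -> bytes:
    """Build one TLV: ParamType (8 bits), C (2 bits), Length (6 bits), value."""
    if not 0 <= param_type <= 0xFF:
        raise ValueError("ParamType must fit in 8 bits")
    if not 0 <= comparison <= 3:
        raise ValueError("Comparison Type must fit in 2 bits")
    if len(value) > 0x3F:
        raise ValueError("Length must fit in 6 bits")
    return bytes([param_type, (comparison << 6) | len(value)]) + value


def unpack_param(buf: bytes, offset: int = 0):
    """Return (param_type, comparison, value, next_offset) for one TLV."""
    if offset + 2 > len(buf):
        raise ValueError("truncated TLV header")
    param_type = buf[offset]
    comparison = buf[offset + 1] >> 6
    length = buf[offset + 1] & 0x3F
    end = offset + 2 + length
    if end > len(buf):
        raise ValueError("truncated TLV value")
    return param_type, comparison, bytes(buf[offset + 2:end]), end


# Framerate (ParamType 5) with Comparison Type Maximum (2), 29.97 Hz = 2997.
FRAMERATE, MAXIMUM = 5, 2
tlv = pack_param(FRAMERATE, MAXIMUM, encode_uint_minimal(2997, 4))
```

   Note that a numeric value of 0 still occupies one byte in this
   sketch, because a Length of 0 is reserved for the wild-carded case.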

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   ParamType   | C |  Length   |                               |
   +---------------+---+-----------+                               |
   |                                                               |
   /                        Parameter Value                        /
   /                                                +--------------+
   |                                                |
   +------------------------------------------------+

                    Figure 15: Codec parameter format

   ParamType (8 bits): The codec configuration parameter type, encoded
      as defined in Table 4 and possible extensions to this
      specification. A parameter with an unknown ParamType SHALL be
      ignored on reception in a COPN and SHALL either be reported as
      unknown in COPS or be ignored when received in COPR.

   C (2 bits): Comparison Type, encoded as defined in Table 5, unless
      specified otherwise by individual ParamType definitions. The
      Comparison Type specifies what type of restriction the codec
      configuration parameter value expresses and how it should be
      compared to other codec configuration parameter values of the same
      ParamType.

      Exact: The parameter value is an exact value, and no other values
         are acceptable. MUST NOT be used together with any other
         Comparison Types for the same ParamType.

      Minimum: The parameter value is an inclusive minimum restriction.
         MAY be used together with Maximum and/or Target Comparison
         Types for the same ParamType. If no minimum restriction is
         specified, no specific minimum restriction exists.

      Maximum: The parameter value is an inclusive maximum restriction.
         MAY be used together with Minimum and/or Target Comparison
         Types for the same ParamType. If no maximum restriction is
         specified, no specific maximum restriction exists.

      Target: The parameter value is a preferred target value, but
         other values within a specified range are acceptable.
         This type MUST be used together with at least one of Minimum
         and Maximum Comparison Types for the same ParamType. If no
         target is specified, no specific preference exists.

   Length (6 bits): The parameter value Length in bytes, excluding the
      ParamType and the Length field itself. A Length of 0 indicates
      that the parameter has no value, effectively constituting a
      wild-carded parameter that can take on any value (expresses no
      specific restriction). This is also the RECOMMENDED way to
      explicitly remove a previously effective restriction.

   Parameter Value (variable length): The actual parameter value,
      encoded in a format defined by the specific ParamType definition.

   The meaning of multiple codec configuration parameters with the same
   ParamType and the same Comparison Type included as part of the same
   operation point is undefined and SHALL NOT be used.

   A codec configuration parameter that is encoded in a way (including
   incorrectly) that cannot be interpreted by the receiver SHALL be
   ignored.

   The parameters below that are encoded as signed or unsigned integers
   use a variable size representation in the value field. It is
   RECOMMENDED to only include the minimal number of bytes necessary to
   represent the value that is to be included in the parameter TLV. The
   length field in the parameter TLV will explicitly indicate how many
   bytes are present in the value field. All parameters using a variable
   size representation of their value MUST define the maximum number of
   bytes possible to include in the value field.

   The ParamType values and the SDP tags (see Section 9) for the codec
   configuration parameter types defined in this specification are
   listed below.

   +--------+-------------------------------+--------------+
   | Value  | Meaning                       | Tag          |
   +--------+-------------------------------+--------------+
   | 0      | ALT                           | alt          |
   | 1      | ID                            | id           |
   | 2      | Payload Type                  | pt           |
   | 3      | Bitrate                       | bitrate      |
   | 4      | Token Bucket Size             | token-bucket |
   | 5      | Framerate                     | framerate    |
   | 6      | Horizontal Pixels             | hor-size     |
   | 7      | Vertical Pixels               | ver-size     |
   | 8      | Sample Aspect Ratio           | sar          |
   | 9      | Picture Aspect Ratio          | par          |
   | 10     | Channels                      | channels     |
   | 11     | Sampling Rate                 | sampling     |
   | 12     | Maximum RTP Packet Size       | max-rtp-size |
   | 13     | Maximum RTP Packet Rate       | max-rtp-rate |
   | 14     | Frame Aggregation             | aggregate    |
   | 15-254 | Undefined                     |              |
   | 255    | Reserved for future extension |              |
   +--------+-------------------------------+--------------+

                      Table 4: Parameter Type Values

   The values of the defined parameter value comparison type are listed
   below.

   +-------+---------+
   | Value | Meaning |
   +-------+---------+
   | 0     | Exact   |
   | 1     | Minimum |
   | 2     | Maximum |
   | 3     | Target  |
   +-------+---------+

                     Table 5: Comparison Type Values

   The following sub-sections describe the syntax and semantics of the
   different codec configuration parameter types defined in this
   specification.

   Unless explicitly specified in the sub-sections below, or in
   extensions to this specification, all parameter type values are
   binary encoded unsigned integers, most significant byte first (for
   multi-byte values).

8.2. ALT

   This codec parameter type is a special parameter, separating the
   codec configuration parameters preceding it from the ones that follow
   into two separate, alternative operation points.

   Type Value: 0

   Tag: alt

   Unit: Not applicable.

   Semantics: A special parameter expressing an "alternative" relation
      between the parameters preceding it and the parameters following
      it. This SHOULD be interpreted as describing two alternate
      operation points where one and only one SHALL be chosen, with the
      operation point preceding ALT in the parameter list being
      preferred. Multiple ALT parameters MAY be used in the same
      parameter list, in which case each set of parameters to evaluate
      can be either before the first ALT parameter, between two ALT
      parameters, or after the last ALT parameter. Evaluating from the
      top of the list and obeying the above preference rule, the first
      acceptable set of parameters (not containing any ALT parameter) is
      the one to choose.

   Encoding: Not applicable.

   Media Types: All.

   Value Restrictions: MUST be used with the Length field set to 0.
      Two ALT parameters MUST be separated by at least one parameter
      other than ALT.

   Default Value: Not applicable.

   Comparison Types: MUST be set to 0.

   Note:

8.3. ID

   This codec parameter type is a special parameter that enables codec
   specific identification of sub-streams, for example when there are
   multiple sub-streams in a single SSRC. It can also be used to
   reference OPID, when the used codec does not support or use
   sub-streams. When used, it SHALL be listed first among the codec
   parameters used to describe the sub-stream.

   Type Value: 1

   Tag: id

   Unit: Not applicable.

   Semantics: A special parameter describing the, possibly codec
      specific, media identification for the OPID.

   Encoding: If used with non-scalable encoding, it MUST contain an
      OPID (Section 7.2.1). If used with scalable encoding, this codec
      specific encoding MUST be defined by Section 10. It MUST be
      defined to occupy an integer number of bytes, where all bits in
      the bytes are defined as part of the format.

   Media Types: All.

   Value Restrictions: If used with non-scalable encoding, any OPID
      restrictions apply. If used with scalable encoding, any
      restrictions MUST be defined by the definition of the codec
      specific sub-stream identification definition (Section 10).

   Default Value: Not set.

   Comparison Types: MUST be set to 0.

   Note: MAY be used whenever there is a need to identify an operation
      point in codec native format, or when there is a need to map that
      against an OPID.

8.4. Payload Type

   Type Value: 2

   Tag: pt

   Unit: Not applicable.

   Semantics: Referencing the RTP Payload Type to use for the OPID.

   Encoding: The least significant 7 bits MUST use the same encoding as
      the RTP Payload Type field in the RTP header. The most
      significant bit MUST be set to 0.

   Media Types: All.

   Value Restrictions: The same restrictions valid for RTP Payload Type
      apply, i.e. 7-bit values 0-127. MUST be represented by a single
      byte in the value field.

   Default Value: Not set.

   Comparison Types: MUST be set to 0.

   Note: MAY be used whenever there is a need to specify codec
      configuration parameters valid only for a certain RTP Payload
      Type. Which media type, codec, and possible parameters are
      described by the RTP Payload Type is outside the scope of this
      specification, but is typically defined in capability or call
      setup signaling, for example SDP.

8.5. Bitrate

   Type Value: 3

   Tag: bitrate

   Unit: Bits per second.

   Semantics: Media level per second average media bitrate, excluding
      IP/UDP/RTP overhead, but including RTP payload headers (similar to
      b=TIAS from SDP signaling [RFC3890]), rounded up to the closest
      integer.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: A value of 0 MAY be used.
      The largest value allowed is what is possible to represent in a
      64-bit unsigned integer value, i.e. a value between 0 and
      18,446,744,073,709,551,615.

   Default Value: Maximum value computed from capability or call setup
      signaling, e.g. b= parameter from SDP. Note that it is often not
      possible to achieve more than a rough estimation from such
      computation.

   Comparison Types: All. The Exact comparison type is meaningful only
      for streams that are able to produce a set of predictable (e.g.
      constant) packet sizes, sent at predictable (e.g. constant)
      inter-packet intervals.

   Note: This parameter used with a maximum comparison type parameter
      is significantly similar to CCM Temporary Maximum Media Bit Rate
      (TMMBR). When being used with a maximum or exact comparison type
      value of 0, it is also significantly similar to PAUSE
      [I-D.westerlund-avtext-rtp-stream-pause]. Compared to those, this
      parameter conveys significant extra information through the
      relation to other parameters applied to the same operation point,
      as well as the possibility to express other restrictions than a
      maximum limit. When CCM TMMBR is supported in addition to this
      specification, the Bitrate parameters from all operation points
      within each SSRC should be considered and CCM TMMBR messages
      SHOULD be sent for those SSRC that are found to be in the bounding
      set (see CCM [RFC5104], section 3.5.4.2). When PAUSE is supported
      in addition to this specification, the Bitrate parameters from all
      operation points within each SSRC should be considered and CCM
      PAUSE messages SHOULD be sent for those SSRC that contain only
      operation points that are limited by a Bitrate maximum value of 0.
      The only difference between setting the bitrate to 0 and
      removing the OPID entirely is that increasing the bitrate from 0
      just requires the bitrate parameter to be sent again, while
      re-activating a removed OPID requires it to be fully re-defined
      including all other parameters that are included in the OPID.

8.6. Token Bucket Size

   Type Value: 4

   Tag: token-bucket

   Unit: Bytes.

   Semantics: Media level token bucket [RFC2212] size excluding
      IP/UDP/RTP overhead, but including RTP payload headers, describing
      the bitrate variability over time as described in
      [I-D.westerlund-mmusic-sdp-bw-attribute]. This parameter can be
      combined with the parameter bitrate (Section 8.5) to provide token
      bucket fill rate plus bucket size for a complete token bucket
      model.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: A value of 0 is generally not meaningful and
      SHOULD NOT be used. Values that can be represented using a 32-bit
      unsigned integer, i.e. 0 to 4,294,967,295.

   Default Value: 4096 bytes.

   Comparison Types: Maximum, Target.

   Note: Changing the token bucket size does not imply changing the
      average bitrate, it just changes the acceptable average bitrate
      variation over time.

8.7. Framerate

   Type Value: 5

   Tag: framerate

   Unit: 100th of a Hz. This definition allows e.g. distinguishing
      between video encoded at 30 Hz (two-byte value 3000) and 29.97 Hz
      (two-byte value 2997). It also allows for high speed video
      cameras, like 1000 Hz (three-byte value 100000), and slow-scan
      down to one frame every 100 seconds (one-byte value 1).

   Semantics: The number of media frames to render per second.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Mainly intended for video and timed image media types,
      but MAY be used also for other media types.

   Value Restrictions: A value of 0 MAY be used, meaning single-frame,
      request based encoding (request procedure is out of scope for this
      specification). Values that can be represented using a 32-bit
      unsigned integer, i.e. 0 to 42,949,672.95 Hz.

   Default Value: Maximum allowed by call setup and/or capability
      signaling, e.g. a=framerate parameter from SDP [RFC4566], or
      codec-specific configuration.

   Comparison Types: All.

   Note: A media frame is typically a set of semantically grouped
      samples, e.g. the relation that a video image has to its
      individual pixels, or the relation that an audio frame has to
      individual audio samples. The value applies to encoded media
      framerate, not the packet rate (Section 8.15) that may also be
      changed as a result of different Frame Aggregation (Section 8.16).
      When the COP end-point also makes use of CCM [RFC5104] TSTR/TSTN,
      COPN with this parameter MAY be used in combination with TSTN to
      explicitly indicate what framerate setting the TSTR resulted in,
      making it possible for the TSTR sender to adjust the used,
      relative TSTR scale to more closely match what framerate was
      actually received.

8.8. Horizontal Pixels

   Type Value: 6

   Tag: hor-size

   Unit: Pixels.

   Semantics: Horizontal image size.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Video and image.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Values that can be represented using a 32-bit
      unsigned integer, i.e. 1 to 4,294,967,295.

   Default Value: Maximum allowed by call setup and/or capability
      signaling.

   Comparison Types: All.

   Note: The pixel and picture aspect ratios cannot be changed with
      this parameter.
      Video encoders can typically describe both pixel and picture
      aspect ratios as part of the encoded media stream. If the COP
      end-point supports imageattr signaling [RFC6236], values for this
      parameter SHOULD be chosen only among the negotiated set in the
      SDP, and should be done so both for the media receiving COPR
      sender and the media sending COPN sender, according to imageattr
      values for the affected media stream direction.

8.9. Vertical Pixels

   Type Value: 7

   Tag: ver-size

   Unit: Pixels.

   Semantics: Vertical image size.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Video and image.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Values that can be represented using a 32-bit
      unsigned integer, i.e. 1 to 4,294,967,295.

   Default Value: Maximum allowed by call setup and/or capability
      signaling.

   Comparison Types: All.

   Note: See Note in Section 8.8.

8.10. Sample Aspect Ratio

   Type Value: 8

   Tag: sar

   Unit: Unit-less value pair.

   Semantics: The ratio between the intended horizontal distance
      between the columns and the intended vertical distance between the
      rows of the luma sample array in a frame, similar to what is
      defined in [H241].

   Encoding: Two binary encoded, unsigned 8-bit integers in order
      horizontal, vertical.

   Media Types: Video and image.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used as value in either the horizontal or vertical
      component. Component values that can be represented using an
      8-bit unsigned integer, i.e. 1 to 255.

   Default Value: The same as defined in [H241] when there is no
      explicit indication, based on image size.

   Comparison Types: All.

   Note: If the COP end-point supports imageattr signaling [RFC6236],
      values for this parameter SHOULD be chosen only among the
      negotiated set in the SDP, and should be done so both for the
      media receiving COPR sender and the media sending COPN sender,
      according to imageattr values for the affected media stream
      direction.

8.11. Picture Aspect Ratio

   Type Value: 9

   Tag: par

   Unit: Unit-less value pair.

   Semantics: The ratio between the intended horizontal width and the
      intended vertical height of a displayed picture, similar to what
      is defined in [H241].

   Encoding: Two binary encoded, unsigned 8-bit integers in order
      horizontal, vertical.

   Media Types: Video and image.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used as value in either the horizontal or vertical
      component. Component values that can be represented using an
      8-bit unsigned integer, i.e. 1 to 255.

   Default Value: The same as defined in [H241] when there is no
      explicit indication, based on image size.

   Comparison Types: All.

   Note: If the COP end-point supports imageattr signaling [RFC6236],
      values for this parameter SHOULD be chosen only among the
      negotiated set in the SDP, and should be done so both for the
      media receiving COPR sender and the media sending COPN sender,
      according to imageattr values for the affected media stream
      direction.

8.12. Channels

   Type Value: 10

   Tag: channels

   Unit: Unit-less.

   Semantics: The number of media channels.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Values that can be represented using a 16-bit
      unsigned integer, i.e. 1 to 65,535.

   Default Value: Taken from call setup or capability signaling, or 1
      if no other value is available.

   Comparison Types: All.

   Note: This codec configuration parameter SHOULD NOT be used if the
      capability negotiation did not establish that suitable
      multi-channel coding is supported by both ends. For audio, the
      interpretation and spatial mapping SHALL follow the one for the
      indicated payload format. If no such channel mapping is defined
      in the payload format, and if not specifically signalled by other
      means, e.g. SDP, the channel configurations defined in [RFC3551]
      SHALL be used. For video, it SHALL be interpreted as the number
      of views in multiview coding, where the number 2 SHOULD represent
      stereo (3D) coding, unless negotiated otherwise by means outside
      of this specification, e.g. SDP. If multiple payload formats are
      defined and if those do not share channel configurations, the
      Payload Type parameter (Section 8.4) MUST be included as one of
      the parameters for the OPID.

8.13. Sampling Rate

   Type Value: 11

   Tag: sampling

   Unit: Hz.

   Semantics: Frequency of the media sampling clock in Hz, as input to
      the codec, per channel (Section 8.12).

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Mainly intended for audio media, but MAY be used for
      other media types.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Values that can be represented using a 32-bit
      unsigned integer, i.e. 1 to 4,294,967,295.

   Default Value: Taken from call setup or capability signaling, e.g.
      RTP TS rate from SDP m-line.

   Comparison Types: All.

   Note: The value refers to the media sample clock, not the media
      Framerate (Section 8.7). It does not specify any codec-internal
      up- or down-sampling that may take place as part of the encoding
      process.
      If multiple channels (Section 8.12) are used and
      different channels use different sampling rates, then this
      parameter MUST NOT be used unless there is a known sampling rate
      relationship and an ordering between the channels, in which case
      the specified sampling rate value SHALL be taken as applicable to
      the first channel of the ordered set. The relationship may e.g.
      be known implicitly by each party through some specification, or
      be negotiated using other means than this specification.
      Typically only a limited subset of sampling frequencies makes
      sense to the media encoder, and sometimes it is not possible to
      change at all. For video, the sampling rate is very closely
      connected to the image horizontal (Section 8.8), vertical
      (Section 8.9) resolution, and framerate (Section 8.7), which are
      more explicit and meaningful and SHOULD therefore be used instead.
      For audio, changing sampling rate may require changing codec and
      thus changing RTP payload type. The actual media sampling rate
      may not be identical to the sampling rate specified for RTP Time
      Stamps for that RTP Payload Type. E.g. almost all video codecs
      use only 90 000 Hz sampling clock for RTP Time Stamps, while the
      actual pixel sampling clock is typically in the range from a few
      to several hundred MHz. Also some recent audio codecs use an RTP
      Time Stamp rate that differs from the actual media sampling rate.
      Aspects related to mid-stream changes of RTP Time Stamp rate are
      described in [I-D.ietf-avtext-multiple-clock-rates].

8.14. Maximum RTP Packet Size

   Type Value: 12

   Tag: max-rtp-size

   Unit: Bytes.

   Semantics: The maximum size of an RTP packet, including the RTP
      header but excluding lower layers.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: The meaning of a value less than the size of the
      RTP header (12 bytes for current RTP specification [RFC3550]) is
      not defined and SHOULD NOT be used. Values that can be
      represented using a 32-bit unsigned integer, i.e. 0 to
      4,294,967,295.

   Default Value: 1400 bytes for IPv4, 1280 bytes for IPv6 or if IP
      version cannot be determined.

   Comparison Types: Maximum.

   Note: The parameter should typically be used to adapt encoding to a
      known or assumed MTU limitation, and MAY be used to assist MTU
      path discovery in point-to-point as well as in RTP mixer or
      translator topologies.

8.15. Maximum RTP Packet Rate

   Type Value: 13

   Tag: max-rtp-rate

   Unit: RTP packets per second.

   Semantics: Maximum number of RTP packets per second, calculated or
      estimated as the largest value appearing during a one-second
      sliding window, similar to the definition of "maxprate" [RFC3890].

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: All.

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Values that can be represented using a 32-bit
      unsigned integer, i.e. 1 to 4,294,967,295.

   Default Value: Not set.

   Comparison Types: Maximum.

   Note: The parameter should typically be used to adapt encoding on a
      network that is packet rate rather than bitrate limited, if such
      property is known. This codec configuration parameter MUST NOT
      exceed any negotiated "maxprate" [RFC3890] value, if present.

8.16. Application Data Unit Aggregation

   Type Value: 14

   Tag: aggregate

   Unit: Milliseconds.

   Semantics: The amount of non-redundant application data units (ADUs)
      representing different RTP Time Stamps that should be included in
      the RTP payload, henceforth in this specification called an "ADU
      aggregate".
      An ADU aggregation value of 1 is equivalent to no aggregation.

   Encoding: Binary encoded unsigned integer, most significant byte
      first.

   Media Types: Mainly intended for audio, but MAY be used also for
      other media, e.g. Real-Time Text [RFC4103].

   Value Restrictions: The meaning of the value 0 is not defined and
      SHALL NOT be used. Values that can be represented using a 16-bit
      unsigned integer, i.e. 1 to 65,535.

   Default Value: 1.

   Comparison Types: All.

   Note: To use this parameter, there MUST exist a defined way of
      including multiple ADUs into the same RTP payload for the used RTP
      Payload Type. There MUST also exist a known internal timing
      relationship between individual ADUs within the RTP payload for
      the used RTP Payload Type. Some payload formats (typically video)
      do not allow multiple ADUs (representing different sampling times)
      in the RTP payload. This codec configuration parameter SHOULD NOT
      be used unless the "maxprate" [RFC3890] and/or "ptime" parameters
      are included in the SDP. The requested ADU aggregation level MUST
      NOT cause exceeding the negotiated "maxprate" value, if present,
      and SHOULD NOT exceed the negotiated "ptime" value, if present.
      The requested frame aggregation level MUST NOT be in conflict with
      any Maximum RTP Packet Size (Section 8.14) or Maximum RTP Packet
      Rate (Section 8.15) parameters. The packet rate that may result
      from different frame aggregation values is related to, but
      semantically not the same as, media Framerate (Section 8.7).

9. SDP Extensions

   As described in [RFC4585] and [RFC5104], the rtcp-fb attribute may be
   used to negotiate capability to handle specific AVPF commands and
   indications, and specifically the "ccm" feedback value is used for
   codec control.
   All rules defined there related to use of "rtcp-fb" and "ccm" also
   apply to the new feedback message defined in this specification.

9.1. Extension of the rtcp-fb Attribute

   In this document, a new "ccm" rtcp-fb-ccm-param is defined, according
   to the method of extension described in [RFC5104]:

   o  "cop" indicates support for all COP message items defined in this
      specification, and one or more of the codec configuration
      parameters defined in this specification

   The ABNF [RFC5234] for the new rtcp-fb-ccm-param is:

   rtcp-fb-ccm-param  =/ SP "cop" 1*rtcp-fb-ccm-cop-param
                       ; rtcp-fb-ccm-param defined in [RFC5104]

   rtcp-fb-ccm-cop-param = SP "alt"
                         / SP "id"
                         / SP "pt"
                         / SP "bitrate"
                         / SP "token-bucket"
                         / SP "framerate"
                         / SP "hor-size"
                         / SP "ver-size"
                         / SP "sar"
                         / SP "par"
                         / SP "channels"
                         / SP "sampling"
                         / SP "max-rtp-size"
                         / SP "max-rtp-rate"
                         / SP "aggregate"
                         / SP token ; for future extensions
                         ; token defined in [RFC4566]

                         Figure 16: ABNF for cop

   Token values for rtcp-fb-ccm-cop-param are defined in Table 4. Their
   semantics are described in Section 8.

   Supported parameter types are indicated by including one or more
   rtcp-fb-ccm-cop-param.

9.2. Offer/Answer Usage

   The usage of Offer/Answer [RFC3264] in this specification inherits
   all applicable usage defined in [RFC5104].

   In order to announce support, and willingness to use, the CCM "cop"
   feedback message, an offerer or answerer SHALL indicate that
   capability through the extended SDP rtcp-fb attribute, defined in
   Section 9.1. The offerer or answerer MUST include a list of the
   parameter types that it is willing to receive.

   If an SDP offer does not indicate support of the CCM "cop" feedback
   message, the answerer MUST NOT indicate support in the associated SDP
   answer.
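
   As a non-normative illustration of the attribute syntax from
   Section 9.1, the following Python sketch extracts the advertised
   parameter tags from a single SDP line. The example attribute line is
   an assumption built from the ABNF (it does not appear in this
   specification), the function names are hypothetical, and the parser
   accepts only single-space separators.

```python
# SDP tags for the parameter types defined in this specification (Table 4).
COP_TAGS = (
    "alt", "id", "pt", "bitrate", "token-bucket", "framerate",
    "hor-size", "ver-size", "sar", "par", "channels", "sampling",
    "max-rtp-size", "max-rtp-rate", "aggregate",
)


def parse_cop_attribute(line):
    """Return (payload_type, tags) for an rtcp-fb "ccm cop" line, else None."""
    prefix = "a=rtcp-fb:"
    if not line.startswith(prefix):
        return None
    fields = line[len(prefix):].split(" ")
    # Expect: payload type, "ccm", "cop", and at least one parameter tag.
    if len(fields) < 4 or "" in fields:
        return None
    payload_type, feedback, ccm_param, *tags = fields
    if feedback != "ccm" or ccm_param != "cop":
        return None
    return payload_type, tags


def sendable_tags(remote_tags):
    """Tags the remote party advertised that this sketch also understands."""
    return [tag for tag in remote_tags if tag in COP_TAGS]


parsed = parse_cop_attribute("a=rtcp-fb:* ccm cop bitrate framerate")
```

   A sender could then restrict the parameter types it uses to the
   result of sendable_tags() applied to the remote party's list.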

   The answerer MAY add parameter types that were not present in the
   associated SDP offer, and/or remove parameter types that were
   present. If the answerer adds parameter types to the SDP answer, it
   MUST be able to receive such messages, but the answerer MUST NOT send
   such messages towards the offerer.

   If an SDP answer does not indicate support of the CCM "cop" feedback
   message, the offerer MUST NOT send such messages towards the
   answerer.

   The offerer and the answerer SHOULD NOT send any parameter types that
   the remote party did not indicate receive support for. As described
   in Section 8, a parameter with an unknown ParamType SHALL be ignored
   on reception in a COPN and SHALL either be reported as unknown in
   COPS or be ignored when received in COPR.

   Entities MUST list all supported parameter types in every subsequent
   SDP offer or answer associated with the session. If a parameter type
   is not listed, it is an indication that the offerer or answerer is no
   longer willing to receive such messages within the session.

9.3. Declarative Usage

   Declarative use of the CCM "cop" does not differ from the
   Offer/Answer usage.

10. Codec Sub-Stream Identification

   The defined mechanism is not bound to a specific codec. It uses the
   main characteristics of a chosen set of media types, including audio
   and video. To what extent this mechanism can be applied depends on
   which specific codec is used.

   When using a codec that can produce separate sub-streams within a
   single SSRC, those sub-streams can only be referred to by a COP OPID
   if there is a defined relation to the codec-specific sub-stream
   identification. This is accomplished in this specification by
   defining an ID parameter format using codec-specific sub-stream
   identification for each such codec.

   If such sub-streams have dependencies, the OPID describes the
   characteristics of the sub-stream including all its dependencies,
   but excluding any sub-streams that are dependent on this sub-stream.
   The sub-stream identification describes a single, payload-specific
   node in a dependency tree, and does not in general include any
   identification of the sub-streams it depends on, or the dependency
   structure between sub-streams.  Any dependency structure must thus be
   described by the media stream payload format and is out of scope for
   this specification.

   This section contains ID parameter format definitions for a few
   selected codecs.  The format definitions MUST use an integer number
   of bytes and MUST define all bits in those bytes.  Note that the ID
   parameter is interpreted in the context of a given SSRC and a
   specific RTP payload type.

   Extensions to this specification MAY add more codec-specific
   definitions than the ones described in the sub-sections below.  Such
   definitions made in extensions to this specification SHOULD be
   considered as an integrated part of this section, with respect to
   usage with other mechanisms defined in this specification.

10.1.  H.264 AVC

   Some non-scalable video codecs such as H.264 AVC [H264], with the
   corresponding RTP payload format [RFC6184], can accomplish
   simultaneous encoding of multiple operation points.  H.264 AVC can
   encode a video stream using limited-reference and non-reference
   frames such that it enables limited temporal scalability, by use of
   the nal_ref_idc syntax element.

   The ID parameter type is defined below:

    0
    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | Reserved  | N |
   +-+-+-+-+-+-+-+-+

                   Figure 17: ID definition for AVC

   Reserved (6 bits):  Reserved.  SHALL be set to 0 by senders and SHALL
      be ignored by receivers implementing this specification.
      MAY be defined differently by extensions to this specification.

   N (2 bits):  SHALL be identical to the highest value of the
      nal_ref_idc H.264 NAL header syntax element valid for the
      sub-bitstream described by this OPID, with the exception of
      nal_ref_idc value 3, which is valid for and is part of all
      sub-bitstreams.

10.2.  H.264 SVC

   This document specifies the usage of multiple, simultaneous codec
   operation points and therefore maps well to scalable video coding.
   Scalable video coding such as H.264 SVC (Annex G) [H264] uses three
   scalability dimensions: temporal, spatial, and quality.  It also
   includes the possibility to use redundant encodings and priority
   among sub-streams.

   The ID SHALL be considered to describe an SVC sub-bitstream, which is
   defined in G.3.59 of H.264 [H264] and the corresponding RTP payload
   format [RFC6190].  For use with H.264 SVC, ID SHALL be constructed as
   defined below:

    0                   1                   2
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |R|    PID    |     RPC     | DID |  QID  | TID |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 18: ID definition for SVC

   R (1 bit):  Reserved.  SHALL be set to 0 by senders and SHALL be
      ignored by receivers implementing this specification.  MAY be
      defined differently by extensions to this specification.

   PID (6 bits):  SHALL be identical to an unsigned binary integer
      representation of the priority_id H.264 syntax element valid for
      the sub-bitstream described by this OPID.  SHALL be set to 0 if no
      priority_id is available.

   RPC (7 bits):  SHALL be identical to an unsigned binary integer
      representation of the redundant_pic_cnt H.264 syntax element valid
      for the sub-bitstream described by this OPID.  SHALL be set to 0
      if no redundant_pic_cnt is available.

   DID (3 bits):  SHALL be identical to the dependency_id H.264 syntax
      element valid for the sub-bitstream described by this OPID.

   QID (4 bits):  SHALL be identical to the quality_id H.264 syntax
      element valid for the sub-bitstream described by this OPID.

   TID (3 bits):  SHALL be identical to the temporal_id H.264 syntax
      element valid for the sub-bitstream described by this OPID.

11.  Examples

   COP messages are binary encoded.  However, in the following examples,
   all COP messages are for clarity listed in symbolic, pseudo-code
   form, where only COP message fields of interest to the example are
   included, along with the COP parameters.

11.1.  SDP Offer/Answer

   The SDP capabilities for COP are defined as receiver capabilities,
   meaning that there is no explicit indication of what COP messages an
   endpoint will use in the send direction.  It is however reasonable to
   expect that an endpoint can also send the same messages that it can
   understand and act on when received.  This is assumed in all the SDP
   examples below, but note that symmetric COP capabilities are not a
   requirement.

   The example below shows an SDP Offer, where support of CCM "cop"
   message is announced for the video codecs.

   v=0
   o=alice 2890844526 2890844526 IN IP4 host.atlanta.example
   s=-
   c=IN IP4 host.atlanta.example
   t=0 0
   m=audio 50000 RTP/AVP 0 8 97
   b=AS:80
   a=rtpmap:0 PCMU/8000
   a=rtpmap:8 PCMA/8000
   a=rtpmap:97 iLBC/8000
   m=video 50010 RTP/AVPF 31 32
   b=AS:600
   a=rtpmap:31 H261/90000
   a=rtpmap:32 MPV/90000
   a=rtcp-fb:31 ccm cop framerate bitrate token-rate
   a=rtcp-fb:32 ccm cop hor-size ver-size framerate bitrate \
   token-rate

             Figure 19: SDP offer (COP support indicated)

   Note that the offer contains two different video payload types, and
   that the COP parameters differ between them, meaning that the
   possibility for codec configuration also differs.  In this case, the
   MPEG-1 codec can control both framerate and image size, but for H.261
   only the framerate can be controlled.

   In the SDP Answer below, responding to the above offer, the answerer
   supports CCM "cop" messages.

   v=0
   o=bob 2808844564 2808844564 IN IP4 host.biloxi.example
   s=-
   c=IN IP4 host.biloxi.example
   t=0 0
   m=audio 52000 RTP/AVP 0
   b=AS:80
   a=rtpmap:0 PCMU/8000
   m=video 52100 RTP/AVPF 32
   b=AS:600
   a=rtpmap:32 MPV/90000
   a=rtcp-fb:32 ccm cop hor-size ver-size framerate bitrate \
   token-rate packet-size

             Figure 20: SDP answer (COP support indicated)

   Note that the answerer indicates support for more parameter types
   than the offerer.

   Below is another SDP Answer, also responding to the same offer above,
   where the answerer does not support "cop".

   v=0
   o=bob 2808844564 2808844564 IN IP4 host.biloxi.example
   s=-
   c=IN IP4 host.biloxi.example
   t=0 0
   m=audio 52000 RTP/AVP 0
   b=AS:80
   a=rtpmap:0 PCMU/8000
   m=video 52100 RTP/AVPF 32
   b=AS:600
   a=rtpmap:32 MPV/90000

           Figure 21: SDP answer (COP support not indicated)

11.2.
Dynamic Video Re-sizing

   In this example, two COP-enabled endpoints communicate in an audio/
   video session.  The receiving endpoint has a graphical user interface
   that can be dynamically changed by the user.  This user interaction
   includes the ability to change the size of the receiving video
   window, which is also indicated in the previous SDP example
   (Section 11.1).

   At some point during the established communication, a notification
   about the current video stream codec operation point is sent to the
   resizable window endpoint that receives the video stream.

   COPN {SSRC:123456, OPID:123, Version:5,
         bitrate(max):325000,
         token-bucket(exact):1000,
         framerate(exact):15,
         hor-size(exact):320,
         ver-size(exact):240}

                    Figure 22: COPN for QVGA 15 Hz

   Some time later the user of the resizable window endpoint reduces the
   size of the video window.  As a result of the resize operation, the
   video window can no longer make full use of the received video
   resolution, wasting bandwidth and decoder processing resources.  The
   resizable window endpoint thus decides to notify the video stream
   sender about the changed conditions by sending a request for a video
   stream of smaller size:

   COPR {SSRC:123456, OPID:123, Version:5,
         hor-size(target):243,
         ver-size(target):185}

                     Figure 23: COPR for 243x185

   The COPR refers to the previously received COPN with the same OPID
   and Version, and thus need only list the parameters that need to be
   changed.  The request could arguably also contain other parameters
   that are potentially affected by the spatial resolution, such as the
   bitrate, but those can be omitted since the media sender is not bound
   by the request and is allowed to make its own decisions based on it.
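
   The relation between a COPR and the COPN it refers to can be sketched
   as follows.  This is a non-normative illustration only: the
   dictionary representation and the function name effective_request
   are assumptions made for this example, and the sketch simply treats
   every parameter not listed in the COPR as carried over from the COPN.

```python
def effective_request(copn, copr):
    """Combine a COPR with the COPN it refers to (hypothetical helper).

    Parameters listed in the COPR replace the COPN constraints of the
    same name; all other parameters are taken from the COPN unchanged.
    """
    for key in ("ssrc", "opid", "version"):
        if copn[key] != copr[key]:
            raise ValueError("COPR does not refer to this COPN")
    merged = {name: list(cons) for name, cons in copn["params"].items()}
    for name, cons in copr["params"].items():
        merged[name] = list(cons)
    return merged


# The COPN of Figure 22 and the COPR of Figure 23 in this representation.
copn = {
    "ssrc": 123456, "opid": 123, "version": 5,
    "params": {
        "bitrate": [("max", 325000)],
        "token-bucket": [("exact", 1000)],
        "framerate": [("exact", 15)],
        "hor-size": [("exact", 320)],
        "ver-size": [("exact", 240)],
    },
}
copr = {
    "ssrc": 123456, "opid": 123, "version": 5,
    "params": {
        "hor-size": [("target", 243)],
        "ver-size": [("target", 185)],
    },
}

request = effective_request(copn, copr)
```

   Each parameter maps to a list of constraints so that a request can
   carry more than one constraint on the same parameter, as in the
   framerate range of Section 11.4.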

   The request sender has chosen to use target type values instead of
   exact values for the horizontal and vertical sizes, which can be
   interpreted as "anything sufficiently similar is acceptable".  The
   target values are in this example chosen to correspond exactly to the
   resized video display area.  Many video coding algorithms operate
   most efficiently when the image size is an integer multiple of the
   coding block size, and this way of expressing the request explicitly
   leaves room for the media sender to take such aspects into account.

   The media sender (COPR receiver) responds with the following:

   COPS {SSRC:123456, OPID:123, Version:5,
         Partial Success,
         One or more parameter values in the request were changed}

   COPN {SSRC:123456, OPID:123, Version:6,
         bitrate(max):240000,
         token-bucket(exact):1000,
         framerate(exact):15,
         hor-size(exact):240,
         ver-size(exact):176}

             Figure 24: COPS and COPN for partial success

   It can be noted that the updated COPN (version 6) indicates that the
   media sender has, in addition to reducing the video horizontal and
   vertical size, chosen to also reduce the bitrate.  This bitrate
   reduction was not in the request, but is a reasonable decision taken
   by the media sender.  It can also be seen that the horizontal and
   vertical sizes are not identical to those in the request, but are in
   fact adjusted to be integer multiples of 16, which is a local
   restriction of the fictitious video encoder in this example.  To
   handle the mismatch between the request and the resulting video
   stream, the video receiver can perform some local action such as
   automatic readjustment of the resized window, image scaling (possibly
   combined with cropping), or padding.

11.3.  Illegal Request

   In this example, the sent request is asking the media sender to go
   beyond what is negotiated in the SDP.
   The SDP Offer below indicates use of video with H.264 Constrained
   Baseline Profile at level 1.1.

   v=0
   o=alice 2893746526 2893746526 IN IP4 host.atlanta.example
   s=-
   c=IN IP4 host.atlanta.example
   t=0 0
   m=audio 49160 RTP/AVP 96
   b=AS:80
   a=rtpmap:96 G722/16000
   m=video 51920 RTP/AVPF 97
   b=AS:200
   a=rtpmap:97 H264/90000
   a=fmtp:97 profile-level-id=42e00b
   a=rtcp-fb:97 ccm cop framerate bitrate token-rate

              Figure 25: SDP offer with H.264 level 1.1

   Assuming this offer is accepted and that the answerer also supports
   COP, further assume that this COP message exchange occurs at some
   time during the established communication:

   Media Sender                    Media Receiver
   ------------                    --------------

   COPN {SSRC:9876, OPID:67,   ->
         Version:2,
         bitrate(exact):190000,
         token-bucket(exact):500,
         framerate(exact):10,
         hor-size(exact):320,
         ver-size(exact):240}

                               <-  COPR {SSRC:9876, OPID:67,
                                         Version:2,
                                         framerate(exact):10,
                                         hor-size(exact):352,
                                         ver-size(exact):288}

   COPS {SSRC:9876, OPID:67,   ->
         Version:2,
         Failure,
         Request violates capability limits}

          Figure 26: COP message exchange indicating failure

   The failure above is due to a combination of frame size and frame
   rate that exceeds H.264 level 1.1, which would thus exceed the limits
   established by SDP Offer/Answer.  The maximum permitted framerate for
   352x288 pixels (CIF) is 7.6 Hz for H.264 level 1.1, as defined in
   Annex A of [H264] (3000 macroblocks per second divided by the 396
   macroblocks of a CIF frame).

11.4.  Reference Response to Modification of Scalable Layer

   When scalable coding is used, each layer corresponds to a codec
   operation point.  A media receiver can thus target a request towards
   a single layer.
   Assume a video encoding with three framerate layers, announced in a
   (multiple operation point) notification as:

   COPN {SSRC:9876, OPID:67, Version:2, ID:2,
         bitrate(exact):190000,
         token-bucket(exact):500,
         framerate(exact):10,
         hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:9876, OPID:73, Version:1, ID:1,
         bitrate(exact):350000,
         token-bucket(exact):600,
         framerate(exact):30,
         hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:9876, OPID:95, Version:5, ID:0,
         bitrate(exact):400000,
         token-bucket(exact):800,
         framerate(exact):60,
         hor-size(exact):320,
         ver-size(exact):240}

          Figure 27: COPN indicating three framerate layers

   Assume further that the media receiver is not pleased with the low
   framerate of OPID 67, wanting to increase it from 10 Hz to 25-30 Hz.
   Note that the media receiver still wants to receive the other layers
   unchanged, not remove them, and thus has to explicitly indicate this
   by including them without parameters.

   COPR {SSRC:9876, OPID:67, Version:2,
         framerate(greater):25,
         framerate(less):30}

   COPR {SSRC:9876, OPID:73, Version:1}

   COPR {SSRC:9876, OPID:95, Version:5}

            Figure 28: COPR requesting to change one layer

   The media sender decides it cannot meet the request for OPID 67, but
   instead considers (an unmodified) OPID 73 (with ID 1) to be a
   sufficiently good match:

   COPS {SSRC:9876, OPID:67, Version:2,
         Partial Success,
         One or more parameter values in the request were changed,
         ID:1}

   (COPN for the other two OPIDs omitted here for brevity)

   COPN {SSRC:9876, OPID:73, Version:1, ID:1,
         bitrate(exact):350000,
         token-bucket(exact):600,
         framerate(exact):30,
         hor-size(exact):320,
         ver-size(exact):240}

   Figure 29: COPS and COPN with layer modification partial success

   The COPS indicates partial success and uses the ID number to refer to
   another OPID, describing the best compromise that can currently be
   used to meet the request.  COPS does not contain the referred OPID,
   but ID should be defined in a codec-specific way that makes it
   possible to identify the layer directly in the media stream.  If the
   corresponding OPID is needed, for example to attempt another request
   targeting it, it can be found by searching the active set of COPNs
   for matching ID values.

11.5.  Successful Request to Add Codec Operation Point

   In this example, the media receiver is receiving a non-scalable
   stream from a codec that can support scalability, and wishes to add a
   scalability layer.
   Assume the existing OPID from the media sender is announced as:

   COPN {SSRC:3492, OPID:4, Version:2,
         bitrate(exact):350000,
         token-bucket(exact):600,
         framerate(exact):30,
         hor-size(exact):320,
         ver-size(exact):240}

             Figure 30: COPN with single operation point

   The media receiver constructs a request for multiple streams by
   including multiple requests for different OPIDs.  Since the new
   stream does not exist, it has no OPID from the media sender and the
   receiver chooses a random value as reference and indicates that it is
   a new, temporary OPID.  The request for the new stream includes all
   parameters that the media receiver has an opinion on, and leaves the
   other parameters to be chosen by the media sender.  In this case it
   is a request for identical frame size and doubled framerate.

   COPR {SSRC:3492, OPID:4, Version:2}

   COPR {SSRC:3492, OPID:237, New, Version:0,
         framerate(exact):60,
         hor-size(exact):320,
         ver-size(exact):240}

          Figure 31: COPR requesting to add operation point

   The media sender decides it can start layered encoding with the
   requested parameters.  The status response to the new OPID contains a
   reference to an ID that is included as part of the matching,
   subsequent COPN.  Note that since both the original and the new
   streams are now part of a scalable set, they must both be identified
   with ID parameters to be able to distinguish between them.  The media
   sender has chosen an OPID for the new stream in the COPN, which need
   not be identical to the temporary one in the request, but the new
   stream can still be uniquely identified through the ID that is
   announced in both the COPS and COPN.

   Note that since the ID has a defined relation to the media sub-stream
   identification, decoding of that new sub-stream can start immediately
   after receiving the COPS.
   It may however not be possible to describe the new stream in COP
   parameter terms until the COPN is received (depending on COP
   parameter visibility directly in the media stream).

   COPS {SSRC:3492, OPID:4, Version:2,
         Success, Success,
         ID:1}

   COPS {SSRC:3492, OPID:237, New, Version:0,
         Success, Success,
         ID:0}

   COPN {SSRC:3492, OPID:4, Version:2, ID:1,
         bitrate(exact):350000,
         token-bucket(exact):600,
         framerate(exact):30,
         hor-size(exact):320,
         ver-size(exact):240}

   COPN {SSRC:3492, OPID:9, Version:0, ID:0,
         bitrate(exact):390000,
         token-bucket(exact):600,
         framerate(exact):60,
         hor-size(exact):320,
         ver-size(exact):240}

      Figure 32: COPS and COPN indicating operation point added

12.  IANA Considerations

   Following the guidelines in [RFC4566], in [RFC4585], and in
   [RFC3550], the IANA is requested to register:

   1.  The 'cop' tag to be used with ccm under the rtcp-fb AVPF
       attribute in SDP.

   2.  The FMT number TBA1 to be allocated to the COP feedback message
       from this specification.

   3.  A registry listing registered values for 'cop' message item type,
       with initial values from Table 1.

   4.  A registry listing registered values and tag names for 'cop'
       parameter type, with initial values from Table 4.

13.  Security Considerations

   This document extends the CCM [RFC5104] and defines new messages,
   i.e.  COPR, COPN and COPS.  The exchange of these new messages may
   have security implications that need to be addressed by the user.
   Some important implications are listed below:

   1.  Identity spoofing - An attacker can impersonate an authenticated
       user and can falsely control or indicate the codec parameters of
       any source transmission.  In order to prevent this type of
       attack, a strong authentication and integrity protection
       mechanism is needed.

   2.
       Denial of Service (DoS) - An attacker can falsely set codec
       parameters for all the source streams, which may result in Denial
       of Service (DoS).  An authentication protocol MAY be used to
       protect against this attack.

   3.  Man-in-the-Middle (MITM) attack - The codec configuration and
       notification of changes of the RTP source is prone to a
       man-in-the-middle attack.  Public key authentication MAY be used
       to prevent MITM attacks.

14.  Open Issues

   There is currently no defined way for a media receiver to indicate
   that it wants to release the restrictions it previously had on an
   operation point, if the media stream contains only a single operation
   point.

15.  Acknowledgements

   The authors would like to thank Prof. Dr.-Ing. Markus Kampmann at
   Fachhochschule Koblenz University of Applied Sciences and Prof.
   Dr.-Ing. Frank Hartung at Multimediatechnik, Audio- und Videotechnik
   at Fachhochschule Aachen for fruitful contributions and discussions
   during the initial stages of writing this specification.  The authors
   would also like to thank Christer Holmberg for feedback on the
   specification.

16.  References

16.1.  Normative References

   [H241]     ITU-T Recommendation H.241, "Extended video procedures and
              control signals for H.300 series terminals", May 2006.

   [H264]     ITU-T Recommendation H.264, "Advanced video coding for
              generic audiovisual services", March 2010.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S.
              Casner, "RTP Profile for Audio and Video Conferences with
              Minimal Control", STD 65, RFC 3551, July 2003.

   [RFC3890]  Westerlund, M., "A Transport Independent Bandwidth
              Modifier for the Session Description Protocol (SDP)",
              RFC 3890, September 2004.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              July 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184, May 2011.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.

   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
              Attributes in the Session Description Protocol (SDP)",
              RFC 6236, May 2011.

16.2.  Informative References

   [I-D.ietf-avtext-multiple-clock-rates]
              Petit-Huguenin, M., "Support for multiple clock rates in
              an RTP session", draft-ietf-avtext-multiple-clock-rates-02
              (work in progress), January 2012.

   [I-D.westerlund-avtext-rtp-stream-pause]
              Akram, A., Burman, B., Grondal, D., and M. Westerlund,
              "RTP Media Stream Pause and Resume",
              draft-westerlund-avtext-rtp-stream-pause-02 (work in
              progress), July 2012.

   [I-D.westerlund-mmusic-sdp-bw-attribute]
              Frankkila, T., Westerlund, M., and B.
              Burman, "Extensible Bandwidth Attribute for SDP",
              draft-westerlund-mmusic-sdp-bw-attribute-00 (work in
              progress), October 2011.

   [RFC2212]  Shenker, S., Partridge, C., and R. Guerin, "Specification
              of Guaranteed Quality of Service", RFC 2212,
              September 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3611]  Friedman, T., Caceres, R., and A. Clark, "RTP Control
              Protocol Extended Reports (RTCP XR)", RFC 3611,
              November 2003.

   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
              Conversation", RFC 4103, June 2005.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, August 2006.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5760]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control
              Protocol (RTCP) Extensions for Single-Source Multicast
              Sessions with Unicast Feedback", RFC 5760, February 2010.

   [RFC5968]  Ott, J. and C. Perkins, "Guidelines for Extending the RTP
              Control Protocol (RTCP)", RFC 5968, September 2010.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com

   Laurits Hamm
   Ericsson
   Ericsson Allee 1
   DE-52134 Herzogenrath
   Germany

   Phone: +49 2407 575 6779
   Email: laurits.hamm@ericsson.com