idnits 2.17.1 draft-ietf-avt-rtp-svc-27.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The abstract seems to contain references ([I-D.ietf-avt-rtp-RFC3984bis]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 1275 has weird spacing: '... yes yes...' == Line 1276 has weird spacing: '... no no ...' == Line 1277 has weird spacing: '... yes yes...' == Line 1279 has weird spacing: '... no no ...' == Line 1280 has weird spacing: '... no no ...' -- The document seems to contain a disclaimer for pre-RFC5378 work, and may have content which was first submitted before 10 November 2008. The disclaimer is necessary when there are original authors that you have been unable to contact, or if some do not wish to grant the BCP78 rights to the IETF Trust. If you are able to get all authors (current and original) to grant those rights, you can and should remove the disclaimer; otherwise, the disclaimer is needed and you can ignore this comment. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (February 1, 2011) is 4830 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 2550 -- Looks like a reference, but probably isn't: '3' on line 2550 -- Looks like a reference, but probably isn't: '8' on line 2550 -- Looks like a reference, but probably isn't: '4' on line 2550 -- Looks like a reference, but probably isn't: '2' on line 2550 -- Looks like a reference, but probably isn't: '6' on line 2550 -- Looks like a reference, but probably isn't: '5' on line 2550 -- Looks like a reference, but probably isn't: '7' on line 2550 -- Looks like a reference, but probably isn't: '12' on line 2550 -- Looks like a reference, but probably isn't: '10' on line 2550 ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838) ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) -- Obsolete informational reference (is this intentional?): RFC 2326 (Obsoleted by RFC 7826) -- Obsolete informational reference (is this intentional?): RFC 5117 (Obsoleted by RFC 7667) Summary: 3 errors (**), 0 flaws (~~), 6 warnings (==), 14 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Audio/Video Transport WG S. Wenger 2 Independent 3 Internet Draft Y.-K. Wang 4 Intended status: Standards track Huawei Technologies 5 Expires: August 2011 T. Schierl 6 Fraunhofer HHI 7 A. Eleftheriadis 8 Vidyo 9 February 1, 2011 11 RTP Payload Format for Scalable Video Coding 12 draft-ietf-avt-rtp-svc-27.txt 14 Status of this Memo 16 This Internet-Draft is submitted to IETF in full conformance with 17 the provisions of BCP 78 and BCP 79. 
19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six 25 months and may be updated, replaced, or obsoleted by other documents 26 at any time. It is inappropriate to use Internet-Drafts as 27 reference material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on August 1, 2011. 37 Copyright and License Notice 39 Copyright (c) 2011 IETF Trust and the persons identified as the 40 document authors. All rights reserved. 42 This document is subject to BCP 78 and the IETF Trust's Legal 43 Provisions Relating to IETF Documents 44 (http://trustee.ietf.org/license-info) in effect on the date of 45 publication of this document. Please review these documents 46 carefully, as they describe your rights and restrictions with 47 respect to this document. Code Components extracted from this 48 document must include Simplified BSD License text as described in 49 Section 4.e of the Trust Legal Provisions and are provided without 50 warranty as described in the Simplified BSD License. 52 This document may contain material from IETF Documents or IETF 53 Contributions published or made publicly available before November 54 10, 2008. The person(s) controlling the copyright in some of this 55 material may not have granted the IETF Trust the right to allow 56 modifications of such material outside the IETF Standards Process. 
57 Without obtaining an adequate license from the person(s) controlling
58 the copyright in such materials, this document may not be modified
59 outside the IETF Standards Process, and derivative works of it may
60 not be created outside the IETF Standards Process, except to format
61 it for publication as an RFC or to translate it into languages other
62 than English.

64 Abstract

66 This memo describes an RTP payload format for Scalable Video Coding
67 (SVC) as defined in Annex G of ITU-T Recommendation H.264, which is
68 technically identical to Amendment 3 of ISO/IEC International
69 Standard 14496-10. The RTP payload format allows for packetization
70 of one or more Network Abstraction Layer (NAL) units in each RTP
71 packet payload, as well as fragmentation of a NAL unit in multiple
72 RTP packets. Furthermore, it supports transmission of an SVC stream
73 over a single as well as multiple RTP sessions. The payload format
74 defines a new media subtype name "H264-SVC", but is still backwards
75 compatible to [I-D.ietf-avt-rtp-rfc3984bis] since the base layer,
76 when encapsulated in its own RTP stream, must use the H.264 media
77 subtype name ("H264") and the packetization method specified in
78 [I-D.ietf-avt-rtp-rfc3984bis]. The payload format has wide
79 applicability in videoconferencing, Internet video streaming, and
80 high bit-rate entertainment-quality video, among others.

82 Table of Contents

84 Status of this Memo
85 Abstract
86 Table of Contents
87 1. Introduction
88    1.1. The SVC Codec
89       1.1.1. Overview
90       1.1.2. Parameter Sets
91       1.1.3. NAL Unit Header
92    1.2. Overview of the Payload Format
93       1.2.1 Design Principles
94       1.2.2 Transmission Modes and Packetization Modes
95       1.2.3 New Payload Structures
96 2. Conventions
97 3. Definitions and Abbreviations
98    3.1 Definitions
99       3.1.1 Definitions from the SVC Specification
100       3.1.2 Definitions Specific to This Memo
101    3.2 Abbreviations
102 4. RTP Payload Format
103    4.1 RTP Header Usage
104    4.2 NAL Unit Extension and Header Usage
105       4.2.1 NAL Unit Extension
106       4.2.2 NAL Unit Header Usage
107    4.3 Payload Structures
108    4.4 Transmission Modes
109    4.5 Packetization Modes
110       4.5.1 Packetization Modes for Single-Session Transmission
111       4.5.2 Packetization Modes for Multi-Session Transmission
112    4.6 Single NAL Unit Packets
113    4.7 Aggregation Packets
114       4.7.1 Non-Interleaved Multi-Time Aggregation Packets
115             (NI-MTAPs)
116    4.8 Fragmentation Units (FUs)
117    4.9 Payload Content Scalability Information (PACSI) NAL Unit
118    4.10 Empty NAL unit
119    4.11 Decoding Order Number (DON)
120       4.11.1 Cross-Session DON (CS-DON) for Multi-Session
121              Transmission
122 5. Packetization Rules
123    5.1 Packetization Rules for Single-Session Transmission
124    5.2 Packetization Rules for Multi-Session Transmission
125       5.2.1 NI-T/NI-TC Packetization Rules
126       5.2.2 NI-C/NI-TC Packetization Rules
127       5.2.3 I-C Packetization Rules
128       5.2.4 Packetization Rules for Non-VCL NAL Units
129       5.2.5 Packetization Rules for Prefix NAL Units
130 6. De-Packetization Process
131    6.1 De-Packetization Process for Single-Session Transmission
132    6.2 De-Packetization Process for Multi-Session Transmission
133       6.2.1 Decoding Order Recovery for the NI-T and NI-TC Modes
134          6.2.1.1 Informative Algorithm for NI-T Decoding Order
135                  Recovery within an Access Unit
136       6.2.2 Decoding Order Recovery for the NI-C, NI-TC and I-C
137             Modes
138 7. Payload Format Parameters
139    7.1 Media Type Registration
140    7.2 SDP Parameters
141       7.2.1 Mapping of Payload Type Parameters to SDP
142       7.2.2 Usage with the SDP Offer/Answer Model
143       7.2.3 Dependency Signaling in Multi-Session Transmission
144       7.2.4 Usage in Declarative Session Descriptions
145    7.3 Examples
146       7.3.1 Example for Offering a Single SVC Session
147       7.3.2 Example for Offering a Single SVC Session using
148             scalable-layer-id
149       7.3.3 Example for Offering Multiple Sessions in MST
150       7.3.4 Example for Offering Multiple Sessions in MST including
151             operation with Answerer using scalable-layer-id
152       7.3.5 Example for Negotiating an SVC Stream with a Constrained
153             Base Layer in SST
154    7.4 Parameter Set Considerations
155 8. Security Considerations
156 9. Congestion Control
157 10. IANA Consideration
158 11. Informative Appendix: Application Examples
159    11.1 Introduction
160    11.2 Layered Multicast
161    11.3 Streaming
162    11.4 Videoconferencing (Unicast to MANE, Unicast to Endpoints)
163    11.5 Mobile TV (Multicast to MANE, Unicast to Endpoint)
164 12. Acknowledgements
165 13. References
166    13.1 Normative References
167    13.2 Informative References
168 14. Authors' Addresses

170 1. Introduction

172 This memo specifies an RTP [RFC3550] payload format for the Scalable
173 Video Coding (SVC) extension of the H.264/AVC video coding standard.
174 SVC is specified in Amendment 3 to ISO/IEC 14496 Part 10
175 [ISO/IEC 14496-10], and equivalently in Annex G of ITU-T Rec. H.264
176 [H.264]. In this memo, unless explicitly stated otherwise,
177 "H.264/AVC" refers to the specification of [H.264] excluding Annex G.

179 SVC covers the entire application range of H.264/AVC, from low
180 bitrate mobile applications, to High-Definition Television (HDTV)
181 broadcasting, and even Digital Cinema that requires nearly lossless
182 coding and hundreds of megabits per second. The scalability
183 features that SVC adds to H.264/AVC enable several system-level
184 functionalities related to the ability of a system to adapt the
185 signal to different system conditions with no or minimal processing.
186 The adaptation relates both to the capabilities of potentially
187 heterogeneous receivers (differing in screen resolution, processing
188 speed, etc.), as well as differing or time-varying network
189 conditions. The adaptation can be performed at the source, the
190 destination, or in intermediate media-aware network elements (MANEs).
191 The payload format specified in this memo exposes these system-level
192 functionalities so that system designers can take direct advantage
193 of these features.

195 Informative note: Since SVC streams contain, by design, a
196 sub-stream that is compliant with H.264/AVC, it is trivial for a
197 MANE to filter the stream so that all SVC-specific information
198 is removed.
This memo, in fact, defines a media type parameter 199 ("sprop-avc-ready", Section 7.2) that indicates whether or not 200 the stream can be converted to one compliant to [I-D.ietf-avt- 201 rtp-rfc3984bis] by eliminating RTP packets, and rewriting RTCP 202 to match the changes to the RTP packet stream as specified in 203 Section 7 of [RFC3550]. 205 This memo defines two basic modes for transmission of SVC data, 206 single session transmission (SST) and multi-session transmission 207 (MST). In SST, a single RTP session is used for the transmission of 208 all scalability layers comprising an SVC bitstream, whereas in MST 209 the scalability layers are transported on different RTP sessions. 210 In SST, packetization is a straightforward extension of [I-D.ietf- 211 avt-rtp-rfc3984bis]. For MST four different modes are defined in 212 this memo. They differ on whether or not they allow interleaving, 213 i.e., transmitting Network Abstraction Layer (NAL) units in an order 214 different than the decoding order, and by the technique used to 215 effect inter-session NAL unit decoding order recovery. Decoding 216 order recovery is performed using either inter-session timestamp 217 alignment [RFC3550] or Cross-Session Decoding Order Numbers (CS-DON). 218 One of the MST modes supports both decoding order recovery 219 techniques, so that receivers can select their preferred technique. 220 More details can be found in Section 1.2.2. 222 This memo further defines three new NAL unit types. The first type 223 is the Payload Content Scalability Information (PACSI) NAL unit, 224 which is used to provide an informative summary of the scalability 225 information of the data contained in an RTP packet, as well as 226 ancillary data (e.g., CS-DON values). The second and third new NAL 227 unit types are the Empty NAL unit and the Non-Interleaved Multi-time 228 Aggregation Packet (NI-MTAP) NAL unit. 
The Empty NAL unit is used to 229 ensure inter-session timestamp alignment required for decoding order 230 recovery in MST. The NI-MTAP is used as a new payload structure 231 allowing the grouping of NAL units of different time instances in 232 decoding order. More details about the new packet structures can be 233 found in Section 1.2.3. 235 This memo also defines the signaling support for SVC transport over 236 RTP, including a new media subtype name (H264-SVC). 238 A non-normative overview of the SVC codec and the payload is given 239 in the remainder of this section. 241 1.1. The SVC Codec 243 1.1.1. Overview 245 SVC defines a coded video representation in which a given bitstream 246 offers representations of the source material at different levels of 247 fidelity (hence the term "scalable"). Scalable video coding 248 bitstreams, or scalable bitstreams, are constructed in a pyramidal 249 fashion: the coding process creates bitstream components that 250 improve the fidelity of hierarchically lower components. 252 The fidelity dimensions offered by SVC are spatial (picture size), 253 quality (or Signal-to-Noise Ratio - SNR), as well as temporal 254 (pictures per second). Bitstream components associated with a given 255 level of spatial, quality, and temporal fidelity are identified 256 using corresponding parameters in the bitstream: dependency_id, 257 quality_id, and temporal_id (see also Section 1.1.3). The fidelity 258 identifiers have integer values, where higher values designate 259 components that are higher in the hierarchy. It is noted that SVC 260 offers significant flexibility in terms of how an encoder may choose 261 to structure the dependencies between the various components. 262 Decoding of a particular component requires the availability of all 263 the components it depends upon, either directly, or indirectly. 
An 264 operation point of an SVC bitstream consists of the bitstream 265 components required to be able to decode a particular dependency_id, 266 quality_id, and temporal_id combination. 268 The term "layer" is used in various contexts in this memo. For 269 example, in the terms "Video Coding Layer" and "Network Abstraction 270 Layer" it refers to conceptual organization levels. When referring 271 to bitstream syntax elements such as block layer or macroblock layer, 272 it refers to hierarchical bitstream structure levels. When used in 273 the context of bitstream scalability, e.g., "AVC base layer", it 274 refers to a level of representation fidelity of the source signal 275 with a specific set of NAL units included. The correct 276 interpretation is supported by providing the appropriate context. 278 SVC maintains the bitstream organization introduced in H.264/AVC. 279 Specifically, all bitstream components are encapsulated in Network 280 Abstraction Layer (NAL) units which are organized as Access Units 281 (AU). An AU is associated with a single sampling instance in time. 282 A subset of the NAL unit types correspond to the Video Coding Layer 283 (VCL), and contain the coded picture data associated with the source 284 content. Non-VCL NAL units carry ancillary data that may be 285 necessary for decoding (e.g., parameter sets as explained below), or 286 that facilitate certain system operations but are not needed by the 287 decoding process itself. Coded picture data at the various fidelity 288 dimensions are organized in slices. Within one AU, a coded picture 289 of an operation point consists of all the coded slices required for 290 decoding up to the particular combination of dependency_id and 291 quality_id values at the time instance corresponding to the AU. 293 It is noted that the concept of temporal scalability is already 294 present in H.264/AVC, as profiles defined in Annex A of [H.264] 295 already support it. 
Specifically, in H.264/AVC the concept of sub- 296 sequences has been introduced to allow optional use of temporal 297 layers through Supplemental Enhancement Information (SEI) messages. 298 SVC extends this approach by exposing the temporal scalability 299 information using the temporal_id parameter, alongside (and unified 300 with) the dependency_id and quality_id values that are used for 301 spatial and quality scalability, respectively. For coded picture 302 data defined in Annex G of [H.264] this is accomplished by using a 303 new type of NAL unit, namely coded slice in scalable extension NAL 304 unit (type 20), where the fidelity parameters are part of its header. 305 For coded picture data that follow H.264/AVC, and to ensure 306 compatibility with existing H.264/AVC decoders, another new type of 307 NAL unit, namely prefix NAL unit (type 14), has been defined to 308 carry this header information. SVC additionally specifies a third 309 new type of NAL unit, namely subset sequence parameter set NAL unit 310 (type 15), to contain sequence parameter set information for quality 311 and spatial enhancement layers. All these three newly specified NAL 312 unit types (14, 15 and 20) are among those reserved in H.264/AVC, 313 and are to be ignored by decoders conforming to one or more of the 314 profiles specified in Annex A of [H.264]. 316 Within an AU, the VCL NAL units associated with a given 317 dependency_id and quality_id are referred to as a "layer 318 representation". The layer representation corresponding to the 319 lowest values of dependency_id and quality_id (i.e., zero for both) 320 is compliant by design to H.264/AVC. The set of VCL and associated 321 non-VCL NAL units across all AUs in a bitstream associated with a 322 particular combination of values of dependency_id and quality_id, 323 and regardless of the value of temporal_id, is conceptually a 324 scalable layer. 
For backwards compatibility with H.264/AVC, it is
325 important, however, to differentiate whether or not SVC-specific NAL
326 units are present in a given bitstream. This is particularly
327 important for the lowest fidelity values in terms of dependency_id
328 and quality_id (zero for both), as the corresponding VCL data are
329 compliant to H.264/AVC, and may or may not be accompanied by
330 associated prefix NAL units. This memo therefore uses the term "AVC
331 base layer" to designate the layer that does not contain
332 SVC-specific NAL units, and "SVC base layer" to designate the same layer
333 but with the addition of the associated SVC prefix NAL units. Note
334 that the SVC specification uses the term "base layer" for what in
335 this memo will be referred to as "AVC base layer". Similarly, it is
336 also important to be able to differentiate, within a layer, the
337 temporal fidelity components it contains. This memo uses the term
338 "T0" to indicate, within a particular layer, the subset that
339 contains the NAL units associated with temporal_id equal to 0.

341 SNR scalability in SVC is offered in two different ways. In what is
342 called Coarse-Grained Scalability (CGS), scalability is provided by
343 including or excluding a complete layer when decoding a particular
344 bitstream. In contrast, in Medium-Grained Scalability (MGS),
345 scalability is provided by selectively omitting the decoding of
346 specific NAL units belonging to MGS layers. The selection of the
347 NAL units to omit can be based on fixed length fields present in the
348 NAL unit header (see also Sections 1.1.3 and 4.2).

350 1.1.2. Parameter Sets

352 SVC maintains the parameter set concept of H.264/AVC and introduces
353 a new type of sequence parameter set, referred to as subset sequence
354 parameter set [H.264]. Subset sequence parameter sets have NAL unit
355 type equal to 15, which is different from the NAL unit type value (7)
356 of sequence parameter sets.
VCL NAL units of NAL unit type 1 to 5
357 must only (indirectly) refer to sequence parameter sets, while VCL
358 NAL units of NAL unit type 20 must only (indirectly) refer to subset
359 sequence parameter sets. The references are indirect because VCL
360 NAL units refer to picture parameter sets (in their slice header),
361 which in turn refer to regular or subset sequence parameter sets.
362 Subset sequence parameter sets use an identifier value space that
363 is separate from that of sequence parameter sets.

365 In SVC, coded picture data from different layers may use the same or
366 different sequence and picture parameter sets. Let the variable
367 DQId be equal to dependency_id * 16 + quality_id. At any time
368 instant during the decoding process there is one active sequence
369 parameter set for the layer representation with the highest value of
370 DQId and one or more active layer SVC sequence parameter set(s) for
371 layer representations with lower values of DQId. The active
372 sequence parameter set or an active layer SVC sequence parameter set
373 remains unchanged throughout a coded video sequence in the scalable
374 layer in which the active sequence parameter set or active layer SVC
375 sequence parameter set is referred to. This means that the referred
376 sequence parameter set or subset sequence parameter set can only
377 change at IDR access units for any layer. At any time instant
378 during the decoding process there may be one active picture
379 parameter set (for the layer representation with the highest value
380 of DQId) and one or more active layer picture parameter set(s) (for
381 layer representations with lower values of DQId). The active
382 picture parameter set or an active layer picture parameter set
383 remains unchanged throughout a layer representation in which the
384 active picture parameter set or active layer picture parameter set
385 is referred to, but may change from one AU to the next.

387 1.1.3. NAL Unit Header

389 SVC extends the one-byte H.264/AVC NAL unit header by three
390 additional octets for NAL units of type 14 and 20. The header
391 indicates the type of the NAL unit, the (potential) presence of bit
392 errors or syntax violations in the NAL unit payload, information
393 regarding the relative importance of the NAL unit for the decoding
394 process, the layer identification information, and other fields as
395 discussed below.

397 The syntax and semantics of the NAL unit header are specified in
398 [H.264], but the essential properties of the NAL unit header are
399 summarized below for convenience.

401 The first byte of the NAL unit header has the following format (the
402 bit fields are the same as defined for the one-byte H.264/AVC NAL
403 unit header, while the semantics of some fields have changed
404 slightly, in a backwards compatible way):

406    +---------------+
407    |0|1|2|3|4|5|6|7|
408    +-+-+-+-+-+-+-+-+
409    |F|NRI|  Type   |
410    +---------------+

412 The semantics of the components of the NAL unit type octet, as
413 specified in [H.264], are described briefly below. In addition to
414 the name and size of each field, the corresponding syntax element
415 name in [H.264] is also provided.

417 F: 1 bit
418    forbidden_zero_bit. H.264/AVC declares a value of 1 as a syntax
419    violation.

421 NRI: 2 bits
422    nal_ref_idc. A value of "00" (in binary form) indicates that the
423    content of the NAL unit is not used to reconstruct reference
424    pictures for future prediction. Such NAL units can be discarded
425    without risking the integrity of the reference pictures in the
426    same layer. A value greater than "00" indicates that the
427    decoding of the NAL unit is required to maintain the integrity of
428    reference pictures in the same layer, or that the NAL unit
429    contains parameter sets.

431 Type: 5 bits
432    nal_unit_type. This component specifies the NAL unit type as
433    defined in Table 7-1 of [H.264], and later within this memo.
   For a reference of all currently defined NAL unit types and their
435    semantics, please refer to Section 7.4.1 in [H.264].

437    In H.264/AVC, NAL unit types 14, 15 and 20 are reserved for
438    future extensions. SVC uses these three NAL unit types as
439    follows: NAL unit type 14 is used for prefix NAL unit, NAL unit
440    type 15 is used for subset sequence parameter set, and NAL unit
441    type 20 is used for coded slice in scalable extension (see
442    Section 7.4.1 in [H.264]). NAL unit types 14 and 20 indicate the
443    presence of three additional octets in the NAL unit header, as
444    shown below.

446    +---------------+---------------+---------------+
447    |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
448    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
449    |R|I|   PRID    |N| DID |  QID  | TID |U|D|O| RR|
450    +---------------+---------------+---------------+

452 R: 1 bit
453    reserved_one_bit. Reserved bit for future extension. R must be
454    equal to 1. The value of R must be ignored by decoders.

456 I: 1 bit
457    idr_flag. This component specifies whether the layer
458    representation is an instantaneous decoding refresh (IDR) layer
459    representation (when equal to 1) or not (when equal to 0).

461 PRID: 6 bits
462    priority_id. This component specifies a priority identifier for
463    the NAL unit. A lower value of PRID indicates a higher priority.

465 N: 1 bit
466    no_inter_layer_pred_flag. This flag specifies, when present in a
467    coded slice NAL unit, whether inter-layer prediction may be used
468    for decoding the coded slice (when equal to 0) or not (when equal
469    to 1).

471 DID: 3 bits
472    dependency_id. This component indicates the inter-layer coding
473    dependency level of a layer representation.
   At any access unit,
474    a layer representation with a given dependency_id may be used for
475    inter-layer prediction for coding of a layer representation with
476    a higher dependency_id, while a layer representation with a given
477    dependency_id shall not be used for inter-layer prediction for
478    coding of a layer representation with a lower dependency_id.

480 QID: 4 bits
481    quality_id. This component indicates the quality level of an MGS
482    layer representation. At any access unit and for identical
483    dependency_id values, a layer representation with quality_id
484    equal to ql uses a layer representation with quality_id equal to
485    ql-1 for inter-layer prediction.

487 TID: 3 bits
488    temporal_id. This component indicates the temporal level of a
489    layer representation. The temporal_id is associated with the
490    frame rate, with lower values of temporal_id corresponding to
491    lower frame rates. A layer representation at a given temporal_id
492    typically depends on layer representations with lower temporal_id
493    values, but it never depends on layer representations with higher
494    temporal_id values.

496 U: 1 bit
497    use_ref_base_pic_flag. A value of 1 indicates that only
498    reference base pictures are used during the inter prediction
499    process. A value of 0 indicates that the reference base pictures
500    are not used during the inter prediction process.

502 D: 1 bit
503    discardable_flag. A value of 1 indicates that the current NAL
504    unit is not used for decoding NAL units with values of
505    dependency_id higher than the one of the current NAL unit, in the
506    current and all subsequent access units. Such NAL units can be
507    discarded without risking the integrity of layers with higher
508    dependency_id values. discardable_flag equal to 0 indicates that
509    the decoding of the NAL unit is required to maintain the
510    integrity of layers with higher dependency_id.
512 O: 1 bit 513 output_flag: Affects the decoded picture output process as 514 defined in Annex C of [H.264]. 516 RR: 2 bits 517 reserved_three_2bits. Reserved bits for future extension. RR 518 MUST be equal to "11" (in binary form). The value of RR must be 519 ignored by decoders. 521 This memo extends the semantics of F, NRI, I, PRID, DID, QID, TID, U, 522 and D per Annex G of [H.264] as described in Section 4.2. 524 1.2. Overview of the Payload Format 526 Similar to [I-D.ietf-avt-rtp-rfc3984bis], this payload format can 527 only be used to carry the raw NAL unit stream over RTP and not the 528 byte stream format specified in Annex B of [H.264]. 530 The design principles, transmission modes, packetization modes as 531 well as new payload structures are summarized in this section. It 532 is assumed that the reader is familiar with the terminology and 533 concepts defined in [I-D.ietf-avt-rtp-rfc3984bis]. 535 1.2.1 Design Principles 537 The following design principles have been observed for this payload 538 format: 540 o Backward compatibility with [I-D.ietf-avt-rtp-rfc3984bis] 541 wherever possible. 543 o The SVC base layer or any H.264/AVC compatible subset of the SVC 544 base layer, when transmitted in its own RTP stream, must be 545 encapsulated using [I-D.ietf-avt-rtp-rfc3984bis]. This ensures 546 that such an RTP stream can be understood by [I-D.ietf-avt-rtp- 547 rfc3984bis] receivers. 549 o Media-Aware Network Elements (MANEs) as defined in [I-D.ietf-avt- 550 rtp-rfc3984bis] are signaling-aware, rely on signaling 551 information, and have state. 553 o MANEs can aggregate multiple RTP streams, possibly from multiple 554 RTP sessions. 556 o MANEs can perform media-aware stream thinning (selective 557 elimination of packets or portions thereof). By using the 558 payload header information identifying layers within an RTP 559 session, MANEs are able to remove packets or portions thereof 560 from the incoming RTP packet stream. 
This implies rewriting the 561 RTP headers of the outgoing packet stream, and rewriting of RTCP 562 packets as specified in Section 7 of [RFC3550]. 564 1.2.2 Transmission Modes and Packetization Modes 566 This memo allows the packetization of SVC data for both single- 567 session transmission (SST) and multi-session transmission (MST). In 568 the case of SST, all SVC data are carried in a single RTP session. 569 In the case of MST, two or more RTP sessions are used to carry the 570 SVC data, in accordance with the MST-specific packetization modes 571 defined in this memo, which are based on the packetization modes 572 defined in [I-D.ietf-avt-rtp-rfc3984bis]. In MST, each RTP session 573 is associated with one RTP stream, which may carry one or more 574 layers. 576 The base layer is, by design, compatible with H.264/AVC. During 577 transmission, the associated prefix NAL units, which are introduced 578 by SVC and, when present, are ignored by H.264/AVC decoders, may be 579 encapsulated within the same RTP packet stream as the H.264/AVC VCL 580 NAL units, or in a different RTP packet stream (when MST is used). 581 For convenience, the term "AVC base layer" is used to refer to the 582 base layer without prefix NAL units, while the term "SVC base layer" 583 is used to refer to the base layer with prefix NAL units. 585 Furthermore, the base layer may have multiple temporal components 586 (i.e., supporting different frame rates). As a result, the lowest 587 temporal component ("T0") of the AVC or SVC base layer is used as 588 the starting point of the SVC bitstream hierarchy. 590 This memo allows encapsulating in a given RTP stream any of the 591 following three alternatives of layer combinations: 593 1. the T0 AVC base layer or the T0 SVC base layer only; 594 2. one or more enhancement layers only; 595 3. the T0 SVC base layer, and one or more enhancement layers.
597 SST should be used in point-to-point unicast applications and, in 598 general, whenever the potential benefit of using multiple RTP 599 sessions does not justify the added complexity. When SST is used the 600 layer combination cases 1 and 3 above can be used. When an 601 H.264/AVC compatible subset of the SVC base layer is transmitted 602 using SST, the packetization of [I-D.ietf-avt-rtp-rfc3984bis] must 603 be used, thus ensuring compatibility with [I-D.ietf-avt-rtp- 604 rfc3984bis] receivers. When, however, one or more SVC quality or 605 spatial enhancement layers are transmitted using SST, the 606 packetization defined in this memo must be used. In SST, any of the 607 three [I-D.ietf-avt-rtp-rfc3984bis] packetization modes, namely 608 Single NAL Unit Mode, Non-Interleaved Mode, and Interleaved Mode, 609 can be used. 611 MST should be used in a multicast session when different receivers 612 may request different layers of the scalable bitstream. An 613 operation point for an SVC bit stream, as defined in this memo, 614 corresponds to a set of layers that together conform to one of the 615 profiles defined in Annex A or G of [H.264] and, when decoded, offer 616 a representation of the original video at a certain fidelity. The 617 number of streams used in MST should be at least equal to the number 618 of operation points that may be requested by the receivers. 619 Depending on the application, this may result in each layer being 620 carried in its own RTP session, or in having multiple layers 621 encapsulated within one RTP session. 623 Informative note: Layered multicast is a term commonly used to 624 describe the application where multicast is used to transmit 625 layered or scalable data that has been encapsulated into more 626 than one RTP session. This application allows different 627 receivers in the multicast session to receive different 628 operation points of the scalable bitstream. 
Layered multicast, 629 among other application examples, is discussed in more detail 630 in Section 11.2. 632 When MST is used, any of the three layer combinations above can be 633 used for each of the sessions. When an H.264/AVC compatible subset 634 of the SVC base layer is transmitted in its own session in MST, the 635 packetization of [I-D.ietf-avt-rtp-rfc3984bis] must be used, such 636 that [I-D.ietf-avt-rtp-rfc3984bis] receivers can be part of the MST 637 and receive only this session. For MST, this memo defines four 638 different MST specific packetization modes, namely Non-Interleaved 639 Timestamp based Mode (NI-T), Non-Interleaved Cross-Layer Decoding 640 Order Number (CS-DON) based Mode (NI-C), Non-Interleaved Combined 641 Timestamp and CS-DON Mode (NI-TC), and Interleaved CS-DON based Mode 642 (I-C) (detailed in Section 4.5.2). The modes differ depending on 643 whether the SVC data are allowed to be interleaved, i.e., to be 644 transmitted in an order different than the intended decoding order, 645 and they also differ in the mechanisms provided in order to recover 646 the correct decoding order of the NAL units across the multiple RTP 647 sessions. These four MST modes re-use the packetization modes 648 introduced in [I-D.ietf-avt-rtp-rfc3984bis] for the packetization of 649 NAL units in each of their individual RTP sessions. 651 As the names of the MST packetization modes imply, the NI-T, NI-C 652 and NI-TC modes do not allow interleaved transmission, while the I-C 653 mode allows interleaved transmission. With any of the three non- 654 interleaved MST packetization modes, legacy [I-D.ietf-avt-rtp- 655 rfc3984bis] receivers with implementation of the Non-Interleaved 656 Mode specified in [I-D.ietf-avt-rtp-rfc3984bis] can join a multi- 657 session transmission of SVC, to receive the base RTP session 658 encapsulated according to [I-D.ietf-avt-rtp-rfc3984bis]. 
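The four MST packetization modes named above can be summarized in a small lookup table. The following is only an illustrative sketch: the class name MstMode, its field names, and the helper legacy_receiver_can_join are hypothetical and not part of this memo; the timestamp and CS-DON flags are read off the mode names given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MstMode:
    """Hypothetical summary record for one MST packetization mode."""
    name: str
    interleaved: bool      # may NAL units be sent out of decoding order?
    uses_timestamps: bool  # decoding order recovery relies on timestamps
    uses_cs_don: bool      # decoding order recovery relies on CS-DON


# Properties taken from the mode names in Section 1.2.2.
MST_MODES = {
    "NI-T": MstMode("NI-T", interleaved=False, uses_timestamps=True, uses_cs_don=False),
    "NI-C": MstMode("NI-C", interleaved=False, uses_timestamps=False, uses_cs_don=True),
    "NI-TC": MstMode("NI-TC", interleaved=False, uses_timestamps=True, uses_cs_don=True),
    "I-C": MstMode("I-C", interleaved=True, uses_timestamps=False, uses_cs_don=True),
}


def legacy_receiver_can_join(mode_name: str) -> bool:
    """True if a legacy Non-Interleaved Mode receiver can take the base session."""
    return not MST_MODES[mode_name].interleaved
```

For example, legacy_receiver_can_join("NI-TC") is true, while the same call for "I-C" is false, mirroring the statement that only the three non-interleaved modes admit legacy receivers.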
660 1.2.3 New Payload Structures 662 [I-D.ietf-avt-rtp-rfc3984bis] specifies three basic payload 663 structures, namely Single NAL Unit Packet, Aggregation Packet, and 664 Fragmentation Unit. Depending on the basic payload structure, an 665 RTP packet may contain a NAL unit not aggregating other NAL units, 666 one or more NAL units aggregated in another NAL unit, or a fragment 667 of a NAL unit not aggregating other NAL units. Each NAL unit of a 668 type specified in [H.264] (i.e., 1 to 23, inclusive) may be carried 669 in its entirety in a single NAL unit packet, may be aggregated in an 670 aggregation packet, or may be fragmented and carried in a number of 671 fragmentation unit packets. To enable aggregation or fragmentation 672 of NAL units while still ensuring that the RTP packet payload is 673 only comprised of NAL units, [I-D.ietf-avt-rtp-rfc3984bis] 674 introduced six new NAL unit types (24-29) to be used as payload 675 structures, selected from the NAL unit types left unspecified in 676 [H.264]. 678 This memo reuses all the payload structures used in [I-D.ietf-avt- 679 rtp-rfc3984bis]. Furthermore, three new types of NAL units are 680 defined: namely Payload Content Scalability Information (PACSI) NAL 681 unit, Empty NAL unit, and Non-Interleaved Multi-Time Aggregation 682 Packet (NI-MTAP) (specified in Sections 4.9, 4.10, and 4.7.1, 683 respectively). 685 PACSI NAL units may be used for the following purposes: 687 o To enable MANEs to decide whether to forward, process or discard 688 aggregation packets, by checking in PACSI NAL units the 689 scalability information and other characteristics of the 690 aggregated NAL units, rather than looking into the aggregated NAL 691 units themselves, which are defined by the video coding 692 specification. 694 o To enable correct decoding order recovery in MST using the NI-C 695 or NI-TC mode, with the help of the CS-DON information included in 696 PACSI NAL units. 698 o To improve resilience to packet losses, e.g. 
by utilizing the 699 following data or information included in PACSI NAL units: 700 repeated Supplemental Enhancement Information (SEI) messages, 701 information regarding the start and end of layer representations, 702 and the indices to layer representations of the lowest temporal 703 subset. 705 Empty NAL units may be used to enable correct decoding order 706 recovery in MST using the NI-T or NI-TC mode. NI-MTAP NAL units may 707 be used to aggregate NAL units from multiple access units but 708 without interleaving. 710 2. Conventions 712 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 713 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 714 document are to be interpreted as described in BCP 14, RFC 2119 715 [RFC2119]. 717 This specification uses the notion of setting and clearing a bit 718 when bit fields are handled. Setting a bit is the same as assigning 719 that bit the value of 1 (On). Clearing a bit is the same as 720 assigning that bit the value of 0 (Off). 722 3. Definitions and Abbreviations 724 3.1 Definitions 726 This document uses the terms and definitions of [H.264]. Section 727 3.1.1 lists relevant definitions copied from [H.264] for convenience. 729 When there is a discrepancy, the definitions in [H.264] take 730 precedence. Section 3.1.2 gives definitions specific to this memo. 731 Some of the definitions in Section 3.1.2 are also present in [I- 732 D.ietf-avt-rtp-rfc3984bis] and are copied here with slight adaptations 733 as needed. 735 3.1.1 Definitions from the SVC Specification 737 access unit: A set of NAL units always containing exactly one 738 primary coded picture. In addition to the primary coded picture, 739 an access unit may also contain one or more redundant coded 740 pictures, one auxiliary coded picture, or other NAL units not 741 containing slices or slice data partitions of a coded picture. 742 The decoding of an access unit always results in a decoded 743 picture.
745 base layer: A bitstream subset that contains all the NAL units 746 with the nal_unit_type syntax element equal to 1 or 5 of the 747 bitstream and does not contain any NAL unit with the 748 nal_unit_type syntax element equal to 14, 15, or 20 and conforms 749 to one or more of the profiles specified in Annex A of [H.264]. 751 base quality layer representation: The layer representation of 752 the target dependency representation of an access unit that is 753 associated with the quality_id syntax element equal to 0. 755 coded video sequence: A sequence of access units that consists, 756 in decoding order, of an IDR access unit followed by zero or more 757 non-IDR access units including all subsequent access units up to 758 but not including any subsequent IDR access unit. 760 dependency representation: A subset of Video Coding Layer (VCL) 761 NAL units within an access unit that are associated with the same 762 value of the dependency_id syntax element, which is provided as 763 part of the NAL unit header or by an associated prefix NAL unit. 764 A dependency representation consists of one or more layer 765 representations. 767 IDR access unit: An access unit in which the primary coded 768 picture is an IDR picture. 770 IDR picture: Instantaneous Decoding Refresh picture. A coded 771 picture in which all slices of the target dependency 772 representation within the access unit are I or EI slices that 773 causes the decoding process to mark all reference pictures as 774 "unused for reference" immediately after decoding the IDR picture. 775 After the decoding of an IDR picture all following coded pictures 776 in decoding order can be decoded without inter prediction from 777 any picture decoded prior to the IDR picture. The first picture 778 of each coded video sequence is an IDR picture. 
780 layer representation: A subset of VCL NAL units within an access 781 unit that are associated with the same values of the 782 dependency_id and quality_id syntax elements, which are provided 783 as part of the VCL NAL unit header or by an associated prefix NAL 784 unit. One or more layer representations represent a dependency 785 representation. 787 prefix NAL unit: A NAL unit with nal_unit_type equal to 14 that 788 immediately precedes in decoding order a NAL unit with 789 nal_unit_type equal to 1, 5, or 12. The NAL unit that 790 immediately succeeds in decoding order the prefix NAL unit is 791 referred to as the associated NAL unit. The prefix NAL unit 792 contains data associated with the associated NAL unit, which are 793 considered to be part of the associated NAL unit. 795 reference base picture: A reference picture that is obtained by 796 decoding a base quality layer representation with the nal_ref_idc 797 syntax element not equal to 0 and the store_ref_base_pic_flag 798 syntax element equal to 1 of an access unit and all layer 799 representations of the access unit that are referred to by inter- 800 layer prediction of the base quality layer representation. A 801 reference base picture is not an output of the decoding process, 802 but the samples of a reference base picture may be used for inter 803 prediction in the decoding process of subsequent pictures in 804 decoding order. Reference base picture is a collective term for 805 a reference base field or a reference base frame. 807 scalable bitstream: A bitstream with the property that one or 808 more bitstream subsets that are not identical to the scalable 809 bitstream form another bitstream that conforms to the SVC 810 specification [H.264]. 812 target dependency representation: The dependency representation 813 of an access unit that is associated with the largest value of 814 the dependency_id syntax element for all dependency 815 representations of the access unit.
817 target layer representation: The layer representation of the 818 target dependency representation of an access unit that is 819 associated with the largest value of the quality_id syntax 820 element for all layer representations of the target dependency 821 representation of the access unit. 823 3.1.2 Definitions Specific to This Memo 825 anchor layer representation: An anchor layer representation is 826 such a layer representation that, if decoding of the operation 827 point corresponding to the layer starts from the access unit 828 containing this layer representation, all the following layer 829 representations of the layer, in output order, can be correctly 830 decoded. The output order is defined in [H.264] as the order in 831 which decoded pictures are output from the decoded picture buffer 832 of the decoder. As H.264 does not specify the picture display 833 process, this more general term is used instead of display order. 834 An anchor layer representation is a random access point to the 835 layer the anchor layer representation belongs to. However, some 836 layer representations, succeeding an anchor layer representation 837 in decoding order but preceding the anchor layer representation 838 in output order, may refer to earlier layer representations for 839 inter prediction, and hence the decoding may be incorrect if 840 random access is performed at the anchor layer representation. 842 AVC base layer: The subset of the SVC base layer in which all 843 prefix NAL units (type 14) are removed. Note that this is 844 equivalent to the term "base layer" as defined in Annex G of 845 [H.264]. 847 base RTP session: When multi-session transmission is used, the 848 RTP session that carries the RTP stream containing the T0 AVC 849 base layer or the T0 SVC base layer, and zero or more enhancement 850 layers. This RTP session does not depend on any other RTP 851 session as indicated by mechanisms defined in Section 7.2.3. 
The 852 base RTP session may carry NAL units of NAL unit type equal to 14 853 and 15. 855 decoding order number (DON): A field in the payload structure or 856 a derived variable indicating NAL unit decoding order. Values of 857 DON are in the range of 0 to 65535, inclusive. After reaching 858 the maximum value, the value of DON wraps around to 0. Note that 859 this definition also exists in [I-D.ietf-avt-rtp-rfc3984bis] in 860 exactly the same form. 862 Empty NAL unit: A NAL unit with NAL unit type equal to 31 and 863 sub-type equal to 1. An Empty NAL unit consists of only the two- 864 byte NAL unit header with an empty payload. 866 enhancement RTP session: When multi-session transmission is used, 867 an RTP session that is not the base RTP session. An enhancement 868 RTP session typically contains an RTP stream that depends on at 869 least one other RTP session as indicated by mechanisms defined in 870 Section 7.2.3. A lower RTP session to an enhancement RTP session 871 is an RTP session which the enhancement RTP session depends on. 872 The lowest RTP session for a receiver is the RTP session that 873 does not depend on any other RTP session received by the receiver. 874 The highest RTP session for a receiver is the RTP session which 875 no other RTP session received by the receiver depends on. 877 cross-session decoding order number (CS-DON): A derived variable 878 indicating NAL unit decoding order number over all NAL units 879 within all the session-multiplexed RTP sessions that carry the 880 same SVC bitstream. 882 default level: The level indicated by the profile-level-id 883 parameter. In SDP Offer/Answer, the level is downgradable, i.e., 884 the answer may either use the default level or a lower level. 885 Note that this definition also exists in [I-D.ietf-avt-rtp- 886 rfc3984bis] in a slightly different form. 
888 default sub-profile: The subset of coding tools, which may be all 889 coding tools of one profile or the common subset of coding tools 890 of more than one profile, indicated by the profile-level-id 891 parameter. In SDP Offer/Answer, the default sub-profile must be 892 used in a symmetric manner, i.e., the answer must either use the 893 same sub-profile as the offer or reject the offer. Note that 894 this definition also exists in [I-D.ietf-avt-rtp-rfc3984bis] in a 895 slightly different form. 897 enhancement layer: A layer in which at least one of the values of 898 dependency_id or quality_id is higher than 0, or a layer in which 899 none of the NAL units is associated with the value of temporal_id 900 equal to 0. An operation point constructed using the maximum 901 temporal_id, dependency_id, and quality_id values associated with 902 an enhancement layer may or may not conform to one or more of the 903 profiles specified in Annex A of [H.264]. 905 H.264/AVC compatible: The property of a bitstream subset of 906 conforming to one or more of the profiles specified in Annex A of 907 [H.264]. 909 intra layer representation: A layer representation that contains 910 only slices that use intra prediction, and hence do not refer to 911 any earlier layer representation in decoding order in the same 912 layer. Note that in SVC intra prediction includes intra-layer 913 intra prediction as well as inter-layer intra prediction. 915 layer: A bitstream subset in which all NAL units of type 1, 5, 12, 916 14, or 20 have the same values of dependency_id and quality_id, 917 either directly through their NAL unit header (for NAL units of 918 type 14 or 20) or through association to a prefix (type 14) NAL 919 unit (for NAL unit types 1, 5, or 12). A layer may contain NAL 920 units associated with more than one value of temporal_id.
922 media aware network element (MANE): A network element, such as a 923 middlebox or application layer gateway that is capable of parsing 924 certain aspects of the RTP payload headers or the RTP payload and 925 reacting to their contents. Note that this definition also 926 exists in [I-D.ietf-avt-rtp-rfc3984bis] in exactly the same form. 928 Informative note: The concept of a MANE goes beyond normal 929 routers or gateways in that a MANE has to be aware of the 930 signaling (e.g., to learn about the payload type mappings of 931 the media streams), and in that it has to be trusted when 932 working with SRTP. The advantage of using MANEs is that they 933 allow packets to be dropped according to the needs of the 934 media coding. For example, if a MANE has to drop packets due 935 to congestion on a certain link, it can identify and remove 936 those packets whose elimination produces the least adverse 937 effect on the user experience. After dropping packets, MANEs 938 must rewrite RTCP packets to match the changes to the RTP 939 packet stream as specified in Section 7 of [RFC3550]. 941 multi-session transmission: The transmission mode in which the 942 SVC stream is transmitted over multiple RTP sessions. Dependency 943 between RTP sessions MUST be signaled according to Section 7.2.3 944 of this memo. 946 NAL unit decoding order: A NAL unit order that conforms to the 947 constraints on NAL unit order given in Section G.7.4.1.2 in 948 [H.264]. Note that this definition also exists in [I-D.ietf-avt- 949 rtp-rfc3984bis] in a slightly different form. 951 NALU-time: The value that the RTP timestamp would have if the NAL 952 unit would be transported in its own RTP packet. Note that this 953 definition also exists in [I-D.ietf-avt-rtp-rfc3984bis] in 954 exactly the same form. 956 operation point: An operation point is identified by a set of 957 values of temporal_id, dependency_id, and quality_id. 
A 958 bitstream corresponding to an operation point can be constructed 959 by removing all NAL units associated with a higher value of 960 dependency_id, and all NAL units associated with the same value 961 of dependency_id but higher values of quality_id or temporal_id. 962 An operation point bitstream conforms to at least one of the 963 profiles defined in Annex A or Annex G of [H.264], and offers a 964 representation of the original video signal at a certain fidelity. 966 Informative Note: Additional NAL units may be removed (with 967 lower dependency_id or same dependency_id but lower 968 quality_id) if they are not required for decoding the 969 bitstream at the particular operation point. The resulting 970 bitstream, however, may no longer conform to any of the 971 profiles defined in Annex A or G of [H.264]. 973 operation point representation: The set of all NAL units of an 974 operation point within the same access unit. 976 RTP packet stream: A sequence of RTP packets with increasing 977 sequence numbers (except for wrap-around), identical PT and 978 identical SSRC (Synchronization Source), carried in one RTP 979 session. Within the scope of this memo, one RTP packet stream is 980 utilized to transport one or more layers. 982 single-session transmission: The transmission mode in which the 983 SVC bitstream is transmitted over a single RTP session. 985 SVC base layer: The layer that includes all NAL units associated 986 with dependency_id and quality_id values both equal to 0, 987 including prefix NAL units (NAL unit type 14). 989 SVC enhancement layer: A layer in which at least one of the 990 values of dependency_id or quality_id is higher than 0. An 991 operation point constructed using the maximum dependency_id and 992 quality_id values and any temporal_id value associated with an 993 SVC enhancement layer does not conform to any of the profiles 994 specified in Annex A of [H.264]. 
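Informative illustration: The construction given in the definition of "operation point" above can be sketched as a simple filter. This is a minimal sketch that follows the wording of the definition literally; the NalInfo record and the function name extract_operation_point are hypothetical, and the sketch assumes that the dependency_id, quality_id, and temporal_id of every NAL unit have already been determined (directly or via a prefix NAL unit).

```python
from typing import Iterable, List, NamedTuple


class NalInfo(NamedTuple):
    """Hypothetical record holding the scalability identifiers of a NAL unit."""
    did: int  # dependency_id
    qid: int  # quality_id
    tid: int  # temporal_id


def extract_operation_point(nal_units: Iterable[NalInfo],
                            target_did: int,
                            target_qid: int,
                            target_tid: int) -> List[NalInfo]:
    """Keep the NAL units of the operation point (target_tid, target_did, target_qid)."""
    kept = []
    for nal in nal_units:
        # Remove all NAL units with a higher dependency_id.
        if nal.did > target_did:
            continue
        # Remove NAL units with the same dependency_id but a higher
        # quality_id or temporal_id.
        if nal.did == target_did and (nal.qid > target_qid or nal.tid > target_tid):
            continue
        kept.append(nal)
    return kept
```

The optional further removal described in the Informative Note of that definition (NAL units not required for decoding at the operation point) is deliberately not modeled here.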
996 SVC NAL unit: A NAL unit of NAL unit type 14, 15, or 20 as 997 specified in Annex G of [H.264]. 999 SVC NAL unit header: A four-byte header resulting from the 1000 addition of a three-byte SVC-specific header extension added in 1001 NAL unit types 14 and 20. 1003 SVC RTP session: Either the base RTP session or an enhancement 1004 RTP session. 1006 T0 AVC base layer: A subset of the AVC base layer constructed by 1007 removing all VCL NAL units associated with temporal_id values 1008 higher than 0 and non-VCL NAL units and SEI messages associated 1009 only with the VCL NAL units being removed. 1011 T0 SVC base layer: A subset of the SVC base layer constructed by 1012 removing all VCL NAL units associated with temporal_id values 1013 higher than 0 as well as prefix NAL units, non-VCL NAL units, and 1014 SEI messages associated only with the VCL NAL units being removed. 1016 transmission order: The order of packets in ascending RTP 1017 sequence number order (in modulo arithmetic). Within an 1018 aggregation packet, the NAL unit transmission order is the same 1019 as the order of appearance of NAL units in the packet. Note that 1020 this definition also exists in [I-D.ietf-avt-rtp-rfc3984bis] in 1021 exactly the same form. 1023 3.2 Abbreviations 1025 In addition to the abbreviations defined in [I-D.ietf-avt-rtp- 1026 rfc3984bis], the following abbreviations are used in this memo. 1028 CGS: Coarse-Grain Scalability 1029 CS-DON: Cross-Session Decoding Order Number 1030 MGS: Medium-Grain Scalability 1031 MST: Multi-Session Transmission 1032 PACSI: Payload Content Scalability Information 1033 SST: Single Session Transmission 1034 SNR: Signal-to-Noise Ratio 1035 SVC: Scalable Video Coding 1037 4. RTP Payload Format 1039 4.1 RTP Header Usage 1041 In addition to Section 5.1 of [I-D.ietf-avt-rtp-rfc3984bis] the 1042 following rules apply. 
1044 o Setting of the M bit 1046 The M bit of an RTP packet for which the packet payload is an NI- 1047 MTAP MUST be equal to 1 if the last NAL unit, in decoding order, of 1048 the access unit associated with the RTP timestamp is contained in 1049 the packet. 1051 o Setting of the RTP timestamp: 1053 For an RTP packet for which the packet payload is an Empty NAL unit, 1054 the RTP timestamp must be set according to Section 4.10. 1056 For an RTP packet for which the packet payload is a PACSI NAL unit, 1057 the RTP timestamp MUST be equal to the NALU-time of the next non- 1058 PACSI NAL unit in transmission order. Recall that the NALU-time of a 1059 NAL unit in an MTAP is defined in [I-D.ietf-avt-rtp-rfc3984bis] as 1060 the value that the RTP timestamp would have if that NAL unit would 1061 be transported in its own RTP packet. 1063 o Setting of the SSRC: 1065 For both SST and MST, the SSRC values MUST be set according to [RFC 1066 3550]. 1068 4.2 NAL Unit Extension and Header Usage 1070 4.2.1 NAL Unit Extension 1072 This memo specifies a NAL unit extension mechanism to allow for 1073 introduction of new types of NAL units, beyond the three NAL unit 1074 types left undefined in [I-D.ietf-avt-rtp-rfc3984bis] (i.e., 0, 30 1075 and 31). The extension mechanism utilizes the NAL unit type value 1076 31 and is specified as follows. When the NAL unit type value is 1077 equal to 31, the one-byte NAL unit header consisting of the F, NRI 1078 and Type fields as specified in Section 1.1.3 is extended by one 1079 additional octet, which consists of a 5-bit field named Subtype and 1080 three 1-bit fields named J, K, and L, respectively. The additional 1081 octet is shown in the following figure. 1083 +---------------+ 1084 |0|1|2|3|4|5|6|7| 1085 +-+-+-+-+-+-+-+-+ 1086 | Subtype |J|K|L| 1087 +---------------+ 1089 The Subtype value determines the (extended) NAL unit type of this 1090 NAL unit. The interpretation of the fields J, K, and L depends on 1091 the Subtype. 
The semantics of the fields are as follows. 1093 When Subtype is equal to 1, the NAL unit is an Empty NAL unit as 1094 specified in Section 4.10. When Subtype is equal to 2, the NAL unit 1095 is an NI-MTAP NAL unit as specified in Section 4.7.1. All other 1096 values of Subtype (0, 3-31) are reserved for future extensions, and 1097 receivers MUST ignore the entire NAL unit when Subtype is equal to 1098 any of these reserved values. 1100 4.2.2 NAL Unit Header Usage 1102 The structure and semantics of the NAL unit header according to the 1103 H.264 specification [H.264] were introduced in Section 1.1.3. This 1104 section specifies the extended semantics of the NAL unit header 1105 fields F, NRI, I, PRID, DID, QID, TID, U, and D, according to this 1106 memo. When the Type field is equal to 31, the semantics of the 1107 fields in the extension NAL unit header were specified in Section 1108 4.2.1. 1110 The semantics of F specified in Section 5.3 of [I-D.ietf-avt-rtp- 1111 rfc3984bis] also apply in this memo. That is, a value of 0 for F 1112 indicates that the NAL unit type octet and payload should not 1113 contain bit errors or other syntax violations, whereas a value of 1 1114 for F indicates that the NAL unit type octet and payload may contain 1115 bit errors or other syntax violations. MANEs SHOULD set the F bit to 1116 indicate bit errors in the NAL unit. 1118 For NRI, for a bitstream conforming to one of the profiles defined 1119 in Annex A of [H.264] and transported using [I-D.ietf-avt-rtp- 1120 rfc3984bis], the semantics specified in Section 5.3 of [I-D.ietf- 1121 avt-rtp-rfc3984bis] apply, i.e., NRI also indicates the relative 1122 importance of NAL units. For a bitstream conforming to one of the 1123 profiles defined in Annex G of [H.264] and transported using this 1124 memo, in addition to the semantics specified in Annex G of [H.264], 1125 NRI also indicates the relative importance of NAL units within a 1126 layer. 
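The NAL unit extension of Section 4.2.1 can be illustrated by a short parser for the two-octet header used when Type is equal to 31. This is a sketch only: the names ExtendedHeader, SUBTYPE_NAMES, and parse_extended_header are hypothetical, and the first octet is assumed to follow the usual H.264 layout of F (1 bit), NRI (2 bits), and Type (5 bits), with bit 0 being the most significant bit.

```python
from typing import NamedTuple, Optional


class ExtendedHeader(NamedTuple):
    """Hypothetical container for the two-octet header of a Type 31 NAL unit."""
    f: int
    nri: int
    subtype: int
    j: int
    k: int
    l: int
    kind: Optional[str]  # None means a reserved Subtype: ignore the NAL unit


# Subtype values assigned in Section 4.2.1; all others are reserved.
SUBTYPE_NAMES = {1: "Empty NAL unit", 2: "NI-MTAP"}


def parse_extended_header(payload: bytes) -> ExtendedHeader:
    if len(payload) < 2:
        raise ValueError("a Type 31 NAL unit needs at least two octets")
    first, second = payload[0], payload[1]
    if first & 0x1F != 31:
        raise ValueError("Type field is not 31")
    subtype = second >> 3  # upper five bits of the additional octet
    return ExtendedHeader(
        f=first >> 7,
        nri=(first >> 5) & 0x3,
        subtype=subtype,
        j=(second >> 2) & 1,
        k=(second >> 1) & 1,
        l=second & 1,
        kind=SUBTYPE_NAMES.get(subtype),
    )
```

A receiver built along these lines would drop the whole NAL unit whenever kind is None, which corresponds to the reserved Subtype values 0 and 3-31.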
1128 For I, in addition to the semantics specified in Annex G of [H.264], 1129 according to this memo, MANEs MAY use this information to protect 1130 NAL units with I equal to 1 better than NAL units with I equal to 0. 1131 MANEs MAY also utilize information of NAL units with I equal to 1 to 1132 decide when to forward more packets for an RTP packet stream. For 1133 example, when it is detected that spatial layer switching has 1134 happened such that the operation point has changed to a higher value 1135 of DID, MANEs MAY start to forward NAL units with the higher value 1136 of DID only after forwarding a NAL unit with I equal to 1 with the 1137 higher value of DID. 1139 Note that, in the context of this section, "protecting a NAL unit" 1140 means any RTP or network transport mechanism that could improve the 1141 probability of successful delivery of the packet conveying the NAL 1142 unit, including applying a QoS-enabled network, Forward Error 1143 Correction (FEC), retransmissions, and advanced scheduling behavior, 1144 whenever possible. 1146 For PRID, the semantics specified in Annex G of [H.264] apply. Note 1147 that MANEs implementing unequal error protection MAY use this 1148 information to protect NAL units with smaller PRID values better 1149 than those with larger PRID values, for example by including only 1150 the more important NAL units in an FEC protection mechanism. The 1151 importance for the decoding process decreases as the PRID value 1152 increases. 1154 For DID, QID, TID, in addition to the semantics specified in Annex G 1155 of [H.264], according to this memo, values of DID, QID, or TID 1156 indicate the relative importance in their respective dimension. A 1157 lower value of DID, QID, or TID indicates a higher importance if the 1158 other two components are identical. MANEs MAY use this information 1159 to protect more important NAL units better than less important NAL 1160 units. 
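The importance rule for DID, QID, and TID can be sketched as follows. The function names parse_svc_extension and more_important are hypothetical. The bit layout of the three-byte SVC header extension is assumed to be the one of Annex G of [H.264] (R, I, PRID in the first octet; N, DID, QID in the second; TID, U, D, O, RR in the third), which is consistent with the field widths listed in Section 1.1.3 but should be checked against that section before use.

```python
from typing import Dict, Optional


def parse_svc_extension(ext: bytes) -> Dict[str, int]:
    """Split the three-byte SVC header extension into its fields (assumed layout)."""
    if len(ext) < 3:
        raise ValueError("the SVC header extension is three octets long")
    b0, b1, b2 = ext[0], ext[1], ext[2]
    return {
        "I": (b0 >> 6) & 0x1,
        "PRID": b0 & 0x3F,
        "DID": (b1 >> 4) & 0x7,
        "QID": b1 & 0xF,
        "TID": b2 >> 5,
        "U": (b2 >> 4) & 0x1,
        "D": (b2 >> 3) & 0x1,
        "O": (b2 >> 2) & 0x1,
        "RR": b2 & 0x3,
    }


def more_important(a: Dict[str, int], b: Dict[str, int]) -> Optional[bool]:
    """Compare two NAL units along exactly one of DID, QID, TID.

    Returns True if a is more important than b, False if it is less
    important, and None when the rule does not apply (the units are
    equal in all three components or differ in more than one).
    """
    differing = [key for key in ("DID", "QID", "TID") if a[key] != b[key]]
    if len(differing) != 1:
        return None
    key = differing[0]
    return a[key] < b[key]  # a lower value indicates a higher importance
```

Returning None for units that differ in more than one component reflects that the text only ranks NAL units whose other two components are identical; how a MANE orders the remaining cases is left to the implementation.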
1162 For U, in addition to the semantics specified in Annex G of [H.264], 1163 according to this memo, MANEs MAY use this information to protect 1164 NAL units with U equal to 1 better than NAL units with U equal to 0. 1166 For D, in addition to the semantics specified in Annex G of [H.264], 1167 according to this memo, MANEs MAY use this information to determine 1168 whether a given NAL unit is required for successfully decoding a 1169 certain Operation Point of the SVC bitstream, hence to decide 1170 whether to forward the NAL unit. 1172 4.3 Payload Structures 1174 The NAL unit structure is central to H.264/AVC, [I-D.ietf-avt-rtp- 1175 rfc3984bis], as well as SVC and this memo. In H.264/AVC and SVC, 1176 all coded bits for representing a video signal are encapsulated in 1177 NAL units. In [I-D.ietf-avt-rtp-rfc3984bis], each RTP packet 1178 payload is structured as a NAL unit, which contains one or a part of 1179 one NAL unit specified in H.264/AVC, or aggregates one or more NAL 1180 units specified in H.264/AVC. 1182 [I-D.ietf-avt-rtp-rfc3984bis] specifies three basic payload 1183 structures (in Section 5.2 of [I-D.ietf-avt-rtp-rfc3984bis]): Single 1184 NAL Unit Packet, Aggregation Packet, and Fragmentation Unit, and six 1185 new types (24 to 29) of NAL units. The value of the Type field of 1186 the RTP packet payload header (i.e., the first byte of the payload) 1187 may be equal to any value from 1 to 23 for a Single NAL Unit Packet, 1188 any value from 24 to 27 for an Aggregation Packet, and 28 or 29 for 1189 a Fragmentation Unit. 1191 In addition to the NAL unit types defined originally for H.264/AVC, 1192 SVC defines three new NAL unit types specifically for SVC: coded 1193 slice in scalable extension NAL units (type 20), prefix NAL units 1194 (type 14), and subset sequence parameter set NAL units (type 15), as 1195 described in Section 1.1. 
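The Type ranges of the three basic payload structures of [I-D.ietf-avt-rtp-rfc3984bis] quoted above can be sketched as a small classifier. The function name rfc3984bis_payload_structure is hypothetical, and the sketch covers only the ranges stated for [I-D.ietf-avt-rtp-rfc3984bis]; Type values 0, 30, and 31, which that specification leaves undefined, yield None here.

```python
from typing import Optional


def rfc3984bis_payload_structure(first_payload_byte: int) -> Optional[str]:
    """Map the Type field of the payload header to a basic payload structure."""
    nal_type = first_payload_byte & 0x1F  # Type is the low five bits
    if 1 <= nal_type <= 23:
        return "Single NAL Unit Packet"
    if 24 <= nal_type <= 27:
        return "Aggregation Packet"
    if nal_type in (28, 29):
        return "Fragmentation Unit"
    return None  # 0, 30, and 31 are not assigned in rfc3984bis
```

Under this mapping the SVC-specific types 14, 15, and 20 fall in the 1-23 range and are therefore treated as Single NAL Unit Packets when carried directly as an RTP payload.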
1197 This memo further introduces three new types of NAL units, PACSI NAL 1198 unit (NAL unit type 30) as specified in Section 4.9, Empty NAL unit 1199 (type 31, subtype 1) as specified in Section 4.10, and NI-MTAP NAL 1200 unit (type 31, subtype 2) as specified in Section 4.7.1. 1202 The RTP packet payload structure in [I-D.ietf-avt-rtp-rfc3984bis] is 1203 maintained with slight extensions in this memo, as follows. Each 1204 RTP packet payload is still structured as a NAL unit, which contains 1205 one or a part of one NAL unit specified in H.264/AVC and SVC, or 1206 contains one PACSI NAL unit or one Empty NAL unit, or aggregates 1207 zero or more NAL units specified in H.264/AVC and SVC, zero or one 1208 PACSI NAL unit, and zero or more Empty NAL units. 1210 In this memo, one of the three basic payload structures, 1211 Fragmentation Unit, remains the same as in [I-D.ietf-avt-rtp- 1212 rfc3984bis], and the other two, Single NAL Unit Packet and 1213 Aggregation Packet, are extended as follows. The value of the Type 1214 field of the payload header may be equal to any value from 1 to 23, 1215 inclusive, and 30 to 31, inclusive, for a Single NAL Unit Packet, 1216 and any value from 24 to 27, inclusive, and 31, for an Aggregation 1217 Packet. When the Type field of the payload header is equal to 31 1218 and the Subtype field of the payload header is equal to 2, the 1219 packet is an Aggregation Packet (containing a NI-MTAP NAL unit). 1220 When the Type field of the payload header is equal to 31 and the 1221 Subtype field of the payload header is equal to 1, the packet is a 1222 Single NAL Unit Packet (containing an Empty NAL unit). 1224 Note that, in this memo, the length of the payload header varies 1225 depending on the value of the Type field in the first byte of the 1226 RTP packet payload. 
If the value is equal to 14, 20, or 30, the
1227   first four bytes of the packet payload form the payload header;
1228   otherwise if the value is equal to 31, the first two bytes of the
1229   payload form the payload header; otherwise, the payload header is
1230   the first byte of the packet payload.

1232   Table 1 lists the NAL unit types introduced in SVC and this memo and
1233   where they are described in this memo.  Table 2 summarizes the basic
1234   payload structure types for all NAL unit types when they are
1235   directly used as RTP packet payloads according to this memo.  Table
1236   3 summarizes the NAL unit types allowed to be aggregated (i.e., used
1237   as aggregation units in aggregation packets) or fragmented (i.e.,
1238   carried in fragmentation units) according to this memo.

1240      Table 1.  NAL unit types introduced in SVC and this memo

1242      Type  Subtype  NAL Unit Name                      Section Numbers
1243      -----------------------------------------------------------------
1244      14    -        Prefix NAL unit                    1.1
1245      15    -        Subset sequence parameter set      1.1
1246      20    -        Coded slice in scalable extension  1.1
1247      30    -        PACSI NAL unit                     4.9
1248      31    0        reserved                           4.2.1
1249      31    1        Empty NAL unit                     4.10
1250      31    2        NI-MTAP                            4.7.1
1251      31    3-31     reserved                           4.2.1

1253      Table 2.  Basic payload structure types for all NAL unit
1254      types when they are directly used as RTP packet payloads

1256      Type   Subtype  Basic Payload Structure
1257      ------------------------------------------
1258      0      -        reserved
1259      1-23   -        Single NAL Unit Packet
1260      24-27  -        Aggregation Packet
1261      28-29  -        Fragmentation Unit
1262      30     -        Single NAL Unit Packet
1263      31     0        reserved
1264      31     1        Single NAL Unit Packet
1265      31     2        Aggregation Packet
1266      31     3-31     reserved

1268      Table 3.
Summary of the NAL unit types allowed to be
1269      aggregated or fragmented (yes = allowed, no = disallowed,
1270      - = not applicable/not specified)

1272      Type   Subtype  STAP-A  STAP-B  MTAP16  MTAP24  FU-A  FU-B  NI-MTAP
1273      -------------------------------------------------------------------
1274      0      -        -       -       -       -       -     -     -
1275      1-23   -        yes     yes     yes     yes     yes   yes   yes
1276      24-29  -        no      no      no      no      no    no    no
1277      30     -        yes     yes     yes     yes     no    no    yes
1278      31     0        -       -       -       -       -     -     -
1279      31     1        yes     no      no      no      no    no    yes
1280      31     2        no      no      no      no      no    no    no
1281      31     3-31     -       -       -       -       -     -     -

1283   4.4 Transmission Modes

1285   This memo enables transmission of an SVC bitstream over one or more
1286   RTP sessions.  If only one RTP session is used for transmission of
1287   the SVC bitstream, the transmission mode is referred to as
1288   Single-Session Transmission (SST); otherwise (more than one RTP session is
1289   used for transmission of the SVC bitstream), the transmission mode
1290   is referred to as Multi-Session Transmission (MST).

1292   SST SHOULD be used for point-to-point unicast scenarios, while MST
1293   SHOULD be used for point-to-multipoint multicast scenarios where
1294   different receivers require different operation points of the same
1295   SVC bitstream, to improve bandwidth utilization efficiency.

1297   If the OPTIONAL mst-mode media type parameter (see Section 7.1) is
1298   not present, SST MUST be used; otherwise (mst-mode is present), MST
1299   MUST be used.

1301   4.5 Packetization Modes

1303   4.5.1 Packetization Modes for Single-Session Transmission

1305   When SST is in use, Section 5.4 of [I-D.ietf-avt-rtp-rfc3984bis]
1306   applies with the following extensions.

1308   The packetization modes specified in Section 5.4 of
1309   [I-D.ietf-avt-rtp-rfc3984bis], namely Single NAL Unit Mode, Non-Interleaved Mode
1310   and Interleaved Mode, are also referred to as session packetization
1311   modes.  Table 4 summarizes the allowed session packetization modes
1312   for SST.

1314      Table 4.
Summary of allowed session packetization modes
1315      (denoted as "Session Mode" for simplicity) for SST (yes =
1316      allowed, no = disallowed)

1318      Session Mode            Allowed
1319      -------------------------------------
1320      Single NAL Unit Mode    yes
1321      Non-Interleaved Mode    yes
1322      Interleaved Mode        yes

1324   For NAL unit types in the range of 0 to 29, inclusive, the NAL unit
1325   types allowed to be directly used as packet payloads for each
1326   session packetization mode are the same as specified in Section 5.4
1327   of [I-D.ietf-avt-rtp-rfc3984bis].  For other NAL unit types, which
1328   are newly introduced in this memo, the NAL unit types allowed to be
1329   directly used as packet payloads for each session packetization mode
1330   are summarized in Table 5.

1332      Table 5.  New NAL unit types allowed to be directly used
1333      as packet payloads for each session packetization mode
1334      (yes = allowed, no = disallowed, - = not applicable/not
1335      specified)

1337      Type  Subtype  Single NAL    Non-Interleaved  Interleaved
1338                     Unit Mode     Mode             Mode
1339      -------------------------------------------------------------
1340      30    -        yes           no               no
1341      31    0        -             -                -
1342      31    1        yes           yes              no
1343      31    2        no            yes              no
1344      31    3-31     -             -                -

1346   4.5.2 Packetization Modes for Multi-Session Transmission

1348   For MST, this memo specifies four MST packetization modes:

1350   o Non-interleaved timestamp based mode (NI-T);
1351   o Non-interleaved cross-session decoding order number (CS-DON)
1352     based mode (NI-C);
1353   o Non-interleaved combined timestamp and CS-DON mode (NI-TC); and
1354   o Interleaved CS-DON (I-C) mode.

1356   These four modes differ in two ways.  First, they differ in terms of
1357   whether NAL units are required to be transmitted within each RTP
1358   session in decoding order (i.e., non-interleaved), or they are
1359   allowed to be transmitted in a different order (i.e., interleaved).
1360   Second, they differ in the mechanisms they provide in order to
1361   recover the correct decoding order of the NAL units across all RTP
1362   sessions involved.

1364   The NI-T, NI-C, and NI-TC modes do not allow interleaving, and are
1365   thus targeted for systems that require relatively low end-to-end
1366   latency, e.g., conversational systems.  The I-C mode allows
1367   interleaving and is thus targeted for systems that do not require
1368   very low end-to-end latency.  The benefits of interleaving are the
1369   same as those of the Interleaved Mode specified in
1370   [I-D.ietf-avt-rtp-rfc3984bis].

1372   The NI-T mode uses timestamps to recover the decoding order of NAL
1373   units, whereas the NI-C and I-C modes both use the CS-DON mechanism
1374   (explained later on) to do so.  The NI-TC mode provides both
1375   timestamps and the CS-DON method; receivers in this case may choose
1376   to use either method for performing decoding order recovery.

1377   The MST packetization mode in use MUST be signaled by the value of
1378   the OPTIONAL mst-mode media type parameter.  The MST packetization
1379   mode in use governs which session packetization modes are
1380   allowed in the associated RTP sessions, which in turn govern which
1381   NAL unit types are allowed to be directly used as RTP packet
1382   payloads.

1384   Table 6 summarizes the allowed session packetization modes for NI-T,
1385   NI-C and NI-TC.  Table 7 summarizes the allowed session
1386   packetization modes for I-C.

1388      Table 6.  Summary of allowed session packetization modes
1389      (denoted as "Session Mode" for simplicity) for NI-T, NI-C
1390      and NI-TC (yes = allowed, no = disallowed)

1392      Session Mode            Base Session    Enhancement Session
1393      -----------------------------------------------------------
1394      Single NAL Unit Mode    yes             no
1395      Non-Interleaved Mode    yes             yes
1396      Interleaved Mode        no              no

1398      Table 7.
Summary of allowed session packetization modes
1399      (denoted as "Session Mode" for simplicity) for I-C (yes =
1400      allowed, no = disallowed)

1402      Session Mode            Base Session    Enhancement Session
1403      -----------------------------------------------------------
1404      Single NAL Unit Mode    no              no
1405      Non-Interleaved Mode    no              no
1406      Interleaved Mode        yes             yes

1408   For NAL unit types in the range of 0 to 29, inclusive, the NAL unit
1409   types allowed to be directly used as packet payloads for each
1410   session packetization mode are the same as specified in Section 5.4
1411   of [I-D.ietf-avt-rtp-rfc3984bis].  For other NAL unit types, which
1412   are newly introduced in this memo, the NAL unit types allowed to be
1413   directly used as packet payloads for each allowed session
1414   packetization mode for NI-T, NI-C, NI-TC, and I-C are summarized in
1415   Tables 8, 9, 10, and 11, respectively.

1417      Table 8.  New NAL unit types allowed to be directly used
1418      as packet payloads for each allowed session packetization
1419      mode when NI-T is in use (yes = allowed, no = disallowed,
1420      - = not applicable/not specified)

1422      Type  Subtype  Single NAL    Non-Interleaved
1423                     Unit Mode     Mode
1424      ---------------------------------------------------
1425      30    -        yes           no
1426      31    0        -             -
1427      31    1        yes           yes
1428      31    2        no            yes
1429      31    3-31     -             -

1431      Table 9.  New NAL unit types allowed to be directly used
1432      as packet payloads for each allowed session packetization
1433      mode when NI-C is in use (yes = allowed, no = disallowed,
1434      - = not applicable/not specified)

1436      Type  Subtype  Single NAL    Non-Interleaved
1437                     Unit Mode     Mode
1438      ---------------------------------------------------
1439      30    -        yes           yes
1440      31    0        -             -
1441      31    1        no            no
1442      31    2        no            yes
1443      31    3-31     -             -

1445      Table 10.
New NAL unit types allowed to be directly used
1446      as packet payloads for each allowed session packetization
1447      mode when NI-TC is in use (yes = allowed, no = disallowed,
1448      - = not applicable/not specified)

1450      Type  Subtype  Single NAL    Non-Interleaved
1451                     Unit Mode     Mode
1452      ---------------------------------------------------
1453      30    -        yes           yes
1454      31    0        -             -
1455      31    1        yes           yes
1456      31    2        no            yes
1457      31    3-31     -             -

1458      Table 11.  New NAL unit types allowed to be directly used
1459      as packet payloads for the allowed session packetization
1460      mode when I-C is in use (yes = allowed, no = disallowed,
1461      - = not applicable/not specified)

1463      Type  Subtype  Interleaved Mode
1464      ------------------------------------
1465      30    -        no
1466      31    0        -
1467      31    1        no
1468      31    2        no
1469      31    3-31     -

1471   When MST is in use and the MST packetization mode in use is NI-C,
1472   Empty NAL units (type 31, subtype 1) MUST NOT be used, i.e., no RTP
1473   packet is allowed to contain one or more Empty NAL units.

1475   When MST is in use and the MST packetization mode in use is I-C,
1476   both Empty NAL units (type 31, subtype 1) and NI-MTAP NAL units
1477   (type 31, subtype 2) MUST NOT be used, i.e., no RTP packet is
1478   allowed to contain one or more Empty NAL units or an NI-MTAP NAL
1479   unit.

1481   4.6 Single NAL Unit Packets

1483   Section 5.6 of [I-D.ietf-avt-rtp-rfc3984bis] applies with the
1484   following extensions.

1486   The payload of a Single NAL Unit Packet MAY be a PACSI NAL unit
1487   (Type 30) or an Empty NAL unit (Type 31 and Subtype 1), in addition
1488   to a NAL unit with NAL unit type equal to any value from 1 to 23,
1489   inclusive.

1491   If the Type field of the first byte of the payload is not equal to
1492   31, the payload header is the first byte of the payload.  Otherwise
1493   (the Type field of the first byte of the payload is equal to 31),
1494   the payload header is the first two bytes of the payload.
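The mapping from the Type and Subtype fields to the basic payload structure (Table 2) can be sketched as follows. This is an illustrative sketch only: the function name `payload_structure` is hypothetical, and the bit positions assume the layout drawn in Figure 2, where the Type field occupies the low five bits of the first payload byte and the Subtype field occupies the high five bits of the second byte.

```python
def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by its Type (and, for Type 31, Subtype) field."""
    if not payload:
        raise ValueError("empty payload")
    nal_type = payload[0] & 0x1F          # low 5 bits of the first byte
    if nal_type == 0:
        return "reserved"
    if 1 <= nal_type <= 23 or nal_type == 30:
        return "Single NAL Unit Packet"
    if 24 <= nal_type <= 27:
        return "Aggregation Packet"
    if nal_type in (28, 29):
        return "Fragmentation Unit"
    # nal_type == 31: the payload header is two bytes long.
    if len(payload) < 2:
        raise ValueError("Type 31 requires a two-byte payload header")
    subtype = payload[1] >> 3             # high 5 bits of the second byte
    if subtype == 1:
        return "Single NAL Unit Packet"   # Empty NAL unit
    if subtype == 2:
        return "Aggregation Packet"       # NI-MTAP
    return "reserved"
```

Whether a given structure is actually permitted still depends on the session packetization mode in use (Tables 5 and 8 through 11); the sketch does not check that.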
1496 4.7 Aggregation Packets 1498 In addition to Section 5.7 of [I-D.ietf-avt-rtp-rfc3984bis], the 1499 following applies in this memo. 1501 4.7.1 Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs) 1503 One new NAL unit type introduced in this memo is the Non-Interleaved 1504 Multi-Time Aggregation packet (NI-MTAP). An NI-MTAP consists of one 1505 or more non-interleaved multi-time aggregation units. 1507 The NAL units contained in NI-MTAPs MUST be aggregated in decoding 1508 order. 1510 A non-interleaved multi-time aggregation unit for the NI-MTAP 1511 consists of 16 bits of unsigned size information of the following 1512 NAL unit (in network byte order), and 16 bits (in network byte order) 1513 of timestamp offset (TS offset) for the NAL unit. The structure is 1514 presented in Figure 1. The starting or ending position of an 1515 aggregation unit within a packet may or may not be on a 32-bit word 1516 boundary. The NAL units in the NI-MTAP are ordered in NAL unit 1517 decoding order. 1519 The Type field of the NI-MTAP MUST be set equal to "31". 1521 The F bit MUST be set to 0 if all the F bits of the aggregated NAL 1522 units are zero; otherwise, it MUST be set to 1. 1524 The value of NRI MUST be the maximum value of NRI across all NAL 1525 units carried in the NI-MTAP packet. 1527 The field Subtype MUST be equal to 2. 1529 If the field J is equal to 1 the optional DON field MUST be present 1530 for each of the non-interleaved multi-time aggregation units. For 1531 SST the J field MUST be equal to 0. For MST, in the NI-T mode the J 1532 field MUST be equal to 0, whereas in the NI-C or NI-TC mode the J 1533 field MUST be equal to 1. When the NI-C or NI-TC mode is in use, 1534 the DON field, when present, MUST represent the CS-DON value for the 1535 particular NAL unit as defined in Section 6.2.2. 1537 The fields K and L MUST be both equal to 0. 
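The rules above for the two-byte NI-MTAP payload header can be sketched as follows. The sketch assumes the header layout |F|NRI|Type|Subtype|J|K|L| shown in Figure 2 and takes, as input, the first byte of each aggregated NAL unit; the names `ni_mtap_j_bit` and `ni_mtap_header` and the mode strings are hypothetical.

```python
from typing import Sequence


def ni_mtap_j_bit(mode: str) -> int:
    """Return the J field required for SST or for an MST packetization mode."""
    if mode in ("SST", "NI-T"):
        return 0
    if mode in ("NI-C", "NI-TC"):
        return 1
    # NI-MTAPs MUST NOT be used in the I-C mode (Section 4.5.2).
    raise ValueError("NI-MTAP is not allowed in mode %r" % mode)


def ni_mtap_header(first_bytes: Sequence[int], mode: str) -> bytes:
    """Build the NI-MTAP payload header from the aggregated NAL units.

    first_bytes holds the first header byte (F, NRI, Type) of each NAL unit.
    """
    if not first_bytes:
        raise ValueError("an NI-MTAP carries at least one aggregation unit")
    f = 1 if any(b & 0x80 for b in first_bytes) else 0   # 0 only if all F bits are 0
    nri = max((b >> 5) & 0x03 for b in first_bytes)      # maximum NRI
    j = ni_mtap_j_bit(mode)
    byte0 = (f << 7) | (nri << 5) | 31                   # Type = 31
    byte1 = (2 << 3) | (j << 2)                          # Subtype = 2, K = L = 0
    return bytes([byte0, byte1])
```

For example, aggregating NAL units whose first bytes are 0x65 and 0x41 in the NI-C mode gives the header bytes 0x7F 0x14.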
1539       0                   1                   2                   3
1540       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1541      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1542      :         NAL unit size         |           TS offset           |
1543      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1544      |        DON (optional)         |                               |
1545      |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+           NAL unit            |
1546      |                                                               |
1547      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1548      |                               :
1549      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1551      Figure 1  Non-interleaved multi-time aggregation unit for
1552      NI-MTAP

1554   Let TS be the RTP timestamp of the packet carrying the NAL unit.
1555   Recall that the NALU-time of a NAL unit in an MTAP is defined in
1556   [I-D.ietf-avt-rtp-rfc3984bis] as the value that the RTP timestamp would
1557   have if that NAL unit were transported in its own RTP packet.
1558   The timestamp offset field MUST be set to a value equal to the value
1559   of the following formula:

1561      if NALU-time >= TS, TS offset = NALU-time - TS
1562      else,               TS offset = NALU-time + (2^32 - TS)

1564   For the "earliest" multi-time aggregation unit in an NI-MTAP the
1565   timestamp offset MUST be zero.  Hence, the RTP timestamp of the
1566   NI-MTAP itself is identical to the earliest NALU-time.

1568      Informative note: The "earliest" multi-time aggregation unit is
1569      the one that would have the smallest extended RTP timestamp among
1570      all the aggregation units of an NI-MTAP if the aggregation units
1571      were encapsulated in single NAL unit packets.  An extended
1572      timestamp is a timestamp that has more than 32 bits and is
1573      capable of counting the wraparound of the timestamp field, thus
1574      enabling one to determine the smallest value if the timestamp
1575      wraps.  Such an "earliest" aggregation unit may or may not be the
1576      first one in the order in which the aggregation units are
1577      encapsulated in an NI-MTAP.  The "earliest" NAL unit need not be
1578      the same as the first NAL unit in the NAL unit decoding order
1579      either.
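The timestamp offset formula above amounts to a subtraction modulo 2^32. A minimal sketch follows; the function names `ts_offset` and `nalu_time_from_offset` are hypothetical, and the range check reflects the 16-bit width of the TS offset field in Figure 1.

```python
RTP_TS_MODULUS = 2 ** 32   # RTP timestamps are 32-bit values


def ts_offset(nalu_time: int, ts: int) -> int:
    """Compute the TS offset field for one aggregation unit (sender side)."""
    if nalu_time >= ts:
        offset = nalu_time - ts
    else:
        # The NALU-time has wrapped around relative to the packet timestamp.
        offset = nalu_time + (RTP_TS_MODULUS - ts)
    if offset > 0xFFFF:
        raise ValueError("offset does not fit in the 16-bit TS offset field")
    return offset


def nalu_time_from_offset(ts: int, offset: int) -> int:
    """Recover the NALU-time from the packet timestamp (receiver side)."""
    return (ts + offset) % RTP_TS_MODULUS
```

With TS equal to 2^32 - 10 and a NALU-time of 5, the sender writes an offset of 15, and the receiver recovers 5 again.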
1581   Figure 2 presents an example of an RTP packet that contains an
1582   NI-MTAP that contains two non-interleaved multi-time aggregation units,
1583   labeled as 1 and 2 in the figure.

1585       0                   1                   2                   3
1586       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1587      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1588      |                          RTP Header                           |
1589      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1590      |F|NRI|  Type   | Subtype |J|K|L|                               |
1591      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1592      |                                                               |
1593      |        Non-interleaved Multi-time aggregation unit #1         |
1594      :                                                               :
1595      |                                 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1596      |                                 | Non-interleaved Multi-time  |
1597      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                             |
1598      |                      aggregation unit #2                      |
1599      :                                                               :
1600      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1601      |                               :...OPTIONAL RTP padding        |
1602      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1604      Figure 2  An RTP packet including an NI-MTAP containing two
1605      non-interleaved multi-time aggregation units

1607   4.8 Fragmentation Units (FUs)

1609   Section 5.8 of [I-D.ietf-avt-rtp-rfc3984bis] applies.

1611      Informative note: In case a NAL unit with the four-byte SVC NAL
1612      unit header is fragmented, the three-byte SVC-specific header
1613      extension is considered as part of the NAL unit payload.  That is,
1614      the three-byte SVC-specific header extension is only available in
1615      the first fragment of the fragmented NAL unit.

1617   4.9 Payload Content Scalability Information (PACSI) NAL Unit

1619   Another new type of NAL unit specified in this memo is the Payload
1620   Content Scalability Information (PACSI) NAL unit.  The Type field of
1621   PACSI NAL units MUST be equal to 30 (a NAL unit type value left
1622   unspecified in [H.264] and [I-D.ietf-avt-rtp-rfc3984bis]).  A PACSI
1623   NAL unit MAY be carried in a single NAL unit packet or an
1624   aggregation packet, and MUST NOT be fragmented.
1626 PACSI NAL units may be used for the following purposes: 1628 o To enable MANEs to decide whether to forward, process or discard 1629 aggregation packets, by checking in PACSI NAL units the 1630 scalability information and other characteristics of the 1631 aggregated NAL units, rather than looking into the aggregated NAL 1632 units themselves, which are defined by the video coding 1633 specification; 1634 o To enable correct decoding order recovery in MST using the NI-C 1635 or NI-TC mode, with the help of the CS-DON information included in 1636 PACSI NAL units; and 1637 o To improve resilience to packet losses, e.g. by utilizing the 1638 following data or information included in PACSI NAL units: 1639 repeated Supplemental Enhancement Information (SEI) messages, 1640 information regarding the start and end of layer representations, 1641 and the indices to layer representations of the lowest temporal 1642 subset. 1644 PACSI NAL units MAY be ignored in the NI-T mode without affecting 1645 the decoding order recovery process. 1647 When a PACSI NAL unit is present in an aggregation packet, the 1648 following applies. 1650 o The PACSI NAL unit MUST be the first aggregated NAL unit in the 1651 aggregation packet. 1653 o There MUST be at least one additional aggregated NAL unit in the 1654 aggregation packet. 1656 o The RTP header fields and the payload header fields of the 1657 aggregation packet are set as if the PACSI NAL unit was not 1658 included in the aggregation packet. 1660 o If the aggregation packet is an MTAP16, MTAP24, or NI-MTAP with 1661 the J field equal to 1, the decoding order number (DON) for the 1662 PACSI NAL unit MUST be set to indicate that the PACSI NAL unit 1663 has an identical DON to the first NAL unit in decoding order 1664 among the remaining NAL units in the aggregation packet. 
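The first two placement rules above can be checked mechanically. The following minimal sketch assumes the aggregated NAL units are represented only by their NAL unit type values, in the order in which they appear in the aggregation packet; the function name `pacsi_placement_ok` is hypothetical. It also enforces the limit of at most one PACSI NAL unit per aggregation packet stated in Section 4.3.

```python
from typing import Sequence

PACSI_TYPE = 30  # NAL unit type of the PACSI NAL unit


def pacsi_placement_ok(nal_types: Sequence[int]) -> bool:
    """Check PACSI placement within one aggregation packet.

    nal_types lists the NAL unit type of every aggregated NAL unit,
    in packet order.
    """
    count = sum(1 for t in nal_types if t == PACSI_TYPE)
    if count == 0:
        return True                     # no PACSI NAL unit, nothing to check
    if count > 1:
        return False                    # at most one PACSI NAL unit
    if nal_types[0] != PACSI_TYPE:
        return False                    # it must be the first aggregated NAL unit
    return len(nal_types) >= 2          # at least one additional NAL unit
```

The remaining two rules (header field values and the DON of the PACSI NAL unit) depend on field contents and are not covered by this check.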
1666   When a PACSI NAL unit is included in a single NAL unit packet, it is
1667   associated with the next non-PACSI NAL unit in transmission order,
1668   and the RTP header fields of the packet are set as if the next
1669   non-PACSI NAL unit in transmission order was included in a single NAL
1670   unit packet.

1672   The PACSI NAL unit structure is as follows.  The first four octets
1673   are exactly the same as the four-byte SVC NAL unit header discussed
1674   in Section 1.1.3.  They are followed by one octet containing several
1675   flags, then five optional octets, and finally zero or more SEI NAL
1676   units.  Each SEI NAL unit is preceded by a 16-bit unsigned size
1677   field (in network byte order) that indicates the size of the
1678   following NAL unit in bytes (excluding these two octets, but
1679   including the NAL unit header octet of the SEI NAL unit).  Figure 3
1680   illustrates the PACSI NAL unit structure and an example of a PACSI
1681   NAL unit containing two SEI NAL units.

1683       0                   1                   2                   3
1684       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1685      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1686      |F|NRI|  Type   |R|I|   PRID    |N| DID |  QID  | TID |U|D|O| RR|
1687      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1688      |X|Y|T|A|P|C|S|E| TL0PICIDX (o) |         IDRPICID (o)          |
1689      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1690      |           DONC (o)            |        NAL unit size 1        |
1691      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1692      |                                                               |
1693      |                        SEI NAL unit 1                         |
1694      |                                                               |
1695      |               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1696      |               |        NAL unit size 2        |               |
1697      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1698      |                                                               |
1699      |                        SEI NAL unit 2                         |
1700      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1701      |                               |
1702      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1704      Figure 3  PACSI NAL unit structure.  Fields suffixed by
1705      "(o)" are OPTIONAL.

1707   The bits A, P, and C are specified only if the bit X is equal to 1.
1708 The bits S and E are specified, and the fields TL0PICIDX and 1709 IDRPICID are present, only if the bit Y is equal to 1. The field 1710 DONC is present only if the bit T is equal to 1. The field T MUST 1711 be equal to 0 if the PACSI NAL unit is contained in an STAP-B, 1712 MTAP16, MTAP24, or NI-MTAP with the J field equal to 1. 1714 The values of the fields in PACSI NAL unit MUST be set as follows. 1716 o The F bit MUST be set to 1 if the F bit in at least one of the 1717 remaining NAL units in the aggregation packet is equal to 1 (when 1718 the PACSI NAL unit is included in an aggregation packet) or if 1719 the next non-PACSI NAL unit in transmission order has the F bit 1720 equal to 1 (when the PACSI NAL unit is included in a single NAL 1721 unit packet). Otherwise, the F bit MUST be set to 0. 1723 o The NRI field MUST be set to the highest value of NRI field among 1724 all the remaining NAL units in the aggregation packet (when the 1725 PACSI NAL unit is included in an aggregation packet) or the value 1726 of the NRI field of the next non-PACSI NAL unit in transmission 1727 order (when the PACSI NAL unit is included in a single NAL unit 1728 packet). 1730 o The Type field MUST be set to 30. 1732 o The R bit MUST be set to 1. Receivers MUST ignore the value of R. 1734 o The I bit MUST be set to 1 if the I bit of at least one of the 1735 remaining NAL units in the aggregation packet is equal to 1 (when 1736 the PACSI NAL unit is included in an aggregation packet) or if 1737 the I bit of the next non-PACSI NAL unit in transmission order is 1738 equal to 1 (when the PACSI NAL unit is included in a single NAL 1739 unit packet). Otherwise, the I bit MUST be set to 0. 
1741 o The PRID field MUST be set to the lowest value of the PRID values 1742 of the remaining NAL units in the aggregation packet (when the 1743 PACSI NAL unit is included in an aggregation packet) or the PRID 1744 value of the next non-PACSI NAL unit in transmission order (when 1745 the PACSI NAL unit is included in a single NAL unit packet). 1747 o The N bit MUST be set to 1 if the N bit of all the remaining NAL 1748 units in the aggregation packet is equal to 1 (when the PACSI NAL 1749 unit is included in an aggregation packet) or if the N bit of the 1750 next non-PACSI NAL unit in transmission order is equal to 1 (when 1751 the PACSI NAL unit is included in a single NAL unit packet). 1752 Otherwise, the N bit MUST be set to 0. 1754 o The DID field MUST be set to the lowest value of the DID values 1755 of the remaining NAL units in the aggregation packet (when the 1756 PACSI NAL unit is included in an aggregation packet) or the DID 1757 value of the next non-PACSI NAL unit in transmission order (when 1758 the PACSI NAL unit is included in a single NAL unit packet). 1760 o The QID field MUST be set to the lowest value of the QID values 1761 of the remaining NAL units with the lowest value of DID in the 1762 aggregation packet (when the PACSI NAL unit is included in an 1763 aggregation packet) or the QID value of the next non-PACSI NAL 1764 unit in transmission order (when the PACSI NAL unit is included 1765 in a single NAL unit packet). 1767 o The TID field MUST be set to the lowest value of the TID values 1768 of the remaining NAL units with the lowest value of DID in the 1769 aggregation packet (when the PACSI NAL unit is included in an 1770 aggregation packet) or the TID value of the next non-PACSI NAL 1771 unit in transmission order (when the PACSI NAL unit is included 1772 in a single NAL unit packet). 
1774   o The U bit MUST be set to 1 if the U bit of at least one of the
1775     remaining NAL units in the aggregation packet is equal to 1 (when
1776     the PACSI NAL unit is included in an aggregation packet) or if
1777     the U bit of the next non-PACSI NAL unit in transmission order is
1778     equal to 1 (when the PACSI NAL unit is included in a single NAL
1779     unit packet).  Otherwise, the U bit MUST be set to 0.

1781   o The D bit MUST be set to 1 if the D bit of all the remaining
1782     NAL units in the aggregation packet is equal to 1 (when the PACSI
1783     NAL unit is included in an aggregation packet) or if the D bit of
1784     the next non-PACSI NAL unit in transmission order is equal to 1
1785     (when the PACSI NAL unit is included in a single NAL unit packet).
1786     Otherwise, the D bit MUST be set to 0.

1788   o The O bit MUST be set to 1 if the O bit of at least one of the
1789     remaining NAL units in the aggregation packet is equal to 1 (when
1790     the PACSI NAL unit is included in an aggregation packet) or if
1791     the O bit of the next non-PACSI NAL unit in transmission order is
1792     equal to 1 (when the PACSI NAL unit is included in a single NAL
1793     unit packet).  Otherwise, the O bit MUST be set to 0.

1795   o The RR field MUST be set to "11" (in binary form).  Receivers
1796     MUST ignore the value of RR.

1798   o If the X bit is equal to 1, the bits A, P, and C are specified as
1799     below.  Otherwise, the bits A, P, and C are unspecified, and
1800     receivers MUST ignore the values of these bits.  The X bit SHOULD
1801     be identical for all the PACSI NAL units in all the RTP sessions
1802     carrying the same SVC bitstream.

1804   o If the Y bit is equal to 1, the OPTIONAL fields TL0PICIDX and
1805     IDRPICID MUST be present and specified as below, and the bits S
1806     and E are also specified as below.  Otherwise, the fields
1807     TL0PICIDX and IDRPICID MUST NOT be present, while the S and E
1808     bits are unspecified and receivers MUST ignore the values of
1809     these bits.
The Y bit MUST be identical for all the PACSI NAL 1810 units in all the RTP sessions carrying the same SVC bitstream. 1811 The Y bit MUST be equal to 0 when the parameter packetization- 1812 mode is equal to 2. 1814 o If the T bit is equal to 1, the OPTIONAL field DONC MUST be 1815 present and specified as below. Otherwise, the field DONC MUST 1816 NOT be present. The field T MUST be equal to 0 if the PACSI NAL 1817 unit is contained in an STAP-B, MTAP16, MTAP24, or NI-MTAP. 1819 o The A bit MUST be set to 1 if at least one of the remaining NAL 1820 units in the aggregation packet belongs to an anchor layer 1821 representation (when the PACSI NAL unit is included in an 1822 aggregation packet) or if the next non-PACSI NAL unit in 1823 transmission order belongs to an anchor layer representation 1824 (when the PACSI NAL unit is included in a single NAL unit packet). 1825 Otherwise, the A bit MUST be set to 0. 1827 Informative note: The A bit indicates whether CGS or spatial 1828 layer switching at a non-IDR layer representation (a layer 1829 representation with nal_unit_type not equal to 5 and idr_flag not 1830 equal to 1) can be performed. With some picture coding 1831 structures a non-IDR intra layer representation can be used for 1832 random access. Compared to using only IDR layer representations, 1833 higher coding efficiency can be achieved. The H.264/AVC or SVC 1834 solution to indicate the random accessibility of a non-IDR intra 1835 layer representation is using a recovery point SEI message. The 1836 A bit offers direct access to this information, without having to 1837 parse the recovery point SEI message, which may be buried deeply 1838 in an SEI NAL unit. Furthermore, the SEI message may or may not 1839 be present in the bitstream. 
1841   o The P bit MUST be set to 1 if all the remaining NAL units in the
1842     aggregation packet have redundant_pic_cnt greater than 0 (when
1843     the PACSI NAL unit is included in an aggregation packet) or the
1844     next non-PACSI NAL unit in transmission order has
1845     redundant_pic_cnt greater than 0 (when the PACSI NAL unit is
1846     included in a single NAL unit packet).  Otherwise, the P bit MUST
1847     be set to 0.

1849      Informative note: The P bit indicates whether a packet can be
1850      discarded because it contains only redundant slice NAL units.
1851      Without this bit, the corresponding information can be obtained
1852      from the syntax element redundant_pic_cnt, which is contained in
1853      the variable-length coded slice header.

1855   o The C bit MUST be set to 1 if at least one of the remaining NAL
1856     units in the aggregation packet belongs to an intra layer
1857     representation (when the PACSI NAL unit is included in an
1858     aggregation packet) or if the next non-PACSI NAL unit in
1859     transmission order belongs to an intra layer representation (when
1860     the PACSI NAL unit is included in a single NAL unit packet).
1861     Otherwise, the C bit MUST be set to 0.

1863      Informative note: The C bit indicates whether a packet contains
1864      intra slices, which may be the only packets to be forwarded, e.g.,
1865      when the network conditions are particularly adverse.

1867   o The S bit MUST be set to 1 if the first NAL unit following the
1868     PACSI NAL unit in an aggregation packet is the first VCL NAL unit,
1869     in decoding order, of a layer representation (when the PACSI NAL
1870     unit is included in an aggregation packet) or if the next
1871     non-PACSI NAL unit in transmission order is the first VCL NAL unit,
1872     in decoding order, of a layer representation (when the PACSI NAL
1873     unit is included in a single NAL unit packet).  Otherwise, the S
1874     bit MUST be set to 0.
1876   o The E bit MUST be set to 1 if the last NAL unit following the
1877     PACSI NAL unit in an aggregation packet is the last VCL NAL unit,
1878     in decoding order, of a layer representation (when the PACSI NAL
1879     unit is included in an aggregation packet) or if the next
1880     non-PACSI NAL unit in transmission order is the last VCL NAL unit, in
1881     decoding order, of a layer representation (when the PACSI NAL
1882     unit is included in a single NAL unit packet).  Otherwise, the E
1883     bit MUST be set to 0.

1885      Informative note: In an aggregation packet it is always possible
1886      to detect the beginning or end of a layer representation by
1887      detecting changes in the values of dependency_id, quality_id, and
1888      temporal_id in NAL unit headers, except for the first and last
1889      NAL units of a packet.  The S or E bits are used to provide this
1890      information, for both single NAL unit and aggregation packets, so
1891      that previous or following packets do not have to be examined.
1892      This enables MANEs to detect slice loss and take proper action,
1893      such as requesting a retransmission, as soon as possible.  It
1894      also allows efficient playout buffer handling, similar to the M
1895      bit present in the RTP header.  The M bit in the RTP header still
1896      indicates the end of an access unit, not the end of a layer
1897      representation.

1899   o When present, the TL0PICIDX field MUST be set equal to
1900     tl0_dep_rep_idx as specified in Annex G of [H.264] for the layer
1901     representation containing the first NAL unit following the PACSI
1902     NAL unit in the aggregation packet (when the PACSI NAL unit is
1903     included in an aggregation packet) or containing the next
1904     non-PACSI NAL unit in transmission order (when the PACSI NAL unit is
1905     included in a single NAL unit packet).
1907 o When present, the IDRPICID field MUST be set equal to 1908 effective_idr_pic_id as specified in Annex G of [H.264] for the 1909 layer representation containing the first NAL unit following the 1910 PACSI NAL unit in the aggregation packet (when the PACSI NAL unit 1911 is included in an aggregation packet) or containing the next non- 1912 PACSI NAL unit in transmission order (when the PACSI NAL unit is 1913 included in a single NAL unit packet). 1915 Informative note: The TL0PICIDX and IDRPICID fields enable the 1916 detection of the loss of layer representations in the most 1917 important temporal layer (with temporal_id equal to 0) by 1918 receivers as well as MANEs. SVC provides a solution that uses 1919 SEI messages, which are harder to parse and may or may not be 1920 present in the bitstream. When the PACSI NAL unit is part of an 1921 NI-MTAP packet, it is possible to infer the correct values of 1922 tl0_dep_rep_idx and idr_pic_id for all layer representations 1923 contained in the NI-MTAP by following the rules that specify how 1924 these parameters are set as given in Annex G of [H.264] and by 1925 detecting the different layer representations contained in the 1926 NI-MTAP packet by detecting changes in the values of 1927 dependency_id, quality_id, and temporal_id in the NAL unit 1928 headers as well as using the S and E flags. The only exception 1929 is if NAL units of an IDR picture are present in the NI-MTAP in a 1930 position other than the first NAL unit following the PACSI NAL 1931 unit, in which case the value of idr_pic_id cannot be inferred. 1932 In this case the NAL unit has to be partially parsed to obtain 1933 the idr_pic_id. Note that, due to the large size of IDR pictures, 1934 their inclusion in an NI-MTAP, and especially in a position other 1935 than the first NAL unit following the PACSI NAL unit, may be 1936 neither practical nor useful.
1938 o When present, the field DONC indicates the Cross-Session Decoding 1939 Order Number (CS-DON) for the first of the remaining NAL units in 1940 the aggregation packet (when the PACSI NAL unit is included in an 1941 aggregation packet) or the CS-DON of the next non-PACSI NAL unit 1942 in transmission order (when the PACSI NAL unit is included in a 1943 single NAL unit packet). CS-DON is further discussed in Section 1944 4.11. 1946 The PACSI NAL unit MAY include a subset of the SEI NAL units 1947 associated with the access unit to which the first non-PACSI NAL 1948 unit in the aggregation packet belongs, and MUST NOT contain SEI NAL 1949 units associated with any other access unit. 1951 Informative note: In H.264/AVC and SVC, within each access unit, 1952 SEI NAL units must appear before any VCL NAL unit in decoding 1953 order. Therefore, without using PACSI NAL units, SEI messages 1954 are typically only conveyed in the first of the packets carrying 1955 an access unit. Senders may repeat SEI NAL units in PACSI NAL 1956 units, so that they are repeated in more than one packet and thus 1957 increase robustness against packet losses. Receivers may use the 1958 repeated SEI messages in place of missing SEI messages. 1960 For a PACSI NAL unit included in an aggregation packet, an SEI 1961 message SHOULD NOT be included in the PACSI NAL unit and also 1962 included in one of the remaining NAL units contained in the same 1963 aggregation packet. 1965 4.10 Empty NAL unit 1967 An Empty NAL unit MAY be included in a single NAL unit packet, an 1968 STAP-A or an NI-MTAP packet. Empty NAL units MUST have an RTP 1969 timestamp (when transported in a single NAL unit packet) or NALU- 1970 time (when transported in an aggregation packet) that is associated 1971 with an access unit for which there exists at least one NAL unit of 1972 type 1, 5, or 20. When MST is used, the type 1, 5, or 20 NAL unit 1973 may be in a different RTP session. 
Empty NAL units may be used in 1974 the decoding order recovery process of the NI-T mode as described in 1975 Section 5.2.1. 1977 The packet structure is shown in the following figure. 1979 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1980 |F|NRI| type | Subtype |J|K|L| 1981 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1983 Figure 4 Empty NAL unit structure. 1985 The fields MUST be set as follows: 1987 - F MUST be equal to 0 1988 NRI MUST be equal to 3 1989 Type MUST be equal to 31 1990 Subtype MUST be equal to 1 1991 J MUST be equal to 0 1992 K MUST be equal to 0 1993 L MUST be equal to 0 1995 4.11 Decoding Order Number (DON) 1997 The DON concept is introduced in [I-D.ietf-avt-rtp-rfc3984bis] and 1998 is used to recover the decoding order when interleaving is used 1999 within a single session. Section 5.5 of [I-D.ietf-avt-rtp- 2000 rfc3984bis] applies when using SST. 2002 When using MST, it is necessary to recover the decoding order across 2003 the various RTP sessions regardless if interleaving is used or not. 2004 In addition to the timestamp mechanism described later on, the CS- 2005 DON mechanism is an extension of the DON facility that can be used 2006 for this purpose, and is defined in the following section. 2008 4.11.1 Cross-Session DON (CS-DON) for Multi-Session Transmission 2010 The Cross-Session Decoding Order Number (CS-DON) is a number that 2011 indicates the decoding order of NAL units across all RTP sessions 2012 involved in MST. It is similar to the DON concept in [I-D.ietf-avt- 2013 rtp-rfc3984bis], but contrary to [I-D.ietf-avt-rtp-rfc3984bis] where 2014 the DON was used only for interleaved packetization, in this memo it 2015 is used not only in the interleaved MST mode (I-C) but also in two 2016 of the non-interleaved MST modes as well (NI-C and NI-TC). 2018 When the NI-C or NI-TC MST modes are in use, the packetization of 2019 each session MUST be as specified in Section 5.2.2. In PACSI NAL 2020 units the CS-DON value is explicitly coded in the field DONC. 
For 2021 non-PACSI NAL units the CS-DON value is derived as follows. Let SN 2022 indicate the RTP sequence number of a packet. 2024 o For each non-PACSI NAL unit carried in a session using the single 2025 NAL unit session packetization mode, the CS-DON value of the NAL 2026 unit is equal to (DONC_prev_PACSI + SN_diff - 1) % 65536, wherein 2027 "%" is the modulo operation, DONC_prev_PACSI is the DONC value of 2028 the previous PACSI NAL unit with the same NALU-time as the 2029 current NAL unit, and SN_diff is calculated as follows: 2031 if SN1 > SN2, SN_diff = SN1 - SN2 2032 else SN_diff = SN1 + 65536 - SN2 2034 where SN1 and SN2 are the SNs of the current NAL unit and the 2035 previous PACSI NAL unit with the same NALU-time, respectively. 2037 o For non-PACSI NAL units carried in a session using the non- 2038 interleaved session packetization mode, the CS-DON value of each 2039 non-PACSI NAL unit is derived as follows. 2041 For a non-PACSI NAL unit in a single NAL unit packet, the 2042 following applies. 2044 If the previous PACSI NAL unit is contained in a single 2045 NAL unit packet, the CS-DON value of the NAL unit is 2046 calculated as above; 2048 otherwise (the previous PACSI NAL unit is contained in 2049 an STAP-A packet), the CS-DON value of the NAL unit is 2050 calculated as above, with DONC_prev_PACSI being replaced 2051 by the CS-DON value of the previous non-PACSI NAL unit 2052 in decoding order (i.e., the CS-DON value of the last 2053 NAL unit of the STAP-A packet). 2055 For a non-PACSI NAL unit in an STAP-A packet, the following 2056 applies.
2058 If the non-PACSI NAL unit is the first non-PACSI NAL 2059 unit in the STAP-A packet, the CS-DON value of the NAL 2060 unit is equal to DONC of the PACSI NAL unit in the STAP- 2061 A packet; 2063 otherwise (the non-PACSI NAL unit is not the first non- 2064 PACSI NAL unit in the STAP-A packet), the CS-DON value 2065 of the NAL unit is equal to: (the CS-DON value of the 2066 previous non-PACSI NAL unit in decoding order + 1) % 2067 65536, wherein "%" is the modulo operation. 2069 For a non-PACSI NAL unit in a number of FU-A packets, the CS- 2070 DON value of the NAL unit is calculated the same way as when 2071 the single NAL unit session packetization mode is in use, with 2072 SN1 being the SN value of the first FU-A packet. 2074 For a non-PACSI NAL unit in an NI-MTAP packet, the CS-DON 2075 value is equal to the value of the DON field of the non- 2076 interleaved multi-time aggregation unit. 2078 When the I-C MST packetization mode is in use, the DON values 2079 derived according to [I-D.ietf-avt-rtp-rfc3984bis] for all the NAL 2080 units in each of the RTP sessions MUST indicate CS-DON values. 2082 5. Packetization Rules 2084 Section 6 of [I-D.ietf-avt-rtp-rfc3984bis] applies in this memo, 2085 with the following additions. 2087 5.1 Packetization Rules for Single-Session Transmission 2089 All receivers MUST support the single NAL unit packetization mode to 2090 provide backward compatibility to endpoints supporting only the 2091 single NAL unit mode of [I-D.ietf-avt-rtp-rfc3984bis]. However, the 2092 use of single NAL unit packetization mode (packetization-mode equal 2093 to 0) SHOULD be avoided whenever possible, because encapsulating NAL 2094 units of small sizes in their own packets (e.g., small NAL units 2095 containing parameter sets, prefix NAL units, or SEI messages) is 2096 less efficient due to the packet header overhead. 2098 All receivers MUST support the non-interleaved mode. 
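As an illustration of the CS-DON derivation rules in Section 4.11.1, the following Python sketch computes CS-DON values for two of the cases: a non-PACSI NAL unit in a single NAL unit packet, and the non-PACSI NAL units inside an STAP-A packet. It is a sketch under stated assumptions, not part of the payload format. The function names are hypothetical, and SN_diff is assumed to mean the forward distance, modulo 65536, from the packet carrying the previous PACSI NAL unit to the current packet.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers and CS-DON values are 16 bits wide


def sn_diff(sn_current, sn_prev_pacsi):
    """Forward distance from the previous PACSI packet to the current packet.

    Assumption: the current packet follows the PACSI packet in transmission
    order, so a smaller current sequence number means the counter wrapped.
    """
    if sn_current > sn_prev_pacsi:
        return sn_current - sn_prev_pacsi
    return sn_current + SEQ_MOD - sn_prev_pacsi


def cs_don_single_nal(donc_prev_pacsi, sn_current, sn_prev_pacsi):
    """CS-DON of a non-PACSI NAL unit carried in a single NAL unit packet."""
    return (donc_prev_pacsi + sn_diff(sn_current, sn_prev_pacsi) - 1) % SEQ_MOD


def cs_don_stap_a(donc, num_non_pacsi):
    """CS-DON values of the non-PACSI NAL units in one STAP-A packet.

    The first unit takes the DONC value of the leading PACSI NAL unit and
    each following unit increments the previous value modulo 65536.
    """
    return [(donc + i) % SEQ_MOD for i in range(num_non_pacsi)]
```

With these definitions, the packet that immediately follows a PACSI packet (SN_diff equal to 1) receives the DONC value itself, which matches the statement that DONC carries the CS-DON of the next non-PACSI NAL unit in transmission order.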
2100 Informative note: The non-interleaved mode of [I-D.ietf-avt-rtp- 2101 rfc3984bis] does allow an application to encapsulate a single NAL 2102 unit in a single RTP packet. Historically, the single NAL unit 2103 mode has been included into [I-D.ietf-avt-rtp-rfc3984bis] only 2104 for compatibility with ITU-T Rec. H.241 Annex A [H.241]. There 2105 is no point in carrying this historic ballast towards a new 2106 application space such as the one provided with SVC. The 2107 implementation complexity increase for supporting the additional 2108 mechanisms of the non-interleaved mode (namely STAP-A and FU-A) 2109 is minor, whereas the benefits are significant. As a result, the 2110 support of STAP-A and FU-A is required. Additionally, support 2111 for two of the three NAL unit types defined in this memo, namely 2112 Empty NAL units and NI-MTAP is needed, as specified in Section 2113 4.5.1. 2115 A NAL unit of small size SHOULD be encapsulated in an aggregation 2116 packet together with one or more other NAL units. For example, non- 2117 VCL NAL units such as access unit delimiters, parameter sets, or SEI 2118 NAL units are typically small. 2120 A prefix NAL unit and the NAL unit with which it is associated, and 2121 which follows the prefix NAL unit in decoding order, SHOULD be 2122 included in the same aggregation packet whenever an aggregation 2123 packet is used for the associated NAL unit, unless this would 2124 violate session MTU constraints or if fragmentation units are used 2125 for the associated NAL unit. 2127 Informative note: Although the prefix NAL unit is ignored by an 2128 H.264/AVC decoder, it is necessary in the SVC decoding process. 2130 Given the small size of the prefix NAL unit, it is best if it is 2131 transported in the same RTP packet as its associated NAL unit. 2133 When only an H.264/AVC compatible subset of the SVC base layer is 2134 transmitted in an RTP session, the subset MUST be encapsulated 2135 according to [I-D.ietf-avt-rtp-rfc3984bis]. 
This way, an [I-D.ietf- 2136 avt-rtp-rfc3984bis] receiver will be able to receive the H.264/AVC 2137 compatible bitstream subset. 2139 When a set of layers including one or more SVC enhancement layers is 2140 transmitted in an RTP session, the set SHOULD be carried in one RTP 2141 stream that SHOULD be encapsulated according to this memo. 2143 5.2 Packetization Rules for Multi-Session Transmission 2145 When MST is used, the packetization rules specified in Section 5.1 2146 still apply. In addition, the following packetization rules MUST be 2147 followed, to ensure that decoding order of NAL units carried in the 2148 sessions can be correctly recovered for each of the MST 2149 packetization modes using the de-packetization process specified in 2150 Section 6.2. 2152 The NI-T and NI-TC modes both use timestamps to recover the decoding 2153 order. In order to be able to do so, it is necessary for the RTP 2154 packet stream to contain data for all sampling instances of a given 2155 RTP session in all enhancement RTP sessions that depend on the given 2156 RTP session. The NI-C and I-C modes do not have this limitation, 2157 and use the CS-DON values as a means to explicitly indicate decoding 2158 order, either directly coded in PACSI NAL units, or inferred from 2159 them using the packetization rules. It is noted that the NI-TC mode 2160 offers both alternatives and it is up to the receiver to select 2161 which one to use. 2163 5.2.1 NI-T/NI-TC Packetization Rules 2165 When using the NI-T mode and a PACSI NAL unit is present, the T bit 2166 MUST be equal to 0, i.e., the DONC field MUST NOT be present. 2168 When using the NI-T mode, the optional parameters sprop-mst-remux- 2169 buf-size, sprop-remux-buf-req, remux-buf-cap, sprop-remux-init-buf- 2170 time, sprop-mst-max-don-diff MUST NOT be present. 2172 When the NI-T or NI-TC MST mode is in use, the following applies. 
2174 If one or more NAL units of an access unit of sampling time instance 2175 t is present in RTP session A, then one or more NAL units of the 2176 same access unit MUST be present in any enhancement RTP session 2177 which depends on RTP session A. 2179 Informative note 1: The mapping between RTP and NTP format 2180 timestamps is conveyed in RTCP SR packets. In addition, the 2181 mechanisms for faster media timestamp synchronization discussed 2182 in [RFC6051] may be used to speed up the acquisition of the RTP- 2183 to-wall-clock mapping. 2185 Informative note 2: The rule above may require the insertion of 2186 NAL units, typically when temporal scalability is used, i.e., an 2187 enhancement RTP session does not contain any NAL units for an 2188 access unit with a particular NTP timestamp (media timestamp), 2189 which however is present in a lower enhancement RTP session or 2190 the base RTP session. There are two ways to insert additional NAL 2191 units in order to satisfy this rule: 2193 - One option for adding additional NAL units is to use Empty NAL 2194 units (defined in Section 4.10), which can be used by the process 2195 described in Section 6.2.1 for the access unit re-ordering 2196 process. 2198 - Additional NAL units may also be added by the encoder itself, 2199 for example by transmitting coded data that simply instruct the 2200 decoder to repeat the previous picture. This option, however, 2201 may be difficult to use with pre-encoded content. 2203 If a packet must be inserted in order to satisfy the above rule, 2204 e.g., in case of a MANE generating multiple RTP streams out of a 2205 single RTP stream, the inserted packet must have an RTP timestamp 2206 that maps to the same wall-clock time (in NTP format) as the one of 2207 the RTP timestamp of any packet of the access unit present in any 2208 lower enhancement RTP session or the base RTP session. 
This is easy 2209 to accomplish if the NAL unit or the packet can be inserted at the 2210 time of the RTP stream generation, since the media timestamp (NTP 2211 timestamp) must be the same for the inserted packet and the packet 2212 of the corresponding access unit. If there is no knowledge of the 2213 media time at RTP stream generation or if the RTP streams are not 2214 generated at the same instance, this can be also applied later in 2215 the transmission process. In this case the NTP timestamp of the 2216 inserted packet can be calculated as follows. 2218 Assume that a packet A2 of an access unit with RTP timestamp TS_A2 2219 is present in base RTP session A, and that no packet of that access 2220 unit is present in enhancement RTP session B, as shown in Figure 5. 2221 Thus a packet B2 must be inserted into session B following the rule 2222 above. The most recent RTCP sender report in session A carries NTP 2223 timestamp NTP_A and the RTP timestamp TS_A. The sender report in 2224 session B with a lower NTP timestamp than NTP_A is NTP_B, and 2225 carries the RTP timestamp TS_B. 2227 RTP session B:..B0........B1........(B2)...................... 2229 RTCP session B:.....SR(NTP_B,TS_B)............................. 2231 RTP session A:..A0........A1........A2........................ 2233 RTCP session A:..................SR(NTP_A,TS_A)................ 2235 -----------------|--x------|-----x---|------------------------> 2236 NTP time 2237 --------------------+<---------->+<->+------------------------> 2238 t1 t2 RTP TS(B) time 2240 Figure 5 Example calculation of RTP timestamp for packet 2241 insertion in an enhancement layer RTP session 2243 The vertical bars ("|")in the NTP timeline in the figure above 2244 indicate that access unit data is present in at least one of the 2245 sessions. The "x" marks indicate the times of the sender reports. 
2246 The RTP timestamp time line for session B, shown right below the NTP 2247 time line, indicates two time segments, t1 and t2. t1 is the time 2248 difference between the sender reports of the two sessions, 2249 expressed in RTP timestamp clock ticks, and t2 is the time 2250 difference from the session A sender report to the A2 packet, again 2251 expressed in RTP timestamp clock ticks. The sum of these differences 2252 is added to the RTP timestamp of the sender report from session B 2253 in order to derive the correct RTP timestamp for the inserted packet 2254 B2. In other words: 2256 TS_B2 = TS_B + t1 + t2 2258 Let toRTP() be a function that calculates the RTP time difference 2259 (in clock ticks of the used clock) given an NTP timestamp difference, 2260 and effRTPdiff() be a function that calculates the effective 2261 difference between two timestamps, including wraparounds: 2263 effRTPdiff( ts1, ts2 ): 2265 if( ts1 >= ts2 ) then 2266 effRTPdiff := ts1 - ts2 2267 else 2268 effRTPdiff := (4294967296 + ts1) - ts2 2270 We have: 2272 t1 = toRTP(NTP_A - NTP_B) and t2 = effRTPdiff(TS_A2, TS_A) 2274 Hence, the RTP timestamp TS_B2 for the inserted 2275 packet B2 can be calculated 2276 as follows. 2278 TS_B2 = TS_B + toRTP(NTP_A - NTP_B) + effRTPdiff(TS_A2, TS_A) 2280 5.2.2 NI-C/NI-TC Packetization Rules 2282 When the NI-C or NI-TC MST mode is in use, the following applies for 2283 each of the RTP sessions. 2285 o For each single NAL unit packet containing a non-PACSI NAL unit, 2286 the previous packet, if present, MUST have the same RTP timestamp 2287 as the single NAL unit packet, and the following applies. 2289 o If the NALU-time of the non-PACSI NAL unit is not equal to 2290 the NALU-time of the previous non-PACSI NAL unit in decoding 2291 order, the previous packet MUST contain a PACSI NAL unit 2292 containing the DONC field.
2294 o In an STAP-A packet the first NAL unit in the STAP-A packet MUST 2295 be a PACSI NAL unit containing the DONC field. 2297 o For an FU-A packet the previous packet MUST have the same RTP 2298 timestamp as the FU-A packet, and the following applies. 2300 o If the FU-A packet is the start of the fragmented NAL unit, 2301 the following applies. 2303 o If the NALU-time of the fragmented NAL unit is not 2304 equal to the NALU-time of the previous non-PACSI NAL 2305 unit in decoding order, the previous packet MUST 2306 contain a PACSI NAL unit containing the DONC field; 2308 o Otherwise (the NALU-time of the fragmented NAL unit is 2309 equal to the NALU-time of the previous non-PACSI NAL 2310 unit in decoding order), the previous packet MAY 2311 contain a PACSI NAL unit containing the DONC field. 2313 o Otherwise if the FU-A packet is the end of the fragmented 2314 NAL unit, the following applies. 2316 o If the next non-PACSI NAL unit in decoding order has 2317 NALU-time equal to the NALU-time of the fragmented NAL 2318 unit, and is carried in a number of FU-A packets or a 2319 single NAL unit packet, the next packet MUST be a 2320 single NAL unit packet containing a PACSI NAL unit 2321 containing the DONC field. 2323 o Otherwise (the FU-A packet is neither the start nor the 2324 end of the fragmented NAL unit), the previous packet 2325 MUST be a FU-A packet. 2327 o For each single NAL unit packet containing a PACSI NAL unit, if 2328 present, the PACSI NAL unit MUST contain the DONC field. 2330 o When the optional media type parameter sprop-mst-csdon-always- 2331 present is equal to 1, the session packetization mode in use MUST 2332 be the Non-Interleaved Mode, and only STAP-A and NI-MTAP packets 2333 can be used. 2335 5.2.3 I-C Packetization Rules 2337 When the I-C MST packetization mode is in use, the following applies. 
2339 o When a PACSI NAL unit is present, the T bit MUST be equal to 0, 2340 i.e., the DONC field is not present, and the Y bit MUST be equal 2341 to 0, i.e., the TL0PICIDX and IDRPICID are not present. 2343 5.2.4 Packetization Rules for Non-VCL NAL Units 2345 NAL units which do not directly encode video slices are known in 2346 H.264 as non-VCL NAL units. Non-VCL units that are only used by, or 2347 only relevant to, enhancement RTP sessions SHOULD be sent in the 2348 lowest session to which they are relevant. 2350 Some senders, however, such as those sending pre-encoded data, may 2351 be unable to easily determine which non-VCL units are relevant to 2352 which session. Thus, non-VCL NAL units MAY, instead, be sent in a 2353 session that the session using these non-VCL NAL units depends on 2354 (e.g., the base RTP session). 2356 If a non-VCL unit is relevant to more than one RTP session, neither 2357 of which depends on the other(s), the NAL unit MAY be sent in 2358 another session which all these sessions depend on. 2360 5.2.5 Packetization Rules for Prefix NAL Units 2362 Section 5.1 of this memo applies, with the following addition. If 2363 the base layer is sent in a base RTP session using [I-D.ietf-avt- 2364 rtp-rfc3984bis], prefix NAL units MAY be sent in the lowest 2365 enhancement RTP session rather than in the base RTP session. 2367 6. De-Packetization Process 2369 6.1 De-Packetization Process for Single-Session Transmission 2371 For single-session transmission, where a single RTP session is used, 2372 the de-packetization process specified in Section 7 of [I-D.ietf- 2373 avt-rtp-rfc3984bis] applies. 2375 6.2 De-Packetization Process for Multi-Session Transmission 2377 For multi-session transmission, where more than one RTP session is 2378 used to receive data from the same SVC bitstream, the de- 2379 packetization process is specified as follows. 
2381 As for a single RTP session, the general concept behind the de- 2382 packetization process is to reorder NAL units from transmission 2383 order to the NAL unit decoding order. 2385 The sessions to be received MUST be identified by mechanisms 2386 specified in Section 7.2.3. An enhancement RTP session typically 2387 contains an RTP stream that depends on at least one other RTP 2388 session, as indicated by mechanisms defined in Section 7.2.3. A 2389 lower RTP session to an enhancement RTP session is an RTP session 2390 which the enhancement RTP session depends on. The lowest RTP 2391 session for a receiver is the base RTP session, which does not 2392 depend on any other RTP session received by the receiver. The 2393 highest RTP session for a receiver is the RTP session which no other 2394 RTP session received by the receiver depends on. 2396 For each of the RTP sessions, the RTP reception process as specified 2397 in RFC 3550 is applied. Then the received packets are passed into 2398 the payload de-packetization process as defined in this memo. 2400 The decoding order of the NAL units carried in all the associated 2401 RTP sessions is then recovered by applying one of the following 2402 subsections, depending on which of the MST packetization modes is in 2403 use. 2405 6.2.1 Decoding Order Recovery for the NI-T and NI-TC Modes 2407 The following process MUST be applied when the NI-T packetization 2408 mode is in use. The following process MAY be applied when the NI-TC 2409 packetization mode is in use. 2411 The process is based on RTP session dependency signaling, RTP 2412 sequence numbers, and timestamps. 2414 The decoding order of NAL units within an RTP packet stream in RTP 2415 session is given by the ordering of sequence numbers SN of the RTP 2416 packets that contain the NAL units, and the order of appearance of 2417 NAL units within a packet. 2419 Timing information according to the media timestamp TS, i.e. 
the NTP 2420 timestamp as derived from the RTP timestamp of an RTP packet, is 2421 associated with all NAL units contained in the same RTP packet 2422 received in an RTP session. 2424 For NI-MTAP packets the NALU-time is derived for each contained NAL 2425 unit by using the "TS offset" value in the NI-MTAP packet as defined 2426 in Section 4.10, and is used instead of the RTP packet timestamp to 2427 derive the media timestamp, e.g., using the NTP wall clock as 2428 provided via RTCP sender reports. NAL units contained in 2429 fragmentation packets are handled as defragmented, entire NAL units 2430 with their own media timestamps. All NAL units associated with the 2431 same value of media timestamp TS are part of the same access unit 2432 AU(TS). Any Empty NAL units SHOULD be kept as, effectively, access 2433 unit indicators in the re-ordering process. Empty NAL units and 2434 PACSI NAL units SHOULD be removed before passing access unit data to 2435 the decoder. 2437 Informative note: These Empty NAL units are used to associate 2438 NAL units present in other RTP sessions with RTP sessions not 2439 containing any data for an access unit of a particular time 2440 instance. They act as access unit indicators in sessions that 2441 would otherwise contain no data for the particular access unit. 2442 The presence of these NAL units is ensured by the 2443 packetization rules in Section 5.2.1. 2445 It is assumed that the receiver has established an operation point 2446 (DID, QID, and TID values), and has identified the highest 2447 enhancement RTP session for this operation point. The decoding 2448 order of NAL units from multiple RTP streams in multiple RTP 2449 sessions MUST be recovered into a single sequence of NAL units, 2450 grouped into access units, by performing any process equivalent to 2451 the following steps. The general process is described in Section 4.2 2452 of [RFC6051]. 
For convenience the instructions of [RFC6051] are 2453 repeated and applied to NAL units rather than to full RTP packets. 2454 Additionally SVC specific extensions to the procedure in Section 4.2. 2455 of [RFC6051] are presented in the following list: 2457 o The process should be started with the NAL units received in 2458 the highest RTP session with the first media timestamp TS (in 2459 NTP format) available in the session's (de-jittering) buffer. 2460 It is assumed, that packets in the de-jittering buffer are 2461 already stored in RTP sequence number order. 2463 o Collect all NAL units associated with the same value of media 2464 timestamp TS, starting from the highest RTP session, from all 2465 the (de-jittering) buffers of the received RTP sessions. The 2466 collected NAL units will be those associated with the access 2467 unit AU(TS). 2469 o Place the collected NAL units in the order of session 2470 dependency as derived by the dependency indication as 2471 specified in Section 7.2.3, starting from the lowest RTP 2472 session. 2474 o Place the session ordered NAL units in decoding order within 2475 the particular access unit by satisfying the NAL unit 2476 ordering rules for SVC access units, as described in the 2477 informative algorithm provided in Section 6.2.1.1. 2479 o Remove NI-MTAP and any PACSI NAL units from the access unit 2480 AU(TS). 2482 o The access units can then be transferred to the decoder. 2483 Access units AU(TS) are transferred to the decoder in the 2484 order of appearance (given by the order of RTP sequence 2485 numbers) of media timestamp values TS in the highest RTP 2486 session associated with access unit AU(TS). 2488 Informative Note: Due to packet loss it is possible that 2489 not all sessions may have NAL units present for the media 2490 timestamp value TS present in the highest RTP session. 
In 2491 such a case an algorithm may: 2492 a) proceed to the next complete access unit with NAL units 2493 present in all the received RTP sessions; or 2494 b) consider a new highest RTP session, the highest RTP 2495 session for which the access unit is complete, and apply 2496 the process above. The algorithm may return to the 2497 original highest RTP session when a complete and error-free 2498 access unit that contains NAL units in all the sessions is 2499 received. 2501 The following gives an informative example. 2503 The example shown in Figure 6 refers to three RTP sessions A, B and 2504 C containing an SVC bitstream transmitted as 3 sources. In the 2505 example, the dependency signaling (described in Section 7.2.3) 2506 indicates that session A is the base RTP session, B is the first 2507 enhancement RTP session and depends on A, and C is the second 2508 enhancement RTP session and depends on A and B. A hierarchical 2509 picture coding prediction structure is used, in which Session A has 2510 the lowest frame rate and Session B and C have the same but higher 2511 frame rate. 2513 The figure shows NAL units contained in RTP packets which are stored 2514 in the de-jittering buffer at the receiver for session de- 2515 packetization. The NAL units are already re-ordered according to 2516 their RTP sequence number order and, if within an aggregation packet, 2517 according to the order of their appearance within the aggregation 2518 packet. The figure indicates for the received NAL units the 2519 decoding order within the sessions, as well as the associated media 2520 (NTP) timestamps ("TS[..]"). NAL units of the same access unit 2521 within a session are grouped by "(.,.)" and share the same media 2522 timestamp TS, which is shown at the bottom of the figure. Note that 2523 the timestamps are not in increasing order since, in this example, 2524 the decoding order is different from the output/display order. 
2526 The process first proceeds to the NAL units associated with the 2527 first media timestamp TS[1] present in the highest session C and 2528 removes/ignores all preceding (in decoding order) NAL units to NAL 2529 units with TS[1] in each of the de-jittering buffers of RTP sessions 2530 A, B, and C. Then, starting from session C, the first media 2531 timestamp available in decoding order (TS [1]) is selected and NAL 2532 units starting from RTP session A, and sessions B and C are placed 2533 in order of the RTP session dependency as required by Section 7.2.3 2534 of this memo (in the example for TS[1]: first session B and then 2535 session C) into the access unit AU(TS[1]) associated with media 2536 timestamp TS[1]. Then the next media timestamp TS[3] in order of 2537 appearance in the highest RTP session C is processed and the process 2538 described above is repeated. Note that there may be access units 2539 with no NAL units present, e.g., in the lowest RTP session A (see, 2540 e.g., TS[1]). With TS[8], the first access unit with NAL units 2541 present in all the RTP sessions appears in the buffers. 
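The cross-session part of the process walked through above can be sketched as follows. This Python is illustrative only and rests on several assumptions: each session buffer is modeled as a list of (media timestamp, kind, label) tuples already in RTP sequence number order, the sessions are supplied from the lowest to the highest according to the dependency signaling, and the names recover_access_units, "empty" and "pacsi" are hypothetical. The ordering of NAL units inside an access unit (Section 6.2.1.1) is not modeled; units are simply concatenated in session dependency order.

```python
def recover_access_units(sessions, helper_kinds=("empty", "pacsi")):
    """Group NAL units from several RTP sessions into access units.

    sessions: list of per-session buffers, lowest (base) session first and
              highest session last. Each buffer is a list of
              (media_ts, kind, label) tuples in session decoding order.
    Returns a list of (media_ts, [labels]) in the order in which the media
    timestamps first appear in the highest session.
    """
    highest = sessions[-1]
    # dict.fromkeys keeps first-appearance order and drops repeated values.
    ts_order = list(dict.fromkeys(ts for ts, _kind, _label in highest))

    access_units = []
    for ts in ts_order:
        # Collect from the lowest session upwards, dropping helper NAL units
        # (Empty and PACSI) that must not reach the decoder.
        units = [label
                 for buf in sessions
                 for nal_ts, kind, label in buf
                 if nal_ts == ts and kind not in helper_kinds]
        access_units.append((ts, units))
    return access_units


# Small usage example with three sessions A (base), B and C (highest).
session_a = [(4, "vcl", "A1")]
session_b = [(4, "vcl", "B1"), (2, "vcl", "B2")]
session_c = [(4, "empty", "C-empty"), (2, "vcl", "C1")]
result = recover_access_units([session_a, session_b, session_c])
```

Because only timestamps present in the highest session drive the loop, NAL units in lower sessions whose timestamps never appear in the highest session are ignored, in line with the removal of preceding NAL units described above.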
2543 C: ------------(1,2)-(3,4)--(5)---(6)---(7,8)(9,10)-(11)--(12)---- 2544 | | | | | | | | | | 2545 B: -(1,2)-(3,4)-(5)---(6)--(7,8)-(9,10)-(11)-(12)--(13,14)(15,15)- 2546 | | | | | | 2547 A: -------(1)---------------(2)---(3)---------------(4)----(5)---- 2548 ---------------------------------------------------decoding order--> 2550 TS: [4] [2] [1] [3] [8] [6] [5] [7] [12] [10] 2552 Key: 2553 A, B, C - RTP sessions 2554 Integer values in "()" - NAL unit decoding order within RTP session 2555 "( )" - groups the NAL units of an access unit 2556 in an RTP session 2557 "|" - indicates corresponding NAL units of the 2558 same access unit AU(TS[..]) in the RTP 2559 sessions 2560 Integer values in "[]" - media timestamp TS, sampling time 2561 as derived, e.g., from NTP timestamp 2562 associated with the access unit AU(TS[..]), 2563 consisting of NAL units in the sessions 2564 above each TS value. 2566 Figure 6 Example of decoding order recovery in multi-source 2567 transmission. 2569 6.2.1.1 Informative Algorithm for NI-T Decoding Order Recovery within 2570 an Access Unit 2572 Within an access unit, the [H.264] specification (Sections 7.4.1.2.3 2573 and G.7.4.1.2.3) constrains the valid decoding order of NAL units. 2575 These constraints make it possible to reconstruct a valid decoding 2576 order for the NAL units of an access unit based only on the order of 2577 NAL units in each session, the NAL unit headers, and Supplemental 2578 Enhancement Information message headers. 2580 This section specifies an informative algorithm to reconstruct a 2581 valid decoding order for NAL units within an access unit. Other NAL 2582 unit orderings may also be valid; however, any compliant NAL unit 2583 ordering will describe the same video stream and ancillary data as 2584 the one produced by this algorithm. 2586 An actual implementation, of course, needs only to behave "as if" 2587 this reordering is done. 
In particular, NAL units which are 2588 discarded by an implementation's decoding process do not need to be 2589 reordered. 2591 In this algorithm, NAL units within an access unit are first ordered 2592 by NAL unit type, in the order specified in Table 12 below, except 2593 for NAL unit type 14, which is handled specially as described in the 2594 table. NAL units of the same type are then ordered as specified for 2595 the type, if necessary. 2597 For the purposes of this algorithm, "session order" is the order of 2598 NAL units implied by their transmission order within an RTP session. 2599 For the Non-Interleaved and Single NAL unit modes, this is the RTP 2600 sequence number order coupled with the order of NAL units within an 2601 aggregation unit. 2603 Table 12. Ordering of NAL unit types within an Access Unit 2605 Type Description / Comments 2606 ----------------------------------------------------------- 2607 9 Access unit delimiter 2609 7 Sequence parameter set 2611 13 Sequence parameter set extension 2613 15 Subset sequence parameter set 2615 8 Picture parameter set 2617 16-18 Reserved 2619 6 Supplemental enhancement information (SEI) 2620 If an SEI message with a first payload of 0 (Buffering 2621 Period) is present, it must be the first SEI message. 2623 If SEI messages with a Scalable Nesting (30) payload and 2624 a nested payload of 0 (Buffering Period) are present, 2625 these then follow the first SEI message. Such an SEI 2626 message with the all_layer_representations_in_au_flag 2627 equal to 1 is placed first, followed by any others, 2628 sorted in increasing order of DQId. 2630 All other SEI messages follow in any order. 2632 14 Prefix NAL unit in scalable extension 2633 1 Coded slice of a non-IDR picture 2634 5 Coded slice of an IDR picture 2636 NAL units of type 1 or 5 will be sent within only a 2637 single session for any given access unit. They are 2638 placed in session order.
(Note: Any given access unit 2639 will contain only NAL units of type 1 or type 5, not 2640 both.) 2642 If NAL units of type 14 are present, every NAL unit of 2643 type 1 or 5 is prefixed by a NAL unit of type 14. (Note: 2644 Within an access unit, every NAL unit of type 14 is 2645 identical, so correlation of type 14 NAL units with the 2646 other NAL units is not necessary.) 2648 12 Filler data 2650 The only restriction of filler data NAL units within an 2651 access unit is that they shall not precede the first VCL 2652 NAL unit with the same access unit. 2654 19 Coded slice of an auxiliary coded picture without 2655 partitioning 2657 These NAL units will be sent within only a single 2658 session for any given access unit, and are placed in 2659 session order. 2661 20 Coded slice in scalable extension 2662 21-23 Reserved 2664 Type 20 NAL units are placed in increasing order of DQId. 2665 Within each DQId value, they are placed in session order. 2667 (Note: SVC slices with a given DQId value will be sent 2668 within only a single session for any given access unit.) 2670 Type 21-23 NAL units are placed immediately following 2671 the non-reserved-type VCL NAL unit they follow in 2672 session order. 2674 10 End of sequence 2676 11 End of stream 2678 6.2.2 Decoding Order Recovery for the NI-C, NI-TC and I-C Modes 2680 The following process MUST be used when either the NI-C or I-C MST 2681 packetization mode is in use. The following process MAY be applied 2682 when the NI-TC MST packetization mode is in use. 2684 The RTP packets output from the RTP-level reception processing for 2685 each session are placed into a re-multiplexing buffer. 2687 It is RECOMMENDED to set the size of the re-multiplexing buffer (in 2688 bytes) equal to or greater than the value of the sprop-remux-buf-req 2689 media type parameter of the highest RTP session the receiver 2690 receives. 2692 The CS-DON value is calculated and stored for each NAL unit. 
2694 Informative note: The CS-DON value of a NAL unit may rely on 2695 information carried in another packet than the packet 2696 containing the NAL unit. This happens, e.g., when the CS-DON 2697 values need to be derived for non-PACSI NAL units contained in 2698 single NAL unit packets, as the single NAL unit packets 2699 themselves do not contain CS-DON information. In this case, 2700 when no packet containing required CS-DON information is 2701 received for a NAL unit, this NAL unit has to be discarded by 2702 the receiver as it cannot be fed to the decoder in the correct 2703 order. When the optional media type parameter sprop-mst-csdon- 2704 always-present is equal to 1, no such dependency exists, i.e., 2705 the CS-DON value of any particular NAL unit can be derived 2706 solely according to information in the packet containing the 2707 NAL unit, and therefore, the receiver does not need to discard 2708 any received NAL units. 2710 The receiver operation is described below with the help of the 2711 following functions and constants: 2713 o Function AbsDON is specified in Section 8.1 of [I-D.ietf-avt-rtp- 2714 rfc3984bis]. 2716 o Function don_diff is specified in Section 5.5 of [I-D.ietf-avt- 2717 rtp-rfc3984bis]. 2719 o Constant N is the value of the OPTIONAL sprop-mst-remux-buf-size 2720 media type parameter of the highest RTP session incremented by 1. 2722 Initial buffering lasts until one of the following conditions is 2723 fulfilled: 2725 o There are N or more VCL NAL units in the re-multiplexing buffer. 2727 o If sprop-mst-max-don-diff of the highest RTP session is present, 2728 don_diff(m,n) is greater than the value of sprop-mst-max-don-diff 2729 of the highest RTP session, where n corresponds to the NAL unit 2730 having the greatest value of AbsDON among the received NAL units 2731 and m corresponds to the NAL unit having the smallest value of 2732 AbsDON among the received NAL units. 
2734 o Initial buffering has lasted for a duration equal to or greater 2735 than the value of the OPTIONAL sprop-remux-init-buf-time media 2736 type parameter of the highest RTP session. 2738 The NAL units to be removed from the re-multiplexing buffer are 2739 determined as follows: 2741 o If the re-multiplexing buffer contains at least N VCL NAL units, 2742 NAL units are removed from the re-multiplexing buffer and passed 2743 to the decoder in the order specified below until the buffer 2744 contains N-1 VCL NAL units. 2746 o If sprop-mst-max-don-diff of the highest RTP session is present, 2747 all NAL units m for which don_diff(m,n) is greater than sprop- 2748 mst-max-don-diff of the highest RTP session are removed from the re- 2749 multiplexing buffer and passed to the decoder in the order 2750 specified below. Herein, n corresponds to the NAL unit having 2751 the greatest value of AbsDON among the NAL units in the re- 2752 multiplexing buffer. 2754 The order in which NAL units are passed to the decoder is specified 2755 as follows: 2757 o Let PDON be a variable that is initialized to 0 at the beginning 2758 of the RTP sessions. 2760 o For each NAL unit associated with a value of CS-DON, a CS-DON 2761 distance is calculated as follows. If the value of CS-DON of the 2762 NAL unit is larger than the value of PDON, the CS-DON distance is 2763 equal to CS-DON - PDON. Otherwise, the CS-DON distance is equal 2764 to 65535 - PDON + CS-DON + 1. 2766 o NAL units are delivered to the decoder in increasing order of CS- 2767 DON distance. If several NAL units share the same value of CS- 2768 DON distance, they can be passed to the decoder in any order. 2770 o When a desired number of NAL units have been passed to the 2771 decoder, the value of PDON is set to the value of CS-DON for the 2772 last NAL unit passed to the decoder. 2774 7.
Payload Format Parameters 2776 This section specifies the parameters that MAY be used to select 2777 optional features of the payload format and certain features of the 2778 bitstream. The parameters are specified here as part of the media 2779 type registration for the SVC codec. A mapping of the parameters 2780 into the Session Description Protocol (SDP) [RFC4566] is also 2781 provided for applications that use SDP. Equivalent parameters could 2782 be defined elsewhere for use with control protocols that do not use 2783 SDP. 2785 Some parameters provide a receiver with the properties of the stream 2786 that will be sent. The names of all these parameters start with 2787 "sprop" for stream properties. Some of these "sprop" parameters are 2788 limited by other payload or codec configuration parameters. For 2789 example, the sprop-parameter-sets parameter is constrained by the 2790 profile-level-id parameter. The media sender selects all "sprop" 2791 parameters rather than the receiver. This uncommon characteristic 2792 of the "sprop" parameters may be incompatible with some signaling 2793 protocol concepts, in which case the use of these parameters SHOULD 2794 be avoided. 2796 7.1 Media Type Registration 2798 The media subtype for the SVC codec is allocated from the IETF tree. 2800 The receiver MUST ignore any unspecified parameter. 2802 Informative note: Requiring that the receiver ignores unspecified 2803 parameters allows for backward compatibility of future extensions. 2804 For example, if a future specification that is backward 2805 compatible to this specification specifies some new parameters, 2806 then a receiver according to this specification is capable of 2807 receiving data per the new payload but ignoring those parameters 2808 newly specified in the new payload specification. This provision 2809 is also present in [I-D.ietf-avt-rtp-rfc3984bis]. 
2811 Media Type name: video 2813 Media subtype name: H264-SVC 2815 Required parameters: none 2817 OPTIONAL parameters: 2819 In the following definitions of parameters, "the stream" or "the 2820 NAL unit stream" refers to all NAL units conveyed in the current 2821 RTP session in SST, and all NAL units conveyed in the current RTP 2822 session and all NAL units conveyed in other RTP sessions that the 2823 current RTP session depends on in MST. 2825 profile-level-id: 2826 A base16 [RFC4648] (hexadecimal) representation of the 2827 following three bytes in the sequence parameter set or subset 2828 sequence parameter set NAL unit specified in [H.264]: 1) 2829 profile_idc, 2) a byte herein referred to as profile-iop, 2830 composed of the values of constraint_set0_flag, 2831 constraint_set1_flag, constraint_set2_flag, 2832 constraint_set3_flag, and reserved_zero_4bits positioned 2833 starting from the most significant bit towards the least 2834 significant bit (bit positions 7 through 4), and 3) level_idc. 2835 Note that reserved_zero_4bits is required to be equal to 0 in 2836 [H.264], but other values for it may be specified in the 2837 future by ITU-T or ISO/IEC. 2839 The profile-level-id parameter indicates the default sub- 2840 profile, i.e., the subset of coding tools that may have been 2841 used to generate the stream or that the receiver supports, and 2842 the default level of the stream or the one that the receiver 2843 supports. 2845 The default sub-profile is indicated collectively by the 2846 profile_idc byte and some fields in the profile-iop byte. 2847 Depending on the values of the fields in the profile-iop byte, 2848 the default sub-profile may be the same set of coding tools 2849 supported by one profile, or a common subset of coding tools 2850 of multiple profiles, as specified in subsection G.7.4.2.1.1 2851 of [H.264]. 
The default level is indicated by the level_idc 2852 byte, and, when profile_idc is equal to 66, 77 or 88 (the 2853 Baseline, Main, or Extended profile) and level_idc is equal to 2854 11, additionally by bit 4 (constraint_set3_flag) of the 2855 profile-iop byte. When profile_idc is equal to 66, 77 or 88 2856 (the Baseline, Main, or Extended profile) and level_idc is 2857 equal to 11, and bit 4 (constraint_set3_flag) of the profile- 2858 iop byte is equal to 1, the default level is level 1b. 2860 Table 13 lists all profiles defined in Annex A and Annex G of 2861 [H.264] and, for each of the profiles, the possible 2862 combinations of profile_idc and profile-iop that represent the 2863 same sub-profile. 2865 Table 13. Combinations of profile_idc and profile-iop 2866 representing the same sub-profile corresponding to the full 2867 set of coding tools supported by one profile. In the 2868 following, x may be either 0 or 1, while the profile names 2869 are indicated as follows. CB: Constrained Baseline profile, 2870 B: Baseline profile, M: Main profile, E: Extended profile, 2871 H: High profile, H10: High 10 profile, H42: High 4:2:2 2872 profile, H44: High 4:4:4 Predictive profile, H10I: High 10 2873 Intra profile, H42I: High 4:2:2 Intra profile, H44I: High 2874 4:4:4 Intra profile, C44I: CAVLC 4:4:4 Intra profile, SB: 2875 Scalable Baseline profile, SH: Scalable High profile, and 2876 SHI: Scalable High Intra profile. 
2878 Profile       profile_idc        profile-iop
2879               (hexadecimal)      (binary)

2881 CB            42 (B)             x1xx0000
2882    same as:   4D (M)             1xxx0000
2883    same as:   58 (E)             11xx0000
2884 B             42 (B)             x0xx0000
2885    same as:   58 (E)             10xx0000
2886 M             4D (M)             0x0x0000
2887 E             58                 00xx0000
2888 H             64                 00000000
2889 H10           6E                 00000000
2890 H42           7A                 00000000
2891 H44           F4                 00000000
2892 H10I          6E                 00010000
2893 H42I          7A                 00010000
2894 H44I          F4                 00010000
2895 C44I          2C                 00010000
2896 SB            53                 x0000000
2897 SH            56                 0x000000
2898 SHI           56                 0x010000

2900 For example, in the table above, profile_idc equal to 58 2901 (Extended) with profile-iop equal to 11xx0000 indicates the 2902 same sub-profile as profile_idc equal to 42 2903 (Baseline) with profile-iop equal to x1xx0000. Note that 2904 other combinations of profile_idc and profile-iop (not listed 2905 in Table 13) may represent a sub-profile equivalent to the 2906 common subset of coding tools for more than one profile. Note 2907 also that a decoder conforming to a certain profile may be 2908 able to decode bitstreams conforming to other profiles. For 2909 example, a decoder conforming to the High 4:4:4 profile at a 2910 certain level must be able to decode bitstreams conforming to 2911 the Constrained Baseline, Main, High, High 10 or High 4:2:2 2912 profile at the same or a lower level. 2914 If profile-level-id is used to indicate stream properties, it 2915 indicates that, to decode the stream, the minimum subset of 2916 coding tools a decoder has to support is the default sub- 2917 profile, and the lowest level the decoder has to support is 2918 the default level. 2920 If the profile-level-id parameter is used for capability 2921 exchange or session setup, it indicates the subset of coding 2922 tools, which is equal to the default sub-profile, that the 2923 codec supports for both receiving and sending. If max-recv- 2924 level is not present, the default level from profile-level-id 2925 indicates the highest level the codec wishes to support.
If 2926 max-recv-level is present, it indicates the highest level the 2927 codec supports for receiving. For either receiving or sending, 2928 all levels that are lower than the highest level supported 2929 MUST also be supported. 2931 Informative note: Capability exchange and session setup 2932 procedures should provide means to list the capabilities 2933 for each supported sub-profile separately. For example, 2934 the one-of-N codec selection procedure of the SDP 2935 Offer/Answer model can be used (Section 10.2 of [RFC3264]). 2936 The one-of-N codec selection procedure may also be used to 2937 provide different combinations of profile_idc and profile- 2938 iop that represent the same sub-profile. When there are 2939 many different combinations of profile_idc and profile-iop 2940 that represent the same sub-profile, using the one-of-N 2941 codec selection procedure may result in a fairly large 2942 SDP message. Therefore, a receiver should understand the 2943 different equivalent combinations of profile_idc and 2944 profile-iop that represent the same sub-profile, and be 2945 ready to accept an offer using any of the equivalent 2946 combinations. 2948 If no profile-level-id is present, the Baseline Profile 2949 without additional constraints at Level 1 MUST be implied. 2951 max-recv-level: 2952 This parameter MAY be used to indicate the highest level a 2953 receiver supports when the highest level is higher than the 2954 default level (the level indicated by profile-level-id). The 2955 value of max-recv-level is a base16 (hexadecimal) 2956 representation of the two bytes after the syntax element 2957 profile_idc in the sequence parameter set NAL unit specified 2958 in [H.264]: profile-iop (as defined above) and level_idc.
If 2959 (the level_idc byte of max-recv-level is equal to 11 and bit 4 2960 of the profile-iop byte of max-recv-level is equal to 1) or 2961 (the level_idc byte of max-recv-level is equal to 9 and bit 4 2962 of the profile-iop byte of max-recv-level is equal to 0), the 2963 highest level the receiver supports is level 1b. Otherwise, 2964 the highest level the receiver supports is equal to the 2965 level_idc byte of max-recv-level divided by 10. 2967 max-recv-level MUST NOT be present if the highest level the 2968 receiver supports is not higher than the default level. 2970 max-recv-base-level: 2971 This parameter MAY be used to indicate the highest level a 2972 receiver supports for the base layer when negotiating an SVC 2973 stream. The value of max-recv-base-level is a base16 2974 (hexadecimal) representation of the two bytes after the syntax 2975 element profile_idc in the sequence parameter set NAL unit 2976 specified in [H.264]: profile-iop (as defined above) and 2977 level_idc. If (the level_idc byte of max-recv-base-level is equal 2978 to 11 and bit 4 of the profile-iop byte of max-recv-base-level is 2979 equal to 1) or (the level_idc byte of max-recv-base-level is equal 2980 to 9 and bit 4 of the profile-iop byte of max-recv-base-level is 2981 equal to 0), the highest level the receiver supports for the 2982 base layer is level 1b. Otherwise, the highest level the 2983 receiver supports for the base layer is equal to the level_idc 2984 byte of max-recv-base-level divided by 10. 2986 max-mbps, max-fs, max-cpb, max-dpb, and max-br: 2987 The common properties of these parameters are specified in [I- 2988 D.ietf-avt-rtp-rfc3984bis]. 2990 max-mbps: This parameter is as specified in [I-D.ietf-avt-rtp- 2991 rfc3984bis]. 2993 max-fs: This parameter is as specified in [I-D.ietf-avt-rtp- 2994 rfc3984bis].
2996 max-cpb: The value of max-cpb is an integer indicating the 2997 maximum coded picture buffer size in units of 1000 bits for 2998 the VCL HRD parameters (see A.3.1 item i or G.10.2.2 item g of 2999 [H.264]) and in units of 1200 bits for the NAL HRD parameters 3000 (see A.3.1 item j or G.10.2.2 item h of [H.264]). The max-cpb 3001 parameter signals that the receiver has more memory than the 3002 minimum amount of coded picture buffer memory required by the 3003 signaled highest level conveyed in the value of the profile- 3004 level-id parameter or the max-recv-level parameter. When max- 3005 cpb is signaled, the receiver MUST be able to decode NAL unit 3006 streams that conform to the signaled highest level, with the 3007 exception that the MaxCPB value in Table A-1 of [H.264] for 3008 the signaled highest level is replaced with the value of max- 3009 cpb. The value of max-cpb MUST be greater than or equal to 3010 the value of MaxCPB given in Table A-1 of [H.264] for the 3011 highest level. Senders MAY use this knowledge to construct 3012 coded video streams with greater variation of bit rate than 3013 can be achieved with the MaxCPB value in Table A-1 of [H.264]. 3015 Informative note: The coded picture buffer is used in the 3016 Hypothetical Reference Decoder (HRD, Annex C) of [H.264]. 3017 The use of the HRD is recommended in SVC encoders to verify 3018 that the produced bitstream conforms to the standard and to 3019 control the output bit rate. Thus, the coded picture 3020 buffer is conceptually independent of any other potential 3021 buffers in the receiver, including de-interleaving, re- 3022 multiplexing and de-jitter buffers. The coded picture 3023 buffer need not be implemented in decoders as specified in 3024 Annex C of [H.264]; standard-compliant decoders can have 3025 any buffering arrangements provided that they can decode 3026 standard-compliant bitstreams. 
Thus, in practice, the 3027 input buffer for video decoder can be integrated with the 3028 de-interleaving, re-multiplexing and de-jitter buffers of 3029 the receiver. 3031 max-dpb: This parameter is as specified in [I-D.ietf-avt-rtp- 3032 rfc3984bis]. 3034 max-br: The value of max-br is an integer indicating the maximum 3035 video bit rate in units of 1000 bits per second for the VCL 3036 HRD parameters (see A.3.1 item i or G.10.2.2 item g of [H.264]) 3037 and in units of 1200 bits per second for the NAL HRD 3038 parameters (see A.3.1 item j or G.10.2.2 item h of [H.264]). 3040 The max-br parameter signals that the video decoder of the 3041 receiver is capable of decoding video at a higher bit rate 3042 than is required by the signaled highest level conveyed in the 3043 value of the profile-level-id parameter or the max-recv-level 3044 parameter. 3046 When max-br is signaled, the video codec of the receiver MUST 3047 be able to decode NAL unit streams that conform to the 3048 signaled highest level, with the following exceptions in the 3049 limits specified by the highest level: 3051 o The value of max-br replaces the MaxBR value in Table A-1 3052 of [H.264] for the highest level. 3054 o When the max-cpb parameter is not present, the result of 3055 the following formula replaces the value of MaxCPB in Table 3056 A-1 of [H.264]: (MaxCPB of the signaled level) * max-br / 3057 (MaxBR of the signaled highest level). 3059 For example, if a receiver signals capability for Level 1.2 3060 with max-br equal to 1550, this indicates a maximum video 3061 bitrate of 1550 kbits/sec for VCL HRD parameters, a maximum 3062 video bitrate of 1860 kbits/sec for NAL HRD parameters, and a 3063 CPB size of 4036458 bits (1550000 / 384000 * 1000 * 1000). 3065 The value of max-br MUST be greater than or equal to the value 3066 MaxBR given in Table A-1 of [H.264] for the signaled highest 3067 level. 
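Informative sketch (not part of this specification): the arithmetic behind the Level 1.2 example above can be reproduced in Python roughly as follows. The function name max_br_limits and its argument names are hypothetical. The level limits are passed in by the caller; the values 384 (MaxBR) and 1000 (MaxCPB) used in the usage note are the Level 1.2 figures implied by the example above. Only the VCL HRD CPB size is computed, and integer truncation is assumed for the scaled CPB size.

```python
def max_br_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Derive effective limits from a signaled max-br value.

    max_br        -- value of the max-br parameter
    level_max_br  -- MaxBR of the signaled highest level (Table A-1)
    level_max_cpb -- MaxCPB of the signaled highest level (Table A-1)
    max_cpb       -- value of the max-cpb parameter, or None if absent

    Returns (VCL bit rate in bits/s, NAL bit rate in bits/s,
             VCL CPB size in bits).
    """
    if max_br < level_max_br:
        raise ValueError("max-br must not be below MaxBR of the level")

    vcl_bit_rate = max_br * 1000   # units of 1000 bits/s for VCL HRD
    nal_bit_rate = max_br * 1200   # units of 1200 bits/s for NAL HRD

    if max_cpb is None:
        # MaxCPB is scaled by max-br / MaxBR when max-cpb is absent.
        cpb_bits = level_max_cpb * 1000 * max_br // level_max_br
    else:
        # A signaled max-cpb replaces MaxCPB directly.
        cpb_bits = max_cpb * 1000

    return vcl_bit_rate, nal_bit_rate, cpb_bits
```

Calling max_br_limits(1550, 384, 1000) yields 1550000 bits/s, 1860000 bits/s and a CPB size of 4036458 bits, in line with the figures quoted in the example.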
3069 Senders MAY use this knowledge to send higher bitrate video as 3070 allowed in the level definition of SVC, to achieve improved 3071 video quality. 3073 Informative note: This parameter was added primarily to 3074 complement a similar codepoint in the ITU-T Recommendation 3075 H.245, so as to facilitate signaling gateway designs. No 3076 assumption can be made from the value of this parameter 3077 that the network is capable of handling such bit rates at 3078 any given time. In particular, no conclusion can be drawn 3079 that the signaled bit rate is possible under congestion 3080 control constraints. 3082 redundant-pic-cap: 3083 This parameter is as specified in [I-D.ietf-avt-rtp- 3084 rfc3984bis]. 3086 sprop-parameter-sets: 3087 This parameter MAY be used to convey any sequence parameter 3088 set, subset sequence parameter set and picture parameter set 3089 NAL units (herein referred to as the initial parameter set NAL 3090 units) that can be placed in the NAL unit stream to precede 3091 any other NAL units in decoding order and that are associated 3092 with the default level of profile-level-id. The parameter 3093 MUST NOT be used to indicate codec capability in any 3094 capability exchange procedure. The value of the parameter is 3095 a comma (',') separated list of base64 [RFC4648] 3096 representations of the parameter set NAL units as specified in 3097 Sections 7.3.2.1, 7.3.2.2 and G.7.3.2.1 of [H.264]. Note that 3098 the number of bytes in a parameter set NAL unit is typically 3099 less than 10, but a picture parameter set NAL unit can contain 3100 several hundreds of bytes. 3102 Informative note: When several payload types are offered in 3103 the SDP Offer/Answer model, each with its own sprop- 3104 parameter-sets parameter, then the receiver cannot assume 3105 that those parameter sets do not use conflicting storage 3106 locations (i.e., identical values of parameter set 3107 identifiers). 
Therefore, a receiver should buffer all 3108 sprop-parameter-sets and make them available to the decoder 3109 instance that decodes a certain payload type. 3111 sprop-level-parameter-sets: 3112 This parameter MAY be used to convey any sequence, subset 3113 sequence and picture parameter set NAL units (herein referred 3114 to as the initial parameter set NAL units) that can be placed 3115 in the NAL unit stream to precede any other NAL units in 3116 decoding order and that are associated with one or more levels 3117 different than the default level of profile-level-id. The 3118 parameter MUST NOT be used to indicate codec capability in any 3119 capability exchange procedure. 3121 The sprop-level-parameter-sets parameter contains parameter 3122 sets for one or more levels which are different than the 3123 default level. All parameter sets targeted for use when one 3124 level of the default sub-profile is accepted by a receiver are 3125 clustered and prefixed with a three-byte field which has the 3126 same syntax as profile-level-id. This enables the receiver to 3127 install the parameter sets for the accepted level and discard 3128 the rest. The three-byte field is named PLId, and all 3129 parameter sets associated with one level are named PSL, which 3130 has the same syntax as sprop-parameter-sets. Parameter sets 3131 for each level are represented in the form of PLId:PSL, i.e., 3132 PLId followed by a colon (':') and the base64 [RFC4648] 3133 representation of the initial parameter set NAL units for the 3134 level. Each pair of PLId:PSL is also separated by a colon. 3135 Note that a PSL can contain multiple parameter sets for that 3136 level, separated with commas (','). 3138 The subset of coding tools indicated by each PLId field MUST 3139 be equal to the default sub-profile, and the level indicated 3140 by each PLId field MUST be different than the default level. 
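Informative sketch (not part of this specification): the PLId:PSL layout described above could be parsed in Python roughly as follows. The function name parse_sprop_level_parameter_sets is hypothetical, and the sketch assumes that the value is well formed, i.e., that it consists of an even number of colon-separated fields alternating between a six-digit base16 PLId and a comma-separated base64 PSL.

```python
import base64


def parse_sprop_level_parameter_sets(value):
    """Split a sprop-level-parameter-sets value into per-level entries.

    Returns a list of ((profile_idc, profile_iop, level_idc), nal_units)
    tuples, where nal_units is a list of decoded parameter set NAL
    units (bytes) for that level.
    """
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating PLId and PSL fields")

    entries = []
    # Even-indexed fields are PLIds, odd-indexed fields are PSLs.
    for plid_hex, psl in zip(fields[0::2], fields[1::2]):
        plid = bytes.fromhex(plid_hex)
        if len(plid) != 3:
            raise ValueError("PLId must be exactly three bytes")
        profile_idc, profile_iop, level_idc = plid

        # A PSL may carry several parameter sets separated by commas.
        nal_units = [base64.b64decode(item) for item in psl.split(",")]
        entries.append(((profile_idc, profile_iop, level_idc), nal_units))

    return entries
```

A receiver following this approach would keep only the entry whose PLId matches the level it accepted and discard the others.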
3142 Informative note: This parameter allows for efficient level 3143 downgrade or upgrade in SDP Offer/Answer and out-of-band 3144 transport of parameter sets, simultaneously. 3146 in-band-parameter-sets: 3147 This parameter MAY be used to indicate a receiver capability. 3148 The value MAY be equal to either 0 or 1. The value 1 3149 indicates that the receiver discards out-of-band parameter 3150 sets in sprop-parameter-sets and sprop-level-parameter-sets; 3151 therefore, the sender MUST transmit all parameter sets in-band. 3152 The value 0 indicates that the receiver utilizes out-of-band 3153 parameter sets included in sprop-parameter-sets and/or sprop- 3154 level-parameter-sets. However, in this case, the sender MAY 3155 still choose to send parameter sets in-band. When the 3156 parameter is not present, this receiver capability is not 3157 specified, and therefore the sender MAY send out-of-band 3158 parameter sets only, or it MAY send in-band parameter sets 3159 only, or it MAY send both. 3161 packetization-mode: 3162 This parameter is as specified in [I-D.ietf-avt-rtp- 3163 rfc3984bis]. When the mst-mode parameter is present, the 3164 value of this parameter is additionally constrained as follows. 3165 If mst-mode is equal to "NI-T", "NI-C" or "NI-TC", 3166 packetization-mode MUST NOT be equal to 2. Otherwise (mst- 3167 mode is equal to "I-C"), packetization-mode MUST be equal to 2. 3169 sprop-interleaving-depth: 3170 This parameter is as specified in [I-D.ietf-avt-rtp- 3171 rfc3984bis]. 3173 sprop-deint-buf-req: 3174 This parameter is as specified in [I-D.ietf-avt-rtp- 3175 rfc3984bis]. 3177 deint-buf-cap: 3178 This parameter is as specified in [I-D.ietf-avt-rtp- 3179 rfc3984bis]. 3181 sprop-init-buf-time: 3182 This parameter is as specified in [I-D.ietf-avt-rtp- 3183 rfc3984bis]. 3185 sprop-max-don-diff: 3186 This parameter is as specified in [I-D.ietf-avt-rtp- 3187 rfc3984bis].
3189 max-rcmd-nalu-size: 3190 This parameter is as specified in [I-D.ietf-avt-rtp- 3191 rfc3984bis]. 3193 mst-mode: 3194 This parameter MAY be used to signal the properties of a NAL 3195 unit stream or the capabilities of a receiver implementation. 3196 If this parameter is present, multi-session transmission MUST 3197 be used. Otherwise (this parameter is not present), single- 3198 session transmission MUST be used. When this parameter is 3199 present, the following applies. When the value of mst-mode is 3200 equal to "NI-T", the NI-T mode MUST be used. When the value 3201 of mst-mode is equal to "NI-C", the NI-C mode MUST be used. 3202 When the value of mst-mode is equal to "NI-TC", the NI-TC mode 3203 MUST be used. When the value of mst-mode is equal to "I-C", 3204 the I-C mode MUST be used. The value of mst-mode MUST have 3205 one of the following tokens: "NI-T", "NI-C", "NI-TC", or "I-C". 3207 All RTP sessions in an MST MUST have the same value of mst- 3208 mode. 3210 sprop-mst-csdon-always-present: 3211 This parameter MUST NOT be present when mst-mode is not 3212 present or the value of mst-mode is equal to "NI-T" or "I-C". 3213 This parameter signals the properties of the NAL unit stream. 3214 When sprop-mst-csdon-always-present is present and the value 3215 is equal to 1, packetization-mode MUST be equal to 1, and all 3216 the RTP packets carrying the NAL unit stream MUST be STAP-A 3217 packets containing a PACSI NAL unit that further contains the 3218 DONC field or NI-MTAP packets with the J field equal to 1. 3219 When sprop-mst-csdon-always-present is present and the value 3220 is equal to 1, the CS-DON value of any particular NAL unit can 3221 be derived solely according to information in the packet 3222 containing the NAL unit. 
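Informative sketch (not part of this specification): the constraints stated above that tie together mst-mode, packetization-mode and sprop-mst-csdon-always-present could be checked in Python roughly as follows. The function name check_mst_params and the dictionary representation of the parameters are hypothetical. The sketch assumes that an absent packetization-mode is treated as 0, which is taken from the referenced H.264 payload format rather than from this section.

```python
MST_MODES = {"NI-T", "NI-C", "NI-TC", "I-C"}


def check_mst_params(params):
    """Return a list of constraint violations (empty if none).

    params: dict mapping media type parameter names to string values.
    """
    errors = []
    mst_mode = params.get("mst-mode")
    # Assumption: an absent packetization-mode is treated as 0.
    pkt_mode = int(params.get("packetization-mode", "0"))
    csdon = params.get("sprop-mst-csdon-always-present")

    if mst_mode is not None:
        if mst_mode not in MST_MODES:
            errors.append("mst-mode has an unknown token")
        elif mst_mode == "I-C" and pkt_mode != 2:
            errors.append("I-C requires packetization-mode 2")
        elif mst_mode != "I-C" and pkt_mode == 2:
            errors.append("NI-T, NI-C and NI-TC forbid packetization-mode 2")

    if csdon is not None:
        if mst_mode not in ("NI-C", "NI-TC"):
            errors.append("sprop-mst-csdon-always-present is only "
                          "allowed with NI-C or NI-TC")
        elif csdon == "1" and pkt_mode != 1:
            errors.append("sprop-mst-csdon-always-present=1 requires "
                          "packetization-mode 1")

    return errors
```

Returning a list of messages rather than raising on the first violation lets a caller report all problems in a session description at once; this is a design choice of the sketch only.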
3224 When sprop-mst-csdon-always-present is present in the current 3225 RTP session, it MUST be present also in all the RTP sessions 3226 the current RTP session depends on, and the value of sprop-mst- 3227 csdon-always-present MUST be identical for the current RTP session 3228 and all the RTP sessions the current RTP session depends on. 3230 sprop-mst-remux-buf-size: 3231 This parameter MUST NOT be present when mst-mode is not 3232 present or the value of mst-mode is equal to "NI-T". This 3233 parameter MUST be present when mst-mode is present and the 3234 value of mst-mode is equal to "NI-C", "NI-TC", or "I-C". 3236 This parameter signals the properties of the NAL unit stream. 3237 It MUST be set to a value one less than the minimum re- 3238 multiplexing buffer size (in NAL units), so that it is 3239 guaranteed that receivers can reconstruct NAL unit decoding 3240 order as specified in Subsection 6.2.2. 3242 The value of sprop-mst-remux-buf-size MUST be an integer in 3243 the range of 0 to 32767, inclusive. 3245 sprop-remux-buf-req: 3246 This parameter MUST NOT be present when mst-mode is not 3247 present or the value of mst-mode is equal to "NI-T". It MUST 3248 be present when mst-mode is present and the value of mst-mode 3249 is equal to "NI-C", "NI-TC", or "I-C". 3251 sprop-remux-buf-req signals the required size of the re- 3252 multiplexing buffer for the NAL unit stream. It is guaranteed 3253 that receivers can recover the decoding order of the received 3254 NAL units from the current RTP session and the RTP sessions 3255 the current RTP session depends on as specified in Section 3256 6.2.2, when the re-multiplexing buffer size is at least the 3257 value of sprop-remux-buf-req in units of bytes. 3259 The value of sprop-remux-buf-req MUST be an integer in the 3260 range of 0 to 4294967295, inclusive. 3262 remux-buf-cap: 3263 This parameter MUST NOT be present when mst-mode is not 3264 present or the value of mst-mode is equal to "NI-T".
This 3265 parameter MAY be used to signal the capabilities of a receiver 3266 implementation and indicates the amount of re-multiplexing 3267 buffer space in units of bytes that the receiver has available 3268 for recovering the NAL unit decoding order as specified in 3269 Section 6.2.2. A receiver is able to handle any NAL unit 3270 stream for which the value of the sprop-remux-buf-req 3271 parameter is smaller than or equal to this parameter. 3273 If the parameter is not present, then a value of 0 MUST be 3274 used for remux-buf-cap. The value of remux-buf-cap MUST be an 3275 integer in the range of 0 to 4294967295, inclusive. 3277 sprop-remux-init-buf-time: 3278 This parameter MAY be used to signal the properties of the NAL 3279 unit stream. The parameter MUST NOT be present if mst-mode is 3280 not present or the value of mst-mode is equal to "NI-T". 3282 The parameter signals the initial buffering time that a 3283 receiver MUST wait before starting to recover the NAL unit 3284 decoding order as specified in Section 6.2.2 of this memo. 3286 The parameter is coded as a non-negative base10 integer 3287 representation in clock ticks of a 90-kHz clock. If the 3288 parameter is not present, then no initial buffering time value 3289 is defined. Otherwise the value of sprop-remux-init-buf-time 3290 MUST be an integer in the range of 0 to 4294967295, inclusive. 3292 sprop-mst-max-don-diff: 3293 This parameter MAY be used to signal the properties of the NAL 3294 unit stream. It MUST NOT be used to signal transmitter or 3295 receiver or codec capabilities. The parameter MUST NOT be 3296 present if mst-mode is not present or the value of mst-mode is 3297 equal to "NI-T". sprop-mst-max-don-diff is an integer in the 3298 range of 0 to 32767, inclusive. If sprop-mst-max-don-diff is 3299 not present, the value of the parameter is unspecified. 
3300 sprop-mst-max-don-diff is calculated in the same way as sprop-max-don- 3301 diff as specified in [I-D.ietf-avt-rtp-rfc3984bis], with 3302 decoding order number being replaced by cross-session decoding 3303 order number. 3305 sprop-scalability-info: 3306 This parameter MAY be used to convey the NAL unit containing 3307 the scalability information SEI message as specified in Annex 3308 G of [H.264]. This parameter MAY be used to signal the 3309 contained layers of an SVC bitstream. The parameter MUST NOT 3310 be used to indicate codec capability in any capability 3311 exchange procedure. The value of the parameter is the base64 3312 [RFC4648] representation of the NAL unit containing the 3313 scalability information SEI message. If present, the NAL unit 3314 MUST contain only one SEI message which is a scalability 3315 information SEI message. 3317 This parameter MAY be used in an offering or declarative SDP 3318 message to indicate what layers (operation points) can be 3319 provided. A receiver MAY indicate its choice of one layer 3320 using the optional media type parameter scalable-layer-id. 3322 scalable-layer-id: 3323 This parameter MAY be used to signal a receiver's choice of 3324 the offered or declared operation points or layers using sprop- 3325 scalability-info or sprop-operation-point-info. The value of 3326 scalable-layer-id is a base16 representation of the 3327 layer_id[ i ] syntax element in the scalability information 3328 SEI message as specified in Annex G of [H.264] or layer-ID 3329 contained in sprop-operation-point-info. 3331 sprop-operation-point-info: 3332 This parameter MAY be used to describe the operation points of 3333 an RTP session. The value of this parameter consists of a 3334 comma-separated list of operation-point-description vectors.
3335 The values given by the operation-point-description vectors 3336 are the same as, or are derived from, the values that would be 3337 given for a scalable layer in the scalability information SEI 3338 message as specified in Annex G of [H.264], where the term 3339 scalable layer in the scalability information SEI message 3340 refers to all NAL units associated with the same values of 3341 temporal_id, dependency_id and quality_id. In this memo such 3342 a set of NAL units is called an operation point. 3344 Each operation-point-description vector has ten elements, 3345 provided as a comma-separated list of values as defined below. 3346 The first value of the operation-point-description vector is 3347 preceded by a '<' and the last value of the operation-point- 3348 description vector is followed by a '>'. If the sprop- 3349 operation-point-info is followed by exactly one operation- 3350 point-description vector, this describes the highest operation 3351 point contained in the RTP session. If there are two or more 3352 operation-point-description vectors, the first describes the 3353 lowest and the last describes the highest operation point 3354 contained in the RTP session. 3356 The values given by the operation-point-description vector are 3357 as follows, in the order listed: 3359 - layer-ID: This value specifies the layer identifier of the 3360 operation point, which is identical to the layer_id that would 3361 be indicated (for the same values of dependency_id, quality_id, 3362 and temporal_id) in the scalability information SEI message. 3363 This field MAY be empty, indicating that the value is 3364 unspecified. When there are multiple operation-point- 3365 description vectors with layer-ID, the values of layer-ID do 3366 not need to be consecutive. 3368 - temporal-ID: This value specifies the temporal_id of the 3369 operation point. This field MUST NOT be empty. 3371 - dependency-ID: This value specifies the dependency_id of 3372 the operation point.
This field MUST NOT be empty. 3374 - quality-ID: This value specifies the quality_id of the 3375 operation point. This field MUST NOT be empty. 3377 - profile-level-ID: This value specifies the profile-level-idc 3378 of the operation point in the base16 format. The default sub- 3379 profile or default level indicated by the parameter profile- 3380 level-ID in the sprop-operation-point-info vector SHALL be 3381 equal to or lower than the default sub-profile or default 3382 level indicated by profile-level-id, which may be either 3383 present or the default value is taken. This field MAY be 3384 empty, indicating that the value is unspecified. 3386 - avg-framerate: This value specifies the average frame rate 3387 of the operation point. This value is given as an integer in 3388 frames per 256 seconds. The field MAY be empty, indicating 3389 that the value is unspecified. 3391 - width: This value specifies the width dimension in pixels of 3392 decoded frames for the operation point. This parameter is not 3393 directly given in the scalability information SEI message. 3394 This field MAY be empty, indicating that the value is 3395 unspecified. 3397 - height: This value gives the height dimension in pixels of 3398 decoded frames for the operation point. This parameter is not 3399 directly given in the scalability information SEI. This field 3400 MAY be empty, indicating that the value is unspecified. 3402 - avg-bitrate: This value specifies the average bit rate of 3403 the operation point. This parameter is given as an integer in 3404 kbits per second over the entire stream. Note that this 3405 parameter is provided in the scalability information SEI 3406 message in bits per second and calculated over a variable time 3407 window. This field MAY be empty, indicating that the value is 3408 unspecified. 3410 - max-bitrate: This value specifies the maximum bit rate of 3411 the operation point.
This parameter is given as an integer in 3412 kbits per second and describes the maximum bitrate for each 3413 one-second window. Note that this parameter is provided in 3414 the scalability information SEI message in bits per second and 3415 is calculated over a variable time window. This field MAY be 3416 empty, indicating that the value is unspecified. 3418 Similarly to sprop-scalability-info, this parameter MAY be 3419 used in an offering or declarative SDP message to indicate 3420 what layers (operation points) can be provided. A receiver 3421 MAY indicate its choice of the highest layer it wants to send 3422 and/or receive using the optional media type parameter 3423 scalable-layer-id. 3425 sprop-no-NAL-reordering-required: 3426 This parameter MAY be used to signal the properties of the NAL 3427 unit stream. This parameter MUST NOT be present when mst-mode 3428 is not present or the value of mst-mode is not equal to "NI-T". 3429 The presence of this parameter indicates that no reordering 3430 of non-VCL or VCL NAL units is required for the decoding order 3431 recovery process. 3433 sprop-avc-ready: 3434 This parameter MAY be used to indicate the properties of the 3435 NAL unit stream. The presence of this parameter indicates 3436 that the RTP session, if used in SST, or used in MST combined 3437 with other RTP sessions also with this parameter present, can 3438 be processed by a [I-D.ietf-avt-rtp-rfc3984bis] receiver. 3439 This parameter MAY be used with RTP sessions with media 3440 subtype H264-SVC. 3442 Encoding considerations: 3443 This media type is framed and binary; see Section 4.8 of RFC 3444 4288 [RFC4288]. 3446 Security considerations: 3447 See Section 8 of RFC XXXX. 3449 Published specification: 3450 Please refer to Section 13 of RFC XXXX.
3452 Additional information: 3453 None 3455 File extensions: none 3457 Macintosh file type code: none 3459 Object identifier or OID: none 3461 Person & email address to contact for further information: 3463 Ye-Kui Wang, yekui.wang@huawei.com 3465 Intended usage: COMMON 3467 Restrictions on usage: 3468 This media type depends on RTP framing, and hence is only 3469 defined for transfer via RTP [RFC3550]. Transport within 3470 other framing protocols is not defined at this time. 3472 Interoperability considerations: 3473 The media subtype name contains "SVC" to avoid potential 3474 conflict with RFC 3984 and its potential future replacement 3475 RTP payload format for H.264 non-SVC profiles. 3477 Applications that use this media type: 3478 Real-time video applications like video streaming, video 3479 telephony, and video conferencing. 3481 Author: 3483 Ye-Kui Wang, yekui.wang@huawei.com 3485 Change controller: 3486 IETF Audio/Video Transport working group delegated from the 3487 IESG. 3489 7.2 SDP Parameters 3491 7.2.1 Mapping of Payload Type Parameters to SDP 3493 The media type video/H264-SVC string is mapped to fields in the 3494 Session Description Protocol (SDP) as follows: 3496 o The media name in the "m=" line of SDP MUST be video. 3498 o The encoding name in the "a=rtpmap" line of SDP MUST be H264-SVC 3499 (the media subtype). 3501 o The clock rate in the "a=rtpmap" line MUST be 90000. 
3503 o The OPTIONAL parameters "profile-level-id", "max-recv-level", 3504 "max-recv-base-level", "max-mbps", "max-fs", "max-cpb", "max-dpb", 3505 "max-br", "redundant-pic-cap", "in-band-parameter-sets", 3506 "packetization-mode", "sprop-interleaving-depth", "deint-buf-cap", 3507 "sprop-deint-buf-req", "sprop-init-buf-time", "sprop-max-don- 3508 diff", "max-rcmd-nalu-size", "mst-mode", "sprop-mst-csdon-always- 3509 present", "sprop-mst-remux-buf-size", "sprop-remux-buf-req", 3510 "remux-buf-cap", "sprop-remux-init-buf-time", "sprop-mst-max-don- 3511 diff", and "scalable-layer-id", when present, MUST be included 3512 in the "a=fmtp" line of SDP. These parameters are expressed as a 3513 media type string, in the form of a semicolon separated list of 3514 parameter=value pairs. 3516 o The OPTIONAL parameters "sprop-parameter-sets", "sprop-level- 3517 parameter-sets", "sprop-scalability-info", "sprop-operation- 3518 point-info", "sprop-no-NAL-reordering-required", and "sprop-avc- 3519 ready", when present, MUST be included in the "a=fmtp" line of 3520 SDP or conveyed using the "fmtp" source attribute as specified in 3521 Section 6.3 of [RFC5576]. For a particular media format (i.e., 3522 RTP payload type), a "sprop-parameter-sets" or "sprop-level- 3523 parameter-sets" MUST NOT be both included in the "a=fmtp" line of 3524 SDP and conveyed using the "fmtp" source attribute. When 3525 included in the "a=fmtp" line of SDP, these parameters are 3526 expressed as a media type string, in the form of a semicolon 3527 separated list of parameter=value pairs. When conveyed using the 3528 "fmtp" source attribute, these parameters are only associated 3529 with the given source and payload type as parts of the "fmtp" 3530 source attribute. 3532 Informative note: Conveyance of "sprop-parameter-sets" and 3533 "sprop-level-parameter-sets" using the "fmtp" source attribute 3534 allows for out-of-band transport of parameter sets in 3535 topologies like Topo-Video-switch-MCU [RFC5117]. 
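The mapping rules of Section 7.2.1 can be illustrated with a minimal Python sketch. It only covers the "a=rtpmap" line and "a=fmtp" parameters that take a value; the function name build_sdp_lines and the dictionary input are assumptions made for illustration and are not defined by this memo. The parameter values are taken from Example 1 in Section 7.3.1.

```python
def build_sdp_lines(payload_type, params):
    """Return the a=rtpmap and a=fmtp lines for one H264-SVC payload type.

    payload_type: RTP payload type number (hypothetical input).
    params: dict of media type parameters that take a value,
            e.g. {"profile-level-id": "53000c"}.
    """
    # Encoding name MUST be H264-SVC and clock rate MUST be 90000.
    rtpmap = f"a=rtpmap:{payload_type} H264-SVC/90000"
    # Parameters are a semicolon-separated list of parameter=value pairs.
    pairs = "; ".join(f"{name}={value}" for name, value in params.items())
    fmtp = f"a=fmtp:{payload_type} {pairs}"
    return [rtpmap, fmtp]


lines = build_sdp_lines(
    97, {"profile-level-id": "53000c", "packetization-mode": 1}
)
for line in lines:
    print(line)
```

The sketch does not validate parameter names or values and does not handle conveyance through the "fmtp" source attribute of [RFC5576].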
3537 7.2.2 Usage with the SDP Offer/Answer Model 3539 When an SVC stream (with media subtype H264-SVC) is offered over RTP 3540 using SDP in an Offer/Answer model [RFC3264] for negotiation for 3541 unicast usage, the following limitations and rules apply: 3543 o The parameters identifying a media format configuration for SVC 3544 are "profile-level-id", "packetization-mode", and "mst-mode". 3545 These media configuration parameters (except for the level part 3546 of "profile-level-id") MUST be used symmetrically when the 3547 answerer does not include "scalable-layer-id" in the answer; i.e., 3548 the answerer MUST either maintain all configuration parameters or 3549 remove the media format (payload type) completely, if one or more 3550 of the parameter values are not supported. Note that the level 3551 part of "profile-level-id" includes level_idc, and, for 3552 indication of level 1b when profile_idc is equal to 66, 77 or 88, 3553 bit 4 (constraint_set3_flag) of profile-iop. The level part of 3554 "profile-level-id" is changeable. 3556 Informative note: The requirement for symmetric use does not 3557 apply for the level part of "profile-level-id", and does not 3558 apply for the other stream properties and capability 3559 parameters. 3561 Informative note: In [H.264], all the levels except for level 3562 1b are equal to the value of level_idc divided by 10. Level 3563 1b is a level higher than level 1.0 but lower than level 1.1, 3564 and is signaled in an ad-hoc manner. For the Baseline, Main 3565 and Extended profiles (with profile_idc equal to 66, 77 and 88, 3566 respectively), level 1b is indicated by level_idc equal to 11 3567 (i.e. same as level 1.1) and constraint_set3_flag equal to 1. 3568 For other profiles, level 1b is indicated by level_idc equal 3569 to 9 (but note that level 1b for these profiles is still 3570 higher than level 1, which has level_idc equal to 10, and 3571 lower than level 1.1).
In SDP Offer/Answer, an answer may 3572 indicate a level equal to or lower than the level indicated in 3573 the offer. Due to the ad-hoc indication of level 1b, offerers 3574 and answerers must check the value of bit 4 3575 (constraint_set3_flag) of the middle octet of the parameter 3576 "profile-level-id", when profile_idc is equal to 66, 77 or 88 3577 and level_idc is equal to 11. 3579 To simplify handling and matching of these configurations, the 3580 same RTP payload type number used in the offer should also be 3581 used in the answer, as specified in [RFC3264]. The same RTP 3582 payload type number used in the offer MUST also be used in the 3583 answer when the answer includes "scalable-layer-id". When the 3584 answer does not include "scalable-layer-id", the answer MUST NOT 3585 contain a payload type number used in the offer unless the 3586 configuration is exactly the same as in the offer or the 3587 configuration in the answer only differs from that in the offer 3588 with a level lower than the default level offered. 3590 Informative note: When an offerer receives an answer that does 3591 not include "scalable-layer-id" it has to compare payload 3592 types not declared in the offer based on the media type (i.e., 3593 video/H264-SVC) and the above media configuration parameters 3594 with any payload types it has already declared. This will 3595 enable it to determine whether the configuration in question 3596 is new or if it is equivalent to configuration already offered, 3597 since a different payload type number may be used in the 3598 answer. 3600 Since an SVC stream may contain multiple operation points, a 3601 facility is provided so that an answerer can select a different 3602 operation point than the entire SVC stream. Specifically, 3603 different operation points MAY be described using the "sprop- 3604 scalability-info" or "sprop-operation-point-info" parameters. 
3605 The first one carries the entire scalability information SEI 3606 message defined in Annex G of [H.264], whereas the second one may 3607 be derived, e.g. as a subset of this SEI message that only 3608 contains key information about an operation point. Operation 3609 points, in both cases, are associated with a layer identifier. 3611 If such information ("sprop-operation-point-info" or "sprop- 3612 scalability-info") is provided in an offer, an answerer MAY 3613 select from the various operation points offered in the "sprop- 3614 scalability-info" or "sprop-operation-point-info" 3615 parameters by including "scalable-layer-id" in the answer. By 3616 this, the answerer indicates its selection of a particular 3617 operation point in the received and/or in the sent stream. When 3618 such operation point selection takes place, i.e., the answerer 3619 includes "scalable-layer-id" in the answer, the media 3620 configuration parameters MUST NOT be present in the answer. 3621 Rather, the media configuration that the answerer will use for 3622 receiving and/or sending is the one used for the selected 3623 operation point as indicated in the offer. 3625 Informative note: The ability to perform operation point 3626 selection enables a receiver to utilize the scalable nature of 3627 an SVC stream. 3629 o The parameter "max-recv-level", when present, declares the 3630 highest level supported for receiving. In case "max-recv-level" 3631 is not present, the highest level supported for receiving is 3632 equal to the default level indicated by the level part of 3633 "profile-level-id". "max-recv-level", when present, MUST be 3634 higher than the default level. 3636 o The parameter "max-recv-base-level", when present, declares the 3637 highest level of the base layer supported for receiving. When 3638 "max-recv-base-level" is not present, the highest level supported 3639 for the base layer is not constrained separately from the SVC 3640 stream containing the base layer.
The endpoint at the other side 3641 MUST NOT send a scalable stream for which the base layer is of a 3642 level higher than max-recv-base-level. Parameters declaring 3643 receiver capabilities above the default level (max-mbps, max- 3644 smbps, max-fs, max-cpb, max-dpb, max-br, and max-recv-level) do 3645 not apply to the base layer when max-recv-base-level is present. 3647 o The parameters "sprop-deint-buf-req", "sprop-interleaving-depth", 3648 "sprop-max-don-diff", "sprop-init-buf-time", "sprop-mst-csdon- 3649 always-present", "sprop-remux-buf-req", "sprop-mst-remux-buf- 3650 size", "sprop-remux-init-buf-time", "sprop-mst-max-don-diff", 3651 "sprop-scalability-info", "sprop-operation-point-info", 3652 "sprop-no-NAL-reordering-required", and "sprop-avc-ready" 3653 describe the properties of the NAL unit stream that the offerer 3654 or answerer is sending for the media format configuration. This 3655 differs from the normal usage of the Offer/Answer parameters: 3656 normally such parameters declare the properties of the stream 3657 that the offerer or the answerer is able to receive. When 3658 dealing with SVC, the offerer assumes that the answerer will be 3659 able to receive media encoded using the configuration being 3660 offered. 3662 Informative note: The above parameters apply for any stream 3663 sent by the declaring entity with the same configuration; i.e., 3664 they are dependent on their source. Rather than being bound 3665 to the payload type, the values may have to be applied to 3666 another payload type when being sent, as they apply for the 3667 configuration. 3669 o The capability parameters "max-mbps", "max-fs", "max-cpb", "max- 3670 dpb", "max-br", "redundant-pic-cap", and "max-rcmd-nalu-size" MAY be 3671 used to declare further capabilities of the offerer or answerer 3672 for receiving.
These parameters MUST NOT be present when the 3673 direction attribute is sendonly, and the parameters describe the 3674 limitations of what the offerer or answerer accepts for receiving 3675 streams. 3677 o When "mst-mode" is not present and "packetization-mode" is equal 3678 to 2, the following applies. 3680 o An offerer has to include the size of the de-interleaving 3681 buffer, "sprop-deint-buf-req", in the offer. To enable the 3682 offerer and answerer to inform each other about their 3683 capabilities for de-interleaving buffering, both parties are 3684 RECOMMENDED to include "deint-buf-cap". It is also 3685 RECOMMENDED to consider offering multiple payload types with 3686 different buffering requirements when the capabilities of the 3687 receiver are unknown. 3689 o When "mst-mode" is present and equal to "NI-C", "NI-TC" or "I-C", 3690 the following applies. 3692 o An offerer has to include "sprop-remux-buf-req" in the offer. 3693 To enable the offerer and answerer to inform each other about 3694 their capabilities for re-multiplexing buffering, both 3695 parties are RECOMMENDED to include "remux-buf-cap". It is 3696 also RECOMMENDED to consider offering multiple payload types 3697 with different buffering requirements when the capabilities 3698 of the receiver are unknown. 3700 o The "sprop-parameter-sets" or "sprop-level-parameter-sets" 3701 parameter, when present (included in the "a=fmtp" line of SDP or 3702 conveyed using the "fmtp" source attribute as specified in 3703 Section 6.3 of [RFC5576]), is used for out-of-band transport of 3704 parameter sets. However, when out-of-band transport of parameter 3705 sets is used, parameter sets MAY still be additionally 3706 transported in-band. 3708 The answerer MAY use either out-of-band or in-band transport of 3709 parameter sets for the stream it is sending, regardless of 3710 whether out-of-band parameter sets transport has been used in the 3711 offerer-to-answerer direction. 
Parameter sets included in an 3712 answer are independent of those parameter sets included in the 3713 offer, as they are used for decoding two different video streams, 3714 one from the answerer to the offerer, and the other in the 3715 opposite direction. 3717 The following rules apply to transport of parameter sets in the 3718 offerer-to-answerer direction. 3720 o An offer MAY include either or both of "sprop-parameter- 3721 sets" and "sprop-level-parameter-sets". If neither "sprop- 3722 parameter-sets" nor "sprop-level-parameter-sets" is present 3723 in the offer, then only in-band transport of parameter sets 3724 is used. 3726 o If the answer includes "in-band-parameter-sets" equal to 1, 3727 then the offerer MUST transmit parameter sets in-band. 3728 Otherwise, the following applies. 3730 o If the level to use in the offerer-to-answerer 3731 direction is equal to the default level in the offer, 3732 the following applies. 3734 The answerer MUST be prepared to use the parameter 3735 sets included in "sprop-parameter-sets", when 3736 present, for decoding the incoming NAL unit stream, 3737 and ignore "sprop-level-parameter-sets", when 3738 present. 3740 When "sprop-parameter-sets" is not present in the 3741 offer, in-band transport of parameter sets MUST be 3742 used. 3744 o Otherwise (the level to use in the offerer-to-answerer 3745 direction is not equal to the default level in the 3746 offer), the following applies. 3748 The answerer MUST be prepared to use the parameter 3749 sets that are included in "sprop-level-parameter- 3750 sets" for the accepted level (i.e., the default 3751 level in the answer, which is also the level to 3752 use in the offerer-to-answerer direction), when 3753 present, for decoding the incoming NAL unit stream, 3754 and ignore all other parameter sets included in 3755 "sprop-level-parameter-sets" and "sprop-parameter- 3756 sets", when present. 
3758 When no parameter sets for the accepted level are 3759 present in the "sprop-level-parameter-sets", in- 3760 band transport of parameter sets MUST be used. 3762 The following rules apply to transport of parameter sets in the 3763 answerer-to-offerer direction. 3765 o An answer MAY include either "sprop-parameter-sets" or 3766 "sprop-level-parameter-sets", but MUST NOT include both of 3767 the two. If neither "sprop-parameter-sets" nor "sprop- 3768 level-parameter-sets" is present in the answer, then only 3769 in-band transport of parameter sets is used. 3771 o If the offer includes "in-band-parameter-sets" equal to 1, 3772 then the answerer MUST NOT include "sprop-parameter-sets" or 3773 "sprop-level-parameter-sets" in the answer and MUST transmit 3774 parameter sets in-band. Otherwise, the following applies. 3776 o If the level to use in the answerer-to-offerer 3777 direction is equal to the default level in the answer, 3778 the following applies. 3780 The offerer MUST be prepared to use the parameter 3781 sets included in "sprop-parameter-sets", when 3782 present, for decoding the incoming NAL unit stream, 3783 and ignore "sprop-level-parameter-sets", when 3784 present. 3786 When "sprop-parameter-sets" is not present in the 3787 answer, the answerer MUST transmit parameter sets 3788 in-band. 3790 o Otherwise (the level to use in the answerer-to-offerer 3791 direction is not equal to the default level in the 3792 answer), the following applies. 3794 The offerer MUST be prepared to use the parameter 3795 sets that are included in "sprop-level-parameter- 3796 sets" for the level to use in the answerer-to- 3797 offerer direction, when present in the answer, for 3798 decoding the incoming NAL unit stream, and ignore 3799 all other parameter sets included in "sprop-level- 3800 parameter-sets" and "sprop-parameter-sets", when 3801 present in the answer. 
3803 When no parameter sets for the level to use in the 3804 answerer-to-offerer direction are present in 3805 "sprop-level-parameter-sets" in the answer, the 3806 answerer MUST transmit parameter sets in-band. 3808 When "sprop-parameter-sets" or "sprop-level-parameter-sets" is 3809 conveyed using the "fmtp" source attribute as specified in 3810 Section 6.3 of [RFC5576], the receiver of the parameters MUST 3811 store the parameter sets included in the "sprop-parameter-sets" 3812 or "sprop-level-parameter-sets" for the accepted level and 3813 associate them to the source given as a part of the "fmtp" source 3814 attribute. Parameter sets associated with one source MUST only 3815 be used to decode NAL units conveyed in RTP packets from the same 3816 source. When this mechanism is in use, SSRC collision detection 3817 and resolution MUST be performed as specified in [RFC5576]. 3819 Informative note: Conveyance of "sprop-parameter-sets" and 3820 "sprop-level-parameter-sets" using the "fmtp" source attribute 3821 may be used in topologies like Topo-Video-switch-MCU [RFC5117] 3822 to enable out-of-band transport of parameter sets. 3824 For streams being delivered over multicast, the following rules 3825 apply: 3827 o The media format configuration is identified by "profile-level- 3828 id", including the level part, "packetization-mode", and "mst- 3829 mode". These media format configuration parameters (including 3830 the level part of "profile-level-id") MUST be used symmetrically; 3831 i.e., the answerer MUST either maintain all configuration 3832 parameters or remove the media format (payload type) completely. 3833 Note that this implies that the level part of "profile-level-id" 3834 for Offer/Answer in multicast is not changeable. 3836 To simplify handling and matching of these configurations, the 3837 same RTP payload type number used in the offer should also be 3838 used in the answer, as specified in [RFC3264]. 
An answer MUST 3839 NOT contain a payload type number used in the offer unless the 3840 configuration is the same as in the offer. 3842 o Parameter sets received MUST be associated with the originating 3843 source, and MUST be only used in decoding the incoming NAL unit 3844 stream from the same source. 3846 o The rules for other parameters are the same as above for unicast 3847 as long as the above rules are obeyed. 3849 Table 14 lists the interpretation of all the parameters that MUST be 3850 used for the various combinations of offer, answer, and direction 3851 attributes. Note that the two columns wherein the "scalable-layer- 3852 id" parameter is used only apply to answers, whereas the other 3853 columns apply to both offers and answers. 3855 Table 14. Interpretation of parameters for various combinations of 3856 offers, answers, direction attributes, with and without scalable- 3857 layer-id. Columns that do not indicate offer or answer apply to 3858 both. 3860 sendonly --+ 3861 answer: recvonly,scalable-layer-id --+ | 3862 recvonly w/o scalable-layer-id --+ | | 3863 answer: sendrecv, scalable-layer-id --+ | | | 3864 sendrecv w/o scalable-layer-id --+ | | | | 3865 | | | | | 3866 profile-level-id C X C X P 3867 max-recv-level R R R R - 3868 max-recv-base-level R R R R - 3869 packetization-mode C X C X P 3870 mst-mode C X C X P 3871 sprop-avc-ready P P - - P 3872 sprop-deint-buf-req P P - - P 3873 sprop-init-buf-time P P - - P 3874 sprop-interleaving-depth P P - - P 3875 sprop-max-don-diff P P - - P 3876 sprop-mst-csdon-always-present P P - - P 3877 sprop-mst-max-don-diff P P - - P 3878 sprop-mst-remux-buf-size P P - - P 3879 sprop-no-NAL-reordering-required P P - - P 3880 sprop-operation-point-info P P - - P 3881 sprop-remux-buf-req P P - - P 3882 sprop-remux-init-buf-time P P - - P 3883 sprop-scalability-info P P - - P 3884 deint-buf-cap R R R R - 3885 max-br R R R R - 3886 max-cpb R R R R - 3887 max-dpb R R R R - 3888 max-fs R R R R - 3889 max-mbps R R R R - 
3890 max-rcmd-nalu-size R R R R - 3891 redundant-pic-cap R R R R - 3892 remux-buf-cap R R R R - 3893 in-band-parameter-sets R R R R - 3894 sprop-parameter-sets S S - - S 3895 sprop-level-parameter-sets S S - - S 3896 scalable-layer-id X O X O - 3898 Legend: 3900 C: configuration for sending and receiving streams 3901 P: properties of the stream to be sent 3902 R: receiver capabilities 3903 S: out-of-band parameter sets 3904 O: operation point selection 3905 X: MUST NOT be present 3906 -: not usable, when present SHOULD be ignored 3908 Parameters used for declaring receiver capabilities are in general 3909 downgradable; i.e., they express the upper limit for a sender's 3910 possible behavior. Thus a sender MAY select to set its encoder 3911 using only lower/lesser or equal values of these parameters. 3913 Parameters declaring a configuration point are not changeable, with 3914 the exception of the level part of the "profile-level-id" parameter 3915 for unicast usage. This expresses values a receiver expects to be 3916 used and must be used verbatim on the sender side. If level 3917 downgrading (for profile-level-id) is used, an answerer MUST NOT 3918 include the scalable-layer-id parameter. 3920 When a sender's capabilities are declared, and non-downgradable 3921 parameters are used in this declaration, then these parameters 3922 express a configuration that is acceptable for the sender to receive 3923 streams. In order to achieve high interoperability levels, it is 3924 often advisable to offer multiple alternative configurations; e.g., 3925 for the packetization mode. It is impossible to offer multiple 3926 configurations in a single payload type. Thus, when multiple 3927 configuration offers are made, each offer requires its own RTP 3928 payload type associated with the offer. 3930 A receiver SHOULD understand all media type parameters, even if it 3931 only supports a subset of the payload format's functionality. 
This 3932 ensures that a receiver is capable of understanding when an offer to 3933 receive media can be downgraded to what is supported by the receiver 3934 of the offer. 3936 An answerer MAY extend the offer with additional media format 3937 configurations. However, to enable their usage, in most cases a 3938 second offer is required from the offerer to provide the stream 3939 property parameters that the media sender will use. This also has 3940 the effect that the offerer has to be able to receive this media 3941 format configuration, not only to send it. 3943 If an offerer wishes to have non-symmetric capabilities between 3944 sending and receiving, the offerer can allow asymmetric levels via 3945 "level-asymmetry-allowed" equal to 1. Alternatively, the offerer 3946 can offer different RTP sessions; i.e., different media lines 3947 declared as "recvonly" and "sendonly", respectively. This may have 3948 further implications on the system, and may require additional 3949 external semantics to associate the two media lines. 3951 7.2.3 Dependency Signaling in Multi-Session Transmission 3953 If MST is used, the rules on signaling media decoding dependency in 3954 SDP as defined in [RFC5583] apply. The rules on "hierarchical or 3955 layered encoding" with multicast in Section 5.7 of [RFC4566] do not 3956 apply, i.e., the notation for Connection Data "c=" SHALL NOT be used 3957 with more than one address. Additionally, the order of dependencies 3958 of the RTP sessions indicated by the "a=depend" attribute as defined 3959 in [RFC5583] MUST represent the decoding order of the VCL NAL units 3960 in an access unit, i.e., the order of session dependency is given 3961 from the base or the lowest enhancement RTP session (the most 3962 important) to the highest enhancement RTP session (the least 3963 important).
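The ordering rule of Section 7.2.3 can be sketched in Python as follows. The sketch assumes the "a=depend" form of [RFC5583] with the "lay" dependency type and one "mid:fmt" reference per RTP session; the function name depend_line and the (rank, mid, fmt) tuples are hypothetical helpers for illustration, where rank 0 denotes the base RTP session.

```python
def depend_line(fmt, deps):
    """Format an a=depend line listing dependencies in decoding order.

    fmt:  payload type of the dependent media format (hypothetical input).
    deps: list of (rank, mid, dep_fmt) tuples; rank 0 is the base RTP
          session, higher ranks are higher enhancement RTP sessions.
    """
    # Lowest rank (most important session) is listed first.
    ordered = sorted(deps)
    refs = " ".join(f"{mid}:{dep_fmt}" for _, mid, dep_fmt in ordered)
    return f"a=depend:{fmt} lay {refs}"


# The dependencies are deliberately passed out of order.
line = depend_line(98, [(1, "L2", 97), (0, "L1", 96)])
print(line)
```

The session identifiers L1 and L2 and the payload type numbers are placeholders; a real description would also carry the matching "a=mid" and grouping attributes required by [RFC5583].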
3965 7.2.4 Usage in Declarative Session Descriptions 3967 When SVC over RTP is offered with SDP in a declarative style, as in 3968 RTSP [RFC2326] or SAP [RFC2974], the following considerations are 3969 necessary. 3971 o All parameters capable of indicating both stream properties and 3972 receiver capabilities are used to indicate only stream properties. 3973 For example, in this case, the parameter "profile-level-id" 3974 declares the values used by the stream, not the capabilities for 3975 receiving streams. This results in that the following 3976 interpretation of the parameters MUST be used: 3978 Declaring actual configuration or stream properties: 3980 - profile-level-id 3981 - packetization-mode 3982 - mst-mode 3983 - sprop-deint-buf-req 3984 - sprop-interleaving-depth 3985 - sprop-max-don-diff 3986 - sprop-init-buf-time 3987 - sprop-mst-csdon-always-present 3988 - sprop-mst-remux-buf-size 3989 - sprop-remux-buf-req 3990 - sprop-remux-init-buf-time 3991 - sprop-mst-max-don-diff 3992 - sprop-scalability-info 3993 - sprop-operation-point-info 3994 - sprop-no-NAL-reordering-required 3995 - sprop-avc-ready 3997 Out-of-band transporting of parameter sets: 3999 - sprop-parameter-sets 4000 - sprop-level-parameter-sets 4002 Not usable (when present, they SHOULD be ignored): 4004 - max-mbps 4005 - max-fs 4006 - max-cpb 4007 - max-dpb 4008 - max-br 4009 - max-recv-level 4010 - max-recv-base-level 4011 - redundant-pic-cap 4012 - max-rcmd-nalu-size 4013 - deint-buf-cap 4014 - remux-buf-cap 4015 - scalable-layer-id 4017 o A receiver of the SDP is required to support all parameters and 4018 values of the parameters provided; otherwise, the receiver MUST 4019 reject (RTSP) or not participate in (SAP) the session. It falls 4020 on the creator of the session to use values that are expected to 4021 be supported by the receiving application. 4023 7.3 Examples 4025 In the following examples, "{data}" is used to indicate a data 4026 string encoded as base64. 
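As a non-normative illustration of the "{data}" placeholder, the following Python sketch shows how a comma-separated list of base64 strings, as carried in sprop-parameter-sets, might be produced from and converted back to NAL unit octets. The octet strings below are made-up placeholders and not valid parameter sets, and the helper names are hypothetical.

```python
import base64


def encode_parameter_sets(nal_units):
    """Join base64-encoded NAL units with commas, as in the
    sprop-parameter-sets value."""
    return ",".join(base64.b64encode(n).decode("ascii") for n in nal_units)


def decode_parameter_sets(value):
    """Split a comma-separated value and decode each base64 item."""
    return [base64.b64decode(item) for item in value.split(",") if item]


# Placeholder octets only; real parameter sets come from the encoder.
fake_sps = bytes([0x67, 0x4D, 0xE0, 0x0A])
fake_pps = bytes([0x68, 0xCE, 0x3C, 0x80])

value = encode_parameter_sets([fake_sps, fake_pps])
print("sprop-parameter-sets=" + value)
```

In the examples that follow, each "{spsN}" or "{ppsN}" stands for one such base64 string.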
4028 7.3.1 Example for Offering a Single SVC Session 4030 Example 1: The offerer offers one video media description including 4031 two RTP payload types. The first payload type offers H264 and the 4032 second offers H264-SVC. Both payload types have different fmtp 4033 parameters as profile-level-id, packetization-mode, and sprop- 4034 parameter-sets. 4036 Offerer -> Answerer SDP message: 4038 m=video 20000 RTP/AVP 97 96 4039 a=rtpmap:96 H264/90000 4040 a=fmtp:96 profile-level-id=4de00a; packetization-mode=0; 4041 sprop-parameter-sets={sps0},{pps0}; 4042 a=rtpmap:97 H264-SVC/90000 4043 a=fmtp:97 profile-level-id=53000c; packetization-mode=1; 4044 sprop-parameter-sets={sps0},{pps0},{sps1},{pps1}; 4046 If the answerer does not support media subtype H264-SVC, it can 4047 issue an answer accepting only the base layer offer (payload type 4048 96). In the following example the receiver supports H264-SVC, so it 4049 lists payload type 97 first as the preferred option. 4051 Answerer -> Offerer SDP message: 4053 m=video 40000 RTP/AVP 97 96 4054 a=rtpmap:96 H264/90000 4055 a=fmtp:96 profile-level-id=4de00a; packetization-mode=0; 4056 sprop-parameter-sets={sps2},{pps2}; 4057 a=rtpmap:97 H264-SVC/90000 4058 a=fmtp:97 profile-level-id=53000c; packetization-mode=1; 4059 sprop-parameter-sets={sps2},{pps2},{sps3},{pps3}; 4061 7.3.2 Example for Offering a Single SVC Session using scalable-layer-id 4063 Example 2: Offerer offers the same media configurations as shown in 4064 the example above for receiving and sending the stream, but using a 4065 single RTP payload type and including sprop-operation-point-info. 
4067 Offerer -> Answerer SDP message: 4069 m=video 20000 RTP/AVP 97 4070 a=rtpmap:97 H264-SVC/90000 4071 a=fmtp:97 profile-level-id=53000c; packetization-mode=1; 4072 sprop-parameter-sets={sps0},{sps1},{pps0},{pps1}; 4073 sprop-operation-point-info=<1,0,0,0,4de00a,3200,176,144,128, 4074 256>,<2,1,1,0,53000c,6400,352,288,256,512>; 4076 In this example the receiver supports H264-SVC and chooses the lower 4077 operation point offered in the RTP payload type for sending and 4078 receiving the stream. 4080 Answerer -> Offerer SDP message: 4082 m=video 40000 RTP/AVP 97 4083 a=rtpmap:97 H264-SVC/90000 4084 a=fmtp:97 sprop-parameter-sets={sps2},{sps3},{pps2},{pps3}; 4085 scalable-layer-id=1; 4087 In an equivalent example showing the use of sprop-scalability-info 4088 instead of sprop-operation-point-info, the sprop-operation- 4089 point-info would be replaced by the sprop-scalability-info followed 4090 by the binary (base16) representation of the Scalability Information 4091 SEI message. 4093 7.3.3 Example for Offering Multiple Sessions in MST 4095 Example 3: In this example the offerer offers a multi-session 4096 transmission with up to three sessions. The base session media 4097 description includes payload types which are backward compatible 4098 with [I-D.ietf-avt-rtp-rfc3984bis], and three different payload 4099 types are offered. The other two media descriptions use payload types with 4100 media subtype H264-SVC. In each media description different values 4101 of profile-level-id, packetization-mode, mst-mode, and sprop- 4102 parameter-sets are offered.
4104 Offerer -> Answerer SDP message: 4106 a=group:DDP L1 L2 L3 4107 m=video 20000 RTP/AVP 96 97 98 4108 a=rtpmap:96 H264/90000 4109 a=fmtp:96 profile-level-id=4de00a; packetization-mode=0; 4110 mst-mode=NI-T; sprop-parameter-sets={sps0},{pps0}; 4111 a=rtpmap:97 H264/90000 4112 a=fmtp:97 profile-level-id=4de00a; packetization-mode=1; 4113 mst-mode=NI-TC; sprop-parameter-sets={sps0},{pps0}; 4114 a=rtpmap:98 H264/90000 4115 a=fmtp:98 profile-level-id=4de00a; packetization-mode=2; 4116 mst-mode=I-C; init-buf-time=156320; 4117 sprop-parameter-sets={sps0},{pps0}; 4118 a=mid:L1 4119 m=video 20002 RTP/AVP 99 100 4120 a=rtpmap:99 H264-SVC/90000 4121 a=fmtp:99 profile-level-id=53000c; packetization-mode=1; 4122 mst-mode=NI-T; sprop-parameter-sets={sps1},{pps1}; 4123 a=rtpmap:100 H264-SVC/90000 4124 a=fmtp:100 profile-level-id=53000c; packetization-mode=2; 4125 mst-mode=I-C; sprop-parameter-sets={sps1},{pps1}; 4126 a=mid:L2 4127 a=depend:99 lay L1:96,97; 100 lay L1:98 4128 m=video 20004 RTP/AVP 101 4129 a=rtpmap:101 H264-SVC/90000 4130 a=fmtp:101 profile-level-id=53001F; packetization-mode=1; 4131 mst-mode=NI-T; sprop-parameter-sets={sps2},{pps2}; 4132 a=mid:L3 4133 a=depend:101 lay L1:96,97 L2:99 4135 It is assumed that in this example the answerer only supports the 4136 NI-T mode for multi-session transmission. For this reason, it 4137 chooses the corresponding payload type (96) for the base RTP session. 4138 For the two enhancement RTP sessions the answerer also chooses the 4139 payload types that use the NI-T mode (99 and 101).
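The selection made by the answerer in Example 3 can be sketched, non-normatively, as a filter over the offered fmtp parameters. In the Python sketch below, the offer is represented as a hypothetical dictionary keyed by media identifier and payload type; the sprop-parameter-sets values are omitted for brevity, and parse_fmtp and select_payload_types are illustrative names only.

```python
def parse_fmtp(params):
    """Split "name=value; name=value" into a dictionary."""
    out = {}
    for item in params.split(";"):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")
            out[name.strip()] = value.strip()
    return out


def select_payload_types(offer, supported_modes):
    """Keep, per media description, the payload types whose mst-mode
    is one the answerer supports."""
    chosen = {}
    for mid, fmtps in offer.items():
        chosen[mid] = [
            pt for pt, params in fmtps.items()
            if parse_fmtp(params).get("mst-mode") in supported_modes
        ]
    return chosen


# fmtp parameters taken from the offer in Example 3 (parameter sets omitted).
offer = {
    "L1": {
        96: "profile-level-id=4de00a; packetization-mode=0; mst-mode=NI-T",
        97: "profile-level-id=4de00a; packetization-mode=1; mst-mode=NI-TC",
        98: "profile-level-id=4de00a; packetization-mode=2; mst-mode=I-C",
    },
    "L2": {
        99: "profile-level-id=53000c; packetization-mode=1; mst-mode=NI-T",
        100: "profile-level-id=53000c; packetization-mode=2; mst-mode=I-C",
    },
    "L3": {
        101: "profile-level-id=53001F; packetization-mode=1; mst-mode=NI-T",
    },
}

chosen = select_payload_types(offer, {"NI-T"})
print(chosen)
```

A complete answerer would additionally check profile-level-id and packetization-mode and would prune the "a=depend" references to the payload types it retained.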
4141 Answerer -> Offerer SDP message: 4143 a=group:DDP L1 L2 L3 4144 m=video 40000 RTP/AVP 96 4145 a=rtpmap:96 H264/90000 4146 a=fmtp:96 profile-level-id=4de00a; packetization-mode=0; 4147 mst-mode=NI-T; sprop-parameter-sets={sps3},{pps3}; 4148 a=mid:L1 4149 m=video 40002 RTP/AVP 99 4150 a=rtpmap:99 H264-SVC/90000 4151 a=fmtp:99 profile-level-id=53000c; packetization-mode=1; 4152 mst-mode=NI-T; sprop-parameter-sets={sps4},{pps4}; 4153 a=mid:L2 4154 a=depend:99 lay L1:96 4155 m=video 40004 RTP/AVP 101 4156 a=rtpmap:101 H264-SVC/90000 4157 a=fmtp:101 profile-level-id=53001F; packetization-mode=1; 4158 mst-mode=NI-T; sprop-parameter-sets={sps5},{pps5}; 4159 a=mid:L3 4160 a=depend:101 lay L1:96 L2:99 4162 7.3.4 Example for Offering Multiple Sessions in MST including operation 4163 with Answerer using scalable-layer-id 4165 Example 4: In this example the offerer offers a multi-session 4166 transmission of three layers with up to two sessions. The base 4167 session media description has a payload type which is backward 4168 compatible with [I-D.ietf-avt-rtp-rfc3984bis]. Note that no 4169 parameter sets are provided, in which case in-band transport must be 4170 used. The other media description contains two enhancement layers 4171 and uses the media subtype H264-SVC. It includes two operation 4172 point definitions. 
4174 Offerer -> Answerer SDP message: 4176 a=group:DDP L1 L2 4177 m=video 20000 RTP/AVP 96 4178 a=rtpmap:96 H264/90000 4179 a=fmtp:96 profile-level-id=4de00a; packetization-mode=0; 4180 mst-mode=NI-T; 4181 a=mid:L1 4182 m=video 20002 RTP/AVP 97 4183 a=rtpmap:97 H264-SVC/90000 4184 a=fmtp:97 profile-level-id=53001F; packetization-mode=1; 4185 mst-mode=NI-TC; sprop-operation-point-info=<2,0,1,0,53000c, 4186 3200,352,288,384,512>,<3,1,2,0,53001F,6400,704,576,768,1024>; 4187 a=mid:L2 4188 a=depend:97 lay L1:96 4190 It is assumed that the answerer wants to send and receive the base 4191 layer (payload type 96), but it only wants to send and receive the 4192 lower enhancement layer, i.e., the one with layer id equal to 2. 4193 For this reason, the response will include the selection of the 4194 desired layer by setting scalable-layer-id equal to 2. Note that 4195 the answer only includes the scalable-layer-id information. The 4196 answer could include sprop-parameter-sets in the response. 4198 Answerer -> Offerer SDP message: 4200 a=group:DDP L1 L2 4201 m=video 40000 RTP/AVP 96 4202 a=rtpmap:96 H264/90000 4203 a=fmtp:96 profile-level-id=4de00a; packetization-mode=0; 4204 mst-mode=NI-T; 4205 a=mid:L1 4206 m=video 40002 RTP/AVP 97 4207 a=rtpmap:97 H264-SVC/90000 4208 a=fmtp:97 scalable-layer-id=2; 4209 a=mid:L2 4210 a=depend:97 lay L1:96 4212 7.3.5 Example for Negotiating an SVC Stream with a Constrained Base 4213 Layer in SST 4215 Example 5: The offerer (Alice) offers one video description 4216 including two RTP payload types with differing levels and 4217 packetization modes. 
4219 Offerer -> Answerer SDP message: 4221 m=video 20000 RTP/AVP 97 96 4222 a=rtpmap:96 H264-SVC/90000 4223 a=fmtp:96 profile-level-id=53001e; packetization-mode=0; 4224 a=rtpmap:97 H264-SVC/90000 4225 a=fmtp:97 profile-level-id=53001f; packetization-mode=1; 4227 The answerer (Bridge) chooses packetization mode 1, and indicates 4228 that it would receive an SVC stream with the base layer being 4229 constrained. 4231 Answerer -> Offerer SDP message: 4233 m=video 40000 RTP/AVP 97 4234 a=rtpmap:97 H264-SVC/90000 4235 a=fmtp:97 profile-level-id=53001f; packetization-mode=1; 4236 max-recv-base-level=000d 4238 The answering endpoint must send an SVC stream at level 3.1. Since 4239 the offering endpoint did not declare max-recv-base-level, the base 4240 layer of the SVC stream the answering endpoint must send is not 4241 specifically constrained. The offering endpoint (Alice) must send 4242 an SVC stream at level 3.1, for which the base layer must be of a 4243 level not higher than level 1.3. 4245 7.4 Parameter Set Considerations 4247 Section 8.4 of [I-D.ietf-avt-rtp-rfc3984bis] applies in this memo, 4248 with the following applying additionally for multi-session 4249 transmission (MST). 4251 In MST, regardless of whether out-of-band or in-band transport of parameter 4252 sets is in use, parameter sets required for decoding NAL units 4253 carried in one particular RTP session SHOULD be carried in the same 4254 session, MAY be carried in a session that the particular RTP session 4255 depends on, and MUST NOT be carried in a session that the particular 4256 RTP session does not depend on. 4258 8. Security Considerations 4260 The security considerations of the RTP Payload Format for H.264 4261 Video specification [I-D.ietf-avt-rtp-rfc3984bis] apply. 4262 Additionally, the following applies.
4264 Decoders MUST exercise caution with respect to the handling of 4265 reserved NAL unit types and reserved SEI messages, particularly if 4266 they contain active elements, and MUST restrict their domain of 4267 applicability to the presentation containing the stream. The safest 4268 way is to simply discard these NAL units and SEI messages. 4270 When integrity protection is applied to a stream, care MUST be taken 4271 that the stream being transported may be scalable; hence a receiver 4272 may be able to access only part of the entire stream. 4274 End-to-end security with either authentication, integrity or 4275 confidentiality protection will prevent a MANE from performing 4276 media-aware operations other than discarding complete packets. And 4277 in the case of confidentiality protection it will even be prevented 4278 from performing discarding of packets in a media aware way. To 4279 allow any MANE to perform its operations, it will be required to be 4280 a trusted entity which is included in the security context 4281 establishment. This applies both for the media path and for the 4282 RTCP path, if RTCP packets need to be rewritten. 4284 9. Congestion Control 4286 Within any given RTP session carrying payload according to this 4287 specification, the provisions of Section 10 of [I-D.ietf-avt-rtp- 4288 rfc3984bis] apply. Reducing the session bitrate is possible by one 4289 or more of the following means: 4291 a) Within the highest layer identified by the DID field remove any 4292 NAL units with QID higher than a certain value. 4294 b) Remove all NAL units with TID higher than a certain value. 4296 c) Remove all NAL units associated with a DID higher than a certain 4297 value. 4299 Informative note: Removal of all coded slice NAL units associated 4300 with DIDs higher than a certain value in the entire stream is 4301 required in order to preserve conformance of the resulting SVC 4302 stream. 
4304 d) Utilize the PRID field to indicate the relative importance of NAL 4305 units, and remove all NAL units associated with a PRID higher than 4306 a certain value. Note that the use of the PRID is application- 4307 specific. 4309 e) Remove NAL units or entire packets according to application- 4310 specific rules. The result will depend on the particular coding 4311 structure used as well as any additional application-specific 4312 functionality (e.g., concealment performed at the receiving 4313 decoder). In general, this will result in the reception of a non- 4314 conforming bitstream and hence the decoder behavior is not 4315 specified by [H.264]. Significant artifacts may therefore appear 4316 in the decoded output if the particular decoder implementation 4317 does not take appropriate action in response to congestion control. 4319 Informative note: The discussion above is centered on NAL units 4320 rather than packets, primarily because that is the level where 4321 senders can meaningfully manipulate the scalable bitstream. The 4322 mapping of NAL units to RTP packets is fairly flexible when using 4323 aggregation packets. Depending on the nature of the congestion 4324 control algorithm, the "dimension" of congestion measurement 4325 (packet count or bitrate) and reaction to it (reducing packet 4326 count or bitrate or both) can be adjusted accordingly. 4328 All aforementioned means are available to the RTP sender, regardless 4329 of whether that sender is located in the sending endpoint or in a 4330 mixer-based MANE. 4332 When a translator-based MANE is employed, then the MANE MAY 4333 manipulate the session only on the MANE's outgoing path, so that the 4334 sensed end-to-end congestion falls within the permissible envelope. 4335 As with all translators, in this case the MANE needs to rewrite RTCP RRs 4336 to reflect the manipulations it has performed on the session.
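Means a) through d) listed in this section can be sketched, non-normatively, as filters over per-NAL-unit scalability identifiers. The Python sketch below assumes that DID, QID, TID, and PRID have already been extracted for every NAL unit (for example from the NAL unit header extension); the NalInfo type, the thin function, and its parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalInfo:
    did: int   # dependency_id
    qid: int   # quality_id
    tid: int   # temporal_id
    prid: int  # priority_id (application-specific meaning)


def thin(units, max_did=None, max_qid_in_top=None, max_tid=None,
         max_prid=None):
    """Return the NAL units that survive the selected thinning means."""
    kept = list(units)
    if max_did is not None:
        # Means c): drop everything above a given DID.
        kept = [u for u in kept if u.did <= max_did]
    if max_tid is not None:
        # Means b): drop everything above a given TID.
        kept = [u for u in kept if u.tid <= max_tid]
    if max_prid is not None:
        # Means d): drop everything above a given PRID.
        kept = [u for u in kept if u.prid <= max_prid]
    if max_qid_in_top is not None and kept:
        # Means a): QID thinning only within the highest remaining DID.
        top = max(u.did for u in kept)
        kept = [u for u in kept if u.did < top or u.qid <= max_qid_in_top]
    return kept


units = [
    NalInfo(did=0, qid=0, tid=0, prid=0),
    NalInfo(did=0, qid=0, tid=1, prid=1),
    NalInfo(did=1, qid=0, tid=0, prid=2),
    NalInfo(did=1, qid=1, tid=0, prid=3),
    NalInfo(did=1, qid=1, tid=1, prid=4),
]

temporal_base_only = thin(units, max_tid=0)
print(len(temporal_base_only))
```

Means e) is deliberately not covered, since it depends on application-specific rules. A translator-based MANE applying such a filter would, as noted above, also need to rewrite RTCP RRs accordingly.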
4338 Informative note: Applications MAY also implement, in addition or 4339 separately, other congestion control mechanisms, e.g., as 4340 described in [RFC5775] and [Yan]. 4342 10. IANA Consideration 4344 A new media type, as specified in Section 7.1 of this memo, should 4345 be registered with IANA. 4347 11. Informative Appendix: Application Examples 4349 11.1 Introduction 4351 Scalable video coding is a concept that has been around since at 4352 least MPEG-2 [MPEG2], which goes back as early as 1993. 4353 Nevertheless, it has never gained wide acceptance; perhaps partly 4354 because applications didn't materialize in the form envisioned 4355 during standardization. 4357 ISO/IEC MPEG and ITU-T VCEG, respectively, performed a requirement 4358 analysis for the SVC project. The MPEG and VCEG requirement 4359 documents are available in [JVT-N026] and [JVT-N027], respectively. 4361 The following introduces four main application scenarios that the 4362 authors consider relevant and that are implementable with this 4363 specification. 4365 11.2 Layered Multicast 4367 This well-understood form of the use of layered coding [McCanne] 4368 implies that all layers are individually conveyed in their own RTP 4369 packet streams, each carried in its own RTP session using the IP 4370 (multicast) address and port number as the single demultiplexing 4371 point. Receivers "tune" into the layers by subscribing to the IP 4372 multicast, normally by using IGMP [IGMP]. Depending on the 4373 application scenario, it is also possible to convey a number of 4374 layers in one RTP session, when finer operation points within the 4375 subset of layers are not needed. 4377 Layered multicast has the great advantage of simplicity and easy 4378 implementation. However, it has also the great disadvantage of 4379 utilizing many different transport addresses. 
While the authors 4380 consider this not to be a major problem for a professionally 4381 maintained content server, receiving client endpoints need to open 4382 many ports to IP multicast addresses in their firewalls. This is a 4383 practical problem from a firewall and network address translation 4384 (NAT) viewpoint. Furthermore, even today IP multicast is not as 4385 widely deployed as many wish. 4387 The authors consider layered multicast an important application 4388 scenario for the following reasons. First, it is well understood 4389 and the implementation constraints are well known. Second, there 4390 may well be large scale IP networks outside the immediate Internet 4391 context that may wish to employ layered multicast in the future. 4392 One possible example could be a combination of content creation and 4393 core-network distribution for the various mobile TV services, e.g., 4394 those being developed by 3GPP (MBMS) [MBMS] and DVB (DVB-H) [DVB-H]. 4396 11.3 Streaming 4398 In this scenario, a streaming server has a repository of stored SVC 4399 coded layers for a given content. At the time of streaming, and 4400 according to the capabilities, connectivity, and congestion 4401 situation of the client(s), the streaming server generates and 4402 serves a scalable stream. Both unicast and multicast serving is 4403 possible. At the same time, the streaming server may use the same 4404 repository of stored layers to compose different streams (with a 4405 different set of layers) intended for other audiences. 4407 As every endpoint receives only a single SVC RTP session, the number 4408 of firewall pinholes can be optimized to one. 4410 The main difference between this scenario and straightforward 4411 simulcasting lies in the architecture and the requirements of the 4412 streaming server, and is therefore out of the scope of IETF 4413 standardization. However, compelling arguments can be made why such 4414 a streaming server design makes sense. 
One possible argument is 4415 related to storage space and channel bandwidth. Another is 4416 bandwidth adaptability without transcoding -- a considerable 4417 advantage in a congestion controlled network. When the streaming 4418 server learns about congestion, it can reduce the sending bit rate 4419 by choosing fewer layers when composing the layered stream; see 4420 Section 9. SVC is designed to gracefully support both bandwidth 4421 ramp-down and bandwidth ramp-up with a considerable dynamic range. 4422 This payload format is designed to allow for bandwidth flexibility 4423 in the mentioned sense. While, in theory, a transcoding step could 4424 achieve a similar dynamic range, the computational demands are 4425 impractically high and video quality is typically lowered -- 4426 therefore, few (if any) streaming servers implement full transcoding. 4428 11.4 Videoconferencing (Unicast to MANE, Unicast to Endpoints) 4430 Videoconferencing has traditionally relied on Multipoint Control 4431 Units (MCUs). These units connect endpoints in a star configuration 4432 and operate as follows. Coded video is transmitted from each 4433 endpoint to the MCU, where it is decoded, scaled, and composited to 4434 construct output frames, which are then re-encoded and transmitted 4435 to the endpoint(s). In systems supporting personalized layout (each 4436 user is allowed to select the layout of his/her screen), the 4437 compositing and encoding process is performed for each of the 4438 receiving endpoints. Even without personalized layout, rate 4439 matching still requires that the encoding process at the MCU is 4440 performed separately for each endpoint. As a result, MCUs have 4441 considerable complexity and introduce significant delay. The 4442 cascaded encodings also reduce the video quality. Particularly for 4443 multipoint connections, interactive communication is cumbersome as 4444 the end-to-end delay is very high [G.114].
A simpler architecture 4445 is the switching MCU, in which one of the incoming video streams is 4446 redirected to the receiving endpoints. Obviously, only one user at 4447 a time can be seen and rate matching cannot be performed, thus 4448 forcing all transmitting endpoints to transmit at the lowest bit 4449 rate available in the MCU-to-endpoint connections. 4451 With scalable video coding the MCU can be replaced with an 4452 application-level router (ALR): this unit simply selects which 4453 incoming packets should be transmitted to which of the receiving 4454 endpoints [Eleft]. In such a system, each endpoint performs its own 4455 composition of the incoming video streams. Assuming, for example, a 4456 system that uses spatial scalability with two layers, personalized 4457 layout is equivalent to instructing the ALR to only send the 4458 required packets for the corresponding resolution to the particular 4459 endpoint. Similarly, rate matching at the ALR for a particular 4460 endpoint can be performed by selecting an appropriate subset of the 4461 incoming video packets to transmit to the particular endpoint. 4462 Personalized layout and rate matching thus become routing decisions, 4463 and require no signal processing. Note that scalability also allows 4464 participants to enjoy the best video quality afforded by their links, 4465 i.e., users no longer have to be forced to operate at the quality 4466 supported by the weakest endpoint. Most importantly, the ALR has an 4467 insignificant contribution to the end-to-end delay, typically an 4468 order of magnitude less than an MCU. This makes it possible to have 4469 fully interactive multipoint conferences with even a very large 4470 number of participants. There are significant advantages as well in 4471 terms of error resilience and, in fact, error tolerance can be 4472 increased by nearly an order of magnitude here as well (e.g., using 4473 unequal error protection). 
Finally, the very low delay of an ALR 4474 allows these systems to be cascaded, with significant benefits in 4475 terms of system design and deployment. Cascading of traditional 4476 MCUs is impossible due to the very high delay that even a single MCU 4477 introduces. 4479 Scalable video coding enables a very significant paradigm shift in 4480 videoconferencing systems, bringing the complexity of video 4481 communication systems (particularly the servers residing within the 4482 network) in line with other types of network applications. 4484 11.5 Mobile TV (Multicast to MANE, Unicast to Endpoint) 4486 This scenario is a bit more complex, and designed to optimize the 4487 network traffic in a core network, while still requiring only a 4488 single pinhole in the endpoint's firewall. One of its key 4489 applications is the mobile TV market. 4491 Consider a large private IP network, e.g., the core network of 3GPP. 4492 Streaming servers within this core network can be assumed to be 4493 professionally maintained. It is assumed that these servers can 4494 have many ports open to the network and that layered multicast is a 4495 real option. Therefore, the streaming server multicasts SVC 4496 scalable layers, instead of simulcasting different representations 4497 of the same content at different bit rates. 4499 Also consider many endpoints of different classes. Some of these 4500 endpoints may lack the processing power or the display size to 4501 meaningfully decode all layers; others may have these capabilities. 4502 Users of some endpoints may wish not to pay for high quality and are 4503 happy with a base service, which may be cheaper or even free. Other 4504 users are willing to pay for high quality. Finally, some connected 4505 users may have a bandwidth problem in that they can't receive the 4506 bandwidth they would want to receive -- be it through congestion, 4507 connectivity, change of service quality, or for whatever other 4508 reasons. 
However, all these users have in common that they don't 4509 want to be exposed too much, and therefore the number of firewall 4510 pinholes needs to be small. 4512 This situation can be handled best by introducing middleboxes close 4513 to the edge of the core network, which receive the layered multicast 4514 streams and compose the single SVC scalable bit stream according to 4515 the needs of the endpoint connected. These middleboxes are called 4516 MANEs throughout this specification. In practice, the authors 4517 envision the MANE to be part of (or at least physically and 4518 topologically close to) the base station of a mobile network, where 4519 all the signaling and media traffic necessarily are multiplexed on 4520 the same physical link. 4522 MANEs necessarily need to be fairly complex devices. They certainly 4523 need to understand the signaling so as to associate, for example, the 4524 PT octet in the RTP header with the SVC payload type. 4526 A MANE may aggregate multiple RTP streams, possibly from multiple 4527 RTP sessions, thereby reducing the number of firewall pinholes 4528 required at the endpoints, or may optimize the outgoing RTP stream 4529 to the MTU size of the outgoing path by utilizing the aggregation 4530 and fragmentation mechanisms of this memo. This type of MANE is 4531 conceptually easy to implement and can offer powerful features, 4532 primarily because it necessarily can "see" the payload (including 4533 the RTP payload headers), utilize the wealth of layering information 4534 available therein, and manipulate it. 4536 A MANE can also perform stream thinning, in order to adhere to 4537 congestion control principles as discussed in Section 9. While the 4538 implementation of the forward (media) channel of such a MANE appears 4539 to be comparatively simple, the need to rewrite RTCP RRs makes even 4540 such a MANE a complex device.
4542 While the implementation complexity of either case of a MANE, as 4543 discussed above, is fairly high, the computational demands are 4544 comparatively low. 4546 12. Acknowledgements 4548 Miska Hannuksela contributed significantly to the designs of the 4549 PACSI NAL unit and the NI-C mode for decoding order recovery. Roni 4550 Even organized and coordinated the design team for the development 4551 of this memo, and provided valuable comments. Jonathan Lennox 4552 contributed to the NAL unit reordering algorithm for MST and 4553 provided input on several parts of this memo. Peter Amon, Sam 4554 Ganesan, Mike Nilsson, Colin Perkins, and Thomas Wiegand were 4555 members of the design team and provided valuable contributions. 4556 Magnus Westerlund has also made valuable comments. Charles Eckel 4557 and Stuart Taylor provided valuable comments after the first WGLC 4558 for this document. Xiaohui (Joanne) Wei helped improving Table 13 4559 and the SDP examples. 4561 The work of Thomas Schierl has been supported by the European 4562 Commission under contract number FP7-ICT-248036, project COAST. 4564 This document was prepared using 2-Word-v2.0.template.dot. 4566 13. References 4568 13.1 Normative References 4570 [H.264] ITU-T Recommendation H.264, "Advanced video coding for 4571 generic audiovisual services", 3rd Edition, November 2007. 4573 [I-D.ietf-avt-rtp-rfc3984bis] 4574 Wang, Y.-K., Even, R., Kristensen, T., and Jesup, R., "RTP 4575 Payload Format for H.264 Video", draft-ietf-avt-rtp- 4576 rfc3984bis-12.txt (work in progress), Oct. 2010. 4578 [ISO/IEC 14496-10] 4579 ISO/IEC International Standard 14496-10:2005. 4581 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 4582 Requirement Levels", BCP 14, RFC 2119, March 1997. 4584 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 4585 With Session Description Protocol (SDP)", RFC 3264, June 4586 2002. 
4588 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, 4589 V., "RTP: A Transport Protocol for Real-Time Applications", 4590 STD 64, RFC 3550, July 2003. 4592 [RFC4288] Freed, N. and Klensin, J., "Media Type Specification and 4593 Registration Procedures ", RFC 4288, December 2005. 4595 [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session 4596 Description Protocol", RFC 4566, July 2006. 4598 [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data 4599 Encodings", RFC 4648, October 2006. 4601 [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific 4602 Media Attributes in the Session Description Protocol 4603 (SDP)", RFC 5576, June 2009. 4605 [RFC5583] Schierl, T. and Wenger, S., "Signaling media decoding 4606 dependency in the Session Description Protocol (SDP)", RFC 4607 5583, July 2009. 4609 [RFC6051] Perkins, C. and Schierl, T., "Rapid Synchronisation of RTP 4610 Flows", RFC 6051, November 2010 4612 13.2 Informative References 4614 [DVB-H] DVB - Digital Video Broadcasting (DVB); DVB-H 4615 Implementation Guidelines, ETSI TR 102 377, 2005. 4617 [Eleft] Eleftheriadis, A., R. Civanlar, and O. Shapiro, 4618 "Multipoint Videoconferencing with Scalable Video Coding", 4619 Journal of Zhejiang University SCIENCE A, Vol. 7, Nr. 5, 4620 April 2006, pp. 696-705. (Proceedings of the Packet Video 4621 2006 Workshop.) 4623 [G.114] ITU-T Rec. G.114, "One-way transmission time", May 2003. 4625 [H.241] ITU-T Rec. H.241, "Extended video procedures and control 4626 signals for H.300-series terminals", May 2006. 4628 [IGMP] Cain, B., Deering S., Kovenlas, I., Fenner, B., and 4629 Thyagarajan, A., "Internet Group Management Protocol, 4630 Version 3", RFC 3376, October 2002. 4632 [JVT-N026] Ohm J.-R., Koenen, R., and Chiariglione, L. 
(ed.), "SVC 4633 requirements specified by MPEG (ISO/IEC JTC1 SC29 WG11)", 4634 JVT-N026, available from http://ftp3.itu.ch/av-arch/jvt- 4635 site/2005_01_HongKongGeneva/JVT-N026.doc, Hong Kong, China, 4636 January 2005. 4638 [JVT-N027] Sullivan, G. and Wiegand, T. (ed.), "SVC requirements 4639 specified by VCEG (ITU-T SG16 Q.6)", JVT-N027, available 4640 from http://ftp3.itu.ch/av-arch/jvt- 4641 site/2005_01_HongKongGeneva/JVT-N027.doc, Hong Kong, China, 4642 January 2005. 4644 [McCanne] McCanne, S., Jacobson, V., and Vetterli, M., "Receiver- 4645 driven layered multicast", in Proc. of ACM SIGCOMM'96, 4646 pages 117--130, Stanford, CA, August 1996. 4648 [MBMS] 3GPP - Technical Specification Group Services and System 4649 Aspects; Multimedia Broadcast/Multicast Service (MBMS); 4650 Protocols and codecs (Release 6), December 2005. 4652 [MPEG2] ISO/IEC International Standard 13818-2:1993. 4654 [RFC2326] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time 4655 Streaming Protocol (RTSP)", RFC 2326, April 1998. 4657 [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session 4658 Announcement Protocol", RFC 2974, October 2000. 4660 [RFC5117] Westerlund, M. and Wenger, S., "RTP Topologies", RFC 5117, 4661 January 2008. 4663 [RFC5775] Luby, M., Watson, M., and Vicisano, L., "Asynchronous 4664 layered coding (ALC) protocol instantiation", RFC 5775, 4665 April 2010. 4667 [Yan] Yan, J., Katrinis, K., May, M., and Plattner, R., "Media- 4668 and TCP-friendly congestion control for scalable video 4669 streams", in IEEE Trans. Multimedia, pages 196--206, April 4670 2006. 4672 14. Authors' Addresses 4674 Stephan Wenger 4675 2400 Skyfarm Dr. 
4676 Hillsborough, CA 94010 4677 USA 4679 Phone: +1-415-713-5473 4680 EMail: stewe@stewe.org 4682 Ye-Kui Wang 4683 Huawei Technologies 4684 400 Crossing Blvd, 2nd Floor 4685 Bridgewater, NJ 08807 4686 USA 4688 Phone: +1-908-541-3518 4689 EMail: yekui.wang@huawei.com 4691 Thomas Schierl 4692 Fraunhofer HHI 4693 Einsteinufer 37 4694 D-10587 Berlin 4695 Germany 4697 Phone: +49-30-31002-227 4698 Email: ts@thomas-schierl.de 4700 Alex Eleftheriadis 4701 Vidyo, Inc. 4702 433 Hackensack Ave. 4704 Hackensack, NJ 07601 4705 USA 4707 Phone: +1-201-467-5135 4708 Email: alex@vidyo.com