idnits 2.17.1 draft-ietf-payload-rtp-h265-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 3 instances of too long lines in the document, the longest one being 14 characters in excess of 72. ** The abstract seems to contain references ([HEVC]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 1351 has weird spacing: '...L unit into ...' == Line 3385 has weird spacing: '... loss indic...' == Line 3424 has weird spacing: '... Note that ...' == Line 3425 has weird spacing: '...cnt_lsb is n...' == Line 3447 has weird spacing: '...k-sized video...' == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'MUST not' in this paragraph: The FU payload consists of fragments of the payload of the fragmented NAL unit so that if the FU payloads of consecutive FUs, starting with an FU with the S bit equal to 1 and ending with an FU with the E bit equal to 1, are sequentially concatenated, the payload of the fragmented NAL unit can be reconstructed. The NAL unit header of the fragmented NAL unit is not included as such in the FU payload, but rather the information of the NAL unit header of the fragmented NAL unit is conveyed in F, LayerId, and TID fields of the FU payload headers of the FUs and the FuType field of the FU header of the FUs. An FU payload MUST not be empty. -- The document date (May 28, 2014) is 3592 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: '3GP' is mentioned on line 274, but not defined -- Looks like a reference, but probably isn't: '0' on line 1081 == Missing Reference: 'RFC5234' is mentioned on line 2679, but not defined == Missing Reference: 'RFC5117' is mentioned on line 2900, but not defined ** Obsolete undefined reference: RFC 5117 (Obsoleted by RFC 7667) == Missing Reference: 'RFC2326' is mentioned on line 3267, but not defined ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826) == Missing Reference: 'RFC2974' is mentioned on line 3268, but not defined == Missing Reference: 'RFC3551' is mentioned on line 3503, but not defined == Missing Reference: 'RFC3711' is mentioned on line 3503, but not defined == Missing Reference: 'RFC5124' is mentioned on line 3504, but not defined == Missing Reference: 'RFC 3711' is mentioned on line 3529, but not defined == Missing Reference: 'RFC 3551' is mentioned on line 3553, but not defined == Unused Reference: '3GPPFF' is defined on line 
3679, but no explicit reference was found in the text == Unused Reference: 'RFC5109' is defined on line 3742, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC' ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) == Outdated reference: A later version (-11) exists of draft-ietf-avtcore-rtp-multi-stream-01 == Outdated reference: A later version (-54) exists of draft-ietf-mmusic-sdp-bundle-negotiation-05 == Outdated reference: A later version (-08) exists of draft-ietf-avtext-rtp-grouping-taxonomy-01 Summary: 5 errors (**), 0 flaws (~~), 23 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Y.-K. Wang 2 Internet Draft Qualcomm 3 Intended status: Standards track Y. Sanchez 4 Expires: November 2014 T. Schierl 5 Fraunhofer HHI 6 S. Wenger 7 Vidyo 8 M. M. Hannuksela 9 Nokia 10 May 28, 2014 12 RTP Payload Format for High Efficiency Video Coding 13 draft-ietf-payload-rtp-h265-04.txt 15 Abstract 17 This memo describes an RTP payload format for the video coding 18 standard ITU-T Recommendation H.265 and ISO/IEC International 19 Standard 23008-2, both also known as High Efficiency Video Coding 20 (HEVC) [HEVC] and developed by the Joint Collaborative Team on Video 21 Coding (JCT-VC). The RTP payload format allows for packetization of 22 one or more Network Abstraction Layer (NAL) units in each RTP packet 23 payload, as well as fragmentation of a NAL unit into multiple RTP 24 packets. Furthermore, it supports transmission of an HEVC bitstream 25 over a single as well as multiple RTP streams. The payload format 26 has wide applicability in videoconferencing, Internet video 27 streaming, and high bit-rate entertainment-quality video, among 28 others. 
30 Status of this Memo 32 This Internet-Draft is submitted to IETF in full conformance with 33 the provisions of BCP 78 and BCP 79. 35 Internet-Drafts are working documents of the Internet Engineering 36 Task Force (IETF), its areas, and its working groups. Note that 37 other groups may also distribute working documents as Internet- 38 Drafts. 40 Internet-Drafts are draft documents valid for a maximum of six 41 months and may be updated, replaced, or obsoleted by other documents 42 at any time. It is inappropriate to use Internet-Drafts as 43 reference material or to cite them other than as "work in progress." 45 The list of current Internet-Drafts can be accessed at 46 http://www.ietf.org/ietf/1id-abstracts.txt. 48 The list of Internet-Draft Shadow Directories can be accessed at 49 http://www.ietf.org/shadow.html. 51 This Internet-Draft will expire on November 28, 2014. 53 Copyright and License Notice 55 Copyright (c) 2014 IETF Trust and the persons identified as the 56 document authors. All rights reserved. 58 This document is subject to BCP 78 and the IETF Trust's Legal 59 Provisions Relating to IETF Documents 60 (http://trustee.ietf.org/license-info) in effect on the date of 61 publication of this document. Please review these documents 62 carefully, as they describe your rights and restrictions with 63 respect to this document. Code Components extracted from this 64 document must include Simplified BSD License text as described in 65 Section 4.e of the Trust Legal Provisions and are provided without 66 warranty as described in the Simplified BSD License. 68 Table of Contents 70 Abstract..........................................................1 71 Status of this Memo...............................................1 72 Table of Contents.................................................3 73 1. Introduction...................................................5 74 1.1. 
Overview of the HEVC Codec................................5 75 1.1.1 Coding-Tool Features..................................5 76 1.1.2 Systems and Transport Interfaces......................7 77 1.1.3 Parallel Processing Support..........................14 78 1.1.4 NAL Unit Header......................................16 79 1.2. Overview of the Payload Format...........................17 80 2. Conventions...................................................18 81 3. Definitions and Abbreviations.................................18 82 3.1 Definitions...............................................18 83 3.1.1 Definitions from the HEVC Specification..............18 84 3.1.2 Definitions Specific to This Memo....................20 85 3.2 Abbreviations.............................................22 86 4. RTP Payload Format............................................23 87 4.1 RTP Header Usage..........................................23 88 4.2 Payload Header Usage......................................26 89 4.3 Payload Structures........................................26 90 4.4 Transmission Modes........................................27 91 4.5 Decoding Order Number.....................................28 92 4.6 Single NAL Unit Packets...................................30 93 4.7 Aggregation Packets (APs).................................31 94 4.8 Fragmentation Units (FUs).................................35 95 4.9 PACI packets..............................................38 96 4.9.1 Reasons for the PACI rules (informative).............41 97 4.9.2 PACI extensions (Informative)........................41 98 4.10 Temporal Scalability Control Information.................43 99 5. Packetization Rules...........................................45 100 6. De-packetization Process......................................45 101 7. 
Payload Format Parameters.....................................48 102 7.1 Media Type Registration...................................48 103 7.2 SDP Parameters............................................73 104 7.2.1 Mapping of Payload Type Parameters to SDP............73 105 7.2.2 Usage with SDP Offer/Answer Model....................74 106 7.2.3 Usage in Declarative Session Descriptions............83 107 7.2.4 Parameter Sets Considerations........................84 108 7.2.5 Dependency Signaling in Multi-Stream Mode............84 109 8. Use with Feedback Messages....................................85 110 8.1 Picture Loss Indication (PLI).............................86 111 8.2 Slice Loss Indication.....................................86 112 8.3 Use of HEVC with the RPSI Feedback Message................87 113 8.4 Full Intra Request (FIR)..................................88 114 9. Security Considerations.......................................88 115 10. Congestion Control...........................................90 116 11. IANA Consideration...........................................91 117 12. Acknowledgements.............................................91 118 13. References...................................................91 119 13.1 Normative References.....................................91 120 13.2 Informative References...................................93 121 14. Authors' Addresses...........................................95 123 1. Introduction 125 1.1. Overview of the HEVC Codec 127 High Efficiency Video Coding [HEVC], formally known as ITU-T 128 Recommendation H.265 and ISO/IEC International Standard 23008-2 was 129 ratified by ITU-T in April 2013 and reportedly provides significant 130 coding efficiency gains over H.264 [H.264]. 
132 As both H.264 [H.264] and its RTP payload format [RFC6184] are 133 widely deployed and generally known in the relevant implementer 134 communities, frequently only the differences between those two 135 specifications are highlighted in non-normative, explanatory parts 136 of this memo. Basic familiarity with both specifications is assumed 137 for those parts. However, the normative parts of this memo do not 138 require study of H.264 or its RTP payload format. 140 H.264 and HEVC share a similar hybrid video codec design. 141 Conceptually, both technologies include a video coding layer (VCL), 142 which is often used to refer to the coding-tool features, and a 143 network abstraction layer (NAL), which is often used to refer to the 144 systems and transport interface aspects of the codecs. 146 1.1.1 Coding-Tool Features 148 Similarly to earlier hybrid-video-coding-based standards, including 149 H.264, the following basic video coding design is employed by HEVC. 150 A prediction signal is first formed either by intra or motion 151 compensated prediction, and the residual (the difference between the 152 original and the prediction) is then coded. The gains in coding 153 efficiency are achieved by redesigning and improving almost all 154 parts of the codec over earlier designs. In addition, HEVC includes 155 several tools to make the implementation on parallel architectures 156 easier. Below is a summary of HEVC coding-tool features. 158 Quad-tree block and transform structure 160 One of the major tools that contribute significantly to the coding 161 efficiency of HEVC is the usage of flexible coding blocks and 162 transforms, which are defined in a hierarchical quad-tree manner. 163 Unlike H.264, where the basic coding block is a macroblock of fixed 164 size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size 165 of 64x64. 
Each CTU can be divided into smaller units in a 166 hierarchical quad-tree manner and can represent smaller blocks down 167 to size 4x4. Similarly, the transforms used in HEVC can have 168 different sizes, starting from 4x4 and going up to 32x32. Utilizing 169 large blocks and transforms contributes to the major gain of HEVC, 170 especially at high resolutions. 172 Entropy coding 174 HEVC uses a single entropy coding engine, which is based on Context 175 Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two 176 distinct entropy coding engines. CABAC in HEVC shares many 177 similarities with CABAC of H.264, but contains several improvements. 178 Those include improvements in coding efficiency and lowered 179 implementation complexity, especially for parallel architectures. 181 In-loop filtering 183 H.264 includes an in-loop adaptive deblocking filter, where the 184 blocking artifacts around the transform edges in the reconstructed 185 picture are smoothed to improve the picture quality and compression 186 efficiency. In HEVC, a similar deblocking filter is employed but 187 with somewhat lower complexity. In addition, pictures undergo a 188 subsequent filtering operation called Sample Adaptive Offset (SAO), 189 which is a new design element in HEVC. SAO basically adds a pixel- 190 level offset in an adaptive manner and usually acts as a de-ringing 191 filter. It is observed that SAO improves the picture quality, 192 especially around sharp edges, contributing substantially to visual 193 quality improvements of HEVC. 195 Motion prediction and coding 197 There have been a number of improvements in this area that are 198 summarized as follows. The first category is motion merge and 199 advanced motion vector prediction (AMVP) modes. The motion 200 information of a prediction block can be inferred from the spatially 201 or temporally neighboring blocks.
This is similar to the DIRECT 202 mode in H.264 but includes new aspects to incorporate the flexible 203 quad-tree structure and methods to improve the parallel 204 implementations. In addition, the motion vector predictor can be 205 signaled for improved efficiency. The second category is high- 206 precision interpolation. The interpolation filter length is 207 increased to 8-tap from 6-tap, which improves the coding efficiency 208 but also comes with increased complexity. In addition, the 209 interpolation filter is defined with higher precision without any 210 intermediate rounding operations to further improve the coding 211 efficiency. 213 Intra prediction and intra coding 215 Compared to 8 intra prediction modes in H.264, HEVC supports angular 216 intra prediction with 33 directions. This increased flexibility 217 improves both objective coding efficiency and visual quality as the 218 edges can be better predicted and ringing artifacts around the edges 219 can be reduced. In addition, the reference samples are adaptively 220 smoothed based on the prediction direction. To avoid contouring 221 artifacts a new interpolative prediction generation is included to 222 improve the visual quality. Furthermore, discrete sine transform 223 (DST) is utilized instead of traditional discrete cosine transform 224 (DCT) for 4x4 intra transform blocks. 226 Other coding-tool features 228 HEVC includes some tools for lossless coding and efficient screen 229 content coding, such as skipping the transform for certain blocks. 230 These tools are particularly useful for example when streaming the 231 user-interface of a mobile device to a large display. 
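The hierarchical quad-tree partitioning of a CTU described earlier in this section can be illustrated with a minimal sketch. It is not taken from [HEVC]: the function name quadtree_leaves and the should_split callback are hypothetical, and a real encoder would base the split decision on rate-distortion analysis rather than on a simple predicate. The 64x64 maximum CTU size and the 4x4 minimum block size are the values quoted in the text above.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in samples


def quadtree_leaves(x: int, y: int, size: int,
                    should_split: Callable[[int, int, int], bool],
                    min_size: int = 4) -> List[Block]:
    """Return the leaf blocks of a quad-tree rooted at (x, y, size).

    should_split is a hypothetical decision callback; a block is split
    into four equal quadrants while it is larger than min_size and the
    callback asks for a split. Leaves are returned in z-order
    (top-left, top-right, bottom-left, bottom-right).
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    quadtree_leaves(x + dx, y + dy, half,
                                    should_split, min_size))
        return leaves
    return [(x, y, size)]


if __name__ == "__main__":
    # Split only the block that touches the top-left corner at every level.
    corner_only = quadtree_leaves(0, 0, 64, lambda x, y, s: x == 0 and y == 0)
    for block in corner_only:
        print(block)
```

Whatever the split decisions are, the leaves always tile the CTU exactly, which is why the sum of the leaf areas equals the CTU area.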
233 1.1.2 Systems and Transport Interfaces 235 HEVC inherited the basic systems and transport interfaces designs, 236 such as the NAL-unit-based syntax structure, the hierarchical syntax 237 and data unit structure from sequence-level parameter sets, multi- 238 picture-level or picture-level parameter sets, slice-level header 239 parameters, lower-level parameters, the supplemental enhancement 240 information (SEI) message mechanism, the hypothetical reference 241 decoder (HRD) based video buffering model, and so on. In the 242 following, a list of differences in these aspects compared to H.264 243 is summarized. 245 Video parameter set 247 A new type of parameter set, called video parameter set (VPS), was 248 introduced. For the first (2013) version of [HEVC], the video 249 parameter set NAL unit is required to be available prior to its 250 activation, while the information contained in the video parameter 251 set is not necessary for operation of the decoding process. For 252 future HEVC extensions, such as the 3D or scalable extensions, the 253 video parameter set is expected to include information necessary for 254 operation of the decoding process, e.g. decoding dependency or 255 information for reference picture set construction of enhancement 256 layers. The VPS provides a "big picture" of a bitstream, including 257 what types of operation points are provided, the profile, tier, and 258 level of the operation points, and some other high-level properties 259 of the bitstream that can be used as the basis for session 260 negotiation and content selection, etc. (see section 7.1). 
262 Profile, tier and level 264 The profile, tier and level syntax structure that can be included in 265 both VPS and sequence parameter set (SPS) includes 12 bytes of data 266 to describe the entire bitstream (including all temporally scalable 267 layers, which are referred to as sub-layers in the HEVC 268 specification), and can optionally include more profile, tier and 269 level information pertaining to individual temporally scalable 270 layers. The profile indicator indicates the "best viewed as" 271 profile when the bitstream conforms to multiple profiles, similar to 272 the major brand concept in the ISO base media file format (ISOBMFF) 273 [ISOBMFF] and file formats derived based on ISOBMFF, such as the 274 3GPP file format [3GP]. The profile, tier and level syntax 275 structure also includes the indications of whether the bitstream is 276 free of frame-packed content, whether the bitstream is free of 277 interlaced source content and free of field pictures, i.e. contains 278 only frame pictures of progressive source, such that clients/players 279 with no support of post-processing functionalities for handling of 280 frame-packed or interlaced source content or field pictures can 281 reject those bitstreams. 283 Bitstream and elementary stream 285 HEVC includes a definition of an elementary stream, which is new 286 compared to H.264. An elementary stream consists of a sequence of 287 one or more bitstreams. An elementary stream that consists of two 288 or more bitstreams has typically been formed by splicing together 289 two or more bitstreams (or parts thereof). When an elementary 290 stream contains more than one bitstream, the last NAL unit of the 291 last access unit of a bitstream (except the last bitstream in the 292 elementary stream) must contain an end of bitstream NAL unit and the 293 first access unit of the subsequent bitstream must be an intra 294 random access point (IRAP) access unit. 
This IRAP access unit may 295 be a clean random access (CRA), broken link access (BLA), or 296 instantaneous decoding refresh (IDR) access unit. 298 Random access support 300 HEVC includes signaling in NAL unit header, through NAL unit types, 301 of IRAP pictures beyond IDR pictures. Three types of IRAP pictures, 302 namely IDR, CRA and BLA pictures are supported, wherein IDR pictures 303 are conventionally referred to as closed group-of-pictures (closed- 304 GOP) random access points, and CRA and BLA pictures are those 305 conventionally referred to as open-GOP random access points. BLA 306 pictures usually originate from splicing of two bitstreams or part 307 thereof at a CRA picture, e.g. during stream switching. To enable 308 better systems usage of IRAP pictures, altogether six different NAL 309 units are defined to signal the properties of the IRAP pictures, 310 which can be used to better match the stream access point (SAP) 311 types as defined in the ISOBMFF [ISOBMFF], which are utilized for 312 random access support in both 3GP-DASH [3GPDASH] and MPEG DASH 313 [MPEGDASH]. Pictures following an IRAP picture in decoding order 314 and preceding the IRAP picture in output order are referred to as 315 leading pictures associated with the IRAP picture. There are two 316 types of leading pictures, namely random access decodable leading 317 (RADL) pictures and random access skipped leading (RASL) pictures. 318 RADL pictures are decodable when the decoding started at the 319 associated IRAP picture, and RASL pictures are not decodable when 320 the decoding started at the associated IRAP picture and are usually 321 discarded. HEVC provides mechanisms to enable the specification of 322 conformance of bitstreams with RASL pictures being discarded, thus 323 to provide a standard-compliant way to enable systems components to 324 discard RASL pictures when needed. 
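The handling of leading pictures at a random access point can be sketched as follows. This is a simplified illustration, not a normative procedure: the function name pictures_to_decode is hypothetical, the input is reduced to one NAL unit type per picture in decoding order, and only the RASL pictures associated with the IRAP picture at which decoding starts are dropped. The numeric NAL unit type values are not listed in this section; they are assumed here from the NAL unit type table of [HEVC] (RADL 6 and 7, RASL 8 and 9, and the six IRAP types 16 to 21).

```python
from typing import Iterable, List

# NAL unit type values assumed from the NAL unit type table in [HEVC].
RADL_TYPES = frozenset({6, 7})          # RADL_N, RADL_R
RASL_TYPES = frozenset({8, 9})          # RASL_N, RASL_R
IRAP_TYPES = frozenset(range(16, 22))   # BLA_W_LP .. CRA_NUT (six types)


def pictures_to_decode(nal_unit_types: Iterable[int]) -> List[int]:
    """Select the pictures a decoder keeps when it starts at the first IRAP.

    nal_unit_types holds one NAL unit type per picture, in decoding order.
    Pictures before the first IRAP picture are skipped, and RASL pictures
    associated with that first IRAP picture are discarded. RADL pictures
    and RASL pictures associated with later IRAP pictures are kept.
    """
    kept: List[int] = []
    started = False          # True once the first IRAP picture was seen
    at_first_irap = False    # True while the most recent IRAP is the first
    for nut in nal_unit_types:
        if nut in IRAP_TYPES:
            at_first_irap = not started
            started = True
            kept.append(nut)
        elif not started:
            continue         # nothing is decodable before the first IRAP
        elif nut in RASL_TYPES and at_first_irap:
            continue         # RASL of the starting IRAP: not decodable
        else:
            kept.append(nut)
    return kept


if __name__ == "__main__":
    # TRAIL_R, CRA, RASL_R, RADL_R, TRAIL_R, CRA, RASL_R, TRAIL_R
    print(pictures_to_decode([1, 21, 9, 7, 1, 21, 9, 1]))
```

A more complete implementation would also have to treat RASL pictures associated with BLA pictures, which this sketch deliberately leaves out.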
326 Temporal scalability support 328 HEVC includes an improved support of temporal scalability, by 329 inclusion of the signaling of TemporalId in the NAL unit header, the 330 restriction that pictures of a particular temporal sub-layer cannot 331 be used for inter prediction reference by pictures of a lower 332 temporal sub-layer, the sub-bitstream extraction process, and the 333 requirement that each sub-bitstream extraction output be a 334 conforming bitstream. Media-aware network elements (MANEs) can 335 utilize the TemporalId in the NAL unit header for stream adaptation 336 purposes based on temporal scalability. 338 Temporal sub-layer switching support 340 HEVC specifies, through NAL unit types present in the NAL unit 341 header, the signaling of temporal sub-layer access (TSA) and 342 stepwise temporal sub-layer access (STSA). A TSA picture and 343 pictures following the TSA picture in decoding order do not use 344 pictures prior to the TSA picture in decoding order with TemporalId 345 greater than or equal to that of the TSA picture for inter 346 prediction reference. A TSA picture enables up-switching, at the 347 TSA picture, to the sub-layer containing the TSA picture or any 348 higher sub-layer, from the immediately lower sub-layer. An STSA 349 picture does not use pictures with the same TemporalId as the STSA 350 picture for inter prediction reference. Pictures following an STSA 351 picture in decoding order with the same TemporalId as the STSA 352 picture do not use pictures prior to the STSA picture in decoding 353 order with the same TemporalId as the STSA picture for inter 354 prediction reference. An STSA picture enables up-switching, at the 355 STSA picture, to the sub-layer containing the STSA picture, from the 356 immediately lower sub-layer. 358 Sub-layer reference or non-reference pictures 360 The concept and signaling of reference/non-reference pictures in 361 HEVC are different from H.264. 
In H.264, if a picture may be used 362 by any other picture for inter prediction reference, it is a 363 reference picture; otherwise it is a non-reference picture, and this 364 is signaled by two bits in the NAL unit header. In HEVC, a picture 365 is called a reference picture only when it is marked as "used for 366 reference". In addition, the concept of sub-layer reference picture 367 was introduced. If a picture may be used by another picture 368 with the same TemporalId for inter prediction reference, it is a 369 sub-layer reference picture; otherwise it is a sub-layer non- 370 reference picture. Whether a picture is a sub-layer reference 371 picture or sub-layer non-reference picture is signaled through NAL 372 unit type values. 374 Extensibility 376 Besides the TemporalId in the NAL unit header, HEVC also includes 377 the signaling of a six-bit layer ID in the NAL unit header, which 378 must be equal to 0 for a single-layer bitstream. Extension 379 mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice 380 headers, and so on. All these extension mechanisms enable future 381 extensions in a backward compatible manner, such that bitstreams 382 encoded according to potential future HEVC extensions can be fed to 383 then-legacy decoders (e.g. HEVC version 1 decoders) and the then- 384 legacy decoders can decode and output the base layer bitstream. 386 Bitstream extraction 388 HEVC includes a bitstream extraction process as an integral part of 389 the overall decoding process, as well as specification of the use of 390 the bitstream extraction process in description of bitstream 391 conformance tests as part of the hypothetical reference decoder 392 (HRD) specification. 394 Reference picture management 396 The reference picture management of HEVC, including reference 397 picture marking and removal from the decoded picture buffer (DPB) as 398 well as reference picture list construction (RPLC), differs from 399 that of H.264.
Instead of the sliding window plus adaptive memory 400 management control operation (MMCO) based reference picture marking 401 mechanism in H.264, HEVC specifies a reference picture set (RPS) 402 based reference picture management and marking mechanism, and the 403 RPLC is consequently based on the RPS mechanism. A reference 404 picture set consists of a set of reference pictures associated with 405 a picture, consisting of all reference pictures that are prior to 406 the associated picture in decoding order, that may be used for inter 407 prediction of the associated picture or any picture following the 408 associated picture in decoding order. The reference picture set 409 consists of five lists of reference pictures; RefPicSetStCurrBefore, 410 RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and 411 RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and 412 RefPicSetLtCurr contain all reference pictures that may be used in 413 inter prediction of the current picture and that may be used in 414 inter prediction of one or more of the pictures following the 415 current picture in decoding order. RefPicSetStFoll and 416 RefPicSetLtFoll consist of all reference pictures that are not used 417 in inter prediction of the current picture but may be used in inter 418 prediction of one or more of the pictures following the current 419 picture in decoding order. RPS provides an "intra-coded" signaling 420 of the DPB status, instead of an "inter-coded" signaling, mainly for 421 improved error resilience. The RPLC process in HEVC is based on the 422 RPS, by signaling an index to an RPS subset for each reference 423 index. The RPLC process has been simplified compared to that in 424 H.264, by removal of the reference picture list modification (also 425 referred to as reference picture list reordering) process. 427 Ultra low delay support 429 HEVC specifies a sub-picture-level HRD operation, for support of the 430 so-called ultra-low delay. 
The mechanism specifies a standard- 431 compliant way to enable delay reduction below one picture interval. 432 Sub-picture-level coded picture buffer (CPB) and DPB parameters may 433 be signaled, and utilization of this information for the derivation 434 of CPB timing (wherein the CPB removal time corresponds to decoding 435 time) and DPB output timing (display time) is specified. Decoders 436 are allowed to operate the HRD at the conventional access-unit- 437 level, even when the sub-picture-level HRD parameters are present. 439 New SEI messages 441 HEVC inherits many H.264 SEI messages with changes in syntax and/or 442 semantics making them applicable to HEVC. Additionally, there are a 443 few new SEI messages reviewed briefly in the following paragraphs. 445 The display orientation SEI message informs the decoder of a 446 transformation that is recommended to be applied to the cropped 447 decoded picture prior to display, such that the pictures can be 448 properly displayed, e.g. in an upside-up manner. 450 The structure of pictures SEI message provides information on the 451 NAL unit types, picture order count values, and prediction 452 dependencies of a sequence of pictures. The SEI message can be used, 453 for example, to determine what impact a lost picture has on other 454 pictures. 456 The decoded picture hash SEI message provides a checksum derived 457 from the sample values of a decoded picture. It can be used for 458 detecting whether a picture was correctly received and decoded. 460 The active parameter sets SEI message includes the IDs of the active 461 video parameter set and the active sequence parameter set and can be 462 used to activate VPSs and SPSs.
In addition, the SEI message 463 includes the following indications: 1) An indication of whether 464 "full random accessibility" is supported (when supported, all 465 parameter sets needed for decoding of the remainder of the bitstream 466 when random accessing from the beginning of the current coded video 467 sequence by completely discarding all access units earlier in 468 decoding order are present in the remaining bitstream and all coded 469 pictures in the remaining bitstream can be correctly decoded); 2) An 470 indication of whether there is no parameter set within the current 471 coded video sequence that updates another parameter set of the same 472 type preceding in decoding order. An update of a parameter set 473 refers to the use of the same parameter set ID but with some other 474 parameters changed. If this property is true for all coded video 475 sequences in the bitstream, then all parameter sets can be sent out- 476 of-band before session start. 478 The decoding unit information SEI message provides coded picture 479 buffer removal delay information for a decoding unit. The message 480 can be used in very-low-delay buffering operations. 482 The region refresh information SEI message can be used together with 483 the recovery point SEI message (present in both H.264 and HEVC) for 484 improved support of gradual decoding refresh (GDR). This supports 485 random access from inter-coded pictures, wherein complete pictures 486 can be correctly decoded or recovered after an indicated number of 487 pictures in output/display order. 489 1.1.3 Parallel Processing Support 491 The reportedly significantly higher encoding computational demand of 492 HEVC over H.264, in conjunction with the ever-increasing video 493 resolution (both spatially and temporally) required by the market, 494 led to the adoption of VCL coding tools specifically targeted to 495 allow for parallelization on the sub-picture level.
That is, 496 parallelization occurs, at the minimum, at the granularity of an 497 integer number of CTUs. The targets for this type of high-level 498 parallelization are multicore CPUs and DSPs as well as 499 multiprocessor systems. In a system design, to be useful, these 500 tools require signaling support, which is provided in Section 7 of 501 this memo. This section provides a brief overview of the tools 502 available in [HEVC]. 504 Many of the tools incorporated in HEVC were designed keeping in mind 505 the potential parallel implementations in multi-core/multi-processor 506 architectures. Specifically, for parallelization, four picture 507 partition strategies are available. 509 Slices are segments of the bitstream that can be reconstructed 510 independently from other slices within the same picture (though 511 there may still be interdependencies through loop filtering 512 operations). Slices are the only tool that can be used for 513 parallelization that is also available, in virtually identical form, 514 in H.264. Slice-based parallelization does not require much inter- 515 processor or inter-core communication (except for inter-processor or 516 inter-core data sharing for motion compensation when decoding a 517 predictively coded picture, which is typically much heavier than 518 inter-processor or inter-core data sharing due to in-picture 519 prediction), as slices are designed to be independently decodable. 520 However, for the same reason, slices can require some coding 521 overhead. Further, slices (in contrast to some of the other tools 522 mentioned below) also serve as the key mechanism for bitstream 523 partitioning to match Maximum Transfer Unit (MTU) size requirements, 524 due to the in-picture independence of slices and the fact that each 525 regular slice is encapsulated in its own NAL unit. In many cases, 526 the goal of parallelization and the goal of MTU size matching can 527 place contradictory demands on the slice layout in a picture.
The 528 realization of this situation led to the development of the more 529 advanced tools mentioned below. 531 Dependent slice segments allow for fragmentation of a coded slice 532 into fragments at CTU boundaries without breaking any in-picture 533 prediction mechanism. They are complementary to the fragmentation 534 mechanism described in this memo in that they need the cooperation 535 of the encoder. As a dependent slice segment necessarily contains 536 an integer number of CTUs, a decoder using multiple cores operating 537 on CTUs can process a dependent slice segment without communicating 538 parts of the slice segment's bitstream to other cores. 539 Fragmentation, as specified in this memo, in contrast, does not 540 guarantee that a fragment contains an integer number of CTUs. 542 In wavefront parallel processing (WPP), the picture is partitioned 543 into rows of CTUs. Entropy decoding and prediction are allowed to 544 use data from CTUs in other partitions. Parallel processing is 545 possible through parallel decoding of CTU rows, where the start of 546 the decoding of a row is delayed by two CTUs, so as to ensure that data 547 related to a CTU above and to the right of the subject CTU is 548 available before the subject CTU is decoded. Using this 549 staggered start (which appears like a wavefront when represented 550 graphically), parallelization is possible with up to as many 551 processors/cores as the picture contains CTU rows. 553 Because in-picture prediction between neighboring CTU rows within a 554 picture is allowed, the required inter-processor/inter-core 555 communication to enable in-picture prediction can be substantial. 556 The WPP partitioning does not result in the creation of more NAL 557 units compared to when it is not applied; thus, WPP cannot be used 558 for MTU size matching, though slices can be used in combination for 559 that purpose.
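Informative example: The staggered start used by WPP can be illustrated with a small scheduling sketch. It assumes, purely for illustration, that every CTU takes one time step to decode, that one core is assigned to each CTU row, and that a CTU may start once its left neighbor and the CTU above and to the right of it are finished. The function name wpp_schedule is hypothetical and is not part of [HEVC] or of this memo.

```python
def wpp_schedule(rows, cols):
    """Return the earliest start step of every CTU under the stated assumptions."""
    start = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ready = 0
            if c > 0:
                # The left neighbour in the same row must be finished.
                ready = start[r][c - 1] + 1
            if r > 0:
                # The CTU above and to the right must be finished; at the
                # right picture edge the CTU directly above is used instead.
                above_right = min(c + 1, cols - 1)
                ready = max(ready, start[r - 1][above_right] + 1)
            start[r][c] = ready
    return start


schedule = wpp_schedule(rows=3, cols=4)
total_steps = schedule[-1][-1] + 1  # steps until the last CTU is decoded
```

Under these assumptions each row starts two steps after the row above it, so the 3x4 picture finishes in 8 steps rather than the 12 steps a single core would need.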
561 Tiles define horizontal and vertical boundaries that partition a 562 picture into tile columns and rows. The scan order of CTUs is 563 changed to be local within a tile (in the order of a CTU raster scan 564 of a tile), before decoding the top-left CTU of the next tile in the 565 order of tile raster scan of a picture. Similar to slices, tiles 566 break in-picture prediction dependencies (including entropy decoding 567 dependencies). However, they do not need to be included in 568 individual NAL units (same as WPP in this regard); hence, tiles 569 cannot be used for MTU size matching, though slices can be used in 570 combination for that purpose. Each tile can be processed by one 571 processor/core, and the inter-processor/inter-core communication 572 required for in-picture prediction between processing units decoding 573 neighboring tiles is limited to conveying the shared slice header in 574 cases where a slice spans more than one tile, and to loop-filtering-related 575 sharing of reconstructed samples and metadata. In this respect, 576 tiles are less demanding in terms of inter-processor communication 577 bandwidth compared to WPP due to the in-picture independence between 578 two neighboring partitions. 580 1.1.4 NAL Unit Header 582 HEVC maintains the NAL unit concept of H.264 with modifications. 583 HEVC uses a two-byte NAL unit header, as shown in Figure 1. The 584 payload of a NAL unit refers to the NAL unit excluding the NAL unit 585 header. 587 +---------------+---------------+ 588 |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| 589 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 590 |F| Type | LayerId | TID | 591 +-------------+-----------------+ 593 Figure 1 The structure of the HEVC NAL unit header 595 The semantics of the fields in the NAL unit header are as specified 596 in [HEVC] and described briefly below for convenience. In addition 597 to the name and size of each field, the corresponding syntax element 598 name in [HEVC] is also provided. 600 F: 1 bit 601 forbidden_zero_bit.
MUST be zero. HEVC declares a value of 1 as 602 a syntax violation. Note that the inclusion of this bit in the 603 NAL unit header is to enable transport of HEVC video over MPEG-2 604 transport systems (avoidance of start code emulations) [MPEG2S]. 606 Type: 6 bits 607 nal_unit_type. This field specifies the NAL unit type as defined 608 in Table 7-1 of [HEVC]. If the most significant bit of this 609 field of a NAL unit is equal to 0 (i.e. the value of this field 610 is less than 32), the NAL unit is a VCL NAL unit. Otherwise, the 611 NAL unit is a non-VCL NAL unit. For a reference of all currently 612 defined NAL unit types and their semantics, please refer to 613 Section 7.4.1 in [HEVC]. 615 LayerId: 6 bits 616 nuh_layer_id. MUST be equal to zero. It is anticipated that in 617 future scalable or 3D video coding extensions of this 618 specification, this syntax element will be used to identify 619 additional layers that may be present in the coded video 620 sequence, wherein a layer may be, e.g. a spatial scalable layer, 621 a quality scalable layer, a texture view, or a depth view. 623 TID: 3 bits 624 nuh_temporal_id_plus1. This field specifies the temporal 625 identifier of the NAL unit plus 1. The value of TemporalId is 626 equal to TID minus 1. A TID value of 0 is illegal to ensure that 627 there is at least one bit in the NAL unit header equal to 1, so 628 to enable independent considerations of start code emulations in 629 the NAL unit header and in the NAL unit payload data. 631 1.2. 
Overview of the Payload Format 633 This payload format defines the following processes required for 634 transport of HEVC coded data over RTP [RFC3550]: 636 o Usage of RTP header with this payload format 638 o Packetization of HEVC coded NAL units into RTP packets using three 639 types of payload structures, namely single NAL unit packet, 640 aggregation packet, and fragment unit 642 o Transmission of HEVC NAL units of the same bitstream within a 643 single RTP stream or multiple RTP streams within one or more RTP 644 sessions, where within an RTP stream transmission of NAL units may 645 be either non-interleaved (i.e. the transmission order of NAL 646 units is the same as their decoding order) or interleaved (i.e. 648 the transmission order of NAL units is different from their 649 decoding order) 651 o Media type parameters to be used with the Session Description 652 Protocol (SDP) [RFC4566] 654 o A payload header extension mechanism and data structures for 655 enhanced support of temporal scalability based on that extension 656 mechanism. 658 2. Conventions 660 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 661 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 662 document are to be interpreted as described in BCP 14, RFC 2119 663 [RFC2119]. 665 In this document, these key words will appear with that 666 interpretation only when in ALL CAPS. Lower case uses of these 667 words are not to be interpreted as carrying the RFC 2119 668 significance. 670 This specification uses the notion of setting and clearing a bit 671 when bit fields are handled. Setting a bit is the same as assigning 672 that bit the value of 1 (On). Clearing a bit is the same as 673 assigning that bit the value of 0 (Off). 675 3. Definitions and Abbreviations 677 3.1 Definitions 679 This document uses the terms and definitions of [HEVC]. Section 680 3.1.1 lists relevant definitions copied from [HEVC] for convenience. 
681 Section 3.1.2 provides definitions specific to this memo. 683 3.1.1 Definitions from the HEVC Specification 685 access unit: A set of NAL units that are associated with each other 686 according to a specified classification rule, are consecutive in 687 decoding order, and contain exactly one coded picture. 689 BLA access unit: An access unit in which the coded picture is a BLA 690 picture. 692 BLA picture: An IRAP picture for which each VCL NAL unit has 693 nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. 695 coded video sequence: A sequence of access units that consists, in 696 decoding order, of an IRAP access unit with NoRaslOutputFlag equal 697 to 1, followed by zero or more access units that are not IRAP access 698 units with NoRaslOutputFlag equal to 1, including all subsequent 699 access units up to but not including any subsequent access unit that 700 is an IRAP access unit with NoRaslOutputFlag equal to 1. 702 Informative note: An IRAP access unit may be an IDR access unit, 703 a BLA access unit, or a CRA access unit. The value of 704 NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA 705 access unit, and each CRA access unit that is the first access 706 unit in the bitstream in decoding order, is the first access unit 707 that follows an end of sequence NAL unit in decoding order, or 708 has HandleCraAsBlaFlag equal to 1. 710 CRA access unit: An access unit in which the coded picture is a CRA 711 picture. 713 CRA picture: A RAP picture for which each VCL NAL unit has 714 nal_unit_type equal to CRA_NUT. 716 IDR access unit: An access unit in which the coded picture is an IDR 717 picture. 719 IDR picture: A RAP picture for which each VCL NAL unit has 720 nal_unit_type equal to IDR_W_RADL or IDR_N_LP. 722 IRAP access unit: An access unit in which the coded picture is an 723 IRAP picture. 
725 IRAP picture: A coded picture for which each VCL NAL unit has 726 nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23), 727 inclusive. 729 layer: A set of VCL NAL units that all have a particular value of 730 nuh_layer_id and the associated non-VCL NAL units, or one of a set 731 of syntactical structures having a hierarchical relationship. 733 operation point: bitstream created from another bitstream by 734 operation of the sub-bitstream extraction process with the another 735 bitstream, a target highest TemporalId, and a target layer 736 identifier list as inputs. 738 random access: The act of starting the decoding process for a 739 bitstream at a point other than the beginning of the bitstream. 741 sub-layer: A temporal scalable layer of a temporal scalable 742 bitstream consisting of VCL NAL units with a particular value of the 743 TemporalId variable, and the associated non-VCL NAL units. 745 sub-layer representation: A subset of the bitstream consisting of 746 NAL units of a particular sub-layer and the lower sub-layers. 748 tile: A rectangular region of coding tree blocks within a particular 749 tile column and a particular tile row in a picture. 751 tile column: A rectangular region of coding tree blocks having a 752 height equal to the height of the picture and a width specified by 753 syntax elements in the picture parameter set. 755 tile row: A rectangular region of coding tree blocks having a height 756 specified by syntax elements in the picture parameter set and a 757 width equal to the width of the picture. 759 3.1.2 Definitions Specific to This Memo 761 dependee RTP stream: An RTP stream on which another RTP stream 762 depends. All RTP streams in an MSM except for the highest RTP 763 stream are dependee RTP streams. 765 highest RTP stream: The RTP stream on which no other RTP stream 766 depends. The RTP stream in an SSM is the highest RTP stream. 
768 media aware network element (MANE): A network element, such as a 769 middlebox, selective forwarding unit, or application layer gateway, 770 that is capable of parsing certain aspects of the RTP payload 771 headers or the RTP payload and reacting to their contents. 773 Informative note: The concept of a MANE goes beyond normal 774 routers or gateways in that a MANE has to be aware of the 775 signaling (e.g. to learn about the payload type mappings of the 776 media streams), and in that it has to be trusted when working 777 with SRTP. The advantage of using MANEs is that they allow 778 packets to be dropped according to the needs of the media coding. 779 For example, if a MANE has to drop packets due to congestion on a 780 certain link, it can identify and remove those packets whose 781 elimination produces the least adverse effect on the user 782 experience. After dropping packets, MANEs must rewrite RTCP 783 packets to match the changes to the RTP stream as specified in 784 Section 7 of [RFC3550]. 786 multi-stream mode (MSM): Transmission of an HEVC bitstream using more 787 than one RTP stream. 789 NAL unit decoding order: A NAL unit order that conforms to the 790 constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. 792 NAL-unit-like structure: A data structure that is similar to NAL 793 units in the sense that it also has a NAL unit header and a payload, 794 with the difference that the payload does not follow the start code 795 emulation prevention mechanism required for the NAL unit syntax as 796 specified in Section 7.3.1.1 of [HEVC]. Examples of NAL-unit-like 797 structures defined in this memo are packet payloads of AP, PACI, and 798 FU packets. 800 NALU-time: The value that the RTP timestamp would have if the NAL 801 unit were transported in its own RTP packet. 803 RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within the 804 scope of this memo, one RTP stream is utilized to transport one or 805 more temporal sub-layers.
807 single-stream mode (SSM): Transmission of an HEVC bitstream using 808 only one RTP stream. 810 transmission order: The order of packets in ascending RTP sequence 811 number order (in modulo arithmetic). Within an aggregation packet, 812 the NAL unit transmission order is the same as the order of 813 appearance of NAL units in the packet. 815 3.2 Abbreviations 817 AP Aggregation Packet 819 BLA Broken Link Access 821 CRA Clean Random Access 823 CTB Coding Tree Block 825 CTU Coding Tree Unit 827 CVS Coded Video Sequence 829 DPH Decoded Picture Hash 831 FU Fragmentation Unit 833 GDR Gradual Decoding Refresh 835 HRD Hypothetical Reference Decoder 837 IDR Instantaneous Decoding Refresh 839 IRAP Intra Random Access Point 841 MANE Media Aware Network Element 843 MSM Multi-Stream Mode 845 MTU Maximum Transfer Unit 847 NAL Network Abstraction Layer 849 NALU Network Abstraction Layer Unit 851 PACI PAyload Content Information 852 PHES Payload Header Extension Structure 854 PPS Picture Parameter Set 856 RADL Random Access Decodable Leading (Picture) 858 RASL Random Access Skipped Leading (Picture) 860 RPS Reference Picture Set 862 SEI Supplemental Enhancement Information 864 SPS Sequence Parameter Set 866 SSM Single-Stream Mode 868 STSA Step-wise Temporal Sub-layer Access 870 TSA Temporal Sub-layer Access 872 TCSI Temporal Scalability Control Information 874 VCL Video Coding Layer 876 VPS Video Parameter Set 878 4. RTP Payload Format 880 4.1 RTP Header Usage 882 The format of the RTP header is specified in [RFC3550] and reprinted 883 in Figure 2 for convenience. This payload format uses the fields of 884 the header in a manner consistent with that specification. 886 The RTP payload (and the settings for some RTP header bits) for 887 aggregation packets and fragmentation units are specified in 888 Sections 4.7 and 4.8, respectively. 
890 0 1 2 3 891 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 892 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 893 |V=2|P|X| CC |M| PT | sequence number | 894 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 895 | timestamp | 896 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 897 | synchronization source (SSRC) identifier | 898 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 899 | contributing source (CSRC) identifiers | 900 | .... | 901 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 903 Figure 2 RTP header according to [RFC3550] 905 The RTP header information to be set according to this RTP payload 906 format is set as follows: 908 Marker bit (M): 1 bit 910 Set for the last packet, carried in the current RTP stream, of 911 the access unit, in line with the normal use of the M bit in 912 video formats, to allow an efficient playout buffer handling. 913 When MSM is in use, if an access unit appears in multiple RTP 914 streams, the marker bit is set on each RTP stream's last packet 915 of the access unit. 917 Informative note: The content of a NAL unit does not tell 918 whether or not the NAL unit is the last NAL unit, in decoding 919 order, of an access unit. An RTP sender implementation may 920 obtain this information from the video encoder. If, however, 921 the implementation cannot obtain this information directly 922 from the encoder, e.g. when the bitstream was pre-encoded, and 923 also there is no timestamp allocated for each NAL unit, then 924 the sender implementation can inspect subsequent NAL units in 925 decoding order to determine whether or not the NAL unit is the 926 last NAL unit of an access unit as follows. 
A NAL unit naluX 927 is the last NAL unit of an access unit if it is the last NAL 928 unit of the bitstream or the next VCL NAL unit naluY in 929 decoding order has the high-order bit of the first byte after 930 its NAL unit header equal to 1, and all NAL units between 931 naluX and naluY, when present, have nal_unit_type in the range 932 of 32 to 35, inclusive, equal to 39, or in the ranges of 41 to 933 44, inclusive, or 48 to 55, inclusive. 935 Payload type (PT): 7 bits 937 The assignment of an RTP payload type for this new packet format 938 is outside the scope of this document and will not be specified 939 here. The assignment of a payload type has to be performed 940 either through the profile used or in a dynamic way. 942 Informative note: It is not required to use different payload 943 type values for different RTP streams in MSM. 945 Sequence number (SN): 16 bits 947 Set and used in accordance with RFC 3550. 949 Timestamp: 32 bits 951 The RTP timestamp is set to the sampling timestamp of the 952 content. A 90 kHz clock rate MUST be used. 954 If the NAL unit has no timing properties of its own (e.g. 955 parameter set and SEI NAL units), the RTP timestamp MUST be set 956 to the RTP timestamp of the coded picture of the access unit in 957 which the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is 958 included. 960 Receivers MUST use the RTP timestamp for the display process, 961 even when the bitstream contains picture timing SEI messages or 962 decoding unit information SEI messages as specified in [HEVC]. 963 However, this does not mean that picture timing SEI messages in 964 the bitstream should be discarded, as picture timing SEI messages 965 may contain frame-field information that is important in 966 appropriately rendering interlaced video. 968 Synchronization source (SSRC): 32-bits 970 Used to identify the source of the RTP packets. In SSM, by 971 definition a single SSRC is used for all parts of a single 972 bitstream. 
In MSM, each SSRC is used for an RTP stream 973 containing a subset of the sub-layers for a single (temporally 974 scalable) bitstream. A receiver is required to correctly 975 associate the set of SSRCs that carry parts of the same 976 bitstream. 978 Informative note: The term "bitstream" in this document is 979 equivalent to the term "encoded stream" in 980 [I-D.ietf-avtext-rtp-grouping-taxonomy]. 982 4.2 Payload Header Usage 984 The TID value indicates (among other things) the relative importance 985 of an RTP packet, for example because NAL units belonging to higher 986 temporal sub-layers are not used for the decoding of lower temporal 987 sub-layers. A lower value of TID indicates a higher importance. 988 More important NAL units MAY be better protected against 989 transmission losses than less important NAL units. 991 4.3 Payload Structures 993 The first two bytes of the payload of an RTP packet are referred to 994 as the payload header. The payload header consists of the same 995 fields (F, Type, LayerId, and TID) as the NAL unit header as shown 996 in section 1.1.4, irrespective of the type of the payload structure. 998 Four different types of RTP packet payload structures are specified. 999 A receiver can identify the type of an RTP packet payload through 1000 the Type field in the payload header. 1002 The four different payload structures are as follows: 1004 o Single NAL unit packet: Contains a single NAL unit in the 1005 payload, and the NAL unit header of the NAL unit also serves as 1006 the payload header. This payload structure is specified in 1007 section 4.6. 1009 o Aggregation packet (AP): Contains more than one NAL unit within 1010 one access unit. This payload structure is specified in 1011 section 4.7. 1013 o Fragmentation unit (FU): Contains a subset of a single NAL unit. 1014 This payload structure is specified in section 4.8.
1016 o PACI carrying RTP packet: Contains a payload header (that differs 1017 from other payload headers for efficiency), a Payload Header 1018 Extension Structure (PHES), and a PACI payload. This payload 1019 structure is specified in section 4.9. 1021 4.4 Transmission Modes 1023 This memo enables transmission of an HEVC bitstream over a single 1024 RTP stream or multiple RTP streams. The concept and working 1025 principle are inherited from the design of what was called single and 1026 multiple session transmission in [RFC6190]. 1027 If only one RTP stream is used for transmission of the HEVC 1028 bitstream, the transmission mode is referred to as single-stream 1029 mode (SSM); otherwise (more than one RTP stream is used for 1030 transmission of the HEVC bitstream), the transmission mode is 1031 referred to as multi-stream mode (MSM). 1033 Dependency of one RTP stream on another RTP stream is typically 1034 indicated as specified in [RFC5583]. When an RTP stream A depends 1035 on another RTP stream B, the RTP stream B is referred to as a 1036 dependee RTP stream of the RTP stream A. 1038 Informative note: An MSM may involve one or more RTP sessions. 1039 For example, each RTP stream in an MSM may be in its own RTP 1040 session. As another example, a set of multiple RTP streams in 1041 an MSM may belong to the same RTP session, e.g. as indicated by 1042 the mechanism specified in [I-D.ietf-avtcore-rtp-multi-stream] or 1043 [I-D.ietf-mmusic-sdp-bundle-negotiation]. 1045 SSM SHOULD be used for point-to-point unicast scenarios, while MSM 1046 SHOULD be used for point-to-multipoint multicast scenarios where 1047 different receivers require different operation points of the same 1048 HEVC bitstream, to improve bandwidth utilization efficiency.
1050 Informative note: A multicast may degrade to a unicast after all 1051 but one receiver have left (this is a justification of the first 1052 "SHOULD" instead of "MUST"), and there might be scenarios where 1053 MSM is desirable but not possible, e.g. when IP multicast is not 1054 deployed in a certain network (this is a justification of the 1055 second "SHOULD" instead of "MUST"). 1057 The transmission mode is indicated by the tx-mode media parameter 1058 (see section 7.1). If tx-mode is equal to "SSM", SSM MUST be used. 1059 Otherwise (tx-mode is equal to "MSM"), MSM MUST be used. 1061 Receivers MUST support both SSM and MSM. 1063 4.5 Decoding Order Number 1065 For each NAL unit, the variable AbsDon is derived, representing the 1066 decoding order number that is indicative of the NAL unit decoding 1067 order. 1069 Let NAL unit n be the n-th NAL unit in transmission order within an 1070 RTP stream. 1072 If tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 0, 1073 AbsDon[n], the value of AbsDon for NAL unit n, is derived as equal 1074 to n. 1076 Otherwise (tx-mode is equal to "MSM" or sprop-max-don-diff is 1077 greater than 0), AbsDon[n] is derived as follows, where DON[n] is 1078 the value of the variable DON for NAL unit n: 1080 o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in 1081 transmission order), AbsDon[0] is set equal to DON[0].
1083 o Otherwise (n is greater than 0), the following applies for 1084 derivation of AbsDon[n]: 1086 If DON[n] == DON[n-1], 1087 AbsDon[n] = AbsDon[n-1] 1089 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), 1090 AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] 1092 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), 1093 AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] 1095 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), 1096 AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n]) 1098 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), 1099 AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) 1101 For any two NAL units m and n, the following applies: 1103 o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n 1104 follows NAL unit m in NAL unit decoding order. 1106 o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order 1107 of the two NAL units can be in either order. 1109 o AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes 1110 NAL unit m in decoding order. 1112 When two consecutive NAL units in the NAL unit decoding order have 1113 different values of AbsDon, the value of AbsDon for the second NAL 1114 unit in decoding order MUST be greater than the value of AbsDon for 1115 the first NAL unit, and the absolute difference between the two 1116 AbsDon values MAY be greater than or equal to 1. 1118 Informative note: There are multiple reasons to allow for the 1119 absolute difference of the values of AbsDon for two consecutive 1120 NAL units in the NAL unit decoding order to be greater than one. 1121 An increment by one is not required, as at the time of 1122 associating values of AbsDon to NAL units, it may not be known 1123 whether all NAL units are to be delivered to the receiver. For 1124 example, a gateway may not forward VCL NAL units of higher sub- 1125 layers or some SEI NAL units when there is congestion in the 1126 network. 
In another example, the first intra-coded picture of a 1127 pre-encoded clip is transmitted in advance to ensure that it is 1128 readily available in the receiver, and when transmitting the 1129 first intra-coded picture, the originator does not know exactly 1130 how many NAL units will be encoded before the first intra-coded 1131 picture of the pre-encoded clip follows in decoding order. Thus, 1132 the values of AbsDon for the NAL units of the first intra-coded 1133 picture of the pre-encoded clip have to be estimated when they 1134 are transmitted, and gaps in values of AbsDon may occur. Another 1135 example is MSM, where the AbsDon values must indicate cross-layer 1136 decoding order for NAL units conveyed in all the RTP streams. 1138 4.6 Single NAL Unit Packets 1140 A single NAL unit packet contains exactly one NAL unit, and consists 1141 of a payload header (denoted as PayloadHdr), a conditional 16-bit 1142 DONL field (in network byte order), and the NAL unit payload data 1143 (the NAL unit excluding its NAL unit header) of the contained NAL 1144 unit, as shown in Figure 3. 1146 0 1 2 3 1147 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1148 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1149 | PayloadHdr | DONL (conditional) | 1150 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1151 | | 1152 | NAL unit payload data | 1153 | | 1154 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1155 | :...OPTIONAL RTP padding | 1156 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1158 Figure 3 The structure of a single NAL unit packet 1160 The payload header SHOULD be an exact copy of the NAL unit header of 1161 the contained NAL unit. However, the Type (i.e. nal_unit_type) 1162 field MAY be changed, e.g. when it is desirable to handle a CRA 1163 picture as a BLA picture [JCTVC-J0107].
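Informative example: The two-byte payload header uses the field layout of Figure 1 (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits). The following sketch, offered only as an illustration, unpacks and repacks those fields and rewrites the Type field of a CRA picture to a BLA type. The function names are hypothetical. The value 16 for BLA_W_LP is given in Section 3.1.1; the value 21 for CRA_NUT is taken from Table 7-1 of [HEVC] and is an assumption in the sense that it is not stated in this section.

```python
CRA_NUT = 21   # assumption: value from Table 7-1 of [HEVC]
BLA_W_LP = 16  # value quoted in Section 3.1.1


def parse_payload_hdr(hdr):
    """Split the first two bytes into the F, Type, LayerId and TID fields."""
    value = int.from_bytes(hdr[:2], "big")
    return {
        "F": value >> 15,
        "Type": (value >> 9) & 0x3F,
        "LayerId": (value >> 3) & 0x3F,
        "TID": value & 0x07,
    }


def build_payload_hdr(f, nal_type, layer_id, tid):
    """Pack the four fields back into two bytes in network byte order."""
    value = (f << 15) | (nal_type << 9) | (layer_id << 3) | tid
    return value.to_bytes(2, "big")


def cra_to_bla(nal_unit_hdr):
    """Return a payload header whose Type is BLA_W_LP if the input is CRA_NUT."""
    fields = parse_payload_hdr(nal_unit_hdr)
    if fields["Type"] == CRA_NUT:
        fields["Type"] = BLA_W_LP
    return build_payload_hdr(fields["F"], fields["Type"],
                             fields["LayerId"], fields["TID"])
```

A header with any other Type value is returned unchanged, which corresponds to the default case in which the payload header is an exact copy of the NAL unit header.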
1165 The DONL field, when present, specifies the value of the 16 least 1166 significant bits of the decoding order number of the contained NAL 1167 unit. If tx-mode is equal to "MSM" or sprop-max-don-diff is greater 1168 than 0, the DONL field MUST be present, and the variable DON for the 1169 contained NAL unit is derived as equal to the value of the DONL 1170 field. Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff 1171 is equal to 0), the DONL field MUST NOT be present. 1173 4.7 Aggregation Packets (APs) 1175 Aggregation packets (APs) are introduced to enable the reduction of 1176 packetization overhead for small NAL units, such as most of the non- 1177 VCL NAL units, which are often only a few octets in size. 1179 An AP aggregates NAL units within one access unit. Each NAL unit to 1180 be carried in an AP is encapsulated in an aggregation unit. NAL 1181 units aggregated in one AP are in NAL unit decoding order. 1183 An AP consists of a payload header (denoted as PayloadHdr) followed 1184 by two or more aggregation units, as shown in Figure 4. 1186 0 1 2 3 1187 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1188 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1189 | PayloadHdr (Type=48) | | 1190 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1191 | | 1192 | two or more aggregation units | 1193 | | 1194 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1195 | :...OPTIONAL RTP padding | 1196 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1198 Figure 4 The structure of an aggregation packet 1200 The fields in the payload header are set as follows. The F bit MUST 1201 be equal to 0 if the F bit of each aggregated NAL unit is equal to 1202 zero; otherwise, it MUST be equal to 1. The Type field MUST be 1203 equal to 48. The value of LayerId MUST be equal to the lowest value 1204 of LayerId of all the aggregated NAL units. The value of TID MUST 1205 be the lowest value of TID of all the aggregated NAL units. 
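Informative example: The rules above for the AP payload header can be sketched as follows. The sketch assumes that each aggregated NAL unit is available as a byte string that starts with its two-byte NAL unit header; the function name ap_payload_hdr is hypothetical.

```python
AP_TYPE = 48  # Type value of an aggregation packet, as stated above


def ap_payload_hdr(nal_units):
    """Derive the two-byte AP payload header from the aggregated NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP aggregates at least two NAL units")
    f = 0
    layer_id = 63  # largest 6-bit value, lowered below
    tid = 7        # largest 3-bit value, lowered below
    for nalu in nal_units:
        hdr = int.from_bytes(nalu[:2], "big")
        f |= hdr >> 15                               # 1 if any F bit is 1
        layer_id = min(layer_id, (hdr >> 3) & 0x3F)  # lowest LayerId
        tid = min(tid, hdr & 0x07)                   # lowest TID
    value = (f << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid
    return value.to_bytes(2, "big")
```

For example, aggregating a NAL unit with TID 1 and a NAL unit with TID 2, both with F and LayerId equal to 0, yields the header bytes 0x60 0x01.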
1207 Informative Note: All VCL NAL units in an AP have the same TID 1208 value since they belong to the same access unit. However, an AP 1209 may contain non-VCL NAL units for which the TID value in the NAL 1210 unit header may be different from the TID value of the VCL NAL 1211 units in the same AP. 1213 An AP MUST carry at least two aggregation units and can carry as 1214 many aggregation units as necessary; however, the total amount of 1215 data in an AP MUST fit into an IP packet, and the size 1216 SHOULD be chosen so that the resulting IP packet is smaller than the 1217 MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain 1218 Fragmentation Units (FUs) specified in section 4.8. APs MUST NOT be 1219 nested; i.e. an AP MUST NOT contain another AP. 1221 The first aggregation unit in an AP consists of a conditional 16-bit 1222 DONL field (in network byte order) followed by a 16-bit unsigned 1223 size field (in network byte order) that indicates the size of 1224 the NAL unit in bytes (excluding these two octets, but including the 1225 NAL unit header), followed by the NAL unit itself, including its NAL 1226 unit header, as shown in Figure 5. 1228 0 1 2 3 1229 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1230 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1231 : DONL (conditional) | NALU size | 1232 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1233 | NALU size | | 1234 +-+-+-+-+-+-+-+-+ NAL unit | 1235 | | 1236 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1237 | : 1238 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1240 Figure 5 The structure of the first aggregation unit in an AP 1242 The DONL field, when present, specifies the value of the 16 least 1243 significant bits of the decoding order number of the aggregated NAL 1244 unit.
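Informative example: Because the DONL field carries only the 16 least significant bits of the decoding order number, a receiver relies on the AbsDon derivation of Section 4.5 to handle wrap-around. The following sketch transcribes the five cases of that derivation for the case in which tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0. It assumes that the DON values are supplied as a list in transmission order within one RTP stream; the function name abs_don_sequence is hypothetical.

```python
def abs_don_sequence(dons):
    """Map 16-bit DON values, in transmission order, to AbsDon values."""
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)  # AbsDon[0] = DON[0]
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[-1]
        if don == prev_don:
            abs_don = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            abs_don = prev_abs + don - prev_don             # forward step
        elif don < prev_don and prev_don - don >= 32768:
            abs_don = prev_abs + 65536 - prev_don + don     # forward, wrapped
        elif don > prev_don and don - prev_don >= 32768:
            abs_don = prev_abs - (prev_don + 65536 - don)   # backward, wrapped
        else:  # don < prev_don and prev_don - don < 32768
            abs_don = prev_abs - (prev_don - don)           # backward step
        abs_dons.append(abs_don)
    return abs_dons
```

For the DON sequence 65534, 65535, 0, 2, 1 the sketch yields the AbsDon values 65534, 65535, 65536, 65538, 65537, so the wrap from 65535 to 0 is treated as a forward step and the final NAL unit is placed before its predecessor in decoding order.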
1246     If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than
1247     0, the DONL field MUST be present in an aggregation unit that is the
1248     first aggregation unit in an AP, and the variable DON for the
1249     aggregated NAL unit is derived as equal to the value of the DONL
1250     field. Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff
1251     is equal to 0), the DONL field MUST NOT be present in an aggregation
1252     unit that is the first aggregation unit in an AP.

1254     An aggregation unit that is not the first aggregation unit in an AP
1255     consists of a conditional 8-bit DOND field followed by a 16-bit
1256     unsigned size field (in network byte order) that indicates the
1257     size of the NAL unit in bytes (excluding these two octets, but
1258     including the NAL unit header), followed by the NAL unit itself,
1259     including its NAL unit header, as shown in Figure 6.

1261      0                   1                   2                   3
1262      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1263     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1264                     :  DOND (cond)  |           NALU size           |
1265     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1266     |                                                               |
1267     |                           NAL unit                            |
1268     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1269     |                               :
1270     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1272     Figure 6 The structure of an aggregation unit that is not the first
1273     aggregation unit in an AP

1275     When present, the DOND field plus 1 specifies the difference between
1276     the decoding order number values of the current aggregated NAL unit
1277     and the preceding aggregated NAL unit in the same AP.

1279     If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than
1280     0, the DOND field MUST be present in an aggregation unit that is not
1281     the first aggregation unit in an AP, and the variable DON for the
1282     aggregated NAL unit is derived as equal to the DON of the preceding
1283     aggregated NAL unit in the same AP plus the value of the DOND field
1284     plus 1, modulo 65536. Otherwise (tx-mode is equal to "SSM" and
1285     sprop-max-don-diff is equal to 0), the DOND field MUST NOT be
1286     present in an aggregation unit that is not the first aggregation
1287     unit in an AP, and in this case the transmission order and decoding
1288     order of NAL units carried in the AP are the same as the order in
1289     which the NAL units appear in the AP.

1291     Figure 7 presents an example of an AP that contains two aggregation
1292     units, labeled as 1 and 2 in the figure, without the DONL and DOND
1293     fields being present.

1295      0                   1                   2                   3
1296      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1297     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1298     |                          RTP Header                           |
1299     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1300     |    PayloadHdr (Type=48)       |         NALU 1 Size           |
1301     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1302     |          NALU 1 HDR           |                               |
1303     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1304     |                   . . .                                       |
1305     |                                                               |
1306     +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1307     |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1308     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1309     | NALU 2 HDR    |                                               |
1310     +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1311     |                   . . .                                       |
1312     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1313     |                               :...OPTIONAL RTP padding        |
1314     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1316     Figure 7 An example of an AP containing two aggregation units
1317     without the DONL and DOND fields

1319     Figure 8 presents an example of an AP that contains two aggregation
1320     units, labeled as 1 and 2 in the figure, with the DONL and DOND
1321     fields being present.

1323      0                   1                   2                   3
1324      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1325     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1326     |                          RTP Header                           |
1327     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1328     |    PayloadHdr (Type=48)       |        NALU 1 DONL            |
1329     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1330     |         NALU 1 Size           |          NALU 1 HDR           |
1331     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1332     |                                                               |
1333     |                 NALU 1 Data   . . .                           |
1334     |                                                               |
1335     +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1336     |               |  NALU 2 DOND  |          NALU 2 Size          |
1337     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1338     |          NALU 2 HDR           |                               |
1339     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1340     |                                                               |
1341     |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1342     |                               :...OPTIONAL RTP padding        |
1343     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1345     Figure 8 An example of an AP containing two aggregation units with
1346     the DONL and DOND fields

1348  4.8 Fragmentation Units (FUs)

1350     Fragmentation units (FUs) are introduced to enable fragmenting a
1351     single NAL unit into multiple RTP packets, possibly without
1352     cooperation or knowledge of the HEVC encoder. A fragment of a NAL
1353     unit consists of an integer number of consecutive octets of that NAL
1354     unit. Fragments of the same NAL unit MUST be sent in consecutive
1355     order with ascending RTP sequence numbers (with no other RTP packets
1356     within the same RTP stream being sent between the first and last
1357     fragment).

1359     When a NAL unit is fragmented and conveyed within FUs, it is
1360     referred to as a fragmented NAL unit. APs MUST NOT be fragmented.
1361     FUs MUST NOT be nested; i.e. an FU MUST NOT contain a subset of
1362     another FU.

1364     The RTP timestamp of an RTP packet carrying an FU is set to the
1365     NALU-time of the fragmented NAL unit.
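
As a rough, sender-side illustration of the fragmentation idea, the Python sketch below splits one NAL unit into a list of FU payload structures. It relies on the FU layout that the remainder of this section specifies (payload header with Type 49, a one-octet FU header carrying S, E, and FuType, and a DONL field only in the first fragment). The function name and the `max_fragment` parameter are hypothetical, and the two-octet HEVC NAL unit header is assumed to be laid out as F (1 bit), Type (6 bits), LayerId (6 bits), TID (3 bits).

```python
from typing import List, Optional

FU_TYPE = 49  # payload header Type value used for FUs


def fragment_nal_unit(
    nal: bytes, max_fragment: int, donl: Optional[int] = None
) -> List[bytes]:
    """Split one NAL unit into FU payload structures (hypothetical helper)."""
    header, payload = nal[:2], nal[2:]
    if max_fragment < 1:
        raise ValueError("max_fragment must be positive")
    if len(payload) <= max_fragment:
        # S and E must not both be set in one FU header, so a NAL unit
        # that fits into one fragment is not a candidate for FUs.
        raise ValueError("NAL unit does not need fragmentation")
    nal_type = (header[0] >> 1) & 0x3F
    # Keep F, LayerId and TID; overwrite the Type field with 49.
    payload_hdr = bytes([(header[0] & 0x81) | (FU_TYPE << 1), header[1]])
    chunks = [payload[i:i + max_fragment]
              for i in range(0, len(payload), max_fragment)]
    fus = []
    for index, chunk in enumerate(chunks):
        start = index == 0
        end = index == len(chunks) - 1
        fu_header = (start << 7) | (end << 6) | nal_type
        fu = payload_hdr + bytes([fu_header])
        if start and donl is not None:
            fu += donl.to_bytes(2, "big")  # DONL only in the first FU
        fus.append(fu + chunk)
    return fus
```

Each returned element would be sent in its own RTP packet, in list order and with consecutive sequence numbers. Whether to pass `donl` is left to the caller, who knows the tx-mode and sprop-max-don-diff values in use.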

1367     An FU consists of a payload header (denoted as PayloadHdr), an FU
1368     header of one octet, a conditional 16-bit DONL field (in network
1369     byte order), and an FU payload, as shown in Figure 9.

1371      0                   1                   2                   3
1372      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1373     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1374     |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
1375     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1376     | DONL (cond)   |                                               |
1377     |-+-+-+-+-+-+-+-+                                               |
1378     |                         FU payload                            |
1379     |                                                               |
1380     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1381     |                               :...OPTIONAL RTP padding        |
1382     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1384     Figure 9 The structure of an FU

1386     The fields in the payload header are set as follows. The Type field
1387     MUST be equal to 49. The fields F, LayerId, and TID MUST be equal
1388     to the fields F, LayerId, and TID, respectively, of the fragmented
1389     NAL unit.

1391     The FU header consists of an S bit, an E bit, and a 6-bit FuType
1392     field, as shown in Figure 10.

1394     +---------------+
1395     |0|1|2|3|4|5|6|7|
1396     +-+-+-+-+-+-+-+-+
1397     |S|E|  FuType   |
1398     +---------------+

1400     Figure 10 The structure of FU header

1402     The semantics of the FU header fields are as follows:
1403     S: 1 bit
1404        When set to one, the S bit indicates the start of a fragmented
1405        NAL unit, i.e. the first byte of the FU payload is also the first
1406        byte of the payload of the fragmented NAL unit. When the FU
1407        payload is not the start of the fragmented NAL unit payload, the
1408        S bit MUST be set to zero.

1410     E: 1 bit
1411        When set to one, the E bit indicates the end of a fragmented NAL
1412        unit, i.e. the last byte of the payload is also the last byte of
1413        the fragmented NAL unit. When the FU payload is not the last
1414        fragment of a fragmented NAL unit, the E bit MUST be set to zero.

1416     FuType: 6 bits
1417        The field FuType MUST be equal to the field Type of the
1418        fragmented NAL unit.

1420     The DONL field, when present, specifies the value of the 16 least
1421     significant bits of the decoding order number of the fragmented NAL
1422     unit.

1424     If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than
1425     0, and the S bit is equal to 1, the DONL field MUST be present in
1426     the FU, and the variable DON for the fragmented NAL unit is derived
1427     as equal to the value of the DONL field. Otherwise (tx-mode is
1428     equal to "SSM" and sprop-max-don-diff is equal to 0, or the S bit is
1429     equal to 0), the DONL field MUST NOT be present in the FU.

1431     A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
1432     the Start bit and End bit MUST NOT both be set to one in the same FU
1433     header.

1435     The FU payload consists of fragments of the payload of the
1436     fragmented NAL unit so that if the FU payloads of consecutive FUs,
1437     starting with an FU with the S bit equal to 1 and ending with an FU
1438     with the E bit equal to 1, are sequentially concatenated, the
1439     payload of the fragmented NAL unit can be reconstructed. The NAL
1440     unit header of the fragmented NAL unit is not included as such in
1441     the FU payload, but rather the information of the NAL unit header of
1442     the fragmented NAL unit is conveyed in F, LayerId, and TID fields of
1443     the FU payload headers of the FUs and the FuType field of the FU
1444     header of the FUs. An FU payload MUST NOT be empty.

1446     If an FU is lost, the receiver SHOULD discard all following
1447     fragmentation units in transmission order corresponding to the same
1448     fragmented NAL unit, unless the decoder in the receiver is known to
1449     be prepared to gracefully handle incomplete NAL units.

1451     A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1452     fragments of a NAL unit to an (incomplete) NAL unit, even if
1453     fragment n of that NAL unit is not received. In this case, the
1454     forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1455     syntax violation.

1457  4.9 PACI packets

1459     This section specifies the PACI packet structure. The basic payload
1460     header specified in this memo is intentionally limited to the 16
1461     bits of the NAL unit header so as to keep the packetization overhead
1462     to a minimum. However, cases have been identified where it is
1463     advisable to include control information in an easily accessible
1464     position in the packet header, despite the additional overhead. One
1465     such piece of control information is the Temporal Scalability Control
1466     Information as specified in section 4.10 below. PACI packets carry
1467     this and future, similar structures.

1469     The PACI packet structure is based on a payload header extension
1470     mechanism that is generic and extensible to carry payload header
1471     extensions. In this section, the focus lies on the use within this
1472     specification. Section 4.9.2 below provides guidance for the
1473     specification designers in how to employ the extension mechanism in
1474     future specifications.

1476     A PACI packet consists of a payload header (denoted as PayloadHdr),
1477     for which the structure follows what is described in section 4.3
1478     above. The payload header is followed by the fields A, cType,
1479     PHSsize, F[0..2] and Y.

1481     Figure 11 shows a PACI packet in compliance with this memo; that is,
1482     without any extensions.

1484      0                   1                   2                   3
1485      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1486     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1487     |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1488     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1489     |        Payload Header Extension Structure (PHES)              |
1490     |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
1491     |                                                               |
1492     |                  PACI payload: NAL unit                       |
1493     |                   . . .                                       |
1494     |                                                               |
1495     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1496     |                               :...OPTIONAL RTP padding        |
1497     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1499     Figure 11 The structure of a PACI

1501     The fields in the payload header are set as follows. The F bit MUST
1502     be equal to 0. The Type field MUST be equal to 50. The value of
1503     LayerId MUST be a copy of the LayerId field of the PACI payload NAL
1504     unit or NAL-unit-like structure. The value of TID MUST be a copy of
1505     the TID field of the PACI payload NAL unit or NAL-unit-like
1506     structure.

1508     The semantics of other fields are as follows:

1510     A: 1 bit
1511        Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
1512        structure.

1514     cType: 6 bits
1515        Copy of the Type field of the PACI payload NAL unit or
1516        NAL-unit-like structure.

1518     PHSsize: 5 bits
1519        Indicates the total length of the fields F[0..2], Y, and PHES.
1520        The value is limited to be less than or equal to 32 octets, to
1521        simplify encoder design for MTU size matching.

1523     F0
1524        This field equal to 1 specifies the presence of a temporal
1525        scalability support extension in the PHES.

1527     F1, F2
1528        MUST be 0. Available for future extensions; see section 4.9.2.

1530     Y: 1 bit
1531        MUST be 0. Available for future extensions; see section 4.9.2.

1533     PHES: variable number of octets
1534        A variable number of octets as indicated by the value of PHSsize.

1536     PACI Payload
1537        The NAL unit or NAL-unit-like structure (such as: FU or AP) to be
1538        carried, not including the first two octets.

1540           Informative note: The first two octets of the NAL unit or
1541           NAL-unit-like structure carried in the PACI payload are not
1542           included in the PACI payload. Rather, the respective values
1543           are copied in locations of the PayloadHdr of the RTP packet.
1544           This design offers two advantages: first, the overall
1545           structure of the payload header is preserved, i.e. there is no
1546           special case of payload header structure that needs to be
1547           implemented for PACI. Second, no additional overhead is
1548           introduced.

1550     A PACI payload MAY be a single NAL unit, an FU, or an AP. PACIs
1551     MUST NOT be fragmented or aggregated. The following subsection
1552     documents the reasons for these design choices.

1554  4.9.1 Reasons for the PACI rules (informative)

1556     A PACI cannot be fragmented. If a PACI could be fragmented, and a
1557     fragment other than the first fragment would get lost, access to the
1558     information in the PACI would not be possible. Therefore, a PACI
1559     must not be fragmented. In other words, an FU must not carry
1560     (fragments of) a PACI.

1562     A PACI cannot be aggregated. Aggregation of PACIs is inadvisable
1563     from a compression viewpoint, as, in many cases, several
1564     to-be-aggregated NAL units would share identical PACI fields and
1565     values, which would be carried redundantly for no reason. Most, if
1566     not all, of the practical effects of PACI aggregation can be achieved
1567     by aggregating NAL units and bundling them with a PACI (see below).
1568     Therefore, a PACI must not be aggregated. In other words, an AP
1569     must not contain a PACI.

1571     The payload of a PACI can be a fragment. Both middleboxes and
1572     sending systems with inflexible (often hardware-based) encoders
1573     occasionally find themselves in situations where a PACI and its
1574     headers, combined, are larger than the MTU size. In such a
1575     scenario, the middlebox or sender can fragment the NAL unit and
1576     encapsulate the fragment in a PACI. Doing so preserves the payload
1577     header extension information for all fragments, allowing downstream
1578     middleboxes and the receiver to take advantage of that information.
1579     Therefore, a sender may place a fragment into a PACI, and a receiver
1580     must be able to handle such a PACI.

1582     The payload of a PACI can be an aggregation NAL unit. HEVC
1583     bitstreams can contain unevenly sized and/or small (when compared to
1584     the MTU size) NAL units. In order to efficiently packetize such
1585     small NAL units, APs were introduced. The benefits of APs are
1586     independent from the need for a payload header extension.
1587     Therefore, a sender may place an AP into a PACI, and a receiver must
1588     be able to handle such a PACI.

1590  4.9.2 PACI extensions (Informative)

1592     This subsection includes recommendations for future specification
1593     designers on how to extend the PACI syntax to accommodate future
1594     extensions. Obviously, designers are free to specify whatever
1595     appears to be appropriate to them at the time of their design.
1596     However, a lot of thought has been invested into the extension
1597     mechanism described below, and we suggest that deviations from it
1598     warrant a good explanation.

1600     This memo defines only a single payload header extension (Temporal
1601     Scalability Control Information, described below in section 4.10),
1602     and, therefore, only the F0 bit carries semantics. F1 and F2 are
1603     already named (and not just marked as reserved, as a typical video
1604     spec designer would do). They are intended to signal two additional
1605     extensions. The Y bit allows further F and Y bits to be added,
1606     recursively, to extend the mechanism beyond 3 possible payload header
1607     extensions. It is suggested to define a new packet type (using a
1608     different value for Type) when assigning the F1, F2, or Y bits
1609     different semantics than what is suggested below.

1611     When a Y bit is set, an 8 bit flag-extension is inserted after the Y
1612     bit. A flag-extension consists of 7 flags F[n..n+6], and another Y
1613     bit.

1615     The basic PACI header already includes F0, F1, and F2. Therefore,
1616     the Fx bits in the first flag-extension are numbered F3, F4, ...,
1617     F9, the F bits in the second flag-extension are numbered F10, F11,
1618     ..., F16, and so forth. As a result, at least 3 Fx bits are always
1619     in the PACI, but the number of Fx bits (and associated types of
1620     extensions), can be increased by setting the next Y bit and adding
1621     an octet of flag-extensions, carrying 7 flags and another Y bit.
1622     The size of this list of flags is subject to the limits specified in
1623     section 4.9 (32 octets for all flag-extensions and the PHES
1624     information combined).

1626     Each of the F bits can indicate either the presence of information
1627     in the Payload Header Extension Structure (PHES), described below,
1628     or a given F bit can indicate a certain condition, without including
1629     additional information in the PHES.

1631     When a spec developer devises a new syntax that takes advantage of
1632     the PACI extension mechanism, the constraints listed below must be
1633     followed; otherwise the extension mechanism may break.

1635        1) The fields added for a particular Fx bit MUST be fixed in
1636           length and not depend on what other Fx bits are set (no parsing
1637           dependency).
1638        2) The Fx bits must be assigned in order.
1639        3) An implementation that supports the n-th Fn bit for any value
1640           of n must understand the syntax (though not necessarily the
1641           semantics) of the fields Fk (with k < n), so as to be able to
1642           either use those bits when present, or at least be able to skip
1643           over them.

1645  4.10 Temporal Scalability Control Information

1647     This section describes the single payload header extension defined
1648     in this specification, known as Temporal Scalability Control
1649     Information (TSCI). If, in the future, additional payload header
1650     extensions become necessary, they could be specified in this section
1651     of an updated version of this document, or in their own documents.
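
Before turning to the TSCI fields, the fixed part of the PACI header specified in section 4.9 (PayloadHdr followed by A, cType, PHSsize, F0..F2, and Y) can be illustrated with a small Python sketch. The class and function names are hypothetical. The sketch only decodes the first four octets and rebuilds the two-octet header of the carried NAL unit or NAL-unit-like structure; it deliberately does not interpret PHSsize or skip over the PHES.

```python
from typing import NamedTuple


class PaciHeader(NamedTuple):
    a: int          # copy of the F bit of the payload NAL unit
    c_type: int     # copy of the Type field of the payload NAL unit
    phs_size: int   # raw 5-bit PHSsize value, not interpreted here
    f0: int
    f1: int
    f2: int
    y: int
    payload_nal_header: bytes  # rebuilt two-octet header of the payload


def parse_paci_header(packet: bytes) -> PaciHeader:
    """Parse the first four octets of a PACI payload structure."""
    if len(packet) < 4:
        raise ValueError("PACI too short")
    if (packet[0] >> 1) & 0x3F != 50:
        raise ValueError("not a PACI (Type != 50)")
    word = int.from_bytes(packet[2:4], "big")
    a = word >> 15
    c_type = (word >> 9) & 0x3F
    phs_size = (word >> 4) & 0x1F
    f0, f1, f2, y = (word >> 3) & 1, (word >> 2) & 1, (word >> 1) & 1, word & 1
    # LayerId and TID in PayloadHdr are copies of the payload's values,
    # so only F and Type need to be put back to recover the header.
    first = (a << 7) | (c_type << 1) | (packet[0] & 0x01)
    return PaciHeader(a, c_type, phs_size, f0, f1, f2, y,
                      bytes([first, packet[1]]))
```

A receiver would prepend `payload_nal_header` to the PACI payload to obtain the original single NAL unit, FU, or AP.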

1653     When F0 is set to 1 in a PACI, this specifies that the PHES field
1654     includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as follows:

1656      0                   1                   2                   3
1657      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1658     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1659     |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1660     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1661     |   TL0PICIDX   |   IrapPicID   |S|E|RES|                       |
1662     |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1663     |                              ....                             |
1664     |                  PACI payload: NAL unit                       |
1665     |                                                               |
1666     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1667     |                               :...OPTIONAL RTP padding        |
1668     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1670     Figure 12 The structure of a PACI with a PHES containing a TSCI

1672     TL0PICIDX (8 bits)
1673        When present, the TL0PICIDX field MUST be set equal to
1674        temporal_sub_layer_zero_idx as specified in Section D.3.32 of
1675        [H.265] for the access unit containing the NAL unit in the PACI.

1677     IrapPicID (8 bits)
1678        When present, the IrapPicID field MUST be set equal to
1679        irap_pic_id as specified in Section D.3.22 of [H.265] for the
1680        access unit containing the NAL unit in the PACI.

1682     S (1 bit)
1683        The S bit MUST be set to 1 if any of the following conditions is
1684        true and MUST be set to 0 otherwise:

1686        o The NAL unit in the payload of the PACI is the first VCL NAL
1687          unit, in decoding order, of a picture.
1688        o The NAL unit in the payload of the PACI is an AP and the NAL
1689          unit in the first contained aggregation unit is the first VCL
1690          NAL unit, in decoding order, of a picture.
1691        o The NAL unit in the payload of the PACI is an FU with its S bit
1692          equal to 1 and the FU payload containing a fragment of the
1693          first VCL NAL unit, in decoding order, of a picture.

1695     E (1 bit)
1696        The E bit MUST be set to 1 if any of the following conditions is
1697        true and MUST be set to 0 otherwise:

1699        o The NAL unit in the payload of the PACI is the last VCL NAL
1700          unit, in decoding order, of a picture.
1701        o The NAL unit in the payload of the PACI is an AP and the NAL
1702          unit in the last contained aggregation unit is the last VCL NAL
1703          unit, in decoding order, of a picture.
1704        o The NAL unit in the payload of the PACI is an FU with its E bit
1705          equal to 1 and the FU payload containing a fragment of the last
1706          VCL NAL unit, in decoding order, of a picture.

1708     RES (2 bits)
1709        MUST be equal to 0. Reserved for future extensions.

1711     The value of PHSsize MUST be set to 3. Receivers MUST allow other
1712     values of the fields F0, F1, F2, Y, and PHSsize, and MUST ignore any
1713     fields in the PHES, when present, other than those specified above.

1715  5. Packetization Rules

1717     The following packetization rules apply:

1719     o If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
1720       than 0 for an RTP stream, the transmission order of NAL units
1721       carried in the RTP stream MAY be different from the NAL unit
1722       decoding order. Otherwise (tx-mode is equal to "SSM" and
1723       sprop-max-don-diff is equal to 0 for an RTP stream), the
1724       transmission order of NAL units carried in the RTP stream MUST be
1725       the same as the NAL unit decoding order.

1727     o A NAL unit of a small size SHOULD be encapsulated in an
1728       aggregation packet together with one or more other NAL units in
1729       order to avoid the unnecessary packetization overhead for small
1730       NAL units. For example, non-VCL NAL units such as access unit
1731       delimiters, parameter sets, or SEI NAL units are typically small
1732       and can often be aggregated with VCL NAL units without violating
1733       MTU size constraints.

1735     o Each non-VCL NAL unit SHOULD, when possible from an MTU size
1736       match viewpoint, be encapsulated in an aggregation packet
1737       together with its associated VCL NAL unit, as typically a non-VCL
1738       NAL unit would be meaningless without the associated VCL NAL unit
1739       being available.

1741     o For carrying exactly one NAL unit in an RTP packet, a single NAL
1742       unit packet MUST be used.

1744  6. De-packetization Process

1746     The general concept behind de-packetization is to get the NAL units
1747     out of the RTP packets in an RTP stream and all RTP streams the RTP
1748     stream depends on, if any, and pass them to the decoder in the NAL
1749     unit decoding order.

1751     The de-packetization process is implementation dependent.
1752     Therefore, the following description should be seen as an example of
1753     a suitable implementation. Other schemes may be used as well, as
1754     long as the output for the same input is the same as the process
1755     described below. The output is the same when the set of output NAL
1756     units and their order are both identical. Optimizations relative to
1757     the described algorithms are possible.

1759     All normal RTP mechanisms related to buffer management apply. In
1760     particular, duplicated or outdated RTP packets (as indicated by the
1761     RTP sequence number and the RTP timestamp) are removed. To
1762     determine the exact time for decoding, factors such as a possible
1763     intentional delay to allow for proper inter-stream synchronization
1764     must be factored in.

1766     NAL units with NAL unit type values in the range of 0 to 47,
1767     inclusive, may be passed to the decoder. NAL-unit-like structures
1768     with NAL unit type values in the range of 48 to 63, inclusive, MUST
1769     NOT be passed to the decoder.
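
The type-range rule above can be expressed as a one-line check. The following Python sketch is illustrative only; the function name is hypothetical, and the Type field is assumed to occupy bits 1 to 6 of the first octet of the two-octet NAL unit header.

```python
def may_pass_to_decoder(nal: bytes) -> bool:
    """Return True for NAL unit types 0..47, False for 48..63."""
    if len(nal) < 2:
        raise ValueError("NAL unit header is two octets long")
    nal_type = (nal[0] >> 1) & 0x3F
    return nal_type <= 47
```

A de-packetizer would apply such a check after unpacking APs, FUs, and PACIs, so that only the carried NAL units reach the decoder.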

1771     The receiver includes a receiver buffer, which is used to compensate
1772     for transmission delay jitter within individual RTP streams and
1773     across RTP streams, to reorder NAL units from transmission order to
1774     the NAL unit decoding order, and to recover the NAL unit decoding
1775     order in MSM, when applicable. In this section, the receiver
1776     operation is described under the assumption that there is no
1777     transmission delay jitter within an RTP stream and across RTP
1778     streams. To distinguish it from a practical receiver buffer that
1779     is also used for compensation of transmission delay jitter, the
1780     receiver buffer is hereafter called the de-packetization buffer in
1781     this section. Receivers should also prepare for transmission delay
1782     jitter; i.e. either reserve separate buffers for transmission delay
1783     jitter buffering and de-packetization buffering or use a receiver
1784     buffer for both transmission delay jitter and de-packetization.
1785     Moreover, receivers should take transmission delay jitter into
1786     account in the buffering operation; e.g. by additional initial
1787     buffering before starting decoding and playback.

1789     If only one RTP stream is being received and sprop-max-don-diff of
1790     the only RTP stream being received is equal to 0, the
1791     de-packetization buffer size is zero bytes, i.e. the NAL units carried
1792     in the RTP stream are directly passed to the decoder in their
1793     transmission order, which is identical to the decoding order of the
1794     NAL units. Otherwise, the process described in the remainder of this
1795     section applies.

1797     There are two buffering states in the receiver: initial buffering
1798     and buffering while playing. Initial buffering starts when the
1799     reception is initialized. After initial buffering, decoding and
1800     playback are started, and the buffering-while-playing mode is used.
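
As an overview of the buffering procedure that the remainder of this section specifies, the following Python sketch models a de-packetization buffer. It is a simplified illustration, not a reference implementation. The class and method names are hypothetical, AbsDon is assumed to have been computed by the caller, and jitter handling and multi-stream dependency ordering are left out. A NAL unit is released, smallest AbsDon first, while the AbsDon spread in the buffer is at least sprop-max-don-diff or the buffer holds more than sprop-depack-buf-nalus NAL units.

```python
import bisect
from typing import List, Tuple


class DepacketizationBuffer:
    def __init__(self, sprop_max_don_diff: int, sprop_depack_buf_nalus: int):
        self.max_don_diff = sprop_max_don_diff
        self.max_nalus = sprop_depack_buf_nalus
        self._seq = 0  # tie-breaker that preserves reception order
        self._items: List[Tuple[int, int, bytes]] = []  # sorted by AbsDon

    def _should_output(self) -> bool:
        if not self._items:
            return False
        spread = self._items[-1][0] - self._items[0][0]
        cond_a = spread >= self.max_don_diff      # AbsDon spread reached
        cond_b = len(self._items) > self.max_nalus  # too many NAL units
        return cond_a or cond_b

    def push(self, abs_don: int, nal: bytes) -> List[bytes]:
        """Store one NAL unit and return those now due for decoding."""
        bisect.insort(self._items, (abs_don, self._seq, nal))
        self._seq += 1
        out = []
        while self._should_output():
            out.append(self._items.pop(0)[2])  # smallest AbsDon first
        return out

    def flush(self) -> List[bytes]:
        """Drain the buffer in increasing AbsDon order at end of stream."""
        out = [item[2] for item in self._items]
        self._items.clear()
        return out
```

With sprop_max_don_diff equal to 0 the spread test is always satisfied, so every NAL unit is returned immediately, which matches the zero-size buffer case described above. A sorted list keeps the sketch short; a production receiver would more likely use a heap or ring buffer.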

1802     Regardless of the buffering state, the receiver stores incoming NAL
1803     units, in reception order, into the de-packetization buffer. NAL
1804     units carried in RTP packets are stored in the de-packetization
1805     buffer individually, and the value of AbsDon is calculated and
1806     stored for each NAL unit. When MSM is in use, NAL units of all RTP
1807     streams of a bitstream are stored in the same de-packetization
1808     buffer. When NAL units carried in any two RTP streams are available
1809     to be placed into the de-packetization buffer, those NAL units
1810     carried in the RTP stream that is lower in the dependency tree are
1811     placed into the buffer first. For example, if RTP stream A depends
1812     on RTP stream B, then NAL units carried in RTP stream B are placed
1813     into the buffer first.

1815     Initial buffering lasts until condition A (the difference between
1816     the greatest and smallest AbsDon values of the NAL units in the
1817     de-packetization buffer is greater than or equal to the value of
1818     sprop-max-don-diff of the highest RTP stream) or condition B (the
1819     number of NAL units in the de-packetization buffer is greater than
1820     the value of sprop-depack-buf-nalus) is true.

1822     After initial buffering, whenever condition A or condition B is
1823     true, the following operation is repeatedly applied until both
1824     condition A and condition B become false:

1826     o The NAL unit in the de-packetization buffer with the smallest
1827       value of AbsDon is removed from the de-packetization buffer and
1828       passed to the decoder.

1830     When no more NAL units are flowing into the de-packetization buffer,
1831     all NAL units remaining in the de-packetization buffer are removed
1832     from the buffer and passed to the decoder in the order of increasing
1833     AbsDon values.

1835  7. Payload Format Parameters

1837     This section specifies the parameters that MAY be used to select
1838     optional features of the payload format and certain features or
1839     properties of the bitstream or the RTP stream. The parameters are
1840     specified here as part of the media type registration for the HEVC
1841     codec. A mapping of the parameters into the Session Description
1842     Protocol (SDP) [RFC4566] is also provided for applications that use
1843     SDP. Equivalent parameters could be defined elsewhere for use with
1844     control protocols that do not use SDP.

1846  7.1 Media Type Registration

1848     The media subtype for the HEVC codec is allocated from the IETF
1849     tree.

1851     The receiver MUST ignore any unrecognized parameter.

1853     Media Type name: video

1855     Media subtype name: H265

1857     Required parameters: none

1859     OPTIONAL parameters:

1861        profile-space, tier-flag, profile-id,
1862        profile-compatibility-indicator, interop-constraints, and level-id:

1864           These parameters indicate the profile, tier, default level,
1865           and some constraints of the bitstream carried by the RTP
1866           stream and all RTP streams the RTP stream depends on, or a
1867           specific set of the profile, tier, default level, and some
1868           constraints the receiver supports.

1870           The profile and some constraints are indicated collectively by
1871           profile-space, profile-id, profile-compatibility-indicator,
1872           and interop-constraints. The profile specifies the subset of
1873           coding tools that may have been used to generate the bitstream
1874           or that the receiver supports.

1876              Informative note: There are 32 values of profile-id, and
1877              there are 32 flags in profile-compatibility-indicator, each
1878              flag corresponding to one value of profile-id. According
1879              to HEVC version 1 in [HEVC], when more than one of the 32
1880              flags is set for a bitstream, the bitstream would comply
1881              with all the profiles corresponding to the set flags.
1882              However, in a draft of HEVC version 2 in [HEVC draft v2],
1883              subclause A.3.5, 19 Format Range Extensions profiles have
1884              been specified, all using the same value of profile-id (4),
1885              differentiated by some of the 48 bits in
1886              interop-constraints. This rather unexpected way of profile
1887              signalling means that one of the 32 flags may correspond
1888              to multiple profiles. To be able to support whatever HEVC
1889              extension profiles might be specified and indicated
1890              using profile-space, profile-id,
1891              profile-compatibility-indicator, and interop-constraints in
1892              the future, it would be safe to require symmetric use of
1893              these parameters in SDP offer/answer unless recv-sub-layer-id
1894              is included in the SDP answer for choosing one of the
1895              sub-layers offered.

1896           The tier is indicated by tier-flag. The default level is
1897           indicated by level-id. The tier and the default level specify
1898           the limits on values of syntax elements or arithmetic
1899           combinations of values of syntax elements that are followed
1900           when generating the bitstream or that the receiver supports.

1902           A set of profile-space, tier-flag, profile-id,
1903           profile-compatibility-indicator, interop-constraints, and
1904           level-id parameters ptlA is said to be consistent with another
1905           set of these parameters ptlB if any decoder that conforms to the
1906           profile, tier, level, and constraints indicated by ptlB can
1907           decode any bitstream that conforms to the profile, tier,
1908           level, and constraints indicated by ptlA.

1910           In SDP offer/answer, when the SDP answer does not include the
1911           recv-sub-layer-id parameter that is less than the
1912           sprop-sub-layer-id parameter in the SDP offer, the following
1913           applies:

1914              o The profile-space, tier-flag, profile-id,
1915                profile-compatibility-indicator, and interop-constraints
1916                parameters MUST be used symmetrically, i.e. the value of
1917                each of these parameters in the offer MUST be the same as
1918                that in the answer, either explicitly signalled or
1919                implicitly inferred.
1920              o The level-id parameter is changeable as long as the
1921                highest level indicated by the answer is either equal to
1922                or lower than that in the offer. Note that the highest
1923                level is indicated by level-id and max-recv-level-id
1924                together.

1926           In SDP offer/answer, when the SDP answer does include the
1927           recv-sub-layer-id parameter that is less than the
1928           sprop-sub-layer-id parameter in the SDP offer, the set of
1929           profile-space, tier-flag, profile-id,
1930           profile-compatibility-indicator, interop-constraints, and
1931           level-id parameters included in the answer MUST be consistent
1932           with that for the chosen sub-layer representation as indicated
1933           in the SDP offer, with the exception that the level-id parameter
1934           in the SDP answer is changeable as long as the highest level
1935           indicated by the answer is either lower than or equal to that
1936           in the offer.

1937           More specifications of these parameters, including how they
1938           relate to the values of the profile, tier, and level syntax
1939           elements specified in [HEVC] are provided below.

1941        profile-space, profile-id:

1943           The value of profile-space MUST be in the range of 0 to 3,
1944           inclusive. The value of profile-id MUST be in the range of 0
1945           to 31, inclusive.

1947           When profile-space is not present, a value of 0 MUST be
1948           inferred. When profile-id is not present, a value of 1 (i.e.
1949           the Main profile) MUST be inferred.

1951           When used to indicate properties of a bitstream, profile-space
1952           and profile-id are derived from the profile, tier, and level
1953           syntax elements in SPS or VPS NAL units as follows, where
1954           general_profile_space, general_profile_idc,
1955           sub_layer_profile_space[j], and sub_layer_profile_idc[j] are
1956           specified in [HEVC]:

1958              If the RTP stream is the highest RTP stream, the following
1959              applies:

1961              o profile-space = general_profile_space
1962              o profile-id = general_profile_idc

1964              Otherwise (the RTP stream is a dependee RTP stream), the
1965              following applies, with j being the value of the
1966              sprop-sub-layer-id parameter:

1968              o profile-space = sub_layer_profile_space[j]
1969              o profile-id = sub_layer_profile_idc[j]

1971        tier-flag, level-id:

1973           The value of tier-flag MUST be in the range of 0 to 1,
1974           inclusive. The value of level-id MUST be in the range of 0
1975           to 255, inclusive.

1977           If the tier-flag and level-id parameters are used to indicate
1978           properties of a bitstream, they indicate the tier and the
1979           highest level the bitstream complies with.

1981           If the tier-flag and level-id parameters are used for
1982           capability exchange, the following applies. If
1983           max-recv-level-id is not present, the default level defined by
1984           level-id indicates the highest level the codec wishes to support.
1985           Otherwise, max-recv-level-id indicates the highest level the
1986           codec supports for receiving. For either receiving or
1987           sending, all levels that are lower than the highest level
1988           supported MUST also be supported.

1990           If no tier-flag is present, a value of 0 MUST be inferred and
1991           if no level-id is present, a value of 93 (i.e. level 3.1) MUST
1992           be inferred.
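
The value ranges and inferred defaults given so far for profile-space, profile-id, tier-flag, and level-id can be collected in a small Python sketch. The function name and the dictionary-based representation of the parsed media type parameters are assumptions made for illustration; parameters not listed in the table, including unrecognized ones, are simply ignored.

```python
from typing import Dict

# Inferred values when a parameter is absent, as stated in the text above.
DEFAULTS = {"profile-space": 0, "profile-id": 1, "tier-flag": 0, "level-id": 93}
# Inclusive value ranges, as stated in the text above.
RANGES = {
    "profile-space": (0, 3),
    "profile-id": (0, 31),
    "tier-flag": (0, 1),
    "level-id": (0, 255),
}


def resolve_ptl(fmtp: Dict[str, str]) -> Dict[str, int]:
    """Apply defaults and range checks to already-parsed parameters."""
    resolved = {}
    for name, default in DEFAULTS.items():
        value = int(fmtp[name]) if name in fmtp else default
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside {low}..{high}")
        resolved[name] = value
    return resolved
```

For example, an empty parameter set resolves to profile-space 0, profile-id 1 (Main profile), tier-flag 0, and level-id 93 (level 3.1).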
1994 When used to indicate properties of a bitstream, the tier-flag 1995 and level-id parameters are derived from the profile, tier, 1996 and level syntax elements in SPS or VPS NAL units as follows, 1997 where general_tier_flag, general_level_idc, 1998 sub_layer_tier_flag[j], and sub_layer_level_idc[j] are 1999 specified in [HEVC]: 2001 If the RTP stream is the highest RTP stream, the following 2002 applies: 2004 o tier-flag = general_tier_flag 2005 o level-id = general_level_idc 2007 Otherwise (the RTP stream is a dependee RTP stream), the 2008 following applies, with j being the value of the sprop-sub- 2009 layer-id parameter: 2011 o tier-flag = sub_layer_tier_flag[j] 2012 o level-id = sub_layer_level_idc[j] 2014 interop-constraints: 2016 A base16 [RFC4648] (hexadecimal) representation of six bytes 2017 of data, consisting of progressive_source_flag, 2018 interlaced_source_flag, non_packed_constraint_flag, 2019 frame_only_constraint_flag, and reserved_zero_44bits. 2021 If the interop-constraints parameter is not present, the 2022 following MUST be inferred: 2024 o progressive_source_flag = 1 2025 o interlaced_source_flag = 0 2026 o non_packed_constraint_flag = 1 2027 o frame_only_constraint_flag = 1 2028 o reserved_zero_44bits = 0 2030 When the interop-constraints parameter is used to indicate 2031 properties of a bitstream, the following applies, where 2032 general_progressive_source_flag, 2033 general_interlaced_source_flag, 2034 general_non_packed_constraint_flag, 2036 general_frame_only_constraint_flag, 2037 general_reserved_zero_44bits, 2038 sub_layer_progressive_source_flag[j], 2039 sub_layer_interlaced_source_flag[j], 2040 sub_layer_non_packed_constraint_flag[j], 2041 sub_layer_frame_only_constraint_flag[j], and 2042 sub_layer_reserved_zero_44bits[j] are specified in [HEVC]: 2044 If the RTP stream is the highest RTP stream, the following 2045 applies: 2047 o progressive_source_flag = general_progressive_source_flag
2048 o interlaced_source_flag = general_interlaced_source_flag 2049 o non_packed_constraint_flag = 2050 general_non_packed_constraint_flag 2051 o frame_only_constraint_flag = 2052 general_frame_only_constraint_flag 2053 o reserved_zero_44bits = general_reserved_zero_44bits 2055 Otherwise (the RTP stream is a dependee RTP stream), the 2056 following applies, with j being the value of the sprop-sub- 2057 layer-id parameter: 2059 o progressive_source_flag = 2060 sub_layer_progressive_source_flag[j] 2061 o interlaced_source_flag = 2062 sub_layer_interlaced_source_flag[j] 2063 o non_packed_constraint_flag = 2064 sub_layer_non_packed_constraint_flag[j] 2065 o frame_only_constraint_flag = 2066 sub_layer_frame_only_constraint_flag[j] 2067 o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j] 2069 Using interop-constraints for capability exchange results in a 2070 requirement on any bitstream to be compliant with the interop- 2071 constraints. 2073 profile-compatibility-indicator: 2075 A base16 [RFC4648] representation of four bytes of data. 2077 When profile-compatibility-indicator is used to indicate 2078 properties of a bitstream, the following applies, where 2079 general_profile_compatibility_flag[j] and 2080 sub_layer_profile_compatibility_flag[i][j] are specified in 2081 [HEVC]: 2083 The profile-compatibility-indicator in this case indicates 2084 profiles, in addition to the profile defined by 2085 profile-space, profile-id, and interop-constraints, that the 2086 bitstream conforms to. A decoder that conforms to any of 2087 the profiles the bitstream conforms to would be capable 2088 of decoding the bitstream. These additional profiles are 2089 defined by profile-space, each set bit of profile- 2090 compatibility-indicator, and interop-constraints.
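As an aside on the interop-constraints parameter described above, the following sketch decodes its six-byte base16 value into the four flags and the reserved field. It assumes that the fields are packed most significant bit first in the order in which they are listed (progressive_source_flag first); the text does not state the bit order explicitly, and the function name is hypothetical.

```python
# Values that MUST be inferred when interop-constraints is absent.
INFERRED = {
    "progressive_source_flag": 1,
    "interlaced_source_flag": 0,
    "non_packed_constraint_flag": 1,
    "frame_only_constraint_flag": 1,
    "reserved_zero_44bits": 0,
}


def decode_interop_constraints(hex_value=None):
    """Decode the base16 parameter value (hypothetical helper).

    Assumption: the first listed flag is the most significant bit of
    the first byte, followed by the other flags and the 44 reserved
    bits.
    """
    if hex_value is None:
        return dict(INFERRED)
    raw = bytes.fromhex(hex_value)
    if len(raw) != 6:
        raise ValueError("interop-constraints must encode six bytes")
    bits = int.from_bytes(raw, "big")  # 48 bits in total
    return {
        "progressive_source_flag": (bits >> 47) & 1,
        "interlaced_source_flag": (bits >> 46) & 1,
        "non_packed_constraint_flag": (bits >> 45) & 1,
        "frame_only_constraint_flag": (bits >> 44) & 1,
        "reserved_zero_44bits": bits & ((1 << 44) - 1),
    }
```

Under this bit-order assumption, the inferred defaults correspond to the value "B00000000000".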
2092 If the RTP stream is the highest RTP stream, the following 2093 applies for each value of j in the range of 0 to 31, 2094 inclusive: 2096 o bit j of profile-compatibility-indicator = 2097 general_profile_compatibility_flag[j] 2099 Otherwise (the RTP stream is a dependee RTP stream), the 2100 following applies for i equal to sprop-sub-layer-id and for 2101 each value of j in the range of 0 to 31, inclusive: 2103 o bit j of profile-compatibility-indicator = 2104 sub_layer_profile_compatibility_flag[i][j] 2106 Using profile-compatibility-indicator for capability exchange 2107 results in a requirement on any bitstream to be compliant with 2108 the profile-compatibility-indicator. This is intended to 2109 handle cases where any future HEVC profile is defined as an 2110 intersection of two or more profiles. 2112 If this parameter is not present, this parameter defaults to 2113 the following: bit j, with j equal to profile-id, of profile- 2114 compatibility-indicator is inferred to be equal to 1, and all 2115 other bits are inferred to be equal to 0. 2117 sprop-sub-layer-id: 2119 This parameter MAY be used to indicate the highest allowed 2120 value of TID in the bitstream. When not present, the value of 2121 sprop-sub-layer-id is inferred to be equal to 6. 2123 The value of sprop-sub-layer-id MUST be in the range of 0 2124 to 6, inclusive. 2126 recv-sub-layer-id: 2128 This parameter MAY be used to signal a receiver's choice of 2129 the offered or declared sub-layer representations in the 2130 sprop-vps. The value of recv-sub-layer-id indicates the TID 2131 of the highest sub-layer of the bitstream that a receiver 2132 supports. When not present, the value of recv-sub-layer-id is 2133 inferred to be equal to the value of the sprop-sub-layer-id 2134 parameter in the SDP offer. 2136 The value of recv-sub-layer-id MUST be in the range of 0 to 6, 2137 inclusive. 2139 max-recv-level-id: 2141 This parameter MAY be used to indicate the highest level a 2142 receiver supports. 
The highest level the receiver supports is 2143 equal to the value of max-recv-level-id divided by 30. 2145 The value of max-recv-level-id MUST be in the range of 0 2146 to 255, inclusive. 2148 When max-recv-level-id is not present, the value is inferred 2149 to be equal to level-id. 2151 max-recv-level-id MUST NOT be present when the highest level 2152 the receiver supports is not higher than the default level. 2154 tx-mode: 2156 This parameter indicates whether the transmission mode is SSM 2157 or MSM. 2159 The value of tx-mode MUST be equal to either "MSM" or "SSM". 2160 When not present, the value of tx-mode is inferred to be equal 2161 to "SSM". 2163 If the value is equal to "MSM", MSM MUST be in use. Otherwise 2164 (the value is equal to "SSM"), SSM MUST be in use. 2166 The value of tx-mode MUST be equal to "MSM" for all RTP 2167 sessions in an MSM. 2169 sprop-vps: 2171 This parameter MAY be used to convey any video parameter set 2172 NAL unit of the bitstream for out-of-band transmission of 2173 video parameter sets. The parameter MAY also be used for 2174 capability exchange and to indicate sub-stream characteristics 2175 (i.e. properties of sub-layer representations as defined in 2176 [HEVC]). The value of the parameter is a comma-separated 2177 (',') list of base64 [RFC4648] representations of the video 2178 parameter set NAL units as specified in Section 7.3.2.1 of 2179 [HEVC]. 2181 The sprop-vps parameter MAY contain one or more than one video 2182 parameter set NAL unit. However, all other video parameter 2183 sets contained in the sprop-vps parameter MUST be consistent 2184 with the first video parameter set in the sprop-vps parameter. 
2185 A video parameter set vpsB is said to be consistent with 2186 another video parameter set vpsA if any decoder that conforms 2187 to the profile, tier, level, and constraints indicated by the 2188 12 bytes of data starting from the syntax element 2189 general_profile_space to the syntax element general_level_idc, 2190 inclusive, in the first profile_tier_level( ) syntax structure 2191 in vpsA can decode any bitstream that conforms to the profile, 2192 tier, level, and constraints indicated by the 12 bytes of data 2193 starting from the syntax element general_profile_space to the 2194 syntax element general_level_idc, inclusive, in the first 2195 profile_tier_level( ) syntax structure in vpsB. 2197 sprop-sps: 2199 This parameter MAY be used to convey sequence parameter set 2200 NAL units of the bitstream for out-of-band transmission of 2201 sequence parameter sets. The value of the parameter is a 2202 comma-separated (',') list of base64 [RFC4648] representations 2203 of the sequence parameter set NAL units as specified in 2204 Section 7.3.2.2 of [HEVC]. 2206 sprop-pps: 2208 This parameter MAY be used to convey picture parameter set NAL 2209 units of the bitstream for out-of-band transmission of picture 2210 parameter sets. The value of the parameter is a comma- 2211 separated (',') list of base64 [RFC4648] representations of 2212 the picture parameter set NAL units as specified in Section 2213 7.3.2.3 of [HEVC]. 2215 sprop-sei: 2217 This parameter MAY be used to convey one or more SEI messages 2218 that describe bitstream characteristics. When present, a 2219 decoder can rely on the bitstream characteristics that are 2220 described in the SEI messages for the entire duration of the 2221 session, independently from the persistence scopes of the SEI 2222 messages as specified in [HEVC]. 2224 The value of the parameter is a comma-separated (',') list of 2225 base64 [RFC4648] representations of SEI NAL units as specified 2226 in Section 7.3.2.4 of [HEVC].
2228 Informative note: Intentionally, no list of applicable or 2229 inapplicable SEI messages is specified here. Conveying 2230 certain SEI messages in sprop-sei may be sensible in some 2231 application scenarios and meaningless in others. However, 2232 a few examples are described below: 2234 1) In an environment where the bitstream was created from 2235 film-based source material, and no splicing is going to 2236 occur during the lifetime of the session, the film grain 2237 characteristics SEI message or the tone mapping 2238 information SEI message is likely meaningful, and 2239 sending it in sprop-sei rather than in the bitstream 2240 at each entry point may help save bits and allows the 2241 renderer to be configured only once, avoiding unwanted 2242 artifacts. 2243 2) The structure of pictures information SEI message in 2244 sprop-sei can be used to inform a decoder of information 2245 on the NAL unit types, picture order count values, and 2246 prediction dependencies of a sequence of pictures. 2247 Having such knowledge can be helpful for error recovery. 2248 3) Examples of SEI messages that would be meaningless to 2249 convey in sprop-sei include the decoded picture 2250 hash SEI message (it is close to impossible that all 2251 decoded pictures have the same hash-tag), the display 2252 orientation SEI message when the device is a handheld 2253 device (as the display orientation may change when the 2254 handheld device is turned around), or the filler payload 2255 SEI message (as there is no point in just having more 2256 bits in SDP). 2258 max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc: 2260 These parameters MAY be used to signal the capabilities of a 2261 receiver implementation. These parameters MUST NOT be used 2262 for any other purpose. The highest level (specified by max- 2263 recv-level-id) MUST be one that the receiver is fully capable 2264 of supporting.
max-lsr, max-lps, max-cpb, max-dpb, max-br, 2265 max-tr, and max-tc MAY be used to indicate capabilities of the 2266 receiver that extend the required capabilities of the highest 2267 level, as specified below. 2269 When more than one parameter from the set (max-lsr, max-lps, 2270 max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the 2271 receiver MUST support all signaled capabilities 2272 simultaneously. For example, if both max-lsr and max-br are 2273 present, the highest level with the extension of both the 2274 picture rate and bitrate is supported. That is, the receiver 2275 is able to decode bitstreams in which the luma sample rate is 2276 up to max-lsr (inclusive), the bitrate is up to max-br 2277 (inclusive), the coded picture buffer size is derived as 2278 specified in the semantics of the max-br parameter below, and 2279 the other properties comply with the highest level specified 2280 by max-recv-level-id. 2282 Informative note: When the OPTIONAL media type parameters 2283 are used to signal the properties of a bitstream, and max- 2284 lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc 2285 are not present, the values of profile-space, tier-flag, 2286 profile-id, profile-compatibility-indicator, interop- 2287 constraints, and level-id must always be such that the 2288 bitstream complies fully with the specified profile, tier, 2289 and level. 2291 max-lsr: 2292 The value of max-lsr is an integer indicating the maximum 2293 processing rate in units of luma samples per second. The max- 2294 lsr parameter signals that the receiver is capable of decoding 2295 video at a higher rate than is required by the highest level. 2297 When max-lsr is signaled, the receiver MUST be able to decode 2298 bitstreams that conform to the highest level, with the 2299 exception that the MaxLumaSR value in Table A-2 of [HEVC] for 2300 the highest level is replaced with the value of max-lsr. 
2301 Senders MAY use this knowledge to send pictures of a given 2302 size at a higher picture rate than is indicated in the highest 2303 level. 2305 When not present, the value of max-lsr is inferred to be equal 2306 to the value of MaxLumaSR given in Table A-2 of [HEVC] for the 2307 highest level. 2309 The value of max-lsr MUST be in the range of MaxLumaSR to 2310 16 * MaxLumaSR, inclusive, where MaxLumaSR is given in Table 2311 A-2 of [HEVC] for the highest level. 2313 max-lps: 2314 The value of max-lps is an integer indicating the maximum 2315 picture size in units of luma samples. The max-lps parameter 2316 signals that the receiver is capable of decoding larger 2317 picture sizes than are required by the highest level. When 2318 max-lps is signaled, the receiver MUST be able to decode 2319 bitstreams that conform to the highest level, with the 2320 exception that the MaxLumaPS value in Table A-1 of [HEVC] for 2321 the highest level is replaced with the value of max-lps. 2322 Senders MAY use this knowledge to send larger pictures at a 2323 proportionally lower picture rate than is indicated in the 2324 highest level. 2326 When not present, the value of max-lps is inferred to be equal 2327 to the value of MaxLumaPS given in Table A-1 of [HEVC] for the 2328 highest level. 2330 The value of max-lps MUST be in the range of MaxLumaPS to 2331 16 * MaxLumaPS, inclusive, where MaxLumaPS is given in Table 2332 A-1 of [HEVC] for the highest level. 2334 max-cpb: 2335 The value of max-cpb is an integer indicating the maximum 2336 coded picture buffer size in units of CpbBrVclFactor bits for 2337 the VCL HRD parameters and in units of CpbBrNalFactor bits for 2338 the NAL HRD parameters, where CpbBrVclFactor and 2339 CpbBrNalFactor are defined in Section A.4 of [HEVC]. The max- 2340 cpb parameter signals that the receiver has more memory than 2341 the minimum amount of coded picture buffer memory required by 2342 the highest level. 
When max-cpb is signaled, the receiver 2343 MUST be able to decode bitstreams that conform to the highest 2344 level, with the exception that the MaxCPB value in Table A-1 2345 of [HEVC] for the highest level is replaced with the value of 2346 max-cpb. Senders MAY use this knowledge to construct coded 2347 bitstreams with greater variation of bitrate than can be 2348 achieved with the MaxCPB value in Table A-1 of [HEVC]. 2350 When not present, the value of max-cpb is inferred to be equal 2351 to the value of MaxCPB given in Table A-1 of [HEVC] for the 2352 highest level. 2354 The value of max-cpb MUST be in the range of MaxCPB to 2355 16 * MaxCPB, inclusive, where MaxCPB is given in Table A-1 2356 of [HEVC] for the highest level. 2358 Informative note: The coded picture buffer is used in the 2359 hypothetical reference decoder (Annex C of HEVC). The use 2360 of the hypothetical reference decoder is recommended in 2361 HEVC encoders to verify that the produced bitstream 2362 conforms to the standard and to control the output bitrate. 2363 Thus, the coded picture buffer is conceptually independent 2364 of any other potential buffers in the receiver, including 2365 de-packetization and de-jitter buffers. The coded picture 2366 buffer need not be implemented in decoders as specified in 2367 Annex C of HEVC, but rather standard-compliant decoders can 2368 have any buffering arrangements provided that they can 2369 decode standard-compliant bitstreams. Thus, in practice, 2370 the input buffer for a video decoder can be integrated with 2371 de-packetization and de-jitter buffers of the receiver. 2373 max-dpb: 2374 The value of max-dpb is an integer indicating the maximum 2375 decoded picture buffer size in units of decoded pictures at the 2376 MaxLumaPS for the highest level, i.e. the number of decoded 2377 pictures at the maximum picture size defined by the highest 2378 level. The value of max-dpb MUST be in the range of 1 to 16, 2379 inclusive.
The max-dpb parameter signals that the receiver 2380 has more memory than the minimum amount of decoded picture 2381 buffer memory required by default, which is MaxDpbPicBuf as 2382 defined in [HEVC] (equal to 6). When max-dpb is signaled, the 2383 receiver MUST be able to decode bitstreams that conform to the 2384 highest level, with the exception that the MaxDpbPicBuf value 2385 defined in [HEVC] as 6 is replaced with the value of max-dpb. 2386 Consequently, a receiver that signals max-dpb MUST be capable 2387 of storing the following number of decoded pictures 2388 (MaxDpbSize) in its decoded picture buffer: 2390 if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) ) 2391 MaxDpbSize = Min( 4 * max-dpb, 16 ) 2392 else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) ) 2393 MaxDpbSize = Min( 2 * max-dpb, 16 ) 2394 else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) ) 2395 MaxDpbSize = Min( (4 * max-dpb) / 3, 16 ) 2396 else 2397 MaxDpbSize = max-dpb 2399 where MaxLumaPS is given in Table A-1 of [HEVC] for the highest 2400 level and PicSizeInSamplesY is the current size of each 2401 decoded picture in units of luma samples as defined in [HEVC]. 2403 The value of max-dpb MUST be greater than or equal to the 2404 value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. Senders 2405 MAY use this knowledge to construct coded bitstreams with 2406 improved compression. 2408 When not present, the value of max-dpb is inferred to be equal 2409 to the value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. 2411 Informative note: This parameter was added primarily to 2412 complement a similar codepoint in the ITU-T Recommendation 2413 H.245, so as to facilitate signaling gateway designs. The 2414 decoded picture buffer stores reconstructed samples. There 2415 is no relationship between the size of the decoded picture 2416 buffer and the buffers used in RTP, especially de- 2417 packetization and de-jitter buffers.
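The MaxDpbSize derivation given above for max-dpb can be written as a small function. This is a sketch only: the function name is hypothetical, the division in the third branch is assumed to be integer division with truncation, and the caller is expected to supply the MaxLumaPS value from Table A-1 of [HEVC] for the highest level.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures the receiver must be able to store.

    max_dpb defaults to 6, the inferred value when the parameter is
    absent. max_luma_ps is MaxLumaPS for the highest level and must be
    looked up by the caller.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: "/" truncates toward zero for these
        # non-negative operands.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

Smaller pictures thus allow more stored pictures, capped at 16, while pictures close to MaxLumaPS are limited to max-dpb itself.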
2419 max-br: 2420 The value of max-br is an integer indicating the maximum video 2421 bitrate in units of CpbBrVclFactor bits per second for the VCL 2422 HRD parameters and in units of CpbBrNalFactor bits per second 2423 for the NAL HRD parameters, where CpbBrVclFactor and 2424 CpbBrNalFactor are defined in Section A.4 of [HEVC]. 2426 The max-br parameter signals that the video decoder of the 2427 receiver is capable of decoding video at a higher bitrate than 2428 is required by the highest level. 2430 When max-br is signaled, the video codec of the receiver MUST 2431 be able to decode bitstreams that conform to the highest 2432 level, with the following exceptions in the limits specified 2433 by the highest level: 2435 o The value of max-br replaces the MaxBR value in Table A-2 2436 of [HEVC] for the highest level. 2437 o When the max-cpb parameter is not present, the result of 2438 the following formula replaces the value of MaxCPB in Table 2439 A-1 of [HEVC]: 2441 (MaxCPB of the highest level) * max-br / (MaxBR of the 2442 highest level) 2444 For example, if a receiver signals capability for Main profile 2445 Level 2 with max-br equal to 2000, this indicates a maximum 2446 video bitrate of 2000 kbits/sec for VCL HRD parameters, a 2447 maximum video bitrate of 2200 kbits/sec for NAL HRD 2448 parameters, and a CPB size of 2000000 bits (2000000 / 1500000 2449 * 1500000). 2451 Senders MAY use this knowledge to send higher bitrate video as 2452 allowed in the level definition of Annex A of HEVC to achieve 2453 improved video quality. 2455 When not present, the value of max-br is inferred to be equal 2456 to the value of MaxBR given in Table A-2 of [HEVC] for the 2457 highest level. 2459 The value of max-br MUST be in the range of MaxBR to 2460 16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of 2461 [HEVC] for the highest level. 
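The max-br arithmetic above, including the replacement CPB size used when max-cpb is absent, can be sketched as follows. The factor values 1000 and 1100 are assumptions inferred from the Main profile example in the text (2000 and 2200 kbits/sec for max-br equal to 2000), the level limits must be supplied by the caller from the tables in [HEVC], the function name is hypothetical, and rounding the derived CPB size down is also an assumption, since the formula does not specify rounding.

```python
# Assumed factors, consistent with the Main profile example above.
CPB_BR_VCL_FACTOR = 1000
CPB_BR_NAL_FACTOR = 1100


def max_br_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Derive bitrate and CPB limits when max-br is signaled.

    level_max_br and level_max_cpb are MaxBR and MaxCPB for the
    highest level, as looked up by the caller.
    """
    if not level_max_br <= max_br <= 16 * level_max_br:
        raise ValueError("max-br must be in MaxBR..16*MaxBR")
    if max_cpb is None:
        # (MaxCPB of the highest level) * max-br / (MaxBR of the
        # highest level); rounding down is an assumption.
        cpb = level_max_cpb * max_br // level_max_br
    else:
        cpb = max_cpb
    return {
        "vcl_bitrate_bps": max_br * CPB_BR_VCL_FACTOR,
        "nal_bitrate_bps": max_br * CPB_BR_NAL_FACTOR,
        "vcl_cpb_bits": cpb * CPB_BR_VCL_FACTOR,
        "nal_cpb_bits": cpb * CPB_BR_NAL_FACTOR,
    }
```

With max-br equal to 2000 and level limits of 1500 for both MaxBR and MaxCPB, as in the Level 2 example, this reproduces the VCL bitrate of 2000 kbits/sec, the NAL bitrate of 2200 kbits/sec, and the CPB size of 2000000 bits.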
2463 Informative note: This parameter was added primarily to 2464 complement a similar codepoint in the ITU-T Recommendation 2465 H.245, so as to facilitate signaling gateway designs. The 2466 assumption that the network is capable of handling such 2467 bitrates at any given time cannot be made from the value of 2468 this parameter. In particular, no conclusion can be drawn 2469 that the signaled bitrate is possible under congestion 2470 control constraints. 2472 max-tr: 2473 The value of max-tr is an integer indicating the maximum 2474 number of tile rows. The max-tr parameter signals that the 2475 receiver is capable of decoding video with a larger number of 2476 tile rows than the value allowed by the highest level. 2478 When max-tr is signaled, the receiver MUST be able to decode 2479 bitstreams that conform to the highest level, with the 2480 exception that the MaxTileRows value in Table A-1 of [HEVC] 2481 for the highest level is replaced with the value of max-tr. 2483 Senders MAY use this knowledge to send pictures utilizing a 2484 larger number of tile rows than the value allowed by the 2485 highest level. 2487 When not present, the value of max-tr is inferred to be equal 2488 to the value of MaxTileRows given in Table A-1 of [HEVC] for 2489 the highest level. 2491 The value of max-tr MUST be in the range of MaxTileRows to 2492 16 * MaxTileRows, inclusive, where MaxTileRows is given in 2493 Table A-1 of [HEVC] for the highest level. 2495 max-tc: 2496 The value of max-tc is an integer indicating the maximum 2497 number of tile columns. The max-tc parameter signals that the 2498 receiver is capable of decoding video with a larger number of 2499 tile columns than the value allowed by the highest level. 2501 When max-tc is signaled, the receiver MUST be able to decode 2502 bitstreams that conform to the highest level, with the 2503 exception that the MaxTileCols value in Table A-1 of [HEVC] 2504 for the highest level is replaced with the value of max-tc.
2506 Senders MAY use this knowledge to send pictures utilizing a 2507 larger number of tile columns than the value allowed by the 2508 highest level. 2510 When not present, the value of max-tc is inferred to be equal 2511 to the value of MaxTileCols given in Table A-1 of [HEVC] for 2512 the highest level. 2514 The value of max-tc MUST be in the range of MaxTileCols to 2515 16 * MaxTileCols, inclusive, where MaxTileCols is given in 2516 Table A-1 of [HEVC] for the highest level. 2518 max-fps: 2520 The value of max-fps is an integer indicating the maximum 2521 picture rate in units of pictures per 100 seconds that can be 2522 effectively processed by the receiver. The max-fps parameter 2523 MAY be used to signal that the receiver has a constraint in 2524 that it is not capable of processing video effectively at the 2525 full picture rate that is implied by the highest level and, 2526 when present, one or more of the parameters max-lsr, max-lps, 2527 and max-br. 2529 The value of max-fps is not necessarily the picture rate at 2530 which the maximum picture size can be sent; it constitutes a 2531 constraint on maximum picture rate for all resolutions. 2533 Informative note: The max-fps parameter is semantically 2534 different from max-lsr, max-lps, max-cpb, max-dpb, max-br, 2535 max-tr, and max-tc in that max-fps is used to signal a 2536 constraint, lowering the maximum picture rate from what is 2537 implied by other parameters. 2539 The encoder MUST use a picture rate equal to or less than this 2540 value. In cases where the max-fps parameter is absent, the 2541 encoder is free to choose any picture rate according to the 2542 highest level and any signaled optional parameters. 2544 The value of max-fps MUST be smaller than or equal to the full 2545 picture rate that is implied by the highest level and, when 2546 present, one or more of the parameters max-lsr, max-lps, and 2547 max-br.
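Because max-fps is expressed in pictures per 100 seconds, a value of 3000 corresponds to 30 pictures per second. The following sketch shows the unit conversion and the resulting encoder-side check. The function names are hypothetical, and exact rational arithmetic is used so that rates such as 29.97 compare without floating-point error. The limits implied by the highest level and the other optional parameters are not modeled here.

```python
from fractions import Fraction


def max_picture_rate(max_fps):
    """Convert max-fps (pictures per 100 seconds) to pictures/second."""
    return Fraction(max_fps, 100)


def encoder_rate_allowed(rate, max_fps=None):
    """Check a candidate encoder picture rate against max-fps only.

    rate may be an int, a Fraction, or a decimal string such as
    "29.97". When max-fps is absent, this check imposes no constraint.
    """
    if max_fps is None:
        return True
    return Fraction(rate) <= max_picture_rate(max_fps)
```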
2549 sprop-max-don-diff: 2551 The value of this parameter MUST be equal to 0, if the RTP 2552 stream does not depend on other RTP streams and there is no 2553 NAL unit naluA that is followed in transmission order by any 2554 NAL unit preceding naluA in decoding order. Otherwise, this 2555 parameter specifies the maximum absolute difference between 2556 the decoding order number (i.e., AbsDon) values of any two NAL 2557 units naluA and naluB, where naluA follows naluB in decoding 2558 order and precedes naluB in transmission order. 2560 The value of sprop-max-don-diff MUST be an integer in the 2561 range of 0 to 32767, inclusive. 2563 When not present, the value of sprop-max-don-diff is inferred 2564 to be equal to 0. 2566 When the RTP stream depends on one or more other RTP streams 2567 (in this case tx-mode MUST be equal to "MSM" and MSM is in 2568 use), this parameter MUST be present and the value MUST be 2569 greater than 0. 2571 Informative note: When the RTP stream does not depend on 2572 other RTP streams, either MSM or SSM may be in use. 2574 sprop-depack-buf-nalus: 2576 This parameter specifies the maximum number of NAL units that 2577 precede a NAL unit in transmission order and follow the NAL 2578 unit in decoding order. 2580 The value of sprop-depack-buf-nalus MUST be an integer in the 2581 range of 0 to 32767, inclusive. 2583 When not present, the value of sprop-depack-buf-nalus is 2584 inferred to be equal to 0. 2586 When the RTP stream depends on one or more other RTP streams 2587 (in this case tx-mode MUST be equal to "MSM" and MSM is in 2588 use), this parameter MUST be present and the value MUST be 2589 greater than 0. 2591 sprop-depack-buf-bytes: 2593 This parameter signals the required size of the de- 2594 packetization buffer in units of bytes. The value of the 2595 parameter MUST be greater than or equal to the maximum buffer 2596 occupancy (in units of bytes) of the de-packetization buffer 2597 as specified in section 6. 
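The definitions of sprop-max-don-diff and sprop-depack-buf-nalus given above can be illustrated by computing both values from a sequence of NAL units. This is a brute-force sketch with a hypothetical function name. It assumes that each NAL unit is represented by its AbsDon value, that the list is in transmission order, and that decoding order is increasing AbsDon with no repeated values.

```python
def reorder_metrics(abs_don_in_tx_order):
    """Return (max_don_diff, depack_buf_nalus) for one RTP stream.

    max_don_diff: largest AbsDon difference between any naluA and
    naluB where naluA follows naluB in decoding order and precedes it
    in transmission order (0 when nothing is reordered).
    depack_buf_nalus: largest number of NAL units that precede some
    NAL unit in transmission order and follow it in decoding order.
    """
    max_don_diff = 0
    depack_buf_nalus = 0
    for i, don_b in enumerate(abs_don_in_tx_order):
        # NAL units sent earlier that come later in decoding order.
        sent_early = [a for a in abs_don_in_tx_order[:i] if a > don_b]
        depack_buf_nalus = max(depack_buf_nalus, len(sent_early))
        for don_a in sent_early:
            max_don_diff = max(max_don_diff, don_a - don_b)
    return max_don_diff, depack_buf_nalus
```

For a stream sent in decoding order, both values are 0, which matches the inferred defaults for a stream that does not depend on other RTP streams.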
2599 The value of sprop-depack-buf-bytes MUST be an integer in the 2600 range of 0 to 4294967295, inclusive. 2602 When the RTP stream depends on one or more other RTP streams 2603 (in this case tx-mode MUST be equal to "MSM" and MSM is in 2604 use) or sprop-max-don-diff is present and greater than 0, this 2605 parameter MUST be present and the value MUST be greater than 2606 0. 2608 Informative note: The value of sprop-depack-buf-bytes 2609 indicates the required size of the de-packetization buffer 2610 only. When network jitter can occur, an appropriately 2611 sized jitter buffer has to be available as well. 2613 depack-buf-cap: 2615 This parameter signals the capabilities of a receiver 2616 implementation and indicates the amount of de-packetization 2617 buffer space in units of bytes that the receiver has available 2618 for reconstructing the NAL unit decoding order from NAL units 2619 carried in one or more RTP streams. A receiver is able to 2620 handle any RTP stream, and all RTP streams the RTP stream 2621 depends on, when present, for which the value of the sprop- 2622 depack-buf-bytes parameter is smaller than or equal to this 2623 parameter. 2625 When not present, the value of depack-buf-cap is inferred to 2626 be equal to 4294967295. The value of depack-buf-cap MUST be 2627 an integer in the range of 1 to 4294967295, inclusive. 2629 Informative note: depack-buf-cap indicates the maximum 2630 possible size of the de-packetization buffer of the 2631 receiver only. When network jitter can occur, an 2632 appropriately sized jitter buffer has to be available as 2633 well. 2635 sprop-segmentation-id: 2637 This parameter MAY be used to signal the segmentation tools 2638 present in the bitstream and that can be used for 2639 parallelization. The value of sprop-segmentation-id MUST be 2640 an integer in the range of 0 to 3, inclusive. When not 2641 present, the value of sprop-segmentation-id is inferred to be 2642 equal to 0. 
2644 When sprop-segmentation-id is equal to 0, no information about 2645 the segmentation tools is provided. When sprop-segmentation- 2646 id is equal to 1, it indicates that slices are present in the 2647 bitstream. When sprop-segmentation-id is equal to 2, it 2648 indicates that tiles are present in the bitstream. When 2649 sprop-segmentation-id is equal to 3, it indicates that WPP is 2650 used in the bitstream. 2652 sprop-spatial-segmentation-idc: 2654 A base16 [RFC4648] representation of the syntax element 2655 min_spatial_segmentation_idc as specified in [HEVC]. This 2656 parameter MAY be used to describe parallelization capabilities 2657 of the bitstream. 2659 dec-parallel-cap: 2661 This parameter MAY be used to indicate the decoder's 2662 additional decoding capabilities given the presence of tools 2663 enabling parallel decoding, such as slices, tiles, and WPP, in 2664 the bitstream. The decoding capability of the decoder may 2665 vary with the setting of the parallel decoding tools present 2666 in the bitstream, e.g. the size of the tiles that are present 2667 in a bitstream. Therefore, multiple capability points may be 2668 provided, each indicating the minimum required decoding 2669 capability that is associated with a parallelism requirement, 2670 which is a requirement on the bitstream that enables parallel 2671 decoding. 2673 Each capability point is defined as a combination of 1) a 2674 parallelism requirement, 2) a profile (determined by profile- 2675 space and profile-id), 3) a highest level, and 4) a maximum 2676 processing rate, a maximum picture size, and a maximum video 2677 bitrate that may be equal to or greater than that determined 2678 by the highest level. 
The parameter's syntax in ABNF 2679 [RFC5234] is as follows: 2681 dec-parallel-cap = "dec-parallel-cap={" cap-point *("," 2682 cap-point) "}" 2684 cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" 2685 cap-parameter) 2687 spatial-seg-idc = 1*4DIGIT ; (1-4095) 2688 cap-parameter = tier-flag / level-id / max-lsr 2689 / max-lps / max-br 2691 tier-flag = "tier-flag" EQ ("0" / "1") 2693 level-id = "level-id" EQ 1*3DIGIT ; (0-255) 2695 max-lsr = "max-lsr" EQ 1*20DIGIT ; (0- 2696 18,446,744,073,709,551,615) 2698 max-lps = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295) 2700 max-br = "max-br" EQ 1*20DIGIT ; (0- 2701 18,446,744,073,709,551,615) 2703 EQ = "=" 2705 The set of capability points expressed by the dec-parallel-cap 2706 parameter is enclosed in a pair of curly braces ("{}"). Each 2707 set of two consecutive capability points is separated by a 2708 comma (','). Within each capability point, each set of two 2709 consecutive parameters, and when present, their values, is 2710 separated by a semicolon (';'). 2712 The profile of all capability points is determined by profile- 2713 space and profile-id that are outside the dec-parallel-cap 2714 parameter. 2716 Each capability point starts with an indication of the 2717 parallelism requirement, which consists of a parallel tool 2718 type, which may be equal to 'w' or 't', and a decimal value of 2719 the spatial-seg-idc parameter. When the type is 'w', the 2720 capability point is valid only for H.265 bitstreams with WPP 2721 in use, i.e. entropy_coding_sync_enabled_flag equal to 1. 2722 When the type is 't', the capability point is valid only for 2723 H.265 bitstreams with WPP not in use (i.e. 2724 entropy_coding_sync_enabled_flag equal to 0). The capability- 2725 point is valid only for H.265 bitstreams with 2726 min_spatial_segmentation_idc equal to or greater than spatial- 2727 seg-idc. 
2729           After the parallelism requirement indication, each capability
2730           point continues with one or more pairs of parameter and value
2731           in any order for any of the following parameters:

2733              o tier-flag
2734              o level-id
2735              o max-lsr
2736              o max-lps
2737              o max-br

2739           At most one occurrence of each of the above five parameters is
2740           allowed within each capability point.

2742           The values of dec-parallel-cap.tier-flag and
2743           dec-parallel-cap.level-id for a capability point indicate the
2744           highest level of the capability point. The values of
2745           dec-parallel-cap.max-lsr, dec-parallel-cap.max-lps, and
2746           dec-parallel-cap.max-br for a capability point indicate the
2747           maximum processing rate in units of luma samples per second,
2748           the maximum picture size in units of luma samples, and the
2749           maximum video bitrate (in units of CpbBrVclFactor bits per
2750           second for the VCL HRD parameters and in units of CpbBrNalFactor
2751           bits per second for the NAL HRD parameters, where CpbBrVclFactor
2752           and CpbBrNalFactor are defined in Section A.4 of [HEVC]).

2754           When not present, the value of dec-parallel-cap.tier-flag is
2755           inferred to be equal to the value of tier-flag outside the
2756           dec-parallel-cap parameter. When not present, the value of
2757           dec-parallel-cap.level-id is inferred to be equal to the value
2758           of max-recv-level-id outside the dec-parallel-cap parameter.
2759           When not present, the value of dec-parallel-cap.max-lsr,
2760           dec-parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred
2761           to be equal to the value of max-lsr, max-lps, or max-br,
2762           respectively, outside the dec-parallel-cap parameter.
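The uniqueness rule and the inference rules above can be sketched as follows. This Python fragment is illustrative only: the function name resolve_cap_point and the dictionary-based representation are assumptions made for this example. It assumes the parameters outside dec-parallel-cap have already been parsed into a dictionary of integers; a parameter that is absent both inside and outside the capability point is returned as None, and any further defaulting of the outer parameters is outside the scope of the sketch.

```python
# Parameter inside a capability point -> parameter outside dec-parallel-cap
# whose value is inferred when the inner one is absent.
_FALLBACK = {
    "tier-flag": "tier-flag",
    "level-id": "max-recv-level-id",
    "max-lsr": "max-lsr",
    "max-lps": "max-lps",
    "max-br": "max-br",
}


def resolve_cap_point(cap_params, outer):
    """Apply the 'at most once' rule and fill in inferred values.

    cap_params: list of (name, value) pairs from one capability point.
    outer: dict of parameters signalled outside dec-parallel-cap.
    Both argument shapes are hypothetical choices for this example.
    """
    seen = {}
    for name, value in cap_params:
        if name not in _FALLBACK:
            raise ValueError(f"unknown cap-parameter: {name!r}")
        if name in seen:
            raise ValueError(f"{name} occurs more than once in a capability point")
        seen[name] = value
    # An explicitly signalled value (including 0) always wins over the outer one.
    return {
        name: seen.get(name, outer.get(outer_name))
        for name, outer_name in _FALLBACK.items()
    }
```

For a capability point that carries only level-id=120, with tier-flag=0 and max-recv-level-id=93 signalled outside dec-parallel-cap, the resolved point keeps level-id 120, inherits tier-flag 0, and leaves max-lsr, max-lps, and max-br as None.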
2764           The general decoding capability, expressed by the set of
2765           parameters outside of dec-parallel-cap, is defined as the
2766           capability point that is determined by the following
2767           combination of parameters: 1) the parallelism requirement
2768           corresponding to the value of sprop-segmentation-id equal to 0
2769           for a bitstream, 2) the profile determined by profile-space,
2770           profile-id, profile-compatibility-indicator, and
2771           interop-constraints, 3) the tier and the highest level
2772           determined by tier-flag and max-recv-level-id, and 4) the
2773           maximum processing rate, the maximum picture size, and the
2774           maximum video bitrate determined by the highest level. The
2775           general decoding capability MUST NOT be included as one of the
2776           set of capability points in the dec-parallel-cap parameter.

2778           For example, the following parameters express the general
2779           decoding capability of 720p30 (Level 3.1) plus an additional
2780           decoding capability of 1080p30 (Level 4) given that the
2781           spatially largest tile or slice used in the bitstream is equal
2782           to or less than 1/3 of the picture size:

2784              a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

2786           As another example, the following parameters express an
2787           additional decoding capability of 1080p30, using
2788           dec-parallel-cap.max-lsr and dec-parallel-cap.max-lps, given
2789           that WPP is used in the bitstream:

2791              a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2792                          max-lsr=62668800;max-lps=2088960}

2794              Informative note: When min_spatial_segmentation_idc is
2795              present in a bitstream and WPP is not used, [HEVC]
2796              specifies that there is no slice or no tile in the
2797              bitstream containing more than 4 * PicSizeInSamplesY /
2798              ( min_spatial_segmentation_idc + 4 ) luma samples.

2800        include-dph:

2802           This parameter is used to indicate the capability and
2803           preference to utilize or include decoded picture hash (DPH)
2804           SEI messages (see Section D.3.19 of [HEVC]) in the bitstream.
2805           DPH SEI messages can be used to detect picture corruption so
2806           the receiver can request picture repair (see Section 8). The
2807           value is a comma-separated list of hash types that are
2808           supported or requested to be used, each hash type provided as
2809           an unsigned integer value (0-255), with the hash types listed
2810           from most preferred to least preferred. Example:

2812           "include-dph=0,2", which indicates the capability for MD5
2813           (most preferred) and Checksum (less preferred). If the
2814           parameter is not included or the value contains no hash types,
2815           then no capability to utilize DPH SEI messages is assumed.
2816           Note that DPH SEI messages MAY still be included in the
2817           bitstream even when there is no declaration of capability to
2818           use them, as in general SEI messages do not affect the
2819           normative decoding process and decoders are allowed to ignore
2820           SEI messages.

2822     Encoding considerations:

2824        This type is only defined for transfer via RTP (RFC 3550).

2826     Security considerations:

2828        See Section 9 of RFC XXXX.

2830     Public specification:

2832        Please refer to Section 13 of RFC XXXX.

2834     Additional information: None

2836     File extensions: none

2838     Macintosh file type code: none

2840     Object identifier or OID: none

2842     Person & email address to contact for further information:

2844     Intended usage: COMMON

2846     Author: See Section 14 of RFC XXXX.

2848     Change controller:

2850        IETF Audio/Video Transport Payloads working group delegated
2851        from the IESG.

2853  7.2 SDP Parameters

2855     The receiver MUST ignore any parameter unspecified in this memo.

2857  7.2.1 Mapping of Payload Type Parameters to SDP

2859     The media type video/H265 string is mapped to fields in the Session
2860     Description Protocol (SDP) [RFC4566] as follows:

2862     o  The media name in the "m=" line of SDP MUST be video.

2864     o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
2865        media subtype).
2867     o  The clock rate in the "a=rtpmap" line MUST be 90000.

2869     o  The OPTIONAL parameters "profile-space", "profile-id",
2870        "tier-flag", "level-id", "interop-constraints",
2871        "profile-compatibility-indicator", "sprop-sub-layer-id",
2872        "recv-sub-layer-id", "max-recv-level-id", "tx-mode", "max-lsr",
2873        "max-lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
2874        "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus",
2875        "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-segmentation-id",
2876        "sprop-spatial-segmentation-idc", "dec-parallel-cap", and
2877        "include-dph", when present, MUST be included in the "a=fmtp"
2878        line of SDP. These parameters are expressed as a media type string,
2879        in the form of a semicolon-separated list of parameter=value pairs.

2881     o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-pps",
2882        when present, MUST be included in the "a=fmtp" line of SDP
2883        or conveyed using the "fmtp" source attribute as specified in
2884        Section 6.3 of [RFC5576]. For a particular media format (i.e.
2885        RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST
2886        NOT be both included in the "a=fmtp" line of SDP and conveyed
2887        using the "fmtp" source attribute. When included in the "a=fmtp"
2888        line of SDP, these parameters are expressed as a media type
2889        string, in the form of a semicolon-separated list of
2890        parameter=value pairs. When conveyed in the "a=fmtp" line of SDP
2891        for a particular payload type, the parameters "sprop-vps",
2892        "sprop-sps", and "sprop-pps" MUST be applied to each SSRC with
2893        the payload type. When conveyed using the "fmtp" source
2894        attribute, these parameters are only associated with the given
2895        source and payload type as parts of the "fmtp" source attribute.
2897           Informative note: Conveyance of "sprop-vps", "sprop-sps", and
2898           "sprop-pps" using the "fmtp" source attribute allows for
2899           out-of-band transport of parameter sets in topologies like
2900           Topo-Video-switch-MCU as specified in [RFC5117].

2902     An example of media representation in SDP is as follows:

2904        m=video 49170 RTP/AVP 98
2905        a=rtpmap:98 H265/90000
2906        a=fmtp:98 profile-id=1;
2907                  sprop-vps=