Network Working Group                                         Y.-K. Wang
Internet Draft                                                  Qualcomm
Intended status: Standards track                              Y. Sanchez
Expires: February 2015                                        T. Schierl
                                                          Fraunhofer HHI
                                                               S. Wenger
                                                                   Vidyo
                                                        M. M. Hannuksela
                                                                   Nokia
                                                          August 5, 2014

          RTP Payload Format for High Efficiency Video Coding
                  draft-ietf-payload-rtp-h265-05.txt

Abstract

   This memo describes an RTP payload format for the video coding standard ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, both also known as High Efficiency Video Coding (HEVC) [HEVC] and developed by the Joint Collaborative Team on Video Coding (JCT-VC). The RTP payload format allows for packetization of one or more Network Abstraction Layer (NAL) units in each RTP packet payload, as well as fragmentation of a NAL unit into multiple RTP packets. Furthermore, it supports transmission of an HEVC bitstream over a single as well as multiple RTP streams. The payload format has wide applicability in videoconferencing, Internet video streaming, and high bit-rate entertainment-quality video, among others.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 5, 2015.

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   Abstract
   Status of this Memo
   Table of Contents
   1 Introduction
      1.1 Overview of the HEVC Codec
         1.1.1 Coding-Tool Features
         1.1.2 Systems and Transport Interfaces
         1.1.3 Parallel Processing Support
         1.1.4 NAL Unit Header
      1.2 Overview of the Payload Format
   2 Conventions
   3 Definitions and Abbreviations
      3.1 Definitions
         3.1.1 Definitions from the HEVC Specification
         3.1.2 Definitions Specific to This Memo
      3.2 Abbreviations
   4 RTP Payload Format
      4.1 RTP Header Usage
      4.2 Payload Header Usage
      4.3 Payload Structures
      4.4 Transmission Modes
      4.5 Decoding Order Number
      4.6 Single NAL Unit Packets
      4.7 Aggregation Packets (APs)
      4.8 Fragmentation Units (FUs)
      4.9 PACI packets
         4.9.1 Reasons for the PACI rules (informative)
         4.9.2 PACI extensions (Informative)
      4.10 Temporal Scalability Control Information
   5 Packetization Rules
   6 De-packetization Process
   7 Payload Format Parameters
      7.1 Media Type Registration
      7.2 SDP Parameters
         7.2.1 Mapping of Payload Type Parameters to SDP
         7.2.2 Usage with SDP Offer/Answer Model
         7.2.3 Usage in Declarative Session Descriptions
         7.2.4 Parameter Sets Considerations
         7.2.5 Dependency Signaling in Multi-Stream Mode
   8 Use with Feedback Messages
      8.1 Picture Loss Indication (PLI)
      8.2 Slice Loss Indication
      8.3 Use of HEVC with the RPSI Feedback Message
      8.4 Full Intra Request (FIR)
   9 Security Considerations
   10 Congestion Control
   11 IANA Consideration
   12 Acknowledgements
   13 References
      13.1 Normative References
      13.2 Informative References
   14 Authors' Addresses

1 Introduction

1.1 Overview of the HEVC Codec

   High Efficiency Video Coding [HEVC], formally known as ITU-T Recommendation H.265 and ISO/IEC International Standard 23008-2, was ratified by ITU-T in April 2013 and reportedly provides significant coding efficiency gains over H.264 [H.264].

   As both H.264 [H.264] and its RTP payload format [RFC6184] are widely deployed and generally known in the relevant implementer communities, frequently only the differences between those two specifications are highlighted in non-normative, explanatory parts of this memo. Basic familiarity with both specifications is assumed for those parts. However, the normative parts of this memo do not require study of H.264 or its RTP payload format.

   H.264 and HEVC share a similar hybrid video codec design. Conceptually, both technologies include a video coding layer (VCL), which is often used to refer to the coding-tool features, and a network abstraction layer (NAL), which is often used to refer to the systems and transport interface aspects of the codecs.

1.1.1 Coding-Tool Features

   Similarly to earlier hybrid-video-coding-based standards, including H.264, the following basic video coding design is employed by HEVC. A prediction signal is first formed by either intra or motion-compensated prediction, and the residual (the difference between the original and the prediction) is then coded. The gains in coding efficiency are achieved by redesigning and improving almost all parts of the codec over earlier designs. In addition, HEVC includes several tools to make implementation on parallel architectures easier. Below is a summary of HEVC coding-tool features.

   Quad-tree block and transform structure

   One of the major tools that contribute significantly to the coding efficiency of HEVC is the usage of flexible coding blocks and transforms, which are defined in a hierarchical quad-tree manner. Unlike H.264, where the basic coding block is a macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size of 64x64.
   Each CTU can be divided into smaller units in a hierarchical quad-tree manner and can represent smaller blocks down to size 4x4. Similarly, the transforms used in HEVC can have different sizes, starting from 4x4 and going up to 32x32. The use of large blocks and transforms contributes substantially to the coding gain of HEVC, especially at high resolutions.

   Entropy coding

   HEVC uses a single entropy coding engine, which is based on Context Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two distinct entropy coding engines. CABAC in HEVC shares many similarities with CABAC of H.264, but contains several improvements. These include improvements in coding efficiency and lowered implementation complexity, especially for parallel architectures.

   In-loop filtering

   H.264 includes an in-loop adaptive deblocking filter, where the blocking artifacts around the transform edges in the reconstructed picture are smoothed to improve the picture quality and compression efficiency. In HEVC, a similar deblocking filter is employed but with somewhat lower complexity. In addition, pictures undergo a subsequent filtering operation called Sample Adaptive Offset (SAO), which is a new design element in HEVC. SAO adds a pixel-level offset in an adaptive manner and usually acts as a de-ringing filter. SAO has been observed to improve picture quality, especially around sharp edges, contributing substantially to the visual quality improvements of HEVC.

   Motion prediction and coding

   There have been a number of improvements in this area, summarized as follows. The first category is motion merge and advanced motion vector prediction (AMVP) modes. The motion information of a prediction block can be inferred from spatially or temporally neighboring blocks.
   This is similar to the DIRECT mode in H.264 but includes new aspects to incorporate the flexible quad-tree structure and methods to improve parallel implementations. In addition, the motion vector predictor can be signaled for improved efficiency. The second category is high-precision interpolation. The luma interpolation filter length is increased from 6 taps to 8 taps, which improves the coding efficiency but also comes with increased complexity. In addition, the interpolation filter is defined with higher precision, without any intermediate rounding operations, to further improve the coding efficiency.

   Intra prediction and intra coding

   Compared to the eight directional intra prediction modes in H.264, HEVC supports angular intra prediction with 33 directions. This increased flexibility improves both objective coding efficiency and visual quality, as edges can be better predicted and ringing artifacts around edges can be reduced. In addition, the reference samples are adaptively smoothed based on the prediction direction. To avoid contouring artifacts, a new interpolative prediction generation is included to improve the visual quality. Furthermore, the discrete sine transform (DST) is utilized instead of the traditional discrete cosine transform (DCT) for 4x4 luma intra transform blocks.

   Other coding-tool features

   HEVC includes some tools for lossless coding and efficient screen content coding, such as skipping the transform for certain blocks. These tools are particularly useful, for example, when streaming the user interface of a mobile device to a large display.
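
   To make the 6-tap versus 8-tap comparison under "Motion prediction and coding" concrete, the following Python sketch computes one half-sample luma value with the H.264 and the HEVC filter. It is an illustrative simplification, not the normative process of either standard: it assumes one-dimensional interpolation, 8-bit samples, uni-prediction without weighted prediction, and edge samples replicated at picture borders. The tap values are the half-sample luma filters of the two standards; the names half_sample, H264_HALF, and HEVC_HALF are hypothetical.

```python
# Half-sample luma interpolation filters (integer taps).
# Assumptions: 1-D only, 8-bit samples, uni-prediction, no weighting.
H264_HALF = (1, -5, 20, 20, -5, 1)            # 6 taps, sum 32
HEVC_HALF = (-1, 4, -11, 40, 40, -11, 4, -1)  # 8 taps, sum 64


def half_sample(row, i, taps):
    """Return the interpolated sample halfway between row[i] and row[i + 1].

    The filter reads len(taps) // 2 integer samples on each side of the
    half-sample position. Positions outside the row reuse the nearest
    edge sample. The result is rounded and clipped to the 8-bit range.
    """
    first = i - (len(taps) // 2 - 1)  # leftmost integer sample read
    acc = 0
    for k, tap in enumerate(taps):
        j = min(max(first + k, 0), len(row) - 1)  # replicate edge samples
        acc += tap * row[j]
    norm = sum(taps)                   # 32 for H.264, 64 for HEVC
    value = (acc + norm // 2) // norm  # round half up
    return min(max(value, 0), 255)     # clip to 0..255


if __name__ == "__main__":
    edge = [0, 0, 0, 0, 100, 100, 100, 100]
    print(half_sample(edge, 3, H264_HALF), half_sample(edge, 3, HEVC_HALF))
```

   For the step edge in the example, both filters return the midpoint, so the script prints "50 50"; the HEVC filter, however, reads eight integer samples per output sample instead of six, which is the complexity increase mentioned above. For positions that are fractional in both directions, HEVC keeps the intermediate result at higher precision before the final rounding, which this sketch does not model.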

1.1.2 Systems and Transport Interfaces

   HEVC inherited the basic systems and transport interface designs of H.264, such as the NAL-unit-based syntax structure; the hierarchical syntax and data unit structure, from sequence-level parameter sets, multi-picture-level or picture-level parameter sets, and slice-level header parameters down to lower-level parameters; the supplemental enhancement information (SEI) message mechanism; the hypothetical reference decoder (HRD)-based video buffering model; and so on. The differences in these aspects compared to H.264 are summarized in the following.

   Video parameter set

   A new type of parameter set, called the video parameter set (VPS), was introduced. For the first (2013) version of [HEVC], the video parameter set NAL unit is required to be available prior to its activation, while the information contained in the video parameter set is not necessary for operation of the decoding process. For future HEVC extensions, such as the 3D or scalable extensions, the video parameter set is expected to include information necessary for operation of the decoding process, e.g., decoding dependency or information for reference picture set construction of enhancement layers. The VPS provides a "big picture" of a bitstream, including what types of operation points are provided, the profile, tier, and level of the operation points, and some other high-level properties of the bitstream that can be used as the basis for session negotiation, content selection, etc. (see Section 7.1).

   Profile, tier and level

   The profile, tier and level syntax structure, which can be included in both the VPS and the sequence parameter set (SPS), includes 12 bytes of data to describe the entire bitstream (including all temporally scalable layers, which are referred to as sub-layers in the HEVC specification), and can optionally include more profile, tier and level information pertaining to individual temporally scalable layers. The profile indicator indicates the "best viewed as" profile when the bitstream conforms to multiple profiles, similar to the major brand concept in the ISO base media file format (ISOBMFF) [ISOBMFF] and file formats derived from ISOBMFF, such as the 3GPP file format [3GP]. The profile, tier and level syntax structure also includes indications of whether the bitstream is free of frame-packed content and whether the bitstream is free of interlaced source content and of field pictures, i.e., contains only frame pictures of progressive source, such that clients/players with no support of post-processing functionalities for handling frame-packed or interlaced source content or field pictures can reject those bitstreams.

   Bitstream and elementary stream

   HEVC includes a definition of an elementary stream, which is new compared to H.264. An elementary stream consists of a sequence of one or more bitstreams. An elementary stream that consists of two or more bitstreams has typically been formed by splicing together two or more bitstreams (or parts thereof). When an elementary stream contains more than one bitstream, the last NAL unit of the last access unit of each bitstream (except the last bitstream in the elementary stream) must be an end of bitstream NAL unit, and the first access unit of the subsequent bitstream must be an intra random access point (IRAP) access unit.
   This IRAP access unit may be a clean random access (CRA), broken link access (BLA), or instantaneous decoding refresh (IDR) access unit.

   Random access support

   HEVC includes signaling in the NAL unit header, through NAL unit types, of IRAP pictures beyond IDR pictures. Three types of IRAP pictures, namely IDR, CRA, and BLA pictures, are supported, wherein IDR pictures are conventionally referred to as closed group-of-pictures (closed-GOP) random access points, and CRA and BLA pictures are conventionally referred to as open-GOP random access points. BLA pictures usually originate from splicing two bitstreams, or parts thereof, at a CRA picture, e.g., during stream switching. To enable better systems usage of IRAP pictures, altogether six different NAL unit types are defined to signal the properties of IRAP pictures, which can be used to better match the stream access point (SAP) types as defined in ISOBMFF [ISOBMFF], which are utilized for random access support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH]. Pictures following an IRAP picture in decoding order and preceding the IRAP picture in output order are referred to as leading pictures associated with the IRAP picture. There are two types of leading pictures, namely random access decodable leading (RADL) pictures and random access skipped leading (RASL) pictures. RADL pictures are decodable when decoding starts at the associated IRAP picture, whereas RASL pictures are not decodable in that case and are usually discarded. HEVC provides mechanisms to enable the specification of conformance of bitstreams with RASL pictures being discarded, thus providing a standard-compliant way for systems components to discard RASL pictures when needed.
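
   The following Python sketch illustrates the random access behavior described above. It parses the two-byte NAL unit header (see Figure 1 in Section 1.1.4) and, when decoding starts at the first IRAP picture, skips VCL NAL units preceding that picture and discards RASL NAL units that cannot be decoded. A new IRAP picture is detected from first_slice_segment_in_pic_flag, the first bit after the NAL unit header of a slice segment. It is a simplified illustration, not a conforming decoder front end: it assumes a single-layer bitstream, passes all non-VCL NAL units through, and ignores end of sequence NAL units and externally forced CRA-as-BLA handling. The nal_unit_type values are those of [HEVC]; the function names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple

# nal_unit_type values from [HEVC] (subset used here).
RASL_N, RASL_R = 8, 9
BLA_W_LP, BLA_N_LP = 16, 18   # BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16..18
FIRST_NON_VCL = 32            # 32..63 are non-VCL (VPS_NUT = 32, ...)


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    temporal_id: int          # TemporalId = nuh_temporal_id_plus1 - 1


def parse_nal_header(nal: bytes) -> NalHeader:
    """Split the 16-bit header F(1) | Type(6) | LayerId(6) | TID(3)."""
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its two-byte header")
    word = (nal[0] << 8) | nal[1]
    tid_plus1 = word & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(word >> 15, (word >> 9) & 0x3F,
                     (word >> 3) & 0x3F, tid_plus1 - 1)


def is_irap(nal_unit_type: int) -> bool:
    # BLA (16..18), IDR (19, 20), CRA (21), reserved IRAP types (22, 23).
    return 16 <= nal_unit_type <= 23


def decodable_after_random_access(nals: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the NAL units to decode when decoding starts at the first IRAP.

    VCL NAL units before the first IRAP picture are skipped. RASL NAL units
    are dropped when their associated IRAP picture is the starting picture
    or a BLA picture, because their reference pictures are unavailable.
    """
    started = False
    drop_rasl = False
    for nal in nals:
        nal_type = parse_nal_header(nal).nal_unit_type
        if nal_type >= FIRST_NON_VCL:
            yield nal  # parameter sets, SEI, etc. are passed through
            continue
        if is_irap(nal_type) and len(nal) > 2 and nal[2] & 0x80:
            # first_slice_segment_in_pic_flag == 1: a new IRAP picture starts
            drop_rasl = not started or BLA_W_LP <= nal_type <= BLA_N_LP
            started = True
        if not started or (nal_type in (RASL_N, RASL_R) and drop_rasl):
            continue
        yield nal
```

   Dropping RASL pictures only for the starting IRAP picture and for BLA pictures mirrors the text: RASL pictures may reference pictures that precede the IRAP picture in decoding order, which are unavailable after random access or splicing, whereas RADL pictures do not.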

   Temporal scalability support

   HEVC includes improved support for temporal scalability through the signaling of TemporalId in the NAL unit header, the restriction that pictures of a particular temporal sub-layer cannot be used for inter prediction reference by pictures of a lower temporal sub-layer, the sub-bitstream extraction process, and the requirement that each sub-bitstream extraction output be a conforming bitstream. Media-aware network elements (MANEs) can utilize the TemporalId in the NAL unit header for stream adaptation purposes based on temporal scalability.

   Temporal sub-layer switching support

   HEVC specifies, through NAL unit types present in the NAL unit header, the signaling of temporal sub-layer access (TSA) and stepwise temporal sub-layer access (STSA). A TSA picture and pictures following the TSA picture in decoding order do not use pictures prior to the TSA picture in decoding order with TemporalId greater than or equal to that of the TSA picture for inter prediction reference. A TSA picture enables up-switching, at the TSA picture, to the sub-layer containing the TSA picture or any higher sub-layer, from the immediately lower sub-layer. An STSA picture does not use pictures with the same TemporalId as the STSA picture for inter prediction reference. Pictures following an STSA picture in decoding order with the same TemporalId as the STSA picture do not use pictures prior to the STSA picture in decoding order with the same TemporalId as the STSA picture for inter prediction reference. An STSA picture enables up-switching, at the STSA picture, to the sub-layer containing the STSA picture, from the immediately lower sub-layer.

   Sub-layer reference or non-reference pictures

   The concept and signaling of reference/non-reference pictures in HEVC are different from those in H.264.
   In H.264, if a picture may be used by any other picture for inter prediction reference, it is a reference picture; otherwise, it is a non-reference picture, and this is signaled by two bits in the NAL unit header. In HEVC, a picture is called a reference picture only when it is marked as "used for reference". In addition, the concept of a sub-layer reference picture was introduced. If a picture may be used by another picture with the same TemporalId for inter prediction reference, it is a sub-layer reference picture; otherwise, it is a sub-layer non-reference picture. Whether a picture is a sub-layer reference picture or a sub-layer non-reference picture is signaled through NAL unit type values.

   Extensibility

   Besides the TemporalId in the NAL unit header, HEVC also includes the signaling of a six-bit layer ID in the NAL unit header, which must be equal to 0 for a single-layer bitstream. Extension mechanisms have been included in the VPS, SPS, PPS, SEI NAL units, slice headers, and so on. All these extension mechanisms enable future extensions in a backward-compatible manner, such that bitstreams encoded according to potential future HEVC extensions can be fed to then-legacy decoders (e.g., HEVC version 1 decoders), which can decode and output the base layer bitstream.

   Bitstream extraction

   HEVC includes a bitstream extraction process as an integral part of the overall decoding process, as well as a specification of the use of the bitstream extraction process in the description of bitstream conformance tests, as part of the hypothetical reference decoder (HRD) specification.

   Reference picture management

   The reference picture management of HEVC, including reference picture marking and removal from the decoded picture buffer (DPB) as well as reference picture list construction (RPLC), differs from that of H.264.
   Instead of the sliding window plus adaptive memory management control operation (MMCO)-based reference picture marking mechanism in H.264, HEVC specifies a reference picture set (RPS)-based reference picture management and marking mechanism, and the RPLC is consequently based on the RPS mechanism. A reference picture set is the set of reference pictures associated with a picture, consisting of all reference pictures that are prior to the associated picture in decoding order and that may be used for inter prediction of the associated picture or any picture following the associated picture in decoding order. The reference picture set consists of five lists of reference pictures: RefPicSetStCurrBefore, RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr, and RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter, and RefPicSetLtCurr contain all reference pictures that may be used in inter prediction of the current picture and that may be used in inter prediction of one or more of the pictures following the current picture in decoding order. RefPicSetStFoll and RefPicSetLtFoll consist of all reference pictures that are not used in inter prediction of the current picture but may be used in inter prediction of one or more of the pictures following the current picture in decoding order. RPS provides "intra-coded" signaling of the DPB status, instead of "inter-coded" signaling, mainly for improved error resilience. The RPLC process in HEVC is based on the RPS, by signaling an index to an RPS subset for each reference index; this process is simpler than the RPLC process in H.264.

   Ultra-low delay support

   HEVC specifies a sub-picture-level HRD operation, for support of so-called ultra-low delay. The mechanism specifies a standard-compliant way to enable delay reduction below one picture interval.
   Sub-picture-level coded picture buffer (CPB) and DPB parameters may be signaled, and the use of this information for the derivation of CPB timing (wherein the CPB removal time corresponds to decoding time) and DPB output timing (display time) is specified. Decoders are allowed to operate the HRD at the conventional access-unit level, even when the sub-picture-level HRD parameters are present.

   New SEI messages

   HEVC inherits many H.264 SEI messages with changes in syntax and/or semantics making them applicable to HEVC. Additionally, there are a few new SEI messages, reviewed briefly in the following paragraphs.

   The display orientation SEI message informs the decoder of a transformation that is recommended to be applied to the cropped decoded picture prior to display, such that the pictures can be properly displayed, e.g., in an upside-up manner.

   The structure of pictures SEI message provides information on the NAL unit types, picture order count values, and prediction dependencies of a sequence of pictures. The SEI message can be used, for example, to determine what impact a lost picture has on other pictures.

   The decoded picture hash SEI message provides a checksum derived from the sample values of a decoded picture. It can be used for detecting whether a picture was correctly received and decoded.

   The active parameter sets SEI message includes the IDs of the active video parameter set and the active sequence parameter set and can be used to activate VPSs and SPSs.
   In addition, the SEI message includes the following indications: 1) an indication of whether "full random accessibility" is supported (when supported, all parameter sets needed for decoding the remainder of the bitstream, when randomly accessing from the beginning of the current coded video sequence by completely discarding all access units earlier in decoding order, are present in the remaining bitstream, and all coded pictures in the remaining bitstream can be correctly decoded); 2) an indication of whether there is no parameter set within the current coded video sequence that updates another parameter set of the same type preceding in decoding order. An update of a parameter set refers to the use of the same parameter set ID but with some other parameters changed. If this property is true for all coded video sequences in the bitstream, then all parameter sets can be sent out-of-band before session start.

   The decoding unit information SEI message provides coded picture buffer removal delay information for a decoding unit. The message can be used in very-low-delay buffering operations.

   The region refresh information SEI message can be used together with the recovery point SEI message (present in both H.264 and HEVC) for improved support of gradual decoding refresh (GDR). This supports random access from inter-coded pictures, wherein complete pictures can be correctly decoded or recovered after an indicated number of pictures in output/display order.

1.1.3 Parallel Processing Support

   The reportedly significantly higher encoding computational demand of HEVC compared to H.264, in conjunction with the ever-increasing video resolution (both spatial and temporal) required by the market, led to the adoption of VCL coding tools specifically targeted to allow for parallelization on the sub-picture level.
   That is, parallelization occurs, at the minimum, at the granularity of an integer number of CTUs. The targets for this type of high-level parallelization are multicore CPUs and DSPs as well as multiprocessor systems. To be useful in a system design, these tools require signaling support, which is provided in Section 7 of this memo. This section provides a brief overview of the tools available in [HEVC].

   Many of the tools incorporated in HEVC were designed with potential parallel implementations on multi-core/multi-processor architectures in mind. Specifically, for parallelization, four picture partition strategies are available.

   Slices are segments of the bitstream that can be reconstructed independently from other slices within the same picture (though there may still be interdependencies through loop filtering operations). Slices are the only tool that can be used for parallelization that is also available, in virtually identical form, in H.264. Slice-based parallelization does not require much inter-processor or inter-core communication (except for inter-processor or inter-core data sharing for motion compensation when decoding a predictively coded picture, which is typically much heavier than inter-processor or inter-core data sharing due to in-picture prediction), as slices are designed to be independently decodable. However, for the same reason, slices can require some coding overhead. Further, slices (in contrast to some of the other tools mentioned below) also serve as the key mechanism for bitstream partitioning to match Maximum Transfer Unit (MTU) size requirements, due to the in-picture independence of slices and the fact that each regular slice is encapsulated in its own NAL unit. In many cases, the goal of parallelization and the goal of MTU size matching can place contradicting demands on the slice layout in a picture.
The realization of this situation led to the development of the more advanced tools mentioned below.

Dependent slice segments allow for fragmentation of a coded slice into fragments at CTU boundaries without breaking any in-picture prediction mechanism. They are complementary to the fragmentation mechanism described in this memo in that they need the cooperation of the encoder. As a dependent slice segment necessarily contains an integer number of CTUs, a decoder using multiple cores operating on CTUs can process a dependent slice segment without communicating parts of the slice segment's bitstream to other cores. Fragmentation, as specified in this memo, in contrast, does not guarantee that a fragment contains an integer number of CTUs.

In wavefront parallel processing (WPP), the picture is partitioned into rows of CTUs. Entropy decoding and prediction are allowed to use data from CTUs in other partitions. Parallel processing is possible through parallel decoding of CTU rows, where the start of the decoding of a row is delayed by two CTUs, so as to ensure that data related to a CTU above and to the right of the subject CTU is available before the subject CTU is decoded. Using this staggered start (which appears like a wavefront when represented graphically), parallelization is possible with up to as many processors/cores as the picture contains CTU rows.

Because in-picture prediction between neighboring CTU rows within a picture is allowed, the required inter-processor/inter-core communication to enable in-picture prediction can be substantial. The WPP partitioning does not result in the creation of more NAL units compared to when it is not applied; thus, WPP cannot be used for MTU size matching, though slices can be used in combination for that purpose.
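To make the two-CTU staggered start concrete, the following Python sketch computes the earliest step at which each CTU of a picture can start decoding under WPP. It is a simplified model under stated assumptions, not a description of any particular decoder: every CTU is assumed to take one time step, and a CTU is assumed to wait only for its left neighbor and for the CTU above and to the right. The function names wpp_schedule and wpp_stats are hypothetical.

```python
from collections import Counter


def wpp_schedule(rows: int, cols: int) -> dict:
    """Return {(row, col): earliest start step} for a rows x cols CTU grid.

    Hypothetical unit-cost model: each CTU takes one step; CTU (r, c) waits
    for its left neighbor (r, c - 1) and for the above-right CTU
    (r - 1, c + 1), clamped to the last column of the picture.
    """
    start = {}
    for r in range(rows):
        for c in range(cols):
            ready = [0]
            if c > 0:
                ready.append(start[(r, c - 1)] + 1)
            if r > 0:
                ready.append(start[(r - 1, min(c + 1, cols - 1))] + 1)
            start[(r, c)] = max(ready)
    return start


def wpp_stats(rows: int, cols: int) -> tuple:
    """Total steps for the picture and peak number of CTUs started at once."""
    start = wpp_schedule(rows, cols)
    total_steps = max(start.values()) + 1
    peak = max(Counter(start.values()).values())
    return total_steps, peak


if __name__ == "__main__":
    # 3 CTU rows x 5 CTU columns: each row starts two steps after the row above.
    print(wpp_stats(3, 5))  # (9, 3), versus 15 steps for sequential decoding
```

Under this model, row r starts at step 2r, so a picture of R CTU rows and C CTU columns finishes after C + 2(R - 1) steps instead of R * C.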
Tiles define horizontal and vertical boundaries that partition a picture into tile columns and rows. The scan order of CTUs is changed to be local within a tile (in the order of a CTU raster scan of a tile), before decoding the top-left CTU of the next tile in the order of tile raster scan of a picture. Similar to slices, tiles break in-picture prediction dependencies (including entropy decoding dependencies). However, they do not need to be included in individual NAL units (same as WPP in this regard); hence, tiles cannot be used for MTU size matching, though slices can be used in combination for that purpose. Each tile can be processed by one processor/core, and the inter-processor/inter-core communication required for in-picture prediction between processing units decoding neighboring tiles is limited to conveying the shared slice header in cases where a slice spans more than one tile, and to loop-filtering-related sharing of reconstructed samples and metadata. In this respect, tiles are less demanding in terms of inter-processor communication bandwidth compared to WPP, due to the in-picture independence between two neighboring partitions.

1.1.4 NAL Unit Header

HEVC maintains the NAL unit concept of H.264 with modifications. HEVC uses a two-byte NAL unit header, as shown in Figure 1. The payload of a NAL unit refers to the NAL unit excluding the NAL unit header.

      +---------------+---------------+
      |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |F|   Type    |  LayerId  | TID |
      +-------------+-----------------+

          Figure 1 The structure of HEVC NAL unit header

The semantics of the fields in the NAL unit header are as specified in [HEVC] and described briefly below for convenience. In addition to the name and size of each field, the corresponding syntax element name in [HEVC] is also provided.

F: 1 bit
   forbidden_zero_bit.
   Required to be zero in [HEVC]. HEVC declares a value of 1 as a syntax violation. Note that the inclusion of this bit in the NAL unit header is to enable transport of HEVC video over MPEG-2 transport systems (avoidance of start code emulations) [MPEG2S].

Type: 6 bits
   nal_unit_type. This field specifies the NAL unit type as defined in Table 7-1 of [HEVC]. If the most significant bit of this field of a NAL unit is equal to 0 (i.e. the value of this field is less than 32), the NAL unit is a VCL NAL unit. Otherwise, the NAL unit is a non-VCL NAL unit. For a list of all currently defined NAL unit types and their semantics, please refer to Section 7.4.1 in [HEVC].

LayerId: 6 bits
   nuh_layer_id. Required to be equal to zero in [HEVC]. It is anticipated that in future scalable or 3D video coding extensions of this specification, this syntax element will be used to identify additional layers that may be present in the coded video sequence, wherein a layer may be, e.g. a spatial scalable layer, a quality scalable layer, a texture view, or a depth view.

TID: 3 bits
   nuh_temporal_id_plus1. This field specifies the temporal identifier of the NAL unit plus 1. The value of TemporalId is equal to TID minus 1. A TID value of 0 is illegal to ensure that there is at least one bit in the NAL unit header equal to 1, so as to enable independent considerations of start code emulations in the NAL unit header and in the NAL unit payload data.
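As an illustration of the bit layout in Figure 1, the following Python sketch unpacks and packs the two-byte NAL unit header. The class and function names are hypothetical helpers, not APIs defined by [HEVC] or by this memo; the VCL test and the rejection of TID equal to 0 follow the field descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnitHeader:
    """Fields of the two-byte HEVC NAL unit header (Figure 1)."""
    f: int         # forbidden_zero_bit, 1 bit
    nal_type: int  # nal_unit_type, 6 bits
    layer_id: int  # nuh_layer_id, 6 bits
    tid: int       # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is TID minus 1.
        return self.tid - 1

    @property
    def is_vcl(self) -> bool:
        # Most significant bit of Type equal to 0, i.e. Type < 32.
        return self.nal_type < 32


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Unpack the first two bytes of a NAL unit."""
    if len(data) < 2:
        raise ValueError("need at least two bytes")
    b0, b1 = data[0], data[1]
    hdr = NalUnitHeader(
        f=b0 >> 7,
        nal_type=(b0 >> 1) & 0x3F,
        layer_id=((b0 & 0x01) << 5) | (b1 >> 3),  # LayerId straddles both bytes
        tid=b1 & 0x07,
    )
    if hdr.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return hdr


def build_nal_unit_header(f: int, nal_type: int, layer_id: int, tid: int) -> bytes:
    """Pack the four fields back into two bytes."""
    return bytes([(f << 7) | (nal_type << 1) | (layer_id >> 5),
                  ((layer_id & 0x1F) << 3) | tid])


if __name__ == "__main__":
    # A VPS NAL unit (nal_unit_type 32) starts with the bytes 0x40 0x01.
    print(parse_nal_unit_header(bytes([0x40, 0x01])))
    # NalUnitHeader(f=0, nal_type=32, layer_id=0, tid=1)
```

Because the RTP payload header of Section 4.3 consists of the same four fields, the same unpacking applies to the first two bytes of every payload.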
1.2 Overview of the Payload Format

This payload format defines the following processes required for transport of HEVC coded data over RTP [RFC3550]:

o Usage of the RTP header with this payload format

o Packetization of HEVC coded NAL units into RTP packets using three types of payload structures, namely single NAL unit packet, aggregation packet, and fragmentation unit

o Transmission of HEVC NAL units of the same bitstream within a single RTP stream or multiple RTP streams within one or more RTP sessions, where within an RTP stream transmission of NAL units may be either non-interleaved (i.e. the transmission order of NAL units is the same as their decoding order) or interleaved (i.e. the transmission order of NAL units is different from their decoding order)

o Media type parameters to be used with the Session Description Protocol (SDP) [RFC4566]

o A payload header extension mechanism and data structures for enhanced support of temporal scalability based on that extension mechanism.

2 Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [RFC2119].

In this document, these key words will appear with that interpretation only when in ALL CAPS. Lower case uses of these words are not to be interpreted as carrying the RFC 2119 significance.

This specification uses the notion of setting and clearing a bit when bit fields are handled. Setting a bit is the same as assigning that bit the value of 1 (On). Clearing a bit is the same as assigning that bit the value of 0 (Off).

3 Definitions and Abbreviations

3.1 Definitions

This document uses the terms and definitions of [HEVC]. Section 3.1.1 lists relevant definitions copied from [HEVC] for convenience.
Section 3.1.2 provides definitions specific to this memo.

3.1.1 Definitions from the HEVC Specification

access unit: A set of NAL units that are associated with each other according to a specified classification rule, are consecutive in decoding order, and contain exactly one coded picture.

BLA access unit: An access unit in which the coded picture is a BLA picture.

BLA picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

coded video sequence: A sequence of access units that consists, in decoding order, of an IRAP access unit with NoRaslOutputFlag equal to 1, followed by zero or more access units that are not IRAP access units with NoRaslOutputFlag equal to 1, including all subsequent access units up to but not including any subsequent access unit that is an IRAP access unit with NoRaslOutputFlag equal to 1.

   Informative note: An IRAP access unit may be an IDR access unit, a BLA access unit, or a CRA access unit. The value of NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA access unit, and each CRA access unit that is the first access unit in the bitstream in decoding order, is the first access unit that follows an end of sequence NAL unit in decoding order, or has HandleCraAsBlaFlag equal to 1.

CRA access unit: An access unit in which the coded picture is a CRA picture.

CRA picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to CRA_NUT.

IDR access unit: An access unit in which the coded picture is an IDR picture.

IDR picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

IRAP access unit: An access unit in which the coded picture is an IRAP picture.
IRAP picture: A coded picture for which each VCL NAL unit has nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23), inclusive.

layer: A set of VCL NAL units that all have a particular value of nuh_layer_id and the associated non-VCL NAL units, or one of a set of syntactical structures having a hierarchical relationship.

operation point: A bitstream created from another bitstream by operation of the sub-bitstream extraction process with the other bitstream, a target highest TemporalId, and a target layer identifier list as inputs.

random access: The act of starting the decoding process for a bitstream at a point other than the beginning of the bitstream.

sub-layer: A temporal scalable layer of a temporal scalable bitstream consisting of VCL NAL units with a particular value of the TemporalId variable, and the associated non-VCL NAL units.

sub-layer representation: A subset of the bitstream consisting of NAL units of a particular sub-layer and the lower sub-layers.

tile: A rectangular region of coding tree blocks within a particular tile column and a particular tile row in a picture.

tile column: A rectangular region of coding tree blocks having a height equal to the height of the picture and a width specified by syntax elements in the picture parameter set.

tile row: A rectangular region of coding tree blocks having a height specified by syntax elements in the picture parameter set and a width equal to the width of the picture.

3.1.2 Definitions Specific to This Memo

dependee RTP stream: An RTP stream on which another RTP stream depends. All RTP streams in an MSM except for the highest RTP stream are dependee RTP streams.

highest RTP stream: The RTP stream on which no other RTP stream depends. The RTP stream in an SSM is the highest RTP stream.
media aware network element (MANE): A network element, such as a middlebox, selective forwarding unit, or application layer gateway that is capable of parsing certain aspects of the RTP payload headers or the RTP payload and reacting to their contents.

   Informative note: The concept of a MANE goes beyond normal routers or gateways in that a MANE has to be aware of the signaling (e.g. to learn about the payload type mappings of the media streams), and in that it has to be trusted when working with SRTP. The advantage of using MANEs is that they allow packets to be dropped according to the needs of the media coding. For example, if a MANE has to drop packets due to congestion on a certain link, it can identify and remove those packets whose elimination produces the least adverse effect on the user experience. After dropping packets, MANEs must rewrite RTCP packets to match the changes to the RTP stream as specified in Section 7 of [RFC3550].

multi-stream mode (MSM): Transmission of an HEVC bitstream using more than one RTP stream.

NAL unit decoding order: A NAL unit order that conforms to the constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

NAL-unit-like structure: A data structure that is similar to NAL units in the sense that it also has a NAL unit header and a payload, with the difference that the payload does not follow the start code emulation prevention mechanism required for the NAL unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples of NAL-unit-like structures defined in this memo are packet payloads of AP, PACI, and FU packets.

NALU-time: The value that the RTP timestamp would have if the NAL unit were transported in its own RTP packet.

RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within the scope of this memo, one RTP stream is utilized to transport one or more temporal sub-layers.
single-stream mode (SSM): Transmission of an HEVC bitstream using only one RTP stream.

transmission order: The order of packets in ascending RTP sequence number order (in modulo arithmetic). Within an aggregation packet, the NAL unit transmission order is the same as the order of appearance of NAL units in the packet.

3.2 Abbreviations

AP      Aggregation Packet
BLA     Broken Link Access
CRA     Clean Random Access
CTB     Coding Tree Block
CTU     Coding Tree Unit
CVS     Coded Video Sequence
DPH     Decoded Picture Hash
FU      Fragmentation Unit
GDR     Gradual Decoding Refresh
HRD     Hypothetical Reference Decoder
IDR     Instantaneous Decoding Refresh
IRAP    Intra Random Access Point
MANE    Media Aware Network Element
MSM     Multi-Stream Mode
MTU     Maximum Transmission Unit
NAL     Network Abstraction Layer
NALU    Network Abstraction Layer Unit
PACI    PAyload Content Information
PHES    Payload Header Extension Structure
PPS     Picture Parameter Set
RADL    Random Access Decodable Leading (Picture)
RASL    Random Access Skipped Leading (Picture)
RPS     Reference Picture Set
SEI     Supplemental Enhancement Information
SPS     Sequence Parameter Set
SSM     Single-Stream Mode
STSA    Step-wise Temporal Sub-layer Access
TCSI    Temporal Scalability Control Information
TSA     Temporal Sub-layer Access
VCL     Video Coding Layer
VPS     Video Parameter Set

4 RTP Payload Format

4.1 RTP Header Usage

The format of the RTP header is specified in [RFC3550] and reprinted in Figure 2 for convenience. This payload format uses the fields of the header in a manner consistent with that specification.

The RTP payload (and the settings for some RTP header bits) for aggregation packets and fragmentation units are specified in Sections 4.7 and 4.8, respectively.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 2 RTP header according to [RFC3550]

The RTP header fields are set as follows according to this RTP payload format:

Marker bit (M): 1 bit

   Set for the last packet, carried in the current RTP stream, of the access unit, in line with the normal use of the M bit in video formats, to allow efficient playout buffer handling. When MSM is in use, if an access unit appears in multiple RTP streams, the marker bit is set on each RTP stream's last packet of the access unit.

      Informative note: The content of a NAL unit does not tell whether or not the NAL unit is the last NAL unit, in decoding order, of an access unit. An RTP sender implementation may obtain this information from the video encoder. If, however, the implementation cannot obtain this information directly from the encoder, e.g. when the bitstream was pre-encoded, and also there is no timestamp allocated for each NAL unit, then the sender implementation can inspect subsequent NAL units in decoding order to determine whether or not the NAL unit is the last NAL unit of an access unit as follows.
      A NAL unit naluX is the last NAL unit of an access unit if it is the last NAL unit of the bitstream, or if the next VCL NAL unit naluY in decoding order has the high-order bit of the first byte after its NAL unit header equal to 1 and all NAL units between naluX and naluY, when present, have nal_unit_type in the range of 32 to 35, inclusive, equal to 39, or in the ranges of 41 to 44, inclusive, or 48 to 55, inclusive.

Payload type (PT): 7 bits

   The assignment of an RTP payload type for this new packet format is outside the scope of this document and will not be specified here. The assignment of a payload type has to be performed either through the profile used or in a dynamic way.

      Informative note: It is not required to use different payload type values for different RTP streams in MSM.

Sequence number (SN): 16 bits

   Set and used in accordance with RFC 3550.

Timestamp: 32 bits

   The RTP timestamp is set to the sampling timestamp of the content. A 90 kHz clock rate MUST be used.

   If the NAL unit has no timing properties of its own (e.g. parameter set and SEI NAL units), the RTP timestamp MUST be set to the RTP timestamp of the coded picture of the access unit in which the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is included.

   Receivers MUST use the RTP timestamp for the display process, even when the bitstream contains picture timing SEI messages or decoding unit information SEI messages as specified in [HEVC]. However, this does not mean that picture timing SEI messages in the bitstream should be discarded, as picture timing SEI messages may contain frame-field information that is important in appropriately rendering interlaced video.

Synchronization source (SSRC): 32 bits

   Used to identify the source of the RTP packets. In SSM, by definition a single SSRC is used for all parts of a single bitstream.
   In MSM, each SSRC is used for an RTP stream containing a subset of the sub-layers for a single (temporally scalable) bitstream. A receiver is required to correctly associate the set of SSRCs that carry parts of the same bitstream.

      Informative note: The term "bitstream" in this document is equivalent to the term "encoded stream" in [I-D.ietf-avtext-rtp-grouping-taxonomy].

4.2 Payload Header Usage

The TID value indicates (among other things) the relative importance of an RTP packet, for example, because NAL units belonging to higher temporal sub-layers are not used for the decoding of lower temporal sub-layers. A lower value of TID indicates a higher importance. More important NAL units MAY be better protected against transmission losses than less important NAL units.

4.3 Payload Structures

The first two bytes of the payload of an RTP packet are referred to as the payload header. The payload header consists of the same fields (F, Type, LayerId, and TID) as the NAL unit header as shown in Section 1.1.4, irrespective of the type of the payload structure.

Four different types of RTP packet payload structures are specified. A receiver can identify the type of an RTP packet payload through the Type field in the payload header.

The four different payload structures are as follows:

o Single NAL unit packet: Contains a single NAL unit in the payload, and the NAL unit header of the NAL unit also serves as the payload header. This payload structure is specified in Section 4.6.

o Aggregation packet (AP): Contains more than one NAL unit within one access unit. This payload structure is specified in Section 4.7.

o Fragmentation unit (FU): Contains a subset of a single NAL unit. This payload structure is specified in Section 4.8.
o PACI carrying RTP packet: Contains a payload header (that differs from other payload headers for efficiency), a Payload Header Extension Structure (PHES), and a PACI payload. This payload structure is specified in Section 4.9.

4.4 Transmission Modes

This memo enables transmission of an HEVC bitstream over a single RTP stream or multiple RTP streams. The concept and working principle are inherited from the design of what was called single and multiple session transmission in [RFC6190] and follow a similar design. If only one RTP stream is used for transmission of the HEVC bitstream, the transmission mode is referred to as single-stream mode (SSM); otherwise (more than one RTP stream is used for transmission of the HEVC bitstream), the transmission mode is referred to as multi-stream mode (MSM).

Dependency of one RTP stream on another RTP stream is typically indicated as specified in [RFC5583]. When an RTP stream A depends on another RTP stream B, the RTP stream B is referred to as a dependee RTP stream of the RTP stream A.

   Informative note: An MSM may involve one or more RTP sessions. For example, each RTP stream in an MSM may be in its own RTP session. For another example, a set of multiple RTP streams in an MSM may belong to the same RTP session, e.g. as indicated by the mechanism specified in [I-D.ietf-avtcore-rtp-multi-stream] or [I-D.ietf-mmusic-sdp-bundle-negotiation].

SSM SHOULD be used for point-to-point unicast scenarios, while MSM SHOULD be used for point-to-multipoint multicast scenarios where different receivers require different operation points of the same HEVC bitstream, to improve bandwidth utilization efficiency.
   Informative note: A multicast may degrade to a unicast after all but one receiver have left (this is a justification of the first "SHOULD" instead of "MUST"), and there might be scenarios where MSM is desirable but not possible, e.g. when IP multicast is not deployed in a certain network (this is a justification of the second "SHOULD" instead of "MUST").

The transmission mode is indicated by the tx-mode media parameter (see Section 7.1). If tx-mode is equal to "SSM", SSM MUST be used. Otherwise (tx-mode is equal to "MSM"), MSM MUST be used.

Receivers MUST support both SSM and MSM.

4.5 Decoding Order Number

For each NAL unit, the variable AbsDon is derived, representing the decoding order number that is indicative of the NAL unit decoding order.

Let NAL unit n be the n-th NAL unit in transmission order within an RTP stream.

If tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as equal to n.

Otherwise (tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0), AbsDon[n] is derived as follows, where DON[n] is the value of the variable DON for NAL unit n:

o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in transmission order), AbsDon[0] is set equal to DON[0].
o Otherwise (n is greater than 0), the following applies for derivation of AbsDon[n]:

      If DON[n] == DON[n-1],
         AbsDon[n] = AbsDon[n-1]

      If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
         AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

      If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
         AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

      If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
         AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

      If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
         AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])

For any two NAL units m and n, the following applies:

o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows NAL unit m in NAL unit decoding order.

o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order of the two NAL units can be in either order.

o AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes NAL unit m in decoding order.

When two consecutive NAL units in the NAL unit decoding order have different values of AbsDon, the value of AbsDon for the second NAL unit in decoding order MUST be greater than the value of AbsDon for the first NAL unit, and the absolute difference between the two AbsDon values MAY be greater than or equal to 1.

   Informative note: There are multiple reasons to allow for the absolute difference of the values of AbsDon for two consecutive NAL units in the NAL unit decoding order to be greater than one. An increment by one is not required, as at the time of associating values of AbsDon to NAL units, it may not be known whether all NAL units are to be delivered to the receiver. For example, a gateway may not forward VCL NAL units of higher sub-layers or some SEI NAL units when there is congestion in the network.
   In another example, the first intra-coded picture of a pre-encoded clip is transmitted in advance to ensure that it is readily available in the receiver, and when transmitting the first intra-coded picture, the originator does not exactly know how many NAL units will be encoded before the first intra-coded picture of the pre-encoded clip follows in decoding order. Thus, the values of AbsDon for the NAL units of the first intra-coded picture of the pre-encoded clip have to be estimated when they are transmitted, and gaps in values of AbsDon may occur. Another example is MSM, where the AbsDon values must indicate cross-layer decoding order for NAL units conveyed in all the RTP streams.

4.6 Single NAL Unit Packets

A single NAL unit packet contains exactly one NAL unit, and consists of a payload header (denoted as PayloadHdr), a conditional 16-bit DONL field (in network byte order), and the NAL unit payload data (the NAL unit excluding its NAL unit header) of the contained NAL unit, as shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           PayloadHdr          |      DONL (conditional)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  NAL unit payload data                        |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 3 The structure of a single NAL unit packet

The payload header SHOULD be an exact copy of the NAL unit header of the contained NAL unit. However, the Type (i.e. nal_unit_type) field MAY be changed, e.g. when it is desirable to handle a CRA picture as a BLA picture [JCTVC-J0107].
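Receivers recover the DON values of a single NAL unit packet from the DONL field described next and then apply the AbsDon derivation of Section 4.5. The following Python sketch implements that derivation for the case where tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0, taking the 16-bit DON values of the NAL units of one RTP stream in transmission order. The five cases are written out literally rather than folded into a single signed 16-bit difference, because the two branches for a difference of exactly 32768 point in opposite directions. The function names are hypothetical; in SSM with sprop-max-don-diff equal to 0, AbsDon[n] is simply n.

```python
def derive_abs_don(dons):
    """Derive AbsDon for NAL units given their DON values in transmission order.

    Follows the case analysis of Section 4.5; `dons` holds 16-bit values.
    """
    abs_don = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_don.append(don)  # AbsDon[0] = DON[0]
            continue
        prev_don, prev_abs = dons[n - 1], abs_don[n - 1]
        if don == prev_don:
            value = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            value = prev_abs + don - prev_don
        elif don < prev_don and prev_don - don >= 32768:
            value = prev_abs + 65536 - prev_don + don  # forward wrap-around
        elif don > prev_don and don - prev_don >= 32768:
            value = prev_abs - (prev_don + 65536 - don)  # backward wrap-around
        else:  # don < prev_don and prev_don - don < 32768
            value = prev_abs - (prev_don - don)
        abs_don.append(value)
    return abs_don


def decoding_order(dons):
    """Indices of NAL units sorted into NAL unit decoding order by AbsDon.

    NAL units with equal AbsDon may be decoded in either order; Python's
    stable sort simply keeps their transmission order.
    """
    abs_don = derive_abs_don(dons)
    return sorted(range(len(dons)), key=lambda i: abs_don[i])


if __name__ == "__main__":
    print(derive_abs_don([65534, 65535, 0, 1]))  # [65534, 65535, 65536, 65537]
```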
The DONL field, when present, specifies the value of the 16 least significant bits of the decoding order number of the contained NAL unit. If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0, the DONL field MUST be present, and the variable DON for the contained NAL unit is derived as equal to the value of the DONL field. Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 0), the DONL field MUST NOT be present.

4.7 Aggregation Packets (APs)

Aggregation packets (APs) are introduced to enable the reduction of packetization overhead for small NAL units, such as most of the non-VCL NAL units, which are often only a few octets in size.

An AP aggregates NAL units within one access unit. Each NAL unit to be carried in an AP is encapsulated in an aggregation unit. NAL units aggregated in one AP are in NAL unit decoding order.

An AP consists of a payload header (denoted as PayloadHdr) followed by two or more aggregation units, as shown in Figure 4.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=48)       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                                                               |
   |             two or more aggregation units                     |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 4 The structure of an aggregation packet

The fields in the payload header are set as follows. The F bit MUST be equal to 0 if the F bit of each aggregated NAL unit is equal to zero; otherwise, it MUST be equal to 1. The Type field MUST be equal to 48. The value of LayerId MUST be equal to the lowest value of LayerId of all the aggregated NAL units. The value of TID MUST be the lowest value of TID of all the aggregated NAL units.
   Informative Note: All VCL NAL units in an AP have the same TID value since they belong to the same access unit. However, an AP may contain non-VCL NAL units for which the TID value in the NAL unit header may be different than the TID value of the VCL NAL units in the same AP.

An AP MUST carry at least two aggregation units and can carry as many aggregation units as necessary; however, the total amount of data in an AP obviously MUST fit into an IP packet, and the size SHOULD be chosen so that the resulting IP packet is smaller than the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain Fragmentation Units (FUs) specified in Section 4.8. APs MUST NOT be nested; i.e. an AP MUST NOT contain another AP.

The first aggregation unit in an AP consists of a conditional 16-bit DONL field (in network byte order) followed by a 16-bit unsigned size information (in network byte order) that indicates the size of the NAL unit in bytes (excluding these two octets, but including the NAL unit header), followed by the NAL unit itself, including its NAL unit header, as shown in Figure 5.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   :       DONL (conditional)      |   NALU size   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   NALU size   |                                               |
   +-+-+-+-+-+-+-+-+         NAL unit                              |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 5 The structure of the first aggregation unit in an AP

The DONL field, when present, specifies the value of the 16 least significant bits of the decoding order number of the aggregated NAL unit.
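The following Python sketch assembles and splits an AP payload for the case where tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 0, in which, as specified in the rest of this section, neither the DONL nor the DOND field is present, so every aggregation unit is a 16-bit NALU size followed by the NAL unit. The payload header is derived with the F, LayerId, and TID rules above. The function names are hypothetical, the FU Type value 49 is taken from the FU format in Section 4.8, and the parser assumes RTP padding has already been removed.

```python
import struct

AP_TYPE = 48  # Type value of an aggregation packet (Figure 4)
FU_TYPE = 49  # Type value of a fragmentation unit (Section 4.8)


def _fields(nalu: bytes):
    """Return (F, Type, LayerId, TID) from a NAL unit header."""
    b0, b1 = nalu[0], nalu[1]
    return b0 >> 7, (b0 >> 1) & 0x3F, ((b0 & 1) << 5) | (b1 >> 3), b1 & 0x07


def build_ap_ssm(nalus):
    """Build an AP payload (no RTP header) without DONL/DOND fields.

    Each element of nalus is a complete NAL unit, including its header.
    """
    if len(nalus) < 2:
        raise ValueError("an AP must carry at least two aggregation units")
    for nalu in nalus:
        if not 2 <= len(nalu) <= 0xFFFF:
            raise ValueError("NAL unit size must fit the 16-bit size field")
    fields = [_fields(n) for n in nalus]
    if any(nal_type in (AP_TYPE, FU_TYPE) for _, nal_type, _, _ in fields):
        raise ValueError("an AP must not contain APs or FUs")
    f = 1 if any(f_bit for f_bit, _, _, _ in fields) else 0
    layer_id = min(lid for _, _, lid, _ in fields)  # lowest LayerId
    tid = min(t for _, _, _, t in fields)            # lowest TID
    payload_hdr = bytes([(f << 7) | (AP_TYPE << 1) | (layer_id >> 5),
                         ((layer_id & 0x1F) << 3) | tid])
    # Each aggregation unit: 16-bit NALU size (network byte order) + NAL unit.
    units = b"".join(struct.pack("!H", len(n)) + n for n in nalus)
    return payload_hdr + units


def parse_ap_ssm(payload: bytes):
    """Split such an AP payload back into its NAL units."""
    nalus, pos = [], 2  # skip the two-byte payload header
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if pos + size > len(payload):
            raise ValueError("truncated NAL unit")
        nalus.append(payload[pos:pos + size])
        pos += size
    return nalus
```

For example, aggregating a three-byte VPS NAL unit (header 0x40 0x01) and a four-byte SPS NAL unit (header 0x42 0x01) yields the payload header 0x60 0x01 followed by the size 0x0003, the VPS, the size 0x0004, and the SPS.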
If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0, the DONL field MUST be present in an aggregation unit that is the first aggregation unit in an AP, and the variable DON for the aggregated NAL unit is derived as equal to the value of the DONL field. Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 0), the DONL field MUST NOT be present in an aggregation unit that is the first aggregation unit in an AP.

An aggregation unit that is not the first aggregation unit in an AP consists of a conditional 8-bit DOND field followed by a 16-bit unsigned size information (in network byte order) that indicates the size of the NAL unit in bytes (excluding these two octets, but including the NAL unit header), followed by the NAL unit itself, including its NAL unit header, as shown in Figure 6.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   : DOND (cond)   |          NALU size            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       NAL unit                                |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6 The structure of an aggregation unit that is not the first
                        aggregation unit in an AP

When present, the DOND field plus 1 specifies the difference between the decoding order number values of the current aggregated NAL unit and the preceding aggregated NAL unit in the same AP.

If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0, the DOND field MUST be present in an aggregation unit that is not the first aggregation unit in an AP, and the variable DON for the aggregated NAL unit is derived as equal to the DON of the preceding aggregated NAL unit in the same AP plus the value of the DOND field plus 1 modulo 65536.
   Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff is equal
   to 0), the DOND field MUST NOT be present in an aggregation unit
   that is not the first aggregation unit in an AP, and in this case
   the transmission order and decoding order of NAL units carried in
   the AP are the same as the order in which the NAL units appear in
   the AP.

   Figure 7 presents an example of an AP that contains two aggregation
   units, labeled as 1 and 2 in the figure, without the DONL and DOND
   fields being present.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   PayloadHdr (Type=48)        |         NALU 1 Size           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          NALU 1 HDR           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
   |                   . . .                                       |
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | NALU 2 HDR    |                                               |
   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
   |                   . . .                                       |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 7 An example of an AP packet containing two aggregation
                 units without the DONL and DOND fields

   Figure 8 presents an example of an AP that contains two aggregation
   units, labeled as 1 and 2 in the figure, with the DONL and DOND
   fields being present.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   PayloadHdr (Type=48)        |        NALU 1 DONL            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           NALU 1 Size         |            NALU 1 HDR         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                 NALU 1 Data . . .                             |
   |                                                               |
   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |  NALU 2 DOND  |          NALU 2 Size          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          NALU 2 HDR           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
   |                                                               |
   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 8 An example of an AP containing two aggregation units
                    with the DONL and DOND fields

4.8 Fragmentation Units (FUs)

   Fragmentation units (FUs) are introduced to enable fragmenting a
   single NAL unit into multiple RTP packets, possibly without
   cooperation or knowledge of the HEVC encoder. A fragment of a NAL
   unit consists of an integer number of consecutive octets of that NAL
   unit. Fragments of the same NAL unit MUST be sent in consecutive
   order with ascending RTP sequence numbers (with no other RTP packets
   within the same RTP stream being sent between the first and last
   fragment).

   When a NAL unit is fragmented and conveyed within FUs, it is
   referred to as a fragmented NAL unit. APs MUST NOT be fragmented.
   FUs MUST NOT be nested; i.e., an FU MUST NOT contain a subset of
   another FU.

   The RTP timestamp of an RTP packet carrying an FU is set to the
   NALU-time of the fragmented NAL unit.
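A sender-side sketch of these fragmentation rules, assuming the FU layout specified in the rest of this section: a PayloadHdr with Type 49 that copies F, LayerId, and TID from the fragmented NAL unit, a one-octet FU header carrying S, E, and FuType, and a DONL field only in the first FU when DON is in use. The function `fragment_nal_unit` and its `max_payload` limit are hypothetical; RTP header construction, sequence numbering, and timestamps are left to the caller.

```python
import struct
from typing import List, Optional

FU_TYPE = 49  # PayloadHdr Type value for an FU


def fragment_nal_unit(nal: bytes, max_payload: int,
                      don: Optional[int] = None) -> List[bytes]:
    """Split one NAL unit into FU payloads (RTP headers not included).

    max_payload: illustrative limit on fragment octets per FU.
    don: DON to carry in the DONL field of the first FU, or None.
    """
    if max_payload < 1:
        raise ValueError("max_payload must be positive")
    if len(nal) < 3:
        raise ValueError("NAL unit too short to fragment")
    nal_type = (nal[0] >> 1) & 0x3F
    # PayloadHdr: keep F (bit 7) and the LayerId MSB (bit 0), set Type to 49;
    # the second octet (rest of LayerId, TID) is copied unchanged.
    payload_hdr = bytes([(nal[0] & 0x81) | (FU_TYPE << 1), nal[1]])
    body = nal[2:]  # the NAL unit header itself is not carried in FU payloads
    chunks = [body[i:i + max_payload] for i in range(0, len(body), max_payload)]
    if len(chunks) < 2:
        # S and E must not both be 1: use a single NAL unit packet instead.
        raise ValueError("NAL unit fits in one FU; use a single NAL unit packet")
    fus = []
    for i, chunk in enumerate(chunks):
        s = 1 if i == 0 else 0
        e = 1 if i == len(chunks) - 1 else 0
        fu_header = bytes([(s << 7) | (e << 6) | nal_type])
        donl = struct.pack("!H", don % 65536) if (s and don is not None) else b""
        fus.append(payload_hdr + fu_header + donl + chunk)
    return fus
```

Each returned payload would then be sent in its own RTP packet, in list order, with consecutive sequence numbers and the NALU-time of the NAL unit as RTP timestamp.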
   An FU consists of a payload header (denoted as PayloadHdr), an FU
   header of one octet, a conditional 16-bit DONL field (in network
   byte order), and an FU payload, as shown in Figure 9.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
   | DONL (cond)   |                                               |
   |-+-+-+-+-+-+-+-+                                               |
   |                         FU payload                            |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 9 The structure of an FU

   The fields in the payload header are set as follows. The Type field
   MUST be equal to 49. The fields F, LayerId, and TID MUST be equal
   to the fields F, LayerId, and TID, respectively, of the fragmented
   NAL unit.

   The FU header consists of an S bit, an E bit, and a 6-bit FuType
   field, as shown in Figure 10.

   +---------------+
   |0|1|2|3|4|5|6|7|
   +-+-+-+-+-+-+-+-+
   |S|E|  FuType   |
   +---------------+

                  Figure 10 The structure of FU header

   The semantics of the FU header fields are as follows:

   S: 1 bit
      When set to 1, the S bit indicates the start of a fragmented NAL
      unit, i.e., the first byte of the FU payload is also the first
      byte of the payload of the fragmented NAL unit. When the FU
      payload is not the start of the fragmented NAL unit payload, the
      S bit MUST be set to 0.

   E: 1 bit
      When set to 1, the E bit indicates the end of a fragmented NAL
      unit, i.e., the last byte of the FU payload is also the last byte
      of the fragmented NAL unit. When the FU payload is not the last
      fragment of a fragmented NAL unit, the E bit MUST be set to 0.
   FuType: 6 bits
      The field FuType MUST be equal to the field Type of the
      fragmented NAL unit.

   The DONL field, when present, specifies the value of the 16 least
   significant bits of the decoding order number of the fragmented NAL
   unit.

   If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than
   0, and the S bit is equal to 1, the DONL field MUST be present in
   the FU, and the variable DON for the fragmented NAL unit is derived
   as equal to the value of the DONL field. Otherwise (tx-mode is
   equal to "SSM" and sprop-max-don-diff is equal to 0, or the S bit is
   equal to 0), the DONL field MUST NOT be present in the FU.

   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
   the S bit and E bit MUST NOT both be set to 1 in the same FU header.

   The FU payload consists of fragments of the payload of the
   fragmented NAL unit so that if the FU payloads of consecutive FUs,
   starting with an FU with the S bit equal to 1 and ending with an FU
   with the E bit equal to 1, are sequentially concatenated, the
   payload of the fragmented NAL unit can be reconstructed. The NAL
   unit header of the fragmented NAL unit is not included as such in
   the FU payload; rather, the information of the NAL unit header of
   the fragmented NAL unit is conveyed in the F, LayerId, and TID
   fields of the FU payload headers of the FUs and the FuType field of
   the FU header of the FUs. An FU payload MUST NOT be empty.

   If an FU is lost, the receiver SHOULD discard all following
   fragmentation units in transmission order corresponding to the same
   fragmented NAL unit, unless the decoder in the receiver is known to
   be prepared to gracefully handle incomplete NAL units.

   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
   fragments of a NAL unit into an (incomplete) NAL unit, even if
   fragment n of that NAL unit is not received.
   In this case, the forbidden_zero_bit of the NAL unit MUST be set to
   1 to indicate a syntax violation.

4.9 PACI packets

   This section specifies the PACI packet structure. The basic payload
   header specified in this memo is intentionally limited to the 16
   bits of the NAL unit header so as to keep the packetization overhead
   to a minimum. However, cases have been identified where it is
   advisable to include control information in an easily accessible
   position in the packet header, despite the additional overhead. One
   such piece of control information is the Temporal Scalability
   Control Information as specified in Section 4.10 below. PACI packets
   carry this and future, similar structures.

   The PACI packet structure is based on a payload header extension
   mechanism that is generic and extensible to carry payload header
   extensions. In this section, the focus lies on the use within this
   specification. Section 4.9.2 below provides guidance for
   specification designers on how to employ the extension mechanism in
   future specifications.

   A PACI packet consists of a payload header (denoted as PayloadHdr),
   for which the structure follows what is described in Section 4.3
   above. The payload header is followed by the fields A, cType,
   PHSsize, F[0..2], and Y.

   Figure 11 shows a PACI packet in compliance with this memo, that is,
   without any extensions.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Payload Header Extension Structure (PHES)              |
   |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
   |                                                               |
   |                  PACI payload: NAL unit                       |
   |                   . . .                                       |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 11 The structure of a PACI

   The fields in the payload header are set as follows. The F bit MUST
   be equal to 0. The Type field MUST be equal to 50. The value of
   LayerId MUST be a copy of the LayerId field of the PACI payload NAL
   unit or NAL-unit-like structure. The value of TID MUST be a copy of
   the TID field of the PACI payload NAL unit or NAL-unit-like
   structure.

   The semantics of the other fields are as follows:

   A: 1 bit
      Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
      structure.

   cType: 6 bits
      Copy of the Type field of the PACI payload NAL unit or
      NAL-unit-like structure.

   PHSsize: 5 bits
      Indicates the length of the PHES field. The value is limited to
      be less than or equal to 32 octets, to simplify encoder design
      for MTU size matching.

   F0: 1 bit
      This field equal to 1 specifies the presence of a temporal
      scalability support extension in the PHES.

   F1, F2: 1 bit each
      MUST be 0, available for future extensions, see Section 4.9.2.

   Y: 1 bit
      MUST be 0, available for future extensions, see Section 4.9.2.

   PHES: variable number of octets
      A variable number of octets as indicated by the value of PHSsize.

   PACI payload
      The single NAL unit packet or NAL-unit-like structure (such as an
      FU or an AP) to be carried, not including the first two octets.

         Informative note: The first two octets of the NAL unit or
         NAL-unit-like structure carried in the PACI payload are not
         included in the PACI payload. Rather, the respective values
         are copied into the LayerId and TID fields of the PayloadHdr
         and into the A and cType fields of the PACI. This design
         offers two advantages: first, the overall structure of the
         payload header is preserved, i.e.,
         there is no special case of payload header structure that
         needs to be implemented for PACI. Second, no additional
         overhead is introduced.

   A PACI payload MAY be a single NAL unit, an FU, or an AP. PACIs
   MUST NOT be fragmented or aggregated. The following subsection
   documents the reasons for these design choices.

4.9.1 Reasons for the PACI rules (informative)

   A PACI cannot be fragmented. If a PACI could be fragmented, and a
   fragment other than the first fragment were lost, access to the
   information in the PACI would not be possible. Therefore, a PACI
   must not be fragmented. In other words, an FU must not carry
   (fragments of) a PACI.

   A PACI cannot be aggregated. Aggregation of PACIs is inadvisable
   from a compression viewpoint, as, in many cases, several NAL units
   to be aggregated would share identical PACI fields and values,
   which would be carried redundantly for no reason. Most, if not all,
   of the practical effects of PACI aggregation can be achieved by
   aggregating NAL units and bundling them with a PACI (see below).
   Therefore, a PACI must not be aggregated. In other words, an AP
   must not contain a PACI.

   The payload of a PACI can be a fragment. Both middleboxes and
   sending systems with inflexible (often hardware-based) encoders
   occasionally find themselves in situations where a PACI and its
   headers, combined, are larger than the MTU size. In such a
   scenario, the middlebox or sender can fragment the NAL unit and
   encapsulate the fragment in a PACI. Doing so preserves the payload
   header extension information for all fragments, allowing downstream
   middleboxes and the receiver to take advantage of that information.
   Therefore, a sender may place a fragment into a PACI, and a receiver
   must be able to handle such a PACI.

   The payload of a PACI can be an aggregation packet.
   HEVC bitstreams can contain unevenly sized and/or small (when
   compared to the MTU size) NAL units. In order to efficiently
   packetize such small NAL units, APs were introduced. The benefits
   of APs are independent of the need for a payload header extension.
   Therefore, a sender may place an AP into a PACI, and a receiver must
   be able to handle such a PACI.

4.9.2 PACI extensions (informative)

   This subsection includes recommendations for future specification
   designers on how to extend the PACI syntax to accommodate future
   extensions. Obviously, designers are free to specify whatever
   appears to be appropriate to them at the time of their design.
   However, a lot of thought has been invested into the extension
   mechanism described below, and we suggest that deviations from it
   warrant a good explanation.

   This memo defines only a single payload header extension (Temporal
   Scalability Control Information, described below in Section 4.10),
   and, therefore, only the F0 bit carries semantics. F1 and F2 are
   already named (and not just marked as reserved, as a typical video
   specification designer would do). They are intended to signal two
   additional extensions. The Y bit allows further F and Y bits to be
   added recursively, extending the mechanism beyond 3 possible payload
   header extensions. It is suggested to define a new packet type
   (using a different value for Type) when assigning the F1, F2, or Y
   bits different semantics than what is suggested below.

   When a Y bit is set, an 8-bit flag-extension is inserted after the Y
   bit. A flag-extension consists of 7 flags F[n..n+6] and another Y
   bit.

   The basic PACI header already includes F0, F1, and F2. Therefore,
   the Fx bits in the first flag-extension are numbered F3, F4, ...,
   F9, the F bits in the second flag-extension are numbered F10, F11,
   ..., F16, and so forth.
   As a result, at least 3 Fx bits are always in the PACI, but the
   number of Fx bits (and associated types of extensions) can be
   increased by setting the next Y bit and adding an octet of
   flag-extensions, carrying 7 flags and another Y bit. The size of
   this list of flags is subject to the limits specified in Section 4.9
   (32 octets for all flag-extensions and the PHES information
   combined).

   Each F bit can indicate either the presence of information in the
   Payload Header Extension Structure (PHES) or a certain condition,
   without including additional information in the PHES.

   When a specification developer devises a new syntax that takes
   advantage of the PACI extension mechanism, the constraints listed
   below must be followed; otherwise, the extension mechanism may
   break.

   1) The fields added for a particular Fx bit MUST be fixed in length
      and not depend on what other Fx bits are set (no parsing
      dependency).
   2) The Fx bits must be assigned in order.
   3) An implementation that supports the n-th Fn bit for any value of
      n must understand the syntax (though not necessarily the
      semantics) of the fields associated with Fk (with k < n), so as
      to be able to either use those fields when present, or at least
      skip over them.

4.10 Temporal Scalability Control Information

   This section describes the single payload header extension defined
   in this specification, known as Temporal Scalability Control
   Information (TSCI). If, in the future, additional payload header
   extensions become necessary, they could be specified in this section
   of an updated version of this document, or in their own documents.
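The PACI rules of Section 4.9 and the TSCI layout of Figure 12 below can be combined into a small receiver-side sketch. The function `unwrap_paci` and the `Tsci` class are hypothetical names. As an assumption not spelled out in this memo, the sketch treats PHSsize as covering everything between the first four octets and the PACI payload, and it interprets the TSCI only when F0 is 1 and Y is 0; any other PHES content is skipped uninterpreted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PACI_TYPE = 50


@dataclass
class Tsci:
    tl0_pic_idx: int
    irap_pic_id: int
    s: int
    e: int


def unwrap_paci(pkt: bytes) -> Tuple[bytes, Optional[Tsci]]:
    """Undo PACI encapsulation of one RTP payload (RTP header removed).

    Returns the carried NAL unit or NAL-unit-like structure with its first
    two octets restored, plus the TSCI when F0 is set (else None).
    """
    if len(pkt) < 4:
        raise ValueError("truncated PACI header")
    if (pkt[0] >> 1) & 0x3F != PACI_TYPE:
        raise ValueError("payload header Type is not 50")
    word = int.from_bytes(pkt[2:4], "big")
    a = word >> 15                 # copy of the carried F bit
    ctype = (word >> 9) & 0x3F     # copy of the carried Type
    phs_size = (word >> 4) & 0x1F  # PHES length in octets
    f0 = (word >> 3) & 0x1         # TSCI present
    y = word & 0x1                 # flag-extension follows (not defined in this memo)
    if len(pkt) < 4 + phs_size:
        raise ValueError("truncated PHES")
    phes = pkt[4:4 + phs_size]
    tsci = None
    if f0 and not y and phs_size >= 3:
        # TSCI: TL0PICIDX (8) | IrapPicID (8) | S (1) | E (1) | RES (6)
        tsci = Tsci(phes[0], phes[1], phes[2] >> 7, (phes[2] >> 6) & 0x1)
    # Rebuild the carried header: F from A, Type from cType, and the LayerId
    # MSB plus the whole second octet (LayerId low bits, TID) from PayloadHdr.
    first_octet = (a << 7) | (ctype << 1) | (pkt[0] & 0x01)
    return bytes([first_octet, pkt[1]]) + pkt[4 + phs_size:], tsci
```

For example, a PACI with PHSsize 3 and F0 set, carrying a NAL unit of Type 1, yields that NAL unit with its original two-octet header restored and a `Tsci` holding the three TSCI octets' fields.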
   When F0 is set to 1 in a PACI, this specifies that the PHES field
   includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   TL0PICIDX   |   IrapPicID   |S|E|   RES     |               |
   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                            ....                               |
   |               PACI payload: NAL unit                          |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 12 The structure of a PACI with a PHES containing a TSCI

   TL0PICIDX (8 bits)
      When present, the TL0PICIDX field MUST be set equal to
      temporal_sub_layer_zero_idx as specified in Section D.3.22 of
      [H.265] for the access unit containing the NAL unit in the PACI.

   IrapPicID (8 bits)
      When present, the IrapPicID field MUST be set equal to
      irap_pic_id as specified in Section D.3.22 of [H.265] for the
      access unit containing the NAL unit in the PACI.

   S (1 bit)
      The S bit MUST be set to 1 if any of the following conditions is
      true and MUST be set to 0 otherwise:

      o  The NAL unit in the payload of the PACI is the first VCL NAL
         unit, in decoding order, of a picture.
      o  The NAL unit in the payload of the PACI is an AP and the NAL
         unit in the first contained aggregation unit is the first VCL
         NAL unit, in decoding order, of a picture.
      o  The NAL unit in the payload of the PACI is an FU with its S bit
         equal to 1 and the FU payload containing a fragment of the
         first VCL NAL unit, in decoding order, of a picture.

   E (1 bit)
      The E bit MUST be set to 1 if any of the following conditions is
      true and MUST be set to 0 otherwise:

      o  The NAL unit in the payload of the PACI is the last VCL NAL
         unit, in decoding order, of a picture.
      o  The NAL unit in the payload of the PACI is an AP and the NAL
         unit in the last contained aggregation unit is the last VCL
         NAL unit, in decoding order, of a picture.
      o  The NAL unit in the payload of the PACI is an FU with its E bit
         equal to 1 and the FU payload containing a fragment of the
         last VCL NAL unit, in decoding order, of a picture.

   RES (6 bits)
      MUST be equal to 0. Reserved for future extensions.

   The value of PHSsize MUST be set to 3. Receivers MUST allow other
   values of the fields F0, F1, F2, Y, and PHSsize, and MUST ignore any
   fields in the PHES, when present, beyond those specified above.

5 Packetization Rules

   The following packetization rules apply:

   o  If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
      than 0 for an RTP stream, the transmission order of NAL units
      carried in the RTP stream MAY differ from the NAL unit decoding
      order. Otherwise (tx-mode is equal to "SSM" and
      sprop-max-don-diff is equal to 0 for an RTP stream), the
      transmission order of NAL units carried in the RTP stream MUST be
      the same as the NAL unit decoding order.

   o  A NAL unit of a small size SHOULD be encapsulated in an
      aggregation packet together with one or more other NAL units in
      order to avoid unnecessary packetization overhead for small NAL
      units. For example, non-VCL NAL units such as access unit
      delimiters, parameter sets, or SEI NAL units are typically small
      and can often be aggregated with VCL NAL units without violating
      MTU size constraints.
   o  Each non-VCL NAL unit SHOULD, when possible from an MTU size
      match viewpoint, be encapsulated in an aggregation packet
      together with its associated VCL NAL unit, as typically a non-VCL
      NAL unit would be meaningless without the associated VCL NAL unit
      being available.

   o  For carrying exactly one NAL unit in an RTP packet, a single NAL
      unit packet MUST be used.

6 De-packetization Process

   The general concept behind de-packetization is to get the NAL units
   out of the RTP packets in an RTP stream and all RTP streams the RTP
   stream depends on, if any, and pass them to the decoder in the NAL
   unit decoding order.

   The de-packetization process is implementation dependent.
   Therefore, the following description should be seen as an example of
   a suitable implementation. Other schemes may be used as well, as
   long as the output for the same input is the same as that of the
   process described below. The output is the same when the set of
   output NAL units and their order are both identical. Optimizations
   relative to the described algorithms are possible.

   All normal RTP mechanisms related to buffer management apply. In
   particular, duplicated or outdated RTP packets (as indicated by the
   RTP sequence number and the RTP timestamp) are removed. To
   determine the exact time for decoding, factors such as a possible
   intentional delay to allow for proper inter-stream synchronization
   must be taken into account.

   NAL units with NAL unit type values in the range of 0 to 47,
   inclusive, may be passed to the decoder. NAL-unit-like structures
   with NAL unit type values in the range of 48 to 63, inclusive, MUST
   NOT be passed to the decoder.
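The type ranges above lead to a natural first step for a de-packetizer: read the two-octet payload header and dispatch on its Type field. The sketch assumes the payload header mirrors the HEVC NAL unit header (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits); the function name and the returned labels are illustrative, and Type values 51 to 63, which this part of the memo does not assign, are reported as "unknown".

```python
from typing import Tuple

AP_TYPE, FU_TYPE, PACI_TYPE = 48, 49, 50


def classify_payload(payload: bytes) -> Tuple[str, int, int]:
    """Return (kind, LayerId, TID) for one RTP payload of this format."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-octet payload header")
    hdr = int.from_bytes(payload[:2], "big")
    ptype = (hdr >> 9) & 0x3F     # Type: 6 bits after the F bit
    layer_id = (hdr >> 3) & 0x3F  # LayerId: next 6 bits
    tid = hdr & 0x07              # TID: last 3 bits
    if ptype <= 47:
        kind = "single"   # single NAL unit packet; NAL unit may go to the decoder
    elif ptype == AP_TYPE:
        kind = "AP"
    elif ptype == FU_TYPE:
        kind = "FU"
    elif ptype == PACI_TYPE:
        kind = "PACI"
    else:
        kind = "unknown"  # 51..63: never passed to the decoder
    return kind, layer_id, tid
```

Only NAL units recovered from the "single", "AP", "FU", or "PACI" paths would be handed on; the payload structures themselves (Types 48 to 63) stay inside the de-packetizer.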
   The receiver includes a receiver buffer, which is used to compensate
   for transmission delay jitter within individual RTP streams and
   across RTP streams, to reorder NAL units from transmission order to
   the NAL unit decoding order, and to recover the NAL unit decoding
   order in MSM, when applicable. In this section, the receiver
   operation is described under the assumption that there is no
   transmission delay jitter within an RTP stream or across RTP
   streams. To distinguish it from a practical receiver buffer that is
   also used for compensation of transmission delay jitter, the
   receiver buffer is hereafter called the de-packetization buffer in
   this section. Receivers should also prepare for transmission delay
   jitter, i.e., either reserve separate buffers for transmission delay
   jitter buffering and de-packetization buffering or use a receiver
   buffer for both transmission delay jitter and de-packetization.
   Moreover, receivers should take transmission delay jitter into
   account in the buffering operation, e.g., by additional initial
   buffering before starting decoding and playback.

   If only one RTP stream is being received and sprop-max-don-diff of
   the only RTP stream being received is equal to 0, the
   de-packetization buffer size is zero bytes, i.e., the NAL units
   carried in the RTP stream are directly passed to the decoder in
   their transmission order, which is identical to the decoding order
   of the NAL units. Otherwise, the process described in the remainder
   of this section applies.

   There are two buffering states in the receiver: initial buffering
   and buffering while playing. Initial buffering starts when the
   reception is initialized. After initial buffering, decoding and
   playback are started, and the buffering-while-playing mode is used.
   Regardless of the buffering state, the receiver stores incoming NAL
   units, in reception order, into the de-packetization buffer. NAL
   units carried in RTP packets are stored in the de-packetization
   buffer individually, and the value of AbsDon is calculated and
   stored for each NAL unit. When MSM is in use, NAL units of all RTP
   streams of a bitstream are stored in the same de-packetization
   buffer. When NAL units carried in any two RTP streams are available
   to be placed into the de-packetization buffer, those NAL units
   carried in the RTP stream that is lower in the dependency tree are
   placed into the buffer first. For example, if RTP stream A depends
   on RTP stream B, then NAL units carried in RTP stream B are placed
   into the buffer first.

   Initial buffering lasts until condition A (the difference between
   the greatest and smallest AbsDon values of the NAL units in the
   de-packetization buffer is greater than or equal to the value of
   sprop-max-don-diff of the highest RTP stream) or condition B (the
   number of NAL units in the de-packetization buffer is greater than
   the value of sprop-depack-buf-nalus) is true.

   After initial buffering, whenever condition A or condition B is
   true, the following operation is repeatedly applied until both
   condition A and condition B become false:

   o  The NAL unit in the de-packetization buffer with the smallest
      value of AbsDon is removed from the de-packetization buffer and
      passed to the decoder.

   When no more NAL units are flowing into the de-packetization buffer,
   all NAL units remaining in the de-packetization buffer are removed
   from the buffer and passed to the decoder in the order of increasing
   AbsDon values.
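The buffering rules above can be modelled with a list kept sorted by AbsDon. In this hypothetical `DepacketizationBuffer` class, AbsDon values are assumed to be computed by the caller as specified elsewhere in this memo, transmission delay jitter is ignored (as in the description above), and the two thresholds are taken from sprop-max-don-diff of the highest RTP stream and from sprop-depack-buf-nalus.

```python
import bisect
from typing import List, Tuple


class DepacketizationBuffer:
    """Sketch of the AbsDon-ordered de-packetization buffer (no jitter)."""

    def __init__(self, sprop_max_don_diff: int, sprop_depack_buf_nalus: int):
        self.max_don_diff = sprop_max_don_diff
        self.depack_buf_nalus = sprop_depack_buf_nalus
        # Entries (AbsDon, arrival index, NAL unit), kept sorted by AbsDon;
        # the arrival index breaks ties so NAL unit bytes are never compared.
        self._buf: List[Tuple[int, int, bytes]] = []
        self._arrivals = 0
        self.playing = False  # False while in initial buffering

    def _cond_a(self) -> bool:
        # Greatest minus smallest AbsDon >= sprop-max-don-diff.
        return bool(self._buf) and self._buf[-1][0] - self._buf[0][0] >= self.max_don_diff

    def _cond_b(self) -> bool:
        # More NAL units buffered than sprop-depack-buf-nalus.
        return len(self._buf) > self.depack_buf_nalus

    def push(self, abs_don: int, nal: bytes) -> List[bytes]:
        """Store one NAL unit; return NAL units to pass to the decoder now."""
        bisect.insort(self._buf, (abs_don, self._arrivals, nal))
        self._arrivals += 1
        if not self.playing and (self._cond_a() or self._cond_b()):
            self.playing = True  # initial buffering ends; decoding may start
        out: List[bytes] = []
        if self.playing:
            while self._cond_a() or self._cond_b():
                out.append(self._buf.pop(0)[2])  # smallest AbsDon first
        return out

    def flush(self) -> List[bytes]:
        """No more input: drain everything in increasing AbsDon order."""
        out = [entry[2] for entry in self._buf]
        self._buf.clear()
        return out
```

With sprop-max-don-diff equal to 0, condition A holds whenever the buffer is non-empty, so each NAL unit is passed on immediately, which matches the zero-size buffer case described earlier.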
7 Payload Format Parameters

   This section specifies the parameters that MAY be used to select
   optional features of the payload format and certain features or
   properties of the bitstream or the RTP stream. The parameters are
   specified here as part of the media type registration for the HEVC
   codec. A mapping of the parameters into the Session Description
   Protocol (SDP) [RFC4566] is also provided for applications that use
   SDP. Equivalent parameters could be defined elsewhere for use with
   control protocols that do not use SDP.

7.1 Media Type Registration

   The media subtype for the HEVC codec is allocated from the IETF
   tree.

   The receiver MUST ignore any unrecognized parameter.

   Media Type name:     video

   Media subtype name:  H265

   Required parameters: none

   OPTIONAL parameters:

      profile-space, tier-flag, profile-id,
      profile-compatibility-indicator, interop-constraints, and
      level-id:

         These parameters indicate the profile, tier, default level,
         and some constraints of the bitstream carried by the RTP
         stream and all RTP streams the RTP stream depends on, or a
         specific set of the profile, tier, default level, and some
         constraints the receiver supports.

         The profile and some constraints are indicated collectively by
         profile-space, profile-id, profile-compatibility-indicator,
         and interop-constraints. The profile specifies the subset of
         coding tools that may have been used to generate the bitstream
         or that the receiver supports.

            Informative note: There are 32 values of profile-id, and
            there are 32 flags in profile-compatibility-indicator, each
            flag corresponding to one value of profile-id. According to
            HEVC version 1 in [HEVC], when more than one of the 32 flags
            is set for a bitstream, the bitstream would comply with all
            the profiles corresponding to the set flags.
            However, in a draft of HEVC version 2 in [HEVC draft v2],
            subclause A.3.5, 19 Format Range Extensions profiles have
            been specified, all using the same value of profile-id (4)
            and differentiated by some of the 48 bits in
            interop-constraints. This (rather unexpected) way of profile
            signalling means that one of the 32 flags may correspond to
            multiple profiles. To be able to support any HEVC extension
            profile that might be specified in the future and indicated
            using profile-space, profile-id,
            profile-compatibility-indicator, and interop-constraints, it
            would be safe to require symmetric use of these parameters
            in SDP offer/answer unless recv-sub-layer-id is included in
            the SDP answer for choosing one of the sub-layers offered.

         The tier is indicated by tier-flag. The default level is
         indicated by level-id. The tier and the default level specify
         the limits on values of syntax elements or arithmetic
         combinations of values of syntax elements that are followed
         when generating the bitstream or that the receiver supports.

         A set of profile-space, tier-flag, profile-id,
         profile-compatibility-indicator, interop-constraints, and
         level-id parameters ptlA is said to be consistent with another
         set of these parameters ptlB if any decoder that conforms to
         the profile, tier, level, and constraints indicated by ptlB can
         decode any bitstream that conforms to the profile, tier, level,
         and constraints indicated by ptlA.

         In SDP offer/answer, when the SDP answer does not include a
         recv-sub-layer-id parameter with a value less than that of the
         sprop-sub-layer-id parameter in the SDP offer, the following
         applies:

         o  The profile-space, tier-flag, profile-id,
            profile-compatibility-indicator, and interop-constraints
            parameters MUST be used symmetrically, i.e.,
            the value of each of these parameters in the offer MUST be
            the same as that in the answer, either explicitly signalled
            or implicitly inferred.
         o  The level-id parameter is changeable as long as the
            highest level indicated by the answer is either equal to
            or lower than that in the offer. Note that the highest
            level is indicated by level-id and max-recv-level-id
            together.

         In SDP offer/answer, when the SDP answer does include a
         recv-sub-layer-id parameter with a value less than that of the
         sprop-sub-layer-id parameter in the SDP offer, the set of
         profile-space, tier-flag, profile-id,
         profile-compatibility-indicator, interop-constraints, and
         level-id parameters included in the answer MUST be consistent
         with that for the chosen sub-layer representation as indicated
         in the SDP offer, with the exception that the level-id
         parameter in the SDP answer is changeable as long as the
         highest level indicated by the answer is either lower than or
         equal to that in the offer.

         Further specifications of these parameters, including how they
         relate to the values of the profile, tier, and level syntax
         elements specified in [HEVC], are provided below.

      profile-space, profile-id:

         The value of profile-space MUST be in the range of 0 to 3,
         inclusive. The value of profile-id MUST be in the range of 0
         to 31, inclusive.

         When profile-space is not present, a value of 0 MUST be
         inferred. When profile-id is not present, a value of 1 (i.e.,
         the Main profile) MUST be inferred.
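Because inferred values count the same as explicitly signalled ones, an answer that omits profile-id is symmetric with an offer that carries profile-id=1. The sketch below shows this for profile-space and profile-id only. The helper names are hypothetical, the fmtp string is assumed to be a semicolon-separated list of name=value pairs, and the other profile-related parameters would be compared the same way once their own defaults are applied.

```python
from typing import Dict, Tuple


def parse_fmtp(params: str) -> Dict[str, str]:
    """Parse an fmtp parameter string such as 'profile-id=1; level-id=93'."""
    result: Dict[str, str] = {}
    for item in params.split(";"):
        name, _, value = item.strip().partition("=")
        if name:
            result[name.strip()] = value.strip()
    return result


def profile_of(params: Dict[str, str]) -> Tuple[int, int]:
    """Return (profile-space, profile-id), applying the inferred defaults."""
    space = int(params.get("profile-space", "0"))  # 0 when absent
    pid = int(params.get("profile-id", "1"))       # 1 (Main) when absent
    if not 0 <= space <= 3:
        raise ValueError("profile-space must be in the range 0..3")
    if not 0 <= pid <= 31:
        raise ValueError("profile-id must be in the range 0..31")
    return space, pid


def profile_symmetric(offer_fmtp: str, answer_fmtp: str) -> bool:
    """True if offer and answer agree on profile-space and profile-id."""
    return profile_of(parse_fmtp(offer_fmtp)) == profile_of(parse_fmtp(answer_fmtp))
```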
         When used to indicate properties of a bitstream, profile-space
         and profile-id are derived from the profile, tier, and level
         syntax elements in SPS or VPS NAL units as follows, where
         general_profile_space, general_profile_idc,
         sub_layer_profile_space[j], and sub_layer_profile_idc[j] are
         specified in [HEVC]:

         If the RTP stream is the highest RTP stream, the following
         applies:

         o  profile-space = general_profile_space
         o  profile-id = general_profile_idc

         Otherwise (the RTP stream is a dependee RTP stream), the
         following applies, with j being the value of the
         sprop-sub-layer-id parameter:

         o  profile-space = sub_layer_profile_space[j]
         o  profile-id = sub_layer_profile_idc[j]

      tier-flag, level-id:

         The value of tier-flag MUST be in the range of 0 to 1,
         inclusive. The value of level-id MUST be in the range of 0
         to 255, inclusive.

         If the tier-flag and level-id parameters are used to indicate
         properties of a bitstream, they indicate the tier and the
         highest level the bitstream complies with.

         If the tier-flag and level-id parameters are used for
         capability exchange, the following applies. If
         max-recv-level-id is not present, the default level defined by
         level-id indicates the highest level the codec wishes to
         support. Otherwise, max-recv-level-id indicates the highest
         level the codec supports for receiving. For either receiving
         or sending, all levels that are lower than the highest level
         supported MUST also be supported.

         If no tier-flag is present, a value of 0 MUST be inferred, and
         if no level-id is present, a value of 93 (i.e., level 3.1) MUST
         be inferred.
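The level-id default can be checked with a little arithmetic: in [HEVC], the level syntax element equals 30 times the level number, so level-id 93 corresponds to level 3.1 and level-id 120 to level 4. The following sketch, with hypothetical helper names and fmtp parameters assumed to be already parsed into a dictionary, applies the tier-flag and level-id defaults and the rule that max-recv-level-id, when present, determines the highest receivable level, with all lower levels also supported.

```python
from typing import Dict

DEFAULT_TIER_FLAG = 0  # inferred when tier-flag is absent
DEFAULT_LEVEL_ID = 93  # inferred when level-id is absent (level 3.1)


def level_name(level_id: int) -> str:
    """Map a level-id value (30 times the level number) to a level name."""
    major, rest = divmod(level_id, 30)
    minor = rest // 3
    return f"{major}.{minor}" if minor else str(major)


def tier_of(params: Dict[str, str]) -> int:
    tier = int(params.get("tier-flag", DEFAULT_TIER_FLAG))
    if tier not in (0, 1):
        raise ValueError("tier-flag must be 0 or 1")
    return tier


def highest_recv_level_id(params: Dict[str, str]) -> int:
    """max-recv-level-id if present, otherwise the (possibly inferred) level-id."""
    level_id = int(params.get("level-id", DEFAULT_LEVEL_ID))
    if not 0 <= level_id <= 255:
        raise ValueError("level-id must be in the range 0..255")
    if "max-recv-level-id" in params:
        return int(params["max-recv-level-id"])
    return level_id


def can_receive(params: Dict[str, str], stream_level_id: int) -> bool:
    """All levels below the highest supported level must also be supported."""
    return stream_level_id <= highest_recv_level_id(params)
```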

         When used to indicate properties of a bitstream, the tier-flag
         and level-id parameters are derived from the profile, tier,
         and level syntax elements in SPS or VPS NAL units as follows,
         where general_tier_flag, general_level_idc,
         sub_layer_tier_flag[j], and sub_layer_level_idc[j] are
         specified in [HEVC]:

         If the RTP stream is the highest RTP stream, the following
         applies:

         o  tier-flag = general_tier_flag
         o  level-id = general_level_idc

         Otherwise (the RTP stream is a dependee RTP stream), the
         following applies, with j being the value of the
         sprop-sub-layer-id parameter:

         o  tier-flag = sub_layer_tier_flag[j]
         o  level-id = sub_layer_level_idc[j]

      interop-constraints:

         A base16 [RFC4648] (hexadecimal) representation of six bytes
         of data, consisting of progressive_source_flag,
         interlaced_source_flag, non_packed_constraint_flag,
         frame_only_constraint_flag, and reserved_zero_44bits.

         If the interop-constraints parameter is not present, the
         following MUST be inferred:

         o  progressive_source_flag = 1
         o  interlaced_source_flag = 0
         o  non_packed_constraint_flag = 1
         o  frame_only_constraint_flag = 1
         o  reserved_zero_44bits = 0

         When the interop-constraints parameter is used to indicate
         properties of a bitstream, the following applies, where
         general_progressive_source_flag,
         general_interlaced_source_flag,
         general_non_packed_constraint_flag,
         general_frame_only_constraint_flag,
         general_reserved_zero_44bits,
         sub_layer_progressive_source_flag[j],
         sub_layer_interlaced_source_flag[j],
         sub_layer_non_packed_constraint_flag[j],
         sub_layer_frame_only_constraint_flag[j], and
         sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:

         If the RTP stream is the highest RTP stream, the following
         applies:

         o  progressive_source_flag = general_progressive_source_flag
         o  interlaced_source_flag = general_interlaced_source_flag
         o  non_packed_constraint_flag =
            general_non_packed_constraint_flag
         o  frame_only_constraint_flag =
            general_frame_only_constraint_flag
         o  reserved_zero_44bits = general_reserved_zero_44bits

         Otherwise (the RTP stream is a dependee RTP stream), the
         following applies, with j being the value of the
         sprop-sub-layer-id parameter:

         o  progressive_source_flag =
            sub_layer_progressive_source_flag[j]
         o  interlaced_source_flag =
            sub_layer_interlaced_source_flag[j]
         o  non_packed_constraint_flag =
            sub_layer_non_packed_constraint_flag[j]
         o  frame_only_constraint_flag =
            sub_layer_frame_only_constraint_flag[j]
         o  reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]

         Using interop-constraints for capability exchange results in a
         requirement on any bitstream to be compliant with the
         interop-constraints.

      profile-compatibility-indicator:

         A base16 [RFC4648] representation of four bytes of data.

         When profile-compatibility-indicator is used to indicate
         properties of a bitstream, the following applies, where
         general_profile_compatibility_flag[j] and
         sub_layer_profile_compatibility_flag[i][j] are specified in
         [HEVC]:

         In this case, profile-compatibility-indicator indicates the
         profiles, in addition to the profile defined by profile-space,
         profile-id, and interop-constraints, to which the bitstream
         conforms.  A decoder that conforms to any of the profiles the
         bitstream conforms to would be capable of decoding the
         bitstream.  These additional profiles are defined by
         profile-space, each set bit of profile-compatibility-indicator,
         and interop-constraints.
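For the highest RTP stream, the parameters described so far all map onto the first 12 bytes of the first profile_tier_level( ) syntax structure, from general_profile_space through general_level_idc. The Python sketch below, with hypothetical function names, derives the media type parameter values from those 12 bytes. It assumes the bytes have already been extracted from a VPS or SPS with emulation prevention bytes removed, and it assumes that bit j of profile-compatibility-indicator is counted from the most significant bit of the first byte, i.e., in the bitstream order of general_profile_compatibility_flag[j]. Under the same layout, the inferred interop-constraints defaults above correspond to the hexadecimal string B00000000000.

```python
def params_from_general_ptl(ptl: bytes) -> dict:
    """Derive media type parameters from the 12 general PTL bytes.

    ptl: general_profile_space .. general_level_idc, inclusive, as an
    RBSP fragment (emulation prevention bytes already removed).
    """
    if len(ptl) < 12:
        raise ValueError("need the 12 bytes of the general PTL fields")
    first = ptl[0]
    return {
        "profile-space": first >> 6,          # u(2)
        "tier-flag": (first >> 5) & 0x01,     # u(1)
        "profile-id": first & 0x1F,           # u(5)
        # 32 compatibility flags; flag[0] assumed in the top bit
        "profile-compatibility-indicator": ptl[1:5].hex().upper(),
        # 4 source/constraint flags followed by 44 reserved bits
        "interop-constraints": ptl[5:11].hex().upper(),
        "level-id": ptl[11],                  # u(8)
    }


def default_compatibility_indicator(profile_id: int) -> str:
    """Default when the parameter is absent: only bit profile-id set."""
    return format(1 << (31 - profile_id), "08X")
```

For example, the fragment 01 60000000 B00000000000 5D (profile-id 1, Main tier, compatibility flags 1 and 2 set, level 3.1) yields profile-space=0, tier-flag=0, profile-id=1, profile-compatibility-indicator=60000000, interop-constraints=B00000000000, and level-id=93.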

         If the RTP stream is the highest RTP stream, the following
         applies for each value of j in the range of 0 to 31,
         inclusive:

         o  bit j of profile-compatibility-indicator =
            general_profile_compatibility_flag[j]

         Otherwise (the RTP stream is a dependee RTP stream), the
         following applies for i equal to sprop-sub-layer-id and for
         each value of j in the range of 0 to 31, inclusive:

         o  bit j of profile-compatibility-indicator =
            sub_layer_profile_compatibility_flag[i][j]

         Using profile-compatibility-indicator for capability exchange
         results in a requirement on any bitstream to be compliant with
         the profile-compatibility-indicator.  This is intended to
         handle cases where any future HEVC profile is defined as an
         intersection of two or more profiles.

         If this parameter is not present, this parameter defaults to
         the following: bit j, with j equal to profile-id, of
         profile-compatibility-indicator is inferred to be equal to 1,
         and all other bits are inferred to be equal to 0.

      sprop-sub-layer-id:

         This parameter MAY be used to indicate the highest allowed
         value of TID in the bitstream.  When not present, the value of
         sprop-sub-layer-id is inferred to be equal to 6.

         The value of sprop-sub-layer-id MUST be in the range of 0
         to 6, inclusive.

      recv-sub-layer-id:

         This parameter MAY be used to signal a receiver's choice of
         the offered or declared sub-layer representations in the
         sprop-vps.  The value of recv-sub-layer-id indicates the TID
         of the highest sub-layer of the bitstream that a receiver
         supports.  When not present, the value of recv-sub-layer-id is
         inferred to be equal to the value of the sprop-sub-layer-id
         parameter in the SDP offer.

         The value of recv-sub-layer-id MUST be in the range of 0 to 6,
         inclusive.

      max-recv-level-id:

         This parameter MAY be used to indicate the highest level a
         receiver supports.
         The highest level the receiver supports is equal to the value
         of max-recv-level-id divided by 30.

         The value of max-recv-level-id MUST be in the range of 0
         to 255, inclusive.

         When max-recv-level-id is not present, the value is inferred
         to be equal to level-id.

         max-recv-level-id MUST NOT be present when the highest level
         the receiver supports is not higher than the default level.

      tx-mode:

         This parameter indicates whether the transmission mode is SSM
         or MSM.

         The value of tx-mode MUST be equal to either "MSM" or "SSM".
         When not present, the value of tx-mode is inferred to be equal
         to "SSM".

         If the value is equal to "MSM", MSM MUST be in use.  Otherwise
         (the value is equal to "SSM"), SSM MUST be in use.

         The value of tx-mode MUST be equal to "MSM" for all RTP
         sessions in an MSM.

      sprop-vps:

         This parameter MAY be used to convey any video parameter set
         NAL unit of the bitstream for out-of-band transmission of
         video parameter sets.  The parameter MAY also be used for
         capability exchange and to indicate sub-stream
         characteristics (i.e., properties of sub-layer
         representations as defined in [HEVC]).  The value of the
         parameter is a comma-separated (',') list of base64 [RFC4648]
         representations of the video parameter set NAL units as
         specified in Section 7.3.2.1 of [HEVC].

         The sprop-vps parameter MAY contain one or more video
         parameter set NAL units.  However, all video parameter sets
         in the sprop-vps parameter other than the first one MUST be
         consistent with the first video parameter set in the
         sprop-vps parameter.
         A video parameter set vpsB is said to be consistent with
         another video parameter set vpsA if any decoder that conforms
         to the profile, tier, level, and constraints indicated by the
         12 bytes of data starting from the syntax element
         general_profile_space to the syntax element general_level_idc,
         inclusive, in the first profile_tier_level( ) syntax structure
         in vpsA can decode any bitstream that conforms to the profile,
         tier, level, and constraints indicated by the 12 bytes of data
         starting from the syntax element general_profile_space to the
         syntax element general_level_idc, inclusive, in the first
         profile_tier_level( ) syntax structure in vpsB.

      sprop-sps:

         This parameter MAY be used to convey sequence parameter set
         NAL units of the bitstream for out-of-band transmission of
         sequence parameter sets.  The value of the parameter is a
         comma-separated (',') list of base64 [RFC4648]
         representations of the sequence parameter set NAL units as
         specified in Section 7.3.2.2 of [HEVC].

      sprop-pps:

         This parameter MAY be used to convey picture parameter set NAL
         units of the bitstream for out-of-band transmission of picture
         parameter sets.  The value of the parameter is a
         comma-separated (',') list of base64 [RFC4648] representations
         of the picture parameter set NAL units as specified in Section
         7.3.2.3 of [HEVC].

      sprop-sei:

         This parameter MAY be used to convey one or more SEI messages
         that describe bitstream characteristics.  When present, a
         decoder can rely on the bitstream characteristics that are
         described in the SEI messages for the entire duration of the
         session, independently of the persistence scopes of the SEI
         messages as specified in [HEVC].

         The value of the parameter is a comma-separated (',') list of
         base64 [RFC4648] representations of SEI NAL units as specified
         in Section 7.3.2.4 of [HEVC].
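The sprop-vps, sprop-sps, sprop-pps, and sprop-sei values share one format, so a receiver can decode them the same way. The Python sketch below (function names are hypothetical) splits a value at the commas, base64-decodes each entry, and reads the two-byte HEVC NAL unit header to check which kind of NAL unit was conveyed: nal_unit_type values 32, 33, and 34 are VPS, SPS, and PPS, and 39 and 40 are prefix and suffix SEI NAL units. The sample string used below is only the first three bytes of a VPS, not a complete parameter set.

```python
import base64
import binascii

NAL_TYPE_NAMES = {32: "VPS", 33: "SPS", 34: "PPS",
                  39: "PREFIX_SEI", 40: "SUFFIX_SEI"}


def decode_sprop(value: str) -> list:
    """Split a sprop-* value at ',' and base64-decode every entry."""
    nal_units = []
    for entry in value.split(","):
        try:
            nal_units.append(base64.b64decode(entry, validate=True))
        except binascii.Error as exc:
            raise ValueError(f"invalid base64 entry {entry!r}") from exc
    return nal_units


def nal_header(nal: bytes) -> tuple:
    """Return (nal_unit_type, nuh_layer_id, TID) of a NAL unit.

    Header layout: F(1) | Type(6) | LayerId(6) | TID(3), where TID is
    nuh_temporal_id_plus1.
    """
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    nal_type = (nal[0] >> 1) & 0x3F
    layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)
    tid = nal[1] & 0x07
    return nal_type, layer_id, tid
```

Because the base64 alphabet contains no comma, splitting at ',' cannot cut an entry in half; for example, decoding "QAEM" gives the bytes 40 01 0C, whose header identifies a VPS with LayerId 0 and TID 1.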

            Informative note: Intentionally, no list of applicable or
            inapplicable SEI messages is specified here.  Conveying
            certain SEI messages in sprop-sei may be sensible in some
            application scenarios and meaningless in others.  However,
            a few examples are described below:

            1) In an environment where the bitstream was created from
               film-based source material, and no splicing is going to
               occur during the lifetime of the session, the film grain
               characteristics SEI message or the tone mapping
               information SEI message are likely meaningful, and
               sending them in sprop-sei rather than in the bitstream
               at each entry point may help save bits and allows the
               renderer to be configured only once, avoiding unwanted
               artifacts.
            2) The structure of pictures information SEI message in
               sprop-sei can be used to inform a decoder of the NAL
               unit types, picture order count values, and prediction
               dependencies of a sequence of pictures.  Having such
               knowledge can be helpful for error recovery.
            3) Examples of SEI messages that would be meaningless to
               convey in sprop-sei include the decoded picture hash SEI
               message (it is close to impossible that all decoded
               pictures have the same hash), the display orientation
               SEI message when the device is a handheld device (as the
               display orientation may change when the handheld device
               is turned around), or the filler payload SEI message (as
               there is no point in just having more bits in SDP).

      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

         These parameters MAY be used to signal the capabilities of a
         receiver implementation.  These parameters MUST NOT be used
         for any other purpose.  The receiver MUST be fully capable of
         supporting the highest level (specified by max-recv-level-id).
         max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc
         MAY be used to indicate capabilities of the receiver that
         extend the required capabilities of the highest level, as
         specified below.

         When more than one parameter from the set (max-lsr, max-lps,
         max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
         receiver MUST support all signaled capabilities
         simultaneously.  For example, if both max-lsr and max-br are
         present, the highest level with the extension of both the
         picture rate and bitrate is supported.  That is, the receiver
         is able to decode bitstreams in which the luma sample rate is
         up to max-lsr (inclusive), the bitrate is up to max-br
         (inclusive), the coded picture buffer size is derived as
         specified in the semantics of the max-br parameter below, and
         the other properties comply with the highest level specified
         by max-recv-level-id.

            Informative note: When the OPTIONAL media type parameters
            are used to signal the properties of a bitstream, and
            max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and
            max-tc are not present, the values of profile-space,
            tier-flag, profile-id, profile-compatibility-indicator,
            interop-constraints, and level-id must always be such that
            the bitstream complies fully with the specified profile,
            tier, and level.

      max-lsr:
         The value of max-lsr is an integer indicating the maximum
         processing rate in units of luma samples per second.  The
         max-lsr parameter signals that the receiver is capable of
         decoding video at a higher rate than is required by the
         highest level.

         When max-lsr is signaled, the receiver MUST be able to decode
         bitstreams that conform to the highest level, with the
         exception that the MaxLumaSR value in Table A-2 of [HEVC] for
         the highest level is replaced with the value of max-lsr.
         Senders MAY use this knowledge to send pictures of a given
         size at a higher picture rate than is indicated in the highest
         level.

         When not present, the value of max-lsr is inferred to be equal
         to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
         highest level.

         The value of max-lsr MUST be in the range of MaxLumaSR to
         16 * MaxLumaSR, inclusive, where MaxLumaSR is given in Table
         A-2 of [HEVC] for the highest level.

      max-lps:
         The value of max-lps is an integer indicating the maximum
         picture size in units of luma samples.  The max-lps parameter
         signals that the receiver is capable of decoding larger
         picture sizes than are required by the highest level.  When
         max-lps is signaled, the receiver MUST be able to decode
         bitstreams that conform to the highest level, with the
         exception that the MaxLumaPS value in Table A-1 of [HEVC] for
         the highest level is replaced with the value of max-lps.
         Senders MAY use this knowledge to send larger pictures at a
         proportionally lower picture rate than is indicated in the
         highest level.

         When not present, the value of max-lps is inferred to be equal
         to the value of MaxLumaPS given in Table A-1 of [HEVC] for the
         highest level.

         The value of max-lps MUST be in the range of MaxLumaPS to
         16 * MaxLumaPS, inclusive, where MaxLumaPS is given in Table
         A-1 of [HEVC] for the highest level.

      max-cpb:
         The value of max-cpb is an integer indicating the maximum
         coded picture buffer size in units of CpbBrVclFactor bits for
         the VCL HRD parameters and in units of CpbBrNalFactor bits for
         the NAL HRD parameters, where CpbBrVclFactor and
         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The
         max-cpb parameter signals that the receiver has more memory
         than the minimum amount of coded picture buffer memory
         required by the highest level.
         When max-cpb is signaled, the receiver MUST be able to decode
         bitstreams that conform to the highest level, with the
         exception that the MaxCPB value in Table A-1 of [HEVC] for the
         highest level is replaced with the value of max-cpb.  Senders
         MAY use this knowledge to construct coded bitstreams with
         greater variation of bitrate than can be achieved with the
         MaxCPB value in Table A-1 of [HEVC].

         When not present, the value of max-cpb is inferred to be equal
         to the value of MaxCPB given in Table A-1 of [HEVC] for the
         highest level.

         The value of max-cpb MUST be in the range of MaxCPB to
         16 * MaxCPB, inclusive, where MaxCPB is given in Table A-1 of
         [HEVC] for the highest level.

            Informative note: The coded picture buffer is used in the
            hypothetical reference decoder (Annex C of [HEVC]).  The
            use of the hypothetical reference decoder is recommended in
            HEVC encoders to verify that the produced bitstream
            conforms to the standard and to control the output
            bitrate.  Thus, the coded picture buffer is conceptually
            independent of any other potential buffers in the
            receiver, including de-packetization and de-jitter
            buffers.  The coded picture buffer need not be implemented
            in decoders as specified in Annex C of [HEVC]; rather,
            standard-compliant decoders can have any buffering
            arrangements provided that they can decode
            standard-compliant bitstreams.  Thus, in practice, the
            input buffer for a video decoder can be integrated with the
            de-packetization and de-jitter buffers of the receiver.

      max-dpb:
         The value of max-dpb is an integer indicating the maximum
         decoded picture buffer size in units of decoded pictures at
         the MaxLumaPS for the highest level, i.e., the number of
         decoded pictures at the maximum picture size defined by the
         highest level.  The value of max-dpb MUST be in the range of 1
         to 16, inclusive.
         The max-dpb parameter signals that the receiver has more
         memory than the minimum amount of decoded picture buffer
         memory required by default, which is MaxDpbPicBuf as defined
         in [HEVC] (equal to 6).  When max-dpb is signaled, the
         receiver MUST be able to decode bitstreams that conform to the
         highest level, with the exception that the MaxDpbPicBuf value
         defined in [HEVC] as 6 is replaced with the value of max-dpb.
         Consequently, a receiver that signals max-dpb MUST be capable
         of storing the following number of decoded pictures
         (MaxDpbSize) in its decoded picture buffer:

            if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
              MaxDpbSize = Min( 4 * max-dpb, 16 )
            else if( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
              MaxDpbSize = Min( 2 * max-dpb, 16 )
            else if( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
              MaxDpbSize = Min( ( 4 * max-dpb ) / 3, 16 )
            else
              MaxDpbSize = max-dpb

         where MaxLumaPS is given in Table A-1 of [HEVC] for the
         highest level, PicSizeInSamplesY is the current size of each
         decoded picture in units of luma samples as defined in
         [HEVC], and "/" denotes integer division with truncation, as
         in [HEVC].

         The value of max-dpb MUST be greater than or equal to the
         value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].  Senders
         MAY use this knowledge to construct coded bitstreams with
         improved compression.

         When not present, the value of max-dpb is inferred to be equal
         to the value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].

            Informative note: This parameter was added primarily to
            complement a similar codepoint in the ITU-T Recommendation
            H.245, so as to facilitate signaling gateway designs.  The
            decoded picture buffer stores reconstructed samples.  There
            is no relationship between the size of the decoded picture
            buffer and the buffers used in RTP, especially
            de-packetization and de-jitter buffers.
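The MaxDpbSize derivation above translates directly into code. The following Python sketch (the function name is hypothetical) uses integer division for "/" and enforces the combined constraint that max-dpb lies between MaxDpbPicBuf (6) and 16. The usage lines assume MaxLumaPS = 2228224, the Level 4 value from Table A-1 of [HEVC].

```python
MAX_DPB_PIC_BUF = 6  # MaxDpbPicBuf in [HEVC]; default for max-dpb


def max_dpb_size(pic_size_in_samples_y: int, max_luma_ps: int,
                 max_dpb: int = MAX_DPB_PIC_BUF) -> int:
    """Number of decoded pictures the receiver's DPB must hold."""
    if not MAX_DPB_PIC_BUF <= max_dpb <= 16:
        raise ValueError("max-dpb must be in the range 6..16")
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb) // 3, 16)
    return max_dpb


LEVEL_4_MAX_LUMA_PS = 2228224
print(max_dpb_size(1920 * 1088, LEVEL_4_MAX_LUMA_PS))     # 6
print(max_dpb_size(1280 * 720, LEVEL_4_MAX_LUMA_PS))      # 12
print(max_dpb_size(1280 * 720, LEVEL_4_MAX_LUMA_PS, 8))   # 16
```

Near-maximum picture sizes gain exactly max-dpb pictures, while smaller pictures scale up until the cap of 16 is reached, which is why signaling max-dpb=8 brings 720p pictures at Level 4 from 12 to the cap.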

      max-br:
         The value of max-br is an integer indicating the maximum video
         bitrate in units of CpbBrVclFactor bits per second for the VCL
         HRD parameters and in units of CpbBrNalFactor bits per second
         for the NAL HRD parameters, where CpbBrVclFactor and
         CpbBrNalFactor are defined in Section A.4 of [HEVC].

         The max-br parameter signals that the video decoder of the
         receiver is capable of decoding video at a higher bitrate than
         is required by the highest level.

         When max-br is signaled, the video codec of the receiver MUST
         be able to decode bitstreams that conform to the highest
         level, with the following exceptions in the limits specified
         by the highest level:

         o  The value of max-br replaces the MaxBR value in Table A-2
            of [HEVC] for the highest level.
         o  When the max-cpb parameter is not present, the result of
            the following formula replaces the value of MaxCPB in Table
            A-1 of [HEVC]:

               (MaxCPB of the highest level) * max-br / (MaxBR of the
               highest level)

         For example, if a receiver signals capability for Main profile
         Level 2 with max-br equal to 2000, this indicates a maximum
         video bitrate of 2000 kbits/sec for VCL HRD parameters, a
         maximum video bitrate of 2200 kbits/sec for NAL HRD
         parameters, and a CPB size of 2000000 bits for VCL HRD
         parameters (2000000 / 1500000 * 1500000).

         Senders MAY use this knowledge to send higher bitrate video as
         allowed in the level definition of Annex A of [HEVC] to
         achieve improved video quality.

         When not present, the value of max-br is inferred to be equal
         to the value of MaxBR given in Table A-2 of [HEVC] for the
         highest level.

         The value of max-br MUST be in the range of MaxBR to
         16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of
         [HEVC] for the highest level.
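The Level 2 example above can be reproduced with a short Python sketch (the function name is hypothetical). It assumes CpbBrVclFactor = 1000 and CpbBrNalFactor = 1100, the values that yield the 2000 and 2200 kbits/sec figures in the example, and takes MaxBR = MaxCPB = 1500 for Level 2, as implied by the 1500000 figures in the example. Rounding the scaled CPB size down to an integer is an assumption of this sketch; the formula above does not specify rounding.

```python
from typing import Optional


def max_br_limits(max_br: int, level_max_br: int, level_max_cpb: int,
                  max_cpb: Optional[int] = None, vcl_factor: int = 1000,
                  nal_factor: int = 1100) -> dict:
    """Bitrates (bits/s) and CPB sizes (bits) implied by max-br."""
    if not level_max_br <= max_br <= 16 * level_max_br:
        raise ValueError("max-br must be in the range MaxBR..16*MaxBR")
    if max_cpb is None:
        # max-cpb absent: scale MaxCPB by max-br / MaxBR (rounded down).
        max_cpb = level_max_cpb * max_br // level_max_br
    return {
        "vcl_bitrate": max_br * vcl_factor,
        "nal_bitrate": max_br * nal_factor,
        "vcl_cpb_size": max_cpb * vcl_factor,
        "nal_cpb_size": max_cpb * nal_factor,
    }


print(max_br_limits(2000, level_max_br=1500, level_max_cpb=1500))
```

For the example values this gives 2000000 bits/s (VCL) and 2200000 bits/s (NAL), with a VCL CPB size of 2000000 bits, matching the figures in the text; the NAL CPB size scales in the same way to 2200000 bits.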

            Informative note: This parameter was added primarily to
            complement a similar codepoint in the ITU-T Recommendation
            H.245, so as to facilitate signaling gateway designs.  The
            assumption that the network is capable of handling such
            bitrates at any given time cannot be made from the value of
            this parameter.  In particular, no conclusion can be drawn
            that the signaled bitrate is possible under congestion
            control constraints.

      max-tr:
         The value of max-tr is an integer indicating the maximum
         number of tile rows.  The max-tr parameter signals that the
         receiver is capable of decoding video with a larger number of
         tile rows than the value allowed by the highest level.

         When max-tr is signaled, the receiver MUST be able to decode
         bitstreams that conform to the highest level, with the
         exception that the MaxTileRows value in Table A-1 of [HEVC]
         for the highest level is replaced with the value of max-tr.

         Senders MAY use this knowledge to send pictures utilizing a
         larger number of tile rows than the value allowed by the
         highest level.

         When not present, the value of max-tr is inferred to be equal
         to the value of MaxTileRows given in Table A-1 of [HEVC] for
         the highest level.

         The value of max-tr MUST be in the range of MaxTileRows to
         16 * MaxTileRows, inclusive, where MaxTileRows is given in
         Table A-1 of [HEVC] for the highest level.

      max-tc:
         The value of max-tc is an integer indicating the maximum
         number of tile columns.  The max-tc parameter signals that the
         receiver is capable of decoding video with a larger number of
         tile columns than the value allowed by the highest level.

         When max-tc is signaled, the receiver MUST be able to decode
         bitstreams that conform to the highest level, with the
         exception that the MaxTileCols value in Table A-1 of [HEVC]
         for the highest level is replaced with the value of max-tc.

         Senders MAY use this knowledge to send pictures utilizing a
         larger number of tile columns than the value allowed by the
         highest level.

         When not present, the value of max-tc is inferred to be equal
         to the value of MaxTileCols given in Table A-1 of [HEVC] for
         the highest level.

         The value of max-tc MUST be in the range of MaxTileCols to
         16 * MaxTileCols, inclusive, where MaxTileCols is given in
         Table A-1 of [HEVC] for the highest level.

      max-fps:

         The value of max-fps is an integer indicating the maximum
         picture rate in units of pictures per 100 seconds that can be
         effectively processed by the receiver.  The max-fps parameter
         MAY be used to signal that the receiver has a constraint in
         that it is not capable of processing video effectively at the
         full picture rate that is implied by the highest level and,
         when present, one or more of the parameters max-lsr, max-lps,
         and max-br.

         The value of max-fps is not necessarily the picture rate at
         which the maximum picture size can be sent; it constitutes a
         constraint on the maximum picture rate for all resolutions.

            Informative note: The max-fps parameter is semantically
            different from max-lsr, max-lps, max-cpb, max-dpb, max-br,
            max-tr, and max-tc in that max-fps is used to signal a
            constraint, lowering the maximum picture rate from what is
            implied by other parameters.

         The encoder MUST use a picture rate equal to or less than this
         value.  In cases where the max-fps parameter is absent, the
         encoder is free to choose any picture rate according to the
         highest level and any signaled optional parameters.

         The value of max-fps MUST be smaller than or equal to the full
         picture rate that is implied by the highest level and, when
         present, one or more of the parameters max-lsr, max-lps, and
         max-br.
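To make the interplay concrete, the Python sketch below (hypothetical names) applies the default and range rules shared by max-lsr, max-lps, max-br, max-tr, and max-tc, then derives the highest picture rate a sender may use for a given picture size, honoring max-fps (pictures per 100 seconds) when present. max-cpb and max-dpb are left out because their defaults and ranges follow the separate rules given above. The level limits in the usage lines are the Level 3.1 (Main tier) values from Tables A-1 and A-2 of [HEVC] and should be checked against the edition in use.

```python
# Media type parameter -> level limit it extends (Tables A-1/A-2).
EXTENSIONS = {"max-lsr": "MaxLumaSr", "max-lps": "MaxLumaPs",
              "max-br": "MaxBR", "max-tr": "MaxTileRows",
              "max-tc": "MaxTileCols"}


def effective_limits(level_limits: dict, fmtp: dict) -> dict:
    """Apply defaults and the [limit, 16 * limit] range check."""
    limits = {}
    for param, key in EXTENSIONS.items():
        base = level_limits[key]
        value = int(fmtp.get(param, base))  # default: the level's value
        if not base <= value <= 16 * base:
            raise ValueError(f"{param}={value} outside {base}..{16 * base}")
        limits[param] = value
    return limits


def max_picture_rate(limits: dict, pic_size: int, fmtp: dict) -> float:
    """Highest picture rate (pictures/s) for pictures of pic_size."""
    if pic_size > limits["max-lps"]:
        raise ValueError("picture larger than max-lps")
    rate = limits["max-lsr"] / pic_size
    if "max-fps" in fmtp:                   # pictures per 100 seconds
        rate = min(rate, int(fmtp["max-fps"]) / 100)
    return rate


LEVEL_3_1 = {"MaxLumaSr": 33177600, "MaxLumaPs": 983040,
             "MaxBR": 10000, "MaxTileRows": 3, "MaxTileCols": 3}
limits = effective_limits(LEVEL_3_1, {})
print(max_picture_rate(limits, 1280 * 720, {}))                   # 36.0
print(max_picture_rate(limits, 1280 * 720, {"max-fps": "3000"}))  # 30.0
```

The second call shows max-fps acting as a constraint: the level alone would permit 36 pictures per second for 720p, but max-fps=3000 limits the sender to 30.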

      sprop-max-don-diff:

         The value of this parameter MUST be equal to 0 if the RTP
         stream does not depend on other RTP streams and there is no
         NAL unit naluA that is followed in transmission order by any
         NAL unit preceding naluA in decoding order.  Otherwise, this
         parameter specifies the maximum absolute difference between
         the decoding order number (i.e., AbsDon) values of any two NAL
         units naluA and naluB, where naluA follows naluB in decoding
         order and precedes naluB in transmission order.

         The value of sprop-max-don-diff MUST be an integer in the
         range of 0 to 32767, inclusive.

         When not present, the value of sprop-max-don-diff is inferred
         to be equal to 0.

         When the RTP stream depends on one or more other RTP streams
         (in this case tx-mode MUST be equal to "MSM" and MSM is in
         use), this parameter MUST be present and the value MUST be
         greater than 0.

            Informative note: When the RTP stream does not depend on
            other RTP streams, either MSM or SSM may be in use.

      sprop-depack-buf-nalus:

         This parameter specifies the maximum number of NAL units that
         precede a NAL unit in transmission order and follow the NAL
         unit in decoding order.

         The value of sprop-depack-buf-nalus MUST be an integer in the
         range of 0 to 32767, inclusive.

         When not present, the value of sprop-depack-buf-nalus is
         inferred to be equal to 0.

         When the RTP stream depends on one or more other RTP streams
         (in this case tx-mode MUST be equal to "MSM" and MSM is in
         use), this parameter MUST be present and the value MUST be
         greater than 0.

      sprop-depack-buf-bytes:

         This parameter signals the required size of the
         de-packetization buffer in units of bytes.  The value of the
         parameter MUST be greater than or equal to the maximum buffer
         occupancy (in units of bytes) of the de-packetization buffer
         as specified in Section 6.

         The value of sprop-depack-buf-bytes MUST be an integer in the
         range of 0 to 4294967295, inclusive.

         When the RTP stream depends on one or more other RTP streams
         (in this case tx-mode MUST be equal to "MSM" and MSM is in
         use) or sprop-max-don-diff is present and greater than 0, this
         parameter MUST be present and the value MUST be greater than
         0.

            Informative note: The value of sprop-depack-buf-bytes
            indicates the required size of the de-packetization buffer
            only.  When network jitter can occur, an appropriately
            sized jitter buffer has to be available as well.

      depack-buf-cap:

         This parameter signals the capabilities of a receiver
         implementation and indicates the amount of de-packetization
         buffer space in units of bytes that the receiver has available
         for reconstructing the NAL unit decoding order from NAL units
         carried in one or more RTP streams.  A receiver is able to
         handle any RTP stream, and all RTP streams the RTP stream
         depends on, when present, for which the value of the
         sprop-depack-buf-bytes parameter is smaller than or equal to
         this parameter.

         When not present, the value of depack-buf-cap is inferred to
         be equal to 4294967295.  The value of depack-buf-cap MUST be
         an integer in the range of 1 to 4294967295, inclusive.

            Informative note: depack-buf-cap indicates the maximum
            possible size of the de-packetization buffer of the
            receiver only.  When network jitter can occur, an
            appropriately sized jitter buffer has to be available as
            well.

      sprop-segmentation-id:

         This parameter MAY be used to signal the segmentation tools
         present in the bitstream and that can be used for
         parallelization.  The value of sprop-segmentation-id MUST be
         an integer in the range of 0 to 3, inclusive.  When not
         present, the value of sprop-segmentation-id is inferred to be
         equal to 0.

         When sprop-segmentation-id is equal to 0, no information about
         the segmentation tools is provided.  When
         sprop-segmentation-id is equal to 1, it indicates that slices
         are present in the bitstream.  When sprop-segmentation-id is
         equal to 2, it indicates that tiles are present in the
         bitstream.  When sprop-segmentation-id is equal to 3, it
         indicates that WPP is used in the bitstream.

      sprop-spatial-segmentation-idc:

         A base16 [RFC4648] representation of the syntax element
         min_spatial_segmentation_idc as specified in [HEVC].  This
         parameter MAY be used to describe parallelization capabilities
         of the bitstream.

      dec-parallel-cap:

         This parameter MAY be used to indicate the decoder's
         additional decoding capabilities given the presence of tools
         enabling parallel decoding, such as slices, tiles, and WPP, in
         the bitstream.  The decoding capability of the decoder may
         vary with the setting of the parallel decoding tools present
         in the bitstream, e.g., the size of the tiles that are present
         in a bitstream.  Therefore, multiple capability points may be
         provided, each indicating the minimum required decoding
         capability that is associated with a parallelism requirement,
         which is a requirement on the bitstream that enables parallel
         decoding.

         Each capability point is defined as a combination of 1) a
         parallelism requirement, 2) a profile (determined by
         profile-space and profile-id), 3) a highest level, and 4) a
         maximum processing rate, a maximum picture size, and a maximum
         video bitrate that may be equal to or greater than that
         determined by the highest level.
         The parameter's syntax in ABNF [RFC5234] is as follows:

            dec-parallel-cap = "dec-parallel-cap={" cap-point
                               *("," cap-point) "}"

            cap-point        = ("w" / "t") ":" spatial-seg-idc
                               1*(";" cap-parameter)

            spatial-seg-idc  = 1*4DIGIT ; (1-4095)

            cap-parameter    = tier-flag / level-id / max-lsr
                               / max-lps / max-br

            tier-flag        = "tier-flag" EQ ("0" / "1")

            level-id         = "level-id" EQ 1*3DIGIT ; (0-255)

            max-lsr          = "max-lsr" EQ 1*20DIGIT
                               ; (0-18,446,744,073,709,551,615)

            max-lps          = "max-lps" EQ 1*10DIGIT
                               ; (0-4,294,967,295)

            max-br           = "max-br" EQ 1*20DIGIT
                               ; (0-18,446,744,073,709,551,615)

            EQ               = "="

         The set of capability points expressed by the dec-parallel-cap
         parameter is enclosed in a pair of curly braces ("{}").  Each
         set of two consecutive capability points is separated by a
         comma (',').  Within each capability point, each set of two
         consecutive parameters, and when present, their values, is
         separated by a semicolon (';').

         The profile of all capability points is determined by
         profile-space and profile-id that are outside the
         dec-parallel-cap parameter.

         Each capability point starts with an indication of the
         parallelism requirement, which consists of a parallel tool
         type, which may be equal to 'w' or 't', and a decimal value of
         the spatial-seg-idc parameter.  When the type is 'w', the
         capability point is valid only for H.265 bitstreams with WPP
         in use, i.e., entropy_coding_sync_enabled_flag equal to 1.
         When the type is 't', the capability point is valid only for
         H.265 bitstreams with WPP not in use (i.e.,
         entropy_coding_sync_enabled_flag equal to 0).  The capability
         point is valid only for H.265 bitstreams with
         min_spatial_segmentation_idc equal to or greater than
         spatial-seg-idc.
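Whether a capability point applies to a given bitstream thus depends only on the tool type and on min_spatial_segmentation_idc. The Python sketch below (hypothetical names) expresses that check, together with the bound that [HEVC] places on the size of any slice or tile when WPP is not used, namely 4 * PicSizeInSamplesY / (min_spatial_segmentation_idc + 4) luma samples. For spatial-seg-idc equal to 8, this bound is one third of the picture size.

```python
def cap_point_applies(tool_type: str, spatial_seg_idc: int,
                      entropy_coding_sync_enabled_flag: int,
                      min_spatial_segmentation_idc: int) -> bool:
    """True if a 'w'/'t' capability point is valid for the bitstream."""
    if tool_type == "w":
        wpp_ok = entropy_coding_sync_enabled_flag == 1
    elif tool_type == "t":
        wpp_ok = entropy_coding_sync_enabled_flag == 0
    else:
        raise ValueError("tool type must be 'w' or 't'")
    return wpp_ok and min_spatial_segmentation_idc >= spatial_seg_idc


def max_segment_luma_samples(pic_size_in_samples_y: int,
                             min_spatial_segmentation_idc: int) -> int:
    """Largest slice or tile (WPP not in use), in luma samples."""
    return (4 * pic_size_in_samples_y) // (min_spatial_segmentation_idc + 4)
```

For a 1920x1088 picture (2088960 luma samples) and min_spatial_segmentation_idc equal to 8, no slice or tile may exceed 696320 luma samples, so a 't:8' capability point covers that bitstream.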
         After the parallelism requirement indication, each capability
         point continues with one or more parameter=value pairs, in any
         order, for any of the following parameters:

         o  tier-flag
         o  level-id
         o  max-lsr
         o  max-lps
         o  max-br

         At most one occurrence of each of the above five parameters is
         allowed within each capability point.

         The values of dec-parallel-cap.tier-flag and
         dec-parallel-cap.level-id for a capability point indicate the
         tier and the highest level of the capability point. The values
         of dec-parallel-cap.max-lsr, dec-parallel-cap.max-lps, and
         dec-parallel-cap.max-br for a capability point indicate,
         respectively, the maximum processing rate in units of luma
         samples per second, the maximum picture size in units of luma
         samples, and the maximum video bitrate. The maximum video
         bitrate is in units of CpbBrVclFactor bits per second for the
         VCL HRD parameters and in units of CpbBrNalFactor bits per
         second for the NAL HRD parameters, where CpbBrVclFactor and
         CpbBrNalFactor are defined in Section A.4 of [HEVC].

         When not present, the value of dec-parallel-cap.tier-flag is
         inferred to be equal to the value of tier-flag outside the
         dec-parallel-cap parameter. When not present, the value of
         dec-parallel-cap.level-id is inferred to be equal to the value
         of max-recv-level-id outside the dec-parallel-cap parameter.
         When not present, the value of dec-parallel-cap.max-lsr,
         dec-parallel-cap.max-lps, or dec-parallel-cap.max-br is
         inferred to be equal to the value of max-lsr, max-lps, or
         max-br, respectively, outside the dec-parallel-cap parameter.
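         The inference rules above can be sketched in Python as
         follows. The helper names are hypothetical. Parameters absent
         both inside and outside dec-parallel-cap are simply left
         unresolved here. The within_limits() helper is a simplified
         illustration that compares only picture size and luma sample
         rate, not the full level limits of [HEVC].

```python
from typing import Dict, Mapping

# dec-parallel-cap parameter -> parameter outside dec-parallel-cap from
# which its value is inferred when it is absent.
_INFERRED_FROM = {
    "tier-flag": "tier-flag",
    "level-id": "max-recv-level-id",
    "max-lsr": "max-lsr",
    "max-lps": "max-lps",
    "max-br": "max-br",
}


def resolve_cap_point(cap_params: Mapping[str, int],
                      outer_params: Mapping[str, int]) -> Dict[str, int]:
    """Fill absent capability-point parameters from the outer ones.

    Parameters absent from both maps are omitted; deriving them from the
    level limits of [HEVC] is outside the scope of this sketch.
    """
    resolved = {}
    for name, outer_name in _INFERRED_FROM.items():
        if name in cap_params:
            resolved[name] = cap_params[name]
        elif outer_name in outer_params:
            resolved[name] = outer_params[outer_name]
    return resolved


def within_limits(resolved: Mapping[str, int], width: int, height: int,
                  fps: int) -> bool:
    """Simplified check of picture size (max-lps) and luma rate (max-lsr)."""
    pic_size = width * height
    if "max-lps" in resolved and pic_size > resolved["max-lps"]:
        return False
    if "max-lsr" in resolved and pic_size * fps > resolved["max-lsr"]:
        return False
    return True
```

         As a worked check of the units, a picture of 1920 x 1088 luma
         samples gives 1920 * 1088 = 2088960 luma samples per picture.
         At 30 pictures per second, this gives 2088960 * 30 = 62668800
         luma samples per second.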
         The general decoding capability, expressed by the set of
         parameters outside of dec-parallel-cap, is defined as the
         capability point determined by the following combination of
         parameters:

         1) the parallelism requirement corresponding to the value of
            sprop-segmentation-id equal to 0 for a bitstream;

         2) the profile determined by profile-space, profile-id,
            profile-compatibility-indicator, and interop-constraints;

         3) the tier and the highest level determined by tier-flag and
            max-recv-level-id; and

         4) the maximum processing rate, the maximum picture size, and
            the maximum video bitrate determined by the highest level.

         The general decoding capability MUST NOT be included as one of
         the capability points in the dec-parallel-cap parameter.

         For example, the following parameters express the general
         decoding capability of 720p30 (Level 3.1) plus an additional
         decoding capability of 1080p30 (Level 4). The additional
         capability applies given that the spatially largest tile or
         slice used in the bitstream is equal to or less than 1/3 of
         the picture size:

            a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

         As another example, the following parameters express an
         additional decoding capability of 1080p30, using
         dec-parallel-cap.max-lsr and dec-parallel-cap.max-lps, given
         that WPP is used in the bitstream:

            a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
            max-lsr=62668800;max-lps=2088960}

            Informative note: When min_spatial_segmentation_idc is
            present in a bitstream and WPP is not used, [HEVC]
            specifies that no slice or tile in the bitstream contains
            more than 4 * PicSizeInSamplesY /
            ( min_spatial_segmentation_idc + 4 ) luma samples.

      include-dph:

         This parameter is used to indicate the capability and
         preference to utilize or include decoded picture hash (DPH)
         SEI messages (see Section D.3.19 of [HEVC]) in the bitstream.
         DPH SEI messages can be used to detect picture corruption so
         that the receiver can request picture repair; see Section 8.
         The value is a comma-separated list of hash types that are
         supported or requested to be used. Each hash type is provided
         as an unsigned integer value (0-255), and the hash types are
         listed from most preferred to least preferred. For example,
         "include-dph=0,2" indicates the capability for MD5 (most
         preferred) and Checksum (less preferred). If the parameter is
         not included or the value contains no hash types, then no
         capability to utilize DPH SEI messages is assumed. Note that
         DPH SEI messages MAY still be included in the bitstream even
         when there is no declaration of capability to use them. In
         general, SEI messages do not affect the normative decoding
         process, and decoders are allowed to ignore SEI messages.

   Encoding considerations:

      This type is only defined for transfer via RTP (RFC 3550).

   Security considerations:

      See Section 9 of RFC XXXX.

   Public specification:

      Please refer to Section 13 of RFC XXXX.

   Additional information: None

   File extensions: none

   Macintosh file type code: none

   Object identifier or OID: none

   Person & email address to contact for further information:

      Ye-Kui Wang (yekuiw@qti.qualcomm.com).

   Intended usage: COMMON

   Author: See Section 14 of RFC XXXX.

   Change controller:

      IETF Audio/Video Transport Payloads working group delegated
      from the IESG.

7.2 SDP Parameters

   The receiver MUST ignore any parameter unspecified in this memo.

7.2.1 Mapping of Payload Type Parameters to SDP

   The media type video/H265 string is mapped to fields in the Session
   Description Protocol (SDP) [RFC4566] as follows:

   o  The media name in the "m=" line of SDP MUST be video.

   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
      media subtype).
   o  The clock rate in the "a=rtpmap" line MUST be 90000.

   o  The OPTIONAL parameters "profile-space", "profile-id",
      "tier-flag", "level-id", "interop-constraints",
      "profile-compatibility-indicator", "sprop-sub-layer-id",
      "recv-sub-layer-id", "max-recv-level-id", "tx-mode", "max-lsr",
      "max-lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
      "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus",
      "sprop-depack-buf-bytes", "depack-buf-cap",
      "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
      "dec-parallel-cap", and "include-dph", when present, MUST be
      included in the "a=fmtp" line of SDP. These parameters are
      expressed as a media type string, in the form of a
      semicolon-separated list of parameter=value pairs.

   o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and
      "sprop-pps", when present, MUST be included in the "a=fmtp" line
      of SDP or conveyed using the "fmtp" source attribute as specified
      in Section 6.3 of [RFC5576]. For a particular media format
      (i.e., RTP payload type), "sprop-vps", "sprop-sps", or
      "sprop-pps" MUST NOT both be included in the "a=fmtp" line of
      SDP and conveyed using the "fmtp" source attribute. When
      included in the "a=fmtp" line of SDP, these parameters are
      expressed as a media type string, in the form of a
      semicolon-separated list of parameter=value pairs. When conveyed
      in the "a=fmtp" line of SDP for a particular payload type, the
      parameters "sprop-vps", "sprop-sps", and "sprop-pps" MUST be
      applied to each SSRC with the payload type. When conveyed using
      the "fmtp" source attribute, these parameters are only
      associated with the given source and payload type as parts of
      the "fmtp" source attribute.
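   A receiver-side Python sketch of splitting an "a=fmtp" line into
   parameters follows. dec-parallel-cap uses ';' inside its braces, so
   a naive split on ';' would break that value apart. The sketch
   therefore tracks brace depth. It also drops parameters not named in
   this memo, as Section 7.2 requires, and interprets include-dph. The
   function and constant names are hypothetical. The hash type names
   follow the hash_type semantics of the decoded picture hash SEI
   message in [HEVC].

```python
from typing import Dict, List

# Parameters that this memo places in the "a=fmtp" line (including
# sprop-vps/sps/pps); per Section 7.2, any other parameter is ignored.
KNOWN_PARAMS = frozenset({
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br",
    "max-tr", "max-tc", "max-fps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-cap",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
    "dec-parallel-cap", "include-dph", "sprop-vps", "sprop-sps", "sprop-pps",
})

# hash_type values of the decoded picture hash SEI message in [HEVC].
DPH_HASH_TYPES = {0: "MD5", 1: "CRC", 2: "Checksum"}


def parse_fmtp(line: str) -> Dict[str, str]:
    """Split an a=fmtp line into {parameter: value}.

    ';' separates parameters only outside braces, so the ';' characters
    inside dec-parallel-cap={...} stay in that parameter's value.
    """
    head, _, params = line.partition(" ")
    if not head.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    result: Dict[str, str] = {}
    depth = start = 0
    for i, ch in enumerate(params + ";"):  # sentinel ';' flushes the last item
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced '}'")
        elif ch == ";" and depth == 0:
            # Split at the first '=', so base64 padding in sprop-* survives.
            name, _, value = params[start:i].strip().partition("=")
            start = i + 1
            if name in KNOWN_PARAMS:
                result[name] = value
    if depth != 0:
        raise ValueError("unbalanced '{'")
    return result


def parse_include_dph(value: str) -> List[int]:
    """Hash types, most preferred first; [] means no DPH capability."""
    if not value.strip():
        return []
    hash_types = [int(v) for v in value.split(",")]
    if any(not 0 <= t <= 255 for t in hash_types):
        raise ValueError("hash type out of range 0-255")
    return hash_types
```

   The dec-parallel-cap value returned by parse_fmtp() is still the
   raw brace-enclosed string. It would need to be validated separately
   against the dec-parallel-cap ABNF. A missing include-dph can be
   handled as parse_include_dph(params.get("include-dph", "")).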
      Informative note: Conveyance of "sprop-vps", "sprop-sps", and
      "sprop-pps" using the "fmtp" source attribute allows for
      out-of-band transport of parameter sets in topologies like
      Topo-Video-switch-MCU as specified in [RFC5117].

   An example of media representation in SDP is as follows:

      m=video 49170 RTP/AVP 98
      a=rtpmap:98 H265/90000
      a=fmtp:98 profile-id=1;
                sprop-vps=