idnits 2.17.1 draft-ietf-payload-rtp-h265-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 165 instances of weird spacing in the document. Is it really formatted ragged-right, rather than justified? ** There are 3 instances of too long lines in the document, the longest one being 14 characters in excess of 72. ** The abstract seems to contain references ([HEVC]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 27 has weird spacing: '... at any ti...' == Line 30 has weird spacing: '... The list ...' == Line 45 has weird spacing: '...fo) in effec...' == Line 46 has weird spacing: '...ication of t...' == Line 47 has weird spacing: '...ly, as they ...' == (160 more instances...) -- The document date (September 6, 2013) is 3885 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: '3GP' is mentioned on line 266, but not defined -- Looks like a reference, but probably isn't: '0' on line 996 == Missing Reference: 'RFC5234' is mentioned on line 2133, but not defined == Missing Reference: 'RFC5117' is mentioned on line 2318, but not defined ** Obsolete undefined reference: RFC 5117 (Obsoleted by RFC 7667) == Missing Reference: 'RFC2326' is mentioned on line 2543, but not defined ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826) == Missing Reference: 'RFC2974' is mentioned on line 2544, but not defined == Missing Reference: 'RFC5583' is mentioned on line 2594, but not defined == Missing Reference: 'RFC3551' is mentioned on line 2754, but not defined == Missing Reference: 'RFC3711' is mentioned on line 2754, but not defined == Missing Reference: 'RFC5124' is mentioned on line 2755, but not defined == Missing Reference: 'I-D.ietf-avt-srtp-not-mandatory' is mentioned on line 2757, but not defined == Missing Reference: 'I-D.ietf-avtcore-rtp-security-options' is mentioned on line 2764, but not defined == Missing Reference: 'RFC 3711' is mentioned on line 2780, but not defined == Missing Reference: 'RFC 3551' is mentioned on line 2804, but not defined == Unused Reference: 'RFC6051' is defined on line 2887, but no explicit reference was found in the text == Unused Reference: '3GPPFF' is defined on line 2927, but no explicit reference was found in the text == Unused Reference: 'RFC5109' is defined on line 2943, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC' ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) Summary: 6 errors (**), 0 flaws (~~), 24 warnings (==), 3 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Y.-K. Wang 2 Internet Draft Qualcomm 3 Intended status: Standards track Y. Sanchez 4 Expires: March 2014 T. Schierl 5 Fraunhofer HHI 6 S. Wenger 7 Vidyo 8 M. M. Hannuksela 9 Nokia 10 September 6, 2013 12 RTP Payload Format for High Efficiency Video Coding 13 draft-ietf-payload-rtp-h265-01.txt 15 Status of this Memo 17 This Internet-Draft is submitted to IETF in full conformance with 18 the provisions of BCP 78 and BCP 79. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF), its areas, and its working groups. Note that 22 other groups may also distribute working documents as Internet- 23 Drafts. 25 Internet-Drafts are draft documents valid for a maximum of six 26 months and may be updated, replaced, or obsoleted by other documents 27 at any time. It is inappropriate to use Internet-Drafts as 28 reference material or to cite them other than as "work in progress." 30 The list of current Internet-Drafts can be accessed at 31 http://www.ietf.org/ietf/1id-abstracts.txt. 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 This Internet-Draft will expire on December 11, 2013. 38 Copyright and License Notice 40 Copyright (c) 2013 IETF Trust and the persons identified as the 41 document authors. All rights reserved. 43 This document is subject to BCP 78 and the IETF Trust's Legal 44 Provisions Relating to IETF Documents 45 (http://trustee.ietf.org/license-info) in effect on the date of 46 publication of this document. Please review these documents 47 carefully, as they describe your rights and restrictions with 48 respect to this document. 
Code Components extracted from this 49 document must include Simplified BSD License text as described in 50 Section 4.e of the Trust Legal Provisions and are provided without 51 warranty as described in the Simplified BSD License. 53 Abstract 55 This memo describes an RTP payload format for the video coding 56 standard ITU-T Recommendation H.265 and ISO/IEC International 57 Standard 23008-2, both also known as High Efficiency Video Coding 58 (HEVC) [HEVC], developed by the Joint Collaborative Team on Video 59 Coding (JCT-VC). The RTP payload format allows for packetization of 60 one or more Network Abstraction Layer (NAL) units in each RTP packet 61 payload, as well as fragmentation of a NAL unit into multiple RTP 62 packets. Furthermore, it supports transmission of an HEVC stream 63 over a single as well as multiple RTP flows. The payload format has 64 wide applicability in videoconferencing, Internet video streaming, 65 and high bit-rate entertainment-quality video, among others. 67 Table of Contents 69 Status of this Memo...............................................1 70 Abstract..........................................................3 71 Table of Contents.................................................3 72 1 . Introduction..................................................5 73 1.1 . Overview of the HEVC Codec...............................5 74 1.1.1 Coding-Tool Features..................................5 75 1.1.2 Systems and Transport Interfaces......................7 76 1.1.3 Parallel Processing Support..........................13 77 1.1.4 NAL Unit Header......................................15 78 1.2 . Overview of the Payload Format..........................17 79 2 . Conventions..................................................17 80 3 . 
Definitions and Abbreviations................................17 81 3.1 Definitions...............................................17 82 3.1.1 Definitions from the HEVC Specification..............18 83 3.1.2 Definitions Specific to This Memo....................19 84 3.2 Abbreviations.............................................20 85 4 . RTP Payload Format...........................................22 86 4.1 RTP Header Usage..........................................22 87 4.2 Payload Structures........................................23 88 4.3 Transmission Modes........................................24 89 4.4 Decoding Order Number.....................................25 90 4.5 Single NAL Unit Packets...................................27 91 4.6 Aggregation Packets (APs).................................27 92 4.7 Fragmentation Units (FUs).................................32 93 5 . Packetization Rules..........................................36 94 6 . De-packetization Process.....................................37 95 7 . Payload Format Parameters....................................38 96 7.1 Media Type Registration...................................39 97 7.2 SDP Parameters............................................52 98 7.2.1 Mapping of Payload Type Parameters to SDP............53 99 7.2.2 Usage with SDP Offer/Answer Model....................54 100 7.2.3 Usage in Declarative Session Descriptions............58 101 7.2.4 Dependency Signaling in Multi-Session Transmission...60 102 8 . Use with Feedback Messages...................................60 103 8.1 Definition of the SPLI Feedback Message...................62 104 8.2 Use of HEVC with the RPSI Feedback Message................63 105 8.3 Use of HEVC with the SPLI Feedback Message................63 106 9 . Security Considerations......................................63 107 10 . Congestion Control..........................................65 108 11 . IANA Consideration..........................................66 109 12 . 
Acknowledgements............................................66 110 13 . References..................................................66 111 13.1 Normative References.....................................66 112 13.2 Informative References...................................67 113 14 . Authors' Addresses..........................................68 115 1. Introduction 117 1.1. Overview of the HEVC Codec 119 High Efficiency Video Coding [HEVC], formally known as ITU-T 120 Recommendation H.265 and ISO/IEC International Standard 23008-2 was 121 ratified by ITU-T in April 2013 and reportedly provides significant 122 coding efficiency gains over H.264 [H.264]. 124 As both H.264 [H.264] and its RTP payload format [RFC6184] are 125 widely deployed and generally known in the relevant implementer 126 communities, frequently only the differences between those two 127 specifications are highlighted in non-normative, explanatory parts 128 of this memo. Basic familiarity with both specifications is assumed 129 for those parts. However, the normative parts of this memo do not 130 require study of H.264 or its RTP payload format. 132 H.264 and HEVC share a similar hybrid video codec design. 133 Conceptually, both technologies include a video coding layer (VCL), 134 which is often used to refer to the coding-tool features, and a 135 network abstraction layer (NAL), which is often used to refer to the 136 systems and transport interface aspects of the codecs. 138 1.1.1 Coding-Tool Features 140 Similarly to earlier hybrid-video-coding-based standards, including 141 H.264, the following basic video coding design is employed by HEVC. 142 A prediction signal is first formed either by intra or motion 143 compensated prediction, and the residual (the difference between the 144 original and the prediction) is then coded. The gains in coding 145 efficiency are achieved by redesigning and improving almost all 146 parts of the codec over earlier designs. 
In addition, HEVC includes 147 several tools to make the implementation on parallel architectures 148 easier. Below is a summary of HEVC coding-tool features. 150 Quad-tree block and transform structure 152 One of the major tools that contribute significantly to the coding 153 efficiency of HEVC is the usage of flexible coding blocks and 154 transforms, which are defined in a hierarchical quad-tree manner. 155 Unlike H.264, where the basic coding block is a macroblock of fixed 156 size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size 157 of 64x64. Each CTU can be divided into smaller units in a 158 hierarchical quad-tree manner and can represent smaller blocks down 159 to size 4x4. Similarly, the transforms used in HEVC can have 160 different sizes, starting from 4x4 and going up to 32x32. Utilizing 161 large blocks and transforms contributes to the major gain of HEVC, 162 especially at high resolutions. 164 Entropy coding 166 HEVC uses a single entropy coding engine, which is based on Context 167 Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two 168 distinct entropy coding engines. CABAC in HEVC shares many 169 similarities with CABAC of H.264, but contains several improvements. 170 Those include improvements in coding efficiency and lowered 171 implementation complexity, especially for parallel architectures. 173 In-loop filtering 175 H.264 includes an in-loop adaptive deblocking filter, where the 176 blocking artifacts around the transform edges in the reconstructed 177 picture are smoothed to improve the picture quality and compression 178 efficiency. In HEVC, a similar deblocking filter is employed but 179 with somewhat lower complexity. In addition, pictures undergo a 180 subsequent filtering operation called Sample Adaptive Offset (SAO), 181 which is a new design element in HEVC. SAO basically adds a pixel- 182 level offset in an adaptive manner and usually acts as a de-ringing 183 filter.
It is observed that SAO improves the picture quality, 184 especially around sharp edges contributing substantially to visual 185 quality improvements of HEVC. 187 Motion prediction and coding 189 There have been a number of improvements in this area that are 190 summarized as follows. The first category is motion merge and 191 advanced motion vector prediction (AMVP) modes. The motion 192 information of a prediction block can be inferred from the spatially 193 or temporally neighboring blocks. This is similar to the DIRECT 194 mode in H.264 but includes new aspects to incorporate the flexible 195 quad-tree structure and methods to improve the parallel 196 implementations. In addition, the motion vector predictor can be 197 signaled for improved efficiency. The second category is high- 198 precision interpolation. The interpolation filter length is 199 increased to 8-tap from 6-tap, which improves the coding efficiency 200 but also comes with increased complexity. In addition, the 201 interpolation filter is defined with higher precision without any 202 intermediate rounding operations to further improve the coding 203 efficiency. 205 Intra prediction and intra coding 207 Compared to 8 intra prediction modes in H.264, HEVC supports angular 208 intra prediction with 33 directions. This increased flexibility 209 improves both objective coding efficiency and visual quality as the 210 edges can be better predicted and ringing artifacts around the edges 211 can be reduced. In addition, the reference samples are adaptively 212 smoothed based on the prediction direction. To avoid contouring 213 artifacts a new interpolative prediction generation is included to 214 improve the visual quality. Furthermore, discrete sine transform 215 (DST) is utilized instead of traditional discrete cosine transform 216 (DCT) for 4x4 intra transform blocks. 
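The quad-tree block structure summarized earlier in this subsection can be illustrated with a short, non-normative sketch. It recursively divides a square CTU into four equal quadrants until a caller-supplied predicate declines to split or a minimum size is reached. The names split_ctu, should_split, and min_size are assumptions made for this illustration and do not appear in [HEVC]; a real encoder would typically make the split decision by rate-distortion analysis, which is not modeled here.

```python
from typing import Callable, List, Tuple

# (x, y, size) of a square block, in luma samples. Illustrative type only.
Block = Tuple[int, int, int]


def split_ctu(x: int, y: int, size: int,
              should_split: Callable[[int, int, int], bool],
              min_size: int = 8) -> List[Block]:
    """Return the leaf blocks of a quad-tree rooted at a size x size block.

    should_split is a hypothetical stand-in for the encoder's decision;
    min_size is an assumed lower bound on the block size.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves: List[Block] = []
    # Visit the four quadrants in raster order (top-left, top-right,
    # bottom-left, bottom-right).
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(
                split_ctu(x + dx, y + dy, half, should_split, min_size))
    return leaves


# Example: keep refining only the top-left quadrant of a 64x64 CTU.
leaves = split_ctu(0, 0, 64, lambda x, y, s: x == 0 and y == 0)
```

In this example the leaves always tile the CTU exactly, so the sum of their areas equals 64x64 regardless of the split decisions.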
218 Other coding-tool features 220 HEVC includes some tools for lossless coding and efficient screen 221 content coding, such as skipping the transform for certain blocks. 222 These tools are particularly useful for example when streaming the 223 user-interface of a mobile device to a large display. 225 1.1.2 Systems and Transport Interfaces 227 HEVC inherited the basic systems and transport interfaces designs, 228 such as the NAL-unit-based syntax structure, the hierarchical syntax 229 and data unit structure from sequence-level parameter sets, multi- 230 picture-level or picture-level parameter sets, slice-level header 231 parameters, lower-level parameters, the supplemental enhancement 232 information (SEI) message mechanism, the hypothetical reference 233 decoder (HRD) based video buffering model, and so on. In the 234 following, a list of differences in these aspects compared to H.264 235 is summarized. 237 Video parameter set 239 A new type of parameter set, called video parameter set (VPS), was 240 introduced. For the first (2013) version of [HEVC], the video 241 parameter set NAL unit is required to be available prior to its 242 activation, while the information contained in the video parameter 243 set is not necessary for operation of the decoding process. For 244 future HEVC extensions, such as the 3D or scalable extensions, the 245 video parameter set is expected to include information necessary for 246 operation of the decoding process, e.g. decoding dependency or 247 information for reference picture set construction of enhancement 248 layers. The VPS provides a "big picture" of a bitstream, including 249 what types of operation points are provided, the profile, tier, and 250 level of the operation points, and some other high-level properties 251 of the bitstream that can be used as the basis for session 252 negotiation and content selection, etc. (see section 7.1). 
254 Profile, tier and level 256 The profile, tier and level syntax structure that can be included in 257 both VPS and sequence parameter set (SPS) includes 12 bytes data to 258 describe the entire bitstream (including all temporally scalable 259 layers, which are referred to as sub-layers in the HEVC 260 specification), and can optionally include more profile, tier and 261 level information pertaining to individual temporally scalable 262 layers. The profile indicator indicates the "best viewed as" 263 profile when the bitstream conforms to multiple profiles, similar to 264 the major brand concept in the ISO base media file format (ISOBMFF) 265 [ISOBMFF] and file formats derived based on ISOBMFF, such as the 266 3GPP file format [3GP]. The profile, tier and level syntax 267 structure also includes the indications of whether the bitstream is 268 free of frame-packed content, whether the bitstream is free of 269 interlaced source content and free of field pictures, i.e., contains 270 only frame pictures of progressive source, such that clients/players 271 with no support of post-processing functionalities for handling of 272 frame-packed or interlaced source content or field pictures can 273 reject those bitstreams. 275 Bitstream and elementary stream 277 HEVC includes a definition of an elementary stream, which is new 278 compared to H.264. An elementary stream consists of a sequence of 279 one or more bitstreams. An elementary stream that consists of two 280 or more bitstreams has typically been formed by splicing together 281 two or more bitstreams (or parts thereof). When an elementary 282 stream contains more than one bitstream, the last NAL unit of the 283 last access unit of a bitstream (except the last bitstream in the 284 elementary stream) must contain an end of bitstream NAL unit and the 285 first access unit of the subsequent bitstream must be an intra 286 random access point (IRAP) access unit. 
This IRAP access unit may 287 be a clean random access (CRA), broken link access (BLA), or 288 instantaneous decoding refresh (IDR) access unit. 290 Random access support 292 HEVC includes signaling in NAL unit header, through NAL unit types, 293 of IRAP pictures beyond IDR pictures. Three types of IRAP pictures, 294 namely IDR, CRA and BLA pictures are supported, wherein IDR pictures 295 are conventionally referred to as closed group-of-pictures (closed- 296 GOP) random access points, and CRA and BLA pictures are those 297 conventionally referred to as open-GOP random access points. BLA 298 pictures usually originate from splicing of two bitstreams or part 299 thereof at a CRA picture, e.g. during stream switching. To enable 300 better systems usage of IRAP pictures, altogether six different NAL 301 units are defined to signal the properties of the IRAP pictures, 302 which can be used to better match the stream access point (SAP) 303 types as defined in the ISOBMFF [ISOBMFF], which are utilized for 304 random access support in both 3GP-DASH [3GPDASH] and MPEG DASH 305 [MPEGDASH]. Pictures following an IRAP picture in decoding order 306 and preceding the IRAP picture in output order are referred to as 307 leading pictures associated with the IRAP picture. There are two 308 types of leading pictures, namely random access decodable leading 309 (RADL) pictures and random access skipped leading (RASL) pictures. 310 RADL pictures are decodable when the decoding started at the 311 associated IRAP picture, and RASL pictures are not decodable when 312 the decoding started at the associated IRAP picture and are usually 313 discarded. HEVC provides mechanisms to enable the specification of 314 conformance of bitstreams with RASL pictures being discarded, thus 315 to provide a standard-compliant way to enable systems components to 316 discard RASL pictures when needed. 
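As a non-normative illustration of the discarding behavior described above, the following sketch filters a sequence of pictures when decoding starts at an IRAP picture. It assumes, for simplicity, one NAL unit type per coded picture, listed in decoding order. The numeric NAL unit type values are taken from the NAL unit type table of [HEVC] rather than from this memo, and the function name drop_undecodable is an assumption made for this example.

```python
# NAL unit type values as assigned in the HEVC specification.
RADL_N, RADL_R = 6, 7
RASL_N, RASL_R = 8, 9
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP, CRA_NUT = 19, 20, 21

IRAP_TYPES = {BLA_W_LP, BLA_W_RADL, BLA_N_LP, IDR_W_RADL, IDR_N_LP, CRA_NUT}
BLA_TYPES = {BLA_W_LP, BLA_W_RADL, BLA_N_LP}
RASL_TYPES = {RASL_N, RASL_R}


def drop_undecodable(nal_types):
    """Filter pictures for a decoder that starts at the first IRAP picture.

    nal_types: one nal_unit_type per coded picture, in decoding order
    (a simplification; real pictures may consist of several NAL units).
    """
    kept = []
    started = False        # True once the first IRAP picture was seen
    skipping_rasl = False  # True while RASL pictures must be discarded
    for nut in nal_types:
        if nut in IRAP_TYPES:
            # RASL pictures are discarded after the IRAP picture at which
            # decoding starts, and after any BLA picture.
            skipping_rasl = (not started) or nut in BLA_TYPES
            started = True
            kept.append(nut)
        elif not started:
            continue  # nothing is decodable before the first IRAP picture
        elif nut in RASL_TYPES and skipping_rasl:
            continue  # RASL picture associated with the starting IRAP or a BLA
        else:
            kept.append(nut)  # RADL and all other pictures are decodable
    return kept


# Decoding starts at the first CRA; its RASL picture is dropped, while the
# RASL picture of the second CRA is kept because decoding was already running.
result = drop_undecodable([1, CRA_NUT, RASL_R, RADL_R, 1, CRA_NUT, RASL_R, 1])
```

Note that RADL pictures are always kept once decoding has started, which matches their definition as decodable leading pictures.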
318 Temporal scalability support 320 HEVC includes an improved support of temporal scalability, by 321 inclusion of the signaling of TemporalId in the NAL unit header, the 322 restriction that pictures of a particular temporal sub-layer cannot 323 be used for inter prediction reference by pictures of a higher 324 temporal sub-layer, the sub-bitstream extraction process, and the 325 requirement that each sub-bitstream extraction output be a 326 conforming bitstream. Media-aware network elements (MANEs) can 327 utilize the TemporalId in the NAL unit header for stream adaptation 328 purposes based on temporal scalability. 330 Temporal sub-layer switching support 332 HEVC specifies, through NAL unit types present in the NAL unit 333 header, the signaling of temporal sub-layer access (TSA) and 334 stepwise temporal sub-layer access (STSA). A TSA picture and 335 pictures following the TSA picture in decoding order do not use 336 pictures prior to the TSA picture in decoding order with TemporalId 337 greater than or equal to that of the TSA picture for inter 338 prediction reference. A TSA picture enables up-switching, at the 339 TSA picture, to the sub-layer containing the TSA picture or any 340 higher sub-layer, from the immediately lower sub-layer. An STSA 341 picture does not use pictures with the same TemporalId as the STSA 342 picture for inter prediction reference. Pictures following an STSA 343 picture in decoding order with the same TemporalId as the STSA 344 picture do not use pictures prior to the STSA picture in decoding 345 order with the same TemporalId as the STSA picture for inter 346 prediction reference. An STSA picture enables up-switching, at the 347 STSA picture, to the sub-layer containing the STSA picture, from the 348 immediately lower sub-layer. 350 Sub-layer reference or non-reference pictures 352 The concept and signaling of reference/non-reference pictures in 353 HEVC are different from H.264. 
In H.264, if a picture may be used 354 by any other picture for inter prediction reference, it is a 355 reference picture; otherwise it is a non-reference picture, and this 356 is signaled by two bits in the NAL unit header. In HEVC, a picture 357 is called a reference picture only when it is marked as "used for 358 reference". In addition, the concept of sub-layer reference picture 359 was introduced. If a picture may be used by any other picture 360 with the same TemporalId for inter prediction reference, it is a 361 sub-layer reference picture; otherwise it is a sub-layer non- 362 reference picture. Whether a picture is a sub-layer reference 363 picture or sub-layer non-reference picture is signaled through NAL 364 unit type values. 366 Extensibility 368 Besides the TemporalId in the NAL unit header, HEVC also includes 369 the signaling of a six-bit layer ID in the NAL unit header, which 370 must be equal to 0 for a single-layer bitstream. Extension 371 mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice 372 headers, and so on. All these extension mechanisms enable future 373 extensions in a backward compatible manner, such that bitstreams 374 encoded according to potential future HEVC extensions can be fed to 375 then-legacy decoders (e.g. HEVC version 1 decoders) and the then- 376 legacy decoders can decode and output the base layer bitstream. 378 Bitstream extraction 380 HEVC includes a bitstream extraction process as an integral part of 381 the overall decoding process, as well as specification of the use of 382 the bitstream extraction process in description of bitstream 383 conformance tests as part of the hypothetical reference decoder 384 (HRD) specification. 386 Reference picture management 388 The reference picture management of HEVC, including reference 389 picture marking and removal from the decoded picture buffer (DPB) as 390 well as reference picture list construction (RPLC), differs from 391 that of H.264.
Instead of the sliding window plus adaptive memory 392 management control operation (MMCO) based reference picture marking 393 mechanism in H.264, HEVC specifies a reference picture set (RPS) 394 based reference picture management and marking mechanism, and the 395 RPLC is consequently based on the RPS mechanism. A reference 396 picture set consists of a set of reference pictures associated with 397 a picture, consisting of all reference pictures that are prior to 398 the associated picture in decoding order, that may be used for inter 399 prediction of the associated picture or any picture following the 400 associated picture in decoding order. The reference picture set 401 consists of five lists of reference pictures; RefPicSetStCurrBefore, 402 RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and 403 RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and 404 RefPicSetLtCurr contain all reference pictures that may be used in 405 inter prediction of the current picture and that may be used in 406 inter prediction of one or more of the pictures following the 407 current picture in decoding order. RefPicSetStFoll and 408 RefPicSetLtFoll consist of all reference pictures that are not used 409 in inter prediction of the current picture but may be used in inter 410 prediction of one or more of the pictures following the current 411 picture in decoding order. RPS provides an "intra-coded" signaling 412 of the DPB status, instead of an "inter-coded" signaling, mainly for 413 improved error resilience. The RPLC process in HEVC is based on the 414 RPS, by signaling an index to an RPS subset for each reference 415 index. The RPLC process has been simplified compared to that in 416 H.264, by removal of the reference picture list modification (also 417 referred to as reference picture list reordering) process. 419 Ultra low delay support 421 HEVC specifies a sub-picture-level HRD operation, for support of the 422 so-called ultra-low delay. 
The mechanism specifies a standard- 423 compliant way to enable delay reduction below one picture interval. 424 Sub-picture-level coded picture buffer (CPB) and DPB parameters may 425 be signaled, and utilization of this information for the derivation 426 of CPB timing (wherein the CPB removal time corresponds to decoding 427 time) and DPB output timing (display time) is specified. Decoders 428 are allowed to operate the HRD at the conventional access-unit- 429 level, even when the sub-picture-level HRD parameters are present. 431 New SEI messages 433 HEVC inherits many H.264 SEI messages with changes in syntax and/or 434 semantics making them applicable to HEVC. Additionally, there are a 435 few new SEI messages reviewed briefly in the following paragraphs. 437 The structure of pictures SEI message provides information on the 438 NAL unit types, picture order count values, and prediction 439 dependencies of a sequence of pictures. The SEI message can be used, 440 for example, for concluding what impact a lost picture has on other 441 pictures. 443 The decoded picture hash SEI message provides a checksum derived 444 from the sample values of a decoded picture. It can be used for 445 detecting whether a picture was correctly received and decoded. 447 The active parameter sets SEI message includes the IDs of the active 448 video parameter set and the active sequence parameter set and can be 449 used to activate VPSs and SPSs.
In addition, the SEI message 450 includes the following indications: 1) An indication of whether 451 "full random accessibility" is supported (when supported, all 452 parameter sets needed for decoding of the remainder of the bitstream 453 when random accessing from the beginning of the current coded video 454 sequence by completely discarding all access units earlier in 455 decoding order are present in the remaining bitstream and all coded 456 pictures in the remaining bitstream can be correctly decoded); 2) An 457 indication of whether there is no parameter set within the current 458 coded video sequence that updates another parameter set of the same 459 type preceding in decoding order. An update of a parameter set 460 refers to the use of the same parameter set ID but with some other 461 parameters changed. If this property is true for all coded video 462 sequences in the bitstream, then all parameter sets can be sent out- 463 of-band before session start. 465 The decoding unit information SEI message provides coded picture 466 buffer removal delay information for a decoding unit. The message 467 can be used in very-low-delay buffering operations. 469 The region refresh information SEI message can be used together with 470 the recovery point SEI message (present in both H.264 and HEVC) for 471 improved support of gradual decoding refresh (GDR). This supports 472 random access from inter-coded pictures, wherein complete pictures 473 can be correctly decoded or recovered after an indicated number of 474 pictures in output/display order. 476 1.1.3 Parallel Processing Support 478 The reportedly significantly higher encoding computational demand of 479 HEVC over H.264, in conjunction with the ever increasing video 480 resolution (both spatially and temporally) required by the market, 481 led to the adoption of VCL coding tools specifically targeted to 482 allow for parallelization on the sub-picture level.
That is, 483 parallelization occurs, at the minimum, at the granularity of an 484 integer number of CTUs. The targets for this type of high-level 485 parallelization are multicore CPUs and DSPs as well as 486 multiprocessor systems. In a system design, to be useful, these 487 tools require signaling support, which is provided in Section 7 of 488 this memo. This section provides a brief overview of the tools 489 available in [HEVC]. 491 Many of the tools incorporated in HEVC were designed keeping in mind 492 the potential parallel implementations in multi-core/multi-processor 493 architectures. Specifically, for parallelization, four picture 494 partition strategies are available. 496 Slices are segments of the bitstream that can be reconstructed 497 independently from other slices within the same picture (though 498 there may still be interdependencies through loop filtering 499 operations). Slices are the only tool that can be used for 500 parallelization that is also available, in virtually identical form, 501 in H.264. Slice-based parallelization does not require much inter- 502 processor or inter-core communication (except for inter-processor or 503 inter-core data sharing for motion compensation when decoding a 504 predictively coded picture, which is typically much heavier than 505 inter-processor or inter-core data sharing due to in-picture 506 prediction), as slices are designed to be independently decodable. 507 However, for the same reason, slices can require some coding 508 overhead. Further, slices (in contrast to some of the other tools 509 mentioned below) also serve as the key mechanism for bitstream 510 partitioning to match Maximum Transfer Unit (MTU) size requirements, 511 due to the in-picture independence of slices and the fact that each 512 regular slice is encapsulated in its own NAL unit. In many cases, 513 the goal of parallelization and the goal of MTU size matching can 514 place contradicting demands on the slice layout in a picture.
The realization of this situation led to the development of the more
advanced tools mentioned below.  This payload format does not
contain any specific mechanisms aiding parallelization through
slices.

Dependent slice segments allow for fragmentation of a coded slice
into fragments at CTU boundaries without breaking any in-picture
prediction mechanism.  They are complementary to the fragmentation
mechanism described in this memo in that they need the cooperation
of the encoder.  As a dependent slice segment necessarily contains
an integer number of CTUs, a decoder using multiple cores operating
on CTUs can process a dependent slice segment without communicating
parts of the slice segment's bitstream to other cores.
Fragmentation, as specified in this memo, in contrast, does not
guarantee that a fragment contains an integer number of CTUs.

In wavefront parallel processing (WPP), the picture is partitioned
into rows of CTUs.  Entropy decoding and prediction are allowed to
use data from CTUs in other partitions.  Parallel processing is
possible through parallel decoding of CTU rows, where the start of
the decoding of a row is delayed by two CTUs, so as to ensure that
data related to a CTU above and to the right of the subject CTU is
available before the subject CTU is decoded.  Using this staggered
start (which appears like a wavefront when represented graphically),
parallelization is possible with up to as many processors/cores as
the picture contains CTU rows.

Because in-picture prediction between neighboring CTU rows within a
picture is allowed, the required inter-processor/inter-core
communication to enable in-picture prediction can be substantial.
The WPP partitioning does not result in the creation of more NAL
units compared to when it is not applied; thus, WPP cannot be used
for MTU size matching, though slices can be used in combination for
that purpose.

Tiles define horizontal and vertical boundaries that partition a
picture into tile columns and rows.  The scan order of CTUs is
changed to be local within a tile (in the order of a CTU raster scan
of a tile), before decoding the top-left CTU of the next tile in the
order of tile raster scan of a picture.  Similar to slices, tiles
break in-picture prediction dependencies (including entropy decoding
dependencies).  However, they do not need to be included into
individual NAL units (same as WPP in this regard); hence, tiles
cannot be used for MTU size matching, though slices can be used in
combination for that purpose.  Each tile can be processed by one
processor/core, and the inter-processor/inter-core communication
required for in-picture prediction between processing units decoding
neighboring tiles is limited to conveying the shared slice header in
cases where a slice spans more than one tile, and loop filtering
related sharing of reconstructed samples and metadata.  In this
respect, tiles are less demanding in terms of inter-processor
communication bandwidth compared to WPP due to the in-picture
independence between two neighboring partitions.

1.1.4 NAL Unit Header

HEVC maintains the NAL unit concept of H.264 with modifications.
HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
payload of a NAL unit refers to the NAL unit excluding the NAL unit
header.

+---------------+---------------+
|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|   Type    |  LayerId  | TID |
+-------------+-----------------+

Figure 1 The structure of HEVC NAL unit header

The semantics of the fields in the NAL unit header are as specified
in [HEVC] and described briefly below for convenience.  In addition
to the name and size of each field, the corresponding syntax element
name in [HEVC] is also provided.

F: 1 bit
   forbidden_zero_bit.  MUST be zero.  HEVC declares a value of 1 as
   a syntax violation.  Note that the inclusion of this bit in the
   NAL unit header is to enable transport of HEVC video over MPEG-2
   transport systems (avoidance of start code emulations) [MPEG2S].

Type: 6 bits
   nal_unit_type.  This field specifies the NAL unit type as defined
   in Table 7-1 of [HEVC].  If the most significant bit of this
   field of a NAL unit is equal to 0 (i.e., the value of this field
   is less than 32), the NAL unit is a VCL NAL unit.  Otherwise, the
   NAL unit is a non-VCL NAL unit.  For a reference of all currently
   defined NAL unit types and their semantics, please refer to
   Section 7.4.1 in [HEVC].

LayerId: 6 bits
   nuh_layer_id.  MUST be equal to zero.  It is anticipated that in
   future scalable or 3D video coding extensions of this
   specification, this syntax element will be used to identify
   additional layers that may be present in the coded video
   sequence, wherein a layer may be, e.g., a spatial scalable layer,
   a quality scalable layer, a texture view, or a depth view.

TID: 3 bits
   nuh_temporal_id_plus1.  This field specifies the temporal
   identifier of the NAL unit plus 1.  The value of TemporalId is
   equal to TID minus 1.
   A TID value of 0 is illegal to ensure that there is at least one
   bit in the NAL unit header equal to 1, so as to enable
   independent considerations of start code emulations in the NAL
   unit header and in the NAL unit payload data.

1.2. Overview of the Payload Format

This payload format defines the following processes required for
transport of HEVC coded data over RTP [RFC3550]:

o Usage of RTP header with this payload format

o Packetization of HEVC coded NAL units into RTP packets using three
  types of payload structures, namely single NAL unit packet,
  aggregation packet, and fragmentation unit

o Transmission of HEVC NAL units of the same bitstream within a
  single RTP session or multiple RTP sessions

o Media type parameters to be used with the Session Description
  Protocol (SDP) [RFC4566]

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119
[RFC2119].

This specification uses the notion of setting and clearing a bit
when bit fields are handled.  Setting a bit is the same as assigning
that bit the value of 1 (On).  Clearing a bit is the same as
assigning that bit the value of 0 (Off).

3. Definitions and Abbreviations

3.1 Definitions

This document uses the terms and definitions of [HEVC].  Section
3.1.1 lists relevant definitions copied from [HEVC] for convenience.
Section 3.1.2 gives definitions specific to this memo.

3.1.1 Definitions from the HEVC Specification

access unit: A set of NAL units that are associated with each other
according to a specified classification rule, are consecutive in
decoding order, and contain exactly one coded picture.

BLA access unit: An access unit in which the coded picture is a BLA
picture.

BLA picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

coded video sequence: A sequence of access units that consists, in
decoding order, of an IRAP access unit with NoRaslOutputFlag equal
to 1, followed by zero or more access units that are not IRAP access
units with NoRaslOutputFlag equal to 1, including all subsequent
access units up to but not including any subsequent access unit that
is an IRAP access unit with NoRaslOutputFlag equal to 1.

   Informative note: An IRAP access unit may be an IDR access unit,
   a BLA access unit, or a CRA access unit.  The value of
   NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
   access unit, and each CRA access unit that is the first access
   unit in the bitstream in decoding order, is the first access unit
   that follows an end of sequence NAL unit in decoding order, or
   has HandleCraAsBlaFlag equal to 1.

CRA access unit: An access unit in which the coded picture is a CRA
picture.

CRA picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to CRA_NUT.

IDR access unit: An access unit in which the coded picture is an IDR
picture.

IDR picture: An IRAP picture for which each VCL NAL unit has
nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

IRAP access unit: An access unit in which the coded picture is an
IRAP picture.

IRAP picture: A coded picture for which each VCL NAL unit has
nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.

layer: A set of VCL NAL units that all have a particular value of
nuh_layer_id and the associated non-VCL NAL units, or one of a set
of syntactical structures having a hierarchical relationship.

operation point: bitstream created from another bitstream by
operation of the sub-bitstream extraction process with the another
bitstream, a target highest TemporalId, and a target layer
identifier list as inputs.

random access: The act of starting the decoding process for a
bitstream at a point other than the beginning of the stream.

sub-layer: A temporal scalable layer of a temporal scalable
bitstream consisting of VCL NAL units with a particular value of the
TemporalId variable, and the associated non-VCL NAL units.

tile: A rectangular region of coding tree blocks within a particular
tile column and a particular tile row in a picture.

tile column: A rectangular region of coding tree blocks having a
height equal to the height of the picture and a width specified by
syntax elements in the picture parameter set.

tile row: A rectangular region of coding tree blocks having a height
specified by syntax elements in the picture parameter set and a
width equal to the width of the picture.

3.1.2 Definitions Specific to This Memo

media aware network element (MANE): A network element, such as a
middlebox or application layer gateway that is capable of parsing
certain aspects of the RTP payload headers or the RTP payload and
reacting to their contents.

   Informative note: The concept of a MANE goes beyond normal
   routers or gateways in that a MANE has to be aware of the
   signaling (e.g., to learn about the payload type mappings of the
   media streams), and in that it has to be trusted when working
   with SRTP.  The advantage of using MANEs is that they allow
   packets to be dropped according to the needs of the media coding.
   For example, if a MANE has to drop packets due to congestion on a
   certain link, it can identify and remove those packets whose
   elimination produces the least adverse effect on the user
   experience.
   After dropping packets, MANEs must rewrite RTCP packets to match
   the changes to the RTP packet stream as specified in Section 7 of
   [RFC3550].

NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

NALU-time: The value that the RTP timestamp would have if the NAL
unit were transported in its own RTP packet.

RTP packet stream: A sequence of RTP packets with increasing
sequence numbers (except for wrap-around), identical PT and
identical SSRC (Synchronization Source), carried in one RTP session.
Within the scope of this memo, one RTP packet stream is utilized to
transport one or more temporal sub-layers.

transmission order: The order of packets in ascending RTP sequence
number order (in modulo arithmetic).  Within an aggregation packet,
the NAL unit transmission order is the same as the order of
appearance of NAL units in the packet.

base session: An RTP session in Multi-Session Transmission mode that
transports a bitstream subset on which the rest of the RTP sessions
in the Multi-Session Transmission depend.  [Ed. (YK): Check the need
of this definition after the draft is more complete.]

3.2 Abbreviations

AP      Aggregation Packet
BLA     Broken Link Access
CRA     Clean Random Access
CTB     Coding Tree Block
CTU     Coding Tree Unit
CVS     Coded Video Sequence
FU      Fragmentation Unit
GDR     Gradual Decoding Refresh
HRD     Hypothetical Reference Decoder
IDR     Instantaneous Decoding Refresh
IRAP    Intra Random Access Point
MANE    Media Aware Network Element
MST     Multi-Session Transmission
MTU     Maximum Transfer Unit
NAL     Network Abstraction Layer
NALU    Network Abstraction Layer Unit
PPS     Picture Parameter Set
RADL    Random Access Decodable Leading (Picture)
RASL    Random Access Skipped Leading (Picture)
RPS     Reference Picture Set
SEI     Supplemental Enhancement Information
SPS     Sequence Parameter Set
SST     Single-Session Transmission
STSA    Step-wise Temporal Sub-layer Access
TSA     Temporal Sub-layer Access
VCL     Video Coding Layer
VPS     Video Parameter Set

4. RTP Payload Format

4.1 RTP Header Usage

The format of the RTP header is specified in [RFC3550] and reprinted
in Figure 2 for convenience.  This payload format uses the fields of
the header in a manner consistent with that specification.

The RTP payload (and the settings for some RTP header bits) for
aggregation packets and fragmentation units are specified in
Sections 4.7 and 4.8, respectively.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2 RTP header according to [RFC3550]

The RTP header information to be set according to this RTP payload
format is set as follows:

Marker bit (M): 1 bit

   Set for the last packet of the access unit indicated by the RTP
   timestamp, in line with the normal use of the M bit in video
   formats, to allow an efficient playout buffer handling.  Decoders
   can use this bit as an early indication of the last packet of an
   access unit.

      Informative note: The content of a NAL unit does not tell
      whether or not the NAL unit is the last NAL unit, in decoding
      order, of an access unit.  An RTP sender implementation may
      obtain this information from the video encoder.  If, however,
      the implementation cannot obtain this information directly
      from the encoder, e.g., when the stream was pre-encoded, and
      also there is no timestamp allocated for each NAL unit, then
      the sender implementation can inspect subsequent NAL units in
      decoding order to determine whether or not the NAL unit is the
      last NAL unit of an access unit as follows.  A NAL unit naluX
      is the last NAL unit of an access unit if it is the last NAL
      unit of the stream or the next VCL NAL unit naluY in decoding
      order has the high-order bit of the first byte after its NAL
      unit header equal to 1, and all NAL units between naluX and
      naluY, when present, have nal_unit_type in the range of 32 to
      35, inclusive, equal to 39, or in the ranges of 41 to 44,
      inclusive, or 48 to 55, inclusive.

Payload type (PT): 7 bits

   The assignment of an RTP payload type for this new packet format
   is outside the scope of this document and will not be specified
   here.  The assignment of a payload type has to be performed
   either through the profile used or in a dynamic way.

Sequence number (SN): 16 bits

   Set and used in accordance with RFC 3550.

Timestamp: 32 bits

   The RTP timestamp is set to the sampling timestamp of the
   content.  A 90 kHz clock rate MUST be used.

   If the NAL unit has no timing properties of its own (e.g.,
   parameter set and SEI NAL units), the RTP timestamp is set to the
   RTP timestamp of the coded picture of the access unit in which
   the NAL unit is included, according to Section 7.4.2.4.4 of
   [HEVC].

   Receivers SHOULD ignore the picture output timing information in
   any picture timing SEI messages or decoding unit information SEI
   messages as specified in [HEVC].  Instead, receivers SHOULD use
   the RTP timestamp for the display process.  Receivers MUST pass
   picture timing SEI messages and decoding unit information SEI
   messages to the decoder and MAY use the field/frame related
   information for the display process, e.g., when frame doubling or
   frame tripling is indicated by the field/frame related
   information.

4.2 Payload Header Usage

The TID value indicates (among other things) the relative importance
of an RTP packet, for example because NAL units belonging to higher
temporal sub-layers are not used for the decoding of lower temporal
sub-layers.  A lower value of TID indicates a higher importance.

More important NAL units MAY be better protected against
transmission losses than less important NAL units.

4.3 Payload Structures

The first two bytes of the payload of an RTP packet are referred to
as the payload header.  The payload header consists of the same
fields (F, Type, LayerId, and TID) as the NAL unit header as shown
in section 1.1.4, irrespective of the type of the payload structure.

Three different types of RTP packet payload structures are
specified.  A receiver can identify the type of an RTP packet
payload through the Type field in the payload header.
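
As an illustration of the paragraph above, the following Python
sketch splits the two-byte payload header into its four fields and
classifies the payload structure from the Type field.  The names
PayloadHdr, parse_payload_hdr, and classify_payload are hypothetical
and not part of this memo.  The Type values 48 (aggregation packet)
and 49 (fragmentation unit) are those given in sections 4.7 and 4.8;
treating every other Type value as a single NAL unit packet is a
simplifying assumption of the sketch.

```python
from typing import NamedTuple

AP_TYPE = 48  # aggregation packet, section 4.7
FU_TYPE = 49  # fragmentation unit, section 4.8


class PayloadHdr(NamedTuple):
    """Fields of the two-byte payload header (same layout as Figure 1)."""
    f: int         # forbidden_zero_bit, 1 bit
    nal_type: int  # Type (nal_unit_type), 6 bits
    layer_id: int  # LayerId (nuh_layer_id), 6 bits
    tid: int       # TID (nuh_temporal_id_plus1), 3 bits


def parse_payload_hdr(payload: bytes) -> PayloadHdr:
    """Split the first two payload bytes into F, Type, LayerId, and TID."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the two-byte payload header")
    hdr = (payload[0] << 8) | payload[1]  # network byte order
    return PayloadHdr(
        f=(hdr >> 15) & 0x1,
        nal_type=(hdr >> 9) & 0x3F,
        layer_id=(hdr >> 3) & 0x3F,
        tid=hdr & 0x7,
    )


def classify_payload(payload: bytes) -> str:
    """Name the payload structure indicated by the Type field."""
    hdr = parse_payload_hdr(payload)
    if hdr.nal_type == AP_TYPE:
        return "AP"
    if hdr.nal_type == FU_TYPE:
        return "FU"
    # Assumption of this sketch: anything else is a single NAL unit packet.
    return "single NAL unit packet"
```

For example, a payload starting with the bytes 0x60 0x01 carries
Type 48, LayerId 0, and TID 1, and is therefore classified as an AP.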

The three different payload structures are as follows:

o Single NAL unit packet: Contains a single NAL unit in the
  payload, and the NAL unit header of the NAL unit also serves as
  the payload header.  This payload structure is specified in
  section 4.6.

o Aggregation packet (AP): Contains more than one NAL unit within
  one access unit.  This payload structure is specified in
  section 4.7.

o Fragmentation unit (FU): Contains a subset of a single NAL unit.
  This payload structure is specified in section 4.8.

4.4 Transmission Modes

This memo enables transmission of an HEVC bitstream over a single
RTP session or multiple RTP sessions.  The concept and working
principle is inherited from [RFC6190] and follows a similar design.
If only one RTP session is used for transmission of the HEVC
bitstream, the transmission mode is referred to as single-session
transmission (SST); otherwise (more than one RTP session is used for
transmission of the HEVC bitstream), the transmission mode is
referred to as multi-session transmission (MST).

[Ed. (YK): Unify the style of abbreviated words throughout the
document.]

SST SHOULD be used for point-to-point unicast scenarios, while MST
SHOULD be used for point-to-multipoint multicast scenarios where
different receivers require different operation points of the same
HEVC bitstream, to improve bandwidth utilization efficiency.

   Informative note: A multicast may degrade to a unicast after all
   but one receiver have left (this is a justification of the first
   "SHOULD" instead of "MUST"), and there might be scenarios where
   MST is desirable but not possible, e.g., when IP multicast is not
   deployed in certain networks (this is a justification of the
   second "SHOULD" instead of "MUST").

The transmission mode is indicated by the tx-mode media parameter
(see section 7.1).  If tx-mode is equal to "SST", SST MUST be used.
Otherwise (tx-mode is equal to "MST"), MST MUST be used.

4.5 Decoding Order Number

For each NAL unit, the variable AbsDon is derived, representing the
decoding order number that is indicative of the NAL unit decoding
order.

Let NAL unit n be the n-th NAL unit in transmission order within an
RTP session.

If tx-mode is equal to "SST" and sprop-depack-buf-nalus is equal
to 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as
equal to n.

Otherwise (tx-mode is equal to "MST" or sprop-depack-buf-nalus is
greater than 0), AbsDon[n] is derived as follows, where DON[n] is
the value of the variable DON for NAL unit n:

o If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
  transmission order), AbsDon[0] is set equal to DON[0].

o Otherwise (n is greater than 0), the following applies for
  derivation of AbsDon[n]:

     If DON[n] == DON[n-1],
        AbsDon[n] = AbsDon[n-1]

     If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
        AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

     If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
        AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

     If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
        AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

     If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
        AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])

For any two NAL units m and n, the following applies:

o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
  follows NAL unit m in NAL unit decoding order.

o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
  of the two NAL units can be in either order.

o AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
  NAL unit m in decoding order.
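
The derivation above for the case in which DON values are in use
(tx-mode equal to "MST" or sprop-depack-buf-nalus greater than 0)
can be sketched in Python as follows.  The function names
next_abs_don and derive_abs_don are hypothetical, and the input is
assumed to be the list of 16-bit DON values in transmission order.
The SST case with sprop-depack-buf-nalus equal to 0, where AbsDon[n]
is simply n, is not covered by the sketch.

```python
from typing import List


def next_abs_don(prev_abs_don: int, prev_don: int, don: int) -> int:
    """Apply the five cases of section 4.5 for one NAL unit (n > 0)."""
    if don == prev_don:
        return prev_abs_don
    if don > prev_don and don - prev_don < 32768:
        # DON moved forward without wrapping.
        return prev_abs_don + don - prev_don
    if don < prev_don and prev_don - don >= 32768:
        # DON moved forward and wrapped around 65536.
        return prev_abs_don + 65536 - prev_don + don
    if don > prev_don and don - prev_don >= 32768:
        # DON moved backward across the wrap-around point.
        return prev_abs_don - (prev_don + 65536 - don)
    # Remaining case: don < prev_don and prev_don - don < 32768,
    # i.e., DON moved backward without wrapping.
    return prev_abs_don - (prev_don - don)


def derive_abs_don(dons: List[int]) -> List[int]:
    """Derive AbsDon[n] for DON values listed in transmission order."""
    abs_dons: List[int] = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)  # AbsDon[0] = DON[0]
        else:
            abs_dons.append(next_abs_don(abs_dons[-1], dons[n - 1], don))
    return abs_dons
```

For the DON sequence 65534, 65535, 0, 2, 1, the sketch yields the
AbsDon values 65534, 65535, 65536, 65538, 65537: the wrap from 65535
to 0 continues to count upward, and the final NAL unit is placed
before its predecessor in decoding order.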

When two consecutive NAL units in the NAL unit decoding order have
different values of AbsDon, the value of AbsDon for the second NAL
unit in decoding order MUST be greater than the value of AbsDon for
the first NAL unit, and the absolute difference between the two
AbsDon values MAY be greater than or equal to 1.

   Informative note: There are multiple reasons to allow for the
   absolute difference of the values of AbsDon for two consecutive
   NAL units in the NAL unit decoding order to be greater than one.
   An increment by one is not required, as at the time of
   associating values of AbsDon to NAL units, it may not be known
   whether all NAL units are to be delivered to the receiver.  For
   example, a gateway may not forward VCL NAL units of higher
   sub-layers or some SEI NAL units when there is congestion in the
   network.  In another example, the first intra picture of a
   pre-encoded clip is transmitted in advance to ensure that it is
   readily available in the receiver, and when transmitting the
   first intra picture, the originator does not exactly know how
   many NAL units will be encoded before the first intra picture of
   the pre-encoded clip follows in decoding order.  Thus, the values
   of AbsDon for the NAL units of the first intra picture of the
   pre-encoded clip have to be estimated when they are transmitted,
   and gaps in values of AbsDon may occur.  Another example is MST
   where the AbsDon values must indicate cross-layer decoding order
   for NAL units conveyed in all the RTP sessions.

4.6 Single NAL Unit Packets

A single NAL unit packet contains exactly one NAL unit, and consists
of a payload header (denoted as PayloadHdr), an optional 16-bit DONL
field (in network byte order), and the NAL unit payload data (the
NAL unit excluding its NAL unit header) of the contained NAL unit,
as shown in Figure 3.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           PayloadHdr          |        DONL (optional)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                     NAL unit payload data                     |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3 The structure of a single NAL unit packet

The payload header SHOULD be an exact copy of the NAL unit header of
the contained NAL unit.  However, the Type (i.e., nal_unit_type)
field MAY be changed, e.g., when it is desirable to handle a CRA
picture as a BLA picture [JCTVC-J0107].

The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the contained NAL
unit.

If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
than 0, the DONL field MUST be present, and the variable DON for the
contained NAL unit is derived as equal to the value of the DONL
field.  Otherwise (tx-mode is equal to "SST" and
sprop-depack-buf-nalus is equal to 0), the DONL field MUST NOT be
present.

4.7 Aggregation Packets (APs)

Aggregation packets (APs) are introduced to enable the reduction of
packetization overhead for small NAL units, such as most of the
non-VCL NAL units, which are often only a few octets in size.

An AP aggregates NAL units within one access unit.  Each NAL unit to
be carried in an AP is encapsulated in an aggregation unit.  NAL
units aggregated in one AP are in NAL unit decoding order.

An AP consists of a payload header (denoted as PayloadHdr) followed
by two or more aggregation units, as shown in Figure 4.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           PayloadHdr          |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|                 two or more aggregation units                 |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4 The structure of an aggregation packet

The fields in the payload header are set as follows.  The F bit MUST
be equal to 0 if the F bit of each aggregated NAL unit is equal to
zero; otherwise, it MUST be equal to 1.  The Type field MUST be
equal to 48.  The value of LayerId MUST be equal to the lowest value
of LayerId of all the aggregated NAL units.  The value of TID MUST
be the lowest value of TID of all the aggregated NAL units.

   Informative Note: All VCL NAL units in an AP have the same TID
   value since they belong to the same access unit.  However, an AP
   may contain non-VCL NAL units for which the TID value in the NAL
   unit header may be different than the TID value of the VCL NAL
   units in the same AP.

An AP MUST carry at least two aggregation units and can carry as
many aggregation units as necessary; however, the total amount of
data in an AP MUST fit into an IP packet, and the size SHOULD be
chosen so that the resulting IP packet is smaller than the MTU size
so as to avoid IP layer fragmentation.  An AP MUST NOT contain
Fragmentation Units (FUs) specified in section 4.8.  APs MUST NOT be
nested; i.e., an AP MUST NOT contain another AP.
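
The rules above for the AP payload header can be sketched in Python
as follows.  The function name build_ap_payload_hdr is hypothetical,
and each input is assumed to be a complete NAL unit that starts with
its two-byte NAL unit header.  The sketch only derives the payload
header; it does not build the aggregation units or check the
resulting packet size against the MTU.

```python
from typing import Sequence

AP_TYPE = 48  # Type value of an aggregation packet
FU_TYPE = 49  # Type value of a fragmentation unit (section 4.8)


def build_ap_payload_hdr(nal_units: Sequence[bytes]) -> bytes:
    """Derive the two-byte AP payload header from the aggregated NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP MUST carry at least two aggregation units")
    f = 0
    layer_id = 0x3F  # start from the largest 6-bit value
    tid = 0x7        # start from the largest 3-bit value
    for nalu in nal_units:
        if len(nalu) < 2:
            raise ValueError("NAL unit shorter than its two-byte header")
        hdr = (nalu[0] << 8) | nalu[1]
        nal_type = (hdr >> 9) & 0x3F
        if nal_type in (AP_TYPE, FU_TYPE):
            raise ValueError("an AP MUST NOT contain another AP or an FU")
        f |= (hdr >> 15) & 0x1                       # 1 if any F bit is 1
        layer_id = min(layer_id, (hdr >> 3) & 0x3F)  # lowest LayerId
        tid = min(tid, hdr & 0x7)                    # lowest TID
    value = (f << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid
    return value.to_bytes(2, "big")
```

For two NAL units whose headers are 0x40 0x01 (Type 32, TID 1) and
0x02 0x02 (Type 1, TID 2), the sketch returns 0x60 0x01, that is,
F equal to 0, Type 48, LayerId 0, and TID 1.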

The first aggregation unit in an AP consists of an optional 16-bit
DONL field (in network byte order) followed by a 16-bit unsigned
size information (in network byte order) that indicates the size of
the NAL unit in bytes (excluding these two octets, but including the
NAL unit header), followed by the NAL unit itself, including its NAL
unit header, as shown in Figure 5.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :        DONL (optional)        |   NALU size   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   NALU size   |                                               |
+-+-+-+-+-+-+-+-+         NAL unit                              |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5 The structure of the first aggregation unit in an AP

The DONL field, when present, specifies the value of the 16 least
significant bits of the decoding order number of the aggregated NAL
unit.

If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
than 0, the DONL field MUST be present in an aggregation unit that
is the first aggregation unit in an AP, and the variable DON for the
aggregated NAL unit is derived as equal to the value of the DONL
field.  Otherwise (tx-mode is equal to "SST" and
sprop-depack-buf-nalus is equal to 0), the DONL field MUST NOT be
present in an aggregation unit that is the first aggregation unit in
an AP.

An aggregation unit that is not the first aggregation unit in an AP
consists of an optional 8-bit DOND field followed by a 16-bit
unsigned size information (in network byte order) that indicates the
size of the NAL unit in bytes (excluding these two octets, but
including the NAL unit header), followed by the NAL unit itself,
including its NAL unit header, as shown in Figure 6.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                : DOND(optional)|           NALU size           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                           NAL unit                            |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 6 The structure of an aggregation unit that is not the first
aggregation unit in an AP

When present, the DOND field plus 1 specifies the difference between
the decoding order number values of the current aggregated NAL unit
and the preceding aggregated NAL unit in the same AP.

If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
than 0, the DOND field MUST be present in an aggregation unit that
is not the first aggregation unit in an AP, and the variable DON for
the aggregated NAL unit is derived as equal to (the DON of the
preceding aggregated NAL unit in the same AP plus the value of the
DOND field plus 1) modulo 65536.  Otherwise (tx-mode is equal to
"SST" and sprop-depack-buf-nalus is equal to 0), the DOND field MUST
NOT be present in an aggregation unit that is not the first
aggregation unit in an AP.

Figure 7 presents an example of an AP that contains two aggregation
units, labeled as 1 and 2 in the figure, without the DONL and DOND
fields being present.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           PayloadHdr          |          NALU 1 Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 1 HDR           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 1 Data          |
|                             . . .                             |
|                                                               |
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     . . .     |          NALU 2 Size          |  NALU 2 HDR   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NALU 2 HDR   |                                               |
+-+-+-+-+-+-+-+-+                  NALU 2 Data                  |
|                             . . .                             |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7 An example of an AP containing two aggregation units
without the DONL and DOND fields

Figure 8 presents an example of an AP that contains two aggregation
units, labeled as 1 and 2 in the figure, with the DONL and DOND
fields being present.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           PayloadHdr          |          NALU 1 DONL          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 1 Size          |          NALU 1 HDR           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                       NALU 1 Data . . .                       |
|                                                               |
+     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               |  NALU 2 DOND  |          NALU 2 Size          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          NALU 2 HDR           |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
|                                                               |
|        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 8 An example of an AP containing two aggregation units with
the DONL and DOND fields

4.8 Fragmentation Units (FUs)

Fragmentation units (FUs) are introduced to enable fragmenting a
single NAL unit into multiple RTP packets, possibly without
cooperation or knowledge of the HEVC encoder.  A fragment of a NAL
unit consists of an integer number of consecutive octets of that NAL
unit.
   Fragments of the same NAL unit MUST be sent in consecutive
   order with ascending RTP sequence numbers (with no other RTP packets
   within the same RTP packet stream being sent between the first and
   last fragment).

   When a NAL unit is fragmented and conveyed within FUs, it is
   referred to as a fragmented NAL unit.  APs MUST NOT be fragmented.
   FUs MUST NOT be nested; i.e., an FU MUST NOT contain a subset of
   another FU.

   The RTP timestamp of an RTP packet carrying an FU is set to the
   NALU-time of the fragmented NAL unit.

   An FU consists of a payload header (denoted as PayloadHdr), an FU
   header of one octet, an optional 16-bit DONL field (in network byte
   order), and an FU payload, as shown in Figure 9.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          PayloadHdr           |   FU header   | DONL(optional)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | DONL(optional)|                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |                          FU payload                           |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 9 The structure of an FU

   The fields in the payload header are set as follows.  The Type field
   MUST be equal to 49.  The fields F, LayerId, and TID MUST be equal
   to the fields F, LayerId, and TID, respectively, of the fragmented
   NAL unit.

   The FU header consists of an S bit, an E bit, and a 6-bit FuType
   field, as shown in Figure 10.

   +---------------+
   |0|1|2|3|4|5|6|7|
   +-+-+-+-+-+-+-+-+
   |S|E|  FuType   |
   +---------------+

   Figure 10 The structure of the FU header

   The semantics of the FU header fields are as follows:

   S: 1 bit
      When set to one, the S bit indicates the start of a fragmented
      NAL unit, i.e., the first byte of the FU payload is also the
      first byte of the payload of the fragmented NAL unit.  When the
      FU payload is not the start of the fragmented NAL unit payload,
      the S bit MUST be set to zero.

   E: 1 bit
      When set to one, the E bit indicates the end of a fragmented NAL
      unit, i.e., the last byte of the payload is also the last byte of
      the fragmented NAL unit.  When the FU payload is not the last
      fragment of a fragmented NAL unit, the E bit MUST be set to zero.

   FuType: 6 bits
      The field FuType MUST be equal to the field Type of the
      fragmented NAL unit.

   The DONL field, when present, specifies the value of the 16 least
   significant bits of the decoding order number of the fragmented NAL
   unit.

   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
   than 0, and the S bit is equal to 1, the DONL field MUST be present
   in the FU, and the variable DON for the fragmented NAL unit is
   derived as equal to the value of the DONL field.  Otherwise (tx-mode
   is equal to "SST" and sprop-depack-buf-nalus is equal to 0, or the S
   bit is equal to 0), the DONL field MUST NOT be present in the FU.

   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
   the S bit and E bit MUST NOT both be set to one in the same FU
   header.
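
   As an informative illustration of the field layout above, the
   following minimal Python sketch parses the start of an FU.  It
   assumes that the two-octet payload header follows the HEVC NAL unit
   header layout (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3
   bits) and that the input is the RTP payload without RTP header or
   padding.  The names parse_fu, FuFields, and don_in_use (true when
   tx-mode is "MST" or sprop-depack-buf-nalus is greater than 0) are
   hypothetical and chosen only for this example.

```python
from typing import NamedTuple, Optional

FU_PAYLOAD_TYPE = 49  # Type value of the payload header for FUs


class FuFields(NamedTuple):
    start: bool            # S bit
    end: bool              # E bit
    fu_type: int           # Type of the fragmented NAL unit
    donl: Optional[int]    # 16-bit DONL, or None when absent
    payload: bytes         # FU payload (may be empty)


def parse_fu(rtp_payload: bytes, don_in_use: bool) -> FuFields:
    """Split an FU into its header fields and payload (illustrative only)."""
    if len(rtp_payload) < 3:
        raise ValueError("FU needs a 2-octet payload header and an FU header")
    # Type occupies bits 1..6 of the first payload header octet.
    if (rtp_payload[0] >> 1) & 0x3F != FU_PAYLOAD_TYPE:
        raise ValueError("payload header Type is not 49")
    fu_header = rtp_payload[2]
    start = bool(fu_header & 0x80)
    end = bool(fu_header & 0x40)
    if start and end:
        raise ValueError("S and E bits must not both be set")
    offset, donl = 3, None
    # In this draft, DONL is carried only in the first fragment (S = 1).
    if don_in_use and start:
        if len(rtp_payload) < 5:
            raise ValueError("truncated DONL field")
        donl = int.from_bytes(rtp_payload[3:5], "big")  # network byte order
        offset = 5
    return FuFields(start, end, fu_header & 0x3F, donl, rtp_payload[offset:])
```

   For instance, under these assumptions a first fragment of a NAL
   unit of Type 19 would carry the FU header octet 0x93 (S = 1, E = 0,
   FuType = 19).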

   The FU payload consists of fragments of the payload of the
   fragmented NAL unit so that if the FU payloads of consecutive FUs,
   starting with an FU with the S bit equal to 1 and ending with an FU
   with the E bit equal to 1, are sequentially concatenated, the
   payload of the fragmented NAL unit can be reconstructed.  The NAL
   unit header of the fragmented NAL unit is not included as such in
   the FU payload, but rather the information of the NAL unit header of
   the fragmented NAL unit is conveyed in the F, LayerId, and TID
   fields of the FU payload headers of the FUs and the FuType field of
   the FU header of the FUs.  An FU payload MAY have any number of
   octets and MAY be empty.

      Informative note: Empty FU payloads are allowed to reduce the
      latency of a certain class of senders in nearly lossless
      environments.  These senders can be characterized in that they
      packetize fragments of a NAL unit before the NAL unit is
      completely generated and, hence, before the NAL unit size is
      known.  If zero-length FU payloads were not allowed, the sender
      would have to generate at least one bit of data of the following
      fragment of the NAL unit before the current FU could be sent.
      Due to the characteristics of HEVC, where sometimes several CTUs
      occupy zero bits, this is undesirable and can add delay.
      However, the (potential) use of zero-length FU payloads should be
      carefully weighed against the increased risk of the loss of at
      least a part of the fragmented NAL unit because of the additional
      packets employed for its transmission.

   If an FU is lost, the receiver SHOULD discard all following
   fragmentation units in transmission order corresponding to the same
   fragmented NAL unit, unless the decoder in the receiver is known to
   be prepared to gracefully handle incomplete NAL units.
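
   The reconstruction described above can be sketched as follows.
   This is a minimal Python illustration, not a normative procedure.
   It assumes the fragments have already been parsed, that any DONL
   field has been stripped from the FU payload, and that loss is
   detected as a gap in RTP sequence numbers.  The function name
   reassemble_fus and the tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# (rtp_seq, payload_hdr (2 octets), fu_header (1 octet), fu_payload)
Fragment = Tuple[int, bytes, int, bytes]


def reassemble_fus(fragments: List[Fragment]) -> Optional[bytes]:
    """Rebuild a fragmented NAL unit, or return None if it is incomplete."""
    if not fragments:
        return None
    first_seq, payload_hdr, first_fu, _ = fragments[0]
    if not first_fu & 0x80:          # first fragment must have S = 1
        return None
    if not fragments[-1][2] & 0x40:  # last fragment must have E = 1
        return None
    # Fragments are sent with consecutive RTP sequence numbers (mod 2^16);
    # any gap means a fragment was lost and the NAL unit is discarded.
    for i, (seq, _, _, _) in enumerate(fragments):
        if seq != (first_seq + i) & 0xFFFF:
            return None
    fu_type = first_fu & 0x3F
    # Restore the NAL unit header: keep F, LayerId, and TID from the
    # payload header and replace Type (49) with FuType.
    nal_hdr = bytes([(payload_hdr[0] & 0x81) | (fu_type << 1), payload_hdr[1]])
    return nal_hdr + b"".join(frag[3] for frag in fragments)
```

   Returning None on a gap corresponds to the SHOULD-discard behavior;
   a receiver whose decoder tolerates incomplete NAL units could
   instead forward the leading fragments.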

   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
   fragments of a NAL unit to an (incomplete) NAL unit, even if
   fragment n of that NAL unit is not received.  In this case, the
   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
   syntax violation.

5. Packetization Rules

   The following packetization rules apply:

   o  If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
      than 0 for an RTP session, the transmission order of NAL units
      carried in the RTP session MAY be different from the NAL unit
      decoding order.  Otherwise (tx-mode is equal to "SST" and
      sprop-depack-buf-nalus is equal to 0 for an RTP session), the
      transmission order of NAL units carried in the RTP session MUST
      be the same as the NAL unit decoding order.

   o  A NAL unit of a small size SHOULD be encapsulated in an
      aggregation packet together with one or more other NAL units in
      order to avoid the unnecessary packetization overhead for small
      NAL units.  For example, non-VCL NAL units such as access unit
      delimiters, parameter sets, or SEI NAL units are typically small
      and can often be aggregated with VCL NAL units without violating
      MTU size constraints.

   o  Each non-VCL NAL unit SHOULD be encapsulated in an aggregation
      packet together with its associated VCL NAL unit, as typically a
      non-VCL NAL unit would be meaningless without the associated VCL
      NAL unit being available.

   o  For carrying exactly one NAL unit in an RTP packet, a single NAL
      unit packet MUST be used.

6. De-packetization Process

   The general concept behind de-packetization is to get the NAL units
   out of the RTP packets in an RTP session and all the dependent RTP
   sessions, if any, and pass them to the decoder in the NAL unit
   decoding order.

   The de-packetization process is implementation dependent.
   Therefore, the following description should be seen as an example of
   a suitable implementation.  Other schemes may be used as well, as
   long as the output for the same input is the same as that of the
   process described below.  The output is the same when the set of
   NAL units and their order are both identical.  Optimizations
   relative to the described algorithms are possible.

   All normal RTP mechanisms related to buffer management apply.  In
   particular, duplicated or outdated RTP packets (as indicated by the
   RTP sequence number and the RTP timestamp) are removed.  To
   determine the exact time for decoding, factors such as a possible
   intentional delay to allow for proper inter-stream synchronization
   must be factored in.

   NAL units with NAL unit type values in the range of 0 to 47,
   inclusive, may be passed to the decoder.  NAL-unit-like structures
   with NAL unit type values in the range of 48 to 63, inclusive, MUST
   NOT be passed to the decoder.

   The receiver includes a receiver buffer, which is used to compensate
   for transmission delay jitter, to reorder NAL units from
   transmission order to the NAL unit decoding order, and to recover
   the NAL unit decoding order in MST, when applicable.  In this
   section, the receiver operation is described under the assumption
   that there is no transmission delay jitter.  To distinguish it from
   a practical receiver buffer that is also used for compensation of
   transmission delay jitter, the receiver buffer is hereafter called
   the de-packetization buffer in this section.  Receivers SHOULD also
   prepare for transmission delay jitter; i.e., either reserve
   separate buffers for transmission delay jitter buffering and
   de-packetization buffering or use a receiver buffer for both
   transmission delay jitter and de-packetization.
   Moreover, receivers SHOULD take transmission delay jitter into
   account in the buffering operation, e.g., by additional initial
   buffering before starting decoding and playback.

   There are two buffering states in the receiver: initial buffering
   and buffering while playing.  Initial buffering starts when the
   reception is initialized.  After initial buffering, decoding and
   playback are started, and the buffering-while-playing mode is used.

   Regardless of the buffering state, the receiver stores incoming NAL
   units, in reception order, into the de-packetization buffer.  NAL
   units carried in single NAL unit packets, APs, and FUs are stored in
   the de-packetization buffer individually, and the value of AbsDon is
   calculated and stored for each NAL unit.  When MST is in use, NAL
   units of all RTP packet streams are stored in the same
   de-packetization buffer.

   Initial buffering lasts until condition A (the number of NAL units
   in the de-packetization buffer is greater than the value of
   sprop-depack-buf-nalus of the highest RTP session) is true.

   After initial buffering, whenever condition A is true, the following
   operation is repeatedly applied until condition A becomes false:

   o  The NAL unit in the de-packetization buffer with the smallest
      value of AbsDon is removed from the de-packetization buffer and
      passed to the decoder.

   When no more NAL units are flowing into the de-packetization buffer,
   all NAL units remaining in the de-packetization buffer are removed
   from the buffer and passed to the decoder in the order of increasing
   AbsDon values.

7. Payload Format Parameters

   This section specifies the parameters that MAY be used to select
   optional features of the payload format and certain features or
   properties of the bitstream.  The parameters are specified here as
   part of the media type registration for the HEVC codec.
   A mapping of the parameters into the Session Description Protocol
   (SDP) [RFC4566] is also provided for applications that use SDP.
   Equivalent parameters could be defined elsewhere for use with
   control protocols that do not use SDP.

7.1 Media Type Registration

   The media subtype for the HEVC codec is allocated from the IETF
   tree.

   The receiver MUST ignore any unspecified parameter.

   Media Type name:     video

   Media subtype name:  H265

   Required parameters: none

   OPTIONAL parameters:

      In the following definitions of parameters, "the stream" or "the
      NAL unit stream" refers to all NAL units conveyed in the current
      RTP session in SST, and all NAL units conveyed in the current RTP
      session and all NAL units conveyed in other RTP sessions that the
      current RTP session depends on in MST.

      profile-space, profile-id:

         The profile-space parameter indicates the context for
         interpretation of the profile-id parameter value.  The
         profile, which specifies the subset of coding tools that may
         have been used to generate the stream or that the receiver
         supports, as specified in [HEVC], is defined by the
         combination of profile-space and profile-id.  Note that
         profile-space is required to be equal to 0 in [HEVC], but
         other values for it may be specified in the future by ITU-T or
         ISO/IEC.

         If the profile-space and profile-id parameters are used to
         indicate properties of a NAL unit stream, it indicates that,
         to decode the stream, the minimum subset of coding tools a
         decoder has to support is the profile specified by both
         parameters.

         If the profile-space and profile-id parameters are used for
         capability exchange or session setup, it indicates the subset
         of coding tools, which is equal to the profile, that the codec
         supports for both receiving and sending.

         If no profile-space is present, a value of 0 MUST be inferred,
         and if no profile-id is present, the Main profile (i.e., a
         value of 1) MUST be inferred.

         The profile-space and profile-id parameters are derived from
         the sequence parameter set or video parameter set NAL units,
         as specified in [HEVC], as follows.

         For SST or for the stream corresponding to the highest RTP
         session of MST when MST is applied, the following applies:

         o  profile-space = general_profile_space
         o  profile-id = general_profile_idc

         For streams not corresponding to the highest RTP session of
         MST when MST is applied, the following applies, with j being
         the value of the sub-layer-id parameter:

         o  profile-space = sub_layer_profile_space[j]
         o  profile-id = sub_layer_profile_idc[j]

      tier-flag, level-id:

         The tier-flag parameter indicates the context for
         interpretation of the level-id value.  The default level,
         which specifies limits on values of syntax elements or on
         arithmetic combinations of values of syntax elements, as
         specified in [HEVC], is defined by the combination of
         tier-flag and level-id.

         If the tier-flag and level-id parameters are used to indicate
         properties of a NAL unit stream, it indicates that, to decode
         the stream, the lowest level the decoder has to support is the
         default level.

         If the tier-flag and level-id parameters are used for
         capability exchange or session setup, the following applies.
         If max-recv-level-id is not present, the default level defined
         by tier-flag and level-id indicates the highest level the
         codec wishes to support.  Otherwise, tier-flag and
         max-recv-level-id indicate the highest level the codec
         supports for receiving.  For either receiving or sending, all
         levels that are lower than the highest level supported MUST
         also be supported.

         If no tier-flag is present, a value of 0 MUST be inferred, and
         if no level-id is present, a value of 93 (i.e., level 3.1)
         MUST be inferred.

         The tier-flag and level-id parameters are derived from the
         sequence parameter set or video parameter set NAL units, as
         specified in [HEVC], as follows.

         For SST or for the stream corresponding to the highest RTP
         session of MST when MST is applied, the following applies:

         o  tier-flag = general_tier_flag
         o  level-id = general_level_idc

         For streams not corresponding to the highest RTP session of
         MST when MST is applied, the following applies, with j being
         the value of the sub-layer-id parameter:

         o  tier-flag = sub_layer_tier_flag[j]
         o  level-id = sub_layer_level_idc[j]

      interop-constraints:

         A base16 [RFC4648] (hexadecimal) representation of the six
         bytes derived from the sequence parameter set or video
         parameter set NAL units as specified in [HEVC] consisting of
         progressive_source_flag, interlaced_source_flag,
         non_packed_constraint_flag, frame_only_constraint_flag, and
         reserved_zero_44bits.  Note that reserved_zero_44bits is
         required to be equal to 0 in [HEVC], but other values for it
         may be specified in the future by ITU-T or ISO/IEC.
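
         To illustrate how four flags and 44 reserved bits fill six
         bytes, the following minimal Python sketch packs and unpacks
         an interop-constraints value.  It assumes that the flags
         occupy the most significant bits in the order listed above,
         as they would appear in an MSB-first bitstream; this bit
         placement and the function names are assumptions made for the
         example, and the flag values used are illustrative only.

```python
from typing import Tuple

_RESERVED_MASK = (1 << 44) - 1  # low 44 bits hold reserved_zero_44bits


def encode_interop_constraints(progressive: int, interlaced: int,
                               non_packed: int, frame_only: int,
                               reserved_zero_44bits: int = 0) -> str:
    """Pack the four flags and the reserved bits into 12 hex digits."""
    value = (
        (int(bool(progressive)) << 47)
        | (int(bool(interlaced)) << 46)
        | (int(bool(non_packed)) << 45)
        | (int(bool(frame_only)) << 44)
        | (reserved_zero_44bits & _RESERVED_MASK)
    )
    # RFC 4648 base16 uses the upper-case alphabet.
    return value.to_bytes(6, "big").hex().upper()


def decode_interop_constraints(text: str) -> Tuple[int, int, int, int, int]:
    """Inverse of encode_interop_constraints; accepts either hex case."""
    raw = bytes.fromhex(text)
    if len(raw) != 6:
        raise ValueError("interop-constraints must represent six bytes")
    value = int.from_bytes(raw, "big")
    return (
        (value >> 47) & 1,        # progressive_source_flag
        (value >> 46) & 1,        # interlaced_source_flag
        (value >> 45) & 1,        # non_packed_constraint_flag
        (value >> 44) & 1,        # frame_only_constraint_flag
        value & _RESERVED_MASK,   # reserved_zero_44bits
    )
```

         Under this assumed layout, the flags (1, 0, 1, 1) with all
         reserved bits equal to 0 are represented as B00000000000.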

         If no interop-constraints are present, the following MUST be
         inferred:

         o  progressive_source_flag = 1
         o  interlaced_source_flag = 0
         o  non_packed_constraint_flag = 1
         o  frame_only_constraint_flag = 1
         o  reserved_zero_44bits = 0

         For SST or for the stream corresponding to the highest RTP
         session of MST when MST is applied, the following applies:

         o  progressive_source_flag = general_progressive_source_flag
         o  interlaced_source_flag = general_interlaced_source_flag
         o  non_packed_constraint_flag =
               general_non_packed_constraint_flag
         o  frame_only_constraint_flag =
               general_frame_only_constraint_flag
         o  reserved_zero_44bits = general_reserved_zero_44bits

         For streams not corresponding to the highest RTP session of
         MST when MST is applied, the following applies, with j being
         the value of the sub-layer-id parameter:

         o  progressive_source_flag =
               sub_layer_progressive_source_flag[j]
         o  interlaced_source_flag =
               sub_layer_interlaced_source_flag[j]
         o  non_packed_constraint_flag =
               sub_layer_non_packed_constraint_flag[j]
         o  frame_only_constraint_flag =
               sub_layer_frame_only_constraint_flag[j]
         o  reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]

      profile-compatibility-indicator:

         A base16 [RFC4648] representation of the four bytes
         representing the 32 profile compatibility flags in the
         sequence parameter set or video parameter set NAL units.  A
         decoder conforming to a certain profile may be able to decode
         bitstreams conforming to other profiles.  The
         profile-compatibility-indicator provides exact information of
         the ability of a decoder conforming to a certain profile to
         decode bitstreams conforming to another profile.
         More concretely, if the profile compatibility flag
         corresponding to the profile that a decoder conforms to is set
         in a bitstream, then the decoder is able to decode that
         bitstream, irrespective of the profile that the bitstream
         conforms to (provided that the decoder supports the highest
         level of the bitstream).

         For SST or for the stream corresponding to the highest RTP
         session of MST when MST is used with temporal scalability, the
         following applies with j = 0..31:

         o  The 32 flags = general_profile_compatibility_flag[j]

         When MST is in use, for streams not corresponding to the
         highest RTP session, the following applies with i being the
         value of the sub-layer-id parameter and j = 0..31:

         o  The 32 flags = sub_layer_profile_compatibility_flag[i][j]

      sub-layer-id:

         This parameter MAY be used to indicate the highest allowed
         value of TID in the stream.  When not present, the value of
         sub-layer-id is inferred to be equal to 6.

      recv-sub-layer-id:

         This parameter MAY be used to signal a receiver's choice of
         the offered or declared sub-layers in the sprop-vps.  The
         value of recv-sub-layer-id indicates the index of the highest
         sub-layer of the stream that a receiver supports.  When not
         present, the value of recv-sub-layer-id is inferred to be
         equal to sub-layer-id.

      max-recv-level-id:

         This parameter MAY be used, together with tier-flag, to
         indicate the highest level a receiver supports.  The highest
         level the receiver supports is equal to the value of
         max-recv-level-id divided by 30 for the Main or High tier (as
         determined by tier-flag equal to 0 or 1, respectively).

         When max-recv-level-id is not present, the value is inferred
         to be equal to level-id.

         max-recv-level-id MUST NOT be present when the highest level
         the receiver supports is not higher than the default level.
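
         The inference rules for absent parameters can be summarized
         in a short Python sketch.  It is illustrative only: the
         function name resolve_level_params and the dictionary-based
         input are hypothetical, and it assumes the parameter values
         have already been extracted as strings or integers (for
         example, from an SDP fmtp line).

```python
def _level_name(level_id: int) -> str:
    """Map level-id (30 times the level number) to text, e.g. 93 -> '3.1'."""
    major, remainder = divmod(level_id, 30)
    return f"{major}.{remainder // 3}"


def resolve_level_params(params: dict) -> dict:
    """Apply the default values stated in the parameter definitions."""
    profile_space = int(params.get("profile-space", 0))
    profile_id = int(params.get("profile-id", 1))       # 1 = Main profile
    tier_flag = int(params.get("tier-flag", 0))         # 0 = Main tier
    level_id = int(params.get("level-id", 93))          # 93 = level 3.1
    sub_layer_id = int(params.get("sub-layer-id", 6))
    recv_sub_layer_id = int(params.get("recv-sub-layer-id", sub_layer_id))

    if "max-recv-level-id" in params:
        max_recv_level_id = int(params["max-recv-level-id"])
        # The parameter is only allowed when it raises the level.
        if max_recv_level_id <= level_id:
            raise ValueError("max-recv-level-id must exceed level-id")
    else:
        max_recv_level_id = level_id

    return {
        "profile_space": profile_space,
        "profile_id": profile_id,
        "tier": "High" if tier_flag else "Main",
        "level": _level_name(level_id),
        "max_recv_level": _level_name(max_recv_level_id),
        "sub_layer_id": sub_layer_id,
        "recv_sub_layer_id": recv_sub_layer_id,
    }
```

         With no parameters present, this sketch yields the Main
         profile, Main tier, and level 3.1 for both the default and
         the highest receive level.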

      sprop-vps:

         This parameter MAY be used to convey any video parameter set
         NAL unit of the stream.  When present, the parameter MAY be
         used to indicate codec capability and sub-stream
         characteristics (i.e., properties of sub-layer representations
         as defined in [HEVC]) as well as for out-of-band transmission
         of video parameter sets.  The value of the parameter is a
         comma-separated (',') list of base64 [RFC4648] representations
         of the video parameter set NAL units as specified in Section
         7.3.2.1 of [HEVC].

      sprop-sps:

         This parameter MAY be used to convey sequence parameter set
         NAL units of the stream for out-of-band transmission of
         sequence parameter sets.  The value of the parameter is a
         comma-separated (',') list of base64 [RFC4648] representations
         of the sequence parameter set NAL units as specified in
         Section 7.3.2.2 of [HEVC].

      sprop-pps:

         This parameter MAY be used to convey picture parameter set NAL
         units of the stream for out-of-band transmission of picture
         parameter sets.  The value of the parameter is a
         comma-separated (',') list of base64 [RFC4648] representations
         of the picture parameter set NAL units as specified in Section
         7.3.2.3 of [HEVC].

      max-ls, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

         These parameters MAY be used to signal the capabilities of a
         receiver implementation.  These parameters MUST NOT be used
         for any other purpose.  The highest level (specified by
         tier-flag and max-recv-level-id) MUST be a level that the
         receiver is fully capable of supporting.  max-ls, max-lps,
         max-cpb, max-dpb, max-br, max-tr, and max-tc MAY be used to
         indicate capabilities of the receiver that extend the required
         capabilities of the highest level, as specified below.

         When more than one parameter from the set (max-ls, max-lps,
         max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
         receiver MUST support all signaled capabilities
         simultaneously.  For example, if both max-ls and max-br are
         present, the highest level with the extension of both the
         picture rate and bitrate is supported.  That is, the receiver
         is able to decode NAL unit streams in which the luma sample
         rate is up to max-ls (inclusive), the bitrate is up to max-br
         (inclusive), the coded picture buffer size is derived as
         specified in the semantics of the max-br parameter below, and
         the other properties comply with the highest level specified
         by tier-flag and max-recv-level-id.

            Informative note: When the OPTIONAL media type parameters
            are used to signal the properties of a NAL unit stream, and
            max-ls, max-lps, max-cpb, max-dpb, max-br, max-tr, and
            max-tc are not present, the values of profile-space,
            profile-id, tier-flag, and level-id must always be such
            that the NAL unit stream complies fully with the specified
            profile and level.

      max-ls:
         The value of max-ls is an integer indicating the maximum
         processing rate in units of luma samples per second.  The
         max-ls parameter signals that the receiver is capable of
         decoding video at a higher rate than is required by the
         highest level.

         When max-ls is signaled, the receiver MUST be able to decode
         NAL unit streams that conform to the highest level, with the
         exception that the MaxLumaSR value in Table A-2 of [HEVC] for
         the highest level is replaced with the value of max-ls.  The
         value of max-ls MUST be greater than or equal to the value of
         MaxLumaSR given in Table A-2 of [HEVC] for the highest level.
         Senders MAY use this knowledge to send pictures of a given
         size at a higher picture rate than is indicated in the highest
         level.

         When not present, the value of max-ls is inferred to be equal
         to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
         highest level.

      max-lps:
         The value of max-lps is an integer indicating the maximum
         picture size in units of luma samples.  The max-lps parameter
         signals that the receiver is capable of decoding larger
         picture sizes than are required by the highest level.  When
         max-lps is signaled, the receiver MUST be able to decode NAL
         unit streams that conform to the highest level, with the
         exception that the MaxLumaPS value in Table A-1 of [HEVC] for
         the highest level is replaced with the value of max-lps.  The
         value of max-lps MUST be greater than or equal to the value of
         MaxLumaPS given in Table A-1 of [HEVC] for the highest level.
         Senders MAY use this knowledge to send larger pictures at a
         proportionally lower picture rate than is indicated in the
         highest level.

         When not present, the value of max-lps is inferred to be equal
         to the value of MaxLumaPS given in Table A-1 of [HEVC] for the
         highest level.

      max-cpb:
         The value of max-cpb is an integer indicating the maximum
         coded picture buffer size in units of CpbBrVclFactor bits for
         the VCL HRD parameters and in units of CpbBrNalFactor bits for
         the NAL HRD parameters, where CpbBrVclFactor and
         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The
         max-cpb parameter signals that the receiver has more memory
         than the minimum amount of coded picture buffer memory
         required by the highest level.  When max-cpb is signaled, the
         receiver MUST be able to decode NAL unit streams that conform
         to the highest level, with the exception that the MaxCPB value
         in Table A-1 of [HEVC] for the highest level is replaced with
         the value of max-cpb.
         The value of max-cpb MUST be greater than or equal to the
         value of MaxCPB given in Table A-1 of [HEVC] for the highest
         level.  Senders MAY use this knowledge to construct coded
         video streams with greater variation of bitrate than can be
         achieved with the MaxCPB value in Table A-1 of [HEVC].

         When not present, the value of max-cpb is inferred to be equal
         to the value of MaxCPB given in Table A-1 of [HEVC] for the
         highest level.

            Informative note: The coded picture buffer is used in the
            hypothetical reference decoder (Annex C of HEVC).  The use
            of the hypothetical reference decoder is recommended in
            HEVC encoders to verify that the produced bitstream
            conforms to the standard and to control the output bitrate.
            Thus, the coded picture buffer is conceptually independent
            of any other potential buffers in the receiver, including
            de-packetization and de-jitter buffers.  The coded picture
            buffer need not be implemented in decoders as specified in
            Annex C of HEVC, but rather standard-compliant decoders can
            have any buffering arrangements provided that they can
            decode standard-compliant bitstreams.  Thus, in practice,
            the input buffer for a video decoder can be integrated with
            de-packetization and de-jitter buffers of the receiver.

      max-dpb:
         The value of max-dpb is an integer indicating the maximum
         decoded picture buffer size in units of decoded pictures at
         the MaxLumaPS for the highest level, i.e., the number of
         decoded pictures at the maximum picture size defined by the
         highest level.  The value of max-dpb MUST be smaller than or
         equal to 16.  The max-dpb parameter signals that the receiver
         has more memory than the minimum amount of decoded picture
         buffer memory required by default, which is MaxDpbPicBuf as
         defined in [HEVC] (equal to 6).
         When max-dpb is signaled, the receiver MUST be able to decode
         NAL unit streams that conform to the highest level, with the
         exception that the MaxDpbPicBuf value defined in [HEVC] as 6
         is replaced with the value of max-dpb.  Consequently, a
         receiver that signals max-dpb MUST be capable of storing the
         following number of decoded pictures (MaxDpbSize) in its
         decoded picture buffer:

            if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
               MaxDpbSize = Min( 4 * max-dpb, 16 )
            else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
               MaxDpbSize = Min( 2 * max-dpb, 16 )
            else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
               MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
            else
               MaxDpbSize = max-dpb

         where MaxLumaPS is given in Table A-1 of [HEVC] for the
         highest level and PicSizeInSamplesY is the current size of
         each decoded picture in units of luma samples as defined in
         [HEVC].

         The value of max-dpb MUST be greater than or equal to the
         value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].  Senders
         MAY use this knowledge to construct coded video streams with
         improved compression.

         When not present, the value of max-dpb is inferred to be equal
         to the value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].

            Informative note: This parameter was added primarily to
            complement a similar codepoint in the ITU-T Recommendation
            H.245, so as to facilitate signaling gateway designs.  The
            decoded picture buffer stores reconstructed samples.  There
            is no relationship between the size of the decoded picture
            buffer and the buffers used in RTP, especially
            de-packetization and de-jitter buffers.
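
         The MaxDpbSize derivation given for max-dpb can be
         transcribed directly into Python, as in the following sketch.
         It assumes that the division in the pseudo-code is an integer
         division; the function name max_dpb_size and the MaxLumaPS
         value used in the usage note are illustrative and not taken
         from [HEVC].

```python
def max_dpb_size(pic_size_in_samples_y: int, max_luma_ps: int,
                 max_dpb: int = 6) -> int:
    """Number of decoded pictures a receiver signaling max-dpb must store.

    max_dpb defaults to 6, the inferred value when max-dpb is absent.
    """
    if not 6 <= max_dpb <= 16:
        raise ValueError("max-dpb must be in the range 6..16")
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb) // 3, 16)  # integer division assumed
    return max_dpb
```

         For a hypothetical MaxLumaPS of 1000000 and max-dpb equal to
         6, pictures of 500000 luma samples give a MaxDpbSize of 12,
         and pictures of 750000 luma samples give 8.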

      max-br:
         The value of max-br is an integer indicating the maximum video
         bitrate in units of CpbBrVclFactor bits per second for the VCL
         HRD parameters and in units of CpbBrNalFactor bits per second
         for the NAL HRD parameters, where CpbBrVclFactor and
         CpbBrNalFactor are defined in Section A.4 of [HEVC].

         The max-br parameter signals that the video decoder of the
         receiver is capable of decoding video at a higher bitrate than
         is required by the highest level.

         When max-br is signaled, the video codec of the receiver MUST
         be able to decode NAL unit streams that conform to the highest
         level, with the following exceptions in the limits specified
         by the highest level:

         o  The value of max-br replaces the MaxBR value in Table A-2
            of [HEVC] for the highest level.
         o  When the max-cpb parameter is not present, the result of
            the following formula replaces the value of MaxCPB in Table
            A-1 of [HEVC]:

               (MaxCPB of the highest level) * max-br /
               (MaxBR of the highest level)

         For example, if a receiver signals capability for Main profile
         Level 2 with max-br equal to 2000, this indicates a maximum
         video bitrate of 2000 kbits/sec for VCL HRD parameters, a
         maximum video bitrate of 2200 kbits/sec for NAL HRD
         parameters, and a CPB size of 2000000 bits (2000000 / 1500000
         * 1500000).

         The value of max-br MUST be greater than or equal to the value
         of MaxBR given in Table A-2 of [HEVC] for the highest level.

         Senders MAY use this knowledge to send higher bitrate video as
         allowed in the level definition of Annex A of HEVC to achieve
         improved video quality.

         When not present, the value of max-br is inferred to be equal
         to the value of MaxBR given in Table A-2 of [HEVC] for the
         highest level.
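
         The arithmetic of the Level 2 example can be reproduced with
         the following Python sketch.  It assumes CpbBrVclFactor equal
         to 1000 and CpbBrNalFactor equal to 1100, which are the
         factors implied by the 2000 and 2200 kbits/sec figures in the
         example, and takes the level limits (MaxBR and MaxCPB, both
         1500 in the example) as inputs.  The function name
         max_br_limits is hypothetical, and the integer (floor)
         division is an assumption, since the rounding of the formula
         is not stated here.

```python
from typing import Optional

CPB_BR_VCL_FACTOR = 1000  # assumed, as implied by the example in the text
CPB_BR_NAL_FACTOR = 1100  # assumed, as implied by the example in the text


def max_br_limits(max_br: int, level_max_br: int, level_max_cpb: int,
                  max_cpb: Optional[int] = None) -> dict:
    """Derive bitrate and CPB limits from max-br (illustrative only)."""
    if max_br < level_max_br:
        raise ValueError("max-br must not be below MaxBR of the highest level")
    if max_cpb is None:
        # max-cpb absent: scale the level's MaxCPB by max-br / MaxBR.
        cpb_units = level_max_cpb * max_br // level_max_br
    else:
        cpb_units = max_cpb
    return {
        "vcl_bitrate_bps": max_br * CPB_BR_VCL_FACTOR,
        "nal_bitrate_bps": max_br * CPB_BR_NAL_FACTOR,
        "vcl_cpb_bits": cpb_units * CPB_BR_VCL_FACTOR,
        "nal_cpb_bits": cpb_units * CPB_BR_NAL_FACTOR,
    }
```

         Calling max_br_limits(2000, 1500, 1500) reproduces the figures
         of the example: 2000000 and 2200000 bits per second, and a
         VCL CPB size of 2000000 bits.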

            Informative note: This parameter was added primarily to
            complement a similar codepoint in the ITU-T Recommendation
            H.245, so as to facilitate signaling gateway designs.  The
            assumption that the network is capable of handling such
            bitrates at any given time cannot be made from the value of
            this parameter.  In particular, no conclusion can be drawn
            that the signaled bitrate is possible under congestion
            control constraints.

      max-tr:
         The value of max-tr is an integer indicating the maximum
         number of tile rows.  The max-tr parameter signals that the
         receiver is capable of decoding video with a larger number of
         tile rows than the value allowed by the highest level.

         When max-tr is signaled, the receiver MUST be able to decode
         NAL unit streams that conform to the highest level, with the
         exception that the MaxTileRows value in Table A-1 of [HEVC]
         for the highest level is replaced with the value of max-tr.

         The value of max-tr MUST be greater than or equal to the value
         of MaxTileRows given in Table A-1 of [HEVC] for the highest
         level.  Senders MAY use this knowledge to send pictures
         utilizing a larger number of tile rows than the value allowed
         by the highest level.

         When not present, the value of max-tr is inferred to be equal
         to the value of MaxTileRows given in Table A-1 of [HEVC] for
         the highest level.

      max-tc:
         The value of max-tc is an integer indicating the maximum
         number of tile columns.  The max-tc parameter signals that the
         receiver is capable of decoding video with a larger number of
         tile columns than the value allowed by the highest level.

         When max-tc is signaled, the receiver MUST be able to decode
         NAL unit streams that conform to the highest level, with the
         exception that the MaxTileCols value in Table A-1 of [HEVC]
         for the highest level is replaced with the value of max-tc.
1980 The value of max-tc MUST be greater than or equal to the value 1981 of MaxTileCols given in Table A-1 of [HEVC] for the highest 1982 level. Senders MAY use this knowledge to send pictures 1983 utilizing a larger number of tile columns than the value 1984 allowed by the highest level. 1986 When not present, the value of max-tc is inferred to be equal 1987 to the value of MaxTileCols given in Table A-1 of [HEVC] for 1988 the highest level. 1990 max-fps: 1992 The value of max-fps is an integer indicating the maximum 1993 picture rate in units of hundreds of pictures per second that 1994 can be efficiently received. The max-fps parameter MAY be 1995 used to signal that the receiver has a constraint in that it 1996 is not capable of decoding video efficiently at the full 1997 picture rate that is implied by the highest level and, when 1998 present, one or more of the parameters max-ls, max-lps, and 1999 max-br. 2001 The value of max-fps is not necessarily the picture rate at 2002 which the maximum picture size can be sent; it constitutes a 2003 constraint on maximum picture rate for all resolutions. 2005 Informative note: The max-fps parameter is semantically 2006 different from max-ls, max-lps, max-cpb, max-dpb, max-br, 2007 max-tr, and max-tc in that max-fps is used to signal a 2008 constraint, lowering the maximum picture rate from what is 2009 implied by other parameters. 2011 The encoder MUST use a picture rate equal to or less than this 2012 value. In cases where the max-fps parameter is absent, the 2013 encoder is free to choose any picture rate according to the 2014 highest level and any signaled optional parameters. 2016 tx-mode: 2018 This parameter indicates whether the transmission mode is SST 2019 or MST. 2021 The value of tx-mode MUST be equal to either "MST" or "SST". 2022 When not present, the value of tx-mode is inferred to be equal 2023 to "SST". 2025 If the value is equal to "MST", MST MUST be in use.
Otherwise 2026 (the value is equal to "SST"), SST MUST be in use. 2028 The value of tx-mode MUST be equal to "MST" for all RTP 2029 sessions in an MST. 2031 sprop-depack-buf-nalus: 2033 This parameter specifies the maximum number of NAL units that 2034 precede a NAL unit in the de-packetization buffer in reception 2035 order and follow the NAL unit in decoding order. 2037 The value of sprop-depack-buf-nalus MUST be an integer in the 2038 range of 0 to 32767, inclusive. 2040 When not present, the value of sprop-depack-buf-nalus is 2041 inferred to be equal to 0. 2043 When the RTP session depends on one or more other RTP sessions 2044 (in this case tx-mode MUST be equal to "MST"), this parameter 2045 MUST be present and the value of sprop-depack-buf-nalus MUST 2046 be greater than 0. 2048 sprop-depack-buf-bytes: 2050 This parameter signals the required size of the de- 2051 packetization buffer in units of bytes. The value of the 2052 parameter MUST be greater than or equal to the maximum buffer 2053 occupancy (in units of bytes) of the de-packetization buffer 2054 as specified in section 6. 2056 The value of sprop-depack-buf-bytes MUST be an integer in the 2057 range of 0 to 4294967295, inclusive. 2059 When the RTP session depends on one or more other RTP sessions 2060 (in this case tx-mode MUST be equal to "MST") or sprop-depack- 2061 buf-nalus is present and is greater than 0, this parameter 2062 MUST be present and the value of sprop-depack-buf-bytes MUST 2063 be greater than 0. 2065 Informative note: The value of sprop-depack-buf-bytes 2066 indicates the required size of the de-packetization buffer 2067 only. When network jitter can occur, an appropriately 2068 sized jitter buffer has to be available as well. 
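The constraints on tx-mode, sprop-depack-buf-nalus, and sprop-depack-buf-bytes stated above can be sketched as a non-normative Python check. The function name, its arguments, and the message strings are hypothetical; `params` is assumed to be a dictionary of already-split fmtp name/value strings, and `depends_on_other_sessions` is assumed to be known from session-level signaling outside this sketch.

```python
def check_depack_params(params, depends_on_other_sessions):
    """Return a list of violated constraints (empty when none are found).

    Non-numeric values make int() raise ValueError; this sketch does not
    catch that.
    """
    problems = []

    tx_mode = params.get("tx-mode", "SST")  # inferred to be "SST" when absent
    if tx_mode not in ("SST", "MST"):
        problems.append("tx-mode must be SST or MST")

    nalus_present = "sprop-depack-buf-nalus" in params
    nalus = int(params.get("sprop-depack-buf-nalus", 0))  # inferred 0
    if not 0 <= nalus <= 32767:
        problems.append("sprop-depack-buf-nalus out of range")

    bytes_present = "sprop-depack-buf-bytes" in params
    nbytes = int(params.get("sprop-depack-buf-bytes", 0))
    if not 0 <= nbytes <= 4294967295:
        problems.append("sprop-depack-buf-bytes out of range")

    if depends_on_other_sessions:
        if tx_mode != "MST":
            problems.append("dependent session requires tx-mode=MST")
        if not nalus_present or nalus == 0:
            problems.append("dependent session requires sprop-depack-buf-nalus > 0")

    if depends_on_other_sessions or nalus > 0:
        if not bytes_present or nbytes == 0:
            problems.append("sprop-depack-buf-bytes must be present and > 0")

    return problems
```

An empty parameter set for an independent session passes, since both tx-mode and sprop-depack-buf-nalus have inferred values in that case.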
2070 depack-buf-cap: 2072 This parameter signals the capabilities of a receiver 2073 implementation and indicates the amount of de-packetization 2074 buffer space in units of bytes that the receiver has available 2075 for reconstructing the NAL unit decoding order. A receiver is 2076 able to handle any stream for which the value of the sprop- 2077 depack-buf-bytes parameter is smaller than or equal to this 2078 parameter. 2080 When not present, the value of depack-buf-cap is inferred to 2081 be equal to 0. The value of depack-buf-cap MUST be an integer 2082 in the range of 0 to 4294967295, inclusive. 2084 Informative note: depack-buf-cap indicates the maximum 2085 possible size of the de-packetization buffer of the 2086 receiver only. When network jitter can occur, an 2087 appropriately sized jitter buffer has to be available as 2088 well. 2090 sprop-segmentation-id: 2092 This parameter MAY be used to signal the segmentation tools 2093 present in the stream and that can be used for 2094 parallelization. The value of sprop-segmentation-id MUST be 2095 an integer in the range of 0 to 3, inclusive. When not 2096 present, the value of sprop-segmentation-id is inferred to be 2097 equal to 0. 2099 When sprop-segmentation-id is equal to 0, no information about 2100 the segmentation tools is provided. When sprop-segmentation- 2101 id is equal to 1, it indicates that slices are present in the 2102 stream. When sprop-segmentation-id is equal to 2, it 2103 indicates that tiles are present in the stream. When sprop- 2104 segmentation-id is equal to 3, it indicates that WPP is used 2105 in the stream. 2107 sprop-spatial-segmentation-idc: 2109 A base16 [RFC4648] representation of the syntax element 2110 min_spatial_segmentation_idc as specified in [HEVC]. This 2111 parameter MAY be used to describe parallelization capabilities 2112 of the stream. 
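As a non-normative illustration of the depack-buf-cap comparison and the sprop-segmentation-id values described above, consider the following Python sketch. The function names and the description strings are hypothetical. A False result from the first function only means that the text gives no guarantee; it does not mean the stream is known to be undecodable.

```python
# Meaning of each sprop-segmentation-id value, as listed in the text.
SEGMENTATION_TOOLS = {
    0: "no information",
    1: "slices",
    2: "tiles",
    3: "WPP",
}


def buffer_guaranteed(sprop_depack_buf_bytes, depack_buf_cap=0):
    """True when the receiver is stated to be able to handle the stream.

    depack-buf-cap is inferred to be 0 when absent.
    """
    return sprop_depack_buf_bytes <= depack_buf_cap


def segmentation_tool(params):
    """Map sprop-segmentation-id (inferred 0 when absent) to a description."""
    value = int(params.get("sprop-segmentation-id", 0))
    if value not in SEGMENTATION_TOOLS:
        raise ValueError("sprop-segmentation-id MUST be in the range 0 to 3")
    return SEGMENTATION_TOOLS[value]
```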
2114 dec-parallel-cap: 2116 This parameter MAY be used to indicate the decoder's 2117 additional decoding capabilities given the presence of tools 2118 enabling parallel decoding, such as slices, tiles, and WPP, in 2119 the video stream. The decoding capability of the decoder may 2120 vary with the setting of the parallel decoding tools present 2121 in the stream, e.g., the size of the tiles that are present in 2122 a stream. Therefore, multiple capability points may be 2123 provided, each indicating the minimum required decoding 2124 capability that is associated with a parallelism requirement, 2125 which is a requirement on the video stream that enables 2126 parallel decoding. 2128 Each capability point is defined as a combination of 1) a 2129 parallelism requirement, 2) a profile (determined by profile- 2130 space and profile-id), 3) a highest level, and 4) a maximum 2131 processing rate, a maximum picture size, and a maximum video 2132 bitrate that may be equal to or greater than that determined 2133 by the highest level. The parameter's syntax in ABNF [RFC5234] 2134 is as follows: 2136 dec-parallel-cap = "dec-parallel-cap={" cap-point *("," 2137 cap-point) "}" 2139 cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" 2140 cap-parameter) 2142 spatial-seg-idc = 1*4DIGIT ; 1-4095 2144 cap-parameter = tier-flag / level-id / max-ls 2145 / max-lps / max-br 2147 The set of capability points expressed by the dec-parallel-cap 2148 parameter is enclosed in a pair of curly braces ("{}"). Each 2149 set of two consecutive capability points is separated by a 2150 comma (','). Within each capability point, each set of two 2151 consecutive parameters, and when present, their values, is 2152 separated by a semicolon (';'). 2154 The profile of all capability points is determined by profile- 2155 space and profile-id that are outside the dec-parallel-cap 2156 parameter.
2158 Each capability point starts with an indication of the 2159 parallelism requirement, which consists of a parallel tool 2160 type, which may be equal to 'w' or 't', and a decimal value of 2161 the spatial-seg-idc parameter. When the type is 'w', the 2162 capability point is valid only for H.265 bitstreams with WPP 2163 in use, i.e., entropy_coding_sync_enabled_flag equal to 1. 2164 When the type is 't', the capability point is valid only for 2165 H.265 bitstreams with WPP not in use (i.e., 2167 entropy_coding_sync_enabled_flag equal to 0). The capability- 2168 point is valid only for H.265 bitstreams with 2169 min_spatial_segmentation_idc equal to or greater than spatial- 2170 seg-idc. 2172 The value of spatial-seg-idc MUST be greater than 0. 2174 After the parallelism requirement indication, each capability 2175 point continues with one or more pairs of parameter and value 2176 in any order for any of the following parameters: 2178 o tier-flag 2179 o level-id 2180 o max-ls 2181 o max-lps 2182 o max-br 2184 At most one occurrence of each of the above five parameters is 2185 allowed within each capability point. 2187 The values of dec-parallel-cap.tier-flag and dec-parallel- 2188 cap.level-id for a capability point indicate the highest level 2189 of the capability point. The values of dec-parallel-cap.max- 2190 ls, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for 2191 a capability point indicate the maximum processing rate in 2192 units of luma samples per second, the maximum picture size in 2193 units of luma samples, and the maximum video bitrate (in units 2194 of CpbBrVclFactor bits per second for the VCL HRD parameters 2195 and in units of CpbBrNalFactor bits per second for the NAL HRD 2196 parameters, where CpbBrVclFactor and CpbBrNalFactor are 2197 defined in Section A.4 of [HEVC]). 2199 When not present, the value of dec-parallel-cap.tier-flag is 2200 inferred to be equal to the value of tier-flag outside the 2201 dec-parallel-cap parameter.
When not present, the value of 2202 dec-parallel-cap.level-id is inferred to be equal to the value 2203 of max-recv-level-id outside the dec-parallel-cap parameter. 2204 When not present, the value of dec-parallel-cap.max-ls, dec- 2205 parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred 2206 to be equal to the value of max-ls, max-lps, or max-br, 2207 respectively, outside the dec-parallel-cap parameter. 2209 The general decoding capability, expressed by the set of 2210 parameters outside of dec-parallel-cap, is defined as the 2211 capability point that is determined by the following 2212 combination of parameters: 1) the parallelism requirement 2213 corresponding to the value of sprop-segmentation-id equal to 0 2214 for a stream, 2) the profile determined by profile-space and 2215 profile-id, 3) the highest level determined by tier-flag and 2216 max-recv-level-id, and 4) the maximum processing rate, the 2217 maximum picture size, and the maximum video bitrate determined 2218 by the highest level. The general decoding capability MUST 2219 NOT be included as one of the set of capability points in the 2220 dec-parallel-cap parameter. 
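The dec-parallel-cap syntax and the inference rules above can be illustrated with a non-normative Python sketch. The function name and the dictionary layout are hypothetical. `value` is assumed to be the brace-enclosed part of the parameter (for example "{t:8;level-id=120}"), already isolated from the rest of the "a=fmtp" line, and `outer` is assumed to hold the integer values of the parameters outside dec-parallel-cap. Stripping whitespace around tokens is a leniency of this sketch, not part of the ABNF. An inner parameter whose outer counterpart is also absent is reported as None, because the defaults of the outer parameters are defined elsewhere.

```python
import re

# Inner parameter name -> outer parameter that supplies its inferred value.
OUTER_DEFAULT = {
    "tier-flag": "tier-flag",
    "level-id": "max-recv-level-id",
    "max-ls": "max-ls",
    "max-lps": "max-lps",
    "max-br": "max-br",
}


def parse_dec_parallel_cap(value, outer):
    """Parse the brace-enclosed capability points into a list of dicts."""
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in braces")
    points = []
    for text in value[1:-1].split(","):
        head, *pairs = [token.strip() for token in text.split(";")]
        match = re.fullmatch(r"([wt]):([0-9]{1,4})", head)
        if match is None:
            raise ValueError("bad parallelism requirement: %r" % head)
        idc = int(match.group(2))
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc MUST be in the range 1 to 4095")
        if not pairs:
            raise ValueError("at least one cap-parameter is required")
        seen = {}
        for pair in pairs:
            name, sep, val = pair.partition("=")
            if name not in OUTER_DEFAULT or not sep:
                raise ValueError("bad cap-parameter: %r" % pair)
            if re.fullmatch(r"[0-9]+", val) is None:
                raise ValueError("bad value in cap-parameter: %r" % pair)
            if name in seen:
                raise ValueError("parameter repeated: %r" % name)
            seen[name] = int(val)
        point = {"type": match.group(1), "spatial-seg-idc": idc}
        for name, outer_name in OUTER_DEFAULT.items():
            # Explicit value if given, otherwise inferred from outside.
            point[name] = seen.get(name, outer.get(outer_name))
        points.append(point)
    return points
```

For instance, parsing "{t:8;level-id=120}" with an outer tier-flag of 0 yields one capability point of type 't' with spatial-seg-idc 8, level-id 120, and tier-flag 0.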
2222 For example, the following parameters express the general 2223 decoding capability of 720p30 (Level 3.1) plus an additional 2224 decoding capability of 1080p30 (Level 4) given that the 2225 spatially largest tile or slice used in the bitstream is equal 2226 to or less than 1/3 of the picture size: 2228 a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120} 2230 For another example, the following parameters express an 2231 additional decoding capability of 1080p30, using dec-parallel- 2232 cap.max-ls and dec-parallel-cap.max-lps, given that WPP is 2233 used in the stream: 2235 a=fmtp:98 level-id=93;dec-parallel-cap={w:8; 2236 max-ls=62668800;max-lps=2088960} 2238 Informative note: When min_spatial_segmentation_idc is 2239 present in a stream and WPP is not used, [HEVC] specifies 2240 that there is no slice or no tile in the stream containing 2241 more than 4 * PicSizeInSamplesY / 2242 ( min_spatial_segmentation_idc + 4 ) luma samples. 2244 Encoding considerations: 2246 This type is only defined for transfer via RTP (RFC 3550). 2248 Security considerations: 2250 See Section 9 of RFC XXXX. 2252 Public specification: 2254 Please refer to Section 13 of RFC XXXX. 2256 Additional information: None 2258 File extensions: none 2260 Macintosh file type code: none 2262 Object identifier or OID: none 2264 Person & email address to contact for further information: 2266 Intended usage: COMMON 2268 Author: See Section 14 of RFC XXXX. 2270 Change controller: 2272 IETF Audio/Video Transport Payloads working group delegated 2273 from the IESG. 2275 7.2 SDP Parameters 2277 The receiver MUST ignore any parameter unspecified in this memo. 2279 7.2.1 Mapping of Payload Type Parameters to SDP 2281 The media type video/H265 string is mapped to fields in the Session 2282 Description Protocol (SDP) [RFC4566] as follows: 2284 o The media name in the "m=" line of SDP MUST be video. 2286 o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the 2287 media subtype).
2289 o The clock rate in the "a=rtpmap" line MUST be 90000. 2291 o The OPTIONAL parameters "profile-space", "profile-id", "tier- 2292 flag", "level-id", "interop-constraints", "profile-compatibility- 2293 indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level- 2294 id", "max-ls", "max-lps", "max-cpb", "max-dpb", "max-br", "max- 2295 tr", "max-tc", "max-fps", "tx-mode", "sprop-depack-buf-nalus", 2296 "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-segmentation- 2297 id", "sprop-spatial-segmentation-idc", and "dec-parallel-cap", 2298 when present, MUST be included in the "a=fmtp" line of SDP. These 2299 parameters are expressed as a media type string, in the form of a 2300 semicolon-separated list of parameter=value pairs. 2302 o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- 2303 pps", when present, MUST be included in the "a=fmtp" line of SDP 2304 or conveyed using the "fmtp" source attribute as specified in 2305 section 6.3 of [RFC5576]. For a particular media format (i.e., 2306 RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST 2307 NOT be both included in the "a=fmtp" line of SDP and conveyed 2308 using the "fmtp" source attribute. When included in the "a=fmtp" 2309 line of SDP, these parameters are expressed as a media type 2310 string, in the form of a semicolon-separated list of 2311 parameter=value pairs. When conveyed using the "fmtp" source 2312 attribute, these parameters are only associated with the given 2313 source and payload type as parts of the "fmtp" source attribute. 2315 Informative note: Conveyance of "sprop-vps", "sprop-sps", and 2316 "sprop-pps" using the "fmtp" source attribute allows for out- 2317 of-band transport of parameter sets in topologies like Topo- 2318 Video-switch-MCU as specified in [RFC5117]. 2320 An example of media representation in SDP is as follows: 2322 m=video 49170 RTP/AVP 98 2323 a=rtpmap:98 H265/90000 2324 a=fmtp:98 profile-id=1; 2325 sprop-vps=