Network Working Group                                        Y.-K. Wang
Internet Draft                                                 Qualcomm
Intended status: Standards track                             Y. Sanchez
Expires: February 2015                                       T. Schierl
                                                         Fraunhofer HHI
                                                              S. Wenger
                                                                  Vidyo
                                                       M. M. Hannuksela
                                                                  Nokia
                                                        August 13, 2014

          RTP Payload Format for High Efficiency Video Coding
                  draft-ietf-payload-rtp-h265-06.txt

Abstract

   This memo describes an RTP payload format for the video coding
   standard ITU-T Recommendation H.265 and ISO/IEC International
   Standard 23008-2, both also known as High Efficiency Video Coding
   (HEVC) and developed by the Joint Collaborative Team on Video
   Coding (JCT-VC). The RTP payload format allows for packetization
   of one or more Network Abstraction Layer (NAL) units in each RTP
   packet payload, as well as fragmentation of a NAL unit into
   multiple RTP packets. Furthermore, it supports transmission of
   an HEVC bitstream over a single as well as multiple RTP streams.
   The payload format has wide applicability in videoconferencing,
   Internet video streaming, and high bit-rate entertainment-quality
   video, among others.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as "work
   in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on February 13, 2015.

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided
   without warranty as described in the Simplified BSD License.

Table of Contents

   Abstract
   Status of this Memo
   Table of Contents
   1 Introduction
      1.1 Overview of the HEVC Codec
         1.1.1 Coding-Tool Features
         1.1.2 Systems and Transport Interfaces
         1.1.3 Parallel Processing Support
         1.1.4 NAL Unit Header
      1.2 Overview of the Payload Format
   2 Conventions
   3 Definitions and Abbreviations
      3.1 Definitions
         3.1.1 Definitions from the HEVC Specification
         3.1.2 Definitions Specific to This Memo
      3.2 Abbreviations
   4 RTP Payload Format
      4.1 RTP Header Usage
      4.2 Payload Header Usage
      4.3 Payload Structures
      4.4 Transmission Modes
      4.5 Decoding Order Number
      4.6 Single NAL Unit Packets
      4.7 Aggregation Packets (APs)
      4.8 Fragmentation Units (FUs)
      4.9 PACI packets
         4.9.1 Reasons for the PACI rules (informative)
         4.9.2 PACI extensions (Informative)
      4.10 Temporal Scalability Control Information
   5 Packetization Rules
   6 De-packetization Process
   7 Payload Format Parameters
      7.1 Media Type Registration
      7.2 SDP Parameters
         7.2.1 Mapping of Payload Type Parameters to SDP
         7.2.2 Usage with SDP Offer/Answer Model
         7.2.3 Usage in Declarative Session Descriptions
         7.2.4 Parameter Sets Considerations
         7.2.5 Dependency Signaling in Multi-Stream Mode
   8 Use with Feedback Messages
      8.1 Picture Loss Indication (PLI)
      8.2 Slice Loss Indication
      8.3 Use of HEVC with the RPSI Feedback Message
      8.4 Full Intra Request (FIR)
   9 Security Considerations
   10 Congestion Control
   11 IANA Consideration
   12 Acknowledgements
   13 References
      13.1 Normative References
      13.2 Informative References
   14 Authors' Addresses

1 Introduction

1.1 Overview of the HEVC Codec

   High Efficiency Video Coding [HEVC], formally known as ITU-T
   Recommendation H.265 and ISO/IEC International Standard 23008-2,
   was ratified by ITU-T in April 2013 and reportedly provides
   significant coding efficiency gains over H.264 [H.264].

   As both H.264 [H.264] and its RTP payload format [RFC6184] are
   widely deployed and generally known in the relevant implementer
   communities, frequently only the differences between those two
   specifications are highlighted in non-normative, explanatory
   parts of this memo. Basic familiarity with both specifications
   is assumed for those parts. However, the normative parts of this
   memo do not require study of H.264 or its RTP payload format.

   H.264 and HEVC share a similar hybrid video codec design.
   Conceptually, both technologies include a video coding layer
   (VCL), which is often used to refer to the coding-tool features,
   and a network abstraction layer (NAL), which is often used to
   refer to the systems and transport interface aspects of the
   codecs.

1.1.1 Coding-Tool Features

   Similarly to earlier hybrid-video-coding-based standards,
   including H.264, the following basic video coding design is
   employed by HEVC. A prediction signal is first formed either by
   intra or motion compensated prediction, and the residual (the
   difference between the original and the prediction) is then
   coded.
   The gains in coding efficiency are achieved by
   redesigning and improving almost all parts of the codec over
   earlier designs. In addition, HEVC includes several tools to
   make the implementation on parallel architectures easier. Below
   is a summary of HEVC coding-tool features.

   Quad-tree block and transform structure

   One of the major tools that contribute significantly to the
   coding efficiency of HEVC is the usage of flexible coding blocks
   and transforms, which are defined in a hierarchical quad-tree
   manner. Unlike H.264, where the basic coding block is a
   macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit
   (CTU) of a maximum size of 64x64. Each CTU can be divided into
   smaller units in a hierarchical quad-tree manner and can
   represent smaller blocks down to size 4x4. Similarly, the
   transforms used in HEVC can have different sizes, starting from
   4x4 and going up to 32x32. Utilizing large blocks and transforms
   contributes to the major gain of HEVC, especially at high
   resolutions.

   Entropy coding

   HEVC uses a single entropy coding engine, which is based on
   Context Adaptive Binary Arithmetic Coding (CABAC), whereas H.264
   uses two distinct entropy coding engines. CABAC in HEVC shares
   many similarities with CABAC of H.264, but contains several
   improvements. Those include improvements in coding efficiency
   and lowered implementation complexity, especially for parallel
   architectures.

   In-loop filtering

   H.264 includes an in-loop adaptive deblocking filter, where the
   blocking artifacts around the transform edges in the
   reconstructed picture are smoothed to improve the picture quality
   and compression efficiency. In HEVC, a similar deblocking filter
   is employed but with somewhat lower complexity. In addition,
   pictures undergo a subsequent filtering operation called Sample
   Adaptive Offset (SAO), which is a new design element in HEVC.
   SAO basically adds a pixel-level offset in an adaptive manner and
   usually acts as a de-ringing filter. It is observed that SAO
   improves the picture quality, especially around sharp edges,
   contributing substantially to visual quality improvements of
   HEVC.

   Motion prediction and coding

   There have been a number of improvements in this area that are
   summarized as follows. The first category is motion merge and
   advanced motion vector prediction (AMVP) modes. The motion
   information of a prediction block can be inferred from the
   spatially or temporally neighboring blocks. This is similar to
   the DIRECT mode in H.264 but includes new aspects to incorporate
   the flexible quad-tree structure and methods to improve the
   parallel implementations. In addition, the motion vector
   predictor can be signaled for improved efficiency. The second
   category is high-precision interpolation. The interpolation
   filter length is increased to 8-tap from 6-tap, which improves
   the coding efficiency but also comes with increased complexity.
   In addition, the interpolation filter is defined with higher
   precision without any intermediate rounding operations to further
   improve the coding efficiency.

   Intra prediction and intra coding

   Compared to 8 intra prediction modes in H.264, HEVC supports
   angular intra prediction with 33 directions. This increased
   flexibility improves both objective coding efficiency and visual
   quality as the edges can be better predicted and ringing
   artifacts around the edges can be reduced. In addition, the
   reference samples are adaptively smoothed based on the prediction
   direction. To avoid contouring artifacts, a new interpolative
   prediction generation is included to improve the visual quality.
   Furthermore, discrete sine transform (DST) is utilized instead of
   traditional discrete cosine transform (DCT) for 4x4 intra
   transform blocks.

   Other coding-tool features

   HEVC includes some tools for lossless coding and efficient screen
   content coding, such as skipping the transform for certain
   blocks. These tools are particularly useful, for example, when
   streaming the user-interface of a mobile device to a large
   display.

1.1.2 Systems and Transport Interfaces

   HEVC inherited the basic systems and transport interfaces
   designs from H.264, such as the NAL-unit-based syntax structure,
   the hierarchical syntax and data unit structure from sequence-level
   parameter sets, multi-picture-level or picture-level parameter
   sets, slice-level header parameters, lower-level parameters, the
   supplemental enhancement information (SEI) message mechanism, the
   hypothetical reference decoder (HRD) based video buffering model,
   and so on. The differences in these aspects compared to H.264
   are summarized below.

   Video parameter set

   A new type of parameter set, called video parameter set (VPS),
   was introduced. For the first (2013) version of [HEVC], the
   video parameter set NAL unit is required to be available prior to
   its activation, while the information contained in the video
   parameter set is not necessary for operation of the decoding
   process. For future HEVC extensions, such as the 3D or scalable
   extensions, the video parameter set is expected to include
   information necessary for operation of the decoding process, e.g.
   decoding dependency or information for reference picture set
   construction of enhancement layers. The VPS provides a "big
   picture" of a bitstream, including what types of operation points
   are provided, the profile, tier, and level of the operation
   points, and some other high-level properties of the bitstream
   that can be used as the basis for session negotiation and content
   selection, etc. (see section 7.1).

   Profile, tier and level

   The profile, tier and level syntax structure that can be included
   in both VPS and sequence parameter set (SPS) includes 12 bytes of
   data to describe the entire bitstream (including all temporally
   scalable layers, which are referred to as sub-layers in the HEVC
   specification), and can optionally include more profile, tier and
   level information pertaining to individual temporally scalable
   layers. The profile indicator indicates the "best viewed as"
   profile when the bitstream conforms to multiple profiles, similar
   to the major brand concept in the ISO base media file format
   (ISOBMFF) [ISOBMFF] and file formats derived based on ISOBMFF,
   such as the 3GPP file format [3GPPFF]. The profile, tier and
   level syntax structure also includes the indications of whether
   the bitstream is free of frame-packed content, whether the
   bitstream is free of interlaced source content and free of field
   pictures, i.e. contains only frame pictures of progressive
   source, such that clients/players with no support of post-
   processing functionalities for handling of frame-packed or
   interlaced source content or field pictures can reject those
   bitstreams.

   Bitstream and elementary stream

   HEVC includes a definition of an elementary stream, which is new
   compared to H.264. An elementary stream consists of a sequence
   of one or more bitstreams. An elementary stream that consists of
   two or more bitstreams has typically been formed by splicing
   together two or more bitstreams (or parts thereof). When an
   elementary stream contains more than one bitstream, the last NAL
   unit of the last access unit of a bitstream (except the last
   bitstream in the elementary stream) must be an end of
   bitstream NAL unit and the first access unit of the subsequent
   bitstream must be an intra random access point (IRAP) access
   unit.
   This IRAP access unit may be a clean random access (CRA),
   broken link access (BLA), or instantaneous decoding refresh (IDR)
   access unit.

   Random access support

   HEVC includes signaling in the NAL unit header, through NAL unit
   types, of IRAP pictures beyond IDR pictures. Three types of IRAP
   pictures, namely IDR, CRA and BLA pictures, are supported, wherein
   IDR pictures are conventionally referred to as closed group-of-
   pictures (closed-GOP) random access points, and CRA and BLA
   pictures are those conventionally referred to as open-GOP random
   access points. BLA pictures usually originate from splicing of
   two bitstreams or parts thereof at a CRA picture, e.g. during
   stream switching. To enable better systems usage of IRAP
   pictures, altogether six different NAL unit types are defined to
   signal the properties of the IRAP pictures, which can be used to
   better match the stream access point (SAP) types as defined in
   the ISOBMFF [ISOBMFF], which are utilized for random access
   support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH].
   Pictures following an IRAP picture in decoding order and
   preceding the IRAP picture in output order are referred to as
   leading pictures associated with the IRAP picture. There are two
   types of leading pictures, namely random access decodable leading
   (RADL) pictures and random access skipped leading (RASL)
   pictures. RADL pictures are decodable when the decoding starts
   at the associated IRAP picture, and RASL pictures are not
   decodable when the decoding starts at the associated IRAP
   picture and are usually discarded. HEVC provides mechanisms to
   enable the specification of conformance of bitstreams with RASL
   pictures being discarded, thus providing a standard-compliant
   way to enable systems components to discard RASL pictures when
   needed.
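
   As an informal illustration of the RASL handling described above,
   the following Python sketch drops RASL pictures that cannot be
   decoded after random access. It is a simplified model and not
   part of this memo: the function name drop_undecodable_rasl is
   hypothetical, the input is assumed to hold one nal_unit_type
   value per picture in decoding order, and the numeric type values
   are assumed to match Table 7-1 of [HEVC] and should be checked
   against that table before use.

```python
from typing import Iterable, List

# Assumed nal_unit_type values (to be verified against Table 7-1 of [HEVC]).
RASL_TYPES = {8, 9}              # RASL_N, RASL_R
BLA_TYPES = {16, 17, 18}         # BLA_W_LP, BLA_W_RADL, BLA_N_LP
IRAP_TYPES = set(range(16, 22))  # BLA (16..18), IDR (19, 20), CRA (21)


def drop_undecodable_rasl(picture_types: Iterable[int]) -> List[int]:
    """Return the per-picture types that remain after RASL discarding.

    Decoding is assumed to start at the first IRAP picture in the input.
    RASL pictures associated with that first IRAP picture, or with any
    BLA picture, are dropped; all other pictures are kept.
    """
    kept: List[int] = []
    started = False       # True once the first IRAP picture was seen
    discard_rasl = False  # True while RASL pictures must be dropped
    for nal_type in picture_types:
        if nal_type in IRAP_TYPES:
            discard_rasl = (not started) or (nal_type in BLA_TYPES)
            started = True
        elif not started:
            continue      # nothing is decodable before the first IRAP picture
        if nal_type in RASL_TYPES and discard_rasl:
            continue      # leading picture that references unavailable data
        kept.append(nal_type)
    return kept
```

   In this sketch, a RASL picture that follows a later CRA picture is
   kept, because the pictures it references were decoded normally.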

   Temporal scalability support

   HEVC includes improved support of temporal scalability, by
   inclusion of the signaling of TemporalId in the NAL unit header,
   the restriction that pictures of a particular temporal sub-layer
   cannot be used for inter prediction reference by pictures of a
   lower temporal sub-layer, the sub-bitstream extraction process,
   and the requirement that each sub-bitstream extraction output be
   a conforming bitstream. Media-aware network elements (MANEs) can
   utilize the TemporalId in the NAL unit header for stream
   adaptation purposes based on temporal scalability.

   Temporal sub-layer switching support

   HEVC specifies, through NAL unit types present in the NAL unit
   header, the signaling of temporal sub-layer access (TSA) and
   stepwise temporal sub-layer access (STSA). A TSA picture and
   pictures following the TSA picture in decoding order do not use
   pictures prior to the TSA picture in decoding order with
   TemporalId greater than or equal to that of the TSA picture for
   inter prediction reference. A TSA picture enables up-switching,
   at the TSA picture, to the sub-layer containing the TSA picture
   or any higher sub-layer, from the immediately lower sub-layer.
   An STSA picture does not use pictures with the same TemporalId as
   the STSA picture for inter prediction reference. Pictures
   following an STSA picture in decoding order with the same
   TemporalId as the STSA picture do not use pictures prior to the
   STSA picture in decoding order with the same TemporalId as the
   STSA picture for inter prediction reference. An STSA picture
   enables up-switching, at the STSA picture, to the sub-layer
   containing the STSA picture, from the immediately lower sub-
   layer.

   Sub-layer reference or non-reference pictures

   The concept and signaling of reference/non-reference pictures in
   HEVC are different from H.264.
   In H.264, if a picture may be
   used by any other picture for inter prediction reference, it is a
   reference picture; otherwise it is a non-reference picture, and
   this is signaled by two bits in the NAL unit header. In HEVC, a
   picture is called a reference picture only when it is marked as
   "used for reference". In addition, the concept of sub-layer
   reference picture was introduced. If a picture may be used by
   another picture with the same TemporalId for inter
   prediction reference, it is a sub-layer reference picture;
   otherwise it is a sub-layer non-reference picture. Whether a
   picture is a sub-layer reference picture or sub-layer non-
   reference picture is signaled through NAL unit type values.

   Extensibility

   Besides the TemporalId in the NAL unit header, HEVC also includes
   the signaling of a six-bit layer ID in the NAL unit header, which
   must be equal to 0 for a single-layer bitstream. Extension
   mechanisms have been included in VPS, SPS, PPS, SEI NAL units,
   slice headers, and so on. All these extension mechanisms enable
   future extensions in a backward compatible manner, such that
   bitstreams encoded according to potential future HEVC extensions
   can be fed to then-legacy decoders (e.g. HEVC version 1 decoders)
   and the then-legacy decoders can decode and output the base layer
   bitstream.

   Bitstream extraction

   HEVC includes a bitstream extraction process as an integral part
   of the overall decoding process, as well as specification of the
   use of the bitstream extraction process in description of
   bitstream conformance tests as part of the hypothetical reference
   decoder (HRD) specification.

   Reference picture management

   The reference picture management of HEVC, including reference
   picture marking and removal from the decoded picture buffer (DPB)
   as well as reference picture list construction (RPLC), differs
   from that of H.264.
   Instead of the sliding window plus adaptive
   memory management control operation (MMCO) based reference
   picture marking mechanism in H.264, HEVC specifies a reference
   picture set (RPS) based reference picture management and marking
   mechanism, and the RPLC is consequently based on the RPS
   mechanism. A reference picture set consists of a set of
   reference pictures associated with a picture, consisting of all
   reference pictures that are prior to the associated picture in
   decoding order, that may be used for inter prediction of the
   associated picture or any picture following the associated
   picture in decoding order. The reference picture set consists of
   five lists of reference pictures: RefPicSetStCurrBefore,
   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
   RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and
   RefPicSetLtCurr contain all reference pictures that may be used
   in inter prediction of the current picture and that may be used
   in inter prediction of one or more of the pictures following the
   current picture in decoding order. RefPicSetStFoll and
   RefPicSetLtFoll consist of all reference pictures that are not
   used in inter prediction of the current picture but may be used
   in inter prediction of one or more of the pictures following the
   current picture in decoding order. RPS provides an "intra-coded"
   signaling of the DPB status, instead of an "inter-coded"
   signaling, mainly for improved error resilience. The RPLC
   process in HEVC is based on the RPS, by signaling an index to an
   RPS subset for each reference index; this process is simpler than
   the RPLC process in H.264.

   Ultra low delay support

   HEVC specifies a sub-picture-level HRD operation, for support of
   the so-called ultra-low delay. The mechanism specifies a
   standard-compliant way to enable delay reduction below one
   picture interval.
   Sub-picture-level coded picture buffer (CPB)
   and DPB parameters may be signaled, and utilization of this
   information for the derivation of CPB timing (wherein the CPB
   removal time corresponds to decoding time) and DPB output timing
   (display time) is specified. Decoders are allowed to operate the
   HRD at the conventional access-unit-level, even when the sub-
   picture-level HRD parameters are present.

   New SEI messages

   HEVC inherits many H.264 SEI messages with changes in syntax
   and/or semantics making them applicable to HEVC. Additionally,
   there are a few new SEI messages reviewed briefly in the
   following paragraphs.

   The display orientation SEI message informs the decoder of a
   transformation that is recommended to be applied to the cropped
   decoded picture prior to display, such that the pictures can be
   properly displayed, e.g. in an upside-up manner.

   The structure of pictures SEI message provides information on the
   NAL unit types, picture order count values, and prediction
   dependencies of a sequence of pictures. The SEI message can be
   used, for example, for concluding what impact a lost picture has
   on other pictures.

   The decoded picture hash SEI message provides a checksum derived
   from the sample values of a decoded picture. It can be used for
   detecting whether a picture was correctly received and decoded.

   The active parameter sets SEI message includes the IDs of the
   active video parameter set and the active sequence parameter set
   and can be used to activate VPSs and SPSs.
   In addition, the SEI
   message includes the following indications: 1) An indication of
   whether "full random accessibility" is supported (when supported,
   all parameter sets needed for decoding of the remainder of the
   bitstream when random accessing from the beginning of the current
   coded video sequence by completely discarding all access units
   earlier in decoding order are present in the remaining bitstream
   and all coded pictures in the remaining bitstream can be
   correctly decoded); 2) An indication of whether there is no
   parameter set within the current coded video sequence that
   updates another parameter set of the same type preceding in
   decoding order. An update of a parameter set refers to the use
   of the same parameter set ID but with some other parameters
   changed. If this property is true for all coded video sequences
   in the bitstream, then all parameter sets can be sent out-of-band
   before session start.

   The decoding unit information SEI message provides coded picture
   buffer removal delay information for a decoding unit. The
   message can be used in very-low-delay buffering operations.

   The region refresh information SEI message can be used together
   with the recovery point SEI message (present in both H.264 and
   HEVC) for improved support of gradual decoding refresh (GDR).
   This supports random access from inter-coded pictures, wherein
   complete pictures can be correctly decoded or recovered after an
   indicated number of pictures in output/display order.

1.1.3 Parallel Processing Support

   The reportedly significantly higher encoding computational demand
   of HEVC over H.264, in conjunction with the ever-increasing video
   resolution (both spatially and temporally) required by the
   market, led to the adoption of VCL coding tools specifically
   targeted to allow for parallelization on the sub-picture level.
   That is, parallelization occurs, at the minimum, at the
   granularity of an integer number of CTUs. The targets for this
   type of high-level parallelization are multicore CPUs and DSPs as
   well as multiprocessor systems. In a system design, to be
   useful, these tools require signaling support, which is provided
   in Section 7 of this memo. This section provides a brief
   overview of the tools available in [HEVC].

   Many of the tools incorporated in HEVC were designed keeping in
   mind the potential parallel implementations in multi-core/multi-
   processor architectures. Specifically, for parallelization, four
   picture partition strategies are available.

   Slices are segments of the bitstream that can be reconstructed
   independently from other slices within the same picture (though
   there may still be interdependencies through loop filtering
   operations). Slices are the only tool that can be used for
   parallelization that is also available, in virtually identical
   form, in H.264. Slice-based parallelization does not require
   much inter-processor or inter-core communication (except for
   inter-processor or inter-core data sharing for motion
   compensation when decoding a predictively coded picture, which is
   typically much heavier than inter-processor or inter-core data
   sharing due to in-picture prediction), as slices are designed to
   be independently decodable. However, for the same reason, slices
   can require some coding overhead. Further, slices (in contrast
   to some of the other tools mentioned below) also serve as the key
   mechanism for bitstream partitioning to match Maximum Transmission
   Unit (MTU) size requirements, due to the in-picture independence
   of slices and the fact that each regular slice is encapsulated in
   its own NAL unit. In many cases, the goal of parallelization and
   the goal of MTU size matching can place contradicting demands on
   the slice layout in a picture. The realization of this situation
The realization of this situation 547 led to the development of the more advanced tools mentioned 548 below. 550 Dependent slice segments allow for fragmentation of a coded slice 551 into fragments at CTU boundaries without breaking any in-picture 552 prediction mechanism. They are complementary to the 553 fragmentation mechanism described in this memo in that they need 554 the cooperation of the encoder. As a dependent slice segment 555 necessarily contains an integer number of CTUs, a decoder using 556 multiple cores operating on CTUs can process a dependent slice 557 segment without communicating parts of the slice segment's 558 bitstream to other cores. Fragmentation, as specified in this 559 memo, in contrast, does not guarantee that a fragment contains an 560 integer number of CTUs. 562 In wavefront parallel processing (WPP), the picture is 563 partitioned into rows of CTUs. Entropy decoding and prediction 564 are allowed to use data from CTUs in other partitions. Parallel 565 processing is possible through parallel decoding of CTU rows, 566 where the start of the decoding of a row is delayed by two CTUs, 567 so to ensure that data related to a CTU above and to the right of 568 the subject CTU is available before the subject CTU is being 569 decoded. Using this staggered start (which appears like a 570 wavefront when represented graphically), parallelization is 571 possible with up to as many processors/cores as the picture 572 contains CTU rows. 574 Because in-picture prediction between neighboring CTU rows within 575 a picture is allowed, the required inter-processor/inter-core 576 communication to enable in-picture prediction can be substantial. 577 The WPP partitioning does not result in the creation of more NAL 578 units compared to when it is not applied, thus WPP cannot be used 579 for MTU size matching, though slices can be used in combination 580 for that purpose. 
582 Tiles define horizontal and vertical boundaries that partition a 583 picture into tile columns and rows. The scan order of CTUs is 584 changed to be local within a tile (in the order of a CTU raster 585 scan of a tile), before decoding the top-left CTU of the next 586 tile in the order of tile raster scan of a picture. Similar to 587 slices, tiles break in-picture prediction dependencies (including 588 entropy decoding dependencies). However, they do not need to be 589 included in individual NAL units (same as WPP in this regard); 590 hence, tiles cannot be used for MTU size matching, though slices 591 can be used in combination for that purpose. Each tile can be 592 processed by one processor/core, and the inter-processor/inter- 593 core communication required for in-picture prediction between 594 processing units decoding neighboring tiles is limited to 595 conveying the shared slice header in cases where a slice spans 596 more than one tile, and loop filtering related sharing of 597 reconstructed samples and metadata. Consequently, tiles are less 598 demanding in terms of inter-processor communication bandwidth 599 compared to WPP due to the in-picture independence between two 600 neighboring partitions. 602 1.1.4 NAL Unit Header 604 HEVC maintains the NAL unit concept of H.264 with modifications. 605 HEVC uses a two-byte NAL unit header, as shown in Figure 1. The 606 payload of a NAL unit refers to the NAL unit excluding the NAL 607 unit header. 609 +---------------+---------------+ 610 |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| 611 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 612 |F| Type | LayerId | TID | 613 +-------------+-----------------+ 615 Figure 1 The structure of the HEVC NAL unit header 617 The semantics of the fields in the NAL unit header are as 618 specified in [HEVC] and described briefly below for convenience. 619 In addition to the name and size of each field, the corresponding 620 syntax element name in [HEVC] is also provided. 622 F: 1 bit 623 forbidden_zero_bit.
Required to be zero in [HEVC]. HEVC 624 declares a value of 1 as a syntax violation. Note that the 625 inclusion of this bit in the NAL unit header is to enable 626 transport of HEVC video over MPEG-2 transport systems 627 (avoidance of start code emulations) [MPEG2S]. 629 Type: 6 bits 630 nal_unit_type. This field specifies the NAL unit type as 631 defined in Table 7-1 of [HEVC]. If the most significant bit 632 of this field of a NAL unit is equal to 0 (i.e. the value of 633 this field is less than 32), the NAL unit is a VCL NAL unit. 634 Otherwise, the NAL unit is a non-VCL NAL unit. For a 635 reference of all currently defined NAL unit types and their 636 semantics, please refer to Section 7.4.1 in [HEVC]. 638 LayerId: 6 bits 639 nuh_layer_id. Required to be equal to zero in [HEVC]. It is 640 anticipated that in future scalable or 3D video coding 641 extensions of this specification, this syntax element will be 642 used to identify additional layers that may be present in the 643 coded video sequence, wherein a layer may be, e.g. a spatial 644 scalable layer, a quality scalable layer, a texture view, or a 645 depth view. 647 TID: 3 bits 648 nuh_temporal_id_plus1. This field specifies the temporal 649 identifier of the NAL unit plus 1. The value of TemporalId is 650 equal to TID minus 1. A TID value of 0 is illegal to ensure 651 that there is at least one bit in the NAL unit header equal to 652 1, so to enable independent considerations of start code 653 emulations in the NAL unit header and in the NAL unit payload 654 data. 
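Informative note: The field layout of Figure 1 can be unpacked with a few shifts and masks. The following Python sketch is a non-normative illustration; the class and function names are hypothetical and are not defined by this memo or by [HEVC].

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    f: int          # forbidden_zero_bit, 1 bit
    nal_type: int   # nal_unit_type, 6 bits
    layer_id: int   # nuh_layer_id, 6 bits
    tid: int        # nuh_temporal_id_plus1, 3 bits


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the first two bytes of a NAL unit into F, Type, LayerId, TID."""
    if len(data) < 2:
        raise ValueError("the HEVC NAL unit header is two bytes long")
    value = (data[0] << 8) | data[1]  # 16 bits, most significant bit first
    header = NalUnitHeader(
        f=(value >> 15) & 0x01,
        nal_type=(value >> 9) & 0x3F,
        layer_id=(value >> 3) & 0x3F,
        tid=value & 0x07,
    )
    if header.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return header


def is_vcl(header: NalUnitHeader) -> bool:
    """Type values below 32 identify VCL NAL units."""
    return header.nal_type < 32


def temporal_id(header: NalUnitHeader) -> int:
    """TemporalId is equal to TID minus 1."""
    return header.tid - 1
```

For example, the two header bytes 0x40 0x01 yield F equal to 0, Type equal to 32 (a non-VCL NAL unit), LayerId equal to 0, and TID equal to 1, i.e. TemporalId equal to 0.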
656 1.2 Overview of the Payload Format 658 This payload format defines the following processes required for 659 transport of HEVC coded data over RTP [RFC3550]: 661 o Usage of RTP header with this payload format 663 o Packetization of HEVC coded NAL units into RTP packets using 664 three types of payload structures, namely single NAL unit 665 packet, aggregation packet, and fragment unit 667 o Transmission of HEVC NAL units of the same bitstream within a 668 single RTP stream or multiple RTP streams within one or more 669 RTP sessions, where within an RTP stream transmission of NAL 670 units may be either non-interleaved (i.e. the transmission 671 order of NAL units is the same as their decoding order) or 672 interleaved (i.e. the transmission order of NAL units is 673 different from their decoding order) 675 o Media type parameters to be used with the Session Description 676 Protocol (SDP) [RFC4566] 678 o A payload header extension mechanism and data structures for 679 enhanced support of temporal scalability based on that 680 extension mechanism. 682 2 Conventions 684 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 685 NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 686 "OPTIONAL" in this document are to be interpreted as described in 687 BCP 14, RFC 2119 [RFC2119]. 689 In this document, these key words will appear with that 690 interpretation only when in ALL CAPS. Lower case uses of these 691 words are not to be interpreted as carrying the RFC 2119 692 significance. 694 This specification uses the notion of setting and clearing a bit 695 when bit fields are handled. Setting a bit is the same as 696 assigning that bit the value of 1 (On). Clearing a bit is the 697 same as assigning that bit the value of 0 (Off). 699 3 Definitions and Abbreviations 701 3.1 Definitions 703 This document uses the terms and definitions of [HEVC]. Section 704 3.1.1 lists relevant definitions copied from [HEVC] for 705 convenience. 
Section 3.1.2 provides definitions specific to this 706 memo. 708 3.1.1 Definitions from the HEVC Specification 710 access unit: A set of NAL units that are associated with each 711 other according to a specified classification rule, are 712 consecutive in decoding order, and contain exactly one coded 713 picture. 715 BLA access unit: An access unit in which the coded picture is a 716 BLA picture. 718 BLA picture: An IRAP picture for which each VCL NAL unit has 719 nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. 721 coded video sequence: A sequence of access units that consists, 722 in decoding order, of an IRAP access unit with NoRaslOutputFlag 723 equal to 1, followed by zero or more access units that are not 724 IRAP access units with NoRaslOutputFlag equal to 1, including all 725 subsequent access units up to but not including any subsequent 726 access unit that is an IRAP access unit with NoRaslOutputFlag 727 equal to 1. 729 Informative note: An IRAP access unit may be an IDR access 730 unit, a BLA access unit, or a CRA access unit. The value of 731 NoRaslOutputFlag is equal to 1 for each IDR access unit, each 732 BLA access unit, and each CRA access unit that is the first 733 access unit in the bitstream in decoding order, is the first 734 access unit that follows an end of sequence NAL unit in 735 decoding order, or has HandleCraAsBlaFlag equal to 1. 737 CRA access unit: An access unit in which the coded picture is a 738 CRA picture. 740 CRA picture: A RAP picture for which each VCL NAL unit has 741 nal_unit_type equal to CRA_NUT. 743 IDR access unit: An access unit in which the coded picture is an 744 IDR picture. 746 IDR picture: A RAP picture for which each VCL NAL unit has 747 nal_unit_type equal to IDR_W_RADL or IDR_N_LP. 749 IRAP access unit: An access unit in which the coded picture is an 750 IRAP picture. 
752 IRAP picture: A coded picture for which each VCL NAL unit has 753 nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 754 (23), inclusive. 756 layer: A set of VCL NAL units that all have a particular value of 757 nuh_layer_id and the associated non-VCL NAL units, or one of a 758 set of syntactical structures having a hierarchical relationship. 760 operation point: bitstream created from another bitstream by 761 operation of the sub-bitstream extraction process with the 762 another bitstream, a target highest TemporalId, and a target 763 layer identifier list as inputs. 765 random access: The act of starting the decoding process for a 766 bitstream at a point other than the beginning of the bitstream. 768 sub-layer: A temporal scalable layer of a temporal scalable 769 bitstream consisting of VCL NAL units with a particular value of 770 the TemporalId variable, and the associated non-VCL NAL units. 772 sub-layer representation: A subset of the bitstream consisting of 773 NAL units of a particular sub-layer and the lower sub-layers. 775 tile: A rectangular region of coding tree blocks within a 776 particular tile column and a particular tile row in a picture. 778 tile column: A rectangular region of coding tree blocks having a 779 height equal to the height of the picture and a width specified 780 by syntax elements in the picture parameter set. 782 tile row: A rectangular region of coding tree blocks having a 783 height specified by syntax elements in the picture parameter set 784 and a width equal to the width of the picture. 786 3.1.2 Definitions Specific to This Memo 788 dependee RTP stream: An RTP stream on which another RTP stream 789 depends. All RTP streams in an MSM except for the highest RTP 790 stream are dependee RTP streams. 792 highest RTP stream: The RTP stream on which no other RTP stream 793 depends. The RTP stream in an SSM is the highest RTP stream. 
795 media aware network element (MANE): A network element, such as a 796 middlebox, selective forwarding unit, or application layer 797 gateway that is capable of parsing certain aspects of the RTP 798 payload headers or the RTP payload and reacting to their 799 contents. 801 Informative note: The concept of a MANE goes beyond normal 802 routers or gateways in that a MANE has to be aware of the 803 signaling (e.g. to learn about the payload type mappings of 804 the media streams), and in that it has to be trusted when 805 working with SRTP. The advantage of using MANEs is that they 806 allow packets to be dropped according to the needs of the 807 media coding. For example, if a MANE has to drop packets due 808 to congestion on a certain link, it can identify and remove 809 those packets whose elimination produces the least adverse 810 effect on the user experience. After dropping packets, MANEs 811 must rewrite RTCP packets to match the changes to the RTP 812 stream as specified in Section 7 of [RFC3550]. 814 multi-stream mode (MSM): Transmission of an HEVC bitstream using 815 more than one RTP stream. 817 NAL unit decoding order: A NAL unit order that conforms to the 818 constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. 820 NAL-unit-like structure: A data structure that is similar to NAL 821 units in the sense that it also has a NAL unit header and a 822 payload, with the difference that the payload does not follow the 823 start code emulation prevention mechanism required for the NAL 824 unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples 825 of NAL-unit-like structures defined in this memo are packet payloads 826 of AP, PACI, and FU packets. 828 NALU-time: The value that the RTP timestamp would have if the NAL 829 unit were transported in its own RTP packet. 831 RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within 832 the scope of this memo, one RTP stream is utilized to transport 833 one or more temporal sub-layers.
835 single-stream mode (SSM): Transmission of an HEVC bitstream using 836 only one RTP stream. 838 transmission order: The order of packets in ascending RTP 839 sequence number order (in modulo arithmetic). Within an 840 aggregation packet, the NAL unit transmission order is the same 841 as the order of appearance of NAL units in the packet. 843 3.2 Abbreviations 845 AP Aggregation Packet 847 BLA Broken Link Access 849 CRA Clean Random Access 850 CTB Coding Tree Block 852 CTU Coding Tree Unit 854 CVS Coded Video Sequence 856 DPH Decoded Picture Hash 858 FU Fragmentation Unit 860 GDR Gradual Decoding Refresh 862 HRD Hypothetical Reference Decoder 864 IDR Instantaneous Decoding Refresh 866 IRAP Intra Random Access Point 868 MANE Media Aware Network Element 870 MSM Multi-Stream Mode 872 MTU Maximum Transfer Unit 874 NAL Network Abstraction Layer 876 NALU Network Abstraction Layer Unit 878 PACI PAyload Content Information 880 PHES Payload Header Extension Structure 882 PPS Picture Parameter Set 884 RADL Random Access Decodable Leading (Picture) 886 RASL Random Access Skipped Leading (Picture) 888 RPS Reference Picture Set 890 SEI Supplemental Enhancement Information 892 SPS Sequence Parameter Set 893 SSM Single-Stream Mode 895 STSA Step-wise Temporal Sub-layer Access 897 TSA Temporal Sub-layer Access 899 TCSI Temporal Scalability Control Information 901 VCL Video Coding Layer 903 VPS Video Parameter Set 905 4 RTP Payload Format 907 4.1 RTP Header Usage 909 The format of the RTP header is specified in [RFC3550] and 910 reprinted in Figure 2 for convenience. This payload format uses 911 the fields of the header in a manner consistent with that 912 specification. 914 The RTP payload (and the settings for some RTP header bits) for 915 aggregation packets and fragmentation units are specified in 916 Sections 4.7 and 4.8, respectively. 
918 0 1 2 3 919 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 920 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 921 |V=2|P|X| CC |M| PT | sequence number | 922 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 923 | timestamp | 924 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 925 | synchronization source (SSRC) identifier | 926 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 927 | contributing source (CSRC) identifiers | 928 | .... | 929 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 931 Figure 2 RTP header according to [RFC3550] 933 The RTP header information to be set according to this RTP 934 payload format is set as follows: 936 Marker bit (M): 1 bit 938 Set for the last packet, carried in the current RTP stream, of 939 the access unit, in line with the normal use of the M bit in 940 video formats, to allow an efficient playout buffer handling. 941 When MSM is in use, if an access unit appears in multiple RTP 942 streams, the marker bit is set on each RTP stream's last 943 packet of the access unit. 945 Informative note: The content of a NAL unit does not tell 946 whether or not the NAL unit is the last NAL unit, in 947 decoding order, of an access unit. An RTP sender 948 implementation may obtain this information from the video 949 encoder. If, however, the implementation cannot obtain 950 this information directly from the encoder, e.g. when the 951 bitstream was pre-encoded, and also there is no timestamp 952 allocated for each NAL unit, then the sender implementation 953 can inspect subsequent NAL units in decoding order to 954 determine whether or not the NAL unit is the last NAL unit 955 of an access unit as follows. 
A NAL unit naluX is the last 956 NAL unit of an access unit if it is the last NAL unit of 957 the bitstream or the next VCL NAL unit naluY in decoding 958 order has the high-order bit of the first byte after its 959 NAL unit header equal to 1, and all NAL units between naluX 960 and naluY, when present, have nal_unit_type in the range of 961 32 to 35, inclusive, equal to 39, or in the ranges of 41 to 962 44, inclusive, or 48 to 55, inclusive. 964 Payload type (PT): 7 bits 966 The assignment of an RTP payload type for this new packet 967 format is outside the scope of this document and will not be 968 specified here. The assignment of a payload type has to be 969 performed either through the profile used or in a dynamic way. 971 Informative note: It is not required to use different 972 payload type values for different RTP streams in MSM. 974 Sequence number (SN): 16 bits 976 Set and used in accordance with RFC 3550 [RFC3550]. 978 Timestamp: 32 bits 980 The RTP timestamp is set to the sampling timestamp of the 981 content. A 90 kHz clock rate MUST be used. 983 If the NAL unit has no timing properties of its own (e.g. 984 parameter set and SEI NAL units), the RTP timestamp MUST be 985 set to the RTP timestamp of the coded picture of the access 986 unit in which the NAL unit (according to Section 7.4.2.4.4 of 987 [HEVC]) is included. 989 Receivers MUST use the RTP timestamp for the display process, 990 even when the bitstream contains picture timing SEI messages 991 or decoding unit information SEI messages as specified in 992 [HEVC]. However, this does not mean that picture timing SEI 993 messages in the bitstream should be discarded, as picture 994 timing SEI messages may contain frame-field information that 995 is important in appropriately rendering interlaced video. 997 Synchronization source (SSRC): 32-bits 999 Used to identify the source of the RTP packets. In SSM, by 1000 definition a single SSRC is used for all parts of a single 1001 bitstream. 
In MSM, each SSRC is used for an RTP stream 1002 containing a subset of the sub-layers for a single (temporally 1003 scalable) bitstream. A receiver is required to correctly 1004 associate the set of SSRCs that carry parts of the same 1005 bitstream. 1007 Informative note: The term "bitstream" in this document is 1008 equivalent to the term "encoded stream" in [I-D.ietf- 1009 avtext-rtp-grouping-taxonomy]. 1011 4.2 Payload Header Usage 1013 The TID value indicates (among other things) the relative 1014 importance of an RTP packet, for example because NAL units 1015 belonging to higher temporal sub-layers are not used for the 1016 decoding of lower temporal sub-layers. A lower value of TID 1017 indicates a higher importance. More important NAL units MAY be 1018 better protected against transmission losses than less important 1019 NAL units. 1021 4.3 Payload Structures 1023 The first two bytes of the payload of an RTP packet are referred 1024 to as the payload header. The payload header consists of the 1025 same fields (F, Type, LayerId, and TID) as the NAL unit header as 1026 shown in section 1.1.4, irrespective of the type of the payload 1027 structure. 1029 Four different types of RTP packet payload structures are 1030 specified. A receiver can identify the type of an RTP packet 1031 payload through the Type field in the payload header. 1033 The four different payload structures are as follows: 1035 o Single NAL unit packet: Contains a single NAL unit in the 1036 payload, and the NAL unit header of the NAL unit also serves 1037 as the payload header. This payload structure is specified in 1038 section 4.6. 1040 o Aggregation packet (AP): Contains more than one NAL unit 1041 within one access unit. This payload structure is specified 1042 in section 4.7. 1044 o Fragmentation unit (FU): Contains a subset of a single NAL 1045 unit. This payload structure is specified in section 4.8.
1047 o PACI carrying RTP packet: Contains a payload header (that 1048 differs from other payload headers for efficiency), a Payload 1049 Header Extension Structure (PHES), and a PACI payload. This 1050 payload structure is specified in section 4.9. 1052 4.4 Transmission Modes 1054 This memo enables transmission of an HEVC bitstream over a single 1055 RTP stream or multiple RTP streams. The concept and working 1056 principle are inherited from the design of what was called single 1057 and multiple session transmission in [RFC6190]. 1058 If only one RTP stream is used for transmission 1059 of the HEVC bitstream, the transmission mode is referred to as 1060 single-stream mode (SSM); otherwise (more than one RTP stream is 1061 used for transmission of the HEVC bitstream), the transmission 1062 mode is referred to as multi-stream mode (MSM). 1064 Dependency of one RTP stream on another RTP stream is typically 1065 indicated as specified in [RFC5583]. When an RTP stream A 1066 depends on another RTP stream B, the RTP stream B is referred to 1067 as a dependee RTP stream of the RTP stream A. 1069 Informative note: An MSM may involve one or more RTP sessions. 1070 Each RTP stream in an MSM may be in its own RTP session or a 1071 set of multiple RTP streams in an MSM may belong to the same 1072 RTP session, e.g. as indicated by the mechanism specified in 1073 the Internet-Draft [I-D.ietf-avtcore-rtp-multi-stream] or in 1074 [I-D.ietf-mmusic-sdp-bundle-negotiation]. 1076 SSM SHOULD be used for point-to-point unicast scenarios, while 1077 MSM SHOULD be used for point-to-multipoint multicast scenarios 1078 where different receivers require different operation points of 1079 the same HEVC bitstream, to improve bandwidth utilization 1080 efficiency.
1082 Informative note: A multicast may degrade to a unicast after 1083 all receivers but one have left (this is a justification of 1084 the first "SHOULD" instead of "MUST"), and there might be 1085 scenarios where MSM is desirable but not possible, e.g. when IP 1086 multicast is not deployed in a certain network (this is a 1087 justification of the second "SHOULD" instead of "MUST"). 1089 The transmission mode is indicated by the tx-mode media parameter 1090 (see section 7.1). If tx-mode is equal to "SSM", SSM MUST be 1091 used. Otherwise (tx-mode is equal to "MSM"), MSM MUST be used. 1093 Receivers MUST support both SSM and MSM. 1095 4.5 Decoding Order Number 1097 For each NAL unit, the variable AbsDon is derived, representing 1098 the decoding order number that is indicative of the NAL unit 1099 decoding order. 1101 Let NAL unit n be the n-th NAL unit in transmission order within 1102 an RTP stream. 1104 If tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 1105 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as 1106 equal to n. 1108 Otherwise (tx-mode is equal to "MSM" or sprop-max-don-diff is 1109 greater than 0), AbsDon[n] is derived as follows, where DON[n] is 1110 the value of the variable DON for NAL unit n: 1112 o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit 1113 in transmission order), AbsDon[0] is set equal to DON[0].
1115 o Otherwise (n is greater than 0), the following applies for 1116 derivation of AbsDon[n]: 1118 If DON[n] == DON[n-1], 1119 AbsDon[n] = AbsDon[n-1] 1121 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), 1122 AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] 1124 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), 1125 AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] 1127 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), 1128 AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - 1129 DON[n]) 1131 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), 1132 AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) 1134 For any two NAL units m and n, the following applies: 1136 o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n 1137 follows NAL unit m in NAL unit decoding order. 1139 o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding 1140 order of the two NAL units can be in either order. 1142 o AbsDon[n] less than AbsDon[m] indicates that NAL unit n 1143 precedes NAL unit m in decoding order. 1145 When two consecutive NAL units in the NAL unit decoding order 1146 have different values of AbsDon, the value of AbsDon for the 1147 second NAL unit in decoding order MUST be greater than the value 1148 of AbsDon for the first NAL unit, and the absolute difference 1149 between the two AbsDon values MAY be greater than or equal to 1. 1151 Informative note: There are multiple reasons to allow for the 1152 absolute difference of the values of AbsDon for two 1153 consecutive NAL units in the NAL unit decoding order to be 1154 greater than one. An increment by one is not required, as at 1155 the time of associating values of AbsDon to NAL units, it may 1156 not be known whether all NAL units are to be delivered to the 1157 receiver. For example, a gateway may not forward VCL NAL 1158 units of higher sub-layers or some SEI NAL units when there is 1159 congestion in the network. 
In another example, the first 1160 intra-coded picture of a pre-encoded clip is transmitted in 1161 advance to ensure that it is readily available in the 1162 receiver, and when transmitting the first intra-coded picture, 1163 the originator does not exactly know how many NAL units will 1164 be encoded before the first intra-coded picture of the pre- 1165 encoded clip follows in decoding order. Thus, the values of 1166 AbsDon for the NAL units of the first intra-coded picture of 1167 the pre-encoded clip have to be estimated when they are 1168 transmitted, and gaps in values of AbsDon may occur. Another 1169 example is MSM where the AbsDon values must indicate cross- 1170 layer decoding order for NAL units conveyed in all the RTP 1171 streams. 1173 4.6 Single NAL Unit Packets 1175 A single NAL unit packet contains exactly one NAL unit, and 1176 consists of a payload header (denoted as PayloadHdr), a 1177 conditional 16-bit DONL field (in network byte order), and the 1178 NAL unit payload data (the NAL unit excluding its NAL unit 1179 header) of the contained NAL unit, as shown in Figure 3. 1181 0 1 2 3 1182 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1183 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1184 | PayloadHdr | DONL (conditional) | 1185 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1186 | | 1187 | NAL unit payload data | 1188 | | 1189 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1190 | :...OPTIONAL RTP padding | 1191 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1193 Figure 3 The structure of a single NAL unit packet 1195 The payload header SHOULD be an exact copy of the NAL unit header 1196 of the contained NAL unit. However, the Type (i.e. 1197 nal_unit_type) field MAY be changed, e.g. when it is desirable to 1198 handle a CRA picture as a BLA picture [JCTVC-J0107].
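Informative note: The DON values carried in packets such as the one in Figure 3 feed the AbsDon derivation of Section 4.5. The following Python sketch is a non-normative transcription of the five rules of that section for the case where tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0. The function name is hypothetical, and the input is assumed to be the list of 16-bit DON values in transmission order within one RTP stream.

```python
from typing import List


def derive_abs_don(dons: List[int]) -> List[int]:
    """Derive AbsDon[n] from DON[n] for NAL units in transmission order."""
    abs_dons: List[int] = []
    for n, don in enumerate(dons):
        if not 0 <= don <= 0xFFFF:
            raise ValueError("DON is a 16-bit unsigned value")
        if n == 0:
            abs_dons.append(don)  # AbsDon[0] = DON[0]
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[n - 1]
        if don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            cur = prev_abs + don - prev_don             # forward step
        elif don < prev_don and prev_don - don >= 32768:
            cur = prev_abs + 65536 - prev_don + don     # forward, DON wrapped
        elif don > prev_don and don - prev_don >= 32768:
            cur = prev_abs - (prev_don + 65536 - don)   # backward, DON wrapped
        else:  # don < prev_don and prev_don - don < 32768
            cur = prev_abs - (prev_don - don)           # backward step
        abs_dons.append(cur)
    return abs_dons
```

For example, the DON sequence 65534, 65535, 0, 2 wraps around once and yields the AbsDon values 65534, 65535, 65536, 65538. When tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 0, no such derivation is needed because AbsDon[n] is simply equal to n.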
1200 The DONL field, when present, specifies the value of the 16 least 1201 significant bits of the decoding order number of the contained 1202 NAL unit. If tx-mode is equal to "MSM" or sprop-max-don-diff is 1203 greater than 0, the DONL field MUST be present, and the variable 1204 DON for the contained NAL unit is derived as equal to the value 1205 of the DONL field. Otherwise (tx-mode is equal to "SSM" and 1206 sprop-max-don-diff is equal to 0), the DONL field MUST NOT be 1207 present. 1209 4.7 Aggregation Packets (APs) 1211 Aggregation packets (APs) are introduced to enable the reduction 1212 of packetization overhead for small NAL units, such as most of 1213 the non-VCL NAL units, which are often only a few octets in size. 1215 An AP aggregates NAL units within one access unit. Each NAL unit 1216 to be carried in an AP is encapsulated in an aggregation unit. 1217 NAL units aggregated in one AP are in NAL unit decoding order. 1219 An AP consists of a payload header (denoted as PayloadHdr) 1220 followed by two or more aggregation units, as shown in Figure 4. 1222 0 1 2 3 1223 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1224 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1225 | PayloadHdr (Type=48) | | 1226 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1227 | | 1228 | two or more aggregation units | 1229 | | 1230 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1231 | :...OPTIONAL RTP padding | 1232 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1234 Figure 4 The structure of an aggregation packet 1236 The fields in the payload header are set as follows. The F bit 1237 MUST be equal to 0 if the F bit of each aggregated NAL unit is 1238 equal to zero; otherwise, it MUST be equal to 1. The Type field 1239 MUST be equal to 48. The value of LayerId MUST be equal to the 1240 lowest value of LayerId of all the aggregated NAL units. The 1241 value of TID MUST be the lowest value of TID of all the 1242 aggregated NAL units. 
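Informative note: The rules above for the AP payload header can be sketched as follows. The Python code is a non-normative illustration; the function name is hypothetical, and each input element is assumed to be a complete NAL unit beginning with its two-byte NAL unit header.

```python
from typing import Sequence

AP_TYPE = 48  # Type value of an aggregation packet


def build_ap_payload_header(nal_units: Sequence[bytes]) -> bytes:
    """Derive the two-byte PayloadHdr of an AP from the aggregated NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries two or more aggregation units")
    f_bit = 0
    layer_ids = []
    tids = []
    for nalu in nal_units:
        if len(nalu) < 2:
            raise ValueError("NAL unit shorter than its two-byte header")
        header = (nalu[0] << 8) | nalu[1]
        f_bit |= (header >> 15) & 0x01        # 1 if any aggregated F bit is 1
        layer_ids.append((header >> 3) & 0x3F)
        tids.append(header & 0x07)
    # Lowest LayerId and lowest TID of all aggregated NAL units.
    value = (f_bit << 15) | (AP_TYPE << 9) | (min(layer_ids) << 3) | min(tids)
    return value.to_bytes(2, "big")
```

For example, aggregating two NAL units whose headers are 0x40 0x01 and 0x42 0x01 produces the payload header bytes 0x60 0x01 (F equal to 0, Type equal to 48, LayerId equal to 0, TID equal to 1).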
1244 Informative Note: All VCL NAL units in an AP have the same TID 1245 value since they belong to the same access unit. However, an 1246 AP may contain non-VCL NAL units for which the TID value in 1247 the NAL unit header may be different from the TID value of the 1248 VCL NAL units in the same AP. 1250 An AP MUST carry at least two aggregation units and can carry as 1251 many aggregation units as necessary; however, the total amount of 1252 data in an AP MUST fit into an IP packet, and the size 1253 SHOULD be chosen so that the resulting IP packet is smaller than 1254 the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT 1255 contain Fragmentation Units (FUs) specified in section 4.8. APs 1256 MUST NOT be nested; i.e. an AP MUST NOT contain another AP. 1258 The first aggregation unit in an AP consists of a conditional 16- 1259 bit DONL field (in network byte order) followed by a 16-bit 1260 unsigned size field (in network byte order) that indicates 1261 the size of the NAL unit in bytes (excluding these two octets, 1262 but including the NAL unit header), followed by the NAL unit 1263 itself, including its NAL unit header, as shown in Figure 5. 1265 0 1 2 3 1266 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1267 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1268 : DONL (conditional) | NALU size | 1269 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1270 | NALU size | | 1271 +-+-+-+-+-+-+-+-+ NAL unit | 1272 | | 1273 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1274 | : 1275 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1277 Figure 5 The structure of the first aggregation unit in an AP 1279 The DONL field, when present, specifies the value of the 16 least 1280 significant bits of the decoding order number of the aggregated 1281 NAL unit.
1283 If tx-mode is equal to "MSM" or sprop-max-don-diff is greater 1284 than 0, the DONL field MUST be present in an aggregation unit 1285 that is the first aggregation unit in an AP, and the variable DON 1286 for the aggregated NAL unit is derived as equal to the value of 1287 the DONL field. Otherwise (tx-mode is equal to "SSM" and sprop- 1288 max-don-diff is equal to 0), the DONL field MUST NOT be present 1289 in an aggregation unit that is the first aggregation unit in an 1290 AP. 1292 An aggregation unit that is not the first aggregation unit in an 1293 AP consists of a conditional 8-bit DOND field followed by a 16- 1294 bit unsigned size information (in network byte order) that 1295 indicates the size of the NAL unit in bytes (excluding these two 1296 octets, but including the NAL unit header), followed by the NAL 1297 unit itself, including its NAL unit header, as shown in Figure 6. 1299 0 1 2 3 1300 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1301 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1302 : DOND (cond) | NALU size | 1303 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1304 | | 1305 | NAL unit | 1306 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1307 | : 1308 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1310 Figure 6 The structure of an aggregation unit that is not the 1311 first aggregation unit in an AP 1313 When present, the DOND field plus 1 specifies the difference 1314 between the decoding order number values of the current 1315 aggregated NAL unit and the preceding aggregated NAL unit in the 1316 same AP. 1318 If tx-mode is equal to "MSM" or sprop-max-don-diff is greater 1319 than 0, the DOND field MUST be present in an aggregation unit 1320 that is not the first aggregation unit in an AP, and the variable 1321 DON for the aggregated NAL unit is derived as equal to the DON of 1322 the preceding aggregated NAL unit in the same AP plus the value 1323 of the DOND field plus 1 modulo 65536. 
Otherwise (tx-mode is 1324 equal to "SSM" and sprop-max-don-diff is equal to 0), the DOND 1325 field MUST NOT be present in an aggregation unit that is not the 1326 first aggregation unit in an AP, and in this case the 1327 transmission order and decoding order of NAL units carried in the 1328 AP are the same as the order the NAL units appear in the AP. 1330 Figure 7 presents an example of an AP that contains two 1331 aggregation units, labeled as 1 and 2 in the figure, without the 1332 DONL and DOND fields being present. 1334 0 1 2 3 1335 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1336 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1337 | RTP Header | 1338 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1339 | PayloadHdr (Type=48) | NALU 1 Size | 1340 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1341 | NALU 1 HDR | | 1342 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 1 Data | 1343 | . . . | 1344 | | 1345 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1346 | . . . | NALU 2 Size | NALU 2 HDR | 1347 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1348 | NALU 2 HDR | | 1349 +-+-+-+-+-+-+-+-+ NALU 2 Data | 1350 | . . . | 1351 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1352 | :...OPTIONAL RTP padding | 1353 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1355 Figure 7 An example of an AP packet containing two aggregation 1356 units without the DONL and DOND fields 1358 Figure 8 presents an example of an AP that contains two 1359 aggregation units, labeled as 1 and 2 in the figure, with the 1360 DONL and DOND fields being present. 
1362 0 1 2 3 1363 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1364 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1365 | RTP Header | 1366 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1367 | PayloadHdr (Type=48) | NALU 1 DONL | 1368 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1369 | NALU 1 Size | NALU 1 HDR | 1370 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1371 | | 1372 | NALU 1 Data . . . | 1373 | | 1374 + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1375 | | NALU 2 DOND | NALU 2 Size | 1376 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1377 | NALU 2 HDR | | 1378 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | 1379 | | 1380 | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1381 | :...OPTIONAL RTP padding | 1382 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1384 Figure 8 An example of an AP containing two aggregation units 1385 with the DONL and DOND fields 1387 4.8 Fragmentation Units (FUs) 1389 Fragmentation units (FUs) are introduced to enable fragmenting a 1390 single NAL unit into multiple RTP packets, possibly without 1391 cooperation or knowledge of the HEVC encoder. A fragment of a NAL 1392 unit consists of an integer number of consecutive octets of that 1393 NAL unit. Fragments of the same NAL unit MUST be sent in consecutive 1394 order with ascending RTP sequence numbers (with no other RTP packets 1395 within the same RTP stream being sent between the first and last 1396 fragment). 1398 When a NAL unit is fragmented and conveyed within FUs, it is 1399 referred to as a fragmented NAL unit. APs MUST NOT be 1400 fragmented. FUs MUST NOT be nested; i.e. an FU MUST NOT contain 1401 a subset of another FU. 1403 The RTP timestamp of an RTP packet carrying an FU is set to the 1404 NALU-time of the fragmented NAL unit. 
1406 An FU consists of a payload header (denoted as PayloadHdr), an FU 1407 header of one octet, a conditional 16-bit DONL field (in network 1408 byte order), and an FU payload, as shown in Figure 9. 1410 0 1 2 3 1411 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1412 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1413 | PayloadHdr (Type=49) | FU header | DONL (cond) | 1414 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| 1415 | DONL (cond) | | 1416 |-+-+-+-+-+-+-+-+ | 1417 | FU payload | 1418 | | 1419 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1420 | :...OPTIONAL RTP padding | 1421 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1423 Figure 9 The structure of an FU 1425 The fields in the payload header are set as follows. The Type 1426 field MUST be equal to 49. The fields F, LayerId, and TID MUST 1427 be equal to the fields F, LayerId, and TID, respectively, of the 1428 fragmented NAL unit. 1430 The FU header consists of an S bit, an E bit, and a 6-bit FuType 1431 field, as shown in Figure 10. 1433 +---------------+ 1434 |0|1|2|3|4|5|6|7| 1435 +-+-+-+-+-+-+-+-+ 1436 |S|E| FuType | 1437 +---------------+ 1439 Figure 10 The structure of FU header 1441 The semantics of the FU header fields are as follows: 1442 S: 1 bit 1443 When set to one, the S bit indicates the start of a fragmented 1444 NAL unit i.e. the first byte of the FU payload is also the 1445 first byte of the payload of the fragmented NAL unit. When 1446 the FU payload is not the start of the fragmented NAL unit 1447 payload, the S bit MUST be set to zero. 1449 E: 1 bit 1450 When set to one, the E bit indicates the end of a fragmented 1451 NAL unit, i.e. the last byte of the payload is also the last 1452 byte of the fragmented NAL unit. When the FU payload is not 1453 the last fragment of a fragmented NAL unit, the E bit MUST be 1454 set to zero. 
1456 FuType: 6 bits 1457 The field FuType MUST be equal to the field Type of the 1458 fragmented NAL unit. 1460 The DONL field, when present, specifies the value of the 16 least 1461 significant bits of the decoding order number of the fragmented 1462 NAL unit. 1464 If tx-mode is equal to "MSM" or sprop-max-don-diff is greater 1465 than 0, and the S bit is equal to 1, the DONL field MUST be 1466 present in the FU, and the variable DON for the fragmented NAL 1467 unit is derived as equal to the value of the DONL field. 1468 Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff is 1469 equal to 0, or the S bit is equal to 0), the DONL field MUST NOT 1470 be present in the FU. 1472 A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e. 1473 the Start bit and End bit MUST NOT both be set to one in the same 1474 FU header. 1476 The FU payload consists of fragments of the payload of the 1477 fragmented NAL unit so that if the FU payloads of consecutive 1478 FUs, starting with an FU with the S bit equal to 1 and ending 1479 with an FU with the E bit equal to 1, are sequentially 1480 concatenated, the payload of the fragmented NAL unit can be 1481 reconstructed. The NAL unit header of the fragmented NAL unit is 1482 not included as such in the FU payload, but rather the 1483 information of the NAL unit header of the fragmented NAL unit is 1484 conveyed in F, LayerId, and TID fields of the FU payload headers 1485 of the FUs and the FuType field of the FU header of the FUs. An 1486 FU payload MUST NOT be empty. 1488 If an FU is lost, the receiver SHOULD discard all following 1489 fragmentation units in transmission order corresponding to the 1490 same fragmented NAL unit, unless the decoder in the receiver is 1491 known to be prepared to gracefully handle incomplete NAL units. 
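The FU rules above can be illustrated with a short receiver-side sketch. It is not part of the payload format, only one possible reading of it. It assumes the two-octet NAL unit header layout described elsewhere in this memo (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits), that the caller already knows whether DONL is in use (tx-mode equal to "MSM" or sprop-max-don-diff greater than 0), and that the FUs are handed over in RTP sequence number order with no loss. The names parse_fu and reassemble_nal_unit are hypothetical.

```python
from typing import Iterable, Optional, Tuple

FU_TYPE = 49  # Type value carried in the payload header of an FU


def parse_fu(payload: bytes, donl_present: bool
             ) -> Tuple[bool, bool, bytes, Optional[int], bytes]:
    """Return (S, E, rebuilt NAL unit header, DONL or None, fragment)."""
    if len(payload) < 4:
        raise ValueError("FU too short")
    payload_hdr = int.from_bytes(payload[:2], "big")
    if (payload_hdr >> 9) & 0x3F != FU_TYPE:
        raise ValueError("not an FU")
    s_bit = bool(payload[2] & 0x80)
    e_bit = bool(payload[2] & 0x40)
    fu_type = payload[2] & 0x3F
    if s_bit and e_bit:
        raise ValueError("S and E must not both be set")
    offset, donl = 3, None
    if donl_present and s_bit:
        # DONL is only carried in the first fragment.
        donl = int.from_bytes(payload[3:5], "big")
        offset = 5
    fragment = payload[offset:]
    if not fragment:
        raise ValueError("FU payload must not be empty")
    # Keep F, LayerId, and TID from the payload header; take Type from FuType.
    nal_hdr = (payload_hdr & 0x81FF) | (fu_type << 9)
    return s_bit, e_bit, nal_hdr.to_bytes(2, "big"), donl, fragment


def reassemble_nal_unit(fus: Iterable[bytes],
                        donl_present: bool = False) -> bytes:
    """Concatenate the FU payloads of one fragmented NAL unit."""
    nal = None
    for payload in fus:
        s_bit, e_bit, nal_hdr, _donl, fragment = parse_fu(payload, donl_present)
        if s_bit:
            nal = bytearray(nal_hdr + fragment)
        elif nal is None:
            raise ValueError("start fragment missing")
        else:
            nal += fragment
        if e_bit:
            return bytes(nal)
    raise ValueError("end fragment missing")
```

A real receiver would additionally use RTP sequence numbers to detect a lost fragment and then apply the discard rule stated above.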
A receiver in an endpoint or in a MANE MAY aggregate the first n-1
fragments of a NAL unit to an (incomplete) NAL unit, even if
fragment n of that NAL unit is not received. In this case, the
forbidden_zero_bit of the NAL unit MUST be set to one to indicate
a syntax violation.

4.9 PACI packets

This section specifies the PACI packet structure. The basic
payload header specified in this memo is intentionally limited to
the 16 bits of the NAL unit header so as to keep the packetization
overhead to a minimum. However, cases have been identified where
it is advisable to include control information in an easily
accessible position in the packet header, despite the additional
overhead. One such piece of control information is the Temporal
Scalability Control Information as specified in section 4.10
below. PACI packets carry this and future, similar structures.

The PACI packet structure is based on a payload header extension
mechanism that is generic and extensible to carry payload header
extensions. In this section, the focus lies on the use within
this specification. Section 4.9.2 below provides guidance for
specification designers on how to employ the extension mechanism
in future specifications.

A PACI packet consists of a payload header (denoted as
PayloadHdr), for which the structure follows what is described in
section 4.3 above. The payload header is followed by the fields
A, cType, PHSsize, F[0..2] and Y.

Figure 11 shows a PACI packet in compliance with this memo; that
is, without any extensions.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Payload Header Extension Structure (PHES)              |
   |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
   |                                                               |
   |                  PACI payload: NAL unit                       |
   |                   . . .                                       |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 11 The structure of a PACI

The fields in the payload header are set as follows. The F bit
MUST be equal to 0. The Type field MUST be equal to 50. The
value of LayerId MUST be a copy of the LayerId field of the PACI
payload NAL unit or NAL-unit-like structure. The value of TID
MUST be a copy of the TID field of the PACI payload NAL unit or
NAL-unit-like structure.

The semantics of other fields are as follows:

A: 1 bit
   Copy of the F bit of the PACI payload NAL unit or NAL-unit-
   like structure.

cType: 6 bits
   Copy of the Type field of the PACI payload NAL unit or NAL-
   unit-like structure.

PHSsize: 5 bits
   Indicates the length of the PHES field. The value is limited
   to be less than or equal to 32 octets, to simplify encoder
   design for MTU size matching.

F0
   This field equal to 1 specifies the presence of a temporal
   scalability support extension in the PHES.

F1, F2
   MUST be 0, available for future extensions, see section 4.9.2.

Y: 1 bit
   MUST be 0, available for future extensions, see section 4.9.2.

PHES: variable number of octets
   A variable number of octets as indicated by the value of
   PHSsize.
1585 PACI Payload 1586 The single NAL unit packet or NAL-unit-like structure (such 1587 as: FU or AP) to be carried, not including the first two 1588 octets. 1590 Informative note: The first two octets of the NAL unit or 1591 NAL-unit-like structure carried in the PACI payload are not 1592 included in the PACI payload. Rather, the respective values 1593 are copied in locations of the PayloadHdr of the RTP 1594 packet. This design offers two advantages: first, the 1595 overall structure of the payload header is preserved, i.e. 1596 there is no special case of payload header structure that 1597 needs to be implemented for PACI. Second, no additional 1598 overhead is introduced. 1600 A PACI payload MAY be a single NAL unit, an FU, or an AP. 1601 PACIs MUST NOT be fragmented or aggregated. The following 1602 subsection documents the reasons for these design choices. 1604 4.9.1 Reasons for the PACI rules (informative) 1606 A PACI cannot be fragmented. If a PACI could be fragmented, and 1607 a fragment other than the first fragment would get lost, access 1608 to the information in the PACI would not be possible. Therefore, 1609 a PACI must not be fragmented. In other words, an FU must not 1610 carry (fragments of) a PACI. 1612 A PACI cannot be aggregated. Aggregation of PACIs is inadvisable 1613 from a compression viewpoint, as, in many cases, several to be 1614 aggregated NAL units would share identical PACI fields and values 1615 which would be carried redundantly for no reason. Most, if not 1616 all the practical effects of PACI aggregation can be achieved by 1617 aggregating NAL units and bundling them with a PACI (see below). 1618 Therefore, a PACI must not be aggregated. In other words, an AP 1619 must not contain a PACI. 1621 The payload of a PACI can be a fragment. 
Both middleboxes and sending systems with inflexible (often
hardware-based) encoders occasionally find themselves in
situations where a PACI and its headers, combined, are larger
than the MTU size. In such a scenario, the middlebox or sender
can fragment the NAL unit and encapsulate the fragment in a PACI.
Doing so preserves the payload header extension information for
all fragments, allowing downstream middleboxes and the receiver to
take advantage of that information. Therefore, a sender may place
a fragment into a PACI, and a receiver must be able to handle such
a PACI.

The payload of a PACI can be an aggregation NAL unit. HEVC
bitstreams can contain unevenly sized and/or small (when compared
to the MTU size) NAL units. In order to efficiently packetize
such small NAL units, APs were introduced. The benefits of APs
are independent of the need for a payload header extension.
Therefore, a sender may place an AP into a PACI, and a receiver
must be able to handle such a PACI.

4.9.2 PACI extensions (Informative)

This subsection includes recommendations for future specification
designers on how to extend the PACI syntax to accommodate future
extensions. Obviously, designers are free to specify whatever
appears to be appropriate to them at the time of their design.
However, a lot of thought has been invested into the extension
mechanism described below, and we suggest that deviations from it
warrant a good explanation.

This memo defines only a single payload header extension (Temporal
Scalability Control Information, described below in section 4.10),
and, therefore, only the F0 bit carries semantics. F1 and F2 are
already named (and not just marked as reserved, as a typical video
spec designer would do). They are intended to signal two additional
extensions.
The Y bit allows further F and Y bits to be added, recursively, to
extend the mechanism beyond 3 possible payload header extensions.
It is suggested to define a new packet type (using a different
value for Type) when assigning the F1, F2, or Y bits different
semantics than what is suggested below.

When a Y bit is set, an 8-bit flag-extension is inserted after
the Y bit. A flag-extension consists of 7 flags F[n..n+6], and
another Y bit.

The basic PACI header already includes F0, F1, and F2.
Therefore, the Fx bits in the first flag-extension are numbered
F3, F4, ..., F9, the Fx bits in the second flag-extension are
numbered F10, F11, ..., F16, and so forth. As a result, at least
3 Fx bits are always in the PACI, but the number of Fx bits (and
associated types of extensions) can be increased by setting the
next Y bit and adding an octet of flag-extension, carrying 7
flags and another Y bit. The size of this list of flags is
subject to the limits specified in section 4.9 (32 octets for all
flag-extensions and the PHES information combined).

Each of the F bits can indicate either the presence of
information in the Payload Header Extension Structure (PHES),
described below, or a given F bit can indicate a certain
condition, without including additional information in the PHES.

When a spec developer devises a new syntax that takes advantage
of the PACI extension mechanism, they must follow the constraints
listed below; otherwise the extension mechanism may break.

   1) The fields added for a particular Fx bit MUST be fixed in
      length and not depend on what other Fx bits are set (no
      parsing dependency).
   2) The Fx bits must be assigned in order.
   3) An implementation that supports the n-th Fn bit for any
      value of n must understand the syntax (though not
      necessarily the semantics) of the fields Fk (with k < n), so
      as to be able to either use those bits when present, or at
      least be able to skip over them.

4.10 Temporal Scalability Control Information

This section describes the single payload header extension
defined in this specification, known as Temporal Scalability
Control Information (TSCI). If, in the future, additional
payload header extensions become necessary, they could be
specified in this section of an updated version of this document,
or in their own documents.

When F0 is set to 1 in a PACI, this specifies that the PHES field
includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as
follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   TL0PICIDX   |   IrapPicID   |S|E|    RES    |               |
   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                              ....                             |
   |                  PACI payload: NAL unit                       |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 12 The structure of a PACI with a PHES containing a TSCI

TL0PICIDX (8 bits)
   When present, the TL0PICIDX field MUST be set equal to
   temporal_sub_layer_zero_idx as specified in Section D.3.22 of
   [H.265] for the access unit containing the NAL unit in the
   PACI.

IrapPicID (8 bits)
   When present, the IrapPicID field MUST be set equal to
   irap_pic_id as specified in Section D.3.22 of [H.265] for the
   access unit containing the NAL unit in the PACI.
1741 S (1 bit) 1742 The S bit MUST be set to 1 if any of the following conditions 1743 is true and MUST be set to 0 otherwise: 1744 o The NAL unit in the payload of the PACI is the first VCL NAL 1745 unit, in decoding order, of a picture. 1747 o The NAL unit in the payload of the PACI is an AP and the NAL 1748 unit in the first contained aggregation unit is the first 1749 VCL NAL unit, in decoding order, of a picture. 1750 o The NAL unit in the payload of the PACI is an FU with its S 1751 bit equal to 1 and the FU payload containing a fragment of 1752 the first VCL NAL unit, in decoding order of a picture. 1754 E (1 bit) 1755 The E bit MUST be set to 1 if any of the following conditions 1756 is true and MUST be set to 0 otherwise: 1757 o The NAL unit in the payload of the PACI is the last VCL NAL 1758 unit, in decoding order, of a picture. 1759 o The NAL unit in the payload of the PACI is an AP and the NAL 1760 unit in the last contained aggregation unit is the last VCL 1761 NAL unit, in decoding order, of a picture. 1762 o The NAL unit in the payload of the PACI is an FU with its E 1763 bit equal to 1 and the FU payload containing a fragment of 1764 the last VCL NAL unit, in decoding order of a picture. 1766 RES (6 bits) 1767 MUST be equal to 0. Reserved for future extensions. 1769 The value of PHSsize MUST be set to 3. Receivers MUST allow 1770 other values of the fields F0, F1, F2, Y, and PHSsize, and MUST 1771 ignore any additional fields, when present, than specified above 1772 in the PHES. 1774 5 Packetization Rules 1776 The following packetization rules apply: 1778 o If tx-mode is equal to "MSM" or sprop-max-don-diff is greater 1779 than 0 for an RTP stream, the transmission order of NAL units 1780 carried in the RTP stream MAY be different than the NAL unit 1781 decoding order. 
Otherwise (tx-mode is equal to "SSM" and sprop- 1782 max-don-diff is equal to 0 for an RTP stream), the transmission 1783 order of NAL units carried in the RTP stream MUST be the same as 1784 the NAL unit decoding order. 1786 o A NAL unit of a small size SHOULD be encapsulated in an 1787 aggregation packet together with one or more other NAL units 1788 in order to avoid the unnecessary packetization overhead for 1789 small NAL units. For example, non-VCL NAL units such as 1790 access unit delimiters, parameter sets, or SEI NAL units are 1791 typically small and can often be aggregated with VCL NAL units 1792 without violating MTU size constraints. 1794 o Each non-VCL NAL unit SHOULD, when possible from an MTU size 1795 match viewpoint, be encapsulated in an aggregation packet 1796 together with its associated VCL NAL unit, as typically a non- 1797 VCL NAL unit would be meaningless without the associated VCL 1798 NAL unit being available. 1800 o For carrying exactly one NAL unit in an RTP packet, a single 1801 NAL unit packet MUST be used. 1803 6 De-packetization Process 1805 The general concept behind de-packetization is to get the NAL 1806 units out of the RTP packets in an RTP stream and all RTP streams 1807 the RTP stream depends on, if any, and pass them to the decoder 1808 in the NAL unit decoding order. 1810 The de-packetization process is implementation dependent. 1811 Therefore, the following description should be seen as an example 1812 of a suitable implementation. Other schemes may be used as well 1813 as long as the output for the same input is the same as the 1814 process described below. The output is the same when the set of 1815 output NAL units and their order are both identical. 1816 Optimizations relative to the described algorithms are possible. 1818 All normal RTP mechanisms related to buffer management apply. 
In particular, duplicated or outdated RTP packets (as indicated by
the RTP sequence number and the RTP timestamp) are removed. To
determine the exact time for decoding, factors such as a possible
intentional delay to allow for proper inter-stream
synchronization must be taken into account.

NAL units with NAL unit type values in the range of 0 to 47,
inclusive, may be passed to the decoder. NAL-unit-like structures
with NAL unit type values in the range of 48 to 63, inclusive,
MUST NOT be passed to the decoder.

The receiver includes a receiver buffer, which is used to
compensate for transmission delay jitter within individual RTP
streams and across RTP streams, to reorder NAL units from
transmission order to the NAL unit decoding order, and to recover
the NAL unit decoding order in MSM, when applicable. In this
section, the receiver operation is described under the assumption
that there is no transmission delay jitter within an RTP stream
and across RTP streams. To distinguish it from a practical
receiver buffer that is also used for compensation of
transmission delay jitter, the receiver buffer is hereafter
called the de-packetization buffer in this section. Receivers
should also prepare for transmission delay jitter; i.e. either
reserve separate buffers for transmission delay jitter buffering
and de-packetization buffering or use a receiver buffer for both
transmission delay jitter and de-packetization. Moreover,
receivers should take transmission delay jitter into account in
the buffering operation; e.g. by additional initial buffering
before starting decoding and playback.

If only one RTP stream is being received and sprop-max-don-diff
of the only RTP stream being received is equal to 0, the
de-packetization buffer size is zero bytes, i.e.
the NAL units 1852 carried in the RTP stream are directly passed to the decoder in 1853 their transmission order, which is identical to the decoding 1854 order of the NAL units. Otherwise, the process described in the 1855 remainder of this section applies. 1857 There are two buffering states in the receiver: initial buffering 1858 and buffering while playing. Initial buffering starts when the 1859 reception is initialized. After initial buffering, decoding and 1860 playback are started, and the buffering-while-playing mode is 1861 used. 1863 Regardless of the buffering state, the receiver stores incoming 1864 NAL units, in reception order, into the de-packetization buffer. 1865 NAL units carried in RTP packets are stored in the de- 1866 packetization buffer individually, and the value of AbsDon is 1867 calculated and stored for each NAL unit. When MSM is in use, NAL 1868 units of all RTP streams of a bitstream are stored in the same 1869 de-packetization buffer. When NAL units carried in any two RTP 1870 streams are available to be placed into the de-packetization 1871 buffer, those NAL units carried in the RTP stream that is lower 1872 in the dependency tree are placed into the buffer first. For 1873 example, if RTP stream A depends on RTP stream B, then NAL units 1874 carried in RTP stream B are placed into the buffer first. 1876 Initial buffering lasts until condition A (the difference between 1877 the greatest and smallest AbsDon values of the NAL units in the 1878 de-packetization buffer is greater than or equal to the value of 1879 sprop-max-don-diff of the highest RTP stream) or condition B (the 1880 number of NAL units in the de-packetization buffer is greater 1881 than the value of sprop-depack-buf-nalus) is true. 
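As an illustration only, conditions A and B, together with the removal step that this section describes next, can be sketched as follows. The sketch assumes the caller has already computed AbsDon for each NAL unit (its derivation is specified elsewhere in this memo) and that sprop_max_don_diff is the value for the highest RTP stream. The class name DepacketizationBuffer and its methods are hypothetical. Because the same two conditions govern both buffering states, the sketch does not track the states separately.

```python
import heapq
from typing import Any, List, Tuple


class DepacketizationBuffer:
    """Toy de-packetization buffer keyed on AbsDon (illustrative only)."""

    def __init__(self, sprop_max_don_diff: int, sprop_depack_buf_nalus: int):
        self.max_don_diff = sprop_max_don_diff
        self.depack_buf_nalus = sprop_depack_buf_nalus
        self._heap: List[Tuple[int, int, Any]] = []
        self._arrival = 0  # tie-breaker so equal AbsDon values never compare NAL units

    def _release_needed(self) -> bool:
        if not self._heap:
            return False
        dons = [entry[0] for entry in self._heap]
        condition_a = max(dons) - min(dons) >= self.max_don_diff
        condition_b = len(dons) > self.depack_buf_nalus
        return condition_a or condition_b

    def push(self, abs_don: int, nal_unit: Any) -> List[Any]:
        """Store one NAL unit; return the NAL units now due for decoding."""
        heapq.heappush(self._heap, (abs_don, self._arrival, nal_unit))
        self._arrival += 1
        released = []
        while self._release_needed():
            # Always hand over the NAL unit with the smallest AbsDon.
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self) -> List[Any]:
        """Drain everything in increasing AbsDon order at end of stream."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

Recomputing the maximum on every check is linear in the buffer size, which keeps the sketch short; an implementation concerned with cost could track the maximum incrementally.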
After initial buffering, whenever condition A or condition B is
true, the following operation is repeatedly applied until both
condition A and condition B become false:

   o The NAL unit in the de-packetization buffer with the smallest
     value of AbsDon is removed from the de-packetization buffer
     and passed to the decoder.

When no more NAL units are flowing into the de-packetization
buffer, all NAL units remaining in the de-packetization buffer
are removed from the buffer and passed to the decoder in the
order of increasing AbsDon values.

7 Payload Format Parameters

This section specifies the parameters that MAY be used to select
optional features of the payload format and certain features or
properties of the bitstream or the RTP stream. The parameters
are specified here as part of the media type registration for the
HEVC codec. A mapping of the parameters into the Session
Description Protocol (SDP) [RFC4566] is also provided for
applications that use SDP. Equivalent parameters could be
defined elsewhere for use with control protocols that do not use
SDP.

7.1 Media Type Registration

The media subtype for the HEVC codec is allocated from the IETF
tree.

The receiver MUST ignore any unrecognized parameter.

Media Type name: video

Media subtype name: H265

Required parameters: none

OPTIONAL parameters:

   profile-space, tier-flag, profile-id, profile-compatibility-
   indicator, interop-constraints, and level-id:

      These parameters indicate the profile, tier, default level,
      and some constraints of the bitstream carried by the RTP
      stream and all RTP streams the RTP stream depends on, or a
      specific set of the profile, tier, default level, and some
      constraints the receiver supports.
1932 The profile and some constraints are indicated collectively 1933 by profile-space, profile-id, profile-compatibility- 1934 indicator, and interop-constraints. The profile specifies 1935 the subset of coding tools that may have been used to 1936 generate the bitstream or that the receiver supports. 1938 Informative note: There are 32 values of profile-id, and 1939 there are 32 flags in profile-compatibility-indicator, 1940 each flag corresponding to one value of profile-id. 1941 According to HEVC version 1 in [HEVC], when more than 1942 one of the 32 flags is set for a bitstream, the 1943 bitstream would comply with all the profiles 1944 corresponding to the set flags. However, in a draft of 1945 HEVC version 2 in [HEVC draft v2], subclause A.3.5, 19 1946 Format Range Extensions profiles have been specified, 1947 all using the same value of profile-id (4), 1948 differentiated by some of the 48 bits in interop- 1949 constraints - this (rather unexpected way of profile 1950 signalling) means that one of the 32 flags may 1951 correspond to multiple profiles. To be able to support 1952 whatever HEVC extension profile that might be specified 1953 and indicated using profile-space, profile-id, profile- 1954 compatibility-indicator, and interop-constraints in the 1955 future, it would be safe to require symmetric use of 1956 these parameters in SDP offer/answer unless recv-sub- 1957 layer-id is included in the SDP answer for choosing one 1958 of the sub-layers offered. 1960 The tier is indicated by tier-flag. The default level is 1961 indicated by level-id. The tier and the default level 1962 specify the limits on values of syntax elements or 1963 arithmetic combinations of values of syntax elements that 1964 are followed when generating the bitstream or that the 1965 receiver supports. 
      A set of profile-space, tier-flag, profile-id, profile-
      compatibility-indicator, interop-constraints, and level-id
      parameters ptlA is said to be consistent with another set
      of these parameters ptlB if any decoder that conforms to
      the profile, tier, level, and constraints indicated by ptlB
      can decode any bitstream that conforms to the profile,
      tier, level, and constraints indicated by ptlA.

      In SDP offer/answer, when the SDP answer does not include
      the recv-sub-layer-id parameter that is less than the
      sprop-sub-layer-id parameter in the SDP offer, the
      following applies:

         o The profile-space, tier-flag, profile-id, profile-
           compatibility-indicator, and interop-constraints
           parameters MUST be used symmetrically, i.e. the value
           of each of these parameters in the offer MUST be the
           same as that in the answer, either explicitly
           signalled or implicitly inferred.
         o The level-id parameter is changeable as long as the
           highest level indicated by the answer is either equal
           to or lower than that in the offer. Note that the
           highest level is indicated by level-id and max-recv-
           level-id together.

      In SDP offer/answer, when the SDP answer does include the
      recv-sub-layer-id parameter that is less than the sprop-
      sub-layer-id parameter in the SDP offer, the set of
      profile-space, tier-flag, profile-id, profile-
      compatibility-indicator, interop-constraints, and level-id
      parameters included in the answer MUST be consistent with
      that for the chosen sub-layer representation as indicated
      in the SDP offer, with the exception that the level-id
      parameter in the SDP answer is changeable as long as the
      highest level indicated by the answer is either lower than
      or equal to that in the offer.
2004 More specifications of these parameters, including how they 2005 relate to the values of the profile, tier, and level syntax 2006 elements specified in [HEVC] are provided below. 2008 profile-space, profile-id: 2010 The value of profile-space MUST be in the range of 0 to 3, 2011 inclusive. The value of profile-id MUST be in the range of 2012 0 to 31, inclusive. 2014 When profile-space is not present, a value of 0 MUST be 2015 inferred. When profile-id is not present, a value of 1 2016 (i.e. the Main profile) MUST be inferred. 2018 When used to indicate properties of a bitstream, profile- 2019 space and profile-id are derived from the profile, tier, 2020 and level syntax elements in SPS or VPS NAL units as 2021 follows, where general_profile_space, general_profile_idc, 2022 sub_layer_profile_space[j], and sub_layer_profile_idc[j] 2023 are specified in [HEVC]: 2025 If the RTP stream is the highest RTP stream, the 2026 following applies: 2028 o profile_space = general_profile_space 2029 o profile_id = general_profile_idc 2030 Otherwise (the RTP stream is a dependee RTP stream), the 2031 following applies, with j being the value of the sprop- 2032 sub-layer-id parameter: 2034 o profile_space = sub_layer_profile_space[j] 2035 o profile_id = sub_layer_profile_idc[j] 2037 tier-flag, level-id: 2039 The value of tier-flag MUST be in the range of 0 to 1, 2040 inclusive. The value of level-id MUST be in the range of 0 2041 to 255, inclusive. 2043 If the tier-flag and level-id parameters are used to 2044 indicate properties of a bitstream, they indicate the tier 2045 and the highest level the bitstream complies with. 2047 If the tier-flag and level-id parameters are used for 2048 capability exchange, the following applies. If max-recv- 2049 level-id is not present, the default level defined by 2050 level-id indicates the highest level the codec wishes to 2051 support. Otherwise, max-recv-level-id indicates the 2052 highest level the codec supports for receiving. 
For either 2053 receiving or sending, all levels that are lower than the 2054 highest level supported MUST also be supported. 2056 If no tier-flag is present, a value of 0 MUST be inferred 2057 and if no level-id is present, a value of 93 (i.e. level 2058 3.1) MUST be inferred. 2060 When used to indicate properties of a bitstream, the tier- 2061 flag and level-id parameters are derived from the profile, 2062 tier, and level syntax elements in SPS or VPS NAL units as 2063 follows, where general_tier_flag, general_level_idc, 2064 sub_layer_tier_flag[j], and sub_layer_level_idc[j] are 2065 specified in [HEVC]: 2067 If the RTP stream is the highest RTP stream, the 2068 following applies: 2070 o tier-flag = general_tier_flag 2071 o level-id = general_level_idc 2073 Otherwise (the RTP stream is a dependee RTP stream), the 2074 following applies, with j being the value of the sprop- 2075 sub-layer-id parameter: 2077 o tier-flag = sub_layer_tier_flag[j] 2078 o level-id = sub_layer_level_idc[j] 2080 interop-constraints: 2082 A base16 [RFC4648] (hexadecimal) representation of six 2083 bytes of data, consisting of progressive_source_flag, 2084 interlaced_source_flag, non_packed_constraint_flag, 2085 frame_only_constraint_flag, and reserved_zero_44bits. 
2087 If the interop-constraints parameter is not present, the 2088 following MUST be inferred: 2090 o progressive_source_flag = 1 2091 o interlaced_source_flag = 0 2092 o non_packed_constraint_flag = 1 2093 o frame_only_constraint_flag = 1 2094 o reserved_zero_44bits = 0 2096 When the interop-constraints parameter is used to indicate 2097 properties of a bitstream, the following applies, where 2098 general_progressive_source_flag, 2099 general_interlaced_source_flag, 2100 general_non_packed_constraint_flag, 2102 general_frame_only_constraint_flag, 2103 general_reserved_zero_44bits, 2104 sub_layer_progressive_source_flag[j], 2105 sub_layer_interlaced_source_flag[j], 2106 sub_layer_non_packed_constraint_flag[j], 2107 sub_layer_frame_only_constraint_flag[j], and 2108 sub_layer_reserved_zero_44bits[j] are specified in [HEVC]: 2110 If the RTP stream is the highest RTP stream, the 2111 following applies: 2113 o progressive_source_flag = 2114 general_progressive_source_flag 2115 o interlaced_source_flag = 2116 general_interlaced_source_flag 2117 o non_packed_constraint_flag = 2118 general_non_packed_constraint_flag 2119 o frame_only_constraint_flag = 2120 general_frame_only_constraint_flag 2121 o reserved_zero_44bits = general_reserved_zero_44bits 2123 Otherwise (the RTP stream is a dependee RTP stream), the 2124 following applies, with j being the value of the sprop- 2125 sub-layer-id parameter: 2127 o progressive_source_flag = 2128 sub_layer_progressive_source_flag[j] 2129 o interlaced_source_flag = 2130 sub_layer_interlaced_source_flag[j] 2131 o non_packed_constraint_flag = 2133 sub_layer_non_packed_constraint_flag[j] 2134 o frame_only_constraint_flag = 2136 sub_layer_frame_only_constraint_flag[j] 2137 o reserved_zero_44bits = 2138 sub_layer_reserved_zero_44bits[j] 2140 Using interop-constraints for capability exchange results 2141 in a requirement on any bitstream to be compliant with the 2142 interop-constraints.
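Informative example: The following Python sketch shows one way the six bytes of interop-constraints could be assembled and parsed. It assumes that the four flags occupy the most significant bits of the first byte, in the order listed above, and that the 44 reserved bits follow. The function names are illustrative and are not defined by this memo.

```python
def encode_interop_constraints(progressive_source_flag=1,
                               interlaced_source_flag=0,
                               non_packed_constraint_flag=1,
                               frame_only_constraint_flag=1,
                               reserved_zero_44bits=0):
    """Pack four flags and 44 reserved bits into six bytes, as base16 text."""
    flags = (progressive_source_flag, interlaced_source_flag,
             non_packed_constraint_flag, frame_only_constraint_flag)
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("each flag must be 0 or 1")
    if not 0 <= reserved_zero_44bits < (1 << 44):
        raise ValueError("reserved_zero_44bits must fit in 44 bits")
    value = 0
    for flag in flags:  # assumption: MSB-first, in the order listed
        value = (value << 1) | flag
    value = (value << 44) | reserved_zero_44bits
    return value.to_bytes(6, "big").hex().upper()


def decode_interop_constraints(text):
    """Split a 12-digit base16 string back into its fields."""
    raw = bytes.fromhex(text)
    if len(raw) != 6:
        raise ValueError("interop-constraints must represent six bytes")
    value = int.from_bytes(raw, "big")
    return {
        "progressive_source_flag": (value >> 47) & 1,
        "interlaced_source_flag": (value >> 46) & 1,
        "non_packed_constraint_flag": (value >> 45) & 1,
        "frame_only_constraint_flag": (value >> 44) & 1,
        "reserved_zero_44bits": value & ((1 << 44) - 1),
    }
```

Under these assumptions, the values inferred when the parameter is absent (1, 0, 1, 1, and 0) correspond to the string "B00000000000".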
2144 profile-compatibility-indicator: 2146 A base16 [RFC4648] representation of four bytes of data. 2148 When profile-compatibility-indicator is used to indicate 2149 properties of a bitstream, the following applies, where 2150 general_profile_compatibility_flag[j] and 2151 sub_layer_profile_compatibility_flag[i][j] are specified in 2152 [HEVC]: 2154 The profile-compatibility-indicator in this case 2155 indicates the profiles, in addition to the profile defined 2156 by profile-space, profile-id, and interop-constraints, to 2157 which the bitstream conforms. A decoder that conforms to any 2158 of the profiles the bitstream conforms to would be 2159 capable of decoding the bitstream. These additional 2160 profiles are defined by profile-space, each set bit of 2161 profile-compatibility-indicator, and interop- 2162 constraints. 2164 If the RTP stream is the highest RTP stream, the 2165 following applies for each value of j in the range of 0 2166 to 31, inclusive: 2168 o bit j of profile-compatibility-indicator = 2169 general_profile_compatibility_flag[j] 2171 Otherwise (the RTP stream is a dependee RTP stream), the 2172 following applies for i equal to sprop-sub-layer-id and 2173 for each value of j in the range of 0 to 31, inclusive: 2175 o bit j of profile-compatibility-indicator = 2176 sub_layer_profile_compatibility_flag[i][j] 2178 Using profile-compatibility-indicator for capability 2179 exchange results in a requirement on any bitstream to be 2180 compliant with the profile-compatibility-indicator. This 2181 is intended to handle cases where any future HEVC profile 2182 is defined as an intersection of two or more profiles. 2184 If this parameter is not present, this parameter defaults 2185 to the following: bit j, with j equal to profile-id, of 2186 profile-compatibility-indicator is inferred to be equal to 2187 1, and all other bits are inferred to be equal to 0.
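Informative example: The following Python sketch derives the default profile-compatibility-indicator from profile-id and lists the profiles flagged by a given indicator. It assumes that bit 0 is the most significant bit of the first byte, matching the order in which the compatibility flags appear in the syntax structure. This memo does not state the bit numbering explicitly, and the function names are illustrative.

```python
def default_profile_compatibility_indicator(profile_id=1):
    """Return the base16 default: only bit j = profile_id is set."""
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id must be in the range 0 to 31")
    value = 1 << (31 - profile_id)  # assumption: bit 0 is the MSB
    return value.to_bytes(4, "big").hex().upper()


def compatible_profile_ids(indicator_hex):
    """List every j whose bit is set in the four-byte indicator."""
    raw = bytes.fromhex(indicator_hex)
    if len(raw) != 4:
        raise ValueError("indicator must represent four bytes")
    value = int.from_bytes(raw, "big")
    return [j for j in range(32) if value & (1 << (31 - j))]
```

With this numbering, the default for profile-id equal to 1 is "40000000", and an indicator of "60000000" flags the profiles with profile-id values 1 and 2.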
2189 sprop-sub-layer-id: 2191 This parameter MAY be used to indicate the highest allowed 2192 value of TID in the bitstream. When not present, the value 2193 of sprop-sub-layer-id is inferred to be equal to 6. 2195 The value of sprop-sub-layer-id MUST be in the range of 0 2196 to 6, inclusive. 2198 recv-sub-layer-id: 2200 This parameter MAY be used to signal a receiver's choice of 2201 the offered or declared sub-layer representations in the 2202 sprop-vps. The value of recv-sub-layer-id indicates the 2203 TID of the highest sub-layer of the bitstream that a 2204 receiver supports. When not present, the value of recv- 2205 sub-layer-id is inferred to be equal to the value of the 2206 sprop-sub-layer-id parameter in the SDP offer. 2208 The value of recv-sub-layer-id MUST be in the range of 0 to 2209 6, inclusive. 2211 max-recv-level-id: 2213 This parameter MAY be used to indicate the highest level a 2214 receiver supports. The highest level the receiver supports 2215 is equal to the value of max-recv-level-id divided by 30. 2217 The value of max-recv-level-id MUST be in the range of 0 2218 to 255, inclusive. 2220 When max-recv-level-id is not present, the value is 2221 inferred to be equal to level-id. 2223 max-recv-level-id MUST NOT be present when the highest 2224 level the receiver supports is not higher than the default 2225 level. 2227 tx-mode: 2229 This parameter indicates whether the transmission mode is SSM 2230 or MSM. 2232 The value of tx-mode MUST be equal to either "MSM" or "SSM". 2233 When not present, the value of tx-mode is inferred to be 2234 equal to "SSM". 2236 If the value is equal to "MSM", MSM MUST be in use. Otherwise 2237 (the value is equal to "SSM"), SSM MUST be in use. 2239 The value of tx-mode MUST be equal to "MSM" for all RTP 2240 sessions in an MSM. 2242 sprop-vps: 2244 This parameter MAY be used to convey any video parameter 2245 set NAL unit of the bitstream for out-of-band transmission 2246 of video parameter sets. 
The parameter MAY also be used 2247 for capability exchange and to indicate sub-stream 2248 characteristics (i.e. properties of sub-layer 2249 representations as defined in [HEVC]). The value of the 2250 parameter is a comma-separated (',') list of base64 2251 [RFC4648] representations of the video parameter set NAL 2252 units as specified in Section 7.3.2.1 of [HEVC]. 2254 The sprop-vps parameter MAY contain one or more 2255 video parameter set NAL units. However, all other video 2256 parameter sets contained in the sprop-vps parameter MUST be 2257 consistent with the first video parameter set in the sprop- 2258 vps parameter. A video parameter set vpsB is said to be 2259 consistent with another video parameter set vpsA if any 2260 decoder that conforms to the profile, tier, level, and 2261 constraints indicated by the 12 bytes of data starting from 2262 the syntax element general_profile_space to the syntax 2263 element general_level_idc, inclusive, in the first 2264 profile_tier_level( ) syntax structure in vpsA can decode 2265 any bitstream that conforms to the profile, tier, level, 2266 and constraints indicated by the 12 bytes of data starting 2267 from the syntax element general_profile_space to the syntax 2268 element general_level_idc, inclusive, in the first 2269 profile_tier_level( ) syntax structure in vpsB. 2271 sprop-sps: 2273 This parameter MAY be used to convey sequence parameter set 2274 NAL units of the bitstream for out-of-band transmission of 2275 sequence parameter sets. The value of the parameter is a 2276 comma-separated (',') list of base64 [RFC4648] 2277 representations of the sequence parameter set NAL units as 2278 specified in Section 7.3.2.2 of [HEVC]. 2280 sprop-pps: 2282 This parameter MAY be used to convey picture parameter set 2283 NAL units of the bitstream for out-of-band transmission of 2284 picture parameter sets.
The value of the parameter is a 2285 comma-separated (',') list of base64 [RFC4648] 2286 representations of the picture parameter set NAL units as 2287 specified in Section 7.3.2.3 of [HEVC]. 2289 sprop-sei: 2291 This parameter MAY be used to convey one or more SEI 2292 messages that describe bitstream characteristics. When 2293 present, a decoder can rely on the bitstream 2294 characteristics that are described in the SEI messages for 2295 the entire duration of the session, independently from the 2296 persistence scopes of the SEI messages as specified in 2297 [HEVC]. 2299 The value of the parameter is a comma-separated (',') list 2300 of base64 [RFC4648] representations of SEI NAL units as 2301 specified in Section 7.3.2.4 of [HEVC]. 2303 Informative note: Intentionally, no list of applicable 2304 or inapplicable SEI messages is specified here. 2305 Conveying certain SEI messages in sprop-sei may be 2306 sensible in some application scenarios and meaningless 2307 in others. However, a few examples are described below: 2309 1) In an environment where the bitstream was created 2310 from film-based source material, and no splicing is 2311 going to occur during the lifetime of the session, 2312 the film grain characteristics SEI message or the 2313 tone mapping information SEI message are likely 2314 meaningful, and sending them in sprop-sei rather than 2315 in the bitstream at each entry point may help save 2316 bits and allows the renderer to be configured only once, 2317 avoiding unwanted artifacts. 2318 2) The structure of pictures information SEI message in 2319 sprop-sei can be used to inform a decoder of 2320 information on the NAL unit types, picture order 2321 count values, and prediction dependencies of a 2322 sequence of pictures. Having such knowledge can be 2323 helpful for error recovery.
2324 3) Examples of SEI messages that would be meaningless 2325 to convey in sprop-sei include the decoded 2326 picture hash SEI message (it is close to impossible 2327 that all decoded pictures have the same hash), 2328 the display orientation SEI message when the device 2329 is a handheld device (as the display orientation may 2330 change when the handheld device is turned around), or 2331 the filler payload SEI message (as there is no point 2332 in just having more bits in SDP). 2334 max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc: 2336 These parameters MAY be used to signal the capabilities of 2337 a receiver implementation. These parameters MUST NOT be 2338 used for any other purpose. The highest level (specified 2339 by max-recv-level-id) MUST be a level that the receiver is 2340 fully capable of supporting. max-lsr, max-lps, max-cpb, 2341 max-dpb, max-br, max-tr, and max-tc MAY be used to indicate 2342 capabilities of the receiver that extend the required 2343 capabilities of the highest level, as specified below. 2345 When more than one parameter from the set (max-lsr, max- 2346 lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is present, 2347 the receiver MUST support all signaled capabilities 2348 simultaneously. For example, if both max-lsr and max-br 2349 are present, the highest level with the extension of both 2350 the picture rate and bitrate is supported. That is, the 2351 receiver is able to decode bitstreams in which the luma 2352 sample rate is up to max-lsr (inclusive), the bitrate is up 2353 to max-br (inclusive), the coded picture buffer size is 2354 derived as specified in the semantics of the max-br 2355 parameter below, and the other properties comply with the 2356 highest level specified by max-recv-level-id.
2358 Informative note: When the OPTIONAL media type 2359 parameters are used to signal the properties of a 2360 bitstream, and max-lsr, max-lps, max-cpb, max-dpb, max- 2361 br, max-tr, and max-tc are not present, the values of 2362 profile-space, tier-flag, profile-id, profile- 2363 compatibility-indicator, interop-constraints, and level- 2364 id must always be such that the bitstream complies fully 2365 with the specified profile, tier, and level. 2367 max-lsr: 2368 The value of max-lsr is an integer indicating the maximum 2369 processing rate in units of luma samples per second. The 2370 max-lsr parameter signals that the receiver is capable of 2371 decoding video at a higher rate than is required by the 2372 highest level. 2374 When max-lsr is signaled, the receiver MUST be able to 2375 decode bitstreams that conform to the highest level, with 2376 the exception that the MaxLumaSR value in Table A-2 of 2377 [HEVC] for the highest level is replaced with the value of 2378 max-lsr. Senders MAY use this knowledge to send pictures 2379 of a given size at a higher picture rate than is indicated 2380 in the highest level. 2382 When not present, the value of max-lsr is inferred to be 2383 equal to the value of MaxLumaSR given in Table A-2 of 2384 [HEVC] for the highest level. 2386 The value of max-lsr MUST be in the range of MaxLumaSR to 2387 16 * MaxLumaSR, inclusive, where MaxLumaSR is given in 2388 Table A-2 of [HEVC] for the highest level. 2390 max-lps: 2391 The value of max-lps is an integer indicating the maximum 2392 picture size in units of luma samples. The max-lps 2393 parameter signals that the receiver is capable of decoding 2394 larger picture sizes than are required by the highest 2395 level. When max-lps is signaled, the receiver MUST be able 2396 to decode bitstreams that conform to the highest level, 2397 with the exception that the MaxLumaPS value in Table A-1 of 2398 [HEVC] for the highest level is replaced with the value of 2399 max-lps. 
Senders MAY use this knowledge to send larger 2400 pictures at a proportionally lower picture rate than is 2401 indicated in the highest level. 2403 When not present, the value of max-lps is inferred to be 2404 equal to the value of MaxLumaPS given in Table A-1 of 2405 [HEVC] for the highest level. 2407 The value of max-lps MUST be in the range of MaxLumaPS to 2408 16 * MaxLumaPS, inclusive, where MaxLumaPS is given in 2409 Table A-1 of [HEVC] for the highest level. 2411 max-cpb: 2412 The value of max-cpb is an integer indicating the maximum 2413 coded picture buffer size in units of CpbBrVclFactor bits 2414 for the VCL HRD parameters and in units of CpbBrNalFactor 2415 bits for the NAL HRD parameters, where CpbBrVclFactor and 2416 CpbBrNalFactor are defined in Section A.4 of [HEVC]. The 2417 max-cpb parameter signals that the receiver has more memory 2418 than the minimum amount of coded picture buffer memory 2419 required by the highest level. When max-cpb is signaled, 2420 the receiver MUST be able to decode bitstreams that conform 2421 to the highest level, with the exception that the MaxCPB 2422 value in Table A-1 of [HEVC] for the highest level is 2423 replaced with the value of max-cpb. Senders MAY use this 2424 knowledge to construct coded bitstreams with greater 2425 variation of bitrate than can be achieved with the MaxCPB 2426 value in Table A-1 of [HEVC]. 2428 When not present, the value of max-cpb is inferred to be 2429 equal to the value of MaxCPB given in Table A-1 of [HEVC] 2430 for the highest level. 2432 The value of max-cpb MUST be in the range of MaxCPB to 2433 16 * MaxCPB, inclusive, where MaxCPB is given in Table 2434 A-1 of [HEVC] for the highest level. 2436 Informative note: The coded picture buffer is used in 2437 the hypothetical reference decoder (Annex C of HEVC).
2438 The use of the hypothetical reference decoder is 2439 recommended in HEVC encoders to verify that the produced 2440 bitstream conforms to the standard and to control the 2441 output bitrate. Thus, the coded picture buffer is 2442 conceptually independent of any other potential buffers 2443 in the receiver, including de-packetization and de- 2444 jitter buffers. The coded picture buffer need not be 2445 implemented in decoders as specified in Annex C of HEVC, 2446 but rather standard-compliant decoders can have any 2447 buffering arrangements provided that they can decode 2448 standard-compliant bitstreams. Thus, in practice, the 2449 input buffer for a video decoder can be integrated with 2450 de-packetization and de-jitter buffers of the receiver. 2452 max-dpb: 2453 The value of max-dpb is an integer indicating the maximum 2454 decoded picture buffer size in units of decoded pictures at 2455 the MaxLumaPS for the highest level, i.e. the number of 2456 decoded pictures at the maximum picture size defined by the 2457 highest level. The value of max-dpb MUST be in the range 2458 of 1 to 16, inclusive. The max-dpb parameter signals 2459 that the receiver has more memory than the minimum amount 2460 of decoded picture buffer memory required by default, which 2461 is MaxDpbPicBuf as defined in [HEVC] (equal to 6). When 2462 max-dpb is signaled, the receiver MUST be able to decode 2463 bitstreams that conform to the highest level, with the 2464 exception that the MaxDpbPicBuf value defined in [HEVC] as 2465 6 is replaced with the value of max-dpb.
Consequently, a 2466 receiver that signals max-dpb MUST be capable of storing 2467 the following number of decoded pictures (MaxDpbSize) in 2468 its decoded picture buffer: 2470 if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) ) 2471 MaxDpbSize = Min( 4 * max-dpb, 16 ) 2472 else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) ) 2473 MaxDpbSize = Min( 2 * max-dpb, 16 ) 2474 else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) ) 2476 MaxDpbSize = Min( (4 * max-dpb) / 3, 16 ) 2477 else 2478 MaxDpbSize = max-dpb 2480 where MaxLumaPS is given in Table A-1 of [HEVC] for the 2481 highest level and PicSizeInSamplesY is the current size of 2482 each decoded picture in units of luma samples as defined in 2483 [HEVC]. 2485 The value of max-dpb MUST be greater than or equal to the 2486 value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. 2487 Senders MAY use this knowledge to construct coded 2488 bitstreams with improved compression. 2490 When not present, the value of max-dpb is inferred to be 2491 equal to the value of MaxDpbPicBuf (i.e. 6) as defined in 2492 [HEVC]. 2494 Informative note: This parameter was added primarily to 2495 complement a similar codepoint in the ITU-T 2496 Recommendation H.245, so as to facilitate signaling 2497 gateway designs. The decoded picture buffer stores 2498 reconstructed samples. There is no relationship between 2499 the size of the decoded picture buffer and the buffers 2500 used in RTP, especially de-packetization and de-jitter 2501 buffers. 2503 max-br: 2504 The value of max-br is an integer indicating the maximum 2505 video bitrate in units of CpbBrVclFactor bits per second 2506 for the VCL HRD parameters and in units of CpbBrNalFactor 2507 bits per second for the NAL HRD parameters, where 2508 CpbBrVclFactor and CpbBrNalFactor are defined in Section 2509 A.4 of [HEVC].
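Informative example: The MaxDpbSize derivation given above in the semantics of max-dpb can be written as the following Python sketch. It assumes that "/" in the derivation denotes integer division with truncation. The function name and the MaxLumaPS value used in the note below are illustrative and are not taken from [HEVC].

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures a receiver signaling max-dpb must store.

    max_luma_ps is the MaxLumaPS limit of the highest level and must be
    supplied by the caller; max_dpb defaults to MaxDpbPicBuf (6).
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # assumption: "/" is integer division with truncation
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

For instance, with an illustrative MaxLumaPS of 1000 and the default max-dpb of 6, pictures of 250, 500, 750, and 1000 luma samples yield MaxDpbSize values of 16, 12, 8, and 6.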
2511 The max-br parameter signals that the video decoder of the 2512 receiver is capable of decoding video at a higher bitrate 2513 than is required by the highest level. 2515 When max-br is signaled, the video codec of the receiver 2516 MUST be able to decode bitstreams that conform to the 2517 highest level, with the following exceptions in the limits 2518 specified by the highest level: 2520 o The value of max-br replaces the MaxBR value in Table A- 2521 2 of [HEVC] for the highest level. 2522 o When the max-cpb parameter is not present, the result of 2523 the following formula replaces the value of MaxCPB in 2524 Table A-1 of [HEVC]: 2526 (MaxCPB of the highest level) * max-br / (MaxBR of 2527 the highest level) 2529 For example, if a receiver signals capability for Main 2530 profile Level 2 with max-br equal to 2000, this indicates a 2531 maximum video bitrate of 2000 kbits/sec for VCL HRD 2532 parameters, a maximum video bitrate of 2200 kbits/sec for 2533 NAL HRD parameters, and a CPB size of 2000000 bits (2000000 2534 / 1500000 * 1500000). 2536 Senders MAY use this knowledge to send higher bitrate video 2537 as allowed in the level definition of Annex A of HEVC to 2538 achieve improved video quality. 2540 When not present, the value of max-br is inferred to be 2541 equal to the value of MaxBR given in Table A-2 of [HEVC] 2542 for the highest level. 2544 The value of max-br MUST be in the range of MaxBR to 2545 16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of 2546 [HEVC] for the highest level. 2548 Informative note: This parameter was added primarily to 2549 complement a similar codepoint in the ITU-T 2550 Recommendation H.245, so as to facilitate signaling 2551 gateway designs. The assumption that the network is 2552 capable of handling such bitrates at any given time 2553 cannot be made from the value of this parameter. 
In 2554 particular, no conclusion can be drawn that the signaled 2555 bitrate is possible under congestion control 2556 constraints. 2558 max-tr: 2559 The value of max-tr is an integer indicating the maximum 2560 number of tile rows. The max-tr parameter signals that the 2561 receiver is capable of decoding video with a larger number 2562 of tile rows than the value allowed by the highest level. 2564 When max-tr is signaled, the receiver MUST be able to 2565 decode bitstreams that conform to the highest level, with 2566 the exception that the MaxTileRows value in Table A-1 of 2567 [HEVC] for the highest level is replaced with the value of 2568 max-tr. 2570 Senders MAY use this knowledge to send pictures utilizing a 2571 larger number of tile rows than the value allowed by the 2572 highest level. 2574 When not present, the value of max-tr is inferred to be 2575 equal to the value of MaxTileRows given in Table A-1 of 2576 [HEVC] for the highest level. 2578 The value of max-tr MUST be in the range of MaxTileRows to 2579 16 * MaxTileRows, inclusive, where MaxTileRows is given in 2580 Table A-1 of [HEVC] for the highest level. 2582 max-tc: 2583 The value of max-tc is an integer indicating the maximum 2584 number of tile columns. The max-tc parameter signals that 2585 the receiver is capable of decoding video with a larger 2586 number of tile columns than the value allowed by the 2587 highest level. 2589 When max-tc is signaled, the receiver MUST be able to 2590 decode bitstreams that conform to the highest level, with 2591 the exception that the MaxTileCols value in Table A-1 of 2592 [HEVC] for the highest level is replaced with the value of 2593 max-tc. 2595 Senders MAY use this knowledge to send pictures utilizing a 2596 larger number of tile columns than the value allowed by the 2597 highest level. 2599 When not present, the value of max-tc is inferred to be 2600 equal to the value of MaxTileCols given in Table A-1 of 2601 [HEVC] for the highest level.
2603 The value of max-tc MUST be in the range of MaxTileCols to 2604 16 * MaxTileCols, inclusive, where MaxTileCols is given in 2605 Table A-1 of [HEVC] for the highest level. 2607 max-fps: 2609 The value of max-fps is an integer indicating the maximum 2610 picture rate in units of pictures per 100 seconds that can 2611 be effectively processed by the receiver. The max-fps 2612 parameter MAY be used to signal that the receiver has a 2613 constraint in that it is not capable of processing video 2614 effectively at the full picture rate that is implied by the 2615 highest level and, when present, one or more of the 2616 parameters max-lsr, max-lps, and max-br. 2618 The value of max-fps is not necessarily the picture rate at 2619 which the maximum picture size can be sent; it constitutes 2620 a constraint on the maximum picture rate for all resolutions. 2622 Informative note: The max-fps parameter is semantically 2623 different from max-lsr, max-lps, max-cpb, max-dpb, max- 2624 br, max-tr, and max-tc in that max-fps is used to signal 2625 a constraint, lowering the maximum picture rate from 2626 what is implied by other parameters. 2628 The encoder MUST use a picture rate equal to or less than 2629 this value. In cases where the max-fps parameter is absent, 2630 the encoder is free to choose any picture rate according to 2631 the highest level and any signaled optional parameters. 2633 The value of max-fps MUST be smaller than or equal to the 2634 full picture rate that is implied by the highest level and, 2635 when present, one or more of the parameters max-lsr, max- 2636 lps, and max-br. 2638 sprop-max-don-diff: 2640 The value of this parameter MUST be equal to 0 if the RTP 2641 stream does not depend on other RTP streams and there is no 2642 NAL unit naluA that is followed in transmission order by 2643 any NAL unit preceding naluA in decoding order.
Otherwise, 2644 this parameter specifies the maximum absolute difference 2645 between the decoding order number (i.e., AbsDon) values of 2646 any two NAL units naluA and naluB, where naluA follows 2647 naluB in decoding order and precedes naluB in transmission 2648 order. 2650 The value of sprop-max-don-diff MUST be an integer in the 2651 range of 0 to 32767, inclusive. 2653 When not present, the value of sprop-max-don-diff is 2654 inferred to be equal to 0. 2656 When the RTP stream depends on one or more other RTP 2657 streams (in this case tx-mode MUST be equal to "MSM" and 2658 MSM is in use), this parameter MUST be present and the 2659 value MUST be greater than 0. 2661 Informative note: When the RTP stream does not depend on 2662 other RTP streams, either MSM or SSM may be in use. 2664 sprop-depack-buf-nalus: 2666 This parameter specifies the maximum number of NAL units 2667 that precede a NAL unit in transmission order and follow 2668 the NAL unit in decoding order. 2670 The value of sprop-depack-buf-nalus MUST be an integer in 2671 the range of 0 to 32767, inclusive. 2673 When not present, the value of sprop-depack-buf-nalus is 2674 inferred to be equal to 0. 2676 When the RTP stream depends on one or more other RTP 2677 streams (in this case tx-mode MUST be equal to "MSM" and 2678 MSM is in use), this parameter MUST be present and the 2679 value MUST be greater than 0. 2681 sprop-depack-buf-bytes: 2683 This parameter signals the required size of the de- 2684 packetization buffer in units of bytes. The value of the 2685 parameter MUST be greater than or equal to the maximum 2686 buffer occupancy (in units of bytes) of the de- 2687 packetization buffer as specified in section 6. 2689 The value of sprop-depack-buf-bytes MUST be an integer in 2690 the range of 0 to 4294967295, inclusive. 
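Informative example: For a single RTP stream that does not depend on other RTP streams, the values of sprop-max-don-diff and sprop-depack-buf-nalus defined above can be derived from the AbsDon values of the NAL units listed in transmission order. The following Python sketch assumes that each NAL unit has a distinct AbsDon value and that decoding order is increasing AbsDon order. The function name is illustrative, and the additional requirements for dependent RTP streams are not modeled.

```python
def reorder_stats(abs_dons):
    """Return (max_don_diff, depack_buf_nalus) for one RTP stream.

    abs_dons holds the AbsDon value of each NAL unit, in transmission
    order; decoding order is assumed to be increasing AbsDon.
    """
    max_don_diff = 0
    depack_buf_nalus = 0
    for index, don_b in enumerate(abs_dons):
        # NAL units sent before this one that follow it in decoding order.
        ahead = [don_a for don_a in abs_dons[:index] if don_a > don_b]
        if ahead:
            max_don_diff = max(max_don_diff, max(ahead) - don_b)
            depack_buf_nalus = max(depack_buf_nalus, len(ahead))
    return max_don_diff, depack_buf_nalus
```

When transmission order equals decoding order, both values are 0. For the transmission order 0, 3, 4, 1, 2, the NAL unit with AbsDon equal to 1 is preceded by two NAL units that follow it in decoding order, and the largest AbsDon difference is 4 - 1 = 3, so the sketch returns (3, 2).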
2692 When the RTP stream depends on one or more other RTP 2693 streams (in this case tx-mode MUST be equal to "MSM" and 2694 MSM is in use) or sprop-max-don-diff is present and greater 2695 than 0, this parameter MUST be present and the value MUST 2696 be greater than 0. 2698 Informative note: The value of sprop-depack-buf-bytes 2699 indicates the required size of the de-packetization 2700 buffer only. When network jitter can occur, an 2701 appropriately sized jitter buffer has to be available as 2702 well. 2704 depack-buf-cap: 2706 This parameter signals the capabilities of a receiver 2707 implementation and indicates the amount of de-packetization 2708 buffer space in units of bytes that the receiver has 2709 available for reconstructing the NAL unit decoding order 2710 from NAL units carried in one or more RTP streams. A 2711 receiver is able to handle any RTP stream, and all RTP 2712 streams the RTP stream depends on, when present, for which 2713 the value of the sprop-depack-buf-bytes parameter is 2714 smaller than or equal to this parameter. 2716 When not present, the value of depack-buf-cap is inferred 2717 to be equal to 4294967295. The value of depack-buf-cap 2718 MUST be an integer in the range of 1 to 4294967295, 2719 inclusive. 2721 Informative note: depack-buf-cap indicates the maximum 2722 possible size of the de-packetization buffer of the 2723 receiver only. When network jitter can occur, an 2724 appropriately sized jitter buffer has to be available as 2725 well. 2727 sprop-segmentation-id: 2729 This parameter MAY be used to signal the segmentation tools 2730 present in the bitstream and that can be used for 2731 parallelization. The value of sprop-segmentation-id MUST 2732 be an integer in the range of 0 to 3, inclusive. When not 2733 present, the value of sprop-segmentation-id is inferred to 2734 be equal to 0. 2736 When sprop-segmentation-id is equal to 0, no information 2737 about the segmentation tools is provided. 
When sprop- 2738 segmentation-id is equal to 1, it indicates that slices are 2739 present in the bitstream. When sprop-segmentation-id is 2740 equal to 2, it indicates that tiles are present in the 2741 bitstream. When sprop-segmentation-id is equal to 3, it 2742 indicates that WPP is used in the bitstream. 2744 sprop-spatial-segmentation-idc: 2746 A base16 [RFC4648] representation of the syntax element 2747 min_spatial_segmentation_idc as specified in [HEVC]. This 2748 parameter MAY be used to describe parallelization 2749 capabilities of the bitstream. 2751 dec-parallel-cap: 2753 This parameter MAY be used to indicate the decoder's 2754 additional decoding capabilities given the presence of 2755 tools enabling parallel decoding, such as slices, tiles, 2756 and WPP, in the bitstream. The decoding capability of the 2757 decoder may vary with the setting of the parallel decoding 2758 tools present in the bitstream, e.g. the size of the tiles 2759 that are present in a bitstream. Therefore, multiple 2760 capability points may be provided, each indicating the 2761 minimum required decoding capability that is associated 2762 with a parallelism requirement, which is a requirement on 2763 the bitstream that enables parallel decoding. 2765 Each capability point is defined as a combination of 1) a 2766 parallelism requirement, 2) a profile (determined by 2767 profile-space and profile-id), 3) a highest level, and 4) a 2768 maximum processing rate, a maximum picture size, and a 2769 maximum video bitrate that may be equal to or greater than 2770 that determined by the highest level. 
The parameter's 2771 syntax in ABNF [RFC5234] is as follows: 2773 dec-parallel-cap = "dec-parallel-cap={" cap-point *("," 2774 cap-point) "}" 2776 cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" 2777 cap-parameter) 2779 spatial-seg-idc = 1*4DIGIT ; (1-4095) 2781 cap-parameter = tier-flag / level-id / max-lsr 2782 / max-lps / max-br 2784 tier-flag = "tier-flag" EQ ("0" / "1") 2786 level-id = "level-id" EQ 1*3DIGIT ; (0-255) 2788 max-lsr = "max-lsr" EQ 1*20DIGIT ; (0- 2789 18,446,744,073,709,551,615) 2791 max-lps = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295) 2793 max-br = "max-br" EQ 1*20DIGIT ; (0- 2794 18,446,744,073,709,551,615) 2796 EQ = "=" 2798 The set of capability points expressed by the dec-parallel- 2799 cap parameter is enclosed in a pair of curly braces ("{}"). 2800 Each set of two consecutive capability points is separated 2801 by a comma (','). Within each capability point, each set 2802 of two consecutive parameters, and when present, their 2803 values, is separated by a semicolon (';'). 2805 The profile of all capability points is determined by 2806 profile-space and profile-id that are outside the dec- 2807 parallel-cap parameter. 2809 Each capability point starts with an indication of the 2810 parallelism requirement, which consists of a parallel tool 2811 type, which may be equal to 'w' or 't', and a decimal value 2812 of the spatial-seg-idc parameter. When the type is 'w', 2813 the capability point is valid only for H.265 bitstreams 2814 with WPP in use, i.e. entropy_coding_sync_enabled_flag 2815 equal to 1. When the type is 't', the capability point is 2816 valid only for H.265 bitstreams with WPP not in use (i.e. 2817 entropy_coding_sync_enabled_flag equal to 0). The 2818 capability-point is valid only for H.265 bitstreams with 2819 min_spatial_segmentation_idc equal to or greater than 2820 spatial-seg-idc. 
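Informative example: The following Python sketch parses a dec-parallel-cap value according to the ABNF above. It is a strict reading that accepts no whitespace, takes the value ranges from the ABNF comments, and rejects a parameter that is repeated within a capability point. The function name and the returned dictionary layout are illustrative and are not defined by this memo.

```python
import re

# Upper bounds taken from the comments in the ABNF.
_LIMITS = {
    "tier-flag": 1,
    "level-id": 255,
    "max-lsr": 2**64 - 1,
    "max-lps": 2**32 - 1,
    "max-br": 2**64 - 1,
}
_CAP_POINT = re.compile(r"([wt]):([0-9]{1,4})((?:;[a-z-]+=[0-9]+)+)")


def parse_dec_parallel_cap(text):
    """Parse 'dec-parallel-cap={...}' into a list of capability points."""
    prefix = "dec-parallel-cap={"
    if not (text.startswith(prefix) and text.endswith("}")):
        raise ValueError("missing prefix or closing brace")
    points = []
    for raw in text[len(prefix):-1].split(","):
        match = _CAP_POINT.fullmatch(raw)
        if match is None:
            raise ValueError("malformed capability point: %r" % raw)
        idc = int(match.group(2))
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc out of range: %d" % idc)
        params = {}
        # group(3) starts with ';', so drop it before splitting.
        for pair in match.group(3)[1:].split(";"):
            name, value = pair.split("=")
            if name not in _LIMITS:
                raise ValueError("unknown cap-parameter: " + name)
            if name in params:
                raise ValueError("repeated cap-parameter: " + name)
            if int(value) > _LIMITS[name]:
                raise ValueError("value out of range for " + name)
            params[name] = int(value)
        points.append({"tool": match.group(1),
                       "spatial-seg-idc": idc,
                       "params": params})
    return points
```

For instance, "dec-parallel-cap={t:8;level-id=120}" yields a single capability point with tool type 't', spatial-seg-idc 8, and level-id 120.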
2822 After the parallelism requirement indication, each 2823 capability point continues with one or more pairs of 2824 parameter and value in any order for any of the following 2825 parameters: 2827 o tier-flag 2828 o level-id 2829 o max-lsr 2830 o max-lps 2831 o max-br 2833 At most one occurrence of each of the above five parameters 2834 is allowed within each capability point. 2836 The values of dec-parallel-cap.tier-flag and dec-parallel- 2837 cap.level-id for a capability point indicate the highest 2838 level of the capability point. The values of dec-parallel- 2839 cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel- 2840 cap.max-br for a capability point indicate the maximum 2841 processing rate in units of luma samples per second, the 2842 maximum picture size in units of luma samples, and the 2843 maximum video bitrate (in units of CpbBrVclFactor bits per 2844 second for the VCL HRD parameters and in units of 2845 CpbBrNalFactor bits per second for the NAL HRD parameters 2846 where CpbBrVclFactor and CpbBrNalFactor are defined in 2847 Section A.4 of [HEVC]). 2849 When not present, the value of dec-parallel-cap.tier-flag 2850 is inferred to be equal to the value of tier-flag outside 2851 the dec-parallel-cap parameter. When not present, the 2852 value of dec-parallel-cap.level-id is inferred to be equal 2853 to the value of max-recv-level-id outside the dec-parallel- 2854 cap parameter. When not present, the value of dec- 2855 parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec- 2856 parallel-cap.max-br is inferred to be equal to the value of 2857 max-lsr, max-lps, or max-br, respectively, outside the dec- 2858 parallel-cap parameter. 
         The general decoding capability, expressed by the set of
         parameters outside of dec-parallel-cap, is defined as the
         capability point that is determined by the following
         combination of parameters: 1) the parallelism requirement
         corresponding to the value of sprop-segmentation-id equal
         to 0 for a bitstream, 2) the profile determined by
         profile-space, profile-id, profile-compatibility-indicator,
         and interop-constraints, 3) the tier and the highest level
         determined by tier-flag and max-recv-level-id, and 4) the
         maximum processing rate, the maximum picture size, and the
         maximum video bitrate determined by the highest level.  The
         general decoding capability MUST NOT be included as one of
         the set of capability points in the dec-parallel-cap
         parameter.

         For example, the following parameters express the general
         decoding capability of 720p30 (Level 3.1) plus an
         additional decoding capability of 1080p30 (Level 4) given
         that the spatially largest tile or slice used in the
         bitstream is equal to or less than 1/3 of the picture size:

            a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

         For another example, the following parameters express an
         additional decoding capability of 1080p30, using
         dec-parallel-cap.max-lsr and dec-parallel-cap.max-lps, given
         that WPP is used in the bitstream:

            a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
                        max-lsr=62668800;max-lps=2088960}

            Informative note: When min_spatial_segmentation_idc is
            present in a bitstream and WPP is not used, [HEVC]
            specifies that there is no slice or no tile in the
            bitstream containing more than 4 * PicSizeInSamplesY /
            ( min_spatial_segmentation_idc + 4 ) luma samples.
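The bound in the informative note explains the "1/3 of the picture size" in the first example: with a spatial-seg-idc of 8, the factor is 4 / (8 + 4). The following Python sketch works through that arithmetic. The function names are assumptions made for this example, and the bound applies only to the non-WPP case described in the note.

```python
from fractions import Fraction


def max_segment_fraction(min_spatial_segmentation_idc):
    """Largest share of the picture that one slice or tile may cover."""
    return Fraction(4, min_spatial_segmentation_idc + 4)


def max_segment_luma_samples(pic_size_in_samples_y,
                             min_spatial_segmentation_idc):
    """Upper bound on luma samples per slice or tile, rounded down."""
    return (4 * pic_size_in_samples_y) // (min_spatial_segmentation_idc + 4)
```

With min_spatial_segmentation_idc equal to 8, max_segment_fraction returns 1/3. For a picture of 2088960 luma samples (the max-lps value used in the second example), max_segment_luma_samples gives 696320.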
      include-dph:

         This parameter is used to indicate the capability and
         preference to utilize or include decoded picture hash (DPH)
         SEI messages (see Section D.3.19 of [HEVC]) in the
         bitstream.  DPH SEI messages can be used to detect picture
         corruption so the receiver can request picture repair; see
         Section 8.  The value is a comma-separated list of hash
         types that are supported or requested to be used, each hash
         type provided as an unsigned integer value (0-255), with
         the hash types listed from most preferred to least
         preferred.  Example: "include-dph=0,2", which indicates the
         capability for MD5 (most preferred) and Checksum (less
         preferred).  If the parameter is not included or the value
         contains no hash types, then no capability to utilize DPH
         SEI messages is assumed.  Note that DPH SEI messages MAY
         still be included in the bitstream even when there is no
         declaration of capability to use them, as in general SEI
         messages do not affect the normative decoding process and
         decoders are allowed to ignore SEI messages.

   Encoding considerations:

      This type is only defined for transfer via RTP (RFC 3550).

   Security considerations:

      See Section 9 of RFC XXXX.

   Public specification:

      Please refer to Section 13 of RFC XXXX.

   Additional information: None

   File extensions: none

   Macintosh file type code: none

   Object identifier or OID: none

   Person & email address to contact for further information:

      Ye-Kui Wang (yekuiw@qti.qualcomm.com).

   Intended usage: COMMON

   Author: See Section 14 of RFC XXXX.

   Change controller:

      IETF Audio/Video Transport Payloads working group delegated
      from the IESG.

7.2 SDP Parameters

   The receiver MUST ignore any parameter unspecified in this memo.
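One way a receiver might apply this rule is sketched below in Python. The function names are assumptions made for this example; the set of known names is copied from the parameter lists in Section 7.2.1. Because a dec-parallel-cap value contains semicolons inside its curly braces, the sketch splits the "a=fmtp" parameter list only on semicolons that are outside braces. Parameter names are compared exactly as written, which is a simplification of this sketch.

```python
# Parameter names listed in Section 7.2.1 of this memo.
KNOWN_PARAMS = frozenset({
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br",
    "max-tr", "max-tc", "max-fps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-cap",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
    "dec-parallel-cap", "include-dph",
    "sprop-vps", "sprop-sps", "sprop-pps",
})


def split_fmtp_params(fmtp_value):
    """Split on ';' characters that are not inside curly braces."""
    parts, current, depth = [], [], 0
    for ch in fmtp_value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [part.strip() for part in parts if part.strip()]


def parse_fmtp(fmtp_value):
    """Return known parameters as a dict; unknown ones are ignored."""
    params = {}
    for item in split_fmtp_params(fmtp_value):
        name, _, value = item.partition("=")
        name = name.strip()
        if name in KNOWN_PARAMS:
            params[name] = value.strip()
    return params
```

The input is the text after the payload type in the "a=fmtp" line, for example "level-id=93;dec-parallel-cap={t:8;level-id=120}".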
7.2.1 Mapping of Payload Type Parameters to SDP

   The media type video/H265 string is mapped to fields in the
   Session Description Protocol (SDP) [RFC4566] as follows:

   o The media name in the "m=" line of SDP MUST be video.

   o The encoding name in the "a=rtpmap" line of SDP MUST be H265
     (the media subtype).

   o The clock rate in the "a=rtpmap" line MUST be 90000.

   o The OPTIONAL parameters "profile-space", "profile-id",
     "tier-flag", "level-id", "interop-constraints",
     "profile-compatibility-indicator", "sprop-sub-layer-id",
     "recv-sub-layer-id", "max-recv-level-id", "tx-mode", "max-lsr",
     "max-lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
     "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus",
     "sprop-depack-buf-bytes", "depack-buf-cap",
     "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
     "dec-parallel-cap", and "include-dph", when present, MUST be
     included in the "a=fmtp" line of SDP.  These parameters are
     expressed as a media type string, in the form of a
     semicolon-separated list of parameter=value pairs.

   o The OPTIONAL parameters "sprop-vps", "sprop-sps", and
     "sprop-pps", when present, MUST be included in the "a=fmtp"
     line of SDP or conveyed using the "fmtp" source attribute as
     specified in Section 6.3 of [RFC5576].  For a particular media
     format (i.e., RTP payload type), "sprop-vps", "sprop-sps", or
     "sprop-pps" MUST NOT be both included in the "a=fmtp" line of
     SDP and conveyed using the "fmtp" source attribute.  When
     included in the "a=fmtp" line of SDP, these parameters are
     expressed as a media type string, in the form of a
     semicolon-separated list of parameter=value pairs.  When
     conveyed in the "a=fmtp" line of SDP for a particular payload
     type, the parameters "sprop-vps", "sprop-sps", and "sprop-pps"
     MUST be applied to each SSRC with the payload type.
     When conveyed using the "fmtp" source attribute, these
     parameters are only associated with the given source and
     payload type as parts of the "fmtp" source attribute.

        Informative note: Conveyance of "sprop-vps", "sprop-sps",
        and "sprop-pps" using the "fmtp" source attribute allows
        for out-of-band transport of parameter sets in topologies
        like Topo-Video-switch-MCU as specified in [RFC5117].

   An example of media representation in SDP is as follows:

      m=video 49170 RTP/AVP 98
      a=rtpmap:98 H265/90000
      a=fmtp:98 profile-id=1;
                sprop-vps=