idnits 2.17.1

draft-ietf-payload-rtp-h265-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 173 instances of weird spacing in the document.  Is it really
     formatted ragged-right, rather than justified?

  ** There are 3 instances of too long lines in the document, the longest one
     being 14 characters in excess of 72.

  ** The abstract seems to contain references ([HEVC]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 27 has weird spacing: '... at any ti...'

  == Line 30 has weird spacing: '... The list ...'

  == Line 45 has weird spacing: '...fo) in effec...'

  == Line 46 has weird spacing: '...ication of t...'

  == Line 47 has weird spacing: '...ly, as they ...'

  == (168 more instances...)

  -- The document date (February 12, 2014) is 3724 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '3GP' is mentioned on line 269, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 1035

  == Missing Reference: 'RFC5234' is mentioned on line 2356, but not defined

  == Missing Reference: 'RFC5117' is mentioned on line 2538, but not defined

  ** Obsolete undefined reference: RFC 5117 (Obsoleted by RFC 7667)

  == Missing Reference: 'RFC2326' is mentioned on line 2852, but not defined

  ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826)

  == Missing Reference: 'RFC2974' is mentioned on line 2853, but not defined

  == Missing Reference: 'RFC3551' is mentioned on line 2994, but not defined

  == Missing Reference: 'RFC3711' is mentioned on line 2994, but not defined

  == Missing Reference: 'RFC5124' is mentioned on line 2995, but not defined

  == Missing Reference: 'RFC 3711' is mentioned on line 3020, but not defined

  == Missing Reference: 'RFC 3551' is mentioned on line 3044, but not defined

  == Unused Reference: '3GPPFF' is defined on line 3169, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5109' is defined on line 3218, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-01

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-05

     Summary: 6 errors (**), 0 flaws (~~), 22 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1    Network Working Group                                         Y.-K. Wang
2    Internet Draft                                                  Qualcomm
3    Intended status: Standards track                              Y. Sanchez
4    Expires: August 2014                                          T. Schierl
5                                                              Fraunhofer HHI
6                                                                   S. Wenger
7                                                                       Vidyo
8                                                            M. M. Hannuksela
9                                                                       Nokia
10                                                          February 12, 2014

12           RTP Payload Format for High Efficiency Video Coding
13                   draft-ietf-payload-rtp-h265-02.txt

15   Status of this Memo

17      This Internet-Draft is submitted to IETF in full conformance with
18      the provisions of BCP 78 and BCP 79.

20      Internet-Drafts are working documents of the Internet Engineering
21      Task Force (IETF), its areas, and its working groups. Note that
22      other groups may also distribute working documents as Internet-
23      Drafts.

25      Internet-Drafts are draft documents valid for a maximum of six
26      months and may be updated, replaced, or obsoleted by other documents
27      at any time. It is inappropriate to use Internet-Drafts as
28      reference material or to cite them other than as "work in progress."

30      The list of current Internet-Drafts can be accessed at
31      http://www.ietf.org/ietf/1id-abstracts.txt.

33      The list of Internet-Draft Shadow Directories can be accessed at
34      http://www.ietf.org/shadow.html.

36      This Internet-Draft will expire on August 12, 2014.

38   Copyright and License Notice

40      Copyright (c) 2014 IETF Trust and the persons identified as the
41      document authors. All rights reserved.

43      This document is subject to BCP 78 and the IETF Trust's Legal
44      Provisions Relating to IETF Documents
45      (http://trustee.ietf.org/license-info) in effect on the date of
46      publication of this document. Please review these documents
47      carefully, as they describe your rights and restrictions with
48      respect to this document. Code Components extracted from this
49      document must include Simplified BSD License text as described in
50      Section 4.e of the Trust Legal Provisions and are provided without
51      warranty as described in the Simplified BSD License.

53   Abstract

55      This memo describes an RTP payload format for the video coding
56      standard ITU-T Recommendation H.265 and ISO/IEC International
57      Standard 23008-2, both also known as High Efficiency Video Coding
58      (HEVC) [HEVC], developed by the Joint Collaborative Team on Video
59      Coding (JCT-VC). The RTP payload format allows for packetization of
60      one or more Network Abstraction Layer (NAL) units in each RTP packet
61      payload, as well as fragmentation of a NAL unit into multiple RTP
62      packets. Furthermore, it supports transmission of an HEVC stream
63      over a single as well as multiple RTP flows. The payload format has
64      wide applicability in videoconferencing, Internet video streaming,
65      and high bit-rate entertainment-quality video, among others.

67   Table of Contents

69      Status of this Memo
70      Abstract
71      Table of Contents
72      1. Introduction
73         1.1. Overview of the HEVC Codec
74            1.1.1 Coding-Tool Features
75            1.1.2 Systems and Transport Interfaces
76            1.1.3 Parallel Processing Support
77            1.1.4 NAL Unit Header
78         1.2. Overview of the Payload Format
79      2. Conventions
80      3. Definitions and Abbreviations
81         3.1 Definitions
82            3.1.1 Definitions from the HEVC Specification
83            3.1.2 Definitions Specific to This Memo
84         3.2 Abbreviations
85      4. RTP Payload Format
86         4.1 RTP Header Usage
87         4.2 Payload Header Usage
88         4.3 Payload Structures
89         4.4 Transmission Modes
90         4.5 Decoding Order Number
91         4.6 Single NAL Unit Packets
92         4.7 Aggregation Packets (APs)
93         4.8 Fragmentation Units (FUs)
94         4.9 PACI packets
95            4.9.1 Reasons for the PACI rules (informative)
96         4.10 Payload Header Extensions
97      5. Packetization Rules
98      6. De-packetization Process
99      7. Payload Format Parameters
100        7.1 Media Type Registration
101        7.2 SDP Parameters
102           7.2.1 Mapping of Payload Type Parameters to SDP
103           7.2.2 Usage with SDP Offer/Answer Model
104           7.2.3 Usage in Declarative Session Descriptions
105           7.2.4 Parameter Sets Considerations
106           7.2.5 Dependency Signaling in Multi-Session Transmission
107     8. Use with Feedback Messages
108        8.1 Use of HEVC with the RPSI Feedback Message
109     9. Security Considerations
110     10. Congestion Control
111     11. IANA Consideration
112     12. Acknowledgements
113     13. References
114        13.1 Normative References
115        13.2 Informative References
116     14. Authors' Addresses

118  1. Introduction

120  1.1. Overview of the HEVC Codec

122     High Efficiency Video Coding [HEVC], formally known as ITU-T
123     Recommendation H.265 and ISO/IEC International Standard 23008-2, was
124     ratified by ITU-T in April 2013 and reportedly provides significant
125     coding efficiency gains over H.264 [H.264].

127     As both H.264 [H.264] and its RTP payload format [RFC6184] are
128     widely deployed and generally known in the relevant implementer
129     communities, frequently only the differences between those two
130     specifications are highlighted in non-normative, explanatory parts
131     of this memo. Basic familiarity with both specifications is assumed
132     for those parts. However, the normative parts of this memo do not
133     require study of H.264 or its RTP payload format.

135     H.264 and HEVC share a similar hybrid video codec design.
136     Conceptually, both technologies include a video coding layer (VCL),
137     which is often used to refer to the coding-tool features, and a
138     network abstraction layer (NAL), which is often used to refer to the
139     systems and transport interface aspects of the codecs.

141  1.1.1 Coding-Tool Features

143     Similarly to earlier hybrid-video-coding-based standards, including
144     H.264, the following basic video coding design is employed by HEVC.
145     A prediction signal is first formed either by intra or motion
146     compensated prediction, and the residual (the difference between the
147     original and the prediction) is then coded. The gains in coding
148     efficiency are achieved by redesigning and improving almost all
149     parts of the codec over earlier designs. In addition, HEVC includes
150     several tools to make the implementation on parallel architectures
151     easier.
        Below is a summary of HEVC coding-tool features.

153     Quad-tree block and transform structure

155     One of the major tools that contribute significantly to the coding
156     efficiency of HEVC is the usage of flexible coding blocks and
157     transforms, which are defined in a hierarchical quad-tree manner.
158     Unlike H.264, where the basic coding block is a macroblock of fixed
159     size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size
160     of 64x64. Each CTU can be divided into smaller units in a
161     hierarchical quad-tree manner and can represent smaller blocks down
162     to size 4x4. Similarly, the transforms used in HEVC can have
163     different sizes, starting from 4x4 and going up to 32x32. Utilizing
164     large blocks and transforms contributes to the major gain of HEVC,
165     especially at high resolutions.

167     Entropy coding

169     HEVC uses a single entropy coding engine, which is based on Context
170     Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two
171     distinct entropy coding engines. CABAC in HEVC shares many
172     similarities with CABAC of H.264, but contains several improvements.
173     Those include improvements in coding efficiency and lowered
174     implementation complexity, especially for parallel architectures.

176     In-loop filtering

178     H.264 includes an in-loop adaptive deblocking filter, where the
179     blocking artifacts around the transform edges in the reconstructed
180     picture are smoothed to improve the picture quality and compression
181     efficiency. In HEVC, a similar deblocking filter is employed but
182     with somewhat lower complexity. In addition, pictures undergo a
183     subsequent filtering operation called Sample Adaptive Offset (SAO),
184     which is a new design element in HEVC. SAO basically adds a pixel-
185     level offset in an adaptive manner and usually acts as a de-ringing
186     filter.
        It is observed that SAO improves the picture quality,
187     especially around sharp edges, contributing substantially to visual
188     quality improvements of HEVC.

190     Motion prediction and coding

192     There have been a number of improvements in this area that are
193     summarized as follows. The first category is motion merge and
194     advanced motion vector prediction (AMVP) modes. The motion
195     information of a prediction block can be inferred from the spatially
196     or temporally neighboring blocks. This is similar to the DIRECT
197     mode in H.264 but includes new aspects to incorporate the flexible
198     quad-tree structure and methods to improve parallel
199     implementations. In addition, the motion vector predictor can be
200     signaled for improved efficiency. The second category is high-
201     precision interpolation. The interpolation filter length is
202     increased to 8-tap from 6-tap, which improves the coding efficiency
203     but also comes with increased complexity. In addition, the
204     interpolation filter is defined with higher precision without any
205     intermediate rounding operations to further improve the coding
206     efficiency.

208     Intra prediction and intra coding

210     Compared to 8 intra prediction modes in H.264, HEVC supports angular
211     intra prediction with 33 directions. This increased flexibility
212     improves both objective coding efficiency and visual quality as the
213     edges can be better predicted and ringing artifacts around the edges
214     can be reduced. In addition, the reference samples are adaptively
215     smoothed based on the prediction direction. To avoid contouring
216     artifacts, a new interpolative prediction generation is included to
217     improve the visual quality. Furthermore, discrete sine transform
218     (DST) is utilized instead of traditional discrete cosine transform
219     (DCT) for 4x4 intra transform blocks.

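
   As an illustration of the intra coding summary above, the following
   minimal Python sketch classifies an intra prediction mode index and
   selects the transform type for a block. The mode numbering (0 for
   planar, 1 for DC, 2 to 34 for the 33 angular directions) and the
   restriction of the DST to luma blocks are taken from the HEVC
   specification rather than from this memo, and the function names are
   hypothetical.

```python
# Sketch only: mode numbering assumed from the HEVC specification
# (0 = planar, 1 = DC, 2..34 = the 33 angular directions).
INTRA_PLANAR = 0
INTRA_DC = 1
FIRST_ANGULAR = 2
LAST_ANGULAR = 34


def intra_mode_kind(mode: int) -> str:
    """Return 'planar', 'dc' or 'angular' for an intra prediction mode index."""
    if mode == INTRA_PLANAR:
        return "planar"
    if mode == INTRA_DC:
        return "dc"
    if FIRST_ANGULAR <= mode <= LAST_ANGULAR:
        return "angular"
    raise ValueError(f"not an intra prediction mode index: {mode}")


def transform_kind(block_size: int, is_intra: bool, is_luma: bool = True) -> str:
    """Return 'DST' for 4x4 intra blocks and 'DCT' otherwise.

    The is_luma condition is an assumption taken from the HEVC
    specification; this memo only mentions "4x4 intra transform blocks".
    """
    if block_size not in (4, 8, 16, 32):
        raise ValueError(f"unsupported transform size: {block_size}")
    return "DST" if (is_intra and is_luma and block_size == 4) else "DCT"


angular_count = sum(intra_mode_kind(m) == "angular" for m in range(35))
```

   Counting the modes classified as angular yields the 33 directions
   mentioned above; the two remaining modes are the non-directional
   planar and DC modes.
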
221     Other coding-tool features

223     HEVC includes some tools for lossless coding and efficient screen
224     content coding, such as skipping the transform for certain blocks.
225     These tools are particularly useful, for example, when streaming the
226     user-interface of a mobile device to a large display.

228  1.1.2 Systems and Transport Interfaces

230     HEVC inherited the basic systems and transport interface designs,
231     such as the NAL-unit-based syntax structure, the hierarchical syntax
232     and data unit structure from sequence-level parameter sets, multi-
233     picture-level or picture-level parameter sets, slice-level header
234     parameters, lower-level parameters, the supplemental enhancement
235     information (SEI) message mechanism, the hypothetical reference
236     decoder (HRD) based video buffering model, and so on. In the
237     following, a list of differences in these aspects compared to H.264
238     is summarized.

240     Video parameter set

242     A new type of parameter set, called video parameter set (VPS), was
243     introduced. For the first (2013) version of [HEVC], the video
244     parameter set NAL unit is required to be available prior to its
245     activation, while the information contained in the video parameter
246     set is not necessary for operation of the decoding process. For
247     future HEVC extensions, such as the 3D or scalable extensions, the
248     video parameter set is expected to include information necessary for
249     operation of the decoding process, e.g. decoding dependency or
250     information for reference picture set construction of enhancement
251     layers. The VPS provides a "big picture" of a bitstream, including
252     what types of operation points are provided, the profile, tier, and
253     level of the operation points, and some other high-level properties
254     of the bitstream that can be used as the basis for session
255     negotiation and content selection, etc. (see section 7.1).

257     Profile, tier and level

259     The profile, tier and level syntax structure that can be included in
260     both VPS and sequence parameter set (SPS) includes 12 bytes of data to
261     describe the entire bitstream (including all temporally scalable
262     layers, which are referred to as sub-layers in the HEVC
263     specification), and can optionally include more profile, tier and
264     level information pertaining to individual temporally scalable
265     layers. The profile indicator indicates the "best viewed as"
266     profile when the bitstream conforms to multiple profiles, similar to
267     the major brand concept in the ISO base media file format (ISOBMFF)
268     [ISOBMFF] and file formats derived based on ISOBMFF, such as the
269     3GPP file format [3GP]. The profile, tier and level syntax
270     structure also includes the indications of whether the bitstream is
271     free of frame-packed content, whether the bitstream is free of
272     interlaced source content and free of field pictures, i.e. contains
273     only frame pictures of progressive source, such that clients/players
274     with no support of post-processing functionalities for handling of
275     frame-packed or interlaced source content or field pictures can
276     reject those bitstreams.

278     Bitstream and elementary stream

280     HEVC includes a definition of an elementary stream, which is new
281     compared to H.264. An elementary stream consists of a sequence of
282     one or more bitstreams. An elementary stream that consists of two
283     or more bitstreams has typically been formed by splicing together
284     two or more bitstreams (or parts thereof). When an elementary
285     stream contains more than one bitstream, the last NAL unit of the
286     last access unit of a bitstream (except the last bitstream in the
287     elementary stream) must contain an end of bitstream NAL unit and the
288     first access unit of the subsequent bitstream must be an intra
289     random access point (IRAP) access unit.
        This IRAP access unit may
290     be a clean random access (CRA), broken link access (BLA), or
291     instantaneous decoding refresh (IDR) access unit.

293     Random access support

295     HEVC includes signaling in the NAL unit header, through NAL unit
296     types, of IRAP pictures beyond IDR pictures. Three types of IRAP
297     pictures, namely IDR, CRA and BLA pictures, are supported, wherein IDR
298     pictures are conventionally referred to as closed group-of-pictures
299     (closed-GOP) random access points, and CRA and BLA pictures are those
300     conventionally referred to as open-GOP random access points. BLA
301     pictures usually originate from splicing of two bitstreams or parts
302     thereof at a CRA picture, e.g. during stream switching. To enable
303     better systems usage of IRAP pictures, altogether six different NAL
304     unit types are defined to signal the properties of the IRAP pictures,
305     which can be used to better match the stream access point (SAP)
306     types as defined in the ISOBMFF [ISOBMFF], which are utilized for
307     random access support in both 3GP-DASH [3GPDASH] and MPEG DASH
308     [MPEGDASH]. Pictures following an IRAP picture in decoding order
309     and preceding the IRAP picture in output order are referred to as
310     leading pictures associated with the IRAP picture. There are two
311     types of leading pictures, namely random access decodable leading
312     (RADL) pictures and random access skipped leading (RASL) pictures.
313     RADL pictures are decodable when decoding starts at the
314     associated IRAP picture, and RASL pictures are not decodable when
315     decoding starts at the associated IRAP picture and are usually
316     discarded. HEVC provides mechanisms to enable the specification of
317     conformance of bitstreams with RASL pictures being discarded, thus
318     providing a standard-compliant way to enable systems components to
319     discard RASL pictures when needed.

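
   The handling of leading pictures described above can be sketched as
   follows. This is a minimal Python illustration, not a normative
   procedure: it assumes one VCL NAL unit per picture, it uses the NAL
   unit type values assigned in the HEVC specification (they are not
   listed in this section), and the function name is hypothetical. RASL
   pictures associated with the IRAP picture at which decoding starts
   are discarded, while RADL pictures and all pictures from the next
   IRAP picture onwards are kept.

```python
# NAL unit type values assumed from the HEVC specification.
RADL_N, RADL_R, RASL_N, RASL_R = 6, 7, 8, 9
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP, CRA_NUT = 19, 20, 21

IRAP_TYPES = frozenset(range(BLA_W_LP, CRA_NUT + 1))  # the six IRAP types
RASL_TYPES = frozenset({RASL_N, RASL_R})


def is_irap(nal_unit_type: int) -> bool:
    """True for IDR, CRA and BLA NAL unit types."""
    return nal_unit_type in IRAP_TYPES


def drop_rasl_after_random_access(nal_unit_types):
    """Filter picture types (decoding order) for decoding that starts at an IRAP.

    Simplifying assumption: each list entry stands for one whole picture.
    """
    if not nal_unit_types or not is_irap(nal_unit_types[0]):
        raise ValueError("decoding must start at an IRAP picture")
    kept = [nal_unit_types[0]]
    leading_of_entry_point = True  # still before the next IRAP picture
    for nal_unit_type in nal_unit_types[1:]:
        if is_irap(nal_unit_type):
            leading_of_entry_point = False
        if leading_of_entry_point and nal_unit_type in RASL_TYPES:
            continue  # not decodable when starting at the entry IRAP
        kept.append(nal_unit_type)
    return kept


TRAIL_R = 1
stream = [CRA_NUT, RASL_N, RADL_R, TRAIL_R, CRA_NUT, RASL_R, TRAIL_R]
filtered = drop_rasl_after_random_access(stream)
```

   In this example only the RASL picture that follows the first CRA
   picture is removed; the RASL picture after the second CRA picture is
   kept because the pictures it references have been decoded.
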
321     Temporal scalability support

323     HEVC includes improved support of temporal scalability, by
324     inclusion of the signaling of TemporalId in the NAL unit header, the
325     restriction that pictures of a particular temporal sub-layer cannot
326     be used for inter prediction reference by pictures of a lower
327     temporal sub-layer, the sub-bitstream extraction process, and the
328     requirement that each sub-bitstream extraction output be a
329     conforming bitstream. Media-aware network elements (MANEs) can
330     utilize the TemporalId in the NAL unit header for stream adaptation
331     purposes based on temporal scalability.

333     Temporal sub-layer switching support

335     HEVC specifies, through NAL unit types present in the NAL unit
336     header, the signaling of temporal sub-layer access (TSA) and
337     stepwise temporal sub-layer access (STSA). A TSA picture and
338     pictures following the TSA picture in decoding order do not use
339     pictures prior to the TSA picture in decoding order with TemporalId
340     greater than or equal to that of the TSA picture for inter
341     prediction reference. A TSA picture enables up-switching, at the
342     TSA picture, to the sub-layer containing the TSA picture or any
343     higher sub-layer, from the immediately lower sub-layer. An STSA
344     picture does not use pictures with the same TemporalId as the STSA
345     picture for inter prediction reference. Pictures following an STSA
346     picture in decoding order with the same TemporalId as the STSA
347     picture do not use pictures prior to the STSA picture in decoding
348     order with the same TemporalId as the STSA picture for inter
349     prediction reference. An STSA picture enables up-switching, at the
350     STSA picture, to the sub-layer containing the STSA picture, from the
351     immediately lower sub-layer.

353     Sub-layer reference or non-reference pictures

355     The concept and signaling of reference/non-reference pictures in
356     HEVC are different from H.264.
        In H.264, if a picture may be used
357     by any other picture for inter prediction reference, it is a
358     reference picture; otherwise it is a non-reference picture, and this
359     is signaled by two bits in the NAL unit header. In HEVC, a picture
360     is called a reference picture only when it is marked as "used for
361     reference". In addition, the concept of sub-layer reference picture
362     was introduced. If a picture may be used by another picture
363     with the same TemporalId for inter prediction reference, it is a
364     sub-layer reference picture; otherwise it is a sub-layer non-
365     reference picture. Whether a picture is a sub-layer reference
366     picture or sub-layer non-reference picture is signaled through NAL
367     unit type values.

369     Extensibility

371     Besides the TemporalId in the NAL unit header, HEVC also includes
372     the signaling of a six-bit layer ID in the NAL unit header, which
373     must be equal to 0 for a single-layer bitstream. Extension
374     mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice
375     headers, and so on. All these extension mechanisms enable future
376     extensions in a backward compatible manner, such that bitstreams
377     encoded according to potential future HEVC extensions can be fed to
378     then-legacy decoders (e.g. HEVC version 1 decoders) and the then-
379     legacy decoders can decode and output the base layer bitstream.

381     Bitstream extraction

383     HEVC includes a bitstream extraction process as an integral part of
384     the overall decoding process, as well as specification of the use of
385     the bitstream extraction process in description of bitstream
386     conformance tests as part of the hypothetical reference decoder
387     (HRD) specification.

389     Reference picture management

391     The reference picture management of HEVC, including reference
392     picture marking and removal from the decoded picture buffer (DPB) as
393     well as reference picture list construction (RPLC), differs from
394     that of H.264.
        Instead of the sliding window plus adaptive memory
395     management control operation (MMCO) based reference picture marking
396     mechanism in H.264, HEVC specifies a reference picture set (RPS)
397     based reference picture management and marking mechanism, and the
398     RPLC is consequently based on the RPS mechanism. A reference
399     picture set consists of a set of reference pictures associated with
400     a picture, consisting of all reference pictures that are prior to
401     the associated picture in decoding order, that may be used for inter
402     prediction of the associated picture or any picture following the
403     associated picture in decoding order. The reference picture set
404     consists of five lists of reference pictures: RefPicSetStCurrBefore,
405     RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
406     RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and
407     RefPicSetLtCurr contain all reference pictures that may be used in
408     inter prediction of the current picture and that may be used in
409     inter prediction of one or more of the pictures following the
410     current picture in decoding order. RefPicSetStFoll and
411     RefPicSetLtFoll consist of all reference pictures that are not used
412     in inter prediction of the current picture but may be used in inter
413     prediction of one or more of the pictures following the current
414     picture in decoding order. RPS provides an "intra-coded" signaling
415     of the DPB status, instead of an "inter-coded" signaling, mainly for
416     improved error resilience. The RPLC process in HEVC is based on the
417     RPS, by signaling an index to an RPS subset for each reference
418     index. The RPLC process has been simplified compared to that in
419     H.264, by removal of the reference picture list modification (also
420     referred to as reference picture list reordering) process.

422     Ultra low delay support

424     HEVC specifies a sub-picture-level HRD operation, for support of the
425     so-called ultra-low delay.
        The mechanism specifies a standard-
426     compliant way to enable delay reduction below one picture interval.
427     Sub-picture-level coded picture buffer (CPB) and DPB parameters may
428     be signaled, and utilization of this information for the derivation
429     of CPB timing (wherein the CPB removal time corresponds to decoding
430     time) and DPB output timing (display time) is specified. Decoders
431     are allowed to operate the HRD at the conventional access-unit-
432     level, even when the sub-picture-level HRD parameters are present.

434     New SEI messages

436     HEVC inherits many H.264 SEI messages with changes in syntax and/or
437     semantics making them applicable to HEVC. Additionally, there are a
438     few new SEI messages reviewed briefly in the following paragraphs.

440     The structure of pictures SEI message provides information on the
441     NAL unit types, picture order count values, and prediction
442     dependencies of a sequence of pictures. The SEI message can be used,
443     for example, for concluding what impact a lost picture has on other
444     pictures.

446     The decoded picture hash SEI message provides a checksum derived
447     from the sample values of a decoded picture. It can be used for
448     detecting whether a picture was correctly received and decoded.

450     The active parameter sets SEI message includes the IDs of the active
451     video parameter set and the active sequence parameter set and can be
452     used to activate VPSs and SPSs.
        In addition, the SEI message
453     includes the following indications: 1) An indication of whether
454     "full random accessibility" is supported (when supported, all
455     parameter sets needed for decoding of the remainder of the bitstream
456     when random accessing from the beginning of the current coded video
457     sequence by completely discarding all access units earlier in
458     decoding order are present in the remaining bitstream and all coded
459     pictures in the remaining bitstream can be correctly decoded); 2) An
460     indication of whether there is no parameter set within the current
461     coded video sequence that updates another parameter set of the same
462     type preceding in decoding order. An update of a parameter set
463     refers to the use of the same parameter set ID but with some other
464     parameters changed. If this property is true for all coded video
465     sequences in the bitstream, then all parameter sets can be sent out-
466     of-band before session start.

468     The decoding unit information SEI message provides coded picture
469     buffer removal delay information for a decoding unit. The message
470     can be used in very-low-delay buffering operations.

472     The region refresh information SEI message can be used together with
473     the recovery point SEI message (present in both H.264 and HEVC) for
474     improved support of gradual decoding refresh (GDR). This supports
475     random access from inter-coded pictures, wherein complete pictures
476     can be correctly decoded or recovered after an indicated number of
477     pictures in output/display order.

479  1.1.3 Parallel Processing Support

481     The reportedly significantly higher encoding computational demand of
482     HEVC over H.264, in conjunction with the ever-increasing video
483     resolution (both spatially and temporally) required by the market,
484     led to the adoption of VCL coding tools specifically targeted to
485     allow for parallelization on the sub-picture level.
        That is,
486     parallelization occurs, at the minimum, at the granularity of an
487     integer number of CTUs. The targets for this type of high-level
488     parallelization are multicore CPUs and DSPs as well as
489     multiprocessor systems. In a system design, to be useful, these
490     tools require signaling support, which is provided in Section 7 of
491     this memo. This section provides a brief overview of the tools
492     available in [HEVC].

494     Many of the tools incorporated in HEVC were designed keeping in mind
495     the potential parallel implementations in multi-core/multi-processor
496     architectures. Specifically, for parallelization, four picture
497     partition strategies are available.

499     Slices are segments of the bitstream that can be reconstructed
500     independently from other slices within the same picture (though
501     there may still be interdependencies through loop filtering
502     operations). Slices are the only tool that can be used for
503     parallelization that is also available, in virtually identical form,
504     in H.264. Slice-based parallelization does not require much inter-
505     processor or inter-core communication (except for inter-processor or
506     inter-core data sharing for motion compensation when decoding a
507     predictively coded picture, which is typically much heavier than
508     inter-processor or inter-core data sharing due to in-picture
509     prediction), as slices are designed to be independently decodable.
510     However, for the same reason, slices can require some coding
511     overhead. Further, slices (in contrast to some of the other tools
512     mentioned below) also serve as the key mechanism for bitstream
513     partitioning to match Maximum Transfer Unit (MTU) size requirements,
514     due to the in-picture independence of slices and the fact that each
515     regular slice is encapsulated in its own NAL unit. In many cases,
516     the goal of parallelization and the goal of MTU size matching can
517     place contradicting demands on the slice layout in a picture.
The 518 realization of this situation led to the development of the more 519 advanced tools mentioned below. This payload format does not 520 contain any specific mechanisms aiding parallelization through 521 slices. 523 Dependent slice segments allow for fragmentation of a coded slice 524 into fragments at CTU boundaries without breaking any in-picture 525 prediction mechanism. They are complementary to the fragmentation 526 mechanism described in this memo in that they need the cooperation 527 of the encoder. As a dependent slice segment necessarily contains 528 an integer number of CTUs, a decoder using multiple cores operating 529 on CTUs can process a dependent slice segment without communicating 530 parts of the slice segment's bitstream to other cores. 531 Fragmentation, as specified in this memo, in contrast, does not 532 guarantee that a fragment contains an integer number of CTUs. 534 In wavefront parallel processing (WPP), the picture is partitioned 535 into rows of CTUs. Entropy decoding and prediction are allowed to 536 use data from CTUs in other partitions. Parallel processing is 537 possible through parallel decoding of CTU rows, where the start of 538 the decoding of a row is delayed by two CTUs, so to ensure that data 539 related to a CTU above and to the right of the subject CTU is 540 available before the subject CTU is being decoded. Using this 541 staggered start (which appears like a wavefront when represented 542 graphically), parallelization is possible with up to as many 543 processors/cores as the picture contains CTU rows. 545 Because in-picture prediction between neighboring CTU rows within a 546 picture is allowed, the required inter-processor/inter-core 547 communication to enable in-picture prediction can be substantial. 
548 The WPP partitioning does not result in the creation of more NAL 549 units compared to when it is not applied, thus WPP cannot be used 550 for MTU size matching, though slices can be used in combination for 551 that purpose. 553 Tiles define horizontal and vertical boundaries that partition a 554 picture into tile columns and rows. The scan order of CTUs is 555 changed to be local within a tile (in the order of a CTU raster scan 556 of a tile), before decoding the top-left CTU of the next tile in the 557 order of tile raster scan of a picture. Similar to slices, tiles 558 break in-picture prediction dependencies (including entropy decoding 559 dependencies). However, they do not need to be included in 560 individual NAL units (same as WPP in this regard), hence tiles 561 cannot be used for MTU size matching, though slices can be used in 562 combination for that purpose. Each tile can be processed by one 563 processor/core, and the inter-processor/inter-core communication 564 required for in-picture prediction between processing units decoding 565 neighboring tiles is limited to conveying the shared slice header in 566 cases where a slice spans more than one tile, and loop filtering 567 related sharing of reconstructed samples and metadata. In this respect, 568 tiles are less demanding in terms of inter-processor communication 569 bandwidth compared to WPP due to the in-picture independence between 570 two neighboring partitions. 572 1.1.4 NAL Unit Header 574 HEVC maintains the NAL unit concept of H.264 with modifications. 575 HEVC uses a two-byte NAL unit header, as shown in Figure 1. The 576 payload of a NAL unit refers to the NAL unit excluding the NAL unit 577 header.
579 +---------------+---------------+ 580 |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| 581 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 582 |F| Type | LayerId | TID | 583 +-------------+-----------------+ 585 Figure 1 The structure of HEVC NAL unit header 587 The semantics of the fields in the NAL unit header are as specified 588 in [HEVC] and described briefly below for convenience. In addition 589 to the name and size of each field, the corresponding syntax element 590 name in [HEVC] is also provided. 592 F: 1 bit 593 forbidden_zero_bit. MUST be zero. HEVC declares a value of 1 as 594 a syntax violation. Note that the inclusion of this bit in the 595 NAL unit header is to enable transport of HEVC video over MPEG-2 596 transport systems (avoidance of start code emulations) [MPEG2S]. 598 Type: 6 bits 599 nal_unit_type. This field specifies the NAL unit type as defined 600 in Table 7-1 of [HEVC]. If the most significant bit of this 601 field of a NAL unit is equal to 0 (i.e. the value of this field 602 is less than 32), the NAL unit is a VCL NAL unit. Otherwise, the 603 NAL unit is a non-VCL NAL unit. For a reference of all currently 604 defined NAL unit types and their semantics, please refer to 605 Section 7.4.1 in [HEVC]. 607 LayerId: 6 bits 608 nuh_layer_id. MUST be equal to zero. It is anticipated that in 609 future scalable or 3D video coding extensions of this 610 specification, this syntax element will be used to identify 611 additional layers that may be present in the coded video 612 sequence, wherein a layer may be, e.g. a spatial scalable layer, 613 a quality scalable layer, a texture view, or a depth view. 615 TID: 3 bits 616 nuh_temporal_id_plus1. This field specifies the temporal 617 identifier of the NAL unit plus 1. The value of TemporalId is 618 equal to TID minus 1. 
A TID value of 0 is illegal to ensure that 619 there is at least one bit in the NAL unit header equal to 1, so 620 to enable independent considerations of start code emulations in 621 the NAL unit header and in the NAL unit payload data. 623 1.2. Overview of the Payload Format 625 This payload format defines the following processes required for 626 transport of HEVC coded data over RTP [RFC3550]: 628 o Usage of RTP header with this payload format 630 o Packetization of HEVC coded NAL units into RTP packets using three 631 types of payload structures, namely single NAL unit packet, 632 aggregation packet, and fragment unit 634 o Transmission of HEVC NAL units of the same bitstream within a 635 single RTP stream (note that RTP stream is used equivalently as 636 RTP flow in this memo) or multiple RTP streams 638 o Media type parameters to be used with the Session Description 639 Protocol (SDP) [RFC4566] 641 o A payload header extension mechanism and data structures for 642 enhanced support of temporal scalability based on that extension 643 mechanism. 645 2. Conventions 647 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 648 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 649 document are to be interpreted as described in BCP 14, RFC 2119 650 [RFC2119]. 652 In this document, these key words will appear with that 653 interpretation only when in ALL CAPS. Lower case uses of these 654 words are not to be interpreted as carrying the RFC 2119 655 significance. 657 This specification uses the notion of setting and clearing a bit 658 when bit fields are handled. Setting a bit is the same as assigning 659 that bit the value of 1 (On). Clearing a bit is the same as 660 assigning that bit the value of 0 (Off). 662 3. Definitions and Abbreviations 664 3.1 Definitions 666 This document uses the terms and definitions of [HEVC]. Section 667 3.1.1 lists relevant definitions copied from [HEVC] for convenience. 
668 Section 3.1.2 gives definitions specific to this memo. 670 3.1.1 Definitions from the HEVC Specification 672 access unit: A set of NAL units that are associated with each other 673 according to a specified classification rule, are consecutive in 674 decoding order, and contain exactly one coded picture. 676 BLA access unit: An access unit in which the coded picture is a BLA 677 picture. 679 BLA picture: An IRAP picture for which each VCL NAL unit has 680 nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. 682 coded video sequence: A sequence of access units that consists, in 683 decoding order, of an IRAP access unit with NoRaslOutputFlag equal 684 to 1, followed by zero or more access units that are not IRAP access 685 units with NoRaslOutputFlag equal to 1, including all subsequent 686 access units up to but not including any subsequent access unit that 687 is an IRAP access unit with NoRaslOutputFlag equal to 1. 689 Informative note: An IRAP access unit may be an IDR access unit, 690 a BLA access unit, or a CRA access unit. The value of 691 NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA 692 access unit, and each CRA access unit that is the first access 693 unit in the bitstream in decoding order, is the first access unit 694 that follows an end of sequence NAL unit in decoding order, or 695 has HandleCraAsBlaFlag equal to 1. 697 CRA access unit: An access unit in which the coded picture is a CRA 698 picture. 700 CRA picture: An IRAP picture for which each VCL NAL unit has 701 nal_unit_type equal to CRA_NUT. 703 IDR access unit: An access unit in which the coded picture is an IDR 704 picture. 706 IDR picture: An IRAP picture for which each VCL NAL unit has 707 nal_unit_type equal to IDR_W_RADL or IDR_N_LP. 709 IRAP access unit: An access unit in which the coded picture is an 710 IRAP picture. 712 IRAP picture: A coded picture for which each VCL NAL unit has 713 nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.
715 layer: A set of VCL NAL units that all have a particular value of 716 nuh_layer_id and the associated non-VCL NAL units, or one of a set 717 of syntactical structures having a hierarchical relationship. 719 operation point: bitstream created from another bitstream by 720 operation of the sub-bitstream extraction process with the another 721 bitstream, a target highest TemporalId, and a target layer 722 identifier list as inputs. 724 random access: The act of starting the decoding process for a 725 bitstream at a point other than the beginning of the stream. 727 sub-layer: A temporal scalable layer of a temporal scalable 728 bitstream consisting of VCL NAL units with a particular value of the 729 TemporalId variable, and the associated non-VCL NAL units. 731 tile: A rectangular region of coding tree blocks within a particular 732 tile column and a particular tile row in a picture. 734 tile column: A rectangular region of coding tree blocks having a 735 height equal to the height of the picture and a width specified by 736 syntax elements in the picture parameter set. 738 tile row: A rectangular region of coding tree blocks having a height 739 specified by syntax elements in the picture parameter set and a 740 width equal to the width of the picture. 742 3.1.2 Definitions Specific to This Memo 744 dependent RTP stream: An RTP stream in an MST on which another RTP 745 stream depends. 747 highest RTP stream: The RTP stream in an MST on which no other RTP 748 stream depends. 750 media aware network element (MANE): A network element, such as a 751 middlebox or application layer gateway that is capable of parsing 752 certain aspects of the RTP payload headers or the RTP payload and 753 reacting to their contents. 755 Informative note: The concept of a MANE goes beyond normal 756 routers or gateways in that a MANE has to be aware of the 757 signaling (e.g. 
to learn about the payload type mappings of the 758 media streams), and in that it has to be trusted when working 759 with SRTP. The advantage of using MANEs is that they allow 760 packets to be dropped according to the needs of the media coding. 761 For example, if a MANE has to drop packets due to congestion on a 762 certain link, it can identify and remove those packets whose 763 elimination produces the least adverse effect on the user 764 experience. After dropping packets, MANEs must rewrite RTCP 765 packets to match the changes to the RTP stream as specified in 766 Section 7 of [RFC3550]. 768 multi-stream transmission (MST): Transmission of an HEVC bitstream 769 using more than one RTP stream. 771 NAL unit decoding order: A NAL unit order that conforms to the 772 constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. 774 NALU-time: The value that the RTP timestamp would have if the NAL 775 unit were transported in its own RTP packet. 777 RTP stream: A sequence of RTP packets with increasing sequence 778 numbers (except for wrap-around), identical PT and identical SSRC 779 (Synchronization Source), carried in one RTP session. Within the 780 scope of this memo, one RTP stream is utilized to transport one or 781 more temporal sub-layers. 783 single-stream transmission (SST): Transmission of an HEVC bitstream 784 using only one RTP stream. 786 transmission order: The order of packets in ascending RTP sequence 787 number order (in modulo arithmetic). Within an aggregation packet, 788 the NAL unit transmission order is the same as the order of 789 appearance of NAL units in the packet.
791 3.2 Abbreviations 793 AP Aggregation Packet 795 BLA Broken Link Access 797 CRA Clean Random Access 799 CTB Coding Tree Block 801 CTU Coding Tree Unit 803 CVS Coded Video Sequence 805 FU Fragmentation Unit 806 GDR Gradual Decoding Refresh 808 HRD Hypothetical Reference Decoder 810 IDR Instantaneous Decoding Refresh 812 IRAP Intra Random Access Point 814 MANE Media Aware Network Element 816 MST Multi-Stream Transmission 818 MTU Maximum Transfer Unit 820 NAL Network Abstraction Layer 822 NALU Network Abstraction Layer Unit 824 PACI PAyload Content Information 826 PHES Payload Header Extension Structure 828 PPS Picture Parameter Set 830 RADL Random Access Decodable Leading (Picture) 832 RASL Random Access Skipped Leading (Picture) 834 RPS Reference Picture Set 836 SEI Supplemental Enhancement Information 838 SPS Sequence Parameter Set 840 SST Single-Stream Transmission 842 STSA Step-wise Temporal Sub-layer Access 844 TSA Temporal Sub-layer Access 846 VCL Video Coding Layer 848 VPS Video Parameter Set 850 4. RTP Payload Format 852 4.1 RTP Header Usage 854 The format of the RTP header is specified in [RFC3550] and reprinted 855 in Figure 2 for convenience. This payload format uses the fields of 856 the header in a manner consistent with that specification. 858 The RTP payload (and the settings for some RTP header bits) for 859 aggregation packets and fragmentation units are specified in 860 Sections 4.7 and 4.8, respectively. 862 0 1 2 3 863 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 864 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 865 |V=2|P|X| CC |M| PT | sequence number | 866 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 867 | timestamp | 868 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 869 | synchronization source (SSRC) identifier | 870 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 871 | contributing source (CSRC) identifiers | 872 | .... 
| 873 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 875 Figure 2 RTP header according to [RFC3550] 877 The RTP header information to be set according to this RTP payload 878 format is set as follows: 880 Marker bit (M): 1 bit 882 Set for the last packet of the access unit indicated by the RTP 883 timestamp, in line with the normal use of the M bit in video 884 formats, to allow an efficient playout buffer handling. Decoders 885 can use this bit as an early indication of the last packet of an 886 access unit. 888 Informative note: The content of a NAL unit does not tell 889 whether or not the NAL unit is the last NAL unit, in decoding 890 order, of an access unit. An RTP sender implementation may 891 obtain this information from the video encoder. If, however, 892 the implementation cannot obtain this information directly 893 from the encoder, e.g. when the stream was pre-encoded, and 894 also there is no timestamp allocated for each NAL unit, then 895 the sender implementation can inspect subsequent NAL units in 896 decoding order to determine whether or not the NAL unit is the 897 last NAL unit of an access unit as follows. A NAL unit naluX 898 is the last NAL unit of an access unit if it is the last NAL 899 unit of the stream or the next VCL NAL unit naluY in decoding 900 order has the high-order bit of the first byte after its NAL 901 unit header equal to 1, and all NAL units between naluX and 902 naluY, when present, have nal_unit_type in the range of 32 to 903 35, inclusive, equal to 39, or in the ranges of 41 to 44, 904 inclusive, or 48 to 55, inclusive. 906 Payload type (PT): 7 bits 908 The assignment of an RTP payload type for this new packet format 909 is outside the scope of this document and will not be specified 910 here. The assignment of a payload type has to be performed 911 either through the profile used or in a dynamic way. 913 Sequence number (SN): 16 bits 915 Set and used in accordance with RFC 3550. 
917 Timestamp: 32 bits 919 The RTP timestamp is set to the sampling timestamp of the 920 content. A 90 kHz clock rate MUST be used. 922 If the NAL unit has no timing properties of its own (e.g. 923 parameter set and SEI NAL units), the RTP timestamp is set to the 924 RTP timestamp of the coded picture of the access unit in which 925 the NAL unit is included, according to Section 7.4.2.4.4 of 926 [HEVC]. 928 Receivers SHOULD ignore the picture output timing information in 929 any picture timing SEI messages or decoding unit information SEI 930 messages as specified in [HEVC]. Instead, receivers SHOULD use 931 the RTP timestamp for the display process. Receivers MUST pass 932 picture timing SEI messages and decoding unit information SEI 933 messages to the decoder and MAY use the field/frame related 934 information for the display process e.g. when frame doubling or 935 frame tripling is indicated by the field/frame related 936 information. 938 4.2 Payload Header Usage 940 The TID value indicates (among other things) the relative importance 941 of an RTP packet, for example because NAL units belonging to higher 942 temporal sub-layers are not used for the decoding of lower temporal 943 sub-layers. A lower value of TID indicates a higher importance. 944 More important NAL units MAY be better protected against 945 transmission losses than less important NAL units. 947 4.3 Payload Structures 949 The first two bytes of the payload of an RTP packet are referred to 950 as the payload header. In most cases, the payload header consists 951 of the same fields (F, Type, LayerId, and TID) as the NAL unit 952 header as shown in section 1.1.4, irrespective of the type of the 953 payload structure. The single exception is an RTP packet carrying a 954 Payload Content Information (PACI) NAL-unit like structure. 956 Four different types of RTP packet payload structures are specified. 
957 A receiver can identify the type of an RTP packet payload through 958 the Type field in the payload header. 960 The four different payload structures are as follows: 962 o Single NAL unit packet: Contains a single NAL unit in the 963 payload, and the NAL unit header of the NAL unit also serves as 964 the payload header. This payload structure is specified in 965 section 4.6. 967 o Aggregation packet (AP): Contains more than one NAL unit within 968 one access unit. This payload structure is specified in 969 section 4.7. 971 o Fragmentation unit (FU): Contains a subset of a single NAL unit. 972 This payload structure is specified in section 4.8. 974 o PACI carrying RTP packet: Contains a payload header (that differs 975 from other payload headers for efficiency), a Payload Header 976 Extension Structure (PHES), and a PACI payload. This payload 977 structure is specified in section 4.9. 979 4.4 Transmission Modes 981 This memo enables transmission of an HEVC bitstream over a single 982 RTP stream or multiple RTP streams. The concept and working 983 principle are inherited from the design of single and multiple 984 session transmission in [RFC6190]. If 985 only one RTP stream is used for transmission of the HEVC bitstream, 986 the transmission mode is referred to as single-stream transmission 987 (SST); otherwise (more than one RTP stream is used for transmission 988 of the HEVC bitstream), the transmission mode is referred to as 989 multi-stream transmission (MST). 991 Dependency of one RTP stream on another RTP stream is indicated as 992 specified in [RFC5583]. In MST, the RTP stream on which no other 993 RTP stream depends is referred to as the highest RTP stream. When 994 an RTP stream A depends on another RTP stream B, the RTP stream B is 995 referred to as a dependent RTP stream of the RTP stream A. 997 Informative note: An MST may involve one or more RTP sessions.
998 For example, each RTP stream in an MST may be in its own RTP 999 session. For another example, a set of multiple RTP streams in 1000 an MST may belong to the same RTP session, e.g. as indicated by 1001 the mechanism specified in [I-D.ietf-avtcore-rtp-multi-stream] or 1002 [I-D.ietf-mmusic-sdp-bundle-negotiation]. 1004 SST SHOULD be used for point-to-point unicast scenarios, while MST 1005 SHOULD be used for point-to-multipoint multicast scenarios where 1006 different receivers require different operation points of the same 1007 HEVC bitstream, to improve bandwidth utilization efficiency. 1009 Informative note: A multicast may degrade to a unicast after all 1010 but one receiver has left (this is a justification of the first 1011 "SHOULD" instead of "MUST"), and there might be scenarios where 1012 MST is desirable but not possible e.g. when IP multicast is not 1013 deployed in certain networks (this is a justification of the 1014 second "SHOULD" instead of "MUST"). 1016 Receivers MUST support both SST and MST. 1018 4.5 Decoding Order Number 1020 For each NAL unit, the variable AbsDon is derived, representing the 1021 decoding order number that is indicative of the NAL unit decoding 1022 order. 1024 Let NAL unit n be the n-th NAL unit in transmission order within an 1025 RTP stream. 1027 If sprop-depack-buf-nalus is equal to 0, AbsDon[n], the value of 1028 AbsDon for NAL unit n, is derived as equal to n. 1030 Otherwise (sprop-depack-buf-nalus is greater than 0), AbsDon[n] is 1031 derived as follows, where DON[n] is the value of the variable DON 1032 for NAL unit n: 1034 o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in 1035 transmission order), AbsDon[0] is set equal to DON[0].
1037 o Otherwise (n is greater than 0), the following applies for 1038 derivation of AbsDon[n]: 1040 If DON[n] == DON[n-1], 1041 AbsDon[n] = AbsDon[n-1] 1043 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), 1044 AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] 1046 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), 1047 AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] 1049 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), 1050 AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n]) 1052 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), 1053 AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) 1055 For any two NAL units m and n, the following applies: 1057 o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n 1058 follows NAL unit m in NAL unit decoding order. 1060 o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order 1061 of the two NAL units can be in either order. 1063 o AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes 1064 NAL unit m in decoding order. 1066 When two consecutive NAL units in the NAL unit decoding order have 1067 different values of AbsDon, the value of AbsDon for the second NAL 1068 unit in decoding order MUST be greater than the value of AbsDon for 1069 the first NAL unit, and the absolute difference between the two 1070 AbsDon values MAY be greater than or equal to 1. 1072 Informative note: There are multiple reasons to allow for the 1073 absolute difference of the values of AbsDon for two consecutive 1074 NAL units in the NAL unit decoding order to be greater than one. 1075 An increment by one is not required, as at the time of 1076 associating values of AbsDon to NAL units, it may not be known 1077 whether all NAL units are to be delivered to the receiver. For 1078 example, a gateway may not forward VCL NAL units of higher sub- 1079 layers or some SEI NAL units when there is congestion in the 1080 network. 
In another example, the first intra picture of a pre- 1081 encoded clip is transmitted in advance to ensure that it is 1082 readily available in the receiver, and when transmitting the 1083 first intra picture, the originator does not exactly know how 1084 many NAL units will be encoded before the first intra picture of 1085 the pre-encoded clip follows in decoding order. Thus, the values 1086 of AbsDon for the NAL units of the first intra picture of the 1087 pre-encoded clip have to be estimated when they are transmitted, 1088 and gaps in values of AbsDon may occur. Another example is MST 1089 where the AbsDon values must indicate cross-layer decoding order 1090 for NAL units conveyed in all the RTP streams. 1092 4.6 Single NAL Unit Packets 1094 A single NAL unit packet contains exactly one NAL unit, and consists 1095 of a payload header (denoted as PayloadHdr), an optional 16-bit DONL 1096 field (in network byte order), and the NAL unit payload data (the 1097 NAL unit excluding its NAL unit header) of the contained NAL unit, 1098 as shown in Figure 3. 1100 0 1 2 3 1101 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1102 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1103 | PayloadHdr | DONL (optional) | 1104 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1105 | | 1106 | NAL unit payload data | 1107 | | 1108 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1109 | :...OPTIONAL RTP padding | 1110 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1112 Figure 3 The structure of a single NAL unit packet 1114 The payload header SHOULD be an exact copy of the NAL unit header of 1115 the contained NAL unit. However, the Type (i.e. nal_unit_type) 1116 field MAY be changed, e.g. when it is desirable to handle a CRA 1117 picture as a BLA picture [JCTVC-J0107]. 1119 The DONL field, when present, specifies the value of the 16 least 1120 significant bits of the decoding order number of the contained NAL 1121 unit.
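The 16-bit DON values carried in DONL fields wrap around at 65536, and the derivation in Section 4.5 unwraps them into AbsDon. The following Python fragment is a minimal sketch of that derivation, not a normative implementation. The function name derive_abs_don and its arguments are illustrative assumptions; dons is taken to hold DON[n] for each NAL unit in transmission order within one RTP stream.

```python
def derive_abs_don(dons, sprop_depack_buf_nalus):
    """Sketch of the AbsDon derivation of Section 4.5.

    dons: DON[n] values (0..65535) in transmission order (hypothetical input).
    sprop_depack_buf_nalus: value of the sprop-depack-buf-nalus parameter.
    """
    if sprop_depack_buf_nalus == 0:
        # No DON is signalled; AbsDon[n] is simply n.
        return list(range(len(dons)))

    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)  # AbsDon[0] = DON[0]
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[n - 1]
        if don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            cur = prev_abs + don - prev_don            # forward, no wrap
        elif don < prev_don and prev_don - don >= 32768:
            cur = prev_abs + 65536 - prev_don + don    # forward, wrapped
        elif don > prev_don and don - prev_don >= 32768:
            cur = prev_abs - (prev_don + 65536 - don)  # backward, wrapped
        else:  # don < prev_don and prev_don - don < 32768
            cur = prev_abs - (prev_don - don)          # backward, no wrap
        abs_dons.append(cur)
    return abs_dons
```

With the inputs DON = 65534, 65535, 0, 2 the sketch yields AbsDon = 65534, 65535, 65536, 65538, so decoding order stays monotonic across the 16-bit wrap.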
1123 If sprop-depack-buf-nalus is greater than 0, the DONL field MUST be 1124 present, and the variable DON for the contained NAL unit is derived 1125 as equal to the value of the DONL field. Otherwise (sprop-depack- 1126 buf-nalus is equal to 0), the DONL field MUST NOT be present. 1128 4.7 Aggregation Packets (APs) 1130 Aggregation packets (APs) are introduced to enable the reduction of 1131 packetization overhead for small NAL units, such as most of the non- 1132 VCL NAL units, which are often only a few octets in size. 1134 An AP aggregates NAL units within one access unit. Each NAL unit to 1135 be carried in an AP is encapsulated in an aggregation unit. NAL 1136 units aggregated in one AP are in NAL unit decoding order. 1138 An AP consists of a payload header (denoted as PayloadHdr) followed 1139 by two or more aggregation units, as shown in Figure 4. 1141 0 1 2 3 1142 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1143 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1144 | PayloadHdr | | 1145 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1146 | | 1147 | two or more aggregation units | 1148 | | 1149 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1150 | :...OPTIONAL RTP padding | 1151 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1153 Figure 4 The structure of an aggregation packet 1155 The fields in the payload header are set as follows. The F bit MUST 1156 be equal to 0 if the F bit of each aggregated NAL unit is equal to 1157 zero; otherwise, it MUST be equal to 1. The Type field MUST be 1158 equal to 48. The value of LayerId MUST be equal to the lowest value 1159 of LayerId of all the aggregated NAL units. The value of TID MUST 1160 be the lowest value of TID of all the aggregated NAL units. 1162 Informative Note: All VCL NAL units in an AP have the same TID 1163 value since they belong to the same access unit. 
However, an AP 1164 may contain non-VCL NAL units for which the TID value in the NAL 1165 unit header may be different from the TID value of the VCL NAL 1166 units in the same AP. 1168 An AP MUST carry at least two aggregation units and can carry as 1169 many aggregation units as necessary; however, the total amount of 1170 data in an AP MUST fit into an IP packet, and the size 1171 SHOULD be chosen so that the resulting IP packet is smaller than the 1172 MTU size so as to avoid IP layer fragmentation. An AP MUST NOT contain 1173 Fragmentation Units (FUs) specified in section 4.8. APs MUST NOT be 1174 nested; i.e. an AP MUST NOT contain another AP. 1176 The first aggregation unit in an AP consists of an optional 16-bit 1177 DONL field (in network byte order) followed by a 16-bit unsigned 1178 size information (in network byte order) that indicates the size of 1179 the NAL unit in bytes (excluding these two octets, but including the 1180 NAL unit header), followed by the NAL unit itself, including its NAL 1181 unit header, as shown in Figure 5. 1183 0 1 2 3 1184 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1185 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1186 : DONL (optional) | NALU size | 1187 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1188 | NALU size | | 1189 +-+-+-+-+-+-+-+-+ NAL unit | 1190 | | 1191 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1192 | : 1193 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1195 Figure 5 The structure of the first aggregation unit in an AP 1197 The DONL field, when present, specifies the value of the 16 least 1198 significant bits of the decoding order number of the aggregated NAL 1199 unit. 1201 If sprop-depack-buf-nalus is greater than 0, the DONL field MUST be 1202 present in an aggregation unit that is the first aggregation unit in 1203 an AP, and the variable DON for the aggregated NAL unit is derived 1204 as equal to the value of the DONL field.
Otherwise (sprop-depack- 1205 buf-nalus is equal to 0), the DONL field MUST NOT be present in an 1206 aggregation unit that is the first aggregation unit in an AP. 1208 An aggregation unit that is not the first aggregation unit in an AP 1209 consists of an optional 8-bit DOND field followed by a 16-bit 1210 unsigned size information (in network byte order) that indicates the 1211 size of the NAL unit in bytes (excluding these two octets, but 1212 including the NAL unit header), followed by the NAL unit itself, 1213 including its NAL unit header, as shown in Figure 6. 1215 0 1 2 3 1216 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1217 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1218 : DOND(optional)| NALU size | 1219 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1220 | | 1221 | NAL unit | 1222 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1223 | : 1224 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1226 Figure 6 The structure of an aggregation unit that is not the first 1227 aggregation unit in an AP 1229 When present, the DOND field plus 1 specifies the difference between 1230 the decoding order number values of the current aggregated NAL unit 1231 and the preceding aggregated NAL unit in the same AP. 1233 If sprop-depack-buf-nalus is greater than 0, the DOND field MUST be 1234 present in an aggregation unit that is not the first aggregation 1235 unit in an AP, and the variable DON for the aggregated NAL unit is 1236 derived as equal to the DON of the preceding aggregated NAL unit in 1237 the same AP plus the value of the DOND field plus 1 modulo 65536. 1238 Otherwise (sprop-depack-buf-nalus is equal to 0), the DOND field 1239 MUST NOT be present in an aggregation unit that is not the first 1240 aggregation unit in an AP. 1242 Figure 7 presents an example of an AP that contains two aggregation 1243 units, labeled as 1 and 2 in the figure, without the DONL and DOND 1244 fields being present. 

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          PayloadHdr           |          NALU 1 Size          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          NALU 1 HDR           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 1 Data          |
   |                   . . .                                       |
   |                                                               |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | NALU 2 HDR    |                                               |
   +-+-+-+-+-+-+-+-+                  NALU 2 Data                  |
   |                   . . .                                       |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7 An example of an AP packet containing two aggregation units
   without the DONL and DOND fields

   Figure 8 presents an example of an AP that contains two aggregation
   units, labeled as 1 and 2 in the figure, with the DONL and DOND
   fields being present.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          PayloadHdr           |          NALU 1 DONL          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          NALU 1 Size          |          NALU 1 HDR           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                 NALU 1 Data   . . .                           |
   |                                                               |
   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |  NALU 2 DOND  |          NALU 2 Size          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          NALU 2 HDR           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
   |                                                               |
   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8 An example of an AP containing two aggregation units with
   the DONL and DOND fields

4.8 Fragmentation Units (FUs)

   Fragmentation units (FUs) are introduced to enable fragmenting a
   single NAL unit into multiple RTP packets, possibly without
   cooperation or knowledge of the HEVC encoder. A fragment of a NAL
   unit consists of an integer number of consecutive octets of that NAL
   unit. Fragments of the same NAL unit MUST be sent in consecutive
   order with ascending RTP sequence numbers (with no other RTP packets
   within the same RTP stream being sent between the first and last
   fragment).

   When a NAL unit is fragmented and conveyed within FUs, it is
   referred to as a fragmented NAL unit. APs MUST NOT be fragmented.
   FUs MUST NOT be nested; i.e. an FU MUST NOT contain a subset of
   another FU.

   The RTP timestamp of an RTP packet carrying an FU is set to the
   NALU-time of the fragmented NAL unit.

   An FU consists of a payload header (denoted as PayloadHdr), an FU
   header of one octet, an optional 16-bit DONL field (in network byte
   order), and an FU payload, as shown in Figure 9.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          PayloadHdr           |   FU header   | DONL(optional)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
   | DONL(optional)|                                               |
   |-+-+-+-+-+-+-+-+                                               |
   |                          FU payload                           |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 9 The structure of an FU

   The fields in the payload header are set as follows. The Type field
   MUST be equal to 49. The fields F, LayerId, and TID MUST be equal
   to the fields F, LayerId, and TID, respectively, of the fragmented
   NAL unit.

   The FU header consists of an S bit, an E bit, and a 6-bit FuType
   field, as shown in Figure 10.

   +---------------+
   |0|1|2|3|4|5|6|7|
   +-+-+-+-+-+-+-+-+
   |S|E|  FuType   |
   +---------------+

   Figure 10 The structure of FU header

   The semantics of the FU header fields are as follows:

   S: 1 bit
      When set to one, the S bit indicates the start of a fragmented
      NAL unit i.e. the first byte of the FU payload is also the first
      byte of the payload of the fragmented NAL unit. When the FU
      payload is not the start of the fragmented NAL unit payload, the
      S bit MUST be set to zero.

   E: 1 bit
      When set to one, the E bit indicates the end of a fragmented NAL
      unit, i.e. the last byte of the payload is also the last byte of
      the fragmented NAL unit. When the FU payload is not the last
      fragment of a fragmented NAL unit, the E bit MUST be set to zero.

   FuType: 6 bits
      The field FuType MUST be equal to the field Type of the
      fragmented NAL unit.

   The DONL field, when present, specifies the value of the 16 least
   significant bits of the decoding order number of the fragmented NAL
   unit.

   If sprop-depack-buf-nalus is greater than 0, and the S bit is equal
   to 1, the DONL field MUST be present in the FU, and the variable DON
   for the fragmented NAL unit is derived as equal to the value of the
   DONL field. Otherwise (sprop-depack-buf-nalus is equal to 0, or the
   S bit is equal to 0), the DONL field MUST NOT be present in the FU.

   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
   the Start bit and End bit MUST NOT both be set to one in the same FU
   header.
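
   As an informal illustration of the FU header layout and the DONL
   presence rule described above, the following Python sketch splits
   and builds the one-octet FU header. The names FuHeader,
   parse_fu_header, build_fu_header, and fu_has_donl are hypothetical
   helpers chosen for this example and are not defined by this
   specification.

```python
from dataclasses import dataclass


@dataclass
class FuHeader:
    start: bool   # S bit
    end: bool     # E bit
    fu_type: int  # FuType: Type of the fragmented NAL unit (6 bits)


def parse_fu_header(octet: int) -> FuHeader:
    """Split the one-octet FU header into S, E, and FuType."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("FU header must be a single octet")
    hdr = FuHeader(
        start=bool(octet & 0x80),  # bit 0 (most significant)
        end=bool(octet & 0x40),    # bit 1
        fu_type=octet & 0x3F,      # bits 2..7
    )
    if hdr.start and hdr.end:
        # A non-fragmented NAL unit MUST NOT be sent in one FU.
        raise ValueError("S and E MUST NOT both be set in one FU header")
    return hdr


def build_fu_header(start: bool, end: bool, fu_type: int) -> int:
    """Assemble the FU header octet from S, E, and FuType."""
    if start and end:
        raise ValueError("S and E MUST NOT both be set in one FU header")
    if not 0 <= fu_type <= 0x3F:
        raise ValueError("FuType is a 6-bit field")
    return (int(start) << 7) | (int(end) << 6) | fu_type


def fu_has_donl(hdr: FuHeader, sprop_depack_buf_nalus: int) -> bool:
    """DONL is present only in the first FU and only when
    sprop-depack-buf-nalus is greater than 0."""
    return sprop_depack_buf_nalus > 0 and hdr.start
```

   For instance, an FU header octet of 0x93 parses to S = 1, E = 0,
   and FuType = 19.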

   The FU payload consists of fragments of the payload of the
   fragmented NAL unit so that if the FU payloads of consecutive FUs,
   starting with an FU with the S bit equal to 1 and ending with an FU
   with the E bit equal to 1, are sequentially concatenated, the
   payload of the fragmented NAL unit can be reconstructed. The NAL
   unit header of the fragmented NAL unit is not included as such in
   the FU payload; instead, the information of the NAL unit header of
   the fragmented NAL unit is conveyed in the F, LayerId, and TID
   fields of the FU payload headers of the FUs and the FuType field of
   the FU header of the FUs. An FU payload MAY have any number of
   octets and MAY be empty.

      Informative note: Empty FU payloads are allowed to reduce the
      latency of a certain class of senders in nearly lossless
      environments. These senders can be characterized in that they
      packetize fragments of a NAL unit before the NAL unit is
      completely generated and, hence, before the NAL unit size is
      known. If zero-length FU payloads were not allowed, the sender
      would have to generate at least one bit of data of the following
      fragment of the NAL unit before the current FU could be sent.
      Due to the characteristics of HEVC, where sometimes several CTUs
      occupy zero bits, this is undesirable and can add delay.
      However, the (potential) use of zero-length FU payloads should be
      carefully weighed against the increased risk of the loss of at
      least a part of the fragmented NAL unit because of the additional
      packets employed for its transmission.

   If an FU is lost, the receiver SHOULD discard all following
   fragmentation units in transmission order corresponding to the same
   fragmented NAL unit, unless the decoder in the receiver is known to
   be prepared to gracefully handle incomplete NAL units.
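
   The reconstruction and loss rules above can be sketched as follows.
   This is a minimal, non-normative Python example under several
   assumptions: the caller supplies (RTP sequence number, RTP payload)
   pairs in ascending sequence order for a single fragmented NAL unit,
   the two-octet payload header uses the F, Type, LayerId, TID layout
   shown in the figures of this document, and the function name
   reassemble_fus is hypothetical. The sketch takes the conservative
   option of discarding the whole NAL unit when any fragment is
   missing.

```python
from typing import Iterable, Optional, Tuple

FU_TYPE = 49  # payload header Type value for FUs


def reassemble_fus(packets: Iterable[Tuple[int, bytes]],
                   donl_present: bool = False) -> Optional[bytes]:
    """Rebuild a fragmented NAL unit, or return None if it is unusable.

    Each payload is PayloadHdr (2 octets) + FU header (1 octet) +
    optional DONL (2 octets, first FU only) + FU payload.
    """
    nal = bytearray()
    prev_seq = None
    started = False
    for seq, payload in packets:
        if len(payload) < 3:
            return None
        b0, b1, fu_hdr = payload[0], payload[1], payload[2]
        if (b0 >> 1) & 0x3F != FU_TYPE:
            return None
        s_bit = bool(fu_hdr & 0x80)
        e_bit = bool(fu_hdr & 0x40)
        fu_type = fu_hdr & 0x3F
        if s_bit and e_bit:
            return None  # forbidden combination
        if prev_seq is not None and seq != (prev_seq + 1) % 65536:
            return None  # a fragment was lost: discard the NAL unit
        prev_seq = seq
        offset = 3
        if s_bit:
            if started:
                return None
            started = True
            # Rebuild the NAL unit header: keep F, LayerId, and TID
            # from the payload header, substitute FuType for Type.
            nal += bytes([(b0 & 0x81) | (fu_type << 1), b1])
            if donl_present:
                offset += 2  # skip the DONL field
        elif not started:
            return None  # first fragment is missing
        if len(payload) < offset:
            return None
        nal += payload[offset:]  # may be empty
        if e_bit:
            return bytes(nal)
    return None  # last fragment never arrived
```

   The sequence-number comparison is done modulo 65536 so that a
   fragmented NAL unit spanning a sequence-number wrap is still
   accepted.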

   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
   fragments of a NAL unit to an (incomplete) NAL unit, even if
   fragment n of that NAL unit is not received. In this case, the
   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
   syntax violation.

4.9 PACI packets

   This section specifies the PACI packet structure, based on a payload
   header extension mechanism that is generic and extensible to carry
   payload header extensions.

   The structure of an RTP packet carrying a Payload Header Extension
   Structure (PHES) and a PACI payload is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|  PACI=50  |  LayerId  | TID |A|   Type    | PHSsize |F0..2|X|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Payload Header Extension Structure (PHES)           |
   |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
   |                                                               |
   |                    PACI payload: NAL unit                     |
   |                             . . .                             |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 11 The structure of a PACI

   The semantics of the fields are as follows:

   F: 1 bit
      forbidden_zero_bit. MUST be zero.

   PACI: 6 bits
      Indicates a PACI, and must be 50.

   LayerId: 6 bits
      Copy of the LayerId field of the PACI payload NAL unit or
      NAL-unit-like structure.

   TID: 3 bits
      Copy of the TID field of the PACI payload NAL unit or
      NAL-unit-like structure.

   A: 1 bit
      Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
      structure.

   Type: 6 bits
      Copy of the Type field of the PACI payload NAL unit or
      NAL-unit-like structure.

   PHSsize: 5 bits
      Indicates the total length of the PHES. The value is limited to
      be less than or equal to 32 octets, to simplify encoder design
      for MTU size matching.

   F0..2: 3 bits
      Each of the three bits indicates, when set, the presence of an
      optional field (or set of fields) in the PHES.

   X: 1 bit
      The X bit, when set, indicates the presence of another octet
      consisting of seven flags and another X bit, each of the seven
      flags indicating the presence of more PHES fields (for future
      extensions).

   PHES: variable number of octets
      A variable number of octets as indicated by the value of PHSsize.

   PACI Payload
      The NAL unit or NAL-unit-like structure (such as an FU or an AP)
      to be carried, not including the first two octets.

         Informative note: The first two octets of the NAL unit or
         NAL-unit-like structure carried in the PACI payload are not
         included in the PACI payload. Rather, the respective values
         are copied into the corresponding locations of the PayloadHdr
         of the RTP packet. This design offers two advantages: first,
         the overall structure of the payload header is preserved, i.e.
         there is no special case of payload header structure that
         needs to be implemented for PACI. Second, no additional
         overhead is introduced.

   A PACI payload MAY be a single NAL unit, an FU, or an AP. PACIs
   MUST NOT be fragmented or aggregated. The following subsection
   documents the reasons for these design choices.
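
   To make the field layout of Figure 11 concrete, the following
   non-normative Python sketch parses the four header octets of a PACI
   and restores the two-octet header of the carried NAL unit or
   NAL-unit-like structure from the copied F (A), Type, LayerId, and
   TID values. The names Paci and parse_paci are hypothetical. The
   sketch assumes that bit 0 of F0..2 is the leftmost of the three
   bits, as in the usual bit numbering of the figures, treats the PHES
   as PHSsize opaque octets, and does not interpret the additional
   octet signaled by the X bit.

```python
from dataclasses import dataclass

PACI_TYPE = 50  # Type value identifying a PACI


@dataclass
class Paci:
    layer_id: int
    tid: int
    f0: bool         # bit 0 of F0..2 (assumed leftmost)
    f1: bool
    f2: bool
    x: bool
    phes: bytes      # PHSsize octets, left uninterpreted here
    nal_unit: bytes  # carried structure with its 2-octet header restored


def parse_paci(payload: bytes) -> Paci:
    """Parse an RTP payload that starts with a PACI payload header."""
    if len(payload) < 4:
        raise ValueError("PACI needs at least four header octets")
    word = int.from_bytes(payload[:4], "big")
    if (word >> 31) & 0x1:
        raise ValueError("F bit MUST be zero")
    if (word >> 25) & 0x3F != PACI_TYPE:
        raise ValueError("not a PACI")
    layer_id = (word >> 19) & 0x3F
    tid = (word >> 16) & 0x07
    a_bit = (word >> 15) & 0x1    # copy of the payload's F bit
    c_type = (word >> 9) & 0x3F   # copy of the payload's Type
    phs_size = (word >> 4) & 0x1F
    flags = (word >> 1) & 0x07
    x_bit = word & 0x1
    end = 4 + phs_size
    if len(payload) < end:
        raise ValueError("PHES is truncated")
    # The first two octets of the carried structure were not sent;
    # rebuild them from the copied fields.
    header = (a_bit << 15) | (c_type << 9) | (layer_id << 3) | tid
    return Paci(
        layer_id=layer_id,
        tid=tid,
        f0=bool(flags & 0x4),
        f1=bool(flags & 0x2),
        f2=bool(flags & 0x1),
        x=bool(x_bit),
        phes=payload[4:end],
        nal_unit=header.to_bytes(2, "big") + payload[end:],
    )
```

   When the carried structure is an FU or an AP, nal_unit holds that
   structure, which then needs the usual FU or AP processing before
   anything is passed to the decoder.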

4.9.1 Reasons for the PACI rules (informative)

   A PACI cannot be fragmented. If a PACI could be fragmented, and a
   fragment other than the first fragment were lost, access to the
   information in the PACI would not be possible. Therefore, a PACI
   must not be fragmented. In other words, an FU must not carry
   (fragments of) a PACI.

   A PACI cannot be aggregated. Aggregation of PACIs is inadvisable
   from a compression viewpoint because, in many cases, several of the
   NAL units to be aggregated would share identical PACI fields and
   values, which would then be carried redundantly. Most, if not all,
   of the practical effects of PACI aggregation can be achieved by
   aggregating NAL units and bundling them with a PACI (see below).
   Therefore, a PACI must not be aggregated. In other words, an AP
   must not contain a PACI.

   The payload of a PACI can be a fragment. Both middleboxes and
   sending systems with inflexible (often hardware-based) encoders
   occasionally find themselves in situations where a PACI and its
   headers, combined, are larger than the MTU size. In such a
   scenario, the middlebox or sender can fragment the NAL unit and
   encapsulate the fragment in a PACI. Doing so preserves the payload
   header extension information for all fragments, allowing downstream
   middleboxes and the receiver to take advantage of that information.
   Therefore, a sender may place a fragment into a PACI, and a receiver
   must be able to handle such a PACI.

   The payload of a PACI can be an aggregation NAL unit. HEVC
   bitstreams can contain unevenly sized and/or small (when compared to
   the MTU size) NAL units. In order to efficiently packetize such
   small NAL units, APs were introduced. The benefits of APs are
   independent of the need for a payload header extension.
   Therefore, a sender may place an AP into a PACI, and a receiver must
   be able to handle such a PACI.

4.10 Payload Header Extensions

   This section describes the single payload header extension defined
   in this specification. If, in the future, additional payload header
   extensions become necessary, they could be specified in this section
   of an updated version of this document, or in their own documents.

   When bit 0 of the field F0..2 is set to 1 in a PACI, this indicates
   the presence of the temporal scalability information fields
   TL0PICIDX, IrapPicID, S, and E as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          RTP Header                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|  PACI=50  |  LayerId  | TID |A|   Type    | PHSsize |F0..2|X|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   TL0PICIDX   |   IrapPicID   |S|E|  reserved |               |
   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                             ....                              |
   |                    PACI payload: NAL unit                     |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               :...OPTIONAL RTP padding        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 12 The structure of a PACI with a PHES containing some
   temporal scalability information

   TL0PICIDX (8 bits)
      When present, the TL0PICIDX field MUST be set equal to
      temporal_sub_layer_zero_idx as specified in Section D.3.32 of
      [HEVC] for the access unit containing the NAL unit in the PACI.

   IrapPicID (8 bits)
      When present, the IrapPicID field MUST be set equal to
      irap_pic_id as specified in Section D.3.32 of [HEVC] for the
      access unit containing the NAL unit in the PACI.

   S (1 bit)
      The S bit MUST be set to 1 if any of the following conditions is
      true and MUST be set to 0 otherwise:

      . The NAL unit in the payload of the PACI is the first VCL NAL
        unit, in decoding order, of a picture.
      . The NAL unit in the payload of the PACI is an AP and the NAL
        unit in the first contained aggregation unit is the first VCL
        NAL unit, in decoding order, of a picture.
      . The NAL unit in the payload of the PACI is an FU with its S bit
        equal to 1 and the FU payload containing a fragment of the
        first VCL NAL unit, in decoding order, of a picture.

   E (1 bit)
      The E bit MUST be set to 1 if any of the following conditions is
      true and MUST be set to 0 otherwise:

      . The NAL unit in the payload of the PACI is the last VCL NAL
        unit, in decoding order, of a picture.
      . The NAL unit in the payload of the PACI is an AP and the NAL
        unit in the last contained aggregation unit is the last VCL NAL
        unit, in decoding order, of a picture.
      . The NAL unit in the payload of the PACI is an FU with its E bit
        equal to 1 and the FU payload containing a fragment of the last
        VCL NAL unit, in decoding order, of a picture.

   The values of bits 1 and 2 of the field F0..2 MUST be set to 0, the
   value of the X bit MUST be set to 0, and the value of PHSsize MUST
   be set to 3. Receivers SHALL allow other values of the fields
   F0..2, X, and PHSsize, and SHALL ignore any fields in the PHES, when
   present, other than those specified above.

5. Packetization Rules

   The following packetization rules apply:

   o If sprop-depack-buf-nalus is greater than 0 for an RTP stream,
     the transmission order of NAL units carried in the RTP stream MAY
     be different than the NAL unit decoding order. Otherwise
     (sprop-depack-buf-nalus is equal to 0 for an RTP stream), the
     transmission order of NAL units carried in the RTP stream MUST be
     the same as the NAL unit decoding order.

   o A NAL unit of a small size SHOULD be encapsulated in an
     aggregation packet together with one or more other NAL units in
     order to avoid the unnecessary packetization overhead for small
     NAL units. For example, non-VCL NAL units such as access unit
     delimiters, parameter sets, or SEI NAL units are typically small
     and can often be aggregated with VCL NAL units without violating
     MTU size constraints.

   o Each non-VCL NAL unit SHOULD be encapsulated in an aggregation
     packet together with its associated VCL NAL unit, as typically a
     non-VCL NAL unit would be meaningless without the associated VCL
     NAL unit being available.

   o For carrying exactly one NAL unit in an RTP packet, a single NAL
     unit packet MUST be used.

6. De-packetization Process

   The general concept behind de-packetization is to get the NAL units
   out of the RTP packets in an RTP stream and all the dependent RTP
   streams, if any, and pass them to the decoder in the NAL unit
   decoding order.

   The de-packetization process is implementation dependent.
   Therefore, the following description should be seen as an example of
   a suitable implementation. Other schemes may be used as well, as
   long as the output for the same input is the same as that of the
   process described below. The output is the same when the set of NAL
   units and their order are both identical. Optimizations relative to
   the described algorithms are possible.

   All normal RTP mechanisms related to buffer management apply. In
   particular, duplicated or outdated RTP packets (as indicated by the
   RTP sequence number and the RTP timestamp) are removed. To
   determine the exact time for decoding, factors such as a possible
   intentional delay to allow for proper inter-stream synchronization
   must be taken into account.

   NAL units with NAL unit type values in the range of 0 to 47,
   inclusive, may be passed to the decoder. NAL-unit-like structures
   with NAL unit type values in the range of 48 to 63, inclusive, MUST
   NOT be passed to the decoder.
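
   The type ranges above suggest a simple dispatch step at the start
   of de-packetization, sketched below in Python as a non-normative
   illustration. The function names are hypothetical. The values 49
   (FU) and 50 (PACI) come from Sections 4.8 and 4.9; the value 48 for
   APs is assumed from the AP section of this specification, which
   should be consulted for confirmation.

```python
def payload_type(payload: bytes) -> int:
    """Return the 6-bit Type field of the two-octet payload header."""
    if len(payload) < 2:
        raise ValueError("payload header is two octets long")
    return (payload[0] >> 1) & 0x3F


def classify_payload(payload: bytes) -> str:
    """Map an RTP payload to the kind of processing it needs."""
    t = payload_type(payload)
    if t <= 47:
        return "single-nal"  # may be passed to the decoder
    if t == 48:
        return "ap"          # assumed AP Type value, see lead-in
    if t == 49:
        return "fu"
    if t == 50:
        return "paci"
    return "reserved"        # 51..63: never passed to the decoder


def may_pass_to_decoder(nal_unit: bytes) -> bool:
    """Only NAL units with type 0..47 may reach the decoder."""
    return payload_type(nal_unit) <= 47
```

   Structures classified as "ap", "fu", or "paci" are unpacked first,
   and only the NAL units recovered from them are candidates for the
   decoder.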

   The receiver includes a receiver buffer, which is used to compensate
   for transmission delay jitter, to reorder NAL units from
   transmission order to the NAL unit decoding order, and to recover
   the NAL unit decoding order in MST, when applicable. In this
   section, the receiver operation is described under the assumption
   that there is no transmission delay jitter. To distinguish it from
   a practical receiver buffer that is also used to compensate for
   transmission delay jitter, the receiver buffer is hereafter called
   the de-packetization buffer in this section. Receivers SHOULD also
   prepare for transmission delay jitter; i.e. either reserve separate
   buffers for transmission delay jitter buffering and
   de-packetization buffering or use a receiver buffer for both
   transmission delay jitter and de-packetization. Moreover, receivers
   SHOULD take transmission delay jitter into account in the buffering
   operation; e.g. by additional initial buffering before starting
   decoding and playback.

   There are two buffering states in the receiver: initial buffering
   and buffering while playing. Initial buffering starts when the
   reception is initialized. After initial buffering, decoding and
   playback are started, and the buffering-while-playing mode is used.

   Regardless of the buffering state, the receiver stores incoming NAL
   units, in reception order, into the de-packetization buffer. NAL
   units carried in RTP packets are stored in the de-packetization
   buffer individually, and the value of AbsDon is calculated and
   stored for each NAL unit. When MST is in use, NAL units of all RTP
   streams are stored in the same de-packetization buffer.

   Initial buffering lasts until condition A (the number of NAL units
   in the de-packetization buffer is greater than the value of
   sprop-depack-buf-nalus of the highest RTP stream) is true.
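
   As a non-normative illustration, the de-packetization buffer can be
   sketched in Python as a priority queue keyed on AbsDon. The sketch
   covers storage and condition A as described above, and it also
   anticipates the release and flush steps specified in the paragraphs
   that follow. It assumes that the caller has already computed AbsDon
   for each NAL unit and that there is no transmission delay jitter;
   the class name DepacketizationBuffer and its methods are
   hypothetical.

```python
import heapq
from typing import List


class DepacketizationBuffer:
    """Reorders NAL units from reception order to decoding order."""

    def __init__(self, sprop_depack_buf_nalus: int) -> None:
        self.limit = sprop_depack_buf_nalus
        self._heap: list = []   # entries: (AbsDon, arrival index, NAL)
        self._arrivals = 0      # tie-breaker keeps reception order

    def _condition_a(self) -> bool:
        # Condition A: more NAL units buffered than
        # sprop-depack-buf-nalus.
        return len(self._heap) > self.limit

    def push(self, abs_don: int, nal_unit: bytes) -> List[bytes]:
        """Store one NAL unit; return NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal_unit))
        self._arrivals += 1
        released = []
        while self._condition_a():
            # Remove the NAL unit with the smallest AbsDon.
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self) -> List[bytes]:
        """No more input: release everything in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

   Because nothing is released until condition A first becomes true,
   the same loop serves both the initial buffering state and the
   buffering-while-playing state in this simplified model.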

   After initial buffering, whenever condition A is true, the following
   operation is repeatedly applied until condition A becomes false:

   o The NAL unit in the de-packetization buffer with the smallest
     value of AbsDon is removed from the de-packetization buffer and
     passed to the decoder.

   When no more NAL units are flowing into the de-packetization buffer,
   all NAL units remaining in the de-packetization buffer are removed
   from the buffer and passed to the decoder in the order of increasing
   AbsDon values.

7. Payload Format Parameters

   This section specifies the parameters that MAY be used to select
   optional features of the payload format and certain features or
   properties of the bitstream. The parameters are specified here as
   part of the media type registration for the HEVC codec. A mapping
   of the parameters into the Session Description Protocol (SDP)
   [RFC4566] is also provided for applications that use SDP.
   Equivalent parameters could be defined elsewhere for use with
   control protocols that do not use SDP.

7.1 Media Type Registration

   The media subtype for the HEVC codec is allocated from the IETF
   tree.

   The receiver MUST ignore any unspecified parameter.

   Media Type name: video

   Media subtype name: H265

   Required parameters: none

   OPTIONAL parameters:

      In the following definitions of parameters, "the stream" or "the
      NAL unit stream" refers to all NAL units conveyed in the current
      RTP stream in SST, and all NAL units conveyed in the current RTP
      stream and all NAL units conveyed in other RTP streams that the
      current RTP stream depends on in MST.

      profile-space, profile-id:

         The profile-space parameter indicates the context for
         interpretation of the profile-id parameter value. The
         profile, which specifies the subset of coding tools that may
         have been used to generate the stream or that the receiver
         supports, as specified in [HEVC], is defined by the
         combination of profile-space and profile-id. Note that
         profile-space is required to be equal to 0 in [HEVC], but
         other values for it may be specified in the future by ITU-T or
         ISO/IEC.

         If the profile-space and profile-id parameters are used to
         indicate properties of a NAL unit stream, they indicate that,
         to decode the stream, the minimum subset of coding tools a
         decoder has to support is the profile specified by both
         parameters.

         If the profile-space and profile-id parameters are used for
         capability exchange or session setup, they indicate the subset
         of coding tools, which is equal to the profile, that the codec
         supports for both receiving and sending.

         If no profile-space is present, a value of 0 MUST be inferred
         and if no profile-id is present the Main profile (i.e. a value
         of 1) MUST be inferred.

         When used to indicate properties of a NAL unit stream, the
         profile-space and profile-id parameters are derived from the
         sequence parameter set or video parameter set NAL units, as
         specified in [HEVC], as follows.

            If the RTP stream is not a dependent RTP stream, the
            following applies:

            o profile-space = general_profile_space
            o profile-id = general_profile_idc

            Otherwise (the RTP stream is a dependent RTP stream), the
            following applies, with j being the value of the
            sub-layer-id parameter:

            o profile-space = sub_layer_profile_space[j]
            o profile-id = sub_layer_profile_idc[j]

      tier-flag, level-id:

         The tier-flag parameter indicates the context for
         interpretation of the level-id value. The default level,
         which imposes limits on values of syntax elements or on
         arithmetic combinations of values of syntax elements, as
         specified in [HEVC], is defined by the combination of
         tier-flag and level-id.

         If the tier-flag and level-id parameters are used to indicate
         properties of a NAL unit stream, they indicate that, to decode
         the stream, the lowest level the decoder has to support is the
         default level.

         If the tier-flag and level-id parameters are used for
         capability exchange or session setup, the following applies.
         If max-recv-level-id is not present, the default level defined
         by tier-flag and level-id indicates the highest level the
         codec wishes to support. Otherwise, tier-flag and
         max-recv-level-id indicate the highest level the codec
         supports for receiving. For either receiving or sending, all
         levels that are lower than the highest level supported MUST
         also be supported.

         If no tier-flag is present, a value of 0 MUST be inferred and
         if no level-id is present, a value of 93 (i.e. level 3.1) MUST
         be inferred.

         When used to indicate properties of a NAL unit stream, the
         tier-flag and level-id parameters are derived from the
         sequence parameter set or video parameter set NAL units, as
         specified in [HEVC], as follows.

            If the RTP stream is not a dependent RTP stream, the
            following applies:

            o tier-flag = general_tier_flag
            o level-id = general_level_idc

            Otherwise (the RTP stream is a dependent RTP stream), the
            following applies, with j being the value of the
            sub-layer-id parameter:

            o tier-flag = sub_layer_tier_flag[j]
            o level-id = sub_layer_level_idc[j]

      interop-constraints:

         A base16 [RFC4648] (hexadecimal) representation of the six
         bytes derived from the sequence parameter set or video
         parameter set NAL units as specified in [HEVC] consisting of
         progressive_source_flag, interlaced_source_flag,
         non_packed_constraint_flag, frame_only_constraint_flag, and
         reserved_zero_44bits. Note that reserved_zero_44bits is
         required to be equal to 0 in [HEVC], but other values for it
         may be specified in the future by ITU-T or ISO/IEC.

         If no interop-constraints are present, the following MUST be
         inferred:

         o progressive_source_flag = 1
         o interlaced_source_flag = 0
         o non_packed_constraint_flag = 1
         o frame_only_constraint_flag = 1
         o reserved_zero_44bits = 0

         When used to indicate properties of a NAL unit stream, the
         following applies.

            If the RTP stream is not a dependent RTP stream, the
            following applies:

            o progressive_source_flag = general_progressive_source_flag
            o interlaced_source_flag = general_interlaced_source_flag
            o non_packed_constraint_flag =
                 general_non_packed_constraint_flag
            o frame_only_constraint_flag =
                 general_frame_only_constraint_flag
            o reserved_zero_44bits = general_reserved_zero_44bits

            Otherwise (the RTP stream is a dependent RTP stream), the
            following applies, with j being the value of the
            sub-layer-id parameter:

            o progressive_source_flag =
                 sub_layer_progressive_source_flag[j]
            o interlaced_source_flag =
                 sub_layer_interlaced_source_flag[j]
            o non_packed_constraint_flag =
                 sub_layer_non_packed_constraint_flag[j]
            o frame_only_constraint_flag =
                 sub_layer_frame_only_constraint_flag[j]
            o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]

      profile-compatibility-indicator:

         A base16 [RFC4648] representation of the four bytes
         representing the 32 profile compatibility flags in the
         sequence parameter set or video parameter set NAL units. A
         decoder conforming to a certain profile may be able to decode
         bitstreams conforming to other profiles. The
         profile-compatibility-indicator provides exact information of
         the ability of a decoder conforming to a certain profile to
         decode bitstreams conforming to another profile. More
         concretely, if the profile compatibility flag corresponding to
         the profile a decoder conforms to is set, then the decoder is
         able to decode any bitstream with the flag set, irrespective
         of the profile the bitstream conforms to (provided that the
         decoder supports the highest level of the bitstream).

         When used to indicate properties of a NAL unit stream, the
         following applies.
1908 If the RTP stream is not a dependent RTP stream, the 1909 following applies with j = 0..31: 1911 o The 32 flags = general_profile_compatibility_flag[j] 1913 Otherwise (the RTP stream is a dependent RTP stream), the 1914 following applies with i being the value of the sub-layer- 1915 id parameter and j = 0..31: 1917 o The 32 flags = sub_layer_profile_compatibility_flag[i][j] 1919 sub-layer-id: 1921 This parameter MAY be used to indicate the highest allowed 1922 value of TID in the stream. When not present, the value of 1923 sub-layer-id is inferred to be equal to 6. 1925 recv-sub-layer-id: 1927 This parameter MAY be used to signal a receiver's choice of 1928 the offered or declared sub-layers in the sprop-vps. The 1929 value of recv-sub-layer-id indicates the TID of the highest 1930 sub-layer of the stream that a receiver supports. When not 1931 present, the value of recv-sub-layer-id is inferred to be 1932 equal to sub-layer-id. 1934 max-recv-level-id: 1936 This parameter MAY be used, together with tier-flag, to 1937 indicate the highest level a receiver supports. The highest 1938 level the receiver supports is equal to the value of max-recv- 1939 level-id divided by 30 for the Main or High tier (as 1940 determined by tier-flag equal to 0 or 1, respectively). 1942 When max-recv-level-id is not present, the value is inferred 1943 to be equal to level-id. 1945 max-recv-level-id MUST NOT be present when the highest level 1946 the receiver supports is not higher than the default level. 1948 sprop-vps: 1950 This parameter MAY be used to convey any video parameter set 1951 NAL unit of the stream. When present, the parameter MAY be 1952 used to indicate codec capability and sub-stream 1953 characteristics (i.e. properties of sub-layer representations 1954 as defined in [HEVC]) as well as for out-of-band transmission 1955 of video parameter sets. 
The value of the parameter is a 1956 comma-separated (',') list of base64 [RFC4648] representations 1957 of the video parameter set NAL units as specified in Section 1958 7.3.2.1 of [HEVC]. 1960 sprop-sps: 1962 This parameter MAY be used to convey sequence parameter set 1963 NAL units of the stream for out-of-band transmission of 1964 sequence parameter sets. The value of the parameter is a 1965 comma-separated (',') list of base64 [RFC4648] representations 1966 of the sequence parameter set NAL units as specified in 1967 Section 7.3.2.2 of [HEVC]. 1969 sprop-pps: 1971 This parameter MAY be used to convey picture parameter set NAL 1972 units of the stream for out-of-band transmission of picture 1973 parameter sets. The value of the parameter is a comma- 1974 separated (',') list of base64 [RFC4648] representations of 1975 the picture parameter set NAL units as specified in Section 1976 7.3.2.3 of [HEVC]. 1978 max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc: 1980 These parameters MAY be used to signal the capabilities of a 1981 receiver implementation. These parameters MUST NOT be used 1982 for any other purpose. The highest level (specified by tier- 1983 flag and max-recv-level-id) MUST be such that the receiver is 1984 fully capable of supporting. max-lsr, max-lps, max-cpb, max- 1985 dpb, max-br, max-tr, and max-tc MAY be used to indicate 1986 capabilities of the receiver that extend the required 1987 capabilities of the highest level, as specified below. 1989 When more than one parameter from the set (max-lsr, max-lps, 1990 max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the 1991 receiver MUST support all signaled capabilities 1992 simultaneously. For example, if both max-lsr and max-br are 1993 present, the highest level with the extension of both the 1994 picture rate and bitrate is supported. 
That is, the receiver 1995 is able to decode NAL unit streams in which the luma sample 1996 rate is up to max-lsr (inclusive), the bitrate is up to max-br 1997 (inclusive), the coded picture buffer size is derived as 1998 specified in the semantics of the max-br parameter below, and 1999 the other properties comply with the highest level specified 2000 by tier-flag and max-recv-level-id. 2002 Informative note: When the OPTIONAL media type parameters 2003 are used to signal the properties of a NAL unit stream, and 2004 max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and 2005 max-tc are not present, the values of profile-space, 2006 profile-id, tier-flag, and level-id must always be such 2007 that the NAL unit stream complies fully with the specified 2008 profile and level. 2010 max-lsr: 2011 The value of max-lsr is an integer indicating the maximum 2012 processing rate in units of luma samples per second. The max- 2013 lsr parameter signals that the receiver is capable of decoding 2014 video at a higher rate than is required by the highest level. 2016 When max-lsr is signaled, the receiver MUST be able to decode 2017 NAL unit streams that conform to the highest level, with the 2018 exception that the MaxLumaSR value in Table A-2 of [HEVC] for 2019 the highest level is replaced with the value of max-lsr. The 2020 value of max-lsr MUST be greater than or equal to the value of 2021 MaxLumaSR given in Table A-2 of [HEVC] for the highest level. 2022 Senders MAY use this knowledge to send pictures of a given 2023 size at a higher picture rate than is indicated in the highest 2024 level. 2026 When not present, the value of max-lsr is inferred to be equal 2027 to the value of MaxLumaSR given in Table A-2 of [HEVC] for the 2028 highest level. 2030 max-lps: 2031 The value of max-lps is an integer indicating the maximum 2032 picture size in units of luma samples. 
The max-lps parameter 2033 signals that the receiver is capable of decoding larger 2034 picture sizes than are required by the highest level. When 2035 max-lps is signaled, the receiver MUST be able to decode NAL 2036 unit streams that conform to the highest level, with the 2037 exception that the MaxLumaPS value in Table A-1 of [HEVC] for 2038 the highest level is replaced with the value of max-lps. The 2039 value of max-lps MUST be greater than or equal to the value of 2040 MaxLumaPS given in Table A-1 of [HEVC] for the highest level. 2041 Senders MAY use this knowledge to send larger pictures at a 2042 proportionally lower picture rate than is indicated in the 2043 highest level. 2045 When not present, the value of max-lps is inferred to be equal 2046 to the value of MaxLumaPS given in Table A-1 of [HEVC] for the 2047 highest level. 2049 max-cpb: 2050 The value of max-cpb is an integer indicating the maximum 2051 coded picture buffer size in units of CpbBrVclFactor bits for 2052 the VCL HRD parameters and in units of CpbBrNalFactor bits for 2053 the NAL HRD parameters, where CpbBrVclFactor and 2054 CpbBrNalFactor are defined in Section A.4 of [HEVC]. The max- 2055 cpb parameter signals that the receiver has more memory than 2056 the minimum amount of coded picture buffer memory required by 2057 the highest level. When max-cpb is signaled, the receiver 2058 MUST be able to decode NAL unit streams that conform to the 2059 highest level, with the exception that the MaxCPB value in 2060 Table A-1 of [HEVC] for the highest level is replaced with the 2061 value of max-cpb. The value of max-cpb MUST be greater than 2062 or equal to the value of MaxCPB given in Table A-1 of [HEVC] 2063 for the highest level. Senders MAY use this knowledge to 2064 construct coded video streams with greater variation of 2065 bitrate than can be achieved with the MaxCPB value in Table A- 2066 1 of [HEVC]. 
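As a rough illustration of how a sender might combine max-lsr and max-lps, the following Python sketch bounds the picture rate for a given picture size. It assumes that the luma sample rate is simply the picture size in luma samples multiplied by the picture rate, and it ignores every other level constraint in Annex A of [HEVC]. The function name and its arguments are hypothetical; the sample values are the ones used in the 1080p30 dec-parallel-cap example of this section.

```python
from fractions import Fraction


def max_picture_rate(max_lsr: int, max_lps: int, width: int, height: int) -> Fraction:
    """Upper bound on pictures per second for a width x height luma picture.

    Assumption: luma sample rate == picture size * picture rate.
    """
    pic_size = width * height  # picture size in luma samples
    if pic_size > max_lps:
        raise ValueError("picture size exceeds max-lps")
    return Fraction(max_lsr, pic_size)


# 1920x1088 luma samples with max-lsr=62668800 and max-lps=2088960
rate = max_picture_rate(62668800, 2088960, 1920, 1088)
print(rate)  # 30
```

A sender would still have to check the remaining limits of the highest level before relying on this bound.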
2068 When not present, the value of max-cpb is inferred to be equal 2069 to the value of MaxCPB given in Table A-1 of [HEVC] for the 2070 highest level. 2072 Informative note: The coded picture buffer is used in the 2073 hypothetical reference decoder (Annex C of HEVC). The use 2074 of the hypothetical reference decoder is recommended in 2075 HEVC encoders to verify that the produced bitstream 2076 conforms to the standard and to control the output bitrate. 2077 Thus, the coded picture buffer is conceptually independent 2078 of any other potential buffers in the receiver, including 2079 de-packetization and de-jitter buffers. The coded picture 2080 buffer need not be implemented in decoders as specified in 2081 Annex C of HEVC, but rather standard-compliant decoders can 2082 have any buffering arrangements provided that they can 2083 decode standard-compliant bitstreams. Thus, in practice, 2084 the input buffer for a video decoder can be integrated with 2085 de-packetization and de-jitter buffers of the receiver. 2087 max-dpb: 2088 The value of max-dpb is an integer indicating the maximum 2089 decoded picture buffer size in units of decoded pictures at the 2090 MaxLumaPS for the highest level, i.e., the number of decoded 2091 pictures at the maximum picture size defined by the highest 2092 level. The value of max-dpb MUST be smaller than or equal to 2093 16. The max-dpb parameter signals that the receiver has more 2094 memory than the minimum amount of decoded picture buffer 2095 memory required by default, which is MaxDpbPicBuf as defined 2096 in [HEVC] (equal to 6). When max-dpb is signaled, the 2097 receiver MUST be able to decode NAL unit streams that conform 2098 to the highest level, with the exception that the 2099 MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with 2100 the value of max-dpb.
Consequently, a receiver that signals 2101 max-dpb MUST be capable of storing the following number of 2102 decoded pictures (MaxDpbSize) in its decoded picture buffer:
2104 if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
2105    MaxDpbSize = Min( 4 * max-dpb, 16 )
2106 else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
2107    MaxDpbSize = Min( 2 * max-dpb, 16 )
2108 else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
2109    MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
2110 else
2111    MaxDpbSize = max-dpb
2112 where MaxLumaPS is given in Table A-1 of [HEVC] for the highest 2113 level and PicSizeInSamplesY is the current size of each 2114 decoded picture in units of luma samples as defined in [HEVC]. 2116 The value of max-dpb MUST be greater than or equal to the 2117 value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. Senders 2118 MAY use this knowledge to construct coded video streams with 2119 improved compression. 2121 When not present, the value of max-dpb is inferred to be equal 2122 to the value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. 2124 Informative note: This parameter was added primarily to 2125 complement a similar codepoint in the ITU-T Recommendation 2126 H.245, so as to facilitate signaling gateway designs. The 2127 decoded picture buffer stores reconstructed samples. There 2128 is no relationship between the size of the decoded picture 2129 buffer and the buffers used in RTP, especially de- 2130 packetization and de-jitter buffers. 2132 max-br: 2133 The value of max-br is an integer indicating the maximum video 2134 bitrate in units of CpbBrVclFactor bits per second for the VCL 2135 HRD parameters and in units of CpbBrNalFactor bits per second 2136 for the NAL HRD parameters, where CpbBrVclFactor and 2137 CpbBrNalFactor are defined in Section A.4 of [HEVC]. 2139 The max-br parameter signals that the video decoder of the 2140 receiver is capable of decoding video at a higher bitrate than 2141 is required by the highest level.
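The MaxDpbSize derivation given above for max-dpb can be sketched in Python as follows. This is a minimal sketch and not a normative restatement: the function and argument names are hypothetical, the division by 3 is assumed to be an integer division, and the MaxLumaPS value passed in the usage lines is only a placeholder for whatever Table A-1 of [HEVC] specifies for the highest level.

```python
def max_dpb_size(pic_size_in_samples_y: int, max_luma_ps: int, max_dpb: int = 6) -> int:
    """Number of decoded pictures a receiver signaling max-dpb must be able to store.

    max_dpb defaults to 6, the value inferred when the parameter is absent.
    """
    if not 6 <= max_dpb <= 16:
        raise ValueError("max-dpb must be in the range 6..16")
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb) // 3, 16)  # assumed integer division
    return max_dpb


PLACEHOLDER_MAX_LUMA_PS = 2228224  # illustrative value, not taken from this document
print(max_dpb_size(1280 * 720, PLACEHOLDER_MAX_LUMA_PS, max_dpb=8))
print(max_dpb_size(1920 * 1088, PLACEHOLDER_MAX_LUMA_PS, max_dpb=8))
```

Smaller pictures allow more of them to be stored in the same memory, which is why the result grows as the picture size shrinks, capped at 16.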
2143 When max-br is signaled, the video decoder of the receiver MUST 2144 be able to decode NAL unit streams that conform to the highest 2145 level, with the following exceptions in the limits specified 2146 by the highest level: 2148 o The value of max-br replaces the MaxBR value in Table A-2 2149 of [HEVC] for the highest level. 2150 o When the max-cpb parameter is not present, the result of 2151 the following formula replaces the value of MaxCPB in Table 2152 A-1 of [HEVC]: 2154 (MaxCPB of the highest level) * max-br / (MaxBR of the 2155 highest level) 2157 For example, if a receiver signals capability for Main profile 2158 Level 2 with max-br equal to 2000, this indicates a maximum 2159 video bitrate of 2000 kbits/sec for VCL HRD parameters, a 2160 maximum video bitrate of 2200 kbits/sec for NAL HRD 2161 parameters, and a CPB size of 2000000 bits (2000000 / 1500000 2162 * 1500000). 2164 The value of max-br MUST be greater than or equal to the value 2165 of MaxBR given in Table A-2 of [HEVC] for the highest level. 2167 Senders MAY use this knowledge to send higher bitrate video as 2168 allowed in the level definition of Annex A of HEVC to achieve 2169 improved video quality. 2171 When not present, the value of max-br is inferred to be equal 2172 to the value of MaxBR given in Table A-2 of [HEVC] for the 2173 highest level. 2175 Informative note: This parameter was added primarily to 2176 complement a similar codepoint in the ITU-T Recommendation 2177 H.245, so as to facilitate signaling gateway designs. The 2178 assumption that the network is capable of handling such 2179 bitrates at any given time cannot be made from the value of 2180 this parameter. In particular, no conclusion can be drawn 2181 that the signaled bitrate is possible under congestion 2182 control constraints. 2184 max-tr: 2185 The value of max-tr is an integer indicating the maximum 2186 number of tile rows.
The max-tr parameter signals that the 2187 receiver is capable of decoding video with a larger number of 2188 tile rows than the value allowed by the highest level. 2190 When max-tr is signaled, the receiver MUST be able to decode 2191 NAL unit streams that conform to the highest level, with the 2192 exception that the MaxTileRows value in Table A-1 of [HEVC] 2193 for the highest level is replaced with the value of max-tr. 2195 The value of max-tr MUST be greater than or equal to the value 2196 of MaxTileRows given in Table A-1 of [HEVC] for the highest 2197 level. Senders MAY use this knowledge to send pictures 2198 utilizing a larger number of tile rows than the value allowed 2199 by the highest level. 2201 When not present, the value of max-tr is inferred to be equal 2202 to the value of MaxTileRows given in Table A-1 of [HEVC] for 2203 the highest level. 2205 max-tc: 2206 The value of max-tc is an integer indicating the maximum 2207 number of tile columns. The max-tc parameter signals that the 2208 receiver is capable of decoding video with a larger number of 2209 tile columns than the value allowed by the highest level. 2211 When max-tc is signaled, the receiver MUST be able to decode 2212 NAL unit streams that conform to the highest level, with the 2213 exception that the MaxTileCols value in Table A-1 of [HEVC] 2214 for the highest level is replaced with the value of max-tc. 2216 The value of max-tc MUST be greater than or equal to the value 2217 of MaxTileCols given in Table A-1 of [HEVC] for the highest 2218 level. Senders MAY use this knowledge to send pictures 2219 utilizing a larger number of tile columns than the value 2220 allowed by the highest level. 2222 When not present, the value of max-tc is inferred to be equal 2223 to the value of MaxTileCols given in Table A-1 of [HEVC] for 2224 the highest level.
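The max-br example given earlier in this section (Main profile, Level 2, max-br equal to 2000) can be reproduced with a short Python sketch. The level values of 1500 for MaxBR and MaxCPB and the factors of 1000 and 1100 are the ones implied by that example; the function name, the dictionary keys, and the use of integer division for the derived CPB size are assumptions made for illustration only.

```python
def effective_limits(max_br, level_max_br, level_max_cpb, max_cpb=None,
                     vcl_factor=1000, nal_factor=1100):
    """Limits implied by max-br (and max-cpb, if signaled) for one level."""
    if max_br < level_max_br:
        raise ValueError("max-br must not be lower than MaxBR of the highest level")
    if max_cpb is None:
        # max-cpb absent: scale MaxCPB by max-br / MaxBR (rounding is an assumption)
        cpb = level_max_cpb * max_br // level_max_br
    else:
        cpb = max_cpb
    return {
        "vcl_bitrate_bps": max_br * vcl_factor,
        "nal_bitrate_bps": max_br * nal_factor,
        "vcl_cpb_bits": cpb * vcl_factor,
    }


# 2000000 and 2200000 bits/sec and a 2000000-bit CPB, as in the example in the text
limits = effective_limits(max_br=2000, level_max_br=1500, level_max_cpb=1500)
print(limits)
```

When max-cpb is signaled explicitly, the sketch uses it unchanged instead of the scaled value, which mirrors the "when the max-cpb parameter is not present" condition in the text.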
2226 max-fps: 2228 The value of max-fps is an integer indicating the maximum 2229 picture rate in units of hundreds of pictures per second that 2230 can be efficiently received. The max-fps parameter MAY be 2231 used to signal that the receiver has a constraint in that it 2232 is not capable of decoding video efficiently at the full 2233 picture rate that is implied by the highest level and, when 2234 present, one or more of the parameters max-lsr, max-lps, and 2235 max-br. 2237 The value of max-fps is not necessarily the picture rate at 2238 which the maximum picture size can be sent; it constitutes a 2239 constraint on the maximum picture rate for all resolutions. 2241 Informative note: The max-fps parameter is semantically 2242 different from max-lsr, max-lps, max-cpb, max-dpb, max-br, 2243 max-tr, and max-tc in that max-fps is used to signal a 2244 constraint, lowering the maximum picture rate from what is 2245 implied by other parameters. 2247 The encoder MUST use a picture rate equal to or less than this 2248 value. In cases where the max-fps parameter is absent, the 2249 encoder is free to choose any picture rate according to the 2250 highest level and any signaled optional parameters. 2252 sprop-depack-buf-nalus: 2254 This parameter specifies the maximum number of NAL units that 2255 precede a NAL unit in the de-packetization buffer in reception 2256 order and follow the NAL unit in decoding order. 2258 The value of sprop-depack-buf-nalus MUST be an integer in the 2259 range of 0 to 32767, inclusive. 2261 When not present, the value of sprop-depack-buf-nalus is 2262 inferred to be equal to 0. 2264 When the RTP stream depends on one or more other RTP streams 2265 (in this case MST is in use), this parameter MUST be present 2266 and the value MUST be greater than 0. 2268 Informative note: When the RTP stream does not depend on 2269 other RTP streams, either MST or SST may be in use.
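To make the definition of sprop-depack-buf-nalus more concrete, the following Python sketch computes the quantity for a short sequence of NAL units. It assumes a hypothetical representation in which each NAL unit is identified by its position in decoding order and the list is given in reception order; a real sender would derive this from its own transmission schedule.

```python
def depack_buf_nalus(reception_order):
    """Largest number of NAL units that precede some NAL unit in reception
    order while following it in decoding order.

    reception_order: decoding-order positions, listed in the order received.
    """
    worst = 0
    for i, position in enumerate(reception_order):
        # Units received earlier but decoded later than the current unit
        held = sum(1 for earlier in reception_order[:i] if earlier > position)
        worst = max(worst, held)
    return worst


print(depack_buf_nalus([0, 1, 2, 3]))  # 0
print(depack_buf_nalus([2, 0, 1, 3]))  # 1
```

A result of 0 corresponds to a stream sent in decoding order, which matches the value inferred when the parameter is absent.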
2271 sprop-depack-buf-bytes: 2273 This parameter signals the required size of the de- 2274 packetization buffer in units of bytes. The value of the 2275 parameter MUST be greater than or equal to the maximum buffer 2276 occupancy (in units of bytes) of the de-packetization buffer 2277 as specified in section 6. 2279 The value of sprop-depack-buf-bytes MUST be an integer in the 2280 range of 0 to 4294967295, inclusive. 2282 When the RTP stream depends on one or more other RTP streams 2283 (in this case MST is in use) or sprop-depack-buf-nalus is 2284 present and is greater than 0, this parameter MUST be present 2285 and the value MUST be greater than 0. 2287 Informative note: The value of sprop-depack-buf-bytes 2288 indicates the required size of the de-packetization buffer 2289 only. When network jitter can occur, an appropriately 2290 sized jitter buffer has to be available as well. 2292 depack-buf-cap: 2294 This parameter signals the capabilities of a receiver 2295 implementation and indicates the amount of de-packetization 2296 buffer space in units of bytes that the receiver has available 2297 for reconstructing the NAL unit decoding order. A receiver is 2298 able to handle any stream for which the value of the sprop- 2299 depack-buf-bytes parameter is smaller than or equal to this 2300 parameter. 2302 When not present, the value of depack-buf-cap is inferred to 2303 be equal to 4294967295. The value of depack-buf-cap MUST be 2304 an integer in the range of 1 to 4294967295, inclusive. 2306 Informative note: depack-buf-cap indicates the maximum 2307 possible size of the de-packetization buffer of the 2308 receiver only. When network jitter can occur, an 2309 appropriately sized jitter buffer has to be available as 2310 well. 2312 sprop-segmentation-id: 2314 This parameter MAY be used to signal the segmentation tools 2315 present in the stream and that can be used for 2316 parallelization. 
The value of sprop-segmentation-id MUST be 2317 an integer in the range of 0 to 3, inclusive. When not 2318 present, the value of sprop-segmentation-id is inferred to be 2319 equal to 0. 2321 When sprop-segmentation-id is equal to 0, no information about 2322 the segmentation tools is provided. When sprop-segmentation- 2323 id is equal to 1, it indicates that slices are present in the 2324 stream. When sprop-segmentation-id is equal to 2, it 2325 indicates that tiles are present in the stream. When sprop- 2326 segmentation-id is equal to 3, it indicates that WPP is used 2327 in the stream. 2329 sprop-spatial-segmentation-idc: 2331 A base16 [RFC4648] representation of the syntax element 2332 min_spatial_segmentation_idc as specified in [HEVC]. This 2333 parameter MAY be used to describe parallelization capabilities 2334 of the stream. 2336 dec-parallel-cap: 2338 This parameter MAY be used to indicate the decoder's 2339 additional decoding capabilities given the presence of tools 2340 enabling parallel decoding, such as slices, tiles, and WPP, in 2341 the video stream. The decoding capability of the decoder may 2342 vary with the setting of the parallel decoding tools present 2343 in the stream, e.g. the size of the tiles that are present in 2344 a stream. Therefore, multiple capability points may be 2345 provided, each indicating the minimum required decoding 2346 capability that is associated with a parallelism requirement, 2347 which is a requirement on the video stream that enables 2348 parallel decoding. 2350 Each capability point is defined as a combination of 1) a 2351 parallelism requirement, 2) a profile (determined by profile- 2352 space and profile-id), 3) a highest level, and 4) a maximum 2353 processing rate, a maximum picture size, and a maximum video 2354 bitrate that may be equal to or greater than that determined 2355 by the highest level. 
The parameter's syntax in ABNF 2356 [RFC5234] is as follows: 2358 dec-parallel-cap = "dec-parallel-cap={" cap-point *("," 2359 cap-point) "}" 2361 cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" 2362 cap-parameter) 2364 spatial-seg-idc = 1*4DIGIT ; 1-4095 2366 cap-parameter = tier-flag / level-id / max-lsr 2367 / max-lps / max-br 2369 The set of capability points expressed by the dec-parallel-cap 2370 parameter is enclosed in a pair of curly braces ("{}"). Each 2371 set of two consecutive capability points is separated by a 2372 comma (','). Within each capability point, each set of two 2373 consecutive parameters, and when present, their values, is 2374 separated by a semicolon (';'). 2376 The profile of all capability points is determined by profile- 2377 space and profile-id that are outside the dec-parallel-cap 2378 parameter. 2380 Each capability point starts with an indication of the 2381 parallelism requirement, which consists of a parallel tool 2382 type, which may be equal to 'w' or 't', and a decimal value of 2383 the spatial-seg-idc parameter. When the type is 'w', the 2384 capability point is valid only for H.265 bitstreams with WPP 2385 in use, i.e. entropy_coding_sync_enabled_flag equal to 1. 2386 When the type is 't', the capability point is valid only for 2387 H.265 bitstreams with WPP not in use (i.e. 2388 entropy_coding_sync_enabled_flag equal to 0). The capability- 2389 point is valid only for H.265 bitstreams with 2390 min_spatial_segmentation_idc equal to or greater than spatial- 2391 seg-idc. 2393 The value of spatial-seg-idc MUST be greater than 0. 2395 After the parallelism requirement indication, each capability 2396 point continues with one or more pairs of parameter and value 2397 in any order for any of the following parameters: 2399 o tier-flag 2400 o level-id 2401 o max-lsr 2402 o max-lps 2403 o max-br 2405 At most one occurrence of each of the above five parameters is 2406 allowed within each capability point. 
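The ABNF and the rules above can be exercised with a small parser. The Python sketch below is a non-normative illustration: the function name and the returned dictionary layout are hypothetical, parameter values are assumed to be unsigned decimal integers, and surrounding whitespace is stripped on the assumption that folded example lines are a presentation artifact.

```python
import re

ALLOWED = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}


def parse_dec_parallel_cap(value):
    """Parse the text after 'dec-parallel-cap=', e.g. '{t:8;level-id=120}'."""
    value = value.strip()
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in curly braces")
    points = []
    for raw in value[1:-1].split(","):
        head, *params = raw.strip().split(";")
        match = re.fullmatch(r"([wt]):([0-9]{1,4})", head.strip())
        if match is None:
            raise ValueError("bad parallelism requirement: %r" % head)
        idc = int(match.group(2))
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc must be in the range 1..4095")
        if not params:
            raise ValueError("at least one cap-parameter is required")
        point = {"type": match.group(1), "spatial-seg-idc": idc}
        for param in params:
            name, sep, val = param.strip().partition("=")
            if name not in ALLOWED or not sep or not re.fullmatch(r"[0-9]+", val):
                raise ValueError("bad cap-parameter: %r" % param)
            if name in point:
                raise ValueError("duplicate cap-parameter: %s" % name)
            point[name] = int(val)
        points.append(point)
    return points


print(parse_dec_parallel_cap("{t:8;level-id=120}"))
```

Splitting on commas first and semicolons second follows the separator hierarchy described in the text, so no brace tracking is needed once the outer braces are removed.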
2408 The values of dec-parallel-cap.tier-flag and dec-parallel- 2409 cap.level-id for a capability point indicate the highest level 2410 of the capability point. The values of dec-parallel-cap.max- 2411 lsr, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for 2412 a capability point indicate the maximum processing rate in 2413 units of luma samples per second, the maximum picture size in 2414 units of luma samples, and the maximum video bitrate (in units 2415 of CpbBrVclFactor bits per second for the VCL HRD parameters 2416 and in units of CpbBrNalFactor bits per second for the NAL HRD 2417 parameters where CpbBrVclFactor and CpbBrNalFactor are defined 2418 in Section A.4 of [HEVC]). 2420 When not present, the value of dec-parallel-cap.tier-flag is 2421 inferred to be equal to the value of tier-flag outside the 2422 dec-parallel-cap parameter. When not present, the value of 2423 dec-parallel-cap.level-id is inferred to be equal to the value 2424 of max-recv-level-id outside the dec-parallel-cap parameter. 2425 When not present, the value of dec-parallel-cap.max-lsr, dec- 2426 parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred 2427 to be equal to the value of max-lsr, max-lps, or max-br, 2428 respectively, outside the dec-parallel-cap parameter. 2430 The general decoding capability, expressed by the set of 2431 parameters outside of dec-parallel-cap, is defined as the 2432 capability point that is determined by the following 2433 combination of parameters: 1) the parallelism requirement 2434 corresponding to the value of sprop-segmentation-id equal to 0 2435 for a stream, 2) the profile determined by profile-space and 2436 profile-id, 3) the highest level determined by tier-flag and 2437 max-recv-level-id, and 4) the maximum processing rate, the 2438 maximum picture size, and the maximum video bitrate determined 2439 by the highest level. 
The general decoding capability MUST 2440 NOT be included as one of the set of capability points in the 2441 dec-parallel-cap parameter. 2443 For example, the following parameters express the general 2444 decoding capability of 720p30 (Level 3.1) plus an additional 2445 decoding capability of 1080p30 (Level 4) given that the 2446 spatially largest tile or slice used in the bitstream is equal 2447 to or less than 1/3 of the picture size: 2449 a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120} 2451 For another example, the following parameters express an 2452 additional decoding capability of 1080p30, using dec-parallel- 2453 cap.max-lsr and dec-parallel-cap.max-lps, given that WPP is 2454 used in the stream: 2456 a=fmtp:98 level-id=93;dec-parallel-cap={w:8; 2457 max-lsr=62668800;max-lps=2088960} 2459 Informative note: When min_spatial_segmentation_idc is 2460 present in a stream and WPP is not used, [HEVC] specifies 2461 that there is no slice or no tile in the stream containing 2462 more than 4 * PicSizeInSamplesY / 2463 ( min_spatial_segmentation_idc + 4 ) luma samples. 2465 Encoding considerations: 2467 This type is only defined for transfer via RTP (RFC 3550). 2469 Security considerations: 2471 See Section 9 of RFC XXXX. 2473 Public specification: 2475 Please refer to Section 13 of RFC XXXX. 2477 Additional information: None 2479 File extensions: none 2481 Macintosh file type code: none 2483 Object identifier or OID: none 2484 Person & email address to contact for further information: 2486 Intended usage: COMMON 2488 Author: See Section 14 of RFC XXXX. 2490 Change controller: 2492 IETF Audio/Video Transport Payloads working group delegated 2493 from the IESG. 2495 7.2 SDP Parameters 2497 The receiver MUST ignore any parameter unspecified in this memo. 
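One practical consequence of this rule, combined with the braces used by dec-parallel-cap, is that a receiver cannot simply split the format-specific parameters on every semicolon. The following Python sketch shows one possible approach under stated assumptions: the input is the text that follows the payload type in an "a=fmtp" line, parameter names are matched case-sensitively, and the function name is hypothetical. The set of known names is the list of parameters defined in this memo.

```python
KNOWN = {
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br", "max-tr",
    "max-tc", "max-fps", "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
    "depack-buf-cap", "sprop-segmentation-id",
    "sprop-spatial-segmentation-idc", "dec-parallel-cap",
    "sprop-vps", "sprop-sps", "sprop-pps",
}


def parse_fmtp_params(fmtp_params):
    """Split parameter=value pairs, keeping braces intact and dropping unknown names."""
    params, depth, token = {}, 0, ""
    for ch in fmtp_params + ";":  # trailing ';' flushes the last token
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            name, sep, value = token.strip().partition("=")
            if sep and name in KNOWN:  # ignore anything unspecified
                params[name] = value.strip()
            token = ""
        else:
            token += ch
    return params


print(parse_fmtp_params("level-id=93;dec-parallel-cap={t:8;level-id=120};x-vendor=1"))
```

The nested level-id inside the braces stays part of the dec-parallel-cap value and does not overwrite the outer level-id, and the unrecognized x-vendor parameter (a made-up name) is silently dropped.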
2499 7.2.1 Mapping of Payload Type Parameters to SDP 2501 The media type video/H265 string is mapped to fields in the Session 2502 Description Protocol (SDP) [RFC4566] as follows: 2504 o The media name in the "m=" line of SDP MUST be video. 2506 o The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the 2507 media subtype). 2509 o The clock rate in the "a=rtpmap" line MUST be 90000. 2511 o The OPTIONAL parameters "profile-space", "profile-id", "tier- 2512 flag", "level-id", "interop-constraints", "profile-compatibility- 2513 indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level- 2514 id", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br", "max- 2515 tr", "max-tc", "max-fps", "sprop-depack-buf-nalus", "sprop- 2516 depack-buf-bytes", "depack-buf-cap", "sprop-segmentation-id", 2517 "sprop-spatial-segmentation-idc", and "dec-parallel-cap", when 2518 present, MUST be included in the "a=fmtp" line of SDP. These 2519 parameters are expressed as a media type string, in the form of a 2520 semicolon-separated list of parameter=value pairs. 2522 o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- 2523 pps", when present, MUST be included in the "a=fmtp" line of SDP 2524 or conveyed using the "fmtp" source attribute as specified in 2525 section 6.3 of [RFC5576]. For a particular media format (i.e. 2526 RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST 2527 NOT be both included in the "a=fmtp" line of SDP and conveyed 2528 using the "fmtp" source attribute. When included in the "a=fmtp" 2529 line of SDP, these parameters are expressed as a media type 2530 string, in the form of a semicolon-separated list of 2531 parameter=value pairs. When conveyed using the "fmtp" source 2532 attribute, these parameters are only associated with the given 2533 source and payload type as parts of the "fmtp" source attribute.
2535 Informative note: Conveyance of "sprop-vps", "sprop-sps", and 2536 "sprop-pps" using the "fmtp" source attribute allows for out- 2537 of-band transport of parameter sets in topologies like Topo- 2538 Video-switch-MCU as specified in [RFC5117]. 2540 An example of media representation in SDP is as follows: 2542 m=video 49170 RTP/AVP 98 2543 a=rtpmap:98 H265/90000 2544 a=fmtp:98 profile-id=1; 2545 sprop-vps=