idnits 2.17.1

draft-ietf-payload-rtp-h265-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 8, 2014) is 3420 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 1136

  == Unused Reference: 'RFC6190' is defined on line 3838, but no explicit
     reference was found in the text

  == Unused Reference: 'I-D.ietf-mmusic-sdp-bundle-negotiation' is defined on
     line 3869, but no explicit reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-05

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-02

  == Outdated reference: A later version (-08) exists of
     draft-ietf-avtext-rtp-grouping-taxonomy-02

  -- Obsolete informational reference (is this intentional?): RFC 2326
     (Obsoleted by RFC 7826)

  -- Obsolete informational reference (is this intentional?): RFC 5117
     (Obsoleted by RFC 7667)

     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1 Network Working Group                                    Y.-K. Wang
2 Internet Draft                                             Qualcomm
3 Intended status: Standards track                         Y. Sanchez
4 Expires: June 2015                                       T. Schierl
5                                                      Fraunhofer HHI
6                                                           S. Wenger
7                                                               Vidyo
8                                                    M. M. Hannuksela
9                                                               Nokia
10                                                   December 8, 2014

12 RTP Payload Format for High Efficiency Video Coding
13 draft-ietf-payload-rtp-h265-07.txt

15 Abstract

17 This memo describes an RTP payload format for the video coding
18 standard ITU-T Recommendation H.265 and ISO/IEC International
19 Standard 23008-2, both also known as High Efficiency Video Coding
20 (HEVC) and developed by the Joint Collaborative Team on Video
21 Coding (JCT-VC). The RTP payload format allows for packetization
22 of one or more Network Abstraction Layer (NAL) units in each RTP
23 packet payload, as well as fragmentation of a NAL unit into
24 multiple RTP packets. Furthermore, it supports transmission of
25 an HEVC bitstream over a single as well as multiple RTP streams.
26 When multiple RTP streams are used, a single or multiple
27 transports may be utilized. The payload format has wide
28 applicability in videoconferencing, Internet video streaming, and
29 high bit-rate entertainment-quality video, among others.
31 Status of this Memo

33 This Internet-Draft is submitted to IETF in full conformance with
34 the provisions of BCP 78 and BCP 79.

36 Internet-Drafts are working documents of the Internet Engineering
37 Task Force (IETF), its areas, and its working groups. Note that
38 other groups may also distribute working documents as Internet-
39 Drafts.

41 Internet-Drafts are draft documents valid for a maximum of six
42 months and may be updated, replaced, or obsoleted by other
43 documents at any time. It is inappropriate to use Internet-
44 Drafts as reference material or to cite them other than as "work
45 in progress."

47 The list of current Internet-Drafts can be accessed at
48 http://www.ietf.org/ietf/1id-abstracts.txt.

50 The list of Internet-Draft Shadow Directories can be accessed at
51 http://www.ietf.org/shadow.html.

53 This Internet-Draft will expire on June 8, 2015.

55 Copyright and License Notice

57 Copyright (c) 2014 IETF Trust and the persons identified as the
58 document authors. All rights reserved.

60 This document is subject to BCP 78 and the IETF Trust's Legal
61 Provisions Relating to IETF Documents
62 (http://trustee.ietf.org/license-info) in effect on the date of
63 publication of this document. Please review these documents
64 carefully, as they describe your rights and restrictions with
65 respect to this document. Code Components extracted from this
66 document must include Simplified BSD License text as described in
67 Section 4.e of the Trust Legal Provisions and are provided
68 without warranty as described in the Simplified BSD License.
70 Table of Contents

72 Abstract
73 Status of this Memo
74 Table of Contents
75 1 Introduction
76    1.1 Overview of the HEVC Codec
77       1.1.1 Coding-Tool Features
78       1.1.2 Systems and Transport Interfaces
79       1.1.3 Parallel Processing Support
80       1.1.4 NAL Unit Header
81    1.2 Overview of the Payload Format
82 2 Conventions
83 3 Definitions and Abbreviations
84    3.1 Definitions
85       3.1.1 Definitions from the HEVC Specification
86       3.1.2 Definitions Specific to This Memo
87    3.2 Abbreviations
88 4 RTP Payload Format
89    4.1 RTP Header Usage
90    4.2 Payload Header Usage
91    4.3 Payload Structures
92    4.4 Transmission Modes
93    4.5 Decoding Order Number
94    4.6 Single NAL Unit Packets
95    4.7 Aggregation Packets (APs)
96    4.8 Fragmentation Units (FUs)
97    4.9 PACI packets
98       4.9.1 Reasons for the PACI rules (informative)
99       4.9.2 PACI extensions (Informative)
100    4.10 Temporal Scalability Control Information
101 5 Packetization Rules
102 6 De-packetization Process
103 7 Payload Format Parameters
104    7.1 Media Type Registration
105    7.2 SDP Parameters
106       7.2.1 Mapping of Payload Type Parameters to SDP
107       7.2.2 Usage with SDP Offer/Answer Model
108       7.2.3 Usage in Declarative Session Descriptions
109       7.2.4 Parameter Sets Considerations
110       7.2.5 Dependency Signaling in Multi-Stream Mode
111 8 Use with Feedback Messages
112    8.1 Picture Loss Indication (PLI)
113    8.2 Slice Loss Indication (SLI)
114    8.3 Reference Picture Selection Indication (RPSI)
115    8.4 Full Intra Request (FIR)
116 9 Security Considerations
117 10 Congestion Control
118 11 IANA Consideration
119 12 Acknowledgements
120 13 References
121    13.1 Normative References
122    13.2 Informative References
123 14 Authors' Addresses

125 1 Introduction

127 1.1 Overview of the HEVC Codec

129 High Efficiency Video Coding [HEVC], formally known as ITU-T
130 Recommendation H.265 and ISO/IEC International Standard 23008-2,
131 was ratified by ITU-T in April 2013 and reportedly provides
132 significant coding efficiency gains over H.264 [H.264].
134 As both H.264 [H.264] and its RTP payload format [RFC6184] are
135 widely deployed and generally known in the relevant implementer
136 communities, frequently only the differences between those two
137 specifications are highlighted in non-normative, explanatory
138 parts of this memo. Basic familiarity with both specifications
139 is assumed for those parts. However, the normative parts of this
140 memo do not require study of H.264 or its RTP payload format.

142 H.264 and HEVC share a similar hybrid video codec design.
143 Conceptually, both technologies include a video coding layer
144 (VCL), which is often used to refer to the coding-tool features,
145 and a network abstraction layer (NAL), which is often used to
146 refer to the systems and transport interface aspects of the
147 codecs.

149 1.1.1 Coding-Tool Features

151 Similar to earlier hybrid-video-coding-based standards,
152 including H.264, the following basic video coding design is
153 employed by HEVC. A prediction signal is first formed either by
154 intra or motion compensated prediction, and the residual (the
155 difference between the original and the prediction) is then
156 coded. The gains in coding efficiency are achieved by
157 redesigning and improving almost all parts of the codec over
158 earlier designs. In addition, HEVC includes several tools to
159 make the implementation on parallel architectures easier. Below
160 is a summary of HEVC coding-tool features.

162 Quad-tree block and transform structure

164 One of the major tools that contribute significantly to the
165 coding efficiency of HEVC is the usage of flexible coding blocks
166 and transforms, which are defined in a hierarchical quad-tree
167 manner. Unlike H.264, where the basic coding block is a
168 macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit
169 (CTU) of a maximum size of 64x64. Each CTU can be divided into
170 smaller units in a hierarchical quad-tree manner and can
171 represent smaller blocks down to size 4x4. Similarly, the
172 transforms used in HEVC can have different sizes, starting from
173 4x4 and going up to 32x32. Utilizing large blocks and transforms
174 contributes to the major gain of HEVC, especially at high
175 resolutions.

177 Entropy coding

179 HEVC uses a single entropy coding engine, which is based on
180 Context Adaptive Binary Arithmetic Coding (CABAC), whereas H.264
181 uses two distinct entropy coding engines. CABAC in HEVC shares
182 many similarities with CABAC of H.264, but contains several
183 improvements. Those include improvements in coding efficiency
184 and lowered implementation complexity, especially for parallel
185 architectures.

187 In-loop filtering

189 H.264 includes an in-loop adaptive deblocking filter, where the
190 blocking artifacts around the transform edges in the
191 reconstructed picture are smoothed to improve the picture quality
192 and compression efficiency. In HEVC, a similar deblocking filter
193 is employed but with somewhat lower complexity. In addition,
194 pictures undergo a subsequent filtering operation called Sample
195 Adaptive Offset (SAO), which is a new design element in HEVC.
196 SAO basically adds a pixel-level offset in an adaptive manner and
197 usually acts as a de-ringing filter. It is observed that SAO
198 improves the picture quality, especially around sharp edges,
199 contributing substantially to the visual quality improvements of
200 HEVC.

202 Motion prediction and coding

204 There have been a number of improvements in this area that are
205 summarized as follows. The first category is motion merge and
206 advanced motion vector prediction (AMVP) modes. The motion
207 information of a prediction block can be inferred from the
208 spatially or temporally neighboring blocks. This is similar to
209 the DIRECT mode in H.264 but includes new aspects to incorporate
210 the flexible quad-tree structure and methods to improve the
211 parallel implementations. In addition, the motion vector
212 predictor can be signaled for improved efficiency. The second
213 category is high-precision interpolation. The interpolation
214 filter length is increased to 8-tap from 6-tap, which improves
215 the coding efficiency but also comes with increased complexity.
216 In addition, the interpolation filter is defined with higher
217 precision without any intermediate rounding operations to further
218 improve the coding efficiency.

220 Intra prediction and intra coding

222 Compared to 8 intra prediction modes in H.264, HEVC supports
223 angular intra prediction with 33 directions. This increased
224 flexibility improves both objective coding efficiency and visual
225 quality as the edges can be better predicted and ringing
226 artifacts around the edges can be reduced. In addition, the
227 reference samples are adaptively smoothed based on the prediction
228 direction. To avoid contouring artifacts, a new interpolative
229 prediction generation is included to improve the visual quality.
230 Furthermore, the discrete sine transform (DST) is utilized instead
231 of the traditional discrete cosine transform (DCT) for 4x4 intra
232 transform blocks.

234 Other coding-tool features

236 HEVC includes some tools for lossless coding and efficient screen
237 content coding, such as skipping the transform for certain
238 blocks. These tools are particularly useful, for example, when
239 streaming the user interface of a mobile device to a large
240 display.
242 1.1.2 Systems and Transport Interfaces

244 HEVC inherited the basic systems and transport interface
245 designs, such as the NAL-unit-based syntax structure, the
246 hierarchical syntax and data unit structure from sequence-level
247 parameter sets, multi-picture-level or picture-level parameter
248 sets, slice-level header parameters, lower-level parameters, the
249 supplemental enhancement information (SEI) message mechanism, the
250 hypothetical reference decoder (HRD) based video buffering model,
251 and so on. The differences in these aspects compared to H.264
252 are summarized below.

254 Video parameter set

256 A new type of parameter set, called video parameter set (VPS),
257 was introduced. For the first (2013) version of [HEVC], the
258 video parameter set NAL unit is required to be available prior to
259 its activation, while the information contained in the video
260 parameter set is not necessary for operation of the decoding
261 process. For future HEVC extensions, such as the 3D or scalable
262 extensions, the video parameter set is expected to include
263 information necessary for operation of the decoding process, e.g.
264 decoding dependency or information for reference picture set
265 construction of enhancement layers. The VPS provides a "big
266 picture" of a bitstream, including what types of operation points
267 are provided, the profile, tier, and level of the operation
268 points, and some other high-level properties of the bitstream
269 that can be used as the basis for session negotiation and content
270 selection, etc. (see section 7.1).
272 Profile, tier and level

274 The profile, tier and level syntax structure that can be included
275 in both VPS and sequence parameter set (SPS) includes 12 bytes of
276 data to describe the entire bitstream (including all temporally
277 scalable layers, which are referred to as sub-layers in the HEVC
278 specification), and can optionally include more profile, tier and
279 level information pertaining to individual temporally scalable
280 layers. The profile indicator indicates the "best viewed as"
281 profile when the bitstream conforms to multiple profiles, similar
282 to the major brand concept in the ISO base media file format
283 (ISOBMFF) [ISOBMFF] and file formats derived from ISOBMFF,
284 such as the 3GPP file format [3GPPFF]. The profile, tier and
285 level syntax structure also includes indications of whether
286 the bitstream is free of frame-packed content, whether the
287 bitstream is free of interlaced source content and free of field
288 pictures, i.e. contains only frame pictures of progressive
289 source, such that clients/players with no support of post-
290 processing functionalities for handling of frame-packed or
291 interlaced source content or field pictures can reject those
292 bitstreams.

294 Bitstream and elementary stream

296 HEVC includes a definition of an elementary stream, which is new
297 compared to H.264. An elementary stream consists of a sequence
298 of one or more bitstreams. An elementary stream that consists of
299 two or more bitstreams has typically been formed by splicing
300 together two or more bitstreams (or parts thereof). When an
301 elementary stream contains more than one bitstream, the last NAL
302 unit of the last access unit of a bitstream (except the last
303 bitstream in the elementary stream) must contain an end of
304 bitstream NAL unit and the first access unit of the subsequent
305 bitstream must be an intra random access point (IRAP) access
306 unit. This IRAP access unit may be a clean random access (CRA),
307 broken link access (BLA), or instantaneous decoding refresh (IDR)
308 access unit.

310 Random access support

312 HEVC includes signaling in the NAL unit header, through NAL unit
313 types, of IRAP pictures beyond IDR pictures. Three types of IRAP
314 pictures, namely IDR, CRA and BLA pictures, are supported, wherein
315 IDR pictures are conventionally referred to as closed group-of-
316 pictures (closed-GOP) random access points, and CRA and BLA
317 pictures are those conventionally referred to as open-GOP random
318 access points. BLA pictures usually originate from splicing of
319 two bitstreams or parts thereof at a CRA picture, e.g. during
320 stream switching. To enable better systems usage of IRAP
321 pictures, altogether six different NAL unit types are defined to
322 signal the properties of the IRAP pictures, which can be used to
323 better match the stream access point (SAP) types as defined in
324 the ISOBMFF [ISOBMFF], which are utilized for random access
325 support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH].
326 Pictures following an IRAP picture in decoding order and
327 preceding the IRAP picture in output order are referred to as
328 leading pictures associated with the IRAP picture. There are two
329 types of leading pictures, namely random access decodable leading
330 (RADL) pictures and random access skipped leading (RASL)
331 pictures. RADL pictures are decodable when decoding starts
332 at the associated IRAP picture, and RASL pictures are not
333 decodable when decoding starts at the associated IRAP
334 picture and are usually discarded. HEVC provides mechanisms to
335 enable the specification of conformance of bitstreams with RASL
336 pictures being discarded, thus providing a standard-compliant
337 way to enable systems components to discard RASL pictures when
338 needed.
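
The handling of leading pictures after random access can be
illustrated with a minimal, non-normative sketch. It is not part of
this memo or of [HEVC]. It assumes the input is a list with one
nal_unit_type value per coded picture, in decoding order, starting
at the IRAP picture where decoding begins. The numeric NAL unit type
values are taken from Table 7-1 of [HEVC] as the editor understands
them and should be checked against that table; the function name is
hypothetical.

```python
# Non-normative sketch. NAL unit type values are assumed from
# Table 7-1 of [HEVC]; verify them against the specification.
TRAIL_R = 1
RADL_N, RADL_R, RASL_N, RASL_R = 6, 7, 8, 9
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP, CRA_NUT = 19, 20, 21

# The six NAL unit types that signal IRAP pictures.
IRAP_TYPES = frozenset(
    {BLA_W_LP, BLA_W_RADL, BLA_N_LP, IDR_W_RADL, IDR_N_LP, CRA_NUT}
)
RADL_TYPES = frozenset({RADL_N, RADL_R})
RASL_TYPES = frozenset({RASL_N, RASL_R})


def drop_rasl_after_random_access(picture_types):
    """Return the picture types kept when decoding starts at the
    first picture, which must be an IRAP picture.

    RASL pictures associated with that first IRAP picture are
    dropped. RADL pictures are kept. Once a picture that is not a
    leading picture is seen, everything that follows is kept.
    """
    kept = []
    in_leading_run = False
    for index, nal_type in enumerate(picture_types):
        if index == 0:
            if nal_type not in IRAP_TYPES:
                raise ValueError("decoding must start at an IRAP picture")
            in_leading_run = True
        elif in_leading_run:
            if nal_type in RASL_TYPES:
                continue  # not decodable from this entry point
            if nal_type not in RADL_TYPES:
                in_leading_run = False  # trailing or next IRAP picture
        kept.append(nal_type)
    return kept
```

In this sketch, a RASL picture that follows a later CRA picture is
kept, because its reference pictures were decoded after the entry
point. A real receiver works on NAL units rather than on one type
per picture and would need picture boundary detection as well.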
340 Temporal scalability support

342 HEVC includes improved support for temporal scalability, by
343 inclusion of the signaling of TemporalId in the NAL unit header,
344 the restriction that pictures of a particular temporal sub-layer
345 cannot be used for inter prediction reference by pictures of a
346 lower temporal sub-layer, the sub-bitstream extraction process,
347 and the requirement that each sub-bitstream extraction output be
348 a conforming bitstream. Media-aware network elements (MANEs) can
349 utilize the TemporalId in the NAL unit header for stream
350 adaptation purposes based on temporal scalability.

352 Temporal sub-layer switching support

354 HEVC specifies, through NAL unit types present in the NAL unit
355 header, the signaling of temporal sub-layer access (TSA) and
356 stepwise temporal sub-layer access (STSA). A TSA picture and
357 pictures following the TSA picture in decoding order do not use
358 pictures prior to the TSA picture in decoding order with
359 TemporalId greater than or equal to that of the TSA picture for
360 inter prediction reference. A TSA picture enables up-switching,
361 at the TSA picture, to the sub-layer containing the TSA picture
362 or any higher sub-layer, from the immediately lower sub-layer.
363 An STSA picture does not use pictures with the same TemporalId as
364 the STSA picture for inter prediction reference. Pictures
365 following an STSA picture in decoding order with the same
366 TemporalId as the STSA picture do not use pictures prior to the
367 STSA picture in decoding order with the same TemporalId as the
368 STSA picture for inter prediction reference. An STSA picture
369 enables up-switching, at the STSA picture, to the sub-layer
370 containing the STSA picture, from the immediately lower sub-
371 layer.

373 Sub-layer reference or non-reference pictures

375 The concept and signaling of reference/non-reference pictures in
376 HEVC are different from H.264. In H.264, if a picture may be
377 used by any other picture for inter prediction reference, it is a
378 reference picture; otherwise it is a non-reference picture, and
379 this is signaled by two bits in the NAL unit header. In HEVC, a
380 picture is called a reference picture only when it is marked as
381 "used for reference". In addition, the concept of sub-layer
382 reference picture was introduced. If a picture may be used by
383 any other picture with the same TemporalId for inter
384 prediction reference, it is a sub-layer reference picture;
385 otherwise it is a sub-layer non-reference picture. Whether a
386 picture is a sub-layer reference picture or sub-layer non-
387 reference picture is signaled through NAL unit type values.

389 Extensibility

391 Besides the TemporalId in the NAL unit header, HEVC also includes
392 the signaling of a six-bit layer ID in the NAL unit header, which
393 must be equal to 0 for a single-layer bitstream. Extension
394 mechanisms have been included in VPS, SPS, PPS, SEI NAL unit,
395 slice headers, and so on. All these extension mechanisms enable
396 future extensions in a backward compatible manner, such that
397 bitstreams encoded according to potential future HEVC extensions
398 can be fed to then-legacy decoders (e.g. HEVC version 1 decoders)
399 and the then-legacy decoders can decode and output the base layer
400 bitstream.

402 Bitstream extraction

404 HEVC includes a bitstream extraction process as an integral part
405 of the overall decoding process, as well as specification of the
406 use of the bitstream extraction process in description of
407 bitstream conformance tests as part of the hypothetical reference
408 decoder (HRD) specification.

410 Reference picture management

412 The reference picture management of HEVC, including reference
413 picture marking and removal from the decoded picture buffer (DPB)
414 as well as reference picture list construction (RPLC), differs
415 from that of H.264. Instead of the sliding window plus adaptive
416 memory management control operation (MMCO) based reference
417 picture marking mechanism in H.264, HEVC specifies a reference
418 picture set (RPS) based reference picture management and marking
419 mechanism, and the RPLC is consequently based on the RPS
420 mechanism. A reference picture set consists of a set of
421 reference pictures associated with a picture, consisting of all
422 reference pictures that are prior to the associated picture in
423 decoding order, that may be used for inter prediction of the
424 associated picture or any picture following the associated
425 picture in decoding order. The reference picture set consists of
426 five lists of reference pictures: RefPicSetStCurrBefore,
427 RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
428 RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and
429 RefPicSetLtCurr contain all reference pictures that may be used
430 in inter prediction of the current picture and that may be used
431 in inter prediction of one or more of the pictures following the
432 current picture in decoding order. RefPicSetStFoll and
433 RefPicSetLtFoll consist of all reference pictures that are not
434 used in inter prediction of the current picture but may be used
435 in inter prediction of one or more of the pictures following the
436 current picture in decoding order. RPS provides an "intra-coded"
437 signaling of the DPB status, instead of an "inter-coded"
438 signaling, mainly for improved error resilience. The RPLC
439 process in HEVC is based on the RPS, by signaling an index to an
440 RPS subset for each reference index; this process is simpler than
441 the RPLC process in H.264.

443 Ultra low delay support

445 HEVC specifies a sub-picture-level HRD operation, for support of
446 the so-called ultra-low delay. The mechanism specifies a
447 standard-compliant way to enable delay reduction below one
448 picture interval. Sub-picture-level coded picture buffer (CPB)
449 and DPB parameters may be signaled, and utilization of this
450 information for the derivation of CPB timing (wherein the CPB
451 removal time corresponds to decoding time) and DPB output timing
452 (display time) is specified. Decoders are allowed to operate the
453 HRD at the conventional access-unit-level, even when the sub-
454 picture-level HRD parameters are present.

456 New SEI messages

458 HEVC inherits many H.264 SEI messages with changes in syntax
459 and/or semantics making them applicable to HEVC. Additionally,
460 there are a few new SEI messages reviewed briefly in the
461 following paragraphs.

463 The display orientation SEI message informs the decoder of a
464 transformation that is recommended to be applied to the cropped
465 decoded picture prior to display, such that the pictures can be
466 properly displayed, e.g. in an upside-up manner.

468 The structure of pictures SEI message provides information on the
469 NAL unit types, picture order count values, and prediction
470 dependencies of a sequence of pictures. The SEI message can be
471 used, for example, to determine what impact a lost picture has on
472 other pictures.

474 The decoded picture hash SEI message provides a checksum derived
475 from the sample values of a decoded picture. It can be used for
476 detecting whether a picture was correctly received and decoded.

478 The active parameter sets SEI message includes the IDs of the
479 active video parameter set and the active sequence parameter set
480 and can be used to activate VPSs and SPSs. In addition, the SEI
481 message includes the following indications: 1) An indication of
482 whether "full random accessibility" is supported (when supported,
483 all parameter sets needed for decoding of the remainder of the
484 bitstream when random accessing from the beginning of the current
485 coded video sequence by completely discarding all access units
486 earlier in decoding order are present in the remaining bitstream
487 and all coded pictures in the remaining bitstream can be
488 correctly decoded); 2) An indication of whether there is no
489 parameter set within the current coded video sequence that
490 updates another parameter set of the same type preceding in
491 decoding order. An update of a parameter set refers to the use
492 of the same parameter set ID but with some other parameters
493 changed. If this property is true for all coded video sequences
494 in the bitstream, then all parameter sets can be sent out-of-band
495 before session start.

497 The decoding unit information SEI message provides coded picture
498 buffer removal delay information for a decoding unit. The
499 message can be used in very-low-delay buffering operations.

501 The region refresh information SEI message can be used together
502 with the recovery point SEI message (present in both H.264 and
503 HEVC) for improved support of gradual decoding refresh (GDR).
504 This supports random access from inter-coded pictures, wherein
505 complete pictures can be correctly decoded or recovered after an
506 indicated number of pictures in output/display order.

508 1.1.3 Parallel Processing Support

510 The reportedly significantly higher encoding computational demand
511 of HEVC over H.264, in conjunction with the ever increasing video
512 resolution (both spatially and temporally) required by the
513 market, led to the adoption of VCL coding tools specifically
514 targeted to allow for parallelization on the sub-picture level.
515 That is, parallelization occurs, at the minimum, at the
516 granularity of an integer number of CTUs. The targets for this
517 type of high-level parallelization are multicore CPUs and DSPs as
518 well as multiprocessor systems. In a system design, to be
519 useful, these tools require signaling support, which is provided
520 in Section 7 of this memo. This section provides a brief
521 overview of the tools available in [HEVC].

523 Many of the tools incorporated in HEVC were designed keeping in
524 mind the potential parallel implementations in multi-core/multi-
525 processor architectures. Specifically, for parallelization, four
526 picture partition strategies are available.

528 Slices are segments of the bitstream that can be reconstructed
529 independently from other slices within the same picture (though
530 there may still be interdependencies through loop filtering
531 operations). Slices are the only tool that can be used for
532 parallelization that is also available, in virtually identical
533 form, in H.264. Slice-based parallelization does not require
534 much inter-processor or inter-core communication (except for
535 inter-processor or inter-core data sharing for motion
536 compensation when decoding a predictively coded picture, which is
537 typically much heavier than inter-processor or inter-core data
538 sharing due to in-picture prediction), as slices are designed to
539 be independently decodable. However, for the same reason, slices
540 can require some coding overhead. Further, slices (in contrast
541 to some of the other tools mentioned below) also serve as the key
542 mechanism for bitstream partitioning to match Maximum Transmission
543 Unit (MTU) size requirements, due to the in-picture independence
544 of slices and the fact that each regular slice is encapsulated in
545 its own NAL unit. In many cases, the goal of parallelization and
546 the goal of MTU size matching can place contradicting demands on
547 the slice layout in a picture. The realization of this situation
548 led to the development of the more advanced tools mentioned
549 below.

551 Dependent slice segments allow for fragmentation of a coded slice
552 into fragments at CTU boundaries without breaking any in-picture
553 prediction mechanism. They are complementary to the
554 fragmentation mechanism described in this memo in that they need
555 the cooperation of the encoder. As a dependent slice segment
556 necessarily contains an integer number of CTUs, a decoder using
557 multiple cores operating on CTUs can process a dependent slice
558 segment without communicating parts of the slice segment's
559 bitstream to other cores. Fragmentation, as specified in this
560 memo, in contrast, does not guarantee that a fragment contains an
561 integer number of CTUs.

563 In wavefront parallel processing (WPP), the picture is
564 partitioned into rows of CTUs. Entropy decoding and prediction
565 are allowed to use data from CTUs in other partitions. Parallel
566 processing is possible through parallel decoding of CTU rows,
567 where the start of the decoding of a row is delayed by two CTUs,
568 so as to ensure that data related to a CTU above and to the right
569 of the subject CTU is available before the subject CTU is
570 decoded. Using this staggered start (which appears like a
571 wavefront when represented graphically), parallelization is
572 possible with up to as many processors/cores as the picture
573 contains CTU rows.

575 Because in-picture prediction between neighboring CTU rows within
576 a picture is allowed, the required inter-processor/inter-core
577 communication to enable in-picture prediction can be substantial.
578 The WPP partitioning does not result in the creation of more NAL
579 units compared to when it is not applied, thus WPP cannot be used
580 for MTU size matching, though slices can be used in combination
581 for that purpose.
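
The two-CTU staggered start of WPP can be illustrated with a
minimal, non-normative sketch. It is not taken from [HEVC]. It
assumes an idealized decoder with one core per CTU row and one time
step per CTU, where a CTU can be decoded once its left neighbor and
the CTU above and to the right of it (the CTU directly above, for
the last column) have been decoded. The function name is
hypothetical.

```python
def wavefront_schedule(ctu_rows: int, ctu_cols: int) -> list[list[int]]:
    """Return the earliest time step at which each CTU can be decoded.

    Idealized model (an assumption of this sketch): one core per CTU
    row, one time step per CTU, dependencies on the left neighbor and
    on the above-right neighbor only.
    """
    if ctu_rows < 1 or ctu_cols < 1:
        raise ValueError("picture must contain at least one CTU")
    step = [[0] * ctu_cols for _ in range(ctu_rows)]
    for r in range(ctu_rows):
        for c in range(ctu_cols):
            ready = 0
            if c > 0:
                # Left neighbor in the same row must be finished.
                ready = max(ready, step[r][c - 1] + 1)
            if r > 0:
                # Above-right neighbor, clamped at the picture edge.
                above_right = min(c + 1, ctu_cols - 1)
                ready = max(ready, step[r - 1][above_right] + 1)
            step[r][c] = ready
    return step


if __name__ == "__main__":
    for row in wavefront_schedule(3, 4):
        print(row)
```

Under these assumptions, and for pictures at least two CTUs wide,
each row starts two time steps after the row above it, so a picture
of R rows and C columns finishes in C + 2 * (R - 1) steps instead of
R * C steps on a single core.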
583 Tiles define horizontal and vertical boundaries that partition a 584 picture into tile columns and rows. The scan order of CTUs is 585 changed to be local within a tile (in the order of a CTU raster 586 scan of a tile), before decoding the top-left CTU of the next 587 tile in the order of tile raster scan of a picture. Similar to 588 slices, tiles break in-picture prediction dependencies (including 589 entropy decoding dependencies). However, they do not need to be 590 included in individual NAL units (same as WPP in this regard), 591 hence tiles cannot be used for MTU size matching, though slices 592 can be used in combination for that purpose. Each tile can be 593 processed by one processor/core, and the inter-processor/inter- 594 core communication required for in-picture prediction between 595 processing units decoding neighboring tiles is limited to 596 conveying the shared slice header in cases where a slice spans 597 more than one tile, and loop filtering related sharing of 598 reconstructed samples and metadata. In this respect, tiles are less 599 demanding in terms of inter-processor communication bandwidth 600 compared to WPP due to the in-picture independence between two 601 neighboring partitions. 603 1.1.4 NAL Unit Header 605 HEVC maintains the NAL unit concept of H.264 with modifications. 606 HEVC uses a two-byte NAL unit header, as shown in Figure 1. The 607 payload of a NAL unit refers to the NAL unit excluding the NAL 608 unit header. 610 +---------------+---------------+ 611 |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| 612 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 613 |F| Type | LayerId | TID | 614 +-------------+-----------------+ 616 Figure 1 The structure of HEVC NAL unit header 618 The semantics of the fields in the NAL unit header are as 619 specified in [HEVC] and described briefly below for convenience. 620 In addition to the name and size of each field, the corresponding 621 syntax element name in [HEVC] is also provided. 623 F: 1 bit 624 forbidden_zero_bit.
Required to be zero in [HEVC]. HEVC 625 declares a value of 1 as a syntax violation. Note that the 626 inclusion of this bit in the NAL unit header is to enable 627 transport of HEVC video over MPEG-2 transport systems 628 (avoidance of start code emulations) [MPEG2S]. 630 Type: 6 bits 631 nal_unit_type. This field specifies the NAL unit type as 632 defined in Table 7-1 of [HEVC]. If the most significant bit 633 of this field of a NAL unit is equal to 0 (i.e. the value of 634 this field is less than 32), the NAL unit is a VCL NAL unit. 635 Otherwise, the NAL unit is a non-VCL NAL unit. For a 636 reference of all currently defined NAL unit types and their 637 semantics, please refer to Section 7.4.1 in [HEVC]. 639 LayerId: 6 bits 640 nuh_layer_id. Required to be equal to zero in [HEVC]. It is 641 anticipated that in future scalable or 3D video coding 642 extensions of this specification, this syntax element will be 643 used to identify additional layers that may be present in the 644 coded video sequence, wherein a layer may be, e.g. a spatial 645 scalable layer, a quality scalable layer, a texture view, or a 646 depth view. 648 TID: 3 bits 649 nuh_temporal_id_plus1. This field specifies the temporal 650 identifier of the NAL unit plus 1. The value of TemporalId is 651 equal to TID minus 1. A TID value of 0 is illegal, to ensure 652 that there is at least one bit in the NAL unit header equal to 653 1, so as to enable independent considerations of start code 654 emulations in the NAL unit header and in the NAL unit payload 655 data.
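The field layout of Figure 1 can be illustrated with a minimal parsing sketch. The class and function names below are hypothetical and chosen for this example only; the bit positions follow the field sizes given above (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits).

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit)
    nal_type: int  # nal_unit_type (6 bits)
    layer_id: int  # nuh_layer_id (6 bits)
    tid: int       # nuh_temporal_id_plus1 (3 bits)


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the first two bytes of a NAL unit into the four header fields."""
    if len(data) < 2:
        raise ValueError("the HEVC NAL unit header is two bytes long")
    value = (data[0] << 8) | data[1]
    return NalUnitHeader(
        f=value >> 15,
        nal_type=(value >> 9) & 0x3F,
        layer_id=(value >> 3) & 0x3F,
        tid=value & 0x07,
    )


def is_vcl(header: NalUnitHeader) -> bool:
    """A NAL unit is a VCL NAL unit when its Type value is less than 32."""
    return header.nal_type < 32


def temporal_id(header: NalUnitHeader) -> int:
    """TemporalId is TID minus 1; a TID value of 0 is illegal."""
    if header.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return header.tid - 1
```

For example, the two bytes 0x40 0x01 parse to F = 0, Type = 32, LayerId = 0, and TID = 1, that is, a non-VCL NAL unit with TemporalId equal to 0.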
657 1.2 Overview of the Payload Format 659 This payload format defines the following processes required for 660 transport of HEVC coded data over RTP [RFC3550]: 662 o Usage of RTP header with this payload format 664 o Packetization of HEVC coded NAL units into RTP packets using 665 three types of payload structures, namely single NAL unit 666 packet, aggregation packet, and fragment unit 668 o Transmission of HEVC NAL units of the same bitstream within a 669 single RTP stream or multiple RTP streams (within one or more 670 RTP sessions), where within an RTP stream transmission of NAL 671 units may be either non-interleaved (i.e. the transmission 672 order of NAL units is the same as their decoding order) or 673 interleaved (i.e. the transmission order of NAL units is 674 different from their decoding order) 676 o Media type parameters to be used with the Session Description 677 Protocol (SDP) [RFC4566] 679 o A payload header extension mechanism and data structures for 680 enhanced support of temporal scalability based on that 681 extension mechanism. 683 2 Conventions 685 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 686 NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 687 "OPTIONAL" in this document are to be interpreted as described in 688 BCP 14, RFC 2119 [RFC2119]. 690 In this document, these key words will appear with that 691 interpretation only when in ALL CAPS. Lower case uses of these 692 words are not to be interpreted as carrying the RFC 2119 693 significance. 695 This specification uses the notion of setting and clearing a bit 696 when bit fields are handled. Setting a bit is the same as 697 assigning that bit the value of 1 (On). Clearing a bit is the 698 same as assigning that bit the value of 0 (Off). 700 3 Definitions and Abbreviations 702 3.1 Definitions 704 This document uses the terms and definitions of [HEVC]. Section 705 3.1.1 lists relevant definitions copied from [HEVC] for 706 convenience. 
Section 3.1.2 provides definitions specific to this 707 memo. 709 3.1.1 Definitions from the HEVC Specification 711 access unit: A set of NAL units that are associated with each 712 other according to a specified classification rule, are 713 consecutive in decoding order, and contain exactly one coded 714 picture. 716 BLA access unit: An access unit in which the coded picture is a 717 BLA picture. 719 BLA picture: An IRAP picture for which each VCL NAL unit has 720 nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. 722 coded video sequence: A sequence of access units that consists, 723 in decoding order, of an IRAP access unit with NoRaslOutputFlag 724 equal to 1, followed by zero or more access units that are not 725 IRAP access units with NoRaslOutputFlag equal to 1, including all 726 subsequent access units up to but not including any subsequent 727 access unit that is an IRAP access unit with NoRaslOutputFlag 728 equal to 1. 730 Informative note: An IRAP access unit may be an IDR access 731 unit, a BLA access unit, or a CRA access unit. The value of 732 NoRaslOutputFlag is equal to 1 for each IDR access unit, each 733 BLA access unit, and each CRA access unit that is the first 734 access unit in the bitstream in decoding order, is the first 735 access unit that follows an end of sequence NAL unit in 736 decoding order, or has HandleCraAsBlaFlag equal to 1. 738 CRA access unit: An access unit in which the coded picture is a 739 CRA picture. 741 CRA picture: An IRAP picture for which each VCL NAL unit has 742 nal_unit_type equal to CRA_NUT. 744 IDR access unit: An access unit in which the coded picture is an 745 IDR picture. 747 IDR picture: An IRAP picture for which each VCL NAL unit has 748 nal_unit_type equal to IDR_W_RADL or IDR_N_LP. 750 IRAP access unit: An access unit in which the coded picture is an 751 IRAP picture.
753 IRAP picture: A coded picture for which each VCL NAL unit has 754 nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 755 (23), inclusive. 757 layer: A set of VCL NAL units that all have a particular value of 758 nuh_layer_id and the associated non-VCL NAL units, or one of a 759 set of syntactical structures having a hierarchical relationship. 761 operation point: A bitstream created from another bitstream by 762 operation of the sub-bitstream extraction process with that 763 other bitstream, a target highest TemporalId, and a target 764 layer identifier list as inputs. 766 random access: The act of starting the decoding process for a 767 bitstream at a point other than the beginning of the bitstream. 769 sub-layer: A temporal scalable layer of a temporal scalable 770 bitstream consisting of VCL NAL units with a particular value of 771 the TemporalId variable, and the associated non-VCL NAL units. 773 sub-layer representation: A subset of the bitstream consisting of 774 NAL units of a particular sub-layer and the lower sub-layers. 776 tile: A rectangular region of coding tree blocks within a 777 particular tile column and a particular tile row in a picture. 779 tile column: A rectangular region of coding tree blocks having a 780 height equal to the height of the picture and a width specified 781 by syntax elements in the picture parameter set. 783 tile row: A rectangular region of coding tree blocks having a 784 height specified by syntax elements in the picture parameter set 785 and a width equal to the width of the picture. 787 3.1.2 Definitions Specific to This Memo 789 dependee RTP stream: An RTP stream on which another RTP stream 790 depends. All RTP streams in an MRST or MRMT except for the 791 highest RTP stream are dependee RTP streams. 793 highest RTP stream: The RTP stream on which no other RTP stream 794 depends. The RTP stream in an SRST is the highest RTP stream.
796 media aware network element (MANE): A network element, such as a 797 middlebox, selective forwarding unit, or application layer 798 gateway that is capable of parsing certain aspects of the RTP 799 payload headers or the RTP payload and reacting to their 800 contents. 802 Informative note: The concept of a MANE goes beyond normal 803 routers or gateways in that a MANE has to be aware of the 804 signaling (e.g. to learn about the payload type mappings of 805 the media streams), and in that it has to be trusted when 806 working with SRTP. The advantage of using MANEs is that they 807 allow packets to be dropped according to the needs of the 808 media coding. For example, if a MANE has to drop packets due 809 to congestion on a certain link, it can identify and remove 810 those packets whose elimination produces the least adverse 811 effect on the user experience. After dropping packets, MANEs 812 must rewrite RTCP packets to match the changes to the RTP 813 stream as specified in Section 7 of [RFC3550]. 815 Media Transport: As used in the MRST, MRMT, and SRST definitions 816 below, Media Transport denotes the transport of packets over a 817 transport association identified by a 5-tuple (source address, 818 source port, destination address, destination port, transport 819 protocol). See also Section 2.1.13 of [I-D.ietf-avtext-rtp- 820 grouping-taxonomy]. 822 Multiple RTP streams on a Single Transport (MRST): Multiple RTP 823 streams carrying a single HEVC bitstream on a Single Transport. 824 See also section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. 826 Multiple RTP streams on Multiple Transports (MRMT): Multiple RTP 827 streams carrying a single HEVC bitstream on Multiple Transports. 828 See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. 830 NAL unit decoding order: A NAL unit order that conforms to the 831 constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. 
833 NAL-unit-like structure: A data structure that is similar to NAL 834 units in the sense that it also has a NAL unit header and a 835 payload, with a difference that the payload does not follow the 836 start code emulation prevention mechanism required for the NAL 837 unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples of 838 NAL-unit-like structures defined in this memo are packet payloads 839 of AP, PACI, and FU packets. 841 NALU-time: The value that the RTP timestamp would have if the NAL 842 unit were transported in its own RTP packet. 844 RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within 845 the scope of this memo, one RTP stream is utilized to transport 846 one or more temporal sub-layers. 848 Single RTP stream on a Single Transport (SRST): Single RTP 849 stream carrying a single HEVC bitstream on a Single (Media) 850 Transport. See also Section 3.5 of [I-D.ietf-avtext-rtp- 851 grouping-taxonomy]. 853 transmission order: The order of packets in ascending RTP 854 sequence number order (in modulo arithmetic). Within an 855 aggregation packet, the NAL unit transmission order is the same 856 as the order of appearance of NAL units in the packet.
858 3.2 Abbreviations 860 AP Aggregation Packet 862 BLA Broken Link Access 864 CRA Clean Random Access 866 CTB Coding Tree Block 868 CTU Coding Tree Unit 870 CVS Coded Video Sequence 872 DPH Decoded Picture Hash 874 FU Fragmentation Unit 876 GDR Gradual Decoding Refresh 878 HRD Hypothetical Reference Decoder 880 IDR Instantaneous Decoding Refresh 882 IRAP Intra Random Access Point 884 MANE Media Aware Network Element 886 MRMT Multiple RTP streams on Multiple Transports 888 MRST Multiple RTP streams on a Single Transport 890 MTU Maximum Transfer Unit 892 NAL Network Abstraction Layer 893 NALU Network Abstraction Layer Unit 895 PACI PAyload Content Information 897 PHES Payload Header Extension Structure 899 PPS Picture Parameter Set 901 RADL Random Access Decodable Leading (Picture) 903 RASL Random Access Skipped Leading (Picture) 905 RPS Reference Picture Set 907 SEI Supplemental Enhancement Information 909 SPS Sequence Parameter Set 911 SRST Single RTP stream on a Single Transport 913 STSA Step-wise Temporal Sub-layer Access 915 TSA Temporal Sub-layer Access 917 TCSI Temporal Scalability Control Information 919 VCL Video Coding Layer 921 VPS Video Parameter Set 923 4 RTP Payload Format 925 4.1 RTP Header Usage 927 The format of the RTP header is specified in [RFC3550] and 928 reprinted in Figure 2 for convenience. This payload format uses 929 the fields of the header in a manner consistent with that 930 specification. 932 The RTP payload (and the settings for some RTP header bits) for 933 aggregation packets and fragmentation units are specified in 934 Sections 4.7 and 4.8, respectively. 
936 0 1 2 3 937 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 938 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 939 |V=2|P|X| CC |M| PT | sequence number | 940 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 941 | timestamp | 942 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 943 | synchronization source (SSRC) identifier | 944 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 945 | contributing source (CSRC) identifiers | 946 | .... | 947 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 949 Figure 2 RTP header according to [RFC3550] 951 The RTP header information to be set according to this RTP 952 payload format is set as follows: 954 Marker bit (M): 1 bit 956 Set for the last packet, carried in the current RTP stream, of 957 the access unit, in line with the normal use of the M bit in 958 video formats, to allow an efficient playout buffer handling. 959 When MRST or MRMT is in use, if an access unit appears in 960 multiple RTP streams, the marker bit is set on each RTP 961 stream's last packet of the access unit. 963 Informative note: The content of a NAL unit does not tell 964 whether or not the NAL unit is the last NAL unit, in 965 decoding order, of an access unit. An RTP sender 966 implementation may obtain this information from the video 967 encoder. If, however, the implementation cannot obtain 968 this information directly from the encoder, e.g. when the 969 bitstream was pre-encoded, and also there is no timestamp 970 allocated for each NAL unit, then the sender implementation 971 can inspect subsequent NAL units in decoding order to 972 determine whether or not the NAL unit is the last NAL unit 973 of an access unit as follows. 
A NAL unit naluX is the last 974 NAL unit of an access unit if it is the last NAL unit of 975 the bitstream or the next VCL NAL unit naluY in decoding 976 order has the high-order bit of the first byte after its 977 NAL unit header equal to 1, and all NAL units between naluX 978 and naluY, when present, have nal_unit_type in the range of 979 32 to 35, inclusive, equal to 39, or in the ranges of 41 to 980 44, inclusive, or 48 to 55, inclusive. 982 Payload type (PT): 7 bits 984 The assignment of an RTP payload type for this new packet 985 format is outside the scope of this document and will not be 986 specified here. The assignment of a payload type has to be 987 performed either through the profile used or in a dynamic way. 989 Informative note: It is not required to use different 990 payload type values for different RTP streams in MRST or 991 MRMT. 993 Sequence number (SN): 16 bits 995 Set and used in accordance with RFC 3550 [RFC3550]. 997 Timestamp: 32 bits 999 The RTP timestamp is set to the sampling timestamp of the 1000 content. A 90 kHz clock rate MUST be used. 1002 If the NAL unit has no timing properties of its own (e.g. 1003 parameter set and SEI NAL units), the RTP timestamp MUST be 1004 set to the RTP timestamp of the coded picture of the access 1005 unit in which the NAL unit (according to Section 7.4.2.4.4 of 1006 [HEVC]) is included. 1008 Receivers MUST use the RTP timestamp for the display process, 1009 even when the bitstream contains picture timing SEI messages 1010 or decoding unit information SEI messages as specified in 1011 [HEVC]. However, this does not mean that picture timing SEI 1012 messages in the bitstream should be discarded, as picture 1013 timing SEI messages may contain frame-field information that 1014 is important in appropriately rendering interlaced video. 1016 Synchronization source (SSRC): 32-bits 1018 Used to identify the source of the RTP packets. 
When using 1019 SRST, by definition a single SSRC is used for all parts of a 1020 single bitstream. In MRST or MRMT, a different SSRC is used 1021 for each RTP stream containing a subset of the sub-layers of 1022 the single (temporally scalable) bitstream. A receiver is 1023 required to correctly associate the set of SSRCs that 1024 carry parts of the same bitstream. 1026 Informative note: The term "bitstream" in this document is 1027 equivalent to the term "encoded stream" in [I-D.ietf- 1028 avtext-rtp-grouping-taxonomy]. 1030 4.2 Payload Header Usage 1032 The TID value indicates (among other things) the relative 1033 importance of an RTP packet, for example because NAL units 1034 belonging to higher temporal sub-layers are not used for the 1035 decoding of lower temporal sub-layers. A lower value of TID 1036 indicates a higher importance. More important NAL units MAY be 1037 better protected against transmission losses than less important 1038 NAL units. 1040 4.3 Payload Structures 1042 The first two bytes of the payload of an RTP packet are referred 1043 to as the payload header. The payload header consists of the 1044 same fields (F, Type, LayerId, and TID) as the NAL unit header as 1045 shown in section 1.1.4, irrespective of the type of the payload 1046 structure. 1048 Four different types of RTP packet payload structures are 1049 specified. A receiver can identify the type of an RTP packet 1050 payload through the Type field in the payload header. 1052 The four different payload structures are as follows: 1054 o Single NAL unit packet: Contains a single NAL unit in the 1055 payload, and the NAL unit header of the NAL unit also serves 1056 as the payload header. This payload structure is specified in 1057 section 4.6. 1059 o Aggregation packet (AP): Contains more than one NAL unit 1060 within one access unit. This payload structure is specified 1061 in section 4.7. 1063 o Fragmentation unit (FU): Contains a subset of a single NAL 1064 unit.
This payload structure is specified in section 4.8. 1066 o PACI carrying RTP packet: Contains a payload header (that 1067 differs from other payload headers for efficiency), a Payload 1068 Header Extension Structure (PHES), and a PACI payload. This 1069 payload structure is specified in section 4.9. 1071 4.4 Transmission Modes 1073 This memo enables transmission of an HEVC bitstream over 1075 . a single RTP stream on a single Media Transport (SRST), 1076 . multiple RTP streams over a single Media Transport (MRST), 1077 or 1078 . multiple RTP streams over multiple Media Transports (MRMT). 1080 Informative Note: While this specification enables the use of 1081 MRST within the H.265 RTP payload, the signaling of MRST within 1082 SDP Offer/Answer is not fully specified at the time of this 1083 writing. See [RFC5576] and [RFC5583] for what is supported 1084 today as well as [I-D.ietf-avtcore-rtp-multi-stream] and [I- 1085 D.ietf-mmusic-sdp-bundle-negotiation] for future directions. 1087 When in MRMT, the dependency of one RTP stream on another RTP 1088 stream is typically indicated as specified in [RFC5583]. 1089 [RFC5583] can also be utilized to specify dependencies within 1090 MRST, but only if the RTP streams utilize distinct payload types. 1091 When an RTP stream A depends on another RTP stream B, the RTP 1092 stream B is referred to as a dependee RTP stream of the RTP 1093 stream A. 1095 SRST or MRST SHOULD be used for point-to-point unicast scenarios, 1096 while MRMT SHOULD be used for point-to-multipoint multicast 1097 scenarios where different receivers require different operation 1098 points of the same HEVC bitstream, to improve bandwidth utilization 1099 efficiency. 1101 Informative note: A multicast may degrade to a unicast after 1102 all but one receiver has left (this is a justification of 1103 the first "SHOULD" instead of "MUST"), and there might be 1104 scenarios where MRMT is desirable but not possible, e.g.
when 1105 IP multicast is not deployed in a certain network (this is a 1106 justification of the second "SHOULD" instead of "MUST"). 1108 The transmission mode is indicated by the tx-mode media parameter 1109 (see section 7.1). If tx-mode is equal to "SRST", SRST MUST be 1110 used. Otherwise, if tx-mode is equal to "MRST", MRST MUST be 1111 used. Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used. 1113 Receivers MUST support all of SRST, MRST, and MRMT. 1115 Informative note: The required support of MRMT by receivers 1116 does not imply that multicast must be supported by receivers. 1118 4.5 Decoding Order Number 1120 For each NAL unit, the variable AbsDon is derived, representing 1121 the decoding order number that is indicative of the NAL unit 1122 decoding order. 1124 Let NAL unit n be the n-th NAL unit in transmission order within 1125 an RTP stream. 1127 If tx-mode is equal to "SRST" and sprop-max-don-diff is equal 1128 to 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived 1129 as equal to n. 1131 Otherwise (tx-mode is equal to "MRST" or "MRMT" or sprop-max-don- 1132 diff is greater than 0), AbsDon[n] is derived as follows, where 1133 DON[n] is the value of the variable DON for NAL unit n: 1135 o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit 1136 in transmission order), AbsDon[0] is set equal to DON[0].
1138 o Otherwise (n is greater than 0), the following applies for 1139 derivation of AbsDon[n]: 1141 If DON[n] == DON[n-1], 1142 AbsDon[n] = AbsDon[n-1] 1144 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), 1145 AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] 1147 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), 1148 AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] 1150 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), 1151 AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - 1152 DON[n]) 1154 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), 1155 AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) 1157 For any two NAL units m and n, the following applies: 1159 o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n 1160 follows NAL unit m in NAL unit decoding order. 1162 o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding 1163 order of the two NAL units can be in either order. 1165 o AbsDon[n] less than AbsDon[m] indicates that NAL unit n 1166 precedes NAL unit m in decoding order. 1168 When two consecutive NAL units in the NAL unit decoding order 1169 have different values of AbsDon, the value of AbsDon for the 1170 second NAL unit in decoding order MUST be greater than the value 1171 of AbsDon for the first NAL unit, and the absolute difference 1172 between the two AbsDon values MAY be greater than or equal to 1. 1174 Informative note: There are multiple reasons to allow for the 1175 absolute difference of the values of AbsDon for two 1176 consecutive NAL units in the NAL unit decoding order to be 1177 greater than one. An increment by one is not required, as at 1178 the time of associating values of AbsDon to NAL units, it may 1179 not be known whether all NAL units are to be delivered to the 1180 receiver. For example, a gateway may not forward VCL NAL 1181 units of higher sub-layers or some SEI NAL units when there is 1182 congestion in the network. 
In another example, the first 1183 intra-coded picture of a pre-encoded clip is transmitted in 1184 advance to ensure that it is readily available in the 1185 receiver, and when transmitting the first intra-coded picture, 1186 the originator does not know exactly how many NAL units will 1187 be encoded before the first intra-coded picture of the pre- 1188 encoded clip follows in decoding order. Thus, the values of 1189 AbsDon for the NAL units of the first intra-coded picture of 1190 the pre-encoded clip have to be estimated when they are 1191 transmitted, and gaps in values of AbsDon may occur. Another 1192 example is MRST or MRMT where the AbsDon values must indicate 1193 cross-layer decoding order for NAL units conveyed in all the 1194 RTP streams. 1196 4.6 Single NAL Unit Packets 1198 A single NAL unit packet contains exactly one NAL unit, and 1199 consists of a payload header (denoted as PayloadHdr), a 1200 conditional 16-bit DONL field (in network byte order), and the 1201 NAL unit payload data (the NAL unit excluding its NAL unit 1202 header) of the contained NAL unit, as shown in Figure 3. 1204 0 1 2 3 1205 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1206 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1207 | PayloadHdr | DONL (conditional) | 1208 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1209 | | 1210 | NAL unit payload data | 1211 | | 1212 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1213 | :...OPTIONAL RTP padding | 1214 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1216 Figure 3 The structure of a single NAL unit packet 1218 The payload header SHOULD be an exact copy of the NAL unit header 1219 of the contained NAL unit. However, the Type (i.e. 1220 nal_unit_type) field MAY be changed, e.g. when it is desirable to 1221 handle a CRA picture as a BLA picture [JCTVC-J0107].
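Because the DON value carried in the 16-bit DONL field wraps around, a receiver has to unwrap it into the AbsDon variable of Section 4.5. The following is a minimal sketch of that derivation, transcribing the five cases listed in Section 4.5 for the case where DON values are in use. The function names are hypothetical, and the input is assumed to be the list of DON values in transmission order within one RTP stream.

```python
def next_abs_don(prev_abs_don: int, prev_don: int, don: int) -> int:
    """Derive AbsDon[n] from AbsDon[n-1], DON[n-1], and DON[n]."""
    if don == prev_don:
        return prev_abs_don
    if don > prev_don and don - prev_don < 32768:
        # Forward step without wrap-around.
        return prev_abs_don + don - prev_don
    if don < prev_don and prev_don - don >= 32768:
        # Forward step across the 65535 -> 0 wrap-around.
        return prev_abs_don + 65536 - prev_don + don
    if don > prev_don and don - prev_don >= 32768:
        # Backward step across the wrap-around.
        return prev_abs_don - (prev_don + 65536 - don)
    # Remaining case: don < prev_don and prev_don - don < 32768,
    # i.e. a backward step without wrap-around.
    return prev_abs_don - (prev_don - don)


def derive_abs_don(dons: list) -> list:
    """Return AbsDon for every NAL unit, given DON in transmission order."""
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)  # AbsDon[0] is set equal to DON[0]
        else:
            abs_dons.append(next_abs_don(abs_dons[-1], dons[n - 1], don))
    return abs_dons
```

For instance, the DON sequence 65534, 65535, 0, 1 yields AbsDon values 65534, 65535, 65536, 65537, so decoding order remains monotonic across the wrap-around.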
1223 The DONL field, when present, specifies the value of the 16 least 1224 significant bits of the decoding order number of the contained 1225 NAL unit. If tx-mode is equal to "MRST" or "MRMT" or sprop-max- 1226 don-diff is greater than 0, the DONL field MUST be present, and 1227 the variable DON for the contained NAL unit is derived as equal 1228 to the value of the DONL field. Otherwise (tx-mode is equal to 1229 "SRST" and sprop-max-don-diff is equal to 0), the DONL field MUST 1230 NOT be present. 1232 4.7 Aggregation Packets (APs) 1234 Aggregation packets (APs) are introduced to enable the reduction 1235 of packetization overhead for small NAL units, such as most of 1236 the non-VCL NAL units, which are often only a few octets in size. 1238 An AP aggregates NAL units within one access unit. Each NAL unit 1239 to be carried in an AP is encapsulated in an aggregation unit. 1240 NAL units aggregated in one AP are in NAL unit decoding order. 1242 An AP consists of a payload header (denoted as PayloadHdr) 1243 followed by two or more aggregation units, as shown in Figure 4. 1245 0 1 2 3 1246 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1247 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1248 | PayloadHdr (Type=48) | | 1249 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1250 | | 1251 | two or more aggregation units | 1252 | | 1253 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1254 | :...OPTIONAL RTP padding | 1255 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1257 Figure 4 The structure of an aggregation packet 1259 The fields in the payload header are set as follows. The F bit 1260 MUST be equal to 0 if the F bit of each aggregated NAL unit is 1261 equal to zero; otherwise, it MUST be equal to 1. The Type field 1262 MUST be equal to 48. The value of LayerId MUST be equal to the 1263 lowest value of LayerId of all the aggregated NAL units. The 1264 value of TID MUST be the lowest value of TID of all the 1265 aggregated NAL units. 
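The rules above for the AP payload header can be sketched as follows. This is an illustration only: the function name is hypothetical, and each aggregated NAL unit is assumed to be represented by a tuple (F, LayerId, TID) taken from its NAL unit header.

```python
AP_TYPE = 48  # Type value of an aggregation packet


def build_ap_payload_header(nal_headers) -> bytes:
    """Build the two-byte PayloadHdr of an AP.

    nal_headers: iterable of (f, layer_id, tid) tuples, one per
    aggregated NAL unit (an assumption made for this sketch).
    """
    headers = list(nal_headers)
    if len(headers) < 2:
        raise ValueError("an AP carries two or more aggregation units")
    # F is 0 only if the F bit of every aggregated NAL unit is 0.
    f = 0 if all(h[0] == 0 for h in headers) else 1
    # LayerId and TID take the lowest value among the aggregated NAL units.
    layer_id = min(h[1] for h in headers)
    tid = min(h[2] for h in headers)
    value = (f << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid
    return value.to_bytes(2, "big")
```

With two aggregated NAL units whose headers are (0, 0, 1) and (0, 0, 2), the resulting payload header is 0x60 0x01.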
1267 Informative Note: All VCL NAL units in an AP have the same TID 1268 value since they belong to the same access unit. However, an 1269 AP may contain non-VCL NAL units for which the TID value in 1270 the NAL unit header may be different from the TID value of the 1271 VCL NAL units in the same AP. 1273 An AP MUST carry at least two aggregation units and can carry as 1274 many aggregation units as necessary; however, the total amount of 1275 data in an AP MUST fit into an IP packet, and the size 1276 SHOULD be chosen so that the resulting IP packet is smaller than 1277 the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT 1278 contain Fragmentation Units (FUs) specified in section 4.8. APs 1279 MUST NOT be nested; i.e. an AP MUST NOT contain another AP. 1281 The first aggregation unit in an AP consists of a conditional 16- 1282 bit DONL field (in network byte order) followed by 16-bit 1283 unsigned size information (in network byte order) that indicates 1284 the size of the NAL unit in bytes (excluding these two octets, 1285 but including the NAL unit header), followed by the NAL unit 1286 itself, including its NAL unit header, as shown in Figure 5. 1288 0 1 2 3 1289 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1290 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1291 : DONL (conditional) | NALU size | 1292 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1293 | NALU size | | 1294 +-+-+-+-+-+-+-+-+ NAL unit | 1295 | | 1296 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1297 | : 1298 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1300 Figure 5 The structure of the first aggregation unit in an AP 1302 The DONL field, when present, specifies the value of the 16 least 1303 significant bits of the decoding order number of the aggregated 1304 NAL unit.
1306 If tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-diff is 1307 greater than 0, the DONL field MUST be present in an aggregation 1308 unit that is the first aggregation unit in an AP, and the 1309 variable DON for the aggregated NAL unit is derived as equal to 1310 the value of the DONL field. Otherwise (tx-mode is equal to 1311 "SRST" and sprop-max-don-diff is equal to 0), the DONL field MUST 1312 NOT be present in an aggregation unit that is the first 1313 aggregation unit in an AP. 1315 An aggregation unit that is not the first aggregation unit in an 1316 AP consists of a conditional 8-bit DOND field followed by a 16- 1317 bit unsigned size information (in network byte order) that 1318 indicates the size of the NAL unit in bytes (excluding these two 1319 octets, but including the NAL unit header), followed by the NAL 1320 unit itself, including its NAL unit header, as shown in Figure 6. 1322 0 1 2 3 1323 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1324 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1325 : DOND (cond) | NALU size | 1326 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1327 | | 1328 | NAL unit | 1329 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1330 | : 1331 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1333 Figure 6 The structure of an aggregation unit that is not the 1334 first aggregation unit in an AP 1336 When present, the DOND field plus 1 specifies the difference 1337 between the decoding order number values of the current 1338 aggregated NAL unit and the preceding aggregated NAL unit in the 1339 same AP. 
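The aggregation unit layouts of Figures 5 and 6 can be walked with a short parsing sketch. It makes several assumptions that are not part of this memo: the function name is hypothetical, the caller passes the AP payload with the two-byte payload header and any RTP padding already removed, and the caller states through the don_present argument whether the DONL and DOND fields are present. DON values are treated as 16-bit quantities that wrap around.

```python
def parse_aggregation_units(payload: bytes, don_present: bool) -> list:
    """Return a list of (don, nal_unit) pairs from an AP payload.

    don is None when the DONL and DOND fields are not present.
    """
    units = []
    pos = 0
    don = None
    while pos < len(payload):
        if don_present:
            if not units:
                # First aggregation unit: 16-bit DONL in network byte order.
                if pos + 2 > len(payload):
                    raise ValueError("truncated DONL field")
                don = int.from_bytes(payload[pos:pos + 2], "big")
                pos += 2
            else:
                # Later aggregation units: 8-bit DOND; DOND plus 1 is the
                # DON difference to the preceding aggregated NAL unit.
                don = (don + payload[pos] + 1) % 65536
                pos += 1
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        size = int.from_bytes(payload[pos:pos + 2], "big")
        pos += 2
        if pos + size > len(payload):
            raise ValueError("NALU size exceeds the remaining payload")
        # The NAL unit includes its own two-byte NAL unit header.
        units.append((don, payload[pos:pos + size]))
        pos += size
    return units
```

A receiver would typically reject an AP for which this walk yields fewer than two NAL units, since an AP carries at least two aggregation units.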
1341 If tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-diff is 1342 greater than 0, the DOND field MUST be present in an aggregation 1343 unit that is not the first aggregation unit in an AP, and the 1344 variable DON for the aggregated NAL unit is derived as equal to 1345 the DON of the preceding aggregated NAL unit in the same AP plus 1346 the value of the DOND field plus 1 modulo 65536. Otherwise (tx- 1347 mode is equal to "SRST" and sprop-max-don-diff is equal to 0), 1348 the DOND field MUST NOT be present in an aggregation unit that is 1349 not the first aggregation unit in an AP, and in this case the 1350 transmission order and decoding order of NAL units carried in the 1351 AP are the same as the order the NAL units appear in the AP. 1353 Figure 7 presents an example of an AP that contains two 1354 aggregation units, labeled as 1 and 2 in the figure, without the 1355 DONL and DOND fields being present. 1357 0 1 2 3 1358 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1359 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1360 | RTP Header | 1361 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1362 | PayloadHdr (Type=48) | NALU 1 Size | 1363 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1364 | NALU 1 HDR | | 1365 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 1 Data | 1366 | . . . | 1367 | | 1368 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1369 | . . . | NALU 2 Size | NALU 2 HDR | 1370 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1371 | NALU 2 HDR | | 1372 +-+-+-+-+-+-+-+-+ NALU 2 Data | 1373 | . . . 
| 1374 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1375 | :...OPTIONAL RTP padding | 1376 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1378 Figure 7 An example of an AP packet containing two aggregation 1379 units without the DONL and DOND fields 1381 Figure 8 presents an example of an AP that contains two 1382 aggregation units, labeled as 1 and 2 in the figure, with the 1383 DONL and DOND fields being present. 1385 0 1 2 3 1386 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1387 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1388 | RTP Header | 1389 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1390 | PayloadHdr (Type=48) | NALU 1 DONL | 1391 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1392 | NALU 1 Size | NALU 1 HDR | 1393 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1394 | | 1395 | NALU 1 Data . . . | 1396 | | 1397 + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1398 | | NALU 2 DOND | NALU 2 Size | 1399 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1400 | NALU 2 HDR | | 1401 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | 1402 | | 1403 | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1404 | :...OPTIONAL RTP padding | 1405 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1407 Figure 8 An example of an AP containing two aggregation units 1408 with the DONL and DOND fields 1410 4.8 Fragmentation Units (FUs) 1412 Fragmentation units (FUs) are introduced to enable fragmenting a 1413 single NAL unit into multiple RTP packets, possibly without 1414 cooperation or knowledge of the HEVC encoder. A fragment of a NAL 1415 unit consists of an integer number of consecutive octets of that 1416 NAL unit. Fragments of the same NAL unit MUST be sent in consecutive 1417 order with ascending RTP sequence numbers (with no other RTP packets 1418 within the same RTP stream being sent between the first and last 1419 fragment). 
1421 When a NAL unit is fragmented and conveyed within FUs, it is 1422 referred to as a fragmented NAL unit. APs MUST NOT be 1423 fragmented. FUs MUST NOT be nested; i.e. an FU MUST NOT contain 1424 a subset of another FU. 1426 The RTP timestamp of an RTP packet carrying an FU is set to the 1427 NALU-time of the fragmented NAL unit. 1429 An FU consists of a payload header (denoted as PayloadHdr), an FU 1430 header of one octet, a conditional 16-bit DONL field (in network 1431 byte order), and an FU payload, as shown in Figure 9. 1433 0 1 2 3 1434 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1435 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1436 | PayloadHdr (Type=49) | FU header | DONL (cond) | 1437 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| 1438 | DONL (cond) | | 1439 |-+-+-+-+-+-+-+-+ | 1440 | FU payload | 1441 | | 1442 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1443 | :...OPTIONAL RTP padding | 1444 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1446 Figure 9 The structure of an FU 1448 The fields in the payload header are set as follows. The Type 1449 field MUST be equal to 49. The fields F, LayerId, and TID MUST 1450 be equal to the fields F, LayerId, and TID, respectively, of the 1451 fragmented NAL unit. 1453 The FU header consists of an S bit, an E bit, and a 6-bit FuType 1454 field, as shown in Figure 10. 1456 +---------------+ 1457 |0|1|2|3|4|5|6|7| 1458 +-+-+-+-+-+-+-+-+ 1459 |S|E| FuType | 1460 +---------------+ 1462 Figure 10 The structure of FU header 1464 The semantics of the FU header fields are as follows: 1465 S: 1 bit 1466 When set to one, the S bit indicates the start of a fragmented 1467 NAL unit i.e. the first byte of the FU payload is also the 1468 first byte of the payload of the fragmented NAL unit. When 1469 the FU payload is not the start of the fragmented NAL unit 1470 payload, the S bit MUST be set to zero. 
1472 E: 1 bit 1473 When set to one, the E bit indicates the end of a fragmented 1474 NAL unit, i.e. the last byte of the payload is also the last 1475 byte of the fragmented NAL unit. When the FU payload is not 1476 the last fragment of a fragmented NAL unit, the E bit MUST be 1477 set to zero. 1479 FuType: 6 bits 1480 The field FuType MUST be equal to the field Type of the 1481 fragmented NAL unit. 1483 The DONL field, when present, specifies the value of the 16 least 1484 significant bits of the decoding order number of the fragmented 1485 NAL unit. 1487 If tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-diff is 1488 greater than 0, and the S bit is equal to 1, the DONL field MUST 1489 be present in the FU, and the variable DON for the fragmented NAL 1490 unit is derived as equal to the value of the DONL field. 1491 Otherwise (tx-mode is equal to "SRST" and sprop-max-don-diff is 1492 equal to 0, or the S bit is equal to 0), the DONL field MUST NOT 1493 be present in the FU. 1495 A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e. 1496 the Start bit and End bit MUST NOT both be set to one in the same 1497 FU header. 1499 The FU payload consists of fragments of the payload of the 1500 fragmented NAL unit so that if the FU payloads of consecutive 1501 FUs, starting with an FU with the S bit equal to 1 and ending 1502 with an FU with the E bit equal to 1, are sequentially 1503 concatenated, the payload of the fragmented NAL unit can be 1504 reconstructed. The NAL unit header of the fragmented NAL unit is 1505 not included as such in the FU payload, but rather the 1506 information of the NAL unit header of the fragmented NAL unit is 1507 conveyed in F, LayerId, and TID fields of the FU payload headers 1508 of the FUs and the FuType field of the FU header of the FUs. An 1509 FU payload MUST NOT be empty. 
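Informative note: The FU rules above can be illustrated with the following Python sketch of fragmentation and reassembly. It is a sketch under stated assumptions, not a reference implementation: the names fragment_nal, reassemble_fus, max_fu_payload and don_present are hypothetical, the input to reassemble_fus is assumed to be the complete, in-order set of FUs of one NAL unit, and RTP headers and padding are assumed to have been handled elsewhere.

```python
FU_TYPE = 49  # NAL unit type value carried in the payload header of an FU


def fragment_nal(nal: bytes, max_fu_payload: int, donl=None):
    """Split one NAL unit into FU packet payloads (PayloadHdr onwards)."""
    if max_fu_payload < 1:
        raise ValueError("FU payload MUST NOT be empty")
    header, body = nal[:2], nal[2:]
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    if len(chunks) < 2:
        # S and E must not both be set, so a single-FU NAL unit is not allowed.
        raise ValueError("NAL unit fits in one packet; do not use an FU")
    nal_type = (header[0] >> 1) & 0x3F
    # Keep F, LayerId and TID of the fragmented NAL unit; set Type to 49.
    payload_hdr = bytes([(header[0] & 0x81) | (FU_TYPE << 1), header[1]])
    fus = []
    for i, chunk in enumerate(chunks):
        s_bit = 0x80 if i == 0 else 0
        e_bit = 0x40 if i == len(chunks) - 1 else 0
        fu = payload_hdr + bytes([s_bit | e_bit | nal_type])
        if i == 0 and donl is not None:
            fu += donl.to_bytes(2, "big")  # DONL only in the FU with S = 1
        fus.append(fu + chunk)
    return fus


def reassemble_fus(fus, don_present=False):
    """Rebuild (DONL, NAL unit) from a complete, in-order FU sequence."""
    first, last = fus[0], fus[-1]
    if not (first[2] & 0x80) or not (last[2] & 0x40):
        raise ValueError("missing start or end fragment")
    nal_type = first[2] & 0x3F
    # The NAL unit header is not carried in the FU payload; rebuild it from
    # F, LayerId, TID of the payload header and FuType of the FU header.
    nal_hdr = bytes([(first[0] & 0x81) | (nal_type << 1), first[1]])
    donl = None
    body = b""
    for fu in fus:
        offset = 3
        if don_present and fu[2] & 0x80:
            donl = int.from_bytes(fu[3:5], "big")
            offset = 5
        body += fu[offset:]
    return donl, nal_hdr + body
```

Note that max_fu_payload bounds only the FU payload; the first FU is two octets longer when the DONL field is present, which a sender would need to account for when matching the MTU size.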
1511 If an FU is lost, the receiver SHOULD discard all following 1512 fragmentation units in transmission order corresponding to the 1513 same fragmented NAL unit, unless the decoder in the receiver is 1514 known to be prepared to gracefully handle incomplete NAL units. 1516 A receiver in an endpoint or in a MANE MAY aggregate the first 1517 n-1 fragments of a NAL unit to an (incomplete) NAL unit, even if 1518 fragment n of that NAL unit is not received. In this case, the 1519 forbidden_zero_bit of the NAL unit MUST be set to one to indicate 1520 a syntax violation. 1522 4.9 PACI packets 1524 This section specifies the PACI packet structure. The basic 1525 payload header specified in this memo is intentionally limited to 1526 the 16 bits of the NAL unit header so as to keep the packetization 1527 overhead to a minimum. However, cases have been identified where 1528 it is advisable to include control information in an easily 1529 accessible position in the packet header, despite the additional 1530 overhead. One such piece of control information is the Temporal 1531 Scalability Control Information as specified in section 4.10 1532 below. PACI packets carry this and future, similar structures. 1534 The PACI packet structure is based on a payload header extension 1535 mechanism that is generic and extensible to carry payload header 1536 extensions. In this section, the focus lies on the use within 1537 this specification. Section 4.9.2 below provides guidance for 1538 specification designers on how to employ the extension 1539 mechanism in future specifications. 1541 A PACI packet consists of a payload header (denoted as 1542 PayloadHdr), for which the structure follows what is described in 1543 section 4.3 above. The payload header is followed by the fields 1544 A, cType, PHSsize, F[0..2] and Y. 1546 Figure 11 shows a PACI packet in compliance with this memo; that 1547 is, without any extensions.
1549 0 1 2 3 1550 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1552 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1554 | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| 1555 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1557 | Payload Header Extension Structure (PHES) | 1559 |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=| 1560 | | 1561 | PACI payload: NAL unit | 1562 | . . . | 1563 | | 1564 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1566 | :...OPTIONAL RTP padding | 1567 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1570 Figure 11 The structure of a PACI 1572 The fields in the payload header are set as follows. The F bit 1573 MUST be equal to 0. The Type field MUST be equal to 50. The 1574 value of LayerId MUST be a copy of the LayerId field of the PACI 1575 payload NAL unit or NAL-unit-like structure. The value of TID 1576 MUST be a copy of the TID field of the PACI payload NAL unit or 1577 NAL-unit-like structure. 1579 The semantics of other fields are as follows: 1581 A: 1 bit 1582 Copy of the F bit of the PACI payload NAL unit or NAL-unit- 1583 like structure. 1585 cType: 6 bits 1586 Copy of the Type field of the PACI payload NAL unit or NAL- 1587 unit-like structure. 1589 PHSsize: 5 bits 1590 Indicates the length of the PHES field. The value is limited 1591 to be less than or equal to 32 octets, to simplify encoder 1592 design for MTU size matching. 1594 F0 1595 This field equal to 1 specifies the presence of a temporal 1596 scalability support extension in the PHES. 1598 F1, F2 1599 MUST be 0, available for future extensions, see section 4.9.2. 1601 Y: 1 bit 1602 MUST be 0, available for future extensions, see section 4.9.2. 1604 PHES: variable number of octets 1605 A variable number of octets as indicated by the value of 1606 PHSsize.
1608 PACI Payload 1609 The single NAL unit packet or NAL-unit-like structure (such 1610 as: FU or AP) to be carried, not including the first two 1611 octets. 1613 Informative note: The first two octets of the NAL unit or 1614 NAL-unit-like structure carried in the PACI payload are not 1615 included in the PACI payload. Rather, the respective values 1616 are copied in locations of the PayloadHdr of the RTP 1617 packet. This design offers two advantages: first, the 1618 overall structure of the payload header is preserved, i.e. 1619 there is no special case of payload header structure that 1620 needs to be implemented for PACI. Second, no additional 1621 overhead is introduced. 1623 A PACI payload MAY be a single NAL unit, an FU, or an AP. 1624 PACIs MUST NOT be fragmented or aggregated. The following 1625 subsection documents the reasons for these design choices. 1627 4.9.1 Reasons for the PACI rules (informative) 1629 A PACI cannot be fragmented. If a PACI could be fragmented, and 1630 a fragment other than the first fragment would get lost, access 1631 to the information in the PACI would not be possible. Therefore, 1632 a PACI must not be fragmented. In other words, an FU must not 1633 carry (fragments of) a PACI. 1635 A PACI cannot be aggregated. Aggregation of PACIs is inadvisable 1636 from a compression viewpoint, as, in many cases, several to be 1637 aggregated NAL units would share identical PACI fields and values 1638 which would be carried redundantly for no reason. Most, if not 1639 all the practical effects of PACI aggregation can be achieved by 1640 aggregating NAL units and bundling them with a PACI (see below). 1641 Therefore, a PACI must not be aggregated. In other words, an AP 1642 must not contain a PACI. 1644 The payload of a PACI can be a fragment. 
Both middleboxes and 1645 sending systems with inflexible (often hardware-based) encoders 1646 occasionally find themselves in situations where a PACI and its 1647 headers, combined, are larger than the MTU size. In such a 1648 scenario, the middlebox or sender can fragment the NAL unit and 1649 encapsulate the fragment in a PACI. Doing so preserves the 1650 payload header extension information for all fragments, allowing 1651 downstream middleboxes and the receiver to take advantage of that 1652 information. Therefore, a sender may place a fragment into a 1653 PACI, and a receiver must be able to handle such a PACI. 1655 The payload of a PACI can be an aggregation NAL unit. HEVC 1656 bitstreams can contain unevenly sized and/or small (when compared 1657 to the MTU size) NAL units. In order to efficiently packetize 1658 such small NAL units, APs were introduced. The benefits of APs 1659 are independent of the need for a payload header extension. 1660 Therefore, a sender may place an AP into a PACI, and a receiver 1661 must be able to handle such a PACI. 1663 4.9.2 PACI extensions (Informative) 1665 This subsection includes recommendations for future specification 1666 designers on how to extend the PACI syntax to accommodate future 1667 extensions. Obviously, designers are free to specify whatever 1668 appears to be appropriate to them at the time of their design. 1669 However, a lot of thought has been invested into the extension 1670 mechanism described below, and we suggest that deviations from it 1671 warrant a good explanation. 1673 This memo defines only a single payload header extension (Temporal 1674 Scalability Control Information, described below in section 4.10), 1675 and, therefore, only the F0 bit carries semantics. F1 and F2 are 1676 already named (and not just marked as reserved, as a typical video 1677 spec designer would do). They are intended to signal two additional 1678 extensions.
The Y bit allows further F and Y 1679 bits to be added recursively, extending the mechanism beyond 3 possible payload header 1680 extensions. It is suggested that a new packet type be defined (using a 1681 different value for Type) when assigning the F1, F2, or Y bits 1682 semantics different from those suggested below. 1684 When a Y bit is set, an 8-bit flag-extension is inserted after 1685 the Y bit. A flag-extension consists of 7 flags F[n..n+6], and 1686 another Y bit. 1688 The basic PACI header already includes F0, F1, and F2. 1689 Therefore, the Fx bits in the first flag-extension are numbered 1690 F3, F4, ..., F9, the Fx bits in the second flag-extension are 1691 numbered F10, F11, ..., F16, and so forth. As a result, at least 1692 3 Fx bits are always in the PACI, but the number of Fx bits (and 1693 associated types of extensions) can be increased by setting the 1694 next Y bit and adding an octet of flag-extensions, carrying 7 1695 flags and another Y bit. The size of this list of flags is 1696 subject to the limits specified in section 4.9 (32 octets for all 1697 flag-extensions and the PHES information combined). 1699 Each F bit can indicate either the presence of 1700 information in the Payload Header Extension Structure (PHES), 1701 described below, or a certain 1702 condition, without including additional information in the PHES. 1704 When a spec developer devises a new syntax that takes advantage 1705 of the PACI extension mechanism, the 1706 constraints listed below must be followed; otherwise the extension mechanism may 1707 break. 1709 1) The fields added for a particular Fx bit MUST be fixed in 1710 length and not depend on what other Fx bits are set (no 1711 parsing dependency). 1712 2) The Fx bits must be assigned in order.
1713 3) An implementation that supports the n-th Fn bit for any 1714 value of n must understand the syntax (though not 1715 necessarily the semantics) of the fields Fk (with k < n), so 1716 as to be able to either use those bits when present, or at 1717 least be able to skip over them. 1719 4.10 Temporal Scalability Control Information 1721 This section describes the single payload header extension 1722 defined in this specification, known as Temporal Scalability 1723 Control Information (TSCI). If, in the future, additional 1724 payload header extensions become necessary, they could be 1725 specified in this section of an updated version of this document, 1726 or in their own documents. 1728 When F0 is set to 1 in a PACI, this specifies that the PHES field 1729 includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as 1730 follows: 1732 0 1 2 3 1733 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1735 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1737 | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| 1738 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1740 | TL0PICIDX | IrapPicID |S|E| RES | | 1741 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1742 | .... | 1743 | PACI payload: NAL unit | 1744 | | 1745 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1747 | :...OPTIONAL RTP padding | 1748 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1751 Figure 12 The structure of a PACI with a PHES containing a TSCI 1753 TL0PICIDX (8 bits) 1754 When present, the TL0PICIDX field MUST be set equal to 1755 temporal_sub_layer_zero_idx as specified in Section D.3.22 of 1756 [HEVC] for the access unit containing the NAL unit in the 1757 PACI. 1759 IrapPicID (8 bits) 1760 When present, the IrapPicID field MUST be set equal to 1761 irap_pic_id as specified in Section D.3.22 of [HEVC] for the 1762 access unit containing the NAL unit in the PACI.
1764 S (1 bit) 1765 The S bit MUST be set to 1 if any of the following conditions 1766 is true and MUST be set to 0 otherwise: 1767 o The NAL unit in the payload of the PACI is the first VCL NAL 1768 unit, in decoding order, of a picture. 1770 o The NAL unit in the payload of the PACI is an AP and the NAL 1771 unit in the first contained aggregation unit is the first 1772 VCL NAL unit, in decoding order, of a picture. 1773 o The NAL unit in the payload of the PACI is an FU with its S 1774 bit equal to 1 and the FU payload containing a fragment of 1775 the first VCL NAL unit, in decoding order of a picture. 1777 E (1 bit) 1778 The E bit MUST be set to 1 if any of the following conditions 1779 is true and MUST be set to 0 otherwise: 1780 o The NAL unit in the payload of the PACI is the last VCL NAL 1781 unit, in decoding order, of a picture. 1782 o The NAL unit in the payload of the PACI is an AP and the NAL 1783 unit in the last contained aggregation unit is the last VCL 1784 NAL unit, in decoding order, of a picture. 1785 o The NAL unit in the payload of the PACI is an FU with its E 1786 bit equal to 1 and the FU payload containing a fragment of 1787 the last VCL NAL unit, in decoding order of a picture. 1789 RES (6 bits) 1790 MUST be equal to 0. Reserved for future extensions. 1792 The value of PHSsize MUST be set to 3. Receivers MUST allow 1793 other values of the fields F0, F1, F2, Y, and PHSsize, and MUST 1794 ignore any additional fields, when present, than specified above 1795 in the PHES. 1797 5 Packetization Rules 1799 The following packetization rules apply: 1801 o If tx-mode is equal to "MRST" or "MRMT" or sprop-max-don-diff is 1802 greater than 0 for an RTP stream, the transmission order of NAL 1803 units carried in the RTP stream MAY be different than the NAL 1804 unit decoding order. 
Otherwise (tx-mode is equal to "SRST" and 1805 sprop-max-don-diff is equal to 0 for an RTP stream), the 1806 transmission order of NAL units carried in the RTP stream MUST 1807 be the same as the NAL unit decoding order. 1809 o A NAL unit of a small size SHOULD be encapsulated in an 1810 aggregation packet together with one or more other NAL units 1811 in order to avoid the unnecessary packetization overhead for 1812 small NAL units. For example, non-VCL NAL units such as 1813 access unit delimiters, parameter sets, or SEI NAL units are 1814 typically small and can often be aggregated with VCL NAL units 1815 without violating MTU size constraints. 1817 o Each non-VCL NAL unit SHOULD, when possible from an MTU size 1818 match viewpoint, be encapsulated in an aggregation packet 1819 together with its associated VCL NAL unit, as typically a non- 1820 VCL NAL unit would be meaningless without the associated VCL 1821 NAL unit being available. 1823 o For carrying exactly one NAL unit in an RTP packet, a single 1824 NAL unit packet MUST be used. 1826 6 De-packetization Process 1828 The general concept behind de-packetization is to get the NAL 1829 units out of the RTP packets in an RTP stream and all RTP streams 1830 the RTP stream depends on, if any, and pass them to the decoder 1831 in the NAL unit decoding order. 1833 The de-packetization process is implementation dependent. 1834 Therefore, the following description should be seen as an example 1835 of a suitable implementation. Other schemes may be used as well 1836 as long as the output for the same input is the same as the 1837 process described below. The output is the same when the set of 1838 output NAL units and their order are both identical. 1839 Optimizations relative to the described algorithms are possible. 1841 All normal RTP mechanisms related to buffer management apply. 
In 1842 particular, duplicated or outdated RTP packets (as indicated by 1843 the RTP sequence number and the RTP timestamp) are removed. To 1844 determine the exact time for decoding, factors such as a possible 1845 intentional delay to allow for proper inter-stream 1846 synchronization must be taken into account. 1848 NAL units with NAL unit type values in the range of 0 to 47, 1849 inclusive, may be passed to the decoder. NAL-unit-like structures 1850 with NAL unit type values in the range of 48 to 63, inclusive, 1851 MUST NOT be passed to the decoder. 1853 The receiver includes a receiver buffer, which is used to 1854 compensate for transmission delay jitter within individual RTP 1855 streams and across RTP streams, to reorder NAL units from 1856 transmission order to the NAL unit decoding order, and to recover 1857 the NAL unit decoding order in MRST or MRMT, when applicable. In 1858 this section, the receiver operation is described under the 1859 assumption that there is no transmission delay jitter within an 1860 RTP stream and across RTP streams. To distinguish it from a 1861 practical receiver buffer that is also used for compensation of 1862 transmission delay jitter, the receiver buffer is hereafter 1863 called the de-packetization buffer in this section. Receivers 1864 should also prepare for transmission delay jitter; i.e. either 1865 reserve separate buffers for transmission delay jitter buffering 1866 and de-packetization buffering or use a receiver buffer for both 1867 transmission delay jitter and de-packetization. Moreover, 1868 receivers should take transmission delay jitter into account in 1869 the buffering operation; e.g. by additional initial buffering 1870 before the start of decoding and playback. 1872 If only one RTP stream is being received and sprop-max-don-diff 1873 of the only RTP stream being received is equal to 0, the de- 1874 packetization buffer size is zero bytes, i.e.
the NAL units 1875 carried in the RTP stream are directly passed to the decoder in 1876 their transmission order, which is identical to the decoding 1877 order of the NAL units. Otherwise, the process described in the 1878 remainder of this section applies. 1880 There are two buffering states in the receiver: initial buffering 1881 and buffering while playing. Initial buffering starts when the 1882 reception is initialized. After initial buffering, decoding and 1883 playback are started, and the buffering-while-playing mode is 1884 used. 1886 Regardless of the buffering state, the receiver stores incoming 1887 NAL units, in reception order, into the de-packetization buffer. 1888 NAL units carried in RTP packets are stored in the de- 1889 packetization buffer individually, and the value of AbsDon is 1890 calculated and stored for each NAL unit. When MRST or MRMT is in 1891 use, NAL units of all RTP streams of a bitstream are stored in 1892 the same de-packetization buffer. When NAL units carried in any 1893 two RTP streams are available to be placed into the de- 1894 packetization buffer, those NAL units carried in the RTP stream 1895 that is lower in the dependency tree are placed into the buffer 1896 first. For example, if RTP stream A depends on RTP stream B, 1897 then NAL units carried in RTP stream B are placed into the buffer 1898 first. 1900 Initial buffering lasts until condition A (the difference between 1901 the greatest and smallest AbsDon values of the NAL units in the 1902 de-packetization buffer is greater than or equal to the value of 1903 sprop-max-don-diff of the highest RTP stream) or condition B (the 1904 number of NAL units in the de-packetization buffer is greater 1905 than the value of sprop-depack-buf-nalus) is true. 
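Informative note: The two conditions that end initial buffering can be expressed as the following Python sketch. The function name initial_buffering_done and its argument names are hypothetical; abs_dons is assumed to hold the AbsDon value of every NAL unit currently stored in the de-packetization buffer, and sprop_max_don_diff is assumed to be the value signalled for the highest RTP stream.

```python
def initial_buffering_done(abs_dons, sprop_max_don_diff, sprop_depack_buf_nalus):
    """Return True once condition A or condition B holds for the buffer."""
    if not abs_dons:
        return False  # nothing buffered yet
    # Condition A: spread of AbsDon values reaches sprop-max-don-diff.
    cond_a = max(abs_dons) - min(abs_dons) >= sprop_max_don_diff
    # Condition B: more NAL units buffered than sprop-depack-buf-nalus.
    cond_b = len(abs_dons) > sprop_depack_buf_nalus
    return cond_a or cond_b
```

A practical receiver would probably track the smallest and greatest AbsDon incrementally (for example with a heap) instead of rescanning the buffer; the linear scan is kept here only for clarity.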
1907 After initial buffering, whenever condition A or condition B is 1908 true, the following operation is repeatedly applied until both 1909 condition A and condition B become false: 1911 o The NAL unit in the de-packetization buffer with the smallest 1912 value of AbsDon is removed from the de-packetization buffer 1913 and passed to the decoder. 1915 When no more NAL units are flowing into the de-packetization 1916 buffer, all NAL units remaining in the de-packetization buffer 1917 are removed from the buffer and passed to the decoder in the 1918 order of increasing AbsDon values. 1920 7 Payload Format Parameters 1922 This section specifies the parameters that MAY be used to select 1923 optional features of the payload format and certain features or 1924 properties of the bitstream or the RTP stream. The parameters 1925 are specified here as part of the media type registration for the 1926 HEVC codec. A mapping of the parameters into the Session 1927 Description Protocol (SDP) [RFC4566] is also provided for 1928 applications that use SDP. Equivalent parameters could be 1929 defined elsewhere for use with control protocols that do not use 1930 SDP. 1932 7.1 Media Type Registration 1934 The media subtype for the HEVC codec is allocated from the IETF 1935 tree. 1937 The receiver MUST ignore any unrecognized parameter. 1939 Media Type name: video 1941 Media subtype name: H265 1943 Required parameters: none 1945 OPTIONAL parameters: 1947 profile-space, tier-flag, profile-id, profile-compatibility- 1948 indicator, interop-constraints, and level-id: 1950 These parameters indicate the profile, tier, default level, 1951 and some constraints of the bitstream carried by the RTP 1952 stream and all RTP streams the RTP stream depends on, or a 1953 specific set of the profile, tier, default level, and some 1954 constraints the receiver supports.
1956 The profile and some constraints are indicated collectively 1957 by profile-space, profile-id, profile-compatibility- 1958 indicator, and interop-constraints. The profile specifies 1959 the subset of coding tools that may have been used to 1960 generate the bitstream or that the receiver supports. 1962 Informative note: There are 32 values of profile-id, and 1963 there are 32 flags in profile-compatibility-indicator, 1964 each flag corresponding to one value of profile-id. 1965 According to HEVC version 1 in [HEVC], when more than 1966 one of the 32 flags is set for a bitstream, the 1967 bitstream would comply with all the profiles 1968 corresponding to the set flags. However, in a draft of 1969 HEVC version 2 in [HEVC draft v2], subclause A.3.5, 19 1970 Format Range Extensions profiles have been specified, 1971 all using the same value of profile-id (4), 1972 differentiated by some of the 48 bits in interop- 1973 constraints - this (rather unexpected way of profile 1974 signalling) means that one of the 32 flags may 1975 correspond to multiple profiles. To be able to support 1976 whatever HEVC extension profile that might be specified 1977 and indicated using profile-space, profile-id, profile- 1978 compatibility-indicator, and interop-constraints in the 1979 future, it would be safe to require symmetric use of 1980 these parameters in SDP offer/answer unless recv-sub- 1981 layer-id is included in the SDP answer for choosing one 1982 of the sub-layers offered. 1984 The tier is indicated by tier-flag. The default level is 1985 indicated by level-id. The tier and the default level 1986 specify the limits on values of syntax elements or 1987 arithmetic combinations of values of syntax elements that 1988 are followed when generating the bitstream or that the 1989 receiver supports. 
1991 A set of profile-space, tier-flag, profile-id, profile- 1992 compatibility-indicator, interop-constraints, and level-id 1993 parameters ptlA is said to be consistent with another set 1994 of these parameters ptlB if any decoder that conforms to 1995 the profile, tier, level, and constraints indicated by ptlB 1996 can decode any bitstream that conforms to the profile, 1997 tier, level, and constraints indicated by ptlA. 1999 In SDP offer/answer, when the SDP answer does not include 2000 the recv-sub-layer-id parameter that is less than the 2001 sprop-sub-layer-id parameter in the SDP offer, the 2002 following applies: 2004 o The profile-space, tier-flag, profile-id, profile- 2005 compatibility-indicator, and interop-constraints 2006 parameters MUST be used symmetrically, i.e. the value 2007 of each of these parameters in the offer MUST be the 2008 same as that in the answer, either explicitly 2009 signalled or implicitly inferred. 2010 o The level-id parameter is changeable as long as the 2011 highest level indicated by the answer is either equal 2012 to or lower than that in the offer. Note that the 2013 highest level is indicated by level-id and max-recv- 2014 level-id together. 2016 In SDP offer/answer, when the SDP answer does include the 2017 recv-sub-layer-id parameter that is less than the sprop- 2018 sub-layer-id parameter in the SDP offer, the set of 2019 profile-space, tier-flag, profile-id, profile- 2020 compatibility-indicator, interop-constraints, and level-id 2021 parameters included in the answer MUST be consistent with 2022 that for the chosen sub-layer representation as indicated 2023 in the SDP offer, with the exception that the level-id 2024 parameter in the SDP answer is changeable as long as the 2025 highest level indicated by the answer is either lower than 2026 or equal to that in the offer.
2028 Further details of these parameters, including how they 2029 relate to the values of the profile, tier, and level syntax 2030 elements specified in [HEVC], are provided below. 2032 profile-space, profile-id: 2034 The value of profile-space MUST be in the range of 0 to 3, 2035 inclusive. The value of profile-id MUST be in the range of 2036 0 to 31, inclusive. 2038 When profile-space is not present, a value of 0 MUST be 2039 inferred. When profile-id is not present, a value of 1 2040 (i.e. the Main profile) MUST be inferred. 2042 When used to indicate properties of a bitstream, profile- 2043 space and profile-id are derived from the profile, tier, 2044 and level syntax elements in SPS or VPS NAL units as 2045 follows, where general_profile_space, general_profile_idc, 2046 sub_layer_profile_space[j], and sub_layer_profile_idc[j] 2047 are specified in [HEVC]: 2049 If the RTP stream is the highest RTP stream, the 2050 following applies: 2052 o profile-space = general_profile_space 2053 o profile-id = general_profile_idc 2054 Otherwise (the RTP stream is a dependee RTP stream), the 2055 following applies, with j being the value of the sprop- 2056 sub-layer-id parameter: 2058 o profile-space = sub_layer_profile_space[j] 2059 o profile-id = sub_layer_profile_idc[j] 2061 tier-flag, level-id: 2063 The value of tier-flag MUST be in the range of 0 to 1, 2064 inclusive. The value of level-id MUST be in the range of 0 2065 to 255, inclusive. 2067 If the tier-flag and level-id parameters are used to 2068 indicate properties of a bitstream, they indicate the tier 2069 and the highest level the bitstream complies with. 2071 If the tier-flag and level-id parameters are used for 2072 capability exchange, the following applies. If max-recv- 2073 level-id is not present, the default level defined by 2074 level-id indicates the highest level the codec wishes to 2075 support. Otherwise, max-recv-level-id indicates the 2076 highest level the codec supports for receiving.
For either 2077 receiving or sending, all levels that are lower than the 2078 highest level supported MUST also be supported. 2080 If no tier-flag is present, a value of 0 MUST be inferred 2081 and if no level-id is present, a value of 93 (i.e. level 2082 3.1) MUST be inferred. 2084 When used to indicate properties of a bitstream, the tier- 2085 flag and level-id parameters are derived from the profile, 2086 tier, and level syntax elements in SPS or VPS NAL units as 2087 follows, where general_tier_flag, general_level_idc, 2088 sub_layer_tier_flag[j], and sub_layer_level_idc[j] are 2089 specified in [HEVC]: 2091 If the RTP stream is the highest RTP stream, the 2092 following applies: 2094 o tier-flag = general_tier_flag 2095 o level-id = general_level_idc 2097 Otherwise (the RTP stream is a dependee RTP stream), the 2098 following applies, with j being the value of the sprop- 2099 sub-layer-id parameter: 2101 o tier-flag = sub_layer_tier_flag[j] 2102 o level-id = sub_layer_level_idc[j] 2104 interop-constraints: 2106 A base16 [RFC4648] (hexadecimal) representation of six 2107 bytes of data, consisting of progressive_source_flag, 2108 interlaced_source_flag, non_packed_constraint_flag, 2109 frame_only_constraint_flag, and reserved_zero_44bits. 
2111 If the interop-constraints parameter is not present, the 2112 following MUST be inferred: 2114 o progressive_source_flag = 1 2115 o interlaced_source_flag = 0 2116 o non_packed_constraint_flag = 1 2117 o frame_only_constraint_flag = 1 2118 o reserved_zero_44bits = 0 2120 When the interop-constraints parameter is used to indicate 2121 properties of a bitstream, the following applies, where 2122 general_progressive_source_flag, 2123 general_interlaced_source_flag, 2124 general_non_packed_constraint_flag, 2126 general_frame_only_constraint_flag, 2127 general_reserved_zero_44bits, 2128 sub_layer_progressive_source_flag[j], 2129 sub_layer_interlaced_source_flag[j], 2130 sub_layer_non_packed_constraint_flag[j], 2131 sub_layer_frame_only_constraint_flag[j], and 2132 sub_layer_reserved_zero_44bits[j] are specified in [HEVC]: 2134 If the RTP stream is the highest RTP stream, the 2135 following applies: 2137 o progressive_source_flag = 2138 general_progressive_source_flag 2139 o interlaced_source_flag = 2140 general_interlaced_source_flag 2141 o non_packed_constraint_flag = 2142 general_non_packed_constraint_flag 2143 o frame_only_constraint_flag = 2144 general_frame_only_constraint_flag 2145 o reserved_zero_44bits = general_reserved_zero_44bits 2147 Otherwise (the RTP stream is a dependee RTP stream), the 2148 following applies, with j being the value of the sprop- 2149 sub-layer-id parameter: 2151 o progressive_source_flag = 2152 sub_layer_progressive_source_flag[j] 2153 o interlaced_source_flag = 2154 sub_layer_interlaced_source_flag[j] 2155 o non_packed_constraint_flag = 2157 sub_layer_non_packed_constraint_flag[j] 2158 o frame_only_constraint_flag = 2160 sub_layer_frame_only_constraint_flag[j] 2161 o reserved_zero_44bits = 2162 sub_layer_reserved_zero_44bits[j] 2164 Using interop-constraints for capability exchange results 2165 in a requirement on any bitstream to be compliant with the 2166 interop-constraints.
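Informative example (not part of this specification): a minimal sketch of unpacking the six-byte interop-constraints value into its fields. The function name is hypothetical. The sketch assumes the fields are packed most significant bit first, in the order listed in the parameter definition, so the inferred default corresponds to the string "B00000000000".

```python
def parse_interop_constraints(hex_value="B00000000000"):
    """Split a base16 interop-constraints value into its five fields.

    Assumption: fields are packed MSB first in the order given in the
    parameter definition (four 1-bit flags, then 44 reserved bits).
    """
    raw = bytes.fromhex(hex_value)
    if len(raw) != 6:
        raise ValueError("interop-constraints must encode exactly six bytes")
    bits = int.from_bytes(raw, "big")
    return {
        "progressive_source_flag": (bits >> 47) & 1,
        "interlaced_source_flag": (bits >> 46) & 1,
        "non_packed_constraint_flag": (bits >> 45) & 1,
        "frame_only_constraint_flag": (bits >> 44) & 1,
        "reserved_zero_44bits": bits & ((1 << 44) - 1),
    }
```

Calling the function with no argument yields the values that are inferred when the parameter is absent.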
2168 profile-compatibility-indicator: 2170 A base16 [RFC4648] representation of four bytes of data. 2172 When profile-compatibility-indicator is used to indicate 2173 properties of a bitstream, the following applies, where 2174 general_profile_compatibility_flag[j] and 2175 sub_layer_profile_compatibility_flag[i][j] are specified in 2176 [HEVC]: 2178 The profile-compatibility-indicator in this case 2179 indicates the profiles to which the bitstream conforms 2180 in addition to the profile defined by profile_space, 2181 profile_id, and interop-constraints. A decoder that conforms to any 2182 of the profiles the bitstream conforms to would be 2183 capable of decoding the bitstream. These additional 2184 profiles are defined by profile-space, each set bit of 2185 profile-compatibility-indicator, and interop- 2186 constraints. 2188 If the RTP stream is the highest RTP stream, the 2189 following applies for each value of j in the range of 0 2190 to 31, inclusive: 2192 o bit j of profile-compatibility-indicator = 2193 general_profile_compatibility_flag[j] 2195 Otherwise (the RTP stream is a dependee RTP stream), the 2196 following applies for i equal to sprop-sub-layer-id and 2197 for each value of j in the range of 0 to 31, inclusive: 2199 o bit j of profile-compatibility-indicator = 2200 sub_layer_profile_compatibility_flag[i][j] 2202 Using profile-compatibility-indicator for capability 2203 exchange results in a requirement on any bitstream to be 2204 compliant with the profile-compatibility-indicator. This 2205 is intended to handle cases where any future HEVC profile 2206 is defined as an intersection of two or more profiles. 2208 If this parameter is not present, this parameter defaults 2209 to the following: bit j, with j equal to profile-id, of 2210 profile-compatibility-indicator is inferred to be equal to 2211 1, and all other bits are inferred to be equal to 0.
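Informative example (not part of this specification): the default rule above can be sketched as follows. Both function names are hypothetical. The text does not state how bit j maps onto the four bytes; the sketch assumes bit 0 is the most significant bit of the first byte, mirroring the order in which the flags appear in the bitstream.

```python
def default_profile_compatibility_flags(profile_id=1):
    """Return the 32 inferred flags: only bit j == profile-id is set."""
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id must be in the range 0..31")
    return [1 if j == profile_id else 0 for j in range(32)]


def flags_to_base16(flags):
    """Pack 32 flags into eight hex digits.

    Assumption: flag 0 becomes the most significant bit of the first byte.
    """
    if len(flags) != 32:
        raise ValueError("exactly 32 flags are required")
    value = 0
    for flag in flags:
        value = (value << 1) | (flag & 1)
    return f"{value:08X}"
```

Under that bit-order assumption, the default for the Main profile (profile-id equal to 1) packs to "40000000".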
2213 sprop-sub-layer-id: 2215 This parameter MAY be used to indicate the highest allowed 2216 value of TID in the bitstream. When not present, the value 2217 of sprop-sub-layer-id is inferred to be equal to 6. 2219 The value of sprop-sub-layer-id MUST be in the range of 0 2220 to 6, inclusive. 2222 recv-sub-layer-id: 2224 This parameter MAY be used to signal a receiver's choice of 2225 the offered or declared sub-layer representations in the 2226 sprop-vps. The value of recv-sub-layer-id indicates the 2227 TID of the highest sub-layer of the bitstream that a 2228 receiver supports. When not present, the value of recv- 2229 sub-layer-id is inferred to be equal to the value of the 2230 sprop-sub-layer-id parameter in the SDP offer. 2232 The value of recv-sub-layer-id MUST be in the range of 0 to 2233 6, inclusive. 2235 max-recv-level-id: 2237 This parameter MAY be used to indicate the highest level a 2238 receiver supports. The highest level the receiver supports 2239 is equal to the value of max-recv-level-id divided by 30. 2241 The value of max-recv-level-id MUST be in the range of 0 2242 to 255, inclusive. 2244 When max-recv-level-id is not present, the value is 2245 inferred to be equal to level-id. 2247 max-recv-level-id MUST NOT be present when the highest 2248 level the receiver supports is not higher than the default 2249 level. 2251 tx-mode: 2253 This parameter indicates whether the transmission mode is 2254 SRST, MRST, or MRMT. 2256 The value of tx-mode MUST be equal to "SRST", "MRST" or 2257 "MRMT". When not present, the value of tx-mode is inferred 2258 to be equal to "SRST". 2260 If the value is equal to "MRST", MRST MUST be in use. 2261 Otherwise, if the value is equal to "MRMT", MRMT MUST be in 2262 use. Otherwise (the value is equal to "SRST"), SRST MUST be 2263 in use. 2265 The value of tx-mode MUST be equal to "MRST" for all RTP 2266 streams in an MRST. 2268 The value of tx-mode MUST be equal to "MRMT" for all RTP 2269 streams in an MRMT. 
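Informative example (not part of this specification): a minimal sketch of how the inference and range rules for recv-sub-layer-id, max-recv-level-id, and tx-mode above might be applied on the answerer side. The function name, the dictionary representation, and the "highest-level" key are hypothetical.

```python
TX_MODES = ("SRST", "MRST", "MRMT")


def resolve_receiver_params(answer, offer):
    """Apply defaults and range checks for three receiver-side parameters."""
    level_id = answer.get("level-id", 93)                   # default: level 3.1
    sprop_sub_layer_id = offer.get("sprop-sub-layer-id", 6)
    # max-recv-level-id MUST NOT be present unless it exceeds the default level.
    if "max-recv-level-id" in answer and answer["max-recv-level-id"] <= level_id:
        raise ValueError("max-recv-level-id must be higher than level-id")
    resolved = {
        "recv-sub-layer-id": answer.get("recv-sub-layer-id", sprop_sub_layer_id),
        "max-recv-level-id": answer.get("max-recv-level-id", level_id),
        "tx-mode": answer.get("tx-mode", "SRST"),
    }
    if not 0 <= resolved["recv-sub-layer-id"] <= 6:
        raise ValueError("recv-sub-layer-id out of range")
    if not 0 <= resolved["max-recv-level-id"] <= 255:
        raise ValueError("max-recv-level-id out of range")
    if resolved["tx-mode"] not in TX_MODES:
        raise ValueError("tx-mode must be SRST, MRST, or MRMT")
    # The level number is the parameter value divided by 30.
    resolved["highest-level"] = resolved["max-recv-level-id"] / 30
    return resolved
```

With no parameters present on either side, the sketch resolves to sub-layer 6, level 3.1, and SRST.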
2271 sprop-vps: 2273 This parameter MAY be used to convey any video parameter 2274 set NAL unit of the bitstream for out-of-band transmission 2275 of video parameter sets. The parameter MAY also be used 2276 for capability exchange and to indicate sub-stream 2277 characteristics (i.e. properties of sub-layer 2278 representations as defined in [HEVC]). The value of the 2279 parameter is a comma-separated (',') list of base64 2280 [RFC4648] representations of the video parameter set NAL 2281 units as specified in Section 7.3.2.1 of [HEVC]. 2283 The sprop-vps parameter MAY contain one or more than one 2284 video parameter set NAL unit. However, all other video 2285 parameter sets contained in the sprop-vps parameter MUST be 2286 consistent with the first video parameter set in the sprop- 2287 vps parameter. A video parameter set vpsB is said to be 2288 consistent with another video parameter set vpsA if any 2289 decoder that conforms to the profile, tier, level, and 2290 constraints indicated by the 12 bytes of data starting from 2291 the syntax element general_profile_space to the syntax 2292 element general_level_id, inclusive, in the first 2293 profile_tier_level( ) syntax structure in vpsA can decode 2294 any bitstream that conforms to the profile, tier, level, 2295 and constraints indicated by the 12 bytes of data starting 2296 from the syntax element general_profile_space to the syntax 2297 element general_level_id, inclusive, in the first 2298 profile_tier_level( ) syntax structure in vpsB. 2300 sprop-sps: 2302 This parameter MAY be used to convey sequence parameter set 2303 NAL units of the bitstream for out-of-band transmission of 2304 sequence parameter sets. The value of the parameter is a 2305 comma-separated (',') list of base64 [RFC4648] 2306 representations of the sequence parameter set NAL units as 2307 specified in Section 7.3.2.2 of [HEVC]. 
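Informative example (not part of this specification): a minimal sketch of decoding the comma-separated base64 lists carried in the parameter-set parameters. The function name is hypothetical. The NAL unit type values (32 for VPS, 33 for SPS, 34 for PPS) and the position of nal_unit_type in the first header byte are taken from [HEVC]; checking them here is an optional sanity test, not a requirement stated above.

```python
import base64

# nal_unit_type values from [HEVC] for the three parameter-set parameters.
EXPECTED_NAL_TYPE = {"sprop-vps": 32, "sprop-sps": 33, "sprop-pps": 34}


def decode_sprop(name, value):
    """Decode a comma-separated base64 list into raw NAL units."""
    nal_units = []
    for item in value.split(","):
        nal = base64.b64decode(item, validate=True)
        if len(nal) < 2:
            raise ValueError("a NAL unit needs at least its two-byte header")
        # nal_unit_type occupies bits 1..6 of the first header byte.
        nal_type = (nal[0] >> 1) & 0x3F
        if nal_type != EXPECTED_NAL_TYPE[name]:
            raise ValueError(f"{name} carries unexpected NAL unit type {nal_type}")
        nal_units.append(nal)
    return nal_units
```

The decoded NAL units can then be handed to the decoder ahead of the first RTP packet, which is the purpose of out-of-band transmission.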
2309 sprop-pps: 2311 This parameter MAY be used to convey picture parameter set 2312 NAL units of the bitstream for out-of-band transmission of 2313 picture parameter sets. The value of the parameter is a 2314 comma-separated (',') list of base64 [RFC4648] 2315 representations of the picture parameter set NAL units as 2316 specified in Section 7.3.2.3 of [HEVC]. 2318 sprop-sei: 2320 This parameter MAY be used to convey one or more SEI 2321 messages that describe bitstream characteristics. When 2322 present, a decoder can rely on the bitstream 2323 characteristics that are described in the SEI messages for 2324 the entire duration of the session, independently from the 2325 persistence scopes of the SEI messages as specified in 2326 [HEVC]. 2328 The value of the parameter is a comma-separated (',') list 2329 of base64 [RFC4648] representations of SEI NAL units as 2330 specified in Section 7.3.2.4 of [HEVC]. 2332 Informative note: Intentionally, no list of applicable 2333 or inapplicable SEI messages is specified here. 2334 Conveying certain SEI messages in sprop-sei may be 2335 sensible in some application scenarios and meaningless 2336 in others. However, a few examples are described below: 2338 1) In an environment where the bitstream was created 2339 from film-based source material, and no splicing is 2340 going to occur during the lifetime of the session, 2341 the film grain characteristics SEI message or the 2342 tone mapping information SEI message are likely 2343 meaningful, and sending them in sprop-sei rather than 2344 in the bitstream at each entry point may help save 2345 bits and allows the renderer to be configured only once, 2346 avoiding unwanted artifacts. 2347 2) The structure of pictures information SEI message in 2348 sprop-sei can be used to inform a decoder of 2349 information on the NAL unit types, picture order 2350 count values, and prediction dependencies of a 2351 sequence of pictures.
Having such knowledge can be 2352 helpful for error recovery. 2353 3) Examples of SEI messages that would be meaningless 2354 to convey in sprop-sei include the decoded 2355 picture hash SEI message (it is close to impossible 2356 that all decoded pictures have the same hash-tag), 2357 the display orientation SEI message when the device 2358 is a handheld device (as the display orientation may 2359 change when the handheld device is turned around), or 2360 the filler payload SEI message (as there is no point 2361 in just having more bits in SDP). 2363 max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc: 2365 These parameters MAY be used to signal the capabilities of 2366 a receiver implementation. These parameters MUST NOT be 2367 used for any other purpose. The highest level (specified 2368 by max-recv-level-id) MUST be one that the receiver is 2369 fully capable of supporting. max-lsr, max-lps, max-cpb, 2370 max-dpb, max-br, max-tr, and max-tc MAY be used to indicate 2371 capabilities of the receiver that extend the required 2372 capabilities of the highest level, as specified below. 2374 When more than one parameter from the set (max-lsr, max- 2375 lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is present, 2376 the receiver MUST support all signaled capabilities 2377 simultaneously. For example, if both max-lsr and max-br 2378 are present, the highest level with the extension of both 2379 the picture rate and bitrate is supported. That is, the 2380 receiver is able to decode bitstreams in which the luma 2381 sample rate is up to max-lsr (inclusive), the bitrate is up 2382 to max-br (inclusive), the coded picture buffer size is 2383 derived as specified in the semantics of the max-br 2384 parameter below, and the other properties comply with the 2385 highest level specified by max-recv-level-id.
2387 Informative note: When the OPTIONAL media type 2388 parameters are used to signal the properties of a 2389 bitstream, and max-lsr, max-lps, max-cpb, max-dpb, max- 2390 br, max-tr, and max-tc are not present, the values of 2391 profile-space, tier-flag, profile-id, profile- 2392 compatibility-indicator, interop-constraints, and level- 2393 id must always be such that the bitstream complies fully 2394 with the specified profile, tier, and level. 2396 max-lsr: 2397 The value of max-lsr is an integer indicating the maximum 2398 processing rate in units of luma samples per second. The 2399 max-lsr parameter signals that the receiver is capable of 2400 decoding video at a higher rate than is required by the 2401 highest level. 2403 When max-lsr is signaled, the receiver MUST be able to 2404 decode bitstreams that conform to the highest level, with 2405 the exception that the MaxLumaSR value in Table A-2 of 2406 [HEVC] for the highest level is replaced with the value of 2407 max-lsr. Senders MAY use this knowledge to send pictures 2408 of a given size at a higher picture rate than is indicated 2409 in the highest level. 2411 When not present, the value of max-lsr is inferred to be 2412 equal to the value of MaxLumaSR given in Table A-2 of 2413 [HEVC] for the highest level. 2415 The value of max-lsr MUST be in the range of MaxLumaSR to 2416 16 * MaxLumaSR, inclusive, where MaxLumaSR is given in 2417 Table A-2 of [HEVC] for the highest level. 2419 max-lps: 2420 The value of max-lps is an integer indicating the maximum 2421 picture size in units of luma samples. The max-lps 2422 parameter signals that the receiver is capable of decoding 2423 larger picture sizes than are required by the highest 2424 level. When max-lps is signaled, the receiver MUST be able 2425 to decode bitstreams that conform to the highest level, 2426 with the exception that the MaxLumaPS value in Table A-1 of 2427 [HEVC] for the highest level is replaced with the value of 2428 max-lps. 
Senders MAY use this knowledge to send larger 2429 pictures at a proportionally lower picture rate than is 2430 indicated in the highest level. 2432 When not present, the value of max-lps is inferred to be 2433 equal to the value of MaxLumaPS given in Table A-1 of 2434 [HEVC] for the highest level. 2436 The value of max-lps MUST be in the range of MaxLumaPS to 2437 16 * MaxLumaPS, inclusive, where MaxLumaPS is given in 2438 Table A-1 of [HEVC] for the highest level. 2440 max-cpb: 2441 The value of max-cpb is an integer indicating the maximum 2442 coded picture buffer size in units of CpbBrVclFactor bits 2443 for the VCL HRD parameters and in units of CpbBrNalFactor 2444 bits for the NAL HRD parameters, where CpbBrVclFactor and 2445 CpbBrNalFactor are defined in Section A.4 of [HEVC]. The 2446 max-cpb parameter signals that the receiver has more memory 2447 than the minimum amount of coded picture buffer memory 2448 required by the highest level. When max-cpb is signaled, 2449 the receiver MUST be able to decode bitstreams that conform 2450 to the highest level, with the exception that the MaxCPB 2451 value in Table A-1 of [HEVC] for the highest level is 2452 replaced with the value of max-cpb. Senders MAY use this 2453 knowledge to construct coded bitstreams with greater 2454 variation of bitrate than can be achieved with the MaxCPB 2455 value in Table A-1 of [HEVC]. 2457 When not present, the value of max-cpb is inferred to be 2458 equal to the value of MaxCPB given in Table A-1 of [HEVC] 2459 for the highest level. 2461 The value of max-cpb MUST be in the range of MaxCPB to 2462 16 * MaxCPB, inclusive, where MaxCPB is given in Table 2463 A-1 of [HEVC] for the highest level. 2465 Informative note: The coded picture buffer is used in 2466 the hypothetical reference decoder (Annex C of HEVC).
2467 The use of the hypothetical reference decoder is 2468 recommended in HEVC encoders to verify that the produced 2469 bitstream conforms to the standard and to control the 2470 output bitrate. Thus, the coded picture buffer is 2471 conceptually independent of any other potential buffers 2472 in the receiver, including de-packetization and de- 2473 jitter buffers. The coded picture buffer need not be 2474 implemented in decoders as specified in Annex C of HEVC, 2475 but rather standard-compliant decoders can have any 2476 buffering arrangements provided that they can decode 2477 standard-compliant bitstreams. Thus, in practice, the 2478 input buffer for a video decoder can be integrated with 2479 de-packetization and de-jitter buffers of the receiver. 2481 max-dpb: 2482 The value of max-dpb is an integer indicating the maximum 2483 decoded picture buffer size in units of decoded pictures at 2484 the MaxLumaPS for the highest level, i.e. the number of 2485 decoded pictures at the maximum picture size defined by the 2486 highest level. The value of max-dpb MUST be in the range 2487 of 1 to 16, inclusive. The max-dpb parameter signals 2488 that the receiver has more memory than the minimum amount 2489 of decoded picture buffer memory required by default, which 2490 is MaxDpbPicBuf as defined in [HEVC] (equal to 6). When 2491 max-dpb is signaled, the receiver MUST be able to decode 2492 bitstreams that conform to the highest level, with the 2493 exception that the MaxDpbPicBuf value defined in [HEVC] as 2494 6 is replaced with the value of max-dpb.
Consequently, a 2495 receiver that signals max-dpb MUST be capable of storing 2496 the following number of decoded pictures (MaxDpbSize) in 2497 its decoded picture buffer: 2499 if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) ) 2500 MaxDpbSize = Min( 4 * max-dpb, 16 ) 2501 else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) ) 2502 MaxDpbSize = Min( 2 * max-dpb, 16 ) 2503 else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 2504 ) ) 2505 MaxDpbSize = Min( (4 * max-dpb) / 3, 16 ) 2506 else 2507 MaxDpbSize = max-dpb 2509 where MaxLumaPS is given in Table A-1 of [HEVC] for the 2510 highest level and PicSizeInSamplesY is the current size of 2511 each decoded picture in units of luma samples as defined in 2512 [HEVC]. 2514 The value of max-dpb MUST be greater than or equal to the 2515 value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. 2516 Senders MAY use this knowledge to construct coded 2517 bitstreams with improved compression. 2519 When not present, the value of max-dpb is inferred to be 2520 equal to the value of MaxDpbPicBuf (i.e. 6) as defined in 2521 [HEVC]. 2523 Informative note: This parameter was added primarily to 2524 complement a similar codepoint in the ITU-T 2525 Recommendation H.245, so as to facilitate signaling 2526 gateway designs. The decoded picture buffer stores 2527 reconstructed samples. There is no relationship between 2528 the size of the decoded picture buffer and the buffers 2529 used in RTP, especially de-packetization and de-jitter 2530 buffers. 2532 max-br: 2533 The value of max-br is an integer indicating the maximum 2534 video bitrate in units of CpbBrVclFactor bits per second 2535 for the VCL HRD parameters and in units of CpbBrNalFactor 2536 bits per second for the NAL HRD parameters, where 2537 CpbBrVclFactor and CpbBrNalFactor are defined in Section 2538 A.4 of [HEVC].
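Informative example (not part of this specification): the MaxDpbSize derivation given in the max-dpb semantics above, transcribed as a small function. The function and argument names are hypothetical. The division in the third branch is treated as integer division, which is an assumption about the notation. The MaxLumaPS value used in the usage note (2228224 for Level 4) is taken from [HEVC].

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures a receiver signalling max-dpb must store.

    max_luma_ps is MaxLumaPS of the highest level; max_dpb defaults to
    MaxDpbPicBuf (6), the value inferred when max-dpb is absent.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: "/" in the derivation denotes integer division.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

For a 1280x720 picture (921600 luma samples) at Level 4, the second branch applies: the result is 12 with the default max-dpb of 6 and is capped at 16 when max-dpb is 8.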
2540 The max-br parameter signals that the video decoder of the 2541 receiver is capable of decoding video at a higher bitrate 2542 than is required by the highest level. 2544 When max-br is signaled, the video codec of the receiver 2545 MUST be able to decode bitstreams that conform to the 2546 highest level, with the following exceptions in the limits 2547 specified by the highest level: 2549 o The value of max-br replaces the MaxBR value in Table A- 2550 2 of [HEVC] for the highest level. 2551 o When the max-cpb parameter is not present, the result of 2552 the following formula replaces the value of MaxCPB in 2553 Table A-1 of [HEVC]: 2555 (MaxCPB of the highest level) * max-br / (MaxBR of 2556 the highest level) 2558 For example, if a receiver signals capability for Main 2559 profile Level 2 with max-br equal to 2000, this indicates a 2560 maximum video bitrate of 2000 kbits/sec for VCL HRD 2561 parameters, a maximum video bitrate of 2200 kbits/sec for 2562 NAL HRD parameters, and a CPB size of 2000000 bits (2000000 2563 / 1500000 * 1500000). 2565 Senders MAY use this knowledge to send higher bitrate video 2566 as allowed in the level definition of Annex A of HEVC to 2567 achieve improved video quality. 2569 When not present, the value of max-br is inferred to be 2570 equal to the value of MaxBR given in Table A-2 of [HEVC] 2571 for the highest level. 2573 The value of max-br MUST be in the range of MaxBR to 2574 16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of 2575 [HEVC] for the highest level. 2577 Informative note: This parameter was added primarily to 2578 complement a similar codepoint in the ITU-T 2579 Recommendation H.245, so as to facilitate signaling 2580 gateway designs. The assumption that the network is 2581 capable of handling such bitrates at any given time 2582 cannot be made from the value of this parameter. 
In 2583 particular, no conclusion can be drawn that the signaled 2584 bitrate is possible under congestion control 2585 constraints. 2587 max-tr: 2588 The value of max-tr is an integer indicating the maximum 2589 number of tile rows. The max-tr parameter signals that the 2590 receiver is capable of decoding video with a larger number 2591 of tile rows than the value allowed by the highest level. 2593 When max-tr is signaled, the receiver MUST be able to 2594 decode bitstreams that conform to the highest level, with 2595 the exception that the MaxTileRows value in Table A-1 of 2596 [HEVC] for the highest level is replaced with the value of 2597 max-tr. 2599 Senders MAY use this knowledge to send pictures utilizing a 2600 larger number of tile rows than the value allowed by the 2601 highest level. 2603 When not present, the value of max-tr is inferred to be 2604 equal to the value of MaxTileRows given in Table A-1 of 2605 [HEVC] for the highest level. 2607 The value of max-tr MUST be in the range of MaxTileRows to 2608 16 * MaxTileRows, inclusive, where MaxTileRows is given in 2609 Table A-1 of [HEVC] for the highest level. 2611 max-tc: 2612 The value of max-tc is an integer indicating the maximum 2613 number of tile columns. The max-tc parameter signals that 2614 the receiver is capable of decoding video with a larger 2615 number of tile columns than the value allowed by the 2616 highest level. 2618 When max-tc is signaled, the receiver MUST be able to 2619 decode bitstreams that conform to the highest level, with 2620 the exception that the MaxTileCols value in Table A-1 of 2621 [HEVC] for the highest level is replaced with the value of 2622 max-tc. 2624 Senders MAY use this knowledge to send pictures utilizing a 2625 larger number of tile columns than the value allowed by the 2626 highest level. 2628 When not present, the value of max-tc is inferred to be 2629 equal to the value of MaxTileCols given in Table A-1 of 2630 [HEVC] for the highest level.
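Informative example (not part of this specification): the range rule shared by max-lsr, max-lps, max-cpb, max-br, and max-tr (level limit to 16 times the level limit, inclusive) and the CPB derivation in the max-br semantics can be sketched as follows. The function names are hypothetical, and the level limits are passed in by the caller rather than looked up. The Level 2 figures in the usage note (MaxBR and MaxCPB both 1500, CpbBrVclFactor 1000, CpbBrNalFactor 1100 for the Main profile) come from [HEVC]; no rounding rule is assumed for the derived CPB size.

```python
def check_extension(name, value, level_limit):
    """Verify that an extension parameter lies in [limit, 16 * limit]."""
    if not level_limit <= value <= 16 * level_limit:
        raise ValueError(
            f"{name}={value} outside [{level_limit}, {16 * level_limit}]")
    return value


def effective_br_and_cpb(level_max_br, level_max_cpb, max_br=None, max_cpb=None):
    """Return the (bitrate, CPB size) limits in the units of the level tables."""
    br = level_max_br if max_br is None else check_extension(
        "max-br", max_br, level_max_br)
    if max_cpb is not None:
        cpb = check_extension("max-cpb", max_cpb, level_max_cpb)
    else:
        # Without max-cpb, the CPB size scales with the signalled bitrate.
        cpb = level_max_cpb * br / level_max_br
    return br, cpb
```

For Main profile Level 2 with max-br equal to 2000, the sketch returns (2000, 2000.0); multiplying by 1000 and 1100 gives 2000 kbits/sec for VCL HRD, 2200 kbits/sec for NAL HRD, and a CPB size of 2000000 bits, matching the worked example in the max-br semantics.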
2632 The value of max-tc MUST be in the range of MaxTileCols to 2633 16 * MaxTileCols, inclusive, where MaxTileCols is given in 2634 Table A-1 of [HEVC] for the highest level. 2636 max-fps: 2638 The value of max-fps is an integer indicating the maximum 2639 picture rate in units of pictures per 100 seconds that can 2640 be effectively processed by the receiver. The max-fps 2641 parameter MAY be used to signal that the receiver has a 2642 constraint in that it is not capable of processing video 2643 effectively at the full picture rate that is implied by the 2644 highest level and, when present, one or more of the 2645 parameters max-lsr, max-lps, and max-br. 2647 The value of max-fps is not necessarily the picture rate at 2648 which the maximum picture size can be sent; it constitutes 2649 a constraint on maximum picture rate for all resolutions. 2651 Informative note: The max-fps parameter is semantically 2652 different from max-lsr, max-lps, max-cpb, max-dpb, max- 2653 br, max-tr, and max-tc in that max-fps is used to signal 2654 a constraint, lowering the maximum picture rate from 2655 what is implied by other parameters. 2657 The encoder MUST use a picture rate equal to or less than 2658 this value. In cases where the max-fps parameter is absent, 2659 the encoder is free to choose any picture rate according to 2660 the highest level and any signaled optional parameters. 2662 The value of max-fps MUST be smaller than or equal to the 2663 full picture rate that is implied by the highest level and, 2664 when present, one or more of the parameters max-lsr, max- 2665 lps, and max-br. 2667 sprop-max-don-diff: 2669 The value of this parameter MUST be equal to 0, if the RTP 2670 stream does not depend on other RTP streams and there is no 2671 NAL unit naluA that is followed in transmission order by 2672 any NAL unit preceding naluA in decoding order.
Otherwise, 2673 this parameter specifies the maximum absolute difference 2674 between the decoding order number (i.e., AbsDon) values of 2675 any two NAL units naluA and naluB, where naluA follows 2676 naluB in decoding order and precedes naluB in transmission 2677 order. 2679 The value of sprop-max-don-diff MUST be an integer in the 2680 range of 0 to 32767, inclusive. 2682 When not present, the value of sprop-max-don-diff is 2683 inferred to be equal to 0. 2685 When the RTP stream depends on one or more other RTP 2686 streams (in this case tx-mode MUST be equal to "MRST" or 2687 "MRMT"), this parameter MUST be present and the value MUST 2688 be greater than 0. 2690 Informative note: When the RTP stream does not depend on 2691 other RTP streams, any of SRST, MRST and MRMT may be in 2692 use. 2694 sprop-depack-buf-nalus: 2696 This parameter specifies the maximum number of NAL units 2697 that precede a NAL unit in transmission order and follow 2698 the NAL unit in decoding order. 2700 The value of sprop-depack-buf-nalus MUST be an integer in 2701 the range of 0 to 32767, inclusive. 2703 When not present, the value of sprop-depack-buf-nalus is 2704 inferred to be equal to 0. 2706 When the RTP stream depends on one or more other RTP 2707 streams (in this case tx-mode MUST be equal to "MRST" or 2708 "MRMT"), this parameter MUST be present and the value MUST 2709 be greater than 0. 2711 sprop-depack-buf-bytes: 2713 This parameter signals the required size of the de- 2714 packetization buffer in units of bytes. The value of the 2715 parameter MUST be greater than or equal to the maximum 2716 buffer occupancy (in units of bytes) of the de- 2717 packetization buffer as specified in section 6. 2719 The value of sprop-depack-buf-bytes MUST be an integer in 2720 the range of 0 to 4294967295, inclusive. 
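Informative example (not part of this specification): the definitions of sprop-max-don-diff and sprop-depack-buf-nalus above can be illustrated by computing both values from a list of AbsDon values given in transmission order. The function name is hypothetical, and the quadratic scan is chosen for clarity rather than efficiency.

```python
def interleaving_params(absdon_in_tx_order):
    """Derive (sprop-max-don-diff, sprop-depack-buf-nalus) for one stream.

    Input: AbsDon of each NAL unit, listed in transmission order.
    """
    max_don_diff = 0
    depack_buf_nalus = 0
    for j, don_j in enumerate(absdon_in_tx_order):
        # NAL units sent before unit j that follow it in decoding order.
        sent_early = [d for d in absdon_in_tx_order[:j] if d > don_j]
        if sent_early:
            max_don_diff = max(max_don_diff, max(sent_early) - don_j)
            depack_buf_nalus = max(depack_buf_nalus, len(sent_early))
    return max_don_diff, depack_buf_nalus
```

When transmission order equals decoding order, both values are 0, which matches the inferred defaults.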
2722 When the RTP stream depends on one or more other RTP 2723 streams (in this case tx-mode MUST be equal to "MRST" or 2724 "MRMT") or sprop-max-don-diff is present and greater 2725 than 0, this parameter MUST be present and the value MUST 2726 be greater than 0. 2728 Informative note: The value of sprop-depack-buf-bytes 2729 indicates the required size of the de-packetization 2730 buffer only. When network jitter can occur, an 2731 appropriately sized jitter buffer has to be available as 2732 well. 2734 depack-buf-cap: 2736 This parameter signals the capabilities of a receiver 2737 implementation and indicates the amount of de-packetization 2738 buffer space in units of bytes that the receiver has 2739 available for reconstructing the NAL unit decoding order 2740 from NAL units carried in one or more RTP streams. A 2741 receiver is able to handle any RTP stream, and all RTP 2742 streams the RTP stream depends on, when present, for which 2743 the value of the sprop-depack-buf-bytes parameter is 2744 smaller than or equal to this parameter. 2746 When not present, the value of depack-buf-cap is inferred 2747 to be equal to 4294967295. The value of depack-buf-cap 2748 MUST be an integer in the range of 1 to 4294967295, 2749 inclusive. 2751 Informative note: depack-buf-cap indicates the maximum 2752 possible size of the de-packetization buffer of the 2753 receiver only. When network jitter can occur, an 2754 appropriately sized jitter buffer has to be available as 2755 well. 2757 sprop-segmentation-id: 2759 This parameter MAY be used to signal the segmentation tools 2760 present in the bitstream and that can be used for 2761 parallelization. The value of sprop-segmentation-id MUST 2762 be an integer in the range of 0 to 3, inclusive. When not 2763 present, the value of sprop-segmentation-id is inferred to 2764 be equal to 0. 2766 When sprop-segmentation-id is equal to 0, no information 2767 about the segmentation tools is provided. 
When sprop- 2768 segmentation-id is equal to 1, it indicates that slices are 2769 present in the bitstream. When sprop-segmentation-id is 2770 equal to 2, it indicates that tiles are present in the 2771 bitstream. When sprop-segmentation-id is equal to 3, it 2772 indicates that WPP is used in the bitstream. 2774 sprop-spatial-segmentation-idc: 2776 A base16 [RFC4648] representation of the syntax element 2777 min_spatial_segmentation_idc as specified in [HEVC]. This 2778 parameter MAY be used to describe parallelization 2779 capabilities of the bitstream. 2781 dec-parallel-cap: 2783 This parameter MAY be used to indicate the decoder's 2784 additional decoding capabilities given the presence of 2785 tools enabling parallel decoding, such as slices, tiles, 2786 and WPP, in the bitstream. The decoding capability of the 2787 decoder may vary with the setting of the parallel decoding 2788 tools present in the bitstream, e.g. the size of the tiles 2789 that are present in a bitstream. Therefore, multiple 2790 capability points may be provided, each indicating the 2791 minimum required decoding capability that is associated 2792 with a parallelism requirement, which is a requirement on 2793 the bitstream that enables parallel decoding. 2795 Each capability point is defined as a combination of 1) a 2796 parallelism requirement, 2) a profile (determined by 2797 profile-space and profile-id), 3) a highest level, and 4) a 2798 maximum processing rate, a maximum picture size, and a 2799 maximum video bitrate that may be equal to or greater than 2800 that determined by the highest level. 
The parameter's 2801 syntax in ABNF [RFC5234] is as follows: 2803 dec-parallel-cap = "dec-parallel-cap={" cap-point *("," 2804 cap-point) "}" 2806 cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" 2807 cap-parameter) 2809 spatial-seg-idc = 1*4DIGIT ; (1-4095) 2811 cap-parameter = tier-flag / level-id / max-lsr 2812 / max-lps / max-br 2814 tier-flag = "tier-flag" EQ ("0" / "1") 2816 level-id = "level-id" EQ 1*3DIGIT ; (0-255) 2818 max-lsr = "max-lsr" EQ 1*20DIGIT ; (0- 2819 18,446,744,073,709,551,615) 2821 max-lps = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295) 2823 max-br = "max-br" EQ 1*20DIGIT ; (0- 2824 18,446,744,073,709,551,615) 2826 EQ = "=" 2828 The set of capability points expressed by the dec-parallel- 2829 cap parameter is enclosed in a pair of curly braces ("{}"). 2830 Each set of two consecutive capability points is separated 2831 by a comma (','). Within each capability point, each set 2832 of two consecutive parameters, and when present, their 2833 values, is separated by a semicolon (';'). 2835 The profile of all capability points is determined by 2836 profile-space and profile-id that are outside the dec- 2837 parallel-cap parameter. 2839 Each capability point starts with an indication of the 2840 parallelism requirement, which consists of a parallel tool 2841 type, which may be equal to 'w' or 't', and a decimal value 2842 of the spatial-seg-idc parameter. When the type is 'w', 2843 the capability point is valid only for H.265 bitstreams 2844 with WPP in use, i.e. entropy_coding_sync_enabled_flag 2845 equal to 1. When the type is 't', the capability point is 2846 valid only for H.265 bitstreams with WPP not in use (i.e. 2847 entropy_coding_sync_enabled_flag equal to 0). The 2848 capability-point is valid only for H.265 bitstreams with 2849 min_spatial_segmentation_idc equal to or greater than 2850 spatial-seg-idc. 
2852  After the parallelism requirement indication, each
2853  capability point continues with one or more pairs of
2854  parameter and value in any order for any of the following
2855  parameters:

2857  o tier-flag
2858  o level-id
2859  o max-lsr
2860  o max-lps
2861  o max-br

2863  At most one occurrence of each of the above five parameters
2864  is allowed within each capability point.

2866  The values of dec-parallel-cap.tier-flag and dec-parallel-
2867  cap.level-id for a capability point indicate the highest
2868  level of the capability point. The values of dec-parallel-
2869  cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel-
2870  cap.max-br for a capability point indicate the maximum
2871  processing rate in units of luma samples per second, the
2872  maximum picture size in units of luma samples, and the
2873  maximum video bitrate (in units of CpbBrVclFactor bits per
2874  second for the VCL HRD parameters and in units of
2875  CpbBrNalFactor bits per second for the NAL HRD parameters,
2876  where CpbBrVclFactor and CpbBrNalFactor are defined in
2877  Section A.4 of [HEVC]), respectively.

2879  When not present, the value of dec-parallel-cap.tier-flag
2880  is inferred to be equal to the value of tier-flag outside
2881  the dec-parallel-cap parameter. When not present, the
2882  value of dec-parallel-cap.level-id is inferred to be equal
2883  to the value of max-recv-level-id outside the dec-parallel-
2884  cap parameter. When not present, the value of dec-
2885  parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec-
2886  parallel-cap.max-br is inferred to be equal to the value of
2887  max-lsr, max-lps, or max-br, respectively, outside the dec-
2888  parallel-cap parameter.
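Informative sketch (not part of the payload format): the "at most one occurrence" rule and the inference rules above might be applied as follows. The function name build_cap_point and the representation of parameters as (name, value) pairs and dictionaries are assumptions made for this illustration. Where the outer parameter is itself absent, the sketch leaves the value unset; any defaults for the outer parameters are outside the scope of this sketch.

```python
# Maps each capability-point parameter to the parameter outside
# dec-parallel-cap from which its value is inferred when absent.
INFERRED_FROM = {
    "tier-flag": "tier-flag",
    "level-id": "max-recv-level-id",
    "max-lsr": "max-lsr",
    "max-lps": "max-lps",
    "max-br": "max-br",
}


def build_cap_point(pairs, outer):
    """Combine explicit (name, value) pairs with values inferred from 'outer'."""
    point = {}
    for name, value in pairs:
        if name not in INFERRED_FROM:
            raise ValueError(f"not a capability-point parameter: {name!r}")
        if name in point:
            raise ValueError(f"parameter occurs more than once: {name!r}")
        point[name] = value
    for name, outer_name in INFERRED_FROM.items():
        if name not in point and outer_name in outer:
            point[name] = outer[outer_name]
    return point
```

For example, a capability point that carries only level-id=120, combined with outer parameters tier-flag=0 and max-recv-level-id=93, keeps its own level-id of 120 and inherits tier-flag 0.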
2890  The general decoding capability, expressed by the set of
2891  parameters outside of dec-parallel-cap, is defined as the
2892  capability point that is determined by the following
2893  combination of parameters: 1) the parallelism requirement
2894  corresponding to the value of sprop-segmentation-id equal
2895  to 0 for a bitstream, 2) the profile determined by profile-
2896  space, profile-id, profile-compatibility-indicator, and
2897  interop-constraints, 3) the tier and the highest level
2898  determined by tier-flag and max-recv-level-id, and 4) the
2899  maximum processing rate, the maximum picture size, and the
2900  maximum video bitrate determined by the highest level. The
2901  general decoding capability MUST NOT be included as one of
2902  the set of capability points in the dec-parallel-cap
2903  parameter.

2905  For example, the following parameters express the general
2906  decoding capability of 720p30 (Level 3.1) plus an
2907  additional decoding capability of 1080p30 (Level 4) given
2908  that the spatially largest tile or slice used in the
2909  bitstream is equal to or less than 1/3 of the picture size:

2911  a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-
2912  id=120}

2914  For another example, the following parameters express an
2915  additional decoding capability of 1080p30, using dec-
2916  parallel-cap.max-lsr and dec-parallel-cap.max-lps, given
2917  that WPP is used in the bitstream:

2919  a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2920  max-lsr=62668800;max-lps=2088960}

2922  Informative note: When min_spatial_segmentation_idc is
2923  present in a bitstream and WPP is not used, [HEVC]
2924  specifies that there is no slice or no tile in the
2925  bitstream containing more than 4 * PicSizeInSamplesY /
2926  ( min_spatial_segmentation_idc + 4 ) luma samples.
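Informative sketch (not part of the payload format): the arithmetic behind the note and the two examples above can be checked with a short Python snippet. The function names are assumptions made for this illustration. Because a slice or tile holds a whole number of luma samples, the bound is computed with integer division, which yields the largest sample count the bound permits. The 1920x1080 and 1920x1088 picture sizes are illustrative assumptions; the example values of max-lps and max-lsr happen to equal a 1920x1088 luma array at 30 pictures per second.

```python
from fractions import Fraction


def max_segment_fraction(min_spatial_segmentation_idc):
    """Largest share of the picture that one slice or tile may cover."""
    return Fraction(4, min_spatial_segmentation_idc + 4)


def max_segment_luma_samples(pic_size_in_samples_y, min_spatial_segmentation_idc):
    """Largest whole number of luma samples allowed in one slice or tile."""
    return (4 * pic_size_in_samples_y) // (min_spatial_segmentation_idc + 4)


# spatial-seg-idc of 8, as in the examples: at most 1/3 of the picture.
print(max_segment_fraction(8))
# Assumed 1920x1080 picture: largest permitted slice or tile in luma samples.
print(max_segment_luma_samples(1920 * 1080, 8))
# Assumed 1920x1088 luma array at 30 pictures per second.
print(1920 * 1088, 1920 * 1088 * 30)
```

A value of 0 for min_spatial_segmentation_idc gives a fraction of 1, that is, no restriction beyond the picture size itself.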
2928  include-dph:

2930  This parameter is used to indicate the capability and
2931  preference to utilize or include decoded picture hash (DPH)
2932  SEI messages (see Section D.3.19 of [HEVC]) in the
2933  bitstream. DPH SEI messages can be used to detect picture
2934  corruption so the receiver can request picture repair; see
2935  Section 8. The value is a comma-separated list of hash
2936  types that are supported or requested to be used, each hash
2937  type provided as an unsigned integer value (0-255), with
2938  the hash types listed from the most preferred to the least
2939  preferred. Example: "include-dph=0,2", which indicates the
2940  capability for MD5 (most preferred) and Checksum (less
2941  preferred). If the parameter is not included or the value
2942  contains no hash types, then no capability to utilize DPH
2943  SEI messages is assumed. Note that DPH SEI messages MAY
2944  still be included in the bitstream even when there is no
2945  declaration of capability to use them, as in general SEI
2946  messages do not affect the normative decoding process and
2947  decoders are allowed to ignore SEI messages.

2949  Encoding considerations:

2951  This type is only defined for transfer via RTP (RFC 3550).

2953  Security considerations:

2955  See Section 9 of RFC XXXX.

2957  Public specification:

2959  Please refer to Section 13 of RFC XXXX.

2961  Additional information: None

2963  File extensions: none

2965  Macintosh file type code: none

2967  Object identifier or OID: none

2969  Person & email address to contact for further information:

2971  Ye-Kui Wang (yekuiw@qti.qualcomm.com).

2973  Intended usage: COMMON

2975  Author: See Section 14 of RFC XXXX.

2977  Change controller:

2979  IETF Audio/Video Transport Payloads working group delegated
2980  from the IESG.

2982  7.2 SDP Parameters

2984  The receiver MUST ignore any parameter unspecified in this memo.
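Informative sketch (not part of the payload format): one way a receiver might ignore unspecified parameters is shown below in Python. The function names and the dictionary result are assumptions made for this illustration; the set of known names is copied from the parameter lists in Section 7.2.1. Note that the value of dec-parallel-cap itself contains semicolons, so the sketch splits the parameter list only on semicolons that lie outside curly braces. It also shows how an include-dph value could be turned into an ordered list of hash types.

```python
KNOWN_PARAMETERS = {
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br",
    "max-tr", "max-tc", "max-fps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-cap",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
    "dec-parallel-cap", "include-dph",
    "sprop-vps", "sprop-sps", "sprop-pps",
}


def split_fmtp_parameters(text):
    """Split on ';' except inside the curly braces of dec-parallel-cap."""
    items, current, depth = [], [], 0
    for char in text:
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)
        if char == ";" and depth == 0:
            items.append("".join(current))
            current = []
        else:
            current.append(char)
    items.append("".join(current))
    return [item.strip() for item in items if item.strip()]


def parse_fmtp(text):
    """Return known parameters as a dict; unspecified parameters are ignored."""
    parameters = {}
    for item in split_fmtp_parameters(text):
        name, _, value = item.partition("=")
        if name.strip() in KNOWN_PARAMETERS:
            parameters[name.strip()] = value.strip()
    return parameters


def parse_include_dph(value):
    """Return hash types in preference order; an empty list means no capability."""
    hash_types = [int(token) for token in value.split(",") if token.strip()]
    if any(not 0 <= hash_type <= 255 for hash_type in hash_types):
        raise ValueError("hash types are unsigned integers in the range 0-255")
    return hash_types
```

The input to parse_fmtp is the text that follows the payload type on the "a=fmtp" line, for example "level-id=93;dec-parallel-cap={t:8;level-id=120}".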
2986  7.2.1 Mapping of Payload Type Parameters to SDP

2988  The media type video/H265 string is mapped to fields in the
2989  Session Description Protocol (SDP) [RFC4566] as follows:

2991  o The media name in the "m=" line of SDP MUST be video.

2993  o The encoding name in the "a=rtpmap" line of SDP MUST be H265
2994    (the media subtype).

2996  o The clock rate in the "a=rtpmap" line MUST be 90000.

2998  o The OPTIONAL parameters "profile-space", "profile-id", "tier-
2999    flag", "level-id", "interop-constraints", "profile-
3000    compatibility-indicator", "sprop-sub-layer-id", "recv-sub-
3001    layer-id", "max-recv-level-id", "tx-mode", "max-lsr", "max-
3002    lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
3003    "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus",
3004    "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-
3005    segmentation-id", "sprop-spatial-segmentation-idc", "dec-
3006    parallel-cap", and "include-dph", when present, MUST be
3007    included in the "a=fmtp" line of SDP. These parameters are
3008    expressed as a media type string, in the form of a semicolon-
3009    separated list of parameter=value pairs.

3011  o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
3012    pps", when present, MUST be included in the "a=fmtp" line of
3013    SDP or conveyed using the "fmtp" source attribute as specified
3014    in Section 6.3 of [RFC5576]. For a particular media format
3015    (i.e. RTP payload type), "sprop-vps", "sprop-sps", or "sprop-
3016    pps" MUST NOT be both included in the "a=fmtp" line of SDP and
3017    conveyed using the "fmtp" source attribute. When included in
3018    the "a=fmtp" line of SDP, these parameters are expressed as a
3019    media type string, in the form of a semicolon-separated list
3020    of parameter=value pairs. When conveyed in the "a=fmtp" line
3021    of SDP for a particular payload type, the parameters "sprop-
3022    vps", "sprop-sps", and "sprop-pps" MUST be applied to each
3023    SSRC with the payload type.
When conveyed using the "fmtp"
3024    source attribute, these parameters are only associated with
3025    the given source and payload type as parts of the "fmtp"
3026    source attribute.

3028  Informative note: Conveyance of "sprop-vps", "sprop-sps",
3029  and "sprop-pps" using the "fmtp" source attribute allows
3030  for out-of-band transport of parameter sets in topologies
3031  like Topo-Video-switch-MCU as specified in [RFC5117].

3033  An example of media representation in SDP is as follows:

3035  m=video 49170 RTP/AVP 98
3036  a=rtpmap:98 H265/90000
3037  a=fmtp:98 profile-id=1;
3038  sprop-vps=