Network Working Group                                         Y.-K. Wang
Internet Draft                                                  Qualcomm
Intended status: Standards track                              Y. Sanchez
Expires: December 2015                                        T. Schierl
                                                          Fraunhofer HHI
                                                               S. Wenger
                                                                   Vidyo
                                                        M. M. Hannuksela
                                                                   Nokia
                                                            June 2, 2015

          RTP Payload Format for High Efficiency Video Coding
                   draft-ietf-payload-rtp-h265-11.txt

Abstract

   This memo describes an RTP payload format for the video coding
   standard ITU-T Recommendation H.265 and ISO/IEC International
   Standard 23008-2, both also known as High Efficiency Video Coding
   (HEVC) and developed by the Joint Collaborative Team on Video
   Coding (JCT-VC).  The RTP payload format allows for packetization
   of one or more Network Abstraction Layer (NAL) units in each RTP
   packet payload, as well as fragmentation of a NAL unit into
   multiple RTP packets.  Furthermore, it supports transmission of
   an HEVC bitstream over a single as well as multiple RTP streams.
   When multiple RTP streams are used, a single or multiple
   transports may be utilized.  The payload format has wide
   applicability in videoconferencing, Internet video streaming, and
   high bit-rate entertainment-quality video, among others.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use
   Internet-Drafts as reference material or to cite them other than
   as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 2, 2015.

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided
   without warranty as described in the Simplified BSD License.

Table of Contents

   Abstract
   Status of this Memo
   Table of Contents
   1 Introduction
      1.1 Overview of the HEVC Codec
         1.1.1 Coding-Tool Features
         1.1.2 Systems and Transport Interfaces
         1.1.3 Parallel Processing Support
         1.1.4 NAL Unit Header
      1.2 Overview of the Payload Format
   2 Conventions
   3 Definitions and Abbreviations
      3.1 Definitions
         3.1.1 Definitions from the HEVC Specification
         3.1.2 Definitions Specific to This Memo
      3.2 Abbreviations
   4 RTP Payload Format
      4.1 RTP Header Usage
      4.2 Payload Header Usage
      4.3 Transmission Modes
      4.4 Payload Structures
         4.4.1 Single NAL Unit Packets
         4.4.2 Aggregation Packets (APs)
         4.4.3 Fragmentation Units (FUs)
         4.4.4 PACI packets
            4.4.4.1 Reasons for the PACI rules (informative)
            4.4.4.2 PACI extensions (Informative)
      4.5 Temporal Scalability Control Information
      4.6 Decoding Order Number
   5 Packetization Rules
   6 De-packetization Process
   7 Payload Format Parameters
      7.1 Media Type Registration
      7.2 SDP Parameters
         7.2.1 Mapping of Payload Type Parameters to SDP
         7.2.2 Usage with SDP Offer/Answer Model
         7.2.3 Usage in Declarative Session Descriptions
         7.2.4 Parameter Sets Considerations
         7.2.5 Dependency Signaling in Multi-Stream Mode
   8 Use with Feedback Messages
      8.1 Picture Loss Indication (PLI)
      8.2 Slice Loss Indication (SLI)
      8.3 Reference Picture Selection Indication (RPSI)
      8.4 Full Intra Request (FIR)
   9 Security Considerations
   10 Congestion Control
   11 IANA Consideration
   12 Acknowledgements
   13 References
      13.1 Normative References
      13.2 Informative References
   14 Authors' Addresses

1 Introduction

1.1 Overview of the HEVC Codec

   High Efficiency Video Coding [HEVC], formally known as ITU-T
   Recommendation H.265 and ISO/IEC International Standard 23008-2,
   was ratified by ITU-T in April 2013 and reportedly provides
   significant coding efficiency gains over H.264 [H.264].

   As both H.264 [H.264] and its RTP payload format [RFC6184] are
   widely deployed and generally known in the relevant implementer
   communities, frequently only the differences between those two
   specifications are highlighted in non-normative, explanatory
   parts of this memo.  Basic familiarity with both specifications
   is assumed for those parts.  However, the normative parts of this
   memo do not require study of H.264 or its RTP payload format.

   H.264 and HEVC share a similar hybrid video codec design.
   Conceptually, both technologies include a video coding layer
   (VCL), which is often used to refer to the coding-tool features,
   and a network abstraction layer (NAL), which is often used to
   refer to the systems and transport interface aspects of the
   codecs.

1.1.1 Coding-Tool Features

   Similarly to earlier hybrid-video-coding-based standards,
   including H.264, the following basic video coding design is
   employed by HEVC.  A prediction signal is first formed either by
   intra or motion compensated prediction, and the residual (the
   difference between the original and the prediction) is then
   coded.  The gains in coding efficiency are achieved by
   redesigning and improving almost all parts of the codec over
   earlier designs.
   In addition, HEVC includes several tools to make the
   implementation on parallel architectures easier.  Below is a
   summary of HEVC coding-tool features.

   Quad-tree block and transform structure

   One of the major tools that contribute significantly to the
   coding efficiency of HEVC is the usage of flexible coding blocks
   and transforms, which are defined in a hierarchical quad-tree
   manner.  Unlike H.264, where the basic coding block is a
   macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit
   (CTU) of a maximum size of 64x64.  Each CTU can be divided into
   smaller units in a hierarchical quad-tree manner and can
   represent smaller blocks down to size 4x4.  Similarly, the
   transforms used in HEVC can have different sizes, starting from
   4x4 and going up to 32x32.  Utilizing large blocks and transforms
   contributes to the major gain of HEVC, especially at high
   resolutions.

   Entropy coding

   HEVC uses a single entropy coding engine, which is based on
   Context Adaptive Binary Arithmetic Coding (CABAC) [CABAC],
   whereas H.264 uses two distinct entropy coding engines.  CABAC in
   HEVC shares many similarities with CABAC of H.264, but contains
   several improvements.  Those include improvements in coding
   efficiency and lowered implementation complexity, especially for
   parallel architectures.

   In-loop filtering

   H.264 includes an in-loop adaptive deblocking filter, where the
   blocking artifacts around the transform edges in the
   reconstructed picture are smoothed to improve the picture quality
   and compression efficiency.  In HEVC, a similar deblocking filter
   is employed but with somewhat lower complexity.  In addition,
   pictures undergo a subsequent filtering operation called Sample
   Adaptive Offset (SAO), which is a new design element in HEVC.
   SAO basically adds a pixel-level offset in an adaptive manner and
   usually acts as a de-ringing filter.
   It is observed that SAO improves the picture quality, especially
   around sharp edges, contributing substantially to the visual
   quality improvements of HEVC.

   Motion prediction and coding

   There have been a number of improvements in this area that are
   summarized as follows.  The first category is motion merge and
   advanced motion vector prediction (AMVP) modes.  The motion
   information of a prediction block can be inferred from the
   spatially or temporally neighboring blocks.  This is similar to
   the DIRECT mode in H.264 but includes new aspects to incorporate
   the flexible quad-tree structure and methods to improve the
   parallel implementations.  In addition, the motion vector
   predictor can be signaled for improved efficiency.  The second
   category is high-precision interpolation.  The interpolation
   filter length is increased to 8-tap from 6-tap, which improves
   the coding efficiency but also comes with increased complexity.
   In addition, the interpolation filter is defined with higher
   precision without any intermediate rounding operations to further
   improve the coding efficiency.

   Intra prediction and intra coding

   Compared to 8 intra prediction modes in H.264, HEVC supports
   angular intra prediction with 33 directions.  This increased
   flexibility improves both objective coding efficiency and visual
   quality, as the edges can be better predicted and ringing
   artifacts around the edges can be reduced.  In addition, the
   reference samples are adaptively smoothed based on the prediction
   direction.  To avoid contouring artifacts, a new interpolative
   prediction generation is included to improve the visual quality.
   Furthermore, discrete sine transform (DST) is utilized instead of
   traditional discrete cosine transform (DCT) for 4x4 intra
   transform blocks.

   Other coding-tool features

   HEVC includes some tools for lossless coding and efficient screen
   content coding, such as skipping the transform for certain
   blocks.  These tools are particularly useful, for example, when
   streaming the user interface of a mobile device to a large
   display.

1.1.2 Systems and Transport Interfaces

   HEVC inherited the basic systems and transport interface designs
   of H.264, such as the NAL-unit-based syntax structure, the
   hierarchical syntax and data unit structure from sequence-level
   parameter sets, multi-picture-level or picture-level parameter
   sets, slice-level header parameters, lower-level parameters, the
   supplemental enhancement information (SEI) message mechanism, the
   hypothetical reference decoder (HRD) based video buffering model,
   and so on.  The differences in these aspects compared to H.264
   are summarized in the following.

   Video parameter set

   A new type of parameter set, called video parameter set (VPS),
   was introduced.  For the first (2013) version of [HEVC], the
   video parameter set NAL unit is required to be available prior to
   its activation, while the information contained in the video
   parameter set is not necessary for operation of the decoding
   process.  For future HEVC extensions, such as the 3D or scalable
   extensions, the video parameter set is expected to include
   information necessary for operation of the decoding process, e.g.
   decoding dependency or information for reference picture set
   construction of enhancement layers.  The VPS provides a "big
   picture" of a bitstream, including what types of operation points
   are provided, the profile, tier, and level of the operation
   points, and some other high-level properties of the bitstream
   that can be used as the basis for session negotiation and content
   selection, etc. (see Section 7.1).

   Profile, tier and level

   The profile, tier and level syntax structure that can be included
   in both VPS and sequence parameter set (SPS) includes 12 bytes of
   data to describe the entire bitstream (including all temporally
   scalable layers, which are referred to as sub-layers in the HEVC
   specification), and can optionally include more profile, tier and
   level information pertaining to individual temporally scalable
   layers.  The profile indicator indicates the "best viewed as"
   profile when the bitstream conforms to multiple profiles, similar
   to the major brand concept in the ISO base media file format
   (ISOBMFF) [ISOBMFF] and file formats derived from ISOBMFF, such
   as the 3GPP file format [3GPPFF].  The profile, tier and level
   syntax structure also includes indications such as 1) whether the
   bitstream is free of frame-packed content, 2) whether the
   bitstream is free of interlaced source content, and 3) whether
   the bitstream is free of field pictures.  When the answer is yes
   for both 2) and 3), the bitstream contains only frame pictures of
   progressive source.  Based on these indications, clients/players
   without support of post-processing functionalities for handling
   of frame-packed, interlaced source content or field pictures can
   reject those bitstreams that contain such pictures.

   Bitstream and elementary stream

   HEVC includes a definition of an elementary stream, which is new
   compared to H.264.  An elementary stream consists of a sequence
   of one or more bitstreams.  An elementary stream that consists of
   two or more bitstreams has typically been formed by splicing
   together two or more bitstreams (or parts thereof).
   When an elementary stream contains more than one bitstream, the
   last NAL unit of the last access unit of a bitstream (except the
   last bitstream in the elementary stream) must contain an end of
   bitstream NAL unit and the first access unit of the subsequent
   bitstream must be an intra random access point (IRAP) access
   unit.  This IRAP access unit may be a clean random access (CRA),
   broken link access (BLA), or instantaneous decoding refresh (IDR)
   access unit.

   Random access support

   HEVC includes signaling in the NAL unit header, through NAL unit
   types, of IRAP pictures beyond IDR pictures.  Three types of IRAP
   pictures, namely IDR, CRA and BLA pictures, are supported,
   wherein IDR pictures are conventionally referred to as closed
   group-of-pictures (closed-GOP) random access points, and CRA and
   BLA pictures are those conventionally referred to as open-GOP
   random access points.  BLA pictures usually originate from
   splicing of two bitstreams or parts thereof at a CRA picture,
   e.g. during stream switching.  To enable better systems usage of
   IRAP pictures, altogether six different NAL unit types are
   defined to signal the properties of the IRAP pictures, which can
   be used to better match the stream access point (SAP) types as
   defined in the ISOBMFF [ISOBMFF], which are utilized for random
   access support in both 3GP-DASH [3GPDASH] and MPEG DASH
   [MPEGDASH].  Pictures following an IRAP picture in decoding order
   and preceding the IRAP picture in output order are referred to as
   leading pictures associated with the IRAP picture.  There are two
   types of leading pictures, namely random access decodable leading
   (RADL) pictures and random access skipped leading (RASL)
   pictures.
   RADL pictures are decodable when the decoding starts at the
   associated IRAP picture, and RASL pictures are not decodable when
   the decoding starts at the associated IRAP picture and are
   usually discarded.  HEVC provides mechanisms to enable the
   specification of conformance of bitstreams with RASL pictures
   being discarded, thus providing a standard-compliant way to
   enable systems components to discard RASL pictures when needed.

   Temporal scalability support

   HEVC includes an improved support of temporal scalability, by
   inclusion of the signaling of TemporalId in the NAL unit header,
   the restriction that pictures of a particular temporal sub-layer
   cannot be used for inter prediction reference by pictures of a
   lower temporal sub-layer, the sub-bitstream extraction process,
   and the requirement that each sub-bitstream extraction output be
   a conforming bitstream.  Media-aware network elements (MANEs) can
   utilize the TemporalId in the NAL unit header for stream
   adaptation purposes based on temporal scalability.

   Temporal sub-layer switching support

   HEVC specifies, through NAL unit types present in the NAL unit
   header, the signaling of temporal sub-layer access (TSA) and
   stepwise temporal sub-layer access (STSA).  A TSA picture and
   pictures following the TSA picture in decoding order do not use
   pictures prior to the TSA picture in decoding order with
   TemporalId greater than or equal to that of the TSA picture for
   inter prediction reference.  A TSA picture enables up-switching,
   at the TSA picture, to the sub-layer containing the TSA picture
   or any higher sub-layer, from the immediately lower sub-layer.
   An STSA picture does not use pictures with the same TemporalId as
   the STSA picture for inter prediction reference.
   Pictures following an STSA picture in decoding order with the
   same TemporalId as the STSA picture do not use pictures prior to
   the STSA picture in decoding order with the same TemporalId as
   the STSA picture for inter prediction reference.  An STSA picture
   enables up-switching, at the STSA picture, to the sub-layer
   containing the STSA picture, from the immediately lower
   sub-layer.

   Sub-layer reference or non-reference pictures

   The concept and signaling of reference/non-reference pictures in
   HEVC are different from H.264.  In H.264, if a picture may be
   used by any other picture for inter prediction reference, it is a
   reference picture; otherwise it is a non-reference picture, and
   this is signaled by two bits in the NAL unit header.  In HEVC, a
   picture is called a reference picture only when it is marked as
   "used for reference".  In addition, the concept of sub-layer
   reference picture was introduced.  If a picture may be used by
   another picture with the same TemporalId for inter prediction
   reference, it is a sub-layer reference picture; otherwise it is a
   sub-layer non-reference picture.  Whether a picture is a
   sub-layer reference picture or sub-layer non-reference picture is
   signaled through NAL unit type values.

   Extensibility

   Besides the TemporalId in the NAL unit header, HEVC also includes
   the signaling of a six-bit layer ID in the NAL unit header, which
   must be equal to 0 for a single-layer bitstream.  Extension
   mechanisms have been included in VPS, SPS, PPS, SEI NAL unit,
   slice headers, and so on.  All these extension mechanisms enable
   future extensions in a backward compatible manner, such that
   bitstreams encoded according to potential future HEVC extensions
   can be fed to then-legacy decoders (e.g. HEVC version 1 decoders)
   and the then-legacy decoders can decode and output the base layer
   bitstream.
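
   As a non-normative illustration of the temporal scalability and
   extensibility properties described above, the following Python
   sketch shows how a single-layer receiver or a MANE might decide
   whether to keep a NAL unit.  It assumes the two-byte NAL unit
   header layout described in Section 1.1.4.  The function names and
   the max_temporal_id parameter are hypothetical and are not defined
   by [HEVC] or by this memo.

```python
def parse_ids(header: bytes):
    """Return (nuh_layer_id, TemporalId) from a two-byte NAL unit header.

    Assumed bit layout (see Section 1.1.4): F(1) Type(6) LayerId(6) TID(3).
    """
    if len(header) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    value = (header[0] << 8) | header[1]
    layer_id = (value >> 3) & 0x3F      # six-bit layer ID
    tid_plus1 = value & 0x07            # nuh_temporal_id_plus1
    if tid_plus1 == 0:
        # [HEVC] requires nuh_temporal_id_plus1 to be non-zero.
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return layer_id, tid_plus1 - 1


def keep_nal_unit(nal: bytes, max_temporal_id: int) -> bool:
    """Hypothetical filter: keep base-layer NAL units up to a target sub-layer."""
    layer_id, temporal_id = parse_ids(nal[:2])
    # A version 1 decoder only handles layer ID 0; dropping higher
    # sub-layers mimics temporal down-scaling in a MANE.
    return layer_id == 0 and temporal_id <= max_temporal_id
```

   Because pictures of a lower sub-layer never reference pictures of a
   higher sub-layer, a filter of this kind can discard higher
   sub-layers without breaking inter prediction in the NAL units that
   remain.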

   Bitstream extraction

   HEVC includes a bitstream extraction process as an integral part
   of the overall decoding process, as well as specification of the
   use of the bitstream extraction process in description of
   bitstream conformance tests as part of the hypothetical reference
   decoder (HRD) specification.

   Reference picture management

   The reference picture management of HEVC, including reference
   picture marking and removal from the decoded picture buffer (DPB)
   as well as reference picture list construction (RPLC), differs
   from that of H.264.  Instead of the sliding window plus adaptive
   memory management control operation (MMCO) based reference
   picture marking mechanism in H.264, HEVC specifies a reference
   picture set (RPS) based reference picture management and marking
   mechanism, and the RPLC is consequently based on the RPS
   mechanism.  A reference picture set consists of a set of
   reference pictures associated with a picture, consisting of all
   reference pictures that are prior to the associated picture in
   decoding order, that may be used for inter prediction of the
   associated picture or any picture following the associated
   picture in decoding order.  The reference picture set consists of
   five lists of reference pictures: RefPicSetStCurrBefore,
   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter and
   RefPicSetLtCurr contain all reference pictures that may be used
   in inter prediction of the current picture and that may be used
   in inter prediction of one or more of the pictures following the
   current picture in decoding order.  RefPicSetStFoll and
   RefPicSetLtFoll consist of all reference pictures that are not
   used in inter prediction of the current picture but may be used
   in inter prediction of one or more of the pictures following the
   current picture in decoding order.
   RPS provides an "intra-coded" signaling of the DPB status,
   instead of an "inter-coded" signaling, mainly for improved error
   resilience.  The RPLC process in HEVC is based on the RPS, by
   signaling an index to an RPS subset for each reference index;
   this process is simpler than the RPLC process in H.264.

   Ultra low delay support

   HEVC specifies a sub-picture-level HRD operation, for support of
   the so-called ultra-low delay.  The mechanism specifies a
   standard-compliant way to enable delay reduction below one
   picture interval.  Sub-picture-level coded picture buffer (CPB)
   and DPB parameters may be signaled, and utilization of this
   information for the derivation of CPB timing (wherein the CPB
   removal time corresponds to decoding time) and DPB output timing
   (display time) is specified.  Decoders are allowed to operate the
   HRD at the conventional access-unit level, even when the
   sub-picture-level HRD parameters are present.

   New SEI messages

   HEVC inherits many H.264 SEI messages with changes in syntax
   and/or semantics making them applicable to HEVC.  Additionally,
   there are a few new SEI messages reviewed briefly in the
   following paragraphs.

   The display orientation SEI message informs the decoder of a
   transformation that is recommended to be applied to the cropped
   decoded picture prior to display, such that the pictures can be
   properly displayed, e.g. in an upside-up manner.

   The structure of pictures SEI message provides information on the
   NAL unit types, picture order count values, and prediction
   dependencies of a sequence of pictures.  The SEI message can be
   used, for example, for concluding what impact a lost picture has
   on other pictures.

   The decoded picture hash SEI message provides a checksum derived
   from the sample values of a decoded picture.  It can be used for
   detecting whether a picture was correctly received and decoded.
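
   The idea behind the decoded picture hash check can be sketched as
   follows.  This Python fragment is illustrative only: it assumes
   8-bit samples, one MD5 digest per colour plane, and planes already
   serialized as bytes in raster-scan order.  The exact hash types and
   sample byte layout are those specified in [HEVC], and the function
   names used here are hypothetical.

```python
import hashlib


def plane_digest(samples: bytes) -> bytes:
    """MD5 over the raw sample bytes of one decoded colour plane (assumed 8-bit)."""
    return hashlib.md5(samples).digest()


def picture_is_intact(decoded_planes, signaled_digests) -> bool:
    """Compare locally computed digests with those carried in the SEI message.

    decoded_planes: list of bytes objects, one per colour plane.
    signaled_digests: list of digests taken from the SEI message.
    """
    if len(decoded_planes) != len(signaled_digests):
        return False
    return all(plane_digest(plane) == digest
               for plane, digest in zip(decoded_planes, signaled_digests))
```

   A mismatch indicates that the decoded picture differs from the one
   reconstructed by the encoder, for instance because of packet loss
   or a decoder error.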

   The active parameter sets SEI message includes the IDs of the
   active video parameter set and the active sequence parameter set
   and can be used to activate VPSs and SPSs.  In addition, the SEI
   message includes the following indications: 1) An indication of
   whether "full random accessibility" is supported (when supported,
   all parameter sets needed for decoding of the remainder of the
   bitstream when random accessing from the beginning of the current
   coded video sequence (CVS) by completely discarding all access
   units earlier in decoding order are present in the remaining
   bitstream and all coded pictures in the remaining bitstream can
   be correctly decoded); 2) An indication of whether there is no
   parameter set within the current CVS that updates another
   parameter set of the same type preceding in decoding order.  An
   update of a parameter set refers to the use of the same parameter
   set ID but with some other parameters changed.  If this property
   is true for all CVSs in the bitstream, then all parameter sets
   can be sent out-of-band before session start.

   The decoding unit information SEI message provides coded picture
   buffer removal delay information for a decoding unit.  The
   message can be used in very-low-delay buffering operations.

   The region refresh information SEI message can be used together
   with the recovery point SEI message (present in both H.264 and
   HEVC) for improved support of gradual decoding refresh.  This
   supports random access from inter-coded pictures, wherein
   complete pictures can be correctly decoded or recovered after an
   indicated number of pictures in output/display order.

1.1.3 Parallel Processing Support

   The reportedly significantly higher encoding computational demand
   of HEVC over H.264, in conjunction with the ever increasing video
   resolution (both spatially and temporally) required by the
   market, led to the adoption of VCL coding tools specifically
   targeted to allow for parallelization on the sub-picture level.
   That is, parallelization occurs, at the minimum, at the
   granularity of an integer number of CTUs.  The targets for this
   type of high-level parallelization are multicore CPUs and DSPs as
   well as multiprocessor systems.  In a system design, to be
   useful, these tools require signaling support, which is provided
   in Section 7 of this memo.  This section provides a brief
   overview of the tools available in [HEVC].

   Many of the tools incorporated in HEVC were designed keeping in
   mind the potential parallel implementations in
   multi-core/multi-processor architectures.  Specifically, for
   parallelization, four picture partition strategies are available.

   Slices are segments of the bitstream that can be reconstructed
   independently from other slices within the same picture (though
   there may still be interdependencies through loop filtering
   operations).  Slices are the only tool that can be used for
   parallelization that is also available, in virtually identical
   form, in H.264.  Slice-based parallelization does not require
   much inter-processor or inter-core communication (except for
   inter-processor or inter-core data sharing for motion
   compensation when decoding a predictively coded picture, which is
   typically much heavier than inter-processor or inter-core data
   sharing due to in-picture prediction), as slices are designed to
   be independently decodable.  However, for the same reason, slices
   can require some coding overhead.
   Further, slices (in contrast to some of the other tools mentioned
   below) also serve as the key mechanism for bitstream partitioning
   to match Maximum Transfer Unit (MTU) size requirements, due to
   the in-picture independence of slices and the fact that each
   regular slice is encapsulated in its own NAL unit.  In many
   cases, the goal of parallelization and the goal of MTU size
   matching can place contradicting demands on the slice layout in a
   picture.  The realization of this situation led to the
   development of the more advanced tools mentioned below.

   Dependent slice segments allow for fragmentation of a coded slice
   into fragments at CTU boundaries without breaking any in-picture
   prediction mechanism.  They are complementary to the
   fragmentation mechanism described in this memo in that they need
   the cooperation of the encoder.  As a dependent slice segment
   necessarily contains an integer number of CTUs, a decoder using
   multiple cores operating on CTUs can process a dependent slice
   segment without communicating parts of the slice segment's
   bitstream to other cores.  Fragmentation, as specified in this
   memo, in contrast, does not guarantee that a fragment contains an
   integer number of CTUs.

   In wavefront parallel processing (WPP), the picture is
   partitioned into rows of CTUs.  Entropy decoding and prediction
   are allowed to use data from CTUs in other partitions.  Parallel
   processing is possible through parallel decoding of CTU rows,
   where the start of the decoding of a row is delayed by two CTUs,
   so as to ensure that data related to a CTU above and to the right
   of the subject CTU is available before the subject CTU is
   decoded.  Using this staggered start (which appears like a
   wavefront when represented graphically), parallelization is
   possible with up to as many processors/cores as the picture
   contains CTU rows.
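
   The staggered start of WPP can be made concrete with a small,
   idealized model.  The Python sketch below assumes that every CTU
   takes exactly one time step to decode and that one core is
   available per CTU row; under those assumptions the CTU at (row,
   col) can start at step 2 * row + col.  The function names are
   hypothetical, and real decoders will see uneven CTU decoding times.

```python
def wpp_start_step(row: int, col: int) -> int:
    """Earliest step for a CTU when each row starts two CTUs after the row above."""
    return 2 * row + col


def wpp_schedule(rows: int, cols: int):
    """Return a list indexed by time step; each entry lists the (row, col)
    CTUs decoded in parallel during that step."""
    steps = {}
    for row in range(rows):
        for col in range(cols):
            steps.setdefault(wpp_start_step(row, col), []).append((row, col))
    # Steps are contiguous from 0 to 2 * (rows - 1) + (cols - 1).
    return [steps[step] for step in sorted(steps)]
```

   In this model, the CTU above and to the right of (row, col) is
   scheduled at step 2 * row + col - 1, one step before the subject
   CTU, which is the dependency the two-CTU delay is meant to satisfy.
   For a picture of 3 CTU rows by 6 CTU columns, the schedule has 10
   steps and at most 3 CTUs are decoded concurrently.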
582 Because in-picture prediction between neighboring CTU rows within 583 a picture is allowed, the required inter-processor/inter-core 584 communication to enable in-picture prediction can be substantial. 585 The WPP partitioning does not result in the creation of more NAL 586 units compared to when it is not applied, thus WPP cannot be used 587 for MTU size matching, though slices can be used in combination 588 for that purpose. 590 Tiles define horizontal and vertical boundaries that partition a 591 picture into tile columns and rows. The scan order of CTUs is 592 changed to be local within a tile (in the order of a CTU raster 593 scan of a tile), before decoding the top-left CTU of the next 594 tile in the order of tile raster scan of a picture. Similar to 595 slices, tiles break in-picture prediction dependencies (including 596 entropy decoding dependencies). However, they do not need to be 597 included into individual NAL units (same as WPP in this regard), 598 hence tiles cannot be used for MTU size matching, though slices 599 can be used in combination for that purpose. Each tile can be 600 processed by one processor/core, and the inter-processor/inter- 601 core communication required for in-picture prediction between 602 processing units decoding neighboring tiles is limited to 603 conveying the shared slice header in cases a slice is spanning 604 more than one tile, and loop filtering related sharing of 605 reconstructed samples and metadata. Insofar, tiles are less 606 demanding in terms of inter-processor communication bandwidth 607 compared to WPP due to the in-picture independence between two 608 neighboring partitions. 610 1.1.4 NAL Unit Header 612 HEVC maintains the NAL unit concept of H.264 with modifications. 613 HEVC uses a two-byte NAL unit header, as shown in Figure 1. The 614 payload of a NAL unit refers to the NAL unit excluding the NAL 615 unit header. 
617 +---------------+---------------+ 618 |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| 619 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 620 |F| Type | LayerId | TID | 621 +-------------+-----------------+ 623 Figure 1 The structure of HEVC NAL unit header 625 The semantics of the fields in the NAL unit header are as 626 specified in [HEVC] and described briefly below for convenience. 627 In addition to the name and size of each field, the corresponding 628 syntax element name in [HEVC] is also provided. 630 F: 1 bit 631 forbidden_zero_bit. Required to be zero in [HEVC]. Note that 632 the inclusion of this bit in the NAL unit header was to enable 633 transport of HEVC video over MPEG-2 transport systems 634 (avoidance of start code emulations) [MPEG2S]. In the context 635 of this memo, the value 1 may be used to indicate a syntax 636 violation, e.g. for a NAL unit resulted from aggregating a 637 number of fragmented units of a NAL unit but missing the last 638 fragment, as described in Section 4.4.3. 640 Type: 6 bits 641 nal_unit_type. This field specifies the NAL unit type as 642 defined in Table 7-1 of [HEVC]. If the most significant bit 643 of this field of a NAL unit is equal to 0 (i.e. the value of 644 this field is less than 32), the NAL unit is a VCL NAL unit. 645 Otherwise, the NAL unit is a non-VCL NAL unit. For a 646 reference of all currently defined NAL unit types and their 647 semantics, please refer to Section 7.4.1 in [HEVC]. 649 LayerId: 6 bits 650 nuh_layer_id. Required to be equal to zero in [HEVC]. It is 651 anticipated that in future scalable or 3D video coding 652 extensions of this specification, this syntax element will be 653 used to identify additional layers that may be present in the 654 CVS, wherein a layer may be, e.g. a spatial scalable layer, a 655 quality scalable layer, a texture view, or a depth view. 657 TID: 3 bits 658 nuh_temporal_id_plus1. This field specifies the temporal 659 identifier of the NAL unit plus 1. 
The value of TemporalId is 660 equal to TID minus 1. A TID value of 0 is illegal to ensure 661 that there is at least one bit in the NAL unit header equal to 662 1, so to enable independent considerations of start code 663 emulations in the NAL unit header and in the NAL unit payload 664 data. 666 1.2 Overview of the Payload Format 668 This payload format defines the following processes required for 669 transport of HEVC coded data over RTP [RFC3550]: 671 o Usage of RTP header with this payload format 673 o Packetization of HEVC coded NAL units into RTP packets using 674 three types of payload structures, namely single NAL unit 675 packet, aggregation packet, and fragment unit 677 o Transmission of HEVC NAL units of the same bitstream within a 678 single RTP stream or multiple RTP streams (within one or more 679 RTP sessions), where within an RTP stream transmission of NAL 680 units may be either non-interleaved (i.e. the transmission 681 order of NAL units is the same as their decoding order) or 682 interleaved (i.e. the transmission order of NAL units is 683 different from their decoding order) 685 o Media type parameters to be used with the Session Description 686 Protocol (SDP) [RFC4566] 688 o A payload header extension mechanism and data structures for 689 enhanced support of temporal scalability based on that 690 extension mechanism. 692 2 Conventions 694 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 695 NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 696 "OPTIONAL" in this document are to be interpreted as described in 697 BCP 14, RFC 2119 [RFC2119]. 699 In this document, these key words will appear with that 700 interpretation only when in ALL CAPS. Lower case uses of these 701 words are not to be interpreted as carrying the RFC 2119 702 significance. 704 This specification uses the notion of setting and clearing a bit 705 when bit fields are handled. Setting a bit is the same as 706 assigning that bit the value of 1 (On). 
Clearing a bit is the 707 same as assigning that bit the value of 0 (Off). 709 3 Definitions and Abbreviations 711 3.1 Definitions 713 This document uses the terms and definitions of [HEVC]. Section 714 3.1.1 lists relevant definitions copied from [HEVC] for 715 convenience. Section 3.1.2 provides definitions specific to this 716 memo. 718 3.1.1 Definitions from the HEVC Specification 720 access unit: A set of NAL units that are associated with each 721 other according to a specified classification rule, are 722 consecutive in decoding order, and contain exactly one coded 723 picture. 725 BLA access unit: An access unit in which the coded picture is a 726 BLA picture. 728 BLA picture: An IRAP picture for which each VCL NAL unit has 729 nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. 731 coded video sequence (CVS): A sequence of access units that 732 consists, in decoding order, of an IRAP access unit with 733 NoRaslOutputFlag equal to 1, followed by zero or more access 734 units that are not IRAP access units with NoRaslOutputFlag equal 735 to 1, including all subsequent access units up to but not 736 including any subsequent access unit that is an IRAP access unit 737 with NoRaslOutputFlag equal to 1. 739 Informative note: An IRAP access unit may be an IDR access 740 unit, a BLA access unit, or a CRA access unit. The value of 741 NoRaslOutputFlag is equal to 1 for each IDR access unit, each 742 BLA access unit, and each CRA access unit that is the first 743 access unit in the bitstream in decoding order, is the first 744 access unit that follows an end of sequence NAL unit in 745 decoding order, or has HandleCraAsBlaFlag equal to 1. 747 CRA access unit: An access unit in which the coded picture is a 748 CRA picture. 750 CRA picture: A RAP picture for which each VCL NAL unit has 751 nal_unit_type equal to CRA_NUT. 753 IDR access unit: An access unit in which the coded picture is an 754 IDR picture. 
756 IDR picture: A RAP picture for which each VCL NAL unit has 757 nal_unit_type equal to IDR_W_RADL or IDR_N_LP. 759 IRAP access unit: An access unit in which the coded picture is an 760 IRAP picture. 762 IRAP picture: A coded picture for which each VCL NAL unit has 763 nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 764 (23), inclusive. 766 layer: A set of VCL NAL units that all have a particular value of 767 nuh_layer_id and the associated non-VCL NAL units, or one of a 768 set of syntactical structures having a hierarchical relationship. 770 operation point: bitstream created from another bitstream by 771 operation of the sub-bitstream extraction process with the 772 another bitstream, a target highest TemporalId, and a target 773 layer identifier list as inputs. 775 random access: The act of starting the decoding process for a 776 bitstream at a point other than the beginning of the bitstream. 778 sub-layer: A temporal scalable layer of a temporal scalable 779 bitstream consisting of VCL NAL units with a particular value of 780 the TemporalId variable, and the associated non-VCL NAL units. 782 sub-layer representation: A subset of the bitstream consisting of 783 NAL units of a particular sub-layer and the lower sub-layers. 785 tile: A rectangular region of coding tree blocks within a 786 particular tile column and a particular tile row in a picture. 788 tile column: A rectangular region of coding tree blocks having a 789 height equal to the height of the picture and a width specified 790 by syntax elements in the picture parameter set. 792 tile row: A rectangular region of coding tree blocks having a 793 height specified by syntax elements in the picture parameter set 794 and a width equal to the width of the picture. 796 3.1.2 Definitions Specific to This Memo 798 dependee RTP stream: An RTP stream on which another RTP stream 799 depends. All RTP streams in an MRST or MRMT except for the 800 highest RTP stream are dependee RTP streams. 
802 highest RTP stream: The RTP stream on which no other RTP stream 803 depends. The RTP stream in an SRST is the highest RTP stream. 805 media aware network element (MANE): A network element, such as a 806 middlebox, selective forwarding unit, or application layer 807 gateway that is capable of parsing certain aspects of the RTP 808 payload headers or the RTP payload and reacting to their 809 contents. 811 Informative note: The concept of a MANE goes beyond normal 812 routers or gateways in that a MANE has to be aware of the 813 signaling (e.g. to learn about the payload type mappings of 814 the media streams), and in that it has to be trusted when 815 working with SRTP. The advantage of using MANEs is that they 816 allow packets to be dropped according to the needs of the 817 media coding. For example, if a MANE has to drop packets due 818 to congestion on a certain link, it can identify and remove 819 those packets whose elimination produces the least adverse 820 effect on the user experience. After dropping packets, MANEs 821 must rewrite RTCP packets to match the changes to the RTP 822 stream as specified in Section 7 of [RFC3550]. 824 Media Transport: As used in the MRST, MRMT, and SRST definitions 825 below, Media Transport denotes the transport of packets over a 826 transport association identified by a 5-tuple (source address, 827 source port, destination address, destination port, transport 828 protocol). See also Section 2.1.13 of [I-D.ietf-avtext-rtp- 829 grouping-taxonomy]. 831 Informative note: The term "bitstream" in this document is 832 equivalent to the term "encoded stream" in [I-D.ietf-avtext- 833 rtp-grouping-taxonomy]. 835 Multiple RTP streams on a Single Transport (MRST): Multiple RTP 836 streams carrying a single HEVC bitstream on a Single Transport. 837 See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. 
839 Multiple RTP streams on Multiple Transports (MRMT): Multiple RTP 840 streams carrying a single HEVC bitstream on Multiple Transports. 841 See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. 843 NAL unit decoding order: A NAL unit order that conforms to the 844 constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. 846 NAL unit output order: A NAL unit order in which NAL units of 847 different access units are in the output order of the decoded 848 pictures corresponding to the access units, as specified in 849 [HEVC], and in which NAL units within an access unit are in their 850 decoding order. 852 NAL-unit-like structure: A data structure that is similar to NAL 853 units in the sense that it also has a NAL unit header and a 854 payload, with a difference that the payload does not follow the 855 start code emulation prevention mechanism required for the NAL 856 unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples of 857 NAL-unit-like structures defined in this memo are packet payloads 858 of AP, PACI, and FU packets. 860 NALU-time: The value that the RTP timestamp would have if the NAL 861 unit were transported in its own RTP packet. 863 RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within 864 the scope of this memo, one RTP stream is utilized to transport 865 one or more temporal sub-layers. 867 Single RTP stream on a Single Transport (SRST): Single RTP 868 stream carrying a single HEVC bitstream on a Single (Media) 869 Transport. See also Section 3.5 of [I-D.ietf-avtext-rtp- 870 grouping-taxonomy]. 872 transmission order: The order of packets in ascending RTP 873 sequence number order (in modulo arithmetic). Within an 874 aggregation packet, the NAL unit transmission order is the same 875 as the order of appearance of NAL units in the packet.
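As an illustration of "ascending RTP sequence number order (in modulo arithmetic)", the following sketch compares two 16-bit sequence numbers. The half-range rule it uses (a precedes b when b is fewer than 32768 steps ahead of a) is a common convention assumed here for illustration; it is not stated in this memo, and the function name is hypothetical.

```python
def seq_precedes(a: int, b: int) -> bool:
    """Return True if 16-bit sequence number a comes before b.

    Assumption: wrap-around is resolved with the usual half-range rule,
    i.e. b is considered later than a when (b - a) mod 65536 is in
    the range 1..32767.
    """
    diff = (b - a) & 0xFFFF
    return diff != 0 and diff < 0x8000


# 65535 is followed by 0 after wrap-around.
assert seq_precedes(65535, 0)
assert not seq_precedes(0, 65535)
```

A receiver that sorts packets into transmission order would typically use a comparison of this kind rather than a plain integer comparison.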
877 3.2 Abbreviations 879 AP Aggregation Packet 881 BLA Broken Link Access 883 CRA Clean Random Access 885 CTB Coding Tree Block 887 CTU Coding Tree Unit 889 CVS Coded Video Sequence 891 DPH Decoded Picture Hash 893 FU Fragmentation Unit 895 HRD Hypothetical Reference Decoder 897 IDR Instantaneous Decoding Refresh 899 IRAP Intra Random Access Point 901 MANE Media Aware Network Element 902 MRMT Multiple RTP streams on Multiple Transports 904 MRST Multiple RTP streams on a Single Transport 906 MTU Maximum Transfer Unit 908 NAL Network Abstraction Layer 910 NALU Network Abstraction Layer Unit 912 PACI PAyload Content Information 914 PHES Payload Header Extension Structure 916 PPS Picture Parameter Set 918 RADL Random Access Decodable Leading (Picture) 920 RASL Random Access Skipped Leading (Picture) 922 RPS Reference Picture Set 924 SEI Supplemental Enhancement Information 926 SPS Sequence Parameter Set 928 SRST Single RTP stream on a Single Transport 930 STSA Step-wise Temporal Sub-layer Access 932 TSA Temporal Sub-layer Access 934 TSCI Temporal Scalability Control Information 936 VCL Video Coding Layer 938 VPS Video Parameter Set 940 4 RTP Payload Format 942 4.1 RTP Header Usage 944 The format of the RTP header is specified in [RFC3550] and 945 reprinted in Figure 2 for convenience. This payload format uses 946 the fields of the header in a manner consistent with that 947 specification. 949 The RTP payload (and the settings for some RTP header bits) for 950 aggregation packets and fragmentation units are specified in 951 Sections 4.4.2 and 4.4.3, respectively. 
953 0 1 2 3 954 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 955 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 956 |V=2|P|X| CC |M| PT | sequence number | 957 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 958 | timestamp | 959 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 960 | synchronization source (SSRC) identifier | 961 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 962 | contributing source (CSRC) identifiers | 963 | .... | 964 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 966 Figure 2 RTP header according to [RFC3550] 968 The RTP header information to be set according to this RTP 969 payload format is set as follows: 971 Marker bit (M): 1 bit 973 Set for the last packet of the access unit, carried in the 974 current RTP stream. This is in line with the normal use of 975 the M bit in video formats to allow an efficient playout 976 buffer handling. When MRST or MRMT is in use, if an access 977 unit appears in multiple RTP streams, the marker bit is set on 978 each RTP stream's last packet of the access unit. 980 Informative note: The content of a NAL unit does not tell 981 whether or not the NAL unit is the last NAL unit, in 982 decoding order, of an access unit. An RTP sender 983 implementation may obtain this information from the video 984 encoder. If, however, the implementation cannot obtain 985 this information directly from the encoder, e.g. when the 986 bitstream was pre-encoded, and also there is no timestamp 987 allocated for each NAL unit, then the sender implementation 988 can inspect subsequent NAL units in decoding order to 989 determine whether or not the NAL unit is the last NAL unit 990 of an access unit as follows. A NAL unit is determined to 991 be the last NAL unit of an access unit if it is the last 992 NAL unit of the bitstream.
A NAL unit naluX is also 993 determined to be the last NAL unit of an access unit if 994 both the following conditions are true: 1) the next VCL NAL 995 unit naluY in decoding order has the high-order bit of the 996 first byte after its NAL unit header equal to 1, and 2) all 997 NAL units between naluX and naluY, when present, have 998 nal_unit_type in the range of 32 to 35, inclusive, equal to 999 39, or in the ranges of 41 to 44, inclusive, or 48 to 55, 1000 inclusive. 1002 Payload type (PT): 7 bits 1004 The assignment of an RTP payload type for this new packet 1005 format is outside the scope of this document and will not be 1006 specified here. The assignment of a payload type has to be 1007 performed either through the profile used or in a dynamic way. 1009 Informative note: It is not required to use different 1010 payload type values for different RTP streams in MRST or 1011 MRMT. 1013 Sequence number (SN): 16 bits 1015 Set and used in accordance with RFC 3550 [RFC3550]. 1017 Timestamp: 32 bits 1019 The RTP timestamp is set to the sampling timestamp of the 1020 content. A 90 kHz clock rate MUST be used. 1022 If the NAL unit has no timing properties of its own (e.g. 1023 parameter set and SEI NAL units), the RTP timestamp MUST be 1024 set to the RTP timestamp of the coded picture of the access 1025 unit in which the NAL unit (according to Section 7.4.2.4.4 of 1026 [HEVC]) is included. 1028 Receivers MUST use the RTP timestamp for the display process, 1029 even when the bitstream contains picture timing SEI messages 1030 or decoding unit information SEI messages as specified in 1031 [HEVC]. However, this does not mean that picture timing SEI 1032 messages in the bitstream should be discarded, as picture 1033 timing SEI messages may contain frame-field information that 1034 is important in appropriately rendering interlaced video. 1036 Synchronization source (SSRC): 32 bits 1038 Used to identify the source of the RTP packets.
When using SRST, 1039 by definition a single SSRC is used for all parts of a single 1040 bitstream. In MRST or MRMT, different SSRCs are used for each 1041 RTP stream containing a subset of the sub-layers of the single 1042 (temporally scalable) bitstream. A receiver is required to 1043 correctly associate the set of SSRCs that carry parts of 1044 the same bitstream. 4.2 Payload Header Usage 1046 The first two bytes of the payload of an RTP packet are referred 1047 to as the payload header. The payload header consists of the 1048 same fields (F, Type, LayerId, and TID) as the NAL unit header as 1049 shown in Section 1.1.4, irrespective of the type of the payload 1050 structure. 1052 The TID value indicates (among other things) the relative 1053 importance of an RTP packet, for example because NAL units 1054 belonging to higher temporal sub-layers are not used for the 1055 decoding of lower temporal sub-layers. A lower value of TID 1056 indicates a higher importance. More important NAL units MAY be 1057 better protected against transmission losses than less important 1058 NAL units. 1060 4.3 Transmission Modes 1062 This memo enables transmission of an HEVC bitstream over 1063 . a single RTP stream on a single Media Transport (SRST), 1064 . multiple RTP streams over a single Media Transport (MRST), 1065 or 1066 . multiple RTP streams over multiple Media Transports (MRMT). 1068 Informative Note: While this specification enables the use of 1069 MRST within the H.265 RTP payload, the signaling of MRST within 1070 SDP Offer/Answer is not fully specified at the time of this 1071 writing. See [RFC5576] and [RFC5583] for what is supported 1072 today as well as [I-D.ietf-avtcore-rtp-multi-stream] and [I- 1073 D.ietf-mmusic-sdp-bundle-negotiation] for future directions. 1075 When in MRMT, the dependency of one RTP stream on another RTP 1076 stream is typically indicated as specified in [RFC5583].
1077 [RFC5583] can also be utilized to specify dependencies within 1078 MRST, but only if the RTP streams utilize distinct payload types. 1079 When an RTP stream A depends on another RTP stream B, the RTP 1080 stream B is referred to as a dependee RTP stream of the RTP 1081 stream A. 1083 SRST or MRST SHOULD be used for point-to-point unicast scenarios, 1084 while MRMT SHOULD be used for point-to-multipoint multicast 1085 scenarios where different receivers require different operation 1086 points of the same HEVC bitstream, to improve bandwidth utilization 1087 efficiency. 1089 Informative note: A multicast may degrade to a unicast after 1090 all but one receiver has left (this is a justification of 1091 the first "SHOULD" instead of "MUST"), and there might be 1092 scenarios where MRMT is desirable but not possible, e.g. when 1093 IP multicast is not deployed in a certain network (this is a 1094 justification of the second "SHOULD" instead of "MUST"). 1096 The transmission mode is indicated by the tx-mode media parameter 1097 (see Section 7.1). If tx-mode is equal to "SRST", SRST MUST be 1098 used. Otherwise, if tx-mode is equal to "MRST", MRST MUST be 1099 used. Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used. 1101 Informative note: When an RTP stream does not depend on other 1102 RTP streams, any of SRST, MRST and MRMT may be in use for the 1103 RTP stream. 1105 Receivers MUST support all of SRST, MRST, and MRMT. 1107 Informative note: The required support of MRMT by receivers 1108 does not imply that multicast must be supported by receivers. 1110 4.4 Payload Structures 1112 Four different types of RTP packet payload structures are 1113 specified. A receiver can identify the type of an RTP packet 1114 payload through the Type field in the payload header.
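A minimal sketch of how a receiver might read the two-byte payload header and look at the Type field is given here for illustration. It follows the field layout of Section 1.1.4 (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits) and the Type values 48 (AP) and 49 (FU) used by this memo. The names PayloadHeader, parse_payload_header, and classify_payload are hypothetical, and the sketch deliberately does not distinguish single NAL unit packets from PACI packets.

```python
from typing import NamedTuple


class PayloadHeader(NamedTuple):
    f: int          # forbidden_zero_bit
    nal_type: int   # Type field (6 bits)
    layer_id: int   # LayerId field (6 bits)
    tid: int        # TID field (3 bits), TemporalId plus 1


def parse_payload_header(payload: bytes) -> PayloadHeader:
    """Split the first two payload bytes into F, Type, LayerId, TID."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-byte payload header")
    hdr = (payload[0] << 8) | payload[1]
    return PayloadHeader(
        f=(hdr >> 15) & 0x01,
        nal_type=(hdr >> 9) & 0x3F,
        layer_id=(hdr >> 3) & 0x3F,
        tid=hdr & 0x07,
    )


def classify_payload(header: PayloadHeader) -> str:
    """Coarse classification based on the Type field only."""
    if header.nal_type == 48:
        return "AP"
    if header.nal_type == 49:
        return "FU"
    return "other"  # single NAL unit packet or PACI; not resolved here


print(classify_payload(parse_payload_header(bytes([0x60, 0x01]))))
```

The example input 0x60 0x01 encodes F=0, Type=48, LayerId=0, TID=1, so the sketch reports an aggregation packet.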
1116 The four different payload structures are as follows: 1118 o Single NAL unit packet: Contains a single NAL unit in the 1119 payload, and the NAL unit header of the NAL unit also serves 1120 as the payload header. This payload structure is specified in 1121 Section 4.4.1. 1123 o Aggregation packet (AP): Contains more than one NAL unit 1124 within one access unit. This payload structure is specified 1125 in Section 4.4.2. 1127 o Fragmentation unit (FU): Contains a subset of a single NAL 1128 unit. This payload structure is specified in Section 4.4.3. 1130 o PACI carrying RTP packet: Contains a payload header (that 1131 differs from other payload headers for efficiency), a Payload 1132 Header Extension Structure (PHES), and a PACI payload. This 1133 payload structure is specified in Section 4.4.4. 1135 4.4.1 Single NAL Unit Packets 1137 A single NAL unit packet contains exactly one NAL unit, and 1138 consists of a payload header (denoted as PayloadHdr), a 1139 conditional 16-bit DONL field (in network byte order), and the 1140 NAL unit payload data (the NAL unit excluding its NAL unit 1141 header) of the contained NAL unit, as shown in Figure 3. 1143 0 1 2 3 1144 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1145 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1146 | PayloadHdr | DONL (conditional) | 1147 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1148 | | 1149 | NAL unit payload data | 1150 | | 1151 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1152 | :...OPTIONAL RTP padding | 1153 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1155 Figure 3 The structure of a single NAL unit packet 1157 The payload header SHOULD be an exact copy of the NAL unit header 1158 of the contained NAL unit. However, the Type (i.e. 1159 nal_unit_type) field MAY be changed, e.g. when it is desirable to 1160 handle a CRA picture as a BLA picture [JCTVC-J0107].
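The layout of Figure 3 can be sketched as follows. This is an illustrative helper only, with a hypothetical name; it copies the NAL unit header unchanged into the payload header (the SHOULD case above) and inserts the 16-bit DONL field in network byte order only when a value is supplied by the caller. Deciding whether DONL is to be present is left to the caller.

```python
import struct
from typing import Optional


def build_single_nal_unit_payload(nal_unit: bytes,
                                  donl: Optional[int] = None) -> bytes:
    """Build the RTP payload of a single NAL unit packet.

    nal_unit: complete NAL unit, starting with its 2-byte header.
    donl:     16-bit DONL value, or None when the field is absent.
    """
    if len(nal_unit) < 2:
        raise ValueError("a NAL unit has at least a 2-byte header")
    payload_hdr = nal_unit[:2]      # exact copy of the NAL unit header
    nal_payload = nal_unit[2:]      # NAL unit excluding its header
    if donl is None:
        return payload_hdr + nal_payload
    if not 0 <= donl <= 0xFFFF:
        raise ValueError("DONL must fit in 16 bits")
    return payload_hdr + struct.pack("!H", donl) + nal_payload
```

Without DONL the payload is byte-for-byte identical to the NAL unit; with DONL, two octets are inserted between the header and the NAL unit payload data.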
1162 The DONL field, when present, specifies the value of the 16 least 1163 significant bits of the decoding order number of the contained 1164 NAL unit. If sprop-max-don-diff is greater than 0 for any of the 1165 RTP streams, the DONL field MUST be present, and the variable DON 1166 for the contained NAL unit is derived as equal to the value of 1167 the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for 1168 all the RTP streams), the DONL field MUST NOT be present. 1170 4.4.2 Aggregation Packets (APs) 1172 Aggregation packets (APs) are introduced to enable the reduction 1173 of packetization overhead for small NAL units, such as most of 1174 the non-VCL NAL units, which are often only a few octets in size. 1176 An AP aggregates NAL units within one access unit. Each NAL unit 1177 to be carried in an AP is encapsulated in an aggregation unit. 1178 NAL units aggregated in one AP are in NAL unit decoding order. 1180 An AP consists of a payload header (denoted as PayloadHdr) 1181 followed by two or more aggregation units, as shown in Figure 4. 1183 0 1 2 3 1184 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1185 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1186 | PayloadHdr (Type=48) | | 1187 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1188 | | 1189 | two or more aggregation units | 1190 | | 1191 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1192 | :...OPTIONAL RTP padding | 1193 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1195 Figure 4 The structure of an aggregation packet 1197 The fields in the payload header are set as follows. The F bit 1198 MUST be equal to 0 if the F bit of each aggregated NAL unit is 1199 equal to zero; otherwise, it MUST be equal to 1. The Type field 1200 MUST be equal to 48. The value of LayerId MUST be equal to the 1201 lowest value of LayerId of all the aggregated NAL units. The 1202 value of TID MUST be the lowest value of TID of all the 1203 aggregated NAL units. 
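The rules for the AP payload header fields can be sketched as below. The function name is hypothetical, and its input is assumed to be the 2-byte NAL unit headers of the NAL units to be aggregated; the bit layout follows Section 1.1.4.

```python
from typing import Sequence

AP_TYPE = 48  # Type value of an aggregation packet


def build_ap_payload_header(nal_headers: Sequence[bytes]) -> bytes:
    """Derive the 2-byte AP payload header from the aggregated headers."""
    if len(nal_headers) < 2:
        raise ValueError("an AP aggregates at least two NAL units")
    f_bit = 0
    layer_ids = []
    tids = []
    for hdr_bytes in nal_headers:
        if len(hdr_bytes) != 2:
            raise ValueError("each NAL unit header is exactly 2 bytes")
        hdr = (hdr_bytes[0] << 8) | hdr_bytes[1]
        f_bit |= (hdr >> 15) & 0x01         # 1 if any aggregated F bit is 1
        layer_ids.append((hdr >> 3) & 0x3F)
        tids.append(hdr & 0x07)
    value = (f_bit << 15) | (AP_TYPE << 9) | (min(layer_ids) << 3) | min(tids)
    return value.to_bytes(2, "big")


# Two headers: Type=32 with TID=1, and Type=1 with TID=2.
print(build_ap_payload_header([bytes([0x40, 0x01]), bytes([0x02, 0x02])]).hex())
```

For the two example headers, all F bits are zero and the lowest LayerId and TID are 0 and 1, so the resulting header carries F=0, Type=48, LayerId=0, TID=1.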
1205 Informative Note: All VCL NAL units in an AP have the same TID 1206 value since they belong to the same access unit. However, an 1207 AP may contain non-VCL NAL units for which the TID value in 1208 the NAL unit header may be different than the TID value of the 1209 VCL NAL units in the same AP. 1211 An AP MUST carry at least two aggregation units and can carry as 1212 many aggregation units as necessary; however, the total amount of 1213 data in an AP obviously MUST fit into an IP packet, and the size 1214 SHOULD be chosen so that the resulting IP packet is smaller than 1215 the MTU size so to avoid IP layer fragmentation. An AP MUST NOT 1216 contain Fragmentation Units (FUs) specified in Section 4.4.3. 1217 APs MUST NOT be nested; i.e. an AP must not contain another AP. 1219 The first aggregation unit in an AP consists of a conditional 16- 1220 bit DONL field (in network byte order) followed by a 16-bit 1221 unsigned size information (in network byte order) that indicates 1222 the size of the NAL unit in bytes (excluding these two octets, 1223 but including the NAL unit header), followed by the NAL unit 1224 itself, including its NAL unit header, as shown in Figure 5. 1226 0 1 2 3 1227 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1228 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1229 : DONL (conditional) | NALU size | 1230 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1231 | NALU size | | 1232 +-+-+-+-+-+-+-+-+ NAL unit | 1233 | | 1234 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1235 | : 1236 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1238 Figure 5 The structure of the first aggregation unit in an AP 1240 The DONL field, when present, specifies the value of the 16 least 1241 significant bits of the decoding order number of the aggregated 1242 NAL unit. 
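Reading the first aggregation unit of an AP, as laid out in Figure 5, might look like the following sketch. The function name and the has_donl flag are hypothetical; the caller is assumed to have already removed the two-byte AP payload header and to know whether the DONL field is present.

```python
import struct
from typing import Optional, Tuple


def parse_first_aggregation_unit(
        buf: bytes, has_donl: bool) -> Tuple[Optional[int], bytes, int]:
    """Parse the first aggregation unit that follows the AP payload header.

    Returns (donl, nal_unit, consumed), where nal_unit includes its
    NAL unit header and consumed is the number of bytes read from buf.
    """
    offset = 0
    donl = None
    if has_donl:
        if len(buf) < offset + 2:
            raise ValueError("truncated DONL field")
        (donl,) = struct.unpack_from("!H", buf, offset)
        offset += 2
    if len(buf) < offset + 2:
        raise ValueError("truncated NALU size field")
    (size,) = struct.unpack_from("!H", buf, offset)
    offset += 2
    nal_unit = buf[offset:offset + size]
    if len(nal_unit) != size:
        raise ValueError("NALU size exceeds remaining data")
    return donl, nal_unit, offset + size
```

The returned byte count lets the caller continue with the aggregation units that follow in the same AP.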
1244 If sprop-max-don-diff is greater than 0 for any of the RTP 1245 streams, the DONL field MUST be present in an aggregation unit 1246 that is the first aggregation unit in an AP, and the variable DON 1247 for the aggregated NAL unit is derived as equal to the value of 1248 the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for 1249 all the RTP streams), the DONL field MUST NOT be present in an 1250 aggregation unit that is the first aggregation unit in an AP. 1252 An aggregation unit that is not the first aggregation unit in an 1253 AP consists of a conditional 8-bit DOND field followed by a 16- 1254 bit unsigned size information (in network byte order) that 1255 indicates the size of the NAL unit in bytes (excluding these two 1256 octets, but including the NAL unit header), followed by the NAL 1257 unit itself, including its NAL unit header, as shown in Figure 6. 1259 0 1 2 3 1260 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1261 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1262 : DOND (cond) | NALU size | 1263 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1264 | | 1265 | NAL unit | 1266 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1267 | : 1268 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1270 Figure 6 The structure of an aggregation unit that is not the 1271 first aggregation unit in an AP 1273 When present, the DOND field plus 1 specifies the difference 1274 between the decoding order number values of the current 1275 aggregated NAL unit and the preceding aggregated NAL unit in the 1276 same AP. 1278 If sprop-max-don-diff is greater than 0 for any of the RTP 1279 streams, the DOND field MUST be present in an aggregation unit 1280 that is not the first aggregation unit in an AP, and the variable 1281 DON for the aggregated NAL unit is derived as equal to the DON of 1282 the preceding aggregated NAL unit in the same AP plus the value 1283 of the DOND field plus 1 modulo 65536. 
Otherwise (sprop-max-don- 1284 diff is equal to 0 for all the RTP streams), the DOND field MUST 1285 NOT be present in an aggregation unit that is not the first 1286 aggregation unit in an AP, and in this case the transmission 1287 order and decoding order of NAL units carried in the AP are the 1288 same as the order the NAL units appear in the AP. 1290 Figure 7 presents an example of an AP that contains two 1291 aggregation units, labeled as 1 and 2 in the figure, without the 1292 DONL and DOND fields being present. 1294 0 1 2 3 1295 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1296 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1297 | RTP Header | 1298 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1299 | PayloadHdr (Type=48) | NALU 1 Size | 1300 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1301 | NALU 1 HDR | | 1302 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 1 Data | 1303 | . . . | 1304 | | 1305 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1306 | . . . | NALU 2 Size | NALU 2 HDR | 1307 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1308 | NALU 2 HDR | | 1309 +-+-+-+-+-+-+-+-+ NALU 2 Data | 1310 | . . . | 1311 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1312 | :...OPTIONAL RTP padding | 1313 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1315 Figure 7 An example of an AP packet containing two aggregation 1316 units without the DONL and DOND fields 1318 Figure 8 presents an example of an AP that contains two 1319 aggregation units, labeled as 1 and 2 in the figure, with the 1320 DONL and DOND fields being present. 
1322 0 1 2 3 1323 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1324 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1325 | RTP Header | 1326 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1327 | PayloadHdr (Type=48) | NALU 1 DONL | 1328 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1329 | NALU 1 Size | NALU 1 HDR | 1330 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1331 | | 1332 | NALU 1 Data . . . | 1333 | | 1334 + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1335 | | NALU 2 DOND | NALU 2 Size | 1336 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1337 | NALU 2 HDR | | 1338 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | 1339 | | 1340 | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1341 | :...OPTIONAL RTP padding | 1342 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1344 Figure 8 An example of an AP containing two aggregation units 1345 with the DONL and DOND fields 1347 4.4.3 Fragmentation Units (FUs) 1349 Fragmentation units (FUs) are introduced to enable fragmenting a 1350 single NAL unit into multiple RTP packets, possibly without 1351 cooperation or knowledge of the HEVC encoder. A fragment of a 1352 NAL unit consists of an integer number of consecutive octets of 1353 that NAL unit. Fragments of the same NAL unit MUST be sent in 1354 consecutive order with ascending RTP sequence numbers (with no 1355 other RTP packets within the same RTP stream being sent between 1356 the first and last fragment). 1358 When a NAL unit is fragmented and conveyed within FUs, it is 1359 referred to as a fragmented NAL unit. APs MUST NOT be 1360 fragmented. FUs MUST NOT be nested; i.e. an FU must not contain 1361 a subset of another FU. 1363 The RTP timestamp of an RTP packet carrying an FU is set to the 1364 NALU-time of the fragmented NAL unit. 
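Informative example: before the FU layout is described, the aggregation unit layout of Section 4.4.2 (Figures 7 and 8) can be summarized as a short parsing routine. The Python sketch below is not part of the payload format. The function name parse_ap and the argument don_fields_present (standing for "sprop-max-don-diff is greater than 0 for any of the RTP streams") are illustrative assumptions, and RTP padding is assumed to have been removed beforehand. The sketch relies on the Type field occupying the six bits after the F bit in the first payload header octet, as in the HEVC NAL unit header.

```python
def parse_ap(payload: bytes, don_fields_present: bool):
    """Split an AP payload into (DON, NAL unit) pairs.

    payload starts at the two-octet PayloadHdr. DON is None when the
    DONL/DOND fields are absent. Hypothetical helper for illustration.
    """
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    nal_type = (payload[0] >> 1) & 0x3F
    if nal_type != 48:
        raise ValueError("not an aggregation packet (Type != 48)")

    pos = 2
    don = None
    first = True
    units = []
    while pos < len(payload):
        if don_fields_present:
            if first:
                # First aggregation unit: 16-bit DONL, network byte order.
                if pos + 2 > len(payload):
                    raise ValueError("truncated DONL field")
                don = int.from_bytes(payload[pos:pos + 2], "big")
                pos += 2
            else:
                # Later aggregation units: 8-bit DOND, DON advances by DOND + 1.
                don = (don + payload[pos] + 1) % 65536
                pos += 1
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        size = int.from_bytes(payload[pos:pos + 2], "big")
        pos += 2
        # The size covers the two-octet NAL unit header plus the NAL unit payload.
        if size < 2 or pos + size > len(payload):
            raise ValueError("NALU size inconsistent with packet length")
        units.append((don, payload[pos:pos + size]))
        pos += size
        first = False
    return units
```

A receiver would typically run such a routine once per received AP and hand each returned NAL unit, together with its DON, to the de-packetization process.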
1366 An FU consists of a payload header (denoted as PayloadHdr), an FU 1367 header of one octet, a conditional 16-bit DONL field (in network 1368 byte order), and an FU payload, as shown in Figure 9. 1370 0 1 2 3 1371 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1372 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1373 | PayloadHdr (Type=49) | FU header | DONL (cond) | 1374 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| 1375 | DONL (cond) | | 1376 |-+-+-+-+-+-+-+-+ | 1377 | FU payload | 1378 | | 1379 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1380 | :...OPTIONAL RTP padding | 1381 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1383 Figure 9 The structure of an FU 1385 The fields in the payload header are set as follows. The Type 1386 field MUST be equal to 49. The fields F, LayerId, and TID MUST 1387 be equal to the fields F, LayerId, and TID, respectively, of the 1388 fragmented NAL unit. 1390 The FU header consists of an S bit, an E bit, and a 6-bit FuType 1391 field, as shown in Figure 10. 1393 +---------------+ 1394 |0|1|2|3|4|5|6|7| 1395 +-+-+-+-+-+-+-+-+ 1396 |S|E| FuType | 1397 +---------------+ 1399 Figure 10 The structure of FU header 1401 The semantics of the FU header fields are as follows: 1402 S: 1 bit 1403 When set to one, the S bit indicates the start of a fragmented 1404 NAL unit i.e. the first byte of the FU payload is also the 1405 first byte of the payload of the fragmented NAL unit. When 1406 the FU payload is not the start of the fragmented NAL unit 1407 payload, the S bit MUST be set to zero. 1409 E: 1 bit 1410 When set to one, the E bit indicates the end of a fragmented 1411 NAL unit, i.e. the last byte of the payload is also the last 1412 byte of the fragmented NAL unit. When the FU payload is not 1413 the last fragment of a fragmented NAL unit, the E bit MUST be 1414 set to zero. 
1416 FuType: 6 bits 1417 The field FuType MUST be equal to the field Type of the 1418 fragmented NAL unit. 1420 The DONL field, when present, specifies the value of the 16 least 1421 significant bits of the decoding order number of the fragmented 1422 NAL unit. 1424 If sprop-max-don-diff is greater than 0 for any of the RTP 1425 streams, and the S bit is equal to 1, the DONL field MUST be 1426 present in the FU, and the variable DON for the fragmented NAL 1427 unit is derived as equal to the value of the DONL field. 1428 Otherwise (sprop-max-don-diff is equal to 0 for all the RTP 1429 streams, or the S bit is equal to 0), the DONL field MUST NOT be 1430 present in the FU. 1432 A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e. 1433 the Start bit and End bit must not both be set to one in the same 1434 FU header. 1436 The FU payload consists of fragments of the payload of the 1437 fragmented NAL unit so that if the FU payloads of consecutive 1438 FUs, starting with an FU with the S bit equal to 1 and ending 1439 with an FU with the E bit equal to 1, are sequentially 1440 concatenated, the payload of the fragmented NAL unit can be 1441 reconstructed. The NAL unit header of the fragmented NAL unit is 1442 not included as such in the FU payload, but rather the 1443 information of the NAL unit header of the fragmented NAL unit is 1444 conveyed in F, LayerId, and TID fields of the FU payload headers 1445 of the FUs and the FuType field of the FU header of the FUs. An 1446 FU payload MUST NOT be empty. 1448 If an FU is lost, the receiver SHOULD discard all following 1449 fragmentation units in transmission order corresponding to the 1450 same fragmented NAL unit, unless the decoder in the receiver is 1451 known to be prepared to gracefully handle incomplete NAL units. 1453 A receiver in an endpoint or in a MANE MAY aggregate the first n- 1454 1 fragments of a NAL unit to an (incomplete) NAL unit, even if 1455 fragment n of that NAL unit is not received. 
In this case, the 1456 forbidden_zero_bit of the NAL unit MUST be set to one to indicate 1457 a syntax violation. 1459 4.4.4 PACI packets 1461 This section specifies the PACI packet structure. The basic 1462 payload header specified in this memo is intentionally limited to 1463 the 16 bits of the NAL unit header so to keep the packetization 1464 overhead to a minimum. However, cases have been identified where 1465 it is advisable to include control information in an easily 1466 accessible position in the packet header, despite the additional 1467 overhead. One such control information is the Temporal 1468 Scalability Control Information as specified in Section 4.5 1469 below. PACI packets carry this and future, similar structures. 1471 The PACI packet structure is based on a payload header extension 1472 mechanism that is generic and extensible to carry payload header 1473 extensions. In this section, the focus lies on the use within 1474 this specification. Section 4.4.4.2 below provides guidance for 1475 the specification designers in how to employ the extension 1476 mechanism in future specifications. 1478 A PACI packet consists of a payload header (denoted as 1479 PayloadHdr), for which the structure follows what is described in 1480 Section 4.2 above. The payload header is followed by the fields 1481 A, cType, PHSsize, F[0..2] and Y. 1483 Figure 11 shows a PACI packet in compliance with this memo; that 1484 is, without any extensions. 1486 0 1 2 3 1487 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1488 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1489 | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| 1490 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1491 | Payload Header Extension Structure (PHES) | 1492 |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=| 1493 | | 1494 | PACI payload: NAL unit | 1495 | . . . 
| 1496 | | 1497 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1498 | :...OPTIONAL RTP padding | 1499 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1501 Figure 11 The structure of a PACI 1503 The fields in the payload header are set as follows. The F bit 1504 MUST be equal to 0. The Type field MUST be equal to 50. The 1505 value of LayerId MUST be a copy of the LayerId field of the PACI 1506 payload NAL unit or NAL-unit-like structure. The value of TID 1507 MUST be a copy of the TID field of the PACI payload NAL unit or 1508 NAL-unit-like structure. 1510 The semantics of other fields are as follows: 1512 A: 1 bit 1513 Copy of the F bit of the PACI payload NAL unit or NAL-unit- 1514 like structure. 1516 cType: 6 bits 1517 Copy of the Type field of the PACI payload NAL unit or NAL- 1518 unit-like structure. 1520 PHSsize: 5 bits 1521 Indicates the length of the PHES field. The value is limited 1522 to be less than or equal to 32 octets, to simplify encoder 1523 design for MTU size matching. 1525 F0 1526 This field equal to 1 specifies the presence of a temporal 1527 scalability support extension in the PHES. 1529 F1, F2 1530 MUST be 0, available for future extensions, see Section 1531 4.4.4.2. Receivers compliant with this version of the HEVC 1532 payload format MUST ignore F1=1 and/or F2=1, and also ignore 1533 any information in the PHES indicated as present by F1=1 1534 and/or F2=1. 1536 Informative note: The receiver can do that by first 1537 decoding information associated with F0=1, and then 1538 skipping over any remaining bytes of the PHES based on the 1539 value of PHSsize. 1541 Y: 1 bit 1542 MUST be 0, available for future extensions, see Section 1543 4.4.4.2. Receivers compliant with this version of the HEVC 1544 payload format MUST ignore Y=1, and also ignore any 1545 information in the PHES indicated as present by Y. 1547 PHES: variable number of octets 1548 A variable number of octets as indicated by the value of 1549 PHSsize. 
1551 PACI Payload 1552 The single NAL unit packet or NAL-unit-like structure (such 1553 as: FU or AP) to be carried, not including the first two 1554 octets. 1556 Informative note: The first two octets of the NAL unit or 1557 NAL-unit-like structure carried in the PACI payload are not 1558 included in the PACI payload. Rather, the respective values 1559 are copied in locations of the PayloadHdr of the RTP 1560 packet. This design offers two advantages: first, the 1561 overall structure of the payload header is preserved, i.e. 1562 there is no special case of payload header structure that 1563 needs to be implemented for PACI. Second, no additional 1564 overhead is introduced. 1566 A PACI payload MAY be a single NAL unit, an FU, or an AP. 1567 PACIs MUST NOT be fragmented or aggregated. The following 1568 subsection documents the reasons for these design choices. 1570 4.4.4.1 Reasons for the PACI rules (informative) 1572 A PACI cannot be fragmented. If a PACI could be fragmented, and 1573 a fragment other than the first fragment would get lost, access 1574 to the information in the PACI would not be possible. Therefore, 1575 a PACI must not be fragmented. In other words, an FU must not 1576 carry (fragments of) a PACI. 1578 A PACI cannot be aggregated. Aggregation of PACIs is inadvisable 1579 from a compression viewpoint, as, in many cases, several to be 1580 aggregated NAL units would share identical PACI fields and values 1581 which would be carried redundantly for no reason. Most, if not 1582 all the practical effects of PACI aggregation can be achieved by 1583 aggregating NAL units and bundling them with a PACI (see below). 1584 Therefore, a PACI must not be aggregated. In other words, an AP 1585 must not contain a PACI. 1587 The payload of a PACI can be a fragment. 
Both middleboxes and 1588 sending systems with inflexible (often hardware-based) encoders 1589 occasionally find themselves in situations where a PACI and its 1590 headers, combined, are larger than the MTU size. In such a 1591 scenario, the middlebox or sender can fragment the NAL unit and 1592 encapsulate the fragment in a PACI. Doing so preserves the 1593 payload header extension information for all fragments, allowing 1594 downstream middleboxes and the receiver to take advantage of that 1595 information. Therefore, a sender may place a fragment into a 1596 PACI, and a receiver must be able to handle such a PACI. 1598 The payload of a PACI can be an aggregation NAL unit. HEVC 1599 bitstreams can contain unevenly sized and/or small (when compared 1600 to the MTU size) NAL units. In order to efficiently packetize 1601 such small NAL units, APs were introduced. The benefits of APs 1602 are independent of the need for a payload header extension. 1603 Therefore, a sender may place an AP into a PACI, and a receiver 1604 must be able to handle such a PACI. 1606 4.4.4.2 PACI extensions (Informative) 1608 This section includes recommendations for future specification 1609 designers on how to extend the PACI syntax to accommodate future 1610 extensions. Obviously, designers are free to specify whatever 1611 appears to be appropriate to them at the time of their design. 1612 However, a lot of thought has been invested into the extension 1613 mechanism described below, and we suggest that deviations from it 1614 warrant a good explanation. 1616 This memo defines only a single payload header extension 1617 (Temporal Scalability Control Information, described below in 1618 Section 4.5), and, therefore, only the F0 bit carries semantics. 1619 F1 and F2 are already named (and not just marked as reserved, as 1620 a typical video spec designer would do). They are intended to 1621 signal two additional extensions.
The Y bit allows further 1622 F and Y bits to be added recursively, to extend the mechanism 1623 beyond 3 possible payload header extensions. It is suggested to 1624 define a new packet type (using a different value for Type) when 1625 assigning the F1, F2, or Y bits different semantics than what is 1626 suggested below. 1628 When a Y bit is set, an 8-bit flag-extension is inserted after 1629 the Y bit. A flag-extension consists of 7 flags F[n..n+6], and 1630 another Y bit. 1632 The basic PACI header already includes F0, F1, and F2. 1633 Therefore, the Fx bits in the first flag-extension are numbered 1634 F3, F4, ..., F9, the F bits in the second flag-extension are 1635 numbered F10, F11, ..., F16, and so forth. As a result, at least 1636 3 Fx bits are always in the PACI, but the number of Fx bits (and 1637 associated types of extensions) can be increased by setting the 1638 next Y bit and adding an octet of flag-extensions, carrying 7 1639 flags and another Y bit. The size of this list of flags is 1640 subject to the limits specified in Section 4.4.4 (32 octets for 1641 all flag-extensions and the PHES information combined). 1643 Each of the F bits can indicate either the presence of 1644 information in the Payload Header Extension Structure (PHES), 1645 described below, or a given F bit can indicate a certain 1646 condition, without including additional information in the PHES. 1648 When a spec developer devises a new syntax that takes advantage 1649 of the PACI extension mechanism, he/she must follow the 1650 constraints listed below; otherwise the extension mechanism may 1651 break. 1653 1) The fields added for a particular Fx bit MUST be fixed in 1654 length and not depend on what other Fx bits are set (no 1655 parsing dependency). 1656 2) The Fx bits must be assigned in order.
1657 3) An implementation that supports the n-th Fn bit for any 1658 value of n must understand the syntax (though not 1659 necessarily the semantics) of the fields Fk (with k < n), so as 1660 to be able to either use those bits when present, or at 1661 least be able to skip over them. 1663 4.5 Temporal Scalability Control Information 1665 This section describes the single payload header extension 1666 defined in this specification, known as Temporal Scalability 1667 Control Information (TSCI). If, in the future, additional 1668 payload header extensions become necessary, they could be 1669 specified in this section of an updated version of this document, 1670 or in their own documents. 1672 When F0 is set to 1 in a PACI, this specifies that the PHES field 1673 includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as 1674 follows: 1676 0 1 2 3 1677 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1678 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1679 | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| 1680 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1681 | TL0PICIDX | IrapPicID |S|E| RES | | 1682 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1683 | .... | 1684 | PACI payload: NAL unit | 1685 | | 1686 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1687 | :...OPTIONAL RTP padding | 1688 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1690 Figure 12 The structure of a PACI with a PHES containing a TSCI 1692 TL0PICIDX (8 bits) 1693 When present, the TL0PICIDX field MUST be set equal to 1694 temporal_sub_layer_zero_idx as specified in Section D.3.22 of 1695 [H.265] for the access unit containing the NAL unit in the 1696 PACI. 1698 IrapPicID (8 bits) 1699 When present, the IrapPicID field MUST be set equal to 1700 irap_pic_id as specified in Section D.3.22 of [H.265] for the 1701 access unit containing the NAL unit in the PACI.
1703 S (1 bit) 1704 The S bit MUST be set to 1 if any of the following conditions 1705 is true and MUST be set to 0 otherwise: 1706 o The NAL unit in the payload of the PACI is the first VCL NAL 1707 unit, in decoding order, of a picture. 1708 o The NAL unit in the payload of the PACI is an AP and the NAL 1709 unit in the first contained aggregation unit is the first 1710 VCL NAL unit, in decoding order, of a picture. 1711 o The NAL unit in the payload of the PACI is an FU with its S 1712 bit equal to 1 and the FU payload containing a fragment of 1713 the first VCL NAL unit, in decoding order of a picture. 1715 E (1 bit) 1716 The E bit MUST be set to 1 if any of the following conditions 1717 is true and MUST be set to 0 otherwise: 1718 o The NAL unit in the payload of the PACI is the last VCL NAL 1719 unit, in decoding order, of a picture. 1720 o The NAL unit in the payload of the PACI is an AP and the NAL 1721 unit in the last contained aggregation unit is the last VCL 1722 NAL unit, in decoding order, of a picture. 1723 o The NAL unit in the payload of the PACI is an FU with its E 1724 bit equal to 1 and the FU payload containing a fragment of 1725 the last VCL NAL unit, in decoding order of a picture. 1727 RES (6 bits) 1728 MUST be equal to 0. Reserved for future extensions. 1730 The value of PHSsize MUST be set to 3. Receivers MUST allow 1731 other values of the fields F0, F1, F2, Y, and PHSsize, and MUST 1732 ignore any additional fields, when present, than specified above 1733 in the PHES. 1735 4.6 Decoding Order Number 1737 For each NAL unit, the variable AbsDon is derived, representing 1738 the decoding order number that is indicative of the NAL unit 1739 decoding order. 1741 Let NAL unit n be the n-th NAL unit in transmission order within 1742 an RTP stream. 1744 If sprop-max-don-diff is equal to 0 for all the RTP streams 1745 carrying the HEVC bitstream, AbsDon[n], the value of AbsDon for 1746 NAL unit n, is derived as equal to n. 
1748 Otherwise (sprop-max-don-diff is greater than 0 for any of the 1749 RTP streams), AbsDon[n] is derived as follows, where DON[n] is 1750 the value of the variable DON for NAL unit n: 1752 o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit 1753 in transmission order), AbsDon[0] is set equal to DON[0]. 1755 o Otherwise (n is greater than 0), the following applies for 1756 derivation of AbsDon[n]: 1758 If DON[n] == DON[n-1], 1759 AbsDon[n] = AbsDon[n-1] 1761 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), 1762 AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] 1764 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), 1765 AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] 1767 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), 1768 AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - 1769 DON[n]) 1771 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), 1772 AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) 1774 For any two NAL units m and n, the following applies: 1776 o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n 1777 follows NAL unit m in NAL unit decoding order. 1779 o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding 1780 order of the two NAL units can be in either order. 1782 o AbsDon[n] less than AbsDon[m] indicates that NAL unit n 1783 precedes NAL unit m in decoding order. 1785 Informative note: When two consecutive NAL units in the NAL 1786 unit decoding order have different values of AbsDon, the 1787 absolute difference between the two AbsDon values may be 1788 greater than or equal to 1. 1790 Informative note: There are multiple reasons to allow for the 1791 absolute difference of the values of AbsDon for two 1792 consecutive NAL units in the NAL unit decoding order to be 1793 greater than one. An increment by one is not required, as at 1794 the time of associating values of AbsDon to NAL units, it may 1795 not be known whether all NAL units are to be delivered to the 1796 receiver. 
For example, a gateway may not forward VCL NAL 1797 units of higher sub-layers or some SEI NAL units when there is 1798 congestion in the network. In another example, the first 1799 intra-coded picture of a pre-encoded clip is transmitted in 1800 advance to ensure that it is readily available in the 1801 receiver, and when transmitting the first intra-coded picture, 1802 the originator does not exactly know how many NAL units will 1803 be encoded before the first intra-coded picture of the pre- 1804 encoded clip follows in decoding order. Thus, the values of 1805 AbsDon for the NAL units of the first intra-coded picture of 1806 the pre-encoded clip have to be estimated when they are 1807 transmitted, and gaps in values of AbsDon may occur. Another 1808 example is MRST or MRMT with sprop-max-don-diff greater than 1809 0, where the AbsDon values must indicate cross-layer decoding 1810 order for NAL units conveyed in all the RTP streams. 1812 5 Packetization Rules 1814 The following packetization rules apply: 1816 o If sprop-max-don-diff is greater than 0 for any of the RTP 1817 streams, the transmission order of NAL units carried in the 1818 RTP stream MAY be different than the NAL unit decoding order 1819 and the NAL unit output order. Otherwise (sprop-max-don-diff 1820 is equal to 0 for all the RTP streams), the transmission order 1821 of NAL units carried in the RTP stream MUST be the same as the 1822 NAL unit decoding order, and, when tx-mode is equal to "MRST" 1823 or "MRMT", MUST also be the same as the NAL unit output order. 1825 o A NAL unit of a small size SHOULD be encapsulated in an 1826 aggregation packet together with one or more other NAL units 1827 in order to avoid the unnecessary packetization overhead for 1828 small NAL units. For example, non-VCL NAL units such as 1829 access unit delimiters, parameter sets, or SEI NAL units are 1830 typically small and can often be aggregated with VCL NAL units 1831 without violating MTU size constraints. 
1833 o Each non-VCL NAL unit SHOULD, when possible from an MTU size 1834 match viewpoint, be encapsulated in an aggregation packet 1835 together with its associated VCL NAL unit, as typically a non- 1836 VCL NAL unit would be meaningless without the associated VCL 1837 NAL unit being available. 1839 o For carrying exactly one NAL unit in an RTP packet, a single 1840 NAL unit packet MUST be used. 1842 6 De-packetization Process 1844 The general concept behind de-packetization is to get the NAL 1845 units out of the RTP packets in an RTP stream and all RTP streams 1846 the RTP stream depends on, if any, and pass them to the decoder 1847 in the NAL unit decoding order. 1849 The de-packetization process is implementation dependent. 1850 Therefore, the following description should be seen as an example 1851 of a suitable implementation. Other schemes may be used as well 1852 as long as the output for the same input is the same as the 1853 process described below. The output is the same when the set of 1854 output NAL units and their order are both identical. 1855 Optimizations relative to the described algorithms are possible. 1857 All normal RTP mechanisms related to buffer management apply. In 1858 particular, duplicated or outdated RTP packets (as indicated by 1859 the RTP sequence number and the RTP timestamp) are removed. To 1860 determine the exact time for decoding, factors such as a possible 1861 intentional delay to allow for proper inter-stream 1862 synchronization must be taken into account. 1864 NAL units with NAL unit type values in the range of 0 to 47, 1865 inclusive, may be passed to the decoder. NAL-unit-like structures 1866 with NAL unit type values in the range of 48 to 63, inclusive, 1867 MUST NOT be passed to the decoder.
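Informative example: the de-packetization process orders NAL units by AbsDon, so it may help to restate the derivation of Section 4.6 in executable form. The Python sketch below follows the five cases of that section one by one. The function name abs_don_sequence and its list-based interface are illustrative assumptions; the input is the sequence of DON values of NAL units in transmission order.

```python
def abs_don_sequence(dons):
    """Derive AbsDon[n] from DON[n] as in Section 4.6.

    dons: 16-bit DON values in transmission order.
    Returns the list of AbsDon values (hypothetical helper).
    """
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            # The very first NAL unit: AbsDon[0] = DON[0].
            abs_dons.append(don)
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[n - 1]
        if don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            # Forward step without wrap-around.
            cur = prev_abs + don - prev_don
        elif don < prev_don and prev_don - don >= 32768:
            # Forward step across the 65535 -> 0 wrap.
            cur = prev_abs + 65536 - prev_don + don
        elif don > prev_don and don - prev_don >= 32768:
            # Backward step across the wrap.
            cur = prev_abs - (prev_don + 65536 - don)
        else:
            # don < prev_don and prev_don - don < 32768: backward step.
            cur = prev_abs - (prev_don - don)
        abs_dons.append(cur)
    return abs_dons
```

For instance, the DON sequence 65534, 65535, 0, 1 yields the AbsDon sequence 65534, 65535, 65536, 65537, which shows how AbsDon keeps increasing across the 16-bit wrap.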
1869 The receiver includes a receiver buffer, which is used to 1870 compensate for transmission delay jitter within individual RTP 1871 streams and across RTP streams, to reorder NAL units from 1872 transmission order to the NAL unit decoding order, and to recover 1873 the NAL unit decoding order in MRST or MRMT, when applicable. In 1874 this section, the receiver operation is described under the 1875 assumption that there is no transmission delay jitter within an 1876 RTP stream and across RTP streams. To distinguish it from a 1877 practical receiver buffer that is also used for compensation of 1878 transmission delay jitter, the receiver buffer is hereafter 1879 called the de-packetization buffer in this section. Receivers 1880 should also prepare for transmission delay jitter; i.e. either 1881 reserve separate buffers for transmission delay jitter buffering 1882 and de-packetization buffering or use a receiver buffer for both 1883 transmission delay jitter and de-packetization. Moreover, 1884 receivers should take transmission delay jitter into account in 1885 the buffering operation; e.g. by additional initial buffering 1886 before the start of decoding and playback. 1888 When sprop-max-don-diff is equal to 0 for all the received RTP 1889 streams, the de-packetization buffer size is zero bytes and the 1890 process described in the remainder of this paragraph applies. 1891 When there is only one RTP stream received, the NAL units carried 1892 in the single RTP stream are directly passed to the decoder in 1893 their transmission order, which is identical to their decoding 1894 order. When there is more than one RTP stream received, the NAL 1895 units carried in the multiple RTP streams are passed to the 1896 decoder in their NTP timestamp order.
When there are several NAL 1897 units of different RTP streams with the same NTP timestamp, the 1898 order to pass them to the decoder is their dependency order, 1899 where NAL units of a dependee RTP stream are passed to the 1900 decoder prior to the NAL units of the dependent RTP stream. When 1901 there are several NAL units of the same RTP stream with the same 1902 NTP timestamp, the order to pass them to the decoder is their 1903 transmission order. 1905 Informative note: The mapping between RTP and NTP 1906 timestamps is conveyed in RTCP SR packets. In addition, 1907 the mechanisms for faster media timestamp synchronization 1908 discussed in [RFC6051] may be used to speed up the 1909 acquisition of the RTP-to-wall-clock mapping. 1911 When sprop-max-don-diff is greater than 0 for any of the received 1912 RTP streams, the process described in the remainder of this 1913 section applies. 1915 There are two buffering states in the receiver: initial buffering 1916 and buffering while playing. Initial buffering starts when the 1917 reception is initialized. After initial buffering, decoding and 1918 playback are started, and the buffering-while-playing mode is 1919 used. 1921 Regardless of the buffering state, the receiver stores incoming 1922 NAL units, in reception order, into the de-packetization buffer. 1923 NAL units carried in RTP packets are stored in the de- 1924 packetization buffer individually, and the value of AbsDon is 1925 calculated and stored for each NAL unit. When MRST or MRMT is in 1926 use, NAL units of all RTP streams of a bitstream are stored in 1927 the same de-packetization buffer. When NAL units carried in any 1928 two RTP streams are available to be placed into the de- 1929 packetization buffer, those NAL units carried in the RTP stream 1930 that is lower in the dependency tree are placed into the buffer 1931 first.
For example, if RTP stream A depends on RTP stream B, 1932 then NAL units carried in RTP stream B are placed into the buffer 1933 first. 1935 Initial buffering lasts until condition A (the difference between 1936 the greatest and smallest AbsDon values of the NAL units in the 1937 de-packetization buffer is greater than or equal to the value of 1938 sprop-max-don-diff of the highest RTP stream) or condition B (the 1939 number of NAL units in the de-packetization buffer is greater 1940 than the value of sprop-depack-buf-nalus) is true. 1942 After initial buffering, whenever condition A or condition B is 1943 true, the following operation is repeatedly applied until both 1944 condition A and condition B become false: 1946 o The NAL unit in the de-packetization buffer with the smallest 1947 value of AbsDon is removed from the de-packetization buffer 1948 and passed to the decoder. 1950 When no more NAL units are flowing into the de-packetization 1951 buffer, all NAL units remaining in the de-packetization buffer 1952 are removed from the buffer and passed to the decoder in the 1953 order of increasing AbsDon values. 1955 7 Payload Format Parameters 1957 This section specifies the parameters that MAY be used to select 1958 optional features of the payload format and certain features or 1959 properties of the bitstream or the RTP stream. The parameters 1960 are specified here as part of the media type registration for the 1961 HEVC codec. A mapping of the parameters into the Session 1962 Description Protocol (SDP) [RFC4566] is also provided for 1963 applications that use SDP. Equivalent parameters could be 1964 defined elsewhere for use with control protocols that do not use 1965 SDP. 1967 7.1 Media Type Registration 1969 The media subtype for the HEVC codec is allocated from the IETF 1970 tree. 1972 The receiver MUST ignore any unrecognized parameter. 
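Informative example: the rule that unrecognized parameters are ignored can be illustrated with a small parser. The Python sketch below is an assumption-laden illustration, not a normative procedure. It assumes the parameters arrive as a semicolon-separated list of name=value pairs (the usual form when media type parameters are carried in an SDP "a=fmtp" line), treats parameter names as case-insensitive, and recognizes only the parameter names that happen to be mentioned in this part of the memo; a real receiver would use the complete list from the registration. The names KNOWN_PARAMS and parse_format_params are hypothetical.

```python
# Illustrative subset only: names that appear in this part of the memo.
KNOWN_PARAMS = frozenset({
    "profile-space", "tier-flag", "profile-id",
    "profile-compatibility-indicator", "interop-constraints", "level-id",
    "max-recv-level-id", "sprop-sub-layer-id", "recv-sub-layer-id",
    "sprop-max-don-diff", "sprop-depack-buf-nalus", "tx-mode",
})


def parse_format_params(text: str) -> dict:
    """Return recognized parameters as {name: value}; drop the rest."""
    params = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        name = name.strip().lower()
        if name in KNOWN_PARAMS:
            params[name] = value.strip()
        # Unrecognized names are silently skipped, never treated as errors.
    return params
```

Skipping unknown names rather than rejecting the description keeps a receiver interoperable with senders that signal parameters defined later.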
1974 Media Type name: video 1976 Media subtype name: H265 1978 Required parameters: none 1980 OPTIONAL parameters: 1982 profile-space, tier-flag, profile-id, profile-compatibility- 1983 indicator, interop-constraints, and level-id: 1985 These parameters indicate the profile, tier, default level, 1986 and some constraints of the bitstream carried by the RTP 1987 stream and all RTP streams the RTP stream depends on, or a 1988 specific set of the profile, tier, default level, and some 1989 constraints the receiver supports. 1991 The profile and some constraints are indicated collectively 1992 by profile-space, profile-id, profile-compatibility- 1993 indicator, and interop-constraints. The profile specifies 1994 the subset of coding tools that may have been used to 1995 generate the bitstream or that the receiver supports. 1997 Informative note: There are 32 values of profile-id, and 1998 there are 32 flags in profile-compatibility-indicator, 1999 each flag corresponding to one value of profile-id. 2000 According to HEVC version 1 in [HEVC], when more than 2001 one of the 32 flags is set for a bitstream, the 2002 bitstream would comply with all the profiles 2003 corresponding to the set flags. However, in a draft of 2004 HEVC version 2 in [HEVC draft v2], subclause A.3.5, 19 2005 Format Range Extensions profiles have been specified, 2006 all using the same value of profile-id (4), 2007 differentiated by some of the 48 bits in interop- 2008 constraints - this (rather unexpected way of profile 2009 signalling) means that one of the 32 flags may 2010 correspond to multiple profiles. 
To be able to support 2011 whatever HEVC extension profile that might be specified 2012 and indicated using profile-space, profile-id, profile- 2013 compatibility-indicator, and interop-constraints in the 2014 future, it would be safe to require symmetric use of 2015 these parameters in SDP offer/answer unless recv-sub- 2016 layer-id is included in the SDP answer for choosing one 2017 of the sub-layers offered. 2019 The tier is indicated by tier-flag. The default level is 2020 indicated by level-id. The tier and the default level 2021 specify the limits on values of syntax elements or 2022 arithmetic combinations of values of syntax elements that 2023 are followed when generating the bitstream or that the 2024 receiver supports. 2026 A set of profile-space, tier-flag, profile-id, profile- 2027 compatibility-indicator, interop-constraints, and level-id 2028 parameters ptlA is said to be consistent with another set 2029 of these parameters ptlB if any decoder that conforms to 2030 the profile, tier, level, and constraints indicated by ptlB 2031 can decode any bitstream that conforms to the profile, 2032 tier, level, and constraints indicated by ptlA. 2034 In SDP offer/answer, when the SDP answer does not include 2035 the recv-sub-layer-id parameter that is less than the 2036 sprop-sub-layer-id parameter in the SDP offer, the 2037 following applies: 2039 o The profile-space, tier-flag, profile-id, profile- 2040 compatibility-indicator, and interop-constraints 2041 parameters MUST be used symmetrically, i.e. the value 2042 of each of these parameters in the offer MUST be the 2043 same as that in the answer, either explicitly 2044 signalled or implicitly inferred. 2046 o The level-id parameter is changeable as long as the 2047 highest level indicated by the answer is either equal 2048 to or lower than that in the offer. Note that the 2049 highest level is indicated by level-id and max-recv- 2050 level-id together. 
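Informative note: The symmetry rule above can be sketched in Python as shown below. This is an illustration only and not part of the payload format. It covers only the case where the answer does not select a lower sub-layer through recv-sub-layer-id. The function names and the dictionary representation of the parameters are hypothetical. The sketch also assumes that flag j of profile-compatibility-indicator is counted from the most significant bit, and that the four flags of interop-constraints occupy the most significant bits of the first byte; this section does not state either bit ordering.

```python
def normalize(fmtp):
    """Fill in inferred defaults for the parameters that must be symmetric."""
    profile_id = int(fmtp.get("profile-id", 1))
    # Assumption: flag j is counted from the most significant of 32 bits.
    default_pci = 1 << (31 - profile_id)
    pci = fmtp.get("profile-compatibility-indicator")
    return {
        "profile-space": int(fmtp.get("profile-space", 0)),
        "tier-flag": int(fmtp.get("tier-flag", 0)),
        "profile-id": profile_id,
        "profile-compatibility-indicator":
            int(pci, 16) if pci is not None else default_pci,
        # Assumption: default flags 1,0,1,1 followed by 44 zero bits.
        "interop-constraints":
            int(fmtp.get("interop-constraints", "B00000000000"), 16),
    }


def highest_level(fmtp):
    """Highest level, indicated by level-id and max-recv-level-id together."""
    level_id = int(fmtp.get("level-id", 93))
    return int(fmtp.get("max-recv-level-id", level_id))


def answer_is_acceptable(offer, answer):
    """Check symmetric parameters and that the answer's level is not higher."""
    return (normalize(offer) == normalize(answer)
            and highest_level(answer) <= highest_level(offer))
```

For example, an answer that omits profile-id matches an offer carrying profile-id=1, because the value is implicitly inferred.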
2052 In SDP offer/answer, when the SDP answer does include the 2053 recv-sub-layer-id parameter that is less than the sprop- 2054 sub-layer-id parameter in the SDP offer, the set of 2055 profile-space, tier-flag, profile-id, profile- 2056 compatibility-indicator, interop-constraints, and level-id 2057 parameters included in the answer MUST be consistent with 2058 that for the chosen sub-layer representation as indicated 2059 in the SDP offer, with the exception that the level-id 2060 parameter in the SDP answer is changeable as long as the 2061 highest level indicated by the answer is either lower than 2062 or equal to that in the offer. 2064 More specifications of these parameters, including how they 2065 relate to the values of the profile, tier, and level syntax 2066 elements specified in [HEVC], are provided below. 2068 profile-space, profile-id: 2070 The value of profile-space MUST be in the range of 0 to 3, 2071 inclusive. The value of profile-id MUST be in the range of 2072 0 to 31, inclusive. 2074 When profile-space is not present, a value of 0 MUST be 2075 inferred. When profile-id is not present, a value of 1 2076 (i.e. the Main profile) MUST be inferred.
2078 When used to indicate properties of a bitstream, profile- 2079 space and profile-id are derived from the profile, tier, 2080 and level syntax elements in SPS or VPS NAL units as 2081 follows, where general_profile_space, general_profile_idc, 2082 sub_layer_profile_space[j], and sub_layer_profile_idc[j] 2083 are specified in [HEVC]: 2085 If the RTP stream is the highest RTP stream, the 2086 following applies: 2088 o profile_space = general_profile_space 2089 o profile_id = general_profile_idc 2091 Otherwise (the RTP stream is a dependee RTP stream), the 2092 following applies, with j being the value of the sprop- 2093 sub-layer-id parameter: 2095 o profile_space = sub_layer_profile_space[j] 2096 o profile_id = sub_layer_profile_idc[j] 2098 tier-flag, level-id: 2100 The value of tier-flag MUST be in the range of 0 to 1, 2101 inclusive. The value of level-id MUST be in the range of 0 2102 to 255, inclusive. 2104 If the tier-flag and level-id parameters are used to 2105 indicate properties of a bitstream, they indicate the tier 2106 and the highest level the bitstream complies with. 2108 If the tier-flag and level-id parameters are used for 2109 capability exchange, the following applies. If max-recv- 2110 level-id is not present, the default level defined by 2111 level-id indicates the highest level the codec wishes to 2112 support. Otherwise, max-recv-level-id indicates the 2113 highest level the codec supports for receiving. For either 2114 receiving or sending, all levels that are lower than the 2115 highest level supported MUST also be supported. 2117 If no tier-flag is present, a value of 0 MUST be inferred 2118 and if no level-id is present, a value of 93 (i.e. level 2119 3.1) MUST be inferred. 
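Informative note: As an illustration only, the inference rules and value ranges for tier-flag and level-id might be applied as in the following Python sketch. The function names and the dictionary input are hypothetical. The conversion assumes that level-id is 30 times the level number, which is consistent with the default value of 93 for level 3.1.

```python
from fractions import Fraction

DEFAULT_TIER_FLAG = 0   # inferred when tier-flag is absent
DEFAULT_LEVEL_ID = 93   # inferred when level-id is absent (level 3.1)


def tier_and_level(fmtp):
    """Return (tier-flag, level-id) with defaults applied and ranges checked."""
    tier_flag = int(fmtp.get("tier-flag", DEFAULT_TIER_FLAG))
    level_id = int(fmtp.get("level-id", DEFAULT_LEVEL_ID))
    if tier_flag not in (0, 1):
        raise ValueError("tier-flag must be 0 or 1")
    if not 0 <= level_id <= 255:
        raise ValueError("level-id must be in the range 0 to 255")
    return tier_flag, level_id


def level_number(level_id):
    """Convert a level-id value to the level number it denotes."""
    return Fraction(level_id, 30)
```

An exact fraction is used so that level-id values such as 93 convert to 3.1 without floating-point rounding.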
2121 When used to indicate properties of a bitstream, the tier- 2122 flag and level-id parameters are derived from the profile, 2123 tier, and level syntax elements in SPS or VPS NAL units as 2124 follows, where general_tier_flag, general_level_idc, 2125 sub_layer_tier_flag[j], and sub_layer_level_idc[j] are 2126 specified in [HEVC]: 2128 If the RTP stream is the highest RTP stream, the 2129 following applies: 2131 o tier-flag = general_tier_flag 2132 o level-id = general_level_idc 2134 Otherwise (the RTP stream is a dependee RTP stream), the 2135 following applies, with j being the value of the sprop- 2136 sub-layer-id parameter: 2138 o tier-flag = sub_layer_tier_flag[j] 2139 o level-id = sub_layer_level_idc[j] 2141 interop-constraints: 2143 A base16 [RFC4648] (hexadecimal) representation of six 2144 bytes of data, consisting of progressive_source_flag, 2145 interlaced_source_flag, non_packed_constraint_flag, 2146 frame_only_constraint_flag, and reserved_zero_44bits. 2148 If the interop-constraints parameter is not present, the 2149 following MUST be inferred: 2151 o progressive_source_flag = 1 2152 o interlaced_source_flag = 0 2153 o non_packed_constraint_flag = 1 2154 o frame_only_constraint_flag = 1 2155 o reserved_zero_44bits = 0 2157 When the interop-constraints parameter is used to indicate 2158 properties of a bitstream, the following applies, where 2159 general_progressive_source_flag, 2160 general_interlaced_source_flag, 2161 general_non_packed_constraint_flag, 2163 general_frame_only_constraint_flag, 2164 general_reserved_zero_44bits, 2165 sub_layer_progressive_source_flag[j], 2166 sub_layer_interlaced_source_flag[j], 2167 sub_layer_non_packed_constraint_flag[j], 2168 sub_layer_frame_only_constraint_flag[j], and 2169 sub_layer_reserved_zero_44bits[j] are specified in [HEVC]: 2171 If the RTP stream is the highest RTP stream, the 2172 following applies: 2174 o progressive_source_flag = 2175
general_progressive_source_flag 2176 o interlaced_source_flag = 2177 general_interlaced_source_flag 2178 o non_packed_constraint_flag = 2179 general_non_packed_constraint_flag 2180 o frame_only_constraint_flag = 2181 general_frame_only_constraint_flag 2182 o reserved_zero_44bits = general_reserved_zero_44bits 2184 Otherwise (the RTP stream is a dependee RTP stream), the 2185 following applies, with j being the value of the sprop- 2186 sub-layer-id parameter: 2188 o progressive_source_flag = 2189 sub_layer_progressive_source_flag[j] 2190 o interlaced_source_flag = 2191 sub_layer_interlaced_source_flag[j] 2192 o non_packed_constraint_flag = 2194 sub_layer_non_packed_constraint_flag[j] 2195 o frame_only_constraint_flag = 2197 sub_layer_frame_only_constraint_flag[j] 2198 o reserved_zero_44bits = 2199 sub_layer_reserved_zero_44bits[j] 2201 Using interop-constraints for capability exchange results 2202 in a requirement on any bitstream to be compliant with the 2203 interop-constraints. 2205 profile-compatibility-indicator: 2207 A base16 [RFC4648] representation of four bytes of data. 2209 When profile-compatibility-indicator is used to indicate 2210 properties of a bitstream, the following applies, where 2211 general_profile_compatibility_flag[j] and 2212 sub_layer_profile_compatibility_flag[i][j] are specified in 2213 [HEVC]: 2215 The profile-compatibility-indicator in this case 2216 indicates additional profiles to the profile defined by 2217 profile_space, profile_id, and interop-constraints the 2218 bitstream conforms to. A decoder that conforms to any 2219 of all the profiles the bitstream conforms to would be 2220 capable of decoding the bitstream. These additional 2221 profiles are defined by profile-space, each set bit of 2222 profile-compatibility-indicator, and interop- 2223 constraints. 
2225 If the RTP stream is the highest RTP stream, the 2226 following applies for each value of j in the range of 0 2227 to 31, inclusive: 2229 o bit j of profile-compatibility-indicator = 2230 general_profile_compatibility_flag[j] 2232 Otherwise (the RTP stream is a dependee RTP stream), the 2233 following applies for i equal to sprop-sub-layer-id and 2234 for each value of j in the range of 0 to 31, inclusive: 2236 o bit j of profile-compatibility-indicator = 2237 sub_layer_profile_compatibility_flag[i][j] 2239 Using profile-compatibility-indicator for capability 2240 exchange results in a requirement on any bitstream to be 2241 compliant with the profile-compatibility-indicator. This 2242 is intended to handle cases where any future HEVC profile 2243 is defined as an intersection of two or more profiles. 2245 If this parameter is not present, this parameter defaults 2246 to the following: bit j, with j equal to profile-id, of 2247 profile-compatibility-indicator is inferred to be equal to 2248 1, and all other bits are inferred to be equal to 0. 2250 sprop-sub-layer-id: 2252 This parameter MAY be used to indicate the highest allowed 2253 value of TID in the bitstream. When not present, the value 2254 of sprop-sub-layer-id is inferred to be equal to 6. 2256 The value of sprop-sub-layer-id MUST be in the range of 0 2257 to 6, inclusive. 2259 recv-sub-layer-id: 2261 This parameter MAY be used to signal a receiver's choice of 2262 the offered or declared sub-layer representations in the 2263 sprop-vps. The value of recv-sub-layer-id indicates the 2264 TID of the highest sub-layer of the bitstream that a 2265 receiver supports. When not present, the value of recv- 2266 sub-layer-id is inferred to be equal to the value of the 2267 sprop-sub-layer-id parameter in the SDP offer. 2269 The value of recv-sub-layer-id MUST be in the range of 0 to 2270 6, inclusive. 2272 max-recv-level-id: 2274 This parameter MAY be used to indicate the highest level a 2275 receiver supports. 
The highest level the receiver supports 2276 is equal to the value of max-recv-level-id divided by 30. 2278 The value of max-recv-level-id MUST be in the range of 0 2279 to 255, inclusive. 2281 When max-recv-level-id is not present, the value is 2282 inferred to be equal to level-id. 2284 max-recv-level-id MUST NOT be present when the highest 2285 level the receiver supports is not higher than the default 2286 level. 2288 tx-mode: 2290 This parameter indicates whether the transmission mode is 2291 SRST, MRST, or MRMT. 2293 The value of tx-mode MUST be equal to "SRST", "MRST" or 2294 "MRMT". When not present, the value of tx-mode is inferred 2295 to be equal to "SRST". 2297 If the value is equal to "MRST", MRST MUST be in use. 2298 Otherwise, if the value is equal to "MRMT", MRMT MUST be in 2299 use. Otherwise (the value is equal to "SRST"), SRST MUST 2300 be in use. 2302 The value of tx-mode MUST be equal to "MRST" for all RTP 2303 streams in an MRST. 2305 The value of tx-mode MUST be equal to "MRMT" for all RTP 2306 streams in an MRMT. 2308 sprop-vps: 2310 This parameter MAY be used to convey any video parameter 2311 set NAL unit of the bitstream for out-of-band transmission 2312 of video parameter sets. The parameter MAY also be used 2313 for capability exchange and to indicate sub-stream 2314 characteristics (i.e. properties of sub-layer 2315 representations as defined in [HEVC]). The value of the 2316 parameter is a comma-separated (',') list of base64 2317 [RFC4648] representations of the video parameter set NAL 2318 units as specified in Section 7.3.2.1 of [HEVC]. 2320 The sprop-vps parameter MAY contain one or more than one 2321 video parameter set NAL unit. However, all other video 2322 parameter sets contained in the sprop-vps parameter MUST be 2323 consistent with the first video parameter set in the sprop- 2324 vps parameter. 
A video parameter set vpsB is said to be 2325 consistent with another video parameter set vpsA if any 2326 decoder that conforms to the profile, tier, level, and 2327 constraints indicated by the 12 bytes of data starting from 2328 the syntax element general_profile_space to the syntax 2329 element general_level_idc, inclusive, in the first 2330 profile_tier_level( ) syntax structure in vpsA can decode 2331 any bitstream that conforms to the profile, tier, level, 2332 and constraints indicated by the 12 bytes of data starting 2333 from the syntax element general_profile_space to the syntax 2334 element general_level_idc, inclusive, in the first 2335 profile_tier_level( ) syntax structure in vpsB. 2337 sprop-sps: 2339 This parameter MAY be used to convey sequence parameter set 2340 NAL units of the bitstream for out-of-band transmission of 2341 sequence parameter sets. The value of the parameter is a 2342 comma-separated (',') list of base64 [RFC4648] 2343 representations of the sequence parameter set NAL units as 2344 specified in Section 7.3.2.2 of [HEVC]. 2346 sprop-pps: 2348 This parameter MAY be used to convey picture parameter set 2349 NAL units of the bitstream for out-of-band transmission of 2350 picture parameter sets. The value of the parameter is a 2351 comma-separated (',') list of base64 [RFC4648] 2352 representations of the picture parameter set NAL units as 2353 specified in Section 7.3.2.3 of [HEVC]. 2355 sprop-sei: 2357 This parameter MAY be used to convey one or more SEI 2358 messages that describe bitstream characteristics. When 2359 present, a decoder can rely on the bitstream 2360 characteristics that are described in the SEI messages for 2361 the entire duration of the session, independently from the 2362 persistence scopes of the SEI messages as specified in 2363 [HEVC]. 2365 The value of the parameter is a comma-separated (',') list 2366 of base64 [RFC4648] representations of SEI NAL units as 2367 specified in Section 7.3.2.4 of [HEVC].
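Informative note: The sprop-vps, sprop-sps, sprop-pps, and sprop-sei values share the same comma-separated base64 syntax. A receiver might split and decode such a value as in the following Python sketch, which is an illustration only. The function names are hypothetical. The NAL unit type is read from the six bits that follow the first bit of the two-byte HEVC NAL unit header.

```python
import base64


def parse_sprop(value):
    """Decode a comma-separated list of base64 NAL units into byte strings."""
    return [base64.b64decode(item, validate=True) for item in value.split(",")]


def nal_unit_type(nalu):
    """Return the 6-bit NAL unit type from the two-byte NAL unit header."""
    if len(nalu) < 2:
        raise ValueError("a NAL unit needs at least the two-byte header")
    return (nalu[0] >> 1) & 0x3F
```

A receiver could, for instance, use nal_unit_type() to verify that every entry of sprop-vps really is a video parameter set NAL unit (type 32) before handing it to the decoder.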
2369 Informative note: Intentionally, no list of applicable 2370 or inapplicable SEI messages is specified here. 2371 Conveying certain SEI messages in sprop-sei may be 2372 sensible in some application scenarios and meaningless 2373 in others. However, a few examples are described below: 2375 1) In an environment where the bitstream was created 2376 from film-based source material, and no splicing is 2377 going to occur during the lifetime of the session, 2378 the film grain characteristics SEI message or the 2379 tone mapping information SEI message are likely 2380 meaningful, and sending them in sprop-sei rather than 2381 in the bitstream at each entry point may help save 2382 bits and allows the renderer to be configured only once, 2383 avoiding unwanted artifacts. 2384 2) The structure of pictures information SEI message in 2385 sprop-sei can be used to inform a decoder of 2386 information on the NAL unit types, picture order 2387 count values, and prediction dependencies of a 2388 sequence of pictures. Having such knowledge can be 2389 helpful for error recovery. 2390 3) Examples of SEI messages that would be meaningless 2391 to convey in sprop-sei include the decoded 2392 picture hash SEI message (it is close to impossible 2393 that all decoded pictures have the same hash-tag), 2394 the display orientation SEI message when the device 2395 is a handheld device (as the display orientation may 2396 change when the handheld device is turned around), or 2397 the filler payload SEI message (as there is no point 2398 in just having more bits in SDP). 2400 max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc: 2402 These parameters MAY be used to signal the capabilities of 2403 a receiver implementation. These parameters MUST NOT be 2404 used for any other purpose. The highest level (specified 2405 by max-recv-level-id) MUST be one that the receiver is 2406 fully capable of supporting.
max-lsr, max-lps, max-cpb, 2407 max-dpb, max-br, max-tr, and max-tc MAY be used to indicate 2408 capabilities of the receiver that extend the required 2409 capabilities of the highest level, as specified below. 2411 When more than one parameter from the set (max-lsr, max- 2412 lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is present, 2413 the receiver MUST support all signaled capabilities 2414 simultaneously. For example, if both max-lsr and max-br 2415 are present, the highest level with the extension of both 2416 the picture rate and bitrate is supported. That is, the 2417 receiver is able to decode bitstreams in which the luma 2418 sample rate is up to max-lsr (inclusive), the bitrate is up 2419 to max-br (inclusive), the coded picture buffer size is 2420 derived as specified in the semantics of the max-br 2421 parameter below, and the other properties comply with the 2422 highest level specified by max-recv-level-id. 2424 Informative note: When the OPTIONAL media type 2425 parameters are used to signal the properties of a 2426 bitstream, and max-lsr, max-lps, max-cpb, max-dpb, max- 2427 br, max-tr, and max-tc are not present, the values of 2428 profile-space, tier-flag, profile-id, profile- 2429 compatibility-indicator, interop-constraints, and level- 2430 id must always be such that the bitstream complies fully 2431 with the specified profile, tier, and level. 2433 max-lsr: 2434 The value of max-lsr is an integer indicating the maximum 2435 processing rate in units of luma samples per second. The 2436 max-lsr parameter signals that the receiver is capable of 2437 decoding video at a higher rate than is required by the 2438 highest level. 2440 When max-lsr is signaled, the receiver MUST be able to 2441 decode bitstreams that conform to the highest level, with 2442 the exception that the MaxLumaSR value in Table A-2 of 2443 [HEVC] for the highest level is replaced with the value of 2444 max-lsr. 
Senders MAY use this knowledge to send pictures 2445 of a given size at a higher picture rate than is indicated 2446 in the highest level. 2448 When not present, the value of max-lsr is inferred to be 2449 equal to the value of MaxLumaSR given in Table A-2 of 2450 [HEVC] for the highest level. 2452 The value of max-lsr MUST be in the range of MaxLumaSR to 2453 16 * MaxLumaSR, inclusive, where MaxLumaSR is given in 2454 Table A-2 of [HEVC] for the highest level. 2456 max-lps: 2457 The value of max-lps is an integer indicating the maximum 2458 picture size in units of luma samples. The max-lps 2459 parameter signals that the receiver is capable of decoding 2460 larger picture sizes than are required by the highest 2461 level. When max-lps is signaled, the receiver MUST be able 2462 to decode bitstreams that conform to the highest level, 2463 with the exception that the MaxLumaPS value in Table A-1 of 2464 [HEVC] for the highest level is replaced with the value of 2465 max-lps. Senders MAY use this knowledge to send larger 2466 pictures at a proportionally lower picture rate than is 2467 indicated in the highest level. 2469 When not present, the value of max-lps is inferred to be 2470 equal to the value of MaxLumaPS given in Table A-1 of 2471 [HEVC] for the highest level. 2473 The value of max-lps MUST be in the range of MaxLumaPS to 2474 16 * MaxLumaPS, inclusive, where MaxLumaPS is given in 2475 Table A-1 of [HEVC] for the highest level. 2477 max-cpb: 2478 The value of max-cpb is an integer indicating the maximum 2479 coded picture buffer size in units of CpbBrVclFactor bits 2480 for the VCL HRD parameters and in units of CpbBrNalFactor 2481 bits for the NAL HRD parameters, where CpbBrVclFactor and 2482 CpbBrNalFactor are defined in Section A.4 of [HEVC]. The 2483 max-cpb parameter signals that the receiver has more memory 2484 than the minimum amount of coded picture buffer memory 2485 required by the highest level. 
When max-cpb is signaled, 2486 the receiver MUST be able to decode bitstreams that conform 2487 to the highest level, with the exception that the MaxCPB 2488 value in Table A-1 of [HEVC] for the highest level is 2489 replaced with the value of max-cpb. Senders MAY use this 2490 knowledge to construct coded bitstreams with greater 2491 variation of bitrate than can be achieved with the MaxCPB 2492 value in Table A-1 of [HEVC]. 2494 When not present, the value of max-cpb is inferred to be 2495 equal to the value of MaxCPB given in Table A-1 of [HEVC] 2496 for the highest level. 2498 The value of max-cpb MUST be in the range of MaxCPB to 2499 16 * MaxCPB, inclusive, where MaxCPB is given in Table 2500 A-1 of [HEVC] for the highest level. 2502 Informative note: The coded picture buffer is used in 2503 the hypothetical reference decoder (Annex C of HEVC). 2504 The use of the hypothetical reference decoder is 2505 recommended in HEVC encoders to verify that the produced 2506 bitstream conforms to the standard and to control the 2507 output bitrate. Thus, the coded picture buffer is 2508 conceptually independent of any other potential buffers 2509 in the receiver, including de-packetization and de- 2510 jitter buffers. The coded picture buffer need not be 2511 implemented in decoders as specified in Annex C of HEVC, 2512 but rather standard-compliant decoders can have any 2513 buffering arrangements provided that they can decode 2514 standard-compliant bitstreams. Thus, in practice, the 2515 input buffer for a video decoder can be integrated with 2516 de-packetization and de-jitter buffers of the receiver. 2518 max-dpb: 2519 The value of max-dpb is an integer indicating the maximum 2520 decoded picture buffer size in units of decoded pictures at 2521 the MaxLumaPS for the highest level, i.e. the number of 2522 decoded pictures at the maximum picture size defined by the 2523 highest level. The value of max-dpb MUST be in the range 2524 of 1 to 16, inclusive.
The max-dpb parameter signals 2525 that the receiver has more memory than the minimum amount 2526 of decoded picture buffer memory required by default, which 2527 is MaxDpbPicBuf as defined in [HEVC] (equal to 6). When 2528 max-dpb is signaled, the receiver MUST be able to decode 2529 bitstreams that conform to the highest level, with the 2530 exception that the MaxDpbPicBuf value defined in [HEVC] as 2531 6 is replaced with the value of max-dpb. Consequently, a 2532 receiver that signals max-dpb MUST be capable of storing 2533 the following number of decoded pictures (MaxDpbSize) in 2534 its decoded picture buffer: 2536 if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) ) 2537 MaxDpbSize = Min( 4 * max-dpb, 16 ) 2538 else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) ) 2539 MaxDpbSize = Min( 2 * max-dpb, 16 ) 2540 else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 2541 ) ) 2542 MaxDpbSize = Min( (4 * max-dpb) / 3, 16 ) 2543 else 2544 MaxDpbSize = max-dpb 2546 where MaxLumaPS is given in Table A-1 of [HEVC] for the 2547 highest level and PicSizeInSamplesY is the current size of 2548 each decoded picture in units of luma samples as defined in 2549 [HEVC]. 2551 The value of max-dpb MUST be greater than or equal to the 2552 value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. 2553 Senders MAY use this knowledge to construct coded 2554 bitstreams with improved compression. 2556 When not present, the value of max-dpb is inferred to be 2557 equal to the value of MaxDpbPicBuf (i.e. 6) as defined in 2558 [HEVC]. 2560 Informative note: This parameter was added primarily to 2561 complement a similar codepoint in the ITU-T 2562 Recommendation H.245, so as to facilitate signaling 2563 gateway designs. The decoded picture buffer stores 2564 reconstructed samples. There is no relationship between 2565 the size of the decoded picture buffer and the buffers 2566 used in RTP, especially de-packetization and de-jitter 2567 buffers.
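Informative note: The MaxDpbSize derivation given for max-dpb can be transcribed into Python as in the following sketch, which is an illustration only. The function and argument names are hypothetical, and MaxLumaPS is passed in by the caller rather than looked up from Table A-1 of [HEVC]. The sketch assumes that the division in the third branch is an integer division, and it accepts max-dpb values from 6 to 16 only, which is the intersection of the two range statements for this parameter.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures a receiver signalling max-dpb must store."""
    if not 6 <= max_dpb <= 16:
        raise ValueError("max-dpb is expected to be in the range 6 to 16")
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: "/" in the pseudo-code denotes integer division.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

The default argument of 6 corresponds to the inferred value of max-dpb when the parameter is not present.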
2569 max-br: 2570 The value of max-br is an integer indicating the maximum 2571 video bitrate in units of CpbBrVclFactor bits per second 2572 for the VCL HRD parameters and in units of CpbBrNalFactor 2573 bits per second for the NAL HRD parameters, where 2574 CpbBrVclFactor and CpbBrNalFactor are defined in Section 2575 A.4 of [HEVC]. 2577 The max-br parameter signals that the video decoder of the 2578 receiver is capable of decoding video at a higher bitrate 2579 than is required by the highest level. 2581 When max-br is signaled, the video codec of the receiver 2582 MUST be able to decode bitstreams that conform to the 2583 highest level, with the following exceptions in the limits 2584 specified by the highest level: 2586 o The value of max-br replaces the MaxBR value in Table A- 2587 2 of [HEVC] for the highest level. 2588 o When the max-cpb parameter is not present, the result of 2589 the following formula replaces the value of MaxCPB in 2590 Table A-1 of [HEVC]: 2592 (MaxCPB of the highest level) * max-br / (MaxBR of 2593 the highest level) 2595 For example, if a receiver signals capability for Main 2596 profile Level 2 with max-br equal to 2000, this indicates a 2597 maximum video bitrate of 2000 kbits/sec for VCL HRD 2598 parameters, a maximum video bitrate of 2200 kbits/sec for 2599 NAL HRD parameters, and a CPB size of 2000000 bits (2000000 2600 / 1500000 * 1500000). 2602 Senders MAY use this knowledge to send higher bitrate video 2603 as allowed in the level definition of Annex A of HEVC to 2604 achieve improved video quality. 2606 When not present, the value of max-br is inferred to be 2607 equal to the value of MaxBR given in Table A-2 of [HEVC] 2608 for the highest level. 2610 The value of max-br MUST be in the range of MaxBR to 2611 16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of 2612 [HEVC] for the highest level. 
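Informative note: The worked example for max-br (Main profile, Level 2, max-br equal to 2000) can be reproduced with the following Python sketch, which is an illustration only. The function and key names are hypothetical. The factors 1000 and 1100 are the CpbBrVclFactor and CpbBrNalFactor values implied by that example, the level limits are supplied by the caller, and rounding the derived CPB size down to an integer is an assumption, since the formula does not state a rounding rule.

```python
def limits_with_max_br(max_br, level_max_br, level_max_cpb, max_cpb=None,
                       vcl_factor=1000, nal_factor=1100):
    """Derive bitrate and CPB limits when a receiver signals max-br."""
    if not level_max_br <= max_br <= 16 * level_max_br:
        raise ValueError("max-br must be in the range MaxBR to 16 * MaxBR")
    if max_cpb is None:
        # max-cpb absent: scale the level's MaxCPB by max-br / MaxBR.
        # Assumption: the result is rounded down to an integer.
        max_cpb = level_max_cpb * max_br // level_max_br
    return {
        "vcl_bits_per_second": max_br * vcl_factor,
        "nal_bits_per_second": max_br * nal_factor,
        "cpb_vcl_bits": max_cpb * vcl_factor,
        "cpb_nal_bits": max_cpb * nal_factor,
    }
```

Calling limits_with_max_br(2000, 1500, 1500) corresponds to the Level 2 example, where both MaxBR and MaxCPB of the level are 1500.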
2614 Informative note: This parameter was added primarily to 2615 complement a similar codepoint in the ITU-T 2616 Recommendation H.245, so as to facilitate signaling 2617 gateway designs. The assumption that the network is 2618 capable of handling such bitrates at any given time 2619 cannot be made from the value of this parameter. In 2620 particular, no conclusion can be drawn that the signaled 2621 bitrate is possible under congestion control 2622 constraints. 2624 max-tr: 2625 The value of max-tr is an integer indicating the maximum 2626 number of tile rows. The max-tr parameter signals that the 2627 receiver is capable of decoding video with a larger number 2628 of tile rows than the value allowed by the highest level. 2630 When max-tr is signaled, the receiver MUST be able to 2631 decode bitstreams that conform to the highest level, with 2632 the exception that the MaxTileRows value in Table A-1 of 2633 [HEVC] for the highest level is replaced with the value of 2634 max-tr. 2636 Senders MAY use this knowledge to send pictures utilizing a 2637 larger number of tile rows than the value allowed by the 2638 highest level. 2640 When not present, the value of max-tr is inferred to be 2641 equal to the value of MaxTileRows given in Table A-1 of 2642 [HEVC] for the highest level. 2644 The value of max-tr MUST be in the range of MaxTileRows to 2645 16 * MaxTileRows, inclusive, where MaxTileRows is given in 2646 Table A-1 of [HEVC] for the highest level. 2648 max-tc: 2649 The value of max-tc is an integer indicating the maximum 2650 number of tile columns. The max-tc parameter signals that 2651 the receiver is capable of decoding video with a larger 2652 number of tile columns than the value allowed by the 2653 highest level.
2655 When max-tc is signaled, the receiver MUST be able to 2656 decode bitstreams that conform to the highest level, with 2657 the exception that the MaxTileCols value in Table A-1 of 2658 [HEVC] for the highest level is replaced with the value of 2659 max-tc. 2661 Senders MAY use this knowledge to send pictures utilizing a 2662 larger number of tile columns than the value allowed by the 2663 highest level. 2665 When not present, the value of max-tc is inferred to be 2666 equal to the value of MaxTileCols given in Table A-1 of 2667 [HEVC] for the highest level. 2669 The value of max-tc MUST be in the range of MaxTileCols to 2670 16 * MaxTileCols, inclusive, where MaxTileCols is given in 2671 Table A-1 of [HEVC] for the highest level. 2673 max-fps: 2675 The value of max-fps is an integer indicating the maximum 2676 picture rate in units of pictures per 100 seconds that can 2677 be effectively processed by the receiver. The max-fps 2678 parameter MAY be used to signal that the receiver has a 2679 constraint in that it is not capable of processing video 2680 effectively at the full picture rate that is implied by the 2681 highest level and, when present, one or more of the 2682 parameters max-lsr, max-lps, and max-br. 2684 The value of max-fps is not necessarily the picture rate at 2685 which the maximum picture size can be sent; rather, it constitutes 2686 a constraint on the maximum picture rate for all resolutions. 2688 Informative note: The max-fps parameter is semantically 2689 different from max-lsr, max-lps, max-cpb, max-dpb, max- 2690 br, max-tr, and max-tc in that max-fps is used to signal 2691 a constraint, lowering the maximum picture rate from 2692 what is implied by other parameters. 2694 The encoder MUST use a picture rate equal to or less than 2695 this value. In cases where the max-fps parameter is absent, 2696 the encoder is free to choose any picture rate according to 2697 the highest level and any signaled optional parameters.
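Informative note: Because max-fps is expressed in pictures per 100 seconds, a value of 3000 corresponds to 30 pictures per second. An encoder might apply the constraint as in the following Python sketch, which is an illustration only. The function names are hypothetical, and the limits implied by the highest level and the other optional parameters are not modelled.

```python
from fractions import Fraction


def max_pictures_per_second(max_fps):
    """Convert max-fps (pictures per 100 seconds) to pictures per second."""
    return Fraction(max_fps, 100)


def choose_picture_rate(desired_rate, max_fps=None):
    """Clamp the encoder's desired picture rate to the signalled max-fps."""
    desired = Fraction(desired_rate)
    if max_fps is None:
        # max-fps absent: this constraint does not apply.
        return desired
    return min(desired, max_pictures_per_second(max_fps))
```

Exact fractions avoid rounding problems for rates such as 29.97 pictures per second, which would be signaled as max-fps equal to 2997.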
2699 The value of max-fps MUST be smaller than or equal to the 2700 full picture rate that is implied by the highest level and, 2701 when present, one or more of the parameters max-lsr, max- 2702 lps, and max-br. 2704 sprop-max-don-diff: 2706 If tx-mode is equal to "SRST" and there is no NAL unit 2707 naluA that is followed in transmission order by any NAL 2708 unit preceding naluA in decoding order (i.e. the 2709 transmission order of the NAL units is the same as the 2710 decoding order), the value of this parameter MUST be equal 2711 to 0. 2713 Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the 2714 decoding order of the NAL units of all the RTP streams is 2715 the same as the NAL unit transmission order and the NAL 2716 unit output order, the value of this parameter MUST be 2717 equal to either 0 or 1. 2719 Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the 2720 decoding order of the NAL units of all the RTP streams is 2721 the same as the NAL unit transmission order but not the 2722 same as the NAL unit output order, the value of this 2723 parameter MUST be equal to 1. 2725 Otherwise, this parameter specifies the maximum absolute 2726 difference between the decoding order number (i.e., AbsDon) 2727 values of any two NAL units naluA and naluB, where naluA 2728 follows naluB in decoding order and precedes naluB in 2729 transmission order. 2731 The value of sprop-max-don-diff MUST be an integer in the 2732 range of 0 to 32767, inclusive. 2734 When not present, the value of sprop-max-don-diff is 2735 inferred to be equal to 0. 2737 sprop-depack-buf-nalus: 2739 This parameter specifies the maximum number of NAL units 2740 that precede a NAL unit in transmission order and follow 2741 the NAL unit in decoding order. 2743 The value of sprop-depack-buf-nalus MUST be an integer in 2744 the range of 0 to 32767, inclusive. 2746 When not present, the value of sprop-depack-buf-nalus is 2747 inferred to be equal to 0.
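Informative note: For an interleaved stream, a sender might derive sprop-max-don-diff and sprop-depack-buf-nalus from the AbsDon values of the NAL units listed in transmission order, as in the following Python sketch. This is an illustration only. The function name is hypothetical, the AbsDon values are assumed to be distinct, and the special cases above that fix sprop-max-don-diff to 0 or 1 are not modelled.

```python
def interleaving_properties(abs_dons_in_tx_order):
    """Return (max AbsDon difference, max number of buffered NAL units)."""
    max_don_diff = 0
    depack_buf_nalus = 0
    for i, don_b in enumerate(abs_dons_in_tx_order):
        # NAL units sent before this one that follow it in decoding order.
        ahead = [d for d in abs_dons_in_tx_order[:i] if d > don_b]
        if ahead:
            max_don_diff = max(max_don_diff, max(ahead) - don_b)
            depack_buf_nalus = max(depack_buf_nalus, len(ahead))
    return max_don_diff, depack_buf_nalus
```

When the transmission order equals the decoding order, both results are 0, which matches the inferred defaults of the two parameters. The quadratic scan is chosen for readability rather than speed.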
2749 When sprop-max-don-diff is present and greater than 0, this 2750 parameter MUST be present and the value MUST be greater 2751 than 0. 2753 sprop-depack-buf-bytes: 2755 This parameter signals the required size of the de- 2756 packetization buffer in units of bytes. The value of the 2757 parameter MUST be greater than or equal to the maximum 2758 buffer occupancy (in units of bytes) of the de- 2759 packetization buffer as specified in Section 6. 2761 The value of sprop-depack-buf-bytes MUST be an integer in 2762 the range of 0 to 4294967295, inclusive. 2764 When sprop-max-don-diff is present and greater than 0, this 2765 parameter MUST be present and the value MUST be greater 2766 than 0. When not present, the value of sprop-depack-buf- 2767 bytes is inferred to be equal to 0. 2769 Informative note: The value of sprop-depack-buf-bytes 2770 indicates the required size of the de-packetization 2771 buffer only. When network jitter can occur, an 2772 appropriately sized jitter buffer has to be available as 2773 well. 2775 depack-buf-cap: 2777 This parameter signals the capabilities of a receiver 2778 implementation and indicates the amount of de-packetization 2779 buffer space in units of bytes that the receiver has 2780 available for reconstructing the NAL unit decoding order 2781 from NAL units carried in one or more RTP streams. A 2782 receiver is able to handle any RTP stream, and all RTP 2783 streams the RTP stream depends on, when present, for which 2784 the value of the sprop-depack-buf-bytes parameter is 2785 smaller than or equal to this parameter. 2787 When not present, the value of depack-buf-cap is inferred 2788 to be equal to 4294967295. The value of depack-buf-cap 2789 MUST be an integer in the range of 1 to 4294967295, 2790 inclusive. 2792 Informative note: depack-buf-cap indicates the maximum 2793 possible size of the de-packetization buffer of the 2794 receiver only. 
When network jitter can occur, an 2795 appropriately sized jitter buffer has to be available as 2796 well. 2798 sprop-segmentation-id: 2800 This parameter MAY be used to signal the segmentation tools 2801 present in the bitstream and that can be used for 2802 parallelization. The value of sprop-segmentation-id MUST 2803 be an integer in the range of 0 to 3, inclusive. When not 2804 present, the value of sprop-segmentation-id is inferred to 2805 be equal to 0. 2807 When sprop-segmentation-id is equal to 0, no information 2808 about the segmentation tools is provided. When sprop- 2809 segmentation-id is equal to 1, it indicates that slices are 2810 present in the bitstream. When sprop-segmentation-id is 2811 equal to 2, it indicates that tiles are present in the 2812 bitstream. When sprop-segmentation-id is equal to 3, it 2813 indicates that WPP is used in the bitstream. 2815 sprop-spatial-segmentation-idc: 2817 A base16 [RFC4648] representation of the syntax element 2818 min_spatial_segmentation_idc as specified in [HEVC]. This 2819 parameter MAY be used to describe parallelization 2820 capabilities of the bitstream. 2822 dec-parallel-cap: 2824 This parameter MAY be used to indicate the decoder's 2825 additional decoding capabilities given the presence of 2826 tools enabling parallel decoding, such as slices, tiles, 2827 and WPP, in the bitstream. The decoding capability of the 2828 decoder may vary with the setting of the parallel decoding 2829 tools present in the bitstream, e.g. the size of the tiles 2830 that are present in a bitstream. Therefore, multiple 2831 capability points may be provided, each indicating the 2832 minimum required decoding capability that is associated 2833 with a parallelism requirement, which is a requirement on 2834 the bitstream that enables parallel decoding. 
2836 Each capability point is defined as a combination of 1) a 2837 parallelism requirement, 2) a profile (determined by 2838 profile-space and profile-id), 3) a highest level, and 4) a 2839 maximum processing rate, a maximum picture size, and a 2840 maximum video bitrate that may be equal to or greater than 2841 that determined by the highest level. The parameter's 2842 syntax in ABNF [RFC5234] is as follows:

      dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
                         cap-point) "}"

      cap-point        = ("w" / "t") ":" spatial-seg-idc 1*(";"
                         cap-parameter)

      spatial-seg-idc  = 1*4DIGIT ; (1-4095)

      cap-parameter    = tier-flag / level-id / max-lsr
                         / max-lps / max-br

      tier-flag        = "tier-flag" EQ ("0" / "1")

      level-id         = "level-id" EQ 1*3DIGIT ; (0-255)

      max-lsr          = "max-lsr" EQ 1*20DIGIT
                         ; (0-18,446,744,073,709,551,615)

      max-lps          = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)

      max-br           = "max-br" EQ 1*20DIGIT
                         ; (0-18,446,744,073,709,551,615)

      EQ               = "="

2868 The set of capability points expressed by the dec-parallel- 2869 cap parameter is enclosed in a pair of curly braces ("{}"). 2870 Each set of two consecutive capability points is separated 2871 by a comma (','). Within each capability point, each set 2872 of two consecutive parameters, and when present, their 2873 values, is separated by a semicolon (';'). 2875 The profile of all capability points is determined by 2876 profile-space and profile-id that are outside the dec- 2877 parallel-cap parameter. 2879 Each capability point starts with an indication of the 2880 parallelism requirement, which consists of a parallel tool 2881 type, which may be equal to 'w' or 't', and a decimal value 2882 of the spatial-seg-idc parameter. When the type is 'w', 2883 the capability point is valid only for H.265 bitstreams 2884 with WPP in use, i.e. entropy_coding_sync_enabled_flag 2885 equal to 1.
When the type is 't', the capability point is 2886 valid only for H.265 bitstreams with WPP not in use (i.e., 2887 entropy_coding_sync_enabled_flag equal to 0). The 2888 capability point is valid only for H.265 bitstreams with 2889 min_spatial_segmentation_idc equal to or greater than 2890 spatial-seg-idc. 2892 After the parallelism requirement indication, each 2893 capability point continues with one or more pairs of 2894 parameter and value in any order for any of the following 2895 parameters: 2897 o tier-flag 2898 o level-id 2899 o max-lsr 2900 o max-lps 2901 o max-br 2903 At most one occurrence of each of the above five parameters 2904 is allowed within each capability point. 2906 The values of dec-parallel-cap.tier-flag and dec-parallel- 2907 cap.level-id for a capability point indicate the highest 2908 level of the capability point. The values of dec-parallel- 2909 cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel- 2910 cap.max-br for a capability point indicate the maximum 2911 processing rate in units of luma samples per second, the 2912 maximum picture size in units of luma samples, and the 2913 maximum video bitrate (in units of CpbBrVclFactor bits per 2914 second for the VCL HRD parameters and in units of 2915 CpbBrNalFactor bits per second for the NAL HRD parameters 2916 where CpbBrVclFactor and CpbBrNalFactor are defined in 2917 Section A.4 of [HEVC]), respectively. 2919 When not present, the value of dec-parallel-cap.tier-flag 2920 is inferred to be equal to the value of tier-flag outside 2921 the dec-parallel-cap parameter. When not present, the 2922 value of dec-parallel-cap.level-id is inferred to be equal 2923 to the value of max-recv-level-id outside the dec-parallel- 2924 cap parameter. When not present, the value of dec- 2925 parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec- 2926 parallel-cap.max-br is inferred to be equal to the value of 2927 max-lsr, max-lps, or max-br, respectively, outside the dec- 2928 parallel-cap parameter.
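As a non-normative illustration, the dec-parallel-cap syntax, the rule of at most one occurrence per parameter, and the inference rules above can be sketched as a small Python parser. The function names and the `outer` dictionary (the values signalled outside dec-parallel-cap) are assumptions of this sketch. It is strict about whitespace, which the ABNF does not allow, and it records None when a value is signalled neither inside nor outside the capability point.

```python
import re

# Per-parameter (maximum digit count, maximum value), taken from the ABNF.
LIMITS = {
    "tier-flag": (1, 1),
    "level-id": (3, 255),
    "max-lsr": (20, 2**64 - 1),
    "max-lps": (10, 2**32 - 1),
    "max-br": (20, 2**64 - 1),
}


def _number(text, max_digits, low, high, what):
    """Parse 1..max_digits decimal digits and check the allowed range."""
    if not re.fullmatch(r"[0-9]{1,%d}" % max_digits, text):
        raise ValueError(f"malformed {what}: {text!r}")
    value = int(text)
    if not low <= value <= high:
        raise ValueError(f"{what} out of range: {value}")
    return value


def parse_dec_parallel_cap(value, outer):
    """Parse the text that follows 'dec-parallel-cap=' into capability points."""
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in curly braces")
    points = []
    for chunk in value[1:-1].split(","):
        head, *params = chunk.split(";")
        tool, sep, idc = head.partition(":")
        if tool not in ("w", "t") or not sep:
            raise ValueError(f"bad parallelism requirement: {head!r}")
        if not params:
            raise ValueError("at least one cap-parameter is required")
        point = {"tool": tool,
                 "spatial-seg-idc": _number(idc, 4, 1, 4095, "spatial-seg-idc")}
        for item in params:
            name, sep, text = item.partition("=")
            if name not in LIMITS or not sep:
                raise ValueError(f"unknown cap-parameter: {item!r}")
            if name in point:
                raise ValueError(f"{name} occurs more than once")
            digits, high = LIMITS[name]
            point[name] = _number(text, digits, 0, high, name)
        # Inference: absent values come from the parameters outside
        # dec-parallel-cap; level-id is inferred from max-recv-level-id.
        for name in LIMITS:
            if name not in point:
                source = "max-recv-level-id" if name == "level-id" else name
                point[name] = outer.get(source)
        points.append(point)
    return points


outer = {"tier-flag": 0, "max-recv-level-id": 93}
print(parse_dec_parallel_cap("{t:8;level-id=120}", outer))
```

A caller would typically hand each returned point to whatever logic decides whether an offered bitstream fits the receiver. That decision is outside the scope of this sketch.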
2930 The general decoding capability, expressed by the set of 2931 parameters outside of dec-parallel-cap, is defined as the 2932 capability point that is determined by the following 2933 combination of parameters: 1) the parallelism requirement 2934 corresponding to the value of sprop-segmentation-id equal 2935 to 0 for a bitstream, 2) the profile determined by profile- 2936 space, profile-id, profile-compatibility-indicator, and 2937 interop-constraints, 3) the tier and the highest level 2938 determined by tier-flag and max-recv-level-id, and 4) the 2939 maximum processing rate, the maximum picture size, and the 2940 maximum video bitrate determined by the highest level. The 2941 general decoding capability MUST NOT be included as one of 2942 the set of capability points in the dec-parallel-cap 2943 parameter. 2945 For example, the following parameters express the general 2946 decoding capability of 720p30 (Level 3.1) plus an 2947 additional decoding capability of 1080p30 (Level 4) given 2948 that the spatially largest tile or slice used in the 2949 bitstream is equal to or less than 1/3 of the picture size:

      a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

2954 For another example, the following parameters express an 2955 additional decoding capability of 1080p30, using dec- 2956 parallel-cap.max-lsr and dec-parallel-cap.max-lps, given 2957 that WPP is used in the bitstream:

      a=fmtp:98 level-id=93;dec-parallel-cap={w:8;max-lsr=62668800;max-lps=2088960}

2962 Informative note: When min_spatial_segmentation_idc is 2963 present in a bitstream and WPP is not used, [HEVC] 2964 specifies that there is no slice or no tile in the 2965 bitstream containing more than 4 * PicSizeInSamplesY / 2966 ( min_spatial_segmentation_idc + 4 ) luma samples.
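As a non-normative illustration, the numbers in the two examples and in the informative note above can be checked with a few lines of Python. The function name is hypothetical. The sketch assumes that 1080p is coded as 1920 x 1088 luma samples, which is the picture size that matches the max-lps value in the second example.

```python
from fractions import Fraction


def max_segment_samples(pic_size_in_samples_y, spatial_seg_idc):
    """Upper bound on the luma samples in any slice or tile (WPP not used):
    4 * PicSizeInSamplesY / (min_spatial_segmentation_idc + 4)."""
    return Fraction(4 * pic_size_in_samples_y, spatial_seg_idc + 4)


pic = 1920 * 1088                     # assumed coded size of a 1080p picture
bound = max_segment_samples(pic, 8)   # spatial-seg-idc of 8, as in "t:8"

print(bound / pic)    # 1/3, the fraction quoted in the first example
print(pic)            # 2088960, the max-lps value in the second example
print(pic * 30)       # 62668800, the max-lsr value (30 pictures per second)
```

Fraction keeps the bound exact, so the 1/3 ratio appears without rounding.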
2968 include-dph: 2970 This parameter is used to indicate the capability and 2971 preference to utilize or include decoded picture hash (DPH) 2972 SEI messages (see Section D.3.19 of [HEVC]) in the 2973 bitstream. DPH SEI messages can be used to detect picture 2974 corruption so that the receiver can request picture repair; see 2975 Section 8. The value is a comma-separated list of hash 2976 types that are supported or requested to be used, each hash 2977 type provided as an unsigned integer value (0-255), with 2978 the hash types listed from the most preferred to the least 2979 preferred. Example: "include-dph=0,2", which indicates the 2980 capability for MD5 (most preferred) and Checksum (less 2981 preferred). If the parameter is not included or the value 2982 contains no hash types, then no capability to utilize DPH 2983 SEI messages is assumed. Note that DPH SEI messages MAY 2984 still be included in the bitstream even when there is no 2985 declaration of capability to use them, as in general SEI 2986 messages do not affect the normative decoding process and 2987 decoders are allowed to ignore SEI messages. 2989 Encoding considerations: 2991 This type is only defined for transfer via RTP (RFC 3550). 2993 Security considerations: 2995 See Section 9 of RFC XXXX. 2997 Public specification: 2999 Please refer to Section 13 of RFC XXXX. 3001 Additional information: None 3003 File extensions: none 3005 Macintosh file type code: none 3007 Object identifier or OID: none 3009 Person & email address to contact for further information: 3011 Ye-Kui Wang (yekuiw@qti.qualcomm.com). 3013 Intended usage: COMMON 3015 Author: See Section 14 of RFC XXXX. 3017 Change controller: 3019 IETF Audio/Video Transport Payloads working group delegated 3020 from the IESG. 3022 7.2 SDP Parameters 3024 The receiver MUST ignore any parameter unspecified in this memo.
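As a non-normative illustration, a receiver could apply the rule above by splitting the "a=fmtp" value into parameter=value pairs and dropping every name it does not recognize. The Python sketch below uses hypothetical function names, and its set of known names is simply the list of parameters named in this section. The splitter tracks curly braces because a dec-parallel-cap value contains semicolons of its own.

```python
KNOWN_PARAMETERS = frozenset("""
    profile-space profile-id tier-flag level-id interop-constraints
    profile-compatibility-indicator sprop-sub-layer-id recv-sub-layer-id
    max-recv-level-id tx-mode max-lsr max-lps max-cpb max-dpb max-br
    max-tr max-tc max-fps sprop-max-don-diff sprop-depack-buf-nalus
    sprop-depack-buf-bytes depack-buf-cap sprop-segmentation-id
    sprop-spatial-segmentation-idc dec-parallel-cap include-dph
    sprop-vps sprop-sps sprop-pps
""".split())


def _split_top_level(body):
    """Split on ';' but not inside the braces of dec-parallel-cap."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(body):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ";" and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def parse_fmtp(line):
    """Return (payload type, known parameters) for one 'a=fmtp:' line."""
    prefix, _, rest = line.partition(":")
    if prefix != "a=fmtp":
        raise ValueError("not an a=fmtp line")
    payload_type, _, body = rest.strip().partition(" ")
    params = {}
    for item in _split_top_level(body):
        name, sep, value = item.strip().partition("=")
        if sep and name in KNOWN_PARAMETERS:  # unknown names are ignored
            params[name] = value.strip()
    return int(payload_type), params


def parse_include_dph(value):
    """Hash types in preference order; an empty list means no capability."""
    hashes = [int(item) for item in value.split(",") if item.strip()]
    if any(not 0 <= h <= 255 for h in hashes):
        raise ValueError("hash type out of range")
    return hashes


pt, params = parse_fmtp(
    "a=fmtp:98 level-id=93;include-dph=0,2;x-vendor=1;"
    "dec-parallel-cap={t:8;level-id=120}")
print(pt, params)
print(parse_include_dph(params["include-dph"]))  # [0, 2]
```

In this usage example, "x-vendor" is a made-up parameter name that the parser silently drops.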
3026 7.2.1 Mapping of Payload Type Parameters to SDP 3028 The media type video/H265 string is mapped to fields in the 3029 Session Description Protocol (SDP) [RFC4566] as follows: 3031 o The media name in the "m=" line of SDP MUST be video. 3033 o The encoding name in the "a=rtpmap" line of SDP MUST be H265 3034 (the media subtype). 3036 o The clock rate in the "a=rtpmap" line MUST be 90000. 3038 o The OPTIONAL parameters "profile-space", "profile-id", "tier- 3039 flag", "level-id", "interop-constraints", "profile- 3040 compatibility-indicator", "sprop-sub-layer-id", "recv-sub- 3041 layer-id", "max-recv-level-id", "tx-mode", "max-lsr", "max- 3042 lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc", 3043 "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus", 3044 "sprop-depack-buf-bytes", "depack-buf-cap", "sprop- 3045 segmentation-id", "sprop-spatial-segmentation-idc", "dec- 3046 parallel-cap", and "include-dph", when present, MUST be 3047 included in the "a=fmtp" line of SDP. These parameters are 3048 expressed as a media type string, in the form of a semicolon 3049 separated list of parameter=value pairs. 3051 o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- 3052 pps", when present, MUST be included in the "a=fmtp" line of 3053 SDP or conveyed using the "fmtp" source attribute as specified 3054 in Section 6.3 of [RFC5576]. For a particular media format 3055 (i.e., RTP payload type), "sprop-vps", "sprop-sps", or "sprop- 3056 pps" MUST NOT be both included in the "a=fmtp" line of SDP and 3057 conveyed using the "fmtp" source attribute. When included in 3058 the "a=fmtp" line of SDP, these parameters are expressed as a 3059 media type string, in the form of a semicolon separated list 3060 of parameter=value pairs. When conveyed in the "a=fmtp" line 3061 of SDP for a particular payload type, the parameters "sprop- 3062 vps", "sprop-sps", and "sprop-pps" MUST be applied to each 3063 SSRC with the payload type.
When conveyed using the "fmtp" 3064 source attribute, these parameters are only associated with 3065 the given source and payload type as parts of the "fmtp" 3066 source attribute. 3068 Informative note: Conveyance of "sprop-vps", "sprop-sps", 3069 and "sprop-pps" using the "fmtp" source attribute allows 3070 for out-of-band transport of parameter sets in topologies 3071 like Topo-Video-switch-MCU as specified in [RFC5117]. 3073 An example of media representation in SDP is as follows: 3075 m=video 49170 RTP/AVP 98 3076 a=rtpmap:98 H265/90000 3077 a=fmtp:98 profile-id=1; 3078 sprop-vps=