Network Working Group                                       Y.-K. Wang
Internet Draft                                                Qualcomm
Intended status: Standards track                            Y. Sanchez
Expires: November 2015                                      T. Schierl
                                                        Fraunhofer HHI
                                                             S. Wenger
                                                                 Vidyo
                                                      M. M. Hannuksela
                                                                 Nokia
                                                          May 29, 2015

        RTP Payload Format for High Efficiency Video Coding
                 draft-ietf-payload-rtp-h265-10.txt

Abstract

This memo describes an RTP payload format for the video coding
standard ITU-T Recommendation H.265 and ISO/IEC International
Standard 23008-2, both also known as High Efficiency Video Coding
(HEVC) and developed by the Joint Collaborative Team on Video
Coding (JCT-VC). The RTP payload format allows for packetization
of one or more Network Abstraction Layer (NAL) units in each RTP
packet payload, as well as fragmentation of a NAL unit into
multiple RTP packets. Furthermore, it supports transmission of
an HEVC bitstream over a single as well as multiple RTP streams.
When multiple RTP streams are used, a single or multiple
transports may be utilized. The payload format has wide
applicability in videoconferencing, Internet video streaming, and
high bit-rate entertainment-quality video, among others.

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with
the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

This Internet-Draft will expire on November 29, 2015.

Copyright and License Notice

Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided
without warranty as described in the Simplified BSD License.

Table of Contents

Abstract
Status of this Memo
Table of Contents
1 Introduction
   1.1 Overview of the HEVC Codec
      1.1.1 Coding-Tool Features
      1.1.2 Systems and Transport Interfaces
      1.1.3 Parallel Processing Support
      1.1.4 NAL Unit Header
   1.2 Overview of the Payload Format
2 Conventions
3 Definitions and Abbreviations
   3.1 Definitions
      3.1.1 Definitions from the HEVC Specification
      3.1.2 Definitions Specific to This Memo
   3.2 Abbreviations
4 RTP Payload Format
   4.1 RTP Header Usage
   4.2 Payload Header Usage
   4.3 Transmission Modes
   4.4 Payload Structures
      4.4.1 Single NAL Unit Packets
      4.4.2 Aggregation Packets (APs)
      4.4.3 Fragmentation Units (FUs)
      4.4.4 PACI packets
         4.4.4.1 Reasons for the PACI rules (informative)
         4.4.4.2 PACI extensions (Informative)
   4.5 Temporal Scalability Control Information
   4.6 Decoding Order Number
5 Packetization Rules
6 De-packetization Process
7 Payload Format Parameters
   7.1 Media Type Registration
   7.2 SDP Parameters
      7.2.1 Mapping of Payload Type Parameters to SDP
      7.2.2 Usage with SDP Offer/Answer Model
      7.2.3 Usage in Declarative Session Descriptions
      7.2.4 Parameter Sets Considerations
      7.2.5 Dependency Signaling in Multi-Stream Mode
8 Use with Feedback Messages
   8.1 Picture Loss Indication (PLI)
   8.2 Slice Loss Indication (SLI)
   8.3 Reference Picture Selection Indication (RPSI)
   8.4 Full Intra Request (FIR)
9 Security Considerations
10 Congestion Control
11 IANA Consideration
12 Acknowledgements
13 References
   13.1 Normative References
   13.2 Informative References
14 Authors' Addresses

1 Introduction

1.1 Overview of the HEVC Codec

High Efficiency Video Coding [HEVC], formally known as ITU-T
Recommendation H.265 and ISO/IEC International Standard 23008-2,
was ratified by ITU-T in April 2013 and reportedly provides
significant coding efficiency gains over H.264 [H.264].

As both H.264 [H.264] and its RTP payload format [RFC6184] are
widely deployed and generally known in the relevant implementer
communities, frequently only the differences between those two
specifications are highlighted in non-normative, explanatory
parts of this memo. Basic familiarity with both specifications
is assumed for those parts. However, the normative parts of this
memo do not require study of H.264 or its RTP payload format.

H.264 and HEVC share a similar hybrid video codec design.
Conceptually, both technologies include a video coding layer
(VCL), which is often used to refer to the coding-tool features,
and a network abstraction layer (NAL), which is often used to
refer to the systems and transport interface aspects of the
codecs.

1.1.1 Coding-Tool Features

Similarly to earlier hybrid-video-coding-based standards,
including H.264, the following basic video coding design is
employed by HEVC. A prediction signal is first formed either by
intra or motion compensated prediction, and the residual (the
difference between the original and the prediction) is then
coded. The gains in coding efficiency are achieved by
redesigning and improving almost all parts of the codec over
earlier designs. In addition, HEVC includes several tools to
make the implementation on parallel architectures easier. Below
is a summary of HEVC coding-tool features.

Quad-tree block and transform structure

One of the major tools that contribute significantly to the
coding efficiency of HEVC is the usage of flexible coding blocks
and transforms, which are defined in a hierarchical quad-tree
manner. Unlike H.264, where the basic coding block is a
macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit
(CTU) of a maximum size of 64x64.
Each CTU can be divided into smaller units in a hierarchical
quad-tree manner and can represent smaller blocks down to size
4x4. Similarly, the transforms used in HEVC can have different
sizes, starting from 4x4 and going up to 32x32. Utilizing large
blocks and transforms contributes to the major gain of HEVC,
especially at high resolutions.

Entropy coding

HEVC uses a single entropy coding engine, which is based on
Context Adaptive Binary Arithmetic Coding (CABAC) [CABAC],
whereas H.264 uses two distinct entropy coding engines. CABAC in
HEVC shares many similarities with CABAC of H.264, but contains
several improvements. Those include improvements in coding
efficiency and lowered implementation complexity, especially for
parallel architectures.

In-loop filtering

H.264 includes an in-loop adaptive deblocking filter, where the
blocking artifacts around the transform edges in the
reconstructed picture are smoothed to improve the picture quality
and compression efficiency. In HEVC, a similar deblocking filter
is employed but with somewhat lower complexity. In addition,
pictures undergo a subsequent filtering operation called Sample
Adaptive Offset (SAO), which is a new design element in HEVC.
SAO basically adds a pixel-level offset in an adaptive manner and
usually acts as a de-ringing filter. It is observed that SAO
improves the picture quality, especially around sharp edges,
contributing substantially to visual quality improvements of
HEVC.

Motion prediction and coding

There have been a number of improvements in this area that are
summarized as follows. The first category is motion merge and
advanced motion vector prediction (AMVP) modes. The motion
information of a prediction block can be inferred from the
spatially or temporally neighboring blocks.
This is similar to the DIRECT mode in H.264 but includes new
aspects to incorporate the flexible quad-tree structure and
methods to improve the parallel implementations. In addition,
the motion vector predictor can be signaled for improved
efficiency. The second category is high-precision interpolation.
The interpolation filter length is increased to 8-tap from 6-tap,
which improves the coding efficiency but also comes with
increased complexity. In addition, the interpolation filter is
defined with higher precision without any intermediate rounding
operations to further improve the coding efficiency.

Intra prediction and intra coding

Compared to 8 intra prediction modes in H.264, HEVC supports
angular intra prediction with 33 directions. This increased
flexibility improves both objective coding efficiency and visual
quality as the edges can be better predicted and ringing
artifacts around the edges can be reduced. In addition, the
reference samples are adaptively smoothed based on the prediction
direction. To avoid contouring artifacts, a new interpolative
prediction generation is included to improve the visual quality.
Furthermore, discrete sine transform (DST) is utilized instead of
traditional discrete cosine transform (DCT) for 4x4 intra
transform blocks.

Other coding-tool features

HEVC includes some tools for lossless coding and efficient screen
content coding, such as skipping the transform for certain
blocks. These tools are particularly useful, for example, when
streaming the user-interface of a mobile device to a large
display.

1.1.2 Systems and Transport Interfaces

HEVC inherited the basic systems and transport interfaces
designs, such as the NAL-unit-based syntax structure, the
hierarchical syntax and data unit structure from sequence-level
parameter sets, multi-picture-level or picture-level parameter
sets, slice-level header parameters, lower-level parameters, the
supplemental enhancement information (SEI) message mechanism, the
hypothetical reference decoder (HRD) based video buffering model,
and so on. In the following, a list of differences in these
aspects compared to H.264 is summarized.

Video parameter set

A new type of parameter set, called video parameter set (VPS),
was introduced. For the first (2013) version of [HEVC], the
video parameter set NAL unit is required to be available prior to
its activation, while the information contained in the video
parameter set is not necessary for operation of the decoding
process. For future HEVC extensions, such as the 3D or scalable
extensions, the video parameter set is expected to include
information necessary for operation of the decoding process, e.g.
decoding dependency or information for reference picture set
construction of enhancement layers. The VPS provides a "big
picture" of a bitstream, including what types of operation points
are provided, the profile, tier, and level of the operation
points, and some other high-level properties of the bitstream
that can be used as the basis for session negotiation and content
selection, etc. (see Section 7.1).

Profile, tier and level

The profile, tier and level syntax structure that can be included
in both VPS and sequence parameter set (SPS) includes 12 bytes of
data to describe the entire bitstream (including all temporally
scalable layers, which are referred to as sub-layers in the HEVC
specification), and can optionally include more profile, tier and
level information pertaining to individual temporally scalable
layers. The profile indicator indicates the "best viewed as"
profile when the bitstream conforms to multiple profiles, similar
to the major brand concept in the ISO base media file format
(ISOBMFF) [ISOBMFF] and file formats derived based on ISOBMFF,
such as the 3GPP file format [3GPPFF]. The profile, tier and
level syntax structure also includes indications such as 1)
whether the bitstream is free of frame-packed content, 2) whether
the bitstream is free of interlaced source content, and 3)
whether the bitstream is free of field pictures. When the answer
is yes for both 2) and 3), the bitstream contains only frame
pictures of progressive source. Based on these indications,
clients/players without support of post-processing
functionalities for handling of frame-packed, interlaced source
content or field pictures can reject those bitstreams that
contain such pictures.

Bitstream and elementary stream

HEVC includes a definition of an elementary stream, which is new
compared to H.264. An elementary stream consists of a sequence
of one or more bitstreams. An elementary stream that consists of
two or more bitstreams has typically been formed by splicing
together two or more bitstreams (or parts thereof).
When an elementary stream contains more than one bitstream, the
last NAL unit of the last access unit of a bitstream (except the
last bitstream in the elementary stream) must be an end of
bitstream NAL unit and the first access unit of the subsequent
bitstream must be an intra random access point (IRAP) access
unit. This IRAP access unit may be a clean random access (CRA),
broken link access (BLA), or instantaneous decoding refresh (IDR)
access unit.

Random access support

HEVC includes signaling in the NAL unit header, through NAL unit
types, of IRAP pictures beyond IDR pictures. Three types of IRAP
pictures, namely IDR, CRA and BLA pictures, are supported, wherein
IDR pictures are conventionally referred to as closed
group-of-pictures (closed-GOP) random access points, and CRA and
BLA pictures are those conventionally referred to as open-GOP
random access points. BLA pictures usually originate from
splicing of two bitstreams or parts thereof at a CRA picture,
e.g. during stream switching. To enable better systems usage of
IRAP pictures, altogether six different NAL unit types are
defined to signal the properties of the IRAP pictures, which can
be used to better match the stream access point (SAP) types as
defined in the ISOBMFF [ISOBMFF], which are utilized for random
access support in both 3GP-DASH [3GPDASH] and MPEG DASH
[MPEGDASH]. Pictures following an IRAP picture in decoding order
and preceding the IRAP picture in output order are referred to as
leading pictures associated with the IRAP picture. There are two
types of leading pictures, namely random access decodable leading
(RADL) pictures and random access skipped leading (RASL)
pictures.
RADL pictures are decodable when the decoding starts at the
associated IRAP picture, and RASL pictures are not decodable when
the decoding starts at the associated IRAP picture and are
usually discarded. HEVC provides mechanisms to enable the
specification of conformance of bitstreams with RASL pictures
being discarded, thus providing a standard-compliant way to
enable systems components to discard RASL pictures when needed.

Temporal scalability support

HEVC includes improved support for temporal scalability, by
inclusion of the signaling of TemporalId in the NAL unit header,
the restriction that pictures of a particular temporal sub-layer
cannot be used for inter prediction reference by pictures of a
lower temporal sub-layer, the sub-bitstream extraction process,
and the requirement that each sub-bitstream extraction output be
a conforming bitstream. Media-aware network elements (MANEs) can
utilize the TemporalId in the NAL unit header for stream
adaptation purposes based on temporal scalability.

Temporal sub-layer switching support

HEVC specifies, through NAL unit types present in the NAL unit
header, the signaling of temporal sub-layer access (TSA) and
stepwise temporal sub-layer access (STSA). A TSA picture and
pictures following the TSA picture in decoding order do not use
pictures prior to the TSA picture in decoding order with
TemporalId greater than or equal to that of the TSA picture for
inter prediction reference. A TSA picture enables up-switching,
at the TSA picture, to the sub-layer containing the TSA picture
or any higher sub-layer, from the immediately lower sub-layer.
An STSA picture does not use pictures with the same TemporalId as
the STSA picture for inter prediction reference.
Pictures following an STSA picture in decoding order with the
same TemporalId as the STSA picture do not use pictures prior to
the STSA picture in decoding order with the same TemporalId as
the STSA picture for inter prediction reference. An STSA picture
enables up-switching, at the STSA picture, to the sub-layer
containing the STSA picture, from the immediately lower
sub-layer.

Sub-layer reference or non-reference pictures

The concept and signaling of reference/non-reference pictures in
HEVC are different from H.264. In H.264, if a picture may be
used by any other picture for inter prediction reference, it is a
reference picture; otherwise it is a non-reference picture, and
this is signaled by two bits in the NAL unit header. In HEVC, a
picture is called a reference picture only when it is marked as
"used for reference". In addition, the concept of sub-layer
reference picture was introduced. If a picture may be used by
another picture with the same TemporalId for inter prediction
reference, it is a sub-layer reference picture; otherwise it is a
sub-layer non-reference picture. Whether a picture is a
sub-layer reference picture or sub-layer non-reference picture is
signaled through NAL unit type values.

Extensibility

Besides the TemporalId in the NAL unit header, HEVC also includes
the signaling of a six-bit layer ID in the NAL unit header, which
must be equal to 0 for a single-layer bitstream. Extension
mechanisms have been included in VPS, SPS, PPS, SEI NAL unit,
slice headers, and so on. All these extension mechanisms enable
future extensions in a backward compatible manner, such that
bitstreams encoded according to potential future HEVC extensions
can be fed to then-legacy decoders (e.g. HEVC version 1 decoders)
and the then-legacy decoders can decode and output the base layer
bitstream.
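
As a non-normative illustration of how the TemporalId and layer ID
signaling can be used for stream adaptation, the sketch below keeps
only the NAL units that a single-layer decoder operating at a chosen
temporal sub-layer would need. It is a simplification, not the
sub-bitstream extraction process of [HEVC], which has further rules.
The NalUnit class and the function name are hypothetical, and the
header fields are assumed to have been parsed already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical container for an already-parsed NAL unit."""
    layer_id: int      # six-bit layer ID from the NAL unit header
    temporal_id: int   # TemporalId of the NAL unit
    payload: bytes = b""


def filter_nal_units(nal_units, max_temporal_id, layer_ids=(0,)):
    """Keep NAL units at or below max_temporal_id in the given layers.

    Simplified sketch only: a conforming extraction process is
    specified in [HEVC] and is not reproduced here.
    """
    allowed = set(layer_ids)
    return [
        nal for nal in nal_units
        if nal.temporal_id <= max_temporal_id and nal.layer_id in allowed
    ]


stream = [
    NalUnit(layer_id=0, temporal_id=0),
    NalUnit(layer_id=0, temporal_id=1),
    NalUnit(layer_id=0, temporal_id=2),
    NalUnit(layer_id=1, temporal_id=0),  # a layer from a future extension
]
base = filter_nal_units(stream, max_temporal_id=1)
```

Because pictures of a higher temporal sub-layer are never used as
reference by a lower one, dropping by TemporalId alone does not break
prediction in the retained sub-layers.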

Bitstream extraction

HEVC includes a bitstream extraction process as an integral part
of the overall decoding process, as well as specification of the
use of the bitstream extraction process in description of
bitstream conformance tests as part of the hypothetical reference
decoder (HRD) specification.

Reference picture management

The reference picture management of HEVC, including reference
picture marking and removal from the decoded picture buffer (DPB)
as well as reference picture list construction (RPLC), differs
from that of H.264. Instead of the sliding window plus adaptive
memory management control operation (MMCO) based reference
picture marking mechanism in H.264, HEVC specifies a reference
picture set (RPS) based reference picture management and marking
mechanism, and the RPLC is consequently based on the RPS
mechanism. A reference picture set consists of a set of
reference pictures associated with a picture, consisting of all
reference pictures that are prior to the associated picture in
decoding order, that may be used for inter prediction of the
associated picture or any picture following the associated
picture in decoding order. The reference picture set consists of
five lists of reference pictures: RefPicSetStCurrBefore,
RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and
RefPicSetLtCurr contain all reference pictures that may be used
in inter prediction of the current picture and that may be used
in inter prediction of one or more of the pictures following the
current picture in decoding order. RefPicSetStFoll and
RefPicSetLtFoll consist of all reference pictures that are not
used in inter prediction of the current picture but may be used
in inter prediction of one or more of the pictures following the
current picture in decoding order.
RPS provides an "intra-coded" signaling of the DPB status,
instead of an "inter-coded" signaling, mainly for improved error
resilience. The RPLC process in HEVC is based on the RPS, by
signaling an index to an RPS subset for each reference index;
this process is simpler than the RPLC process in H.264.

Ultra low delay support

HEVC specifies a sub-picture-level HRD operation, for support of
the so-called ultra-low delay. The mechanism specifies a
standard-compliant way to enable delay reduction below one
picture interval. Sub-picture-level coded picture buffer (CPB)
and DPB parameters may be signaled, and utilization of this
information for the derivation of CPB timing (wherein the CPB
removal time corresponds to decoding time) and DPB output timing
(display time) is specified. Decoders are allowed to operate the
HRD at the conventional access-unit-level, even when the
sub-picture-level HRD parameters are present.

New SEI messages

HEVC inherits many H.264 SEI messages with changes in syntax
and/or semantics making them applicable to HEVC. Additionally,
there are a few new SEI messages reviewed briefly in the
following paragraphs.

The display orientation SEI message informs the decoder of a
transformation that is recommended to be applied to the cropped
decoded picture prior to display, such that the pictures can be
properly displayed, e.g. in an upside-up manner.

The structure of pictures SEI message provides information on the
NAL unit types, picture order count values, and prediction
dependencies of a sequence of pictures. The SEI message can be
used, for example, for concluding what impact a lost picture has
on other pictures.

The decoded picture hash SEI message provides a checksum derived
from the sample values of a decoded picture. It can be used for
detecting whether a picture was correctly received and decoded.
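
The idea behind the decoded picture hash can be sketched as follows
(non-normative). The sketch assumes one MD5 digest per colour plane,
computed over 8-bit samples stored in raster order. These choices,
along with the function names and the plane layout, are assumptions
made for illustration; the actual hash methods and sample ordering
are specified in [HEVC].

```python
import hashlib


def picture_digests(planes):
    """Return one MD5 digest per plane.

    'planes' is an iterable of bytes-like objects, one per colour
    plane, holding decoded samples (hypothetical layout).
    """
    return [hashlib.md5(bytes(plane)).digest() for plane in planes]


def picture_matches(planes, expected_digests):
    """True if the decoded planes hash to the signaled digests."""
    return picture_digests(planes) == list(expected_digests)


# Encoder side: digests that would be carried alongside the picture.
luma, cb, cr = bytes([16] * 16), bytes([128] * 4), bytes([128] * 4)
signaled = picture_digests([luma, cb, cr])

# Receiver side: an intact picture matches, a corrupted one does not.
ok = picture_matches([luma, cb, cr], signaled)
corrupted = bytes([17]) + luma[1:]
bad = picture_matches([corrupted, cb, cr], signaled)
```

A mismatch tells the receiver that its reconstruction differs from
the encoder's, which can be used to trigger error concealment or a
feedback message.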

The active parameter sets SEI message includes the IDs of the
active video parameter set and the active sequence parameter set
and can be used to activate VPSs and SPSs. In addition, the SEI
message includes the following indications: 1) An indication of
whether "full random accessibility" is supported (when supported,
all parameter sets needed for decoding of the remainder of the
bitstream when random accessing from the beginning of the current
CVS by completely discarding all access units earlier in decoding
order are present in the remaining bitstream and all coded
pictures in the remaining bitstream can be correctly decoded); 2)
An indication of whether there is no parameter set within the
current CVS that updates another parameter set of the same type
preceding in decoding order. An update of a parameter set refers
to the use of the same parameter set ID but with some other
parameters changed. If this property is true for all CVSs in the
bitstream, then all parameter sets can be sent out-of-band before
session start.

The decoding unit information SEI message provides coded picture
buffer removal delay information for a decoding unit. The
message can be used in very-low-delay buffering operations.

The region refresh information SEI message can be used together
with the recovery point SEI message (present in both H.264 and
HEVC) for improved support of gradual decoding refresh. This
supports random access from inter-coded pictures, wherein
complete pictures can be correctly decoded or recovered after an
indicated number of pictures in output/display order.

1.1.3 Parallel Processing Support

The reportedly significantly higher encoding computational demand
of HEVC over H.264, in conjunction with the ever increasing video
resolution (both spatially and temporally) required by the
market, led to the adoption of VCL coding tools specifically
targeted to allow for parallelization on the sub-picture level.
That is, parallelization occurs, at the minimum, at the
granularity of an integer number of CTUs. The targets for this
type of high-level parallelization are multicore CPUs and DSPs as
well as multiprocessor systems. In a system design, to be
useful, these tools require signaling support, which is provided
in Section 7 of this memo. This section provides a brief
overview of the tools available in [HEVC].

Many of the tools incorporated in HEVC were designed keeping in
mind the potential parallel implementations in
multi-core/multi-processor architectures. Specifically, for
parallelization, four picture partition strategies are available.

Slices are segments of the bitstream that can be reconstructed
independently from other slices within the same picture (though
there may still be interdependencies through loop filtering
operations). Slices are the only tool that can be used for
parallelization that is also available, in virtually identical
form, in H.264. Slice-based parallelization does not require
much inter-processor or inter-core communication (except for
inter-processor or inter-core data sharing for motion
compensation when decoding a predictively coded picture, which is
typically much heavier than inter-processor or inter-core data
sharing due to in-picture prediction), as slices are designed to
be independently decodable. However, for the same reason, slices
can require some coding overhead.
Further, slices (in contrast to some of the other tools mentioned
below) also serve as the key mechanism for bitstream partitioning
to match Maximum Transfer Unit (MTU) size requirements, due to
the in-picture independence of slices and the fact that each
regular slice is encapsulated in its own NAL unit. In many
cases, the goal of parallelization and the goal of MTU size
matching can place contradicting demands on the slice layout in a
picture. The realization of this situation led to the
development of the more advanced tools mentioned below.

Dependent slice segments allow for fragmentation of a coded slice
into fragments at CTU boundaries without breaking any in-picture
prediction mechanism. They are complementary to the
fragmentation mechanism described in this memo in that they need
the cooperation of the encoder. As a dependent slice segment
necessarily contains an integer number of CTUs, a decoder using
multiple cores operating on CTUs can process a dependent slice
segment without communicating parts of the slice segment's
bitstream to other cores. Fragmentation, as specified in this
memo, in contrast, does not guarantee that a fragment contains an
integer number of CTUs.

In wavefront parallel processing (WPP), the picture is
partitioned into rows of CTUs. Entropy decoding and prediction
are allowed to use data from CTUs in other partitions. Parallel
processing is possible through parallel decoding of CTU rows,
where the start of the decoding of a row is delayed by two CTUs,
so as to ensure that data related to a CTU above and to the right
of the subject CTU is available before the subject CTU is
decoded. Using this staggered start (which appears like a
wavefront when represented graphically), parallelization is
possible with up to as many processors/cores as the picture
contains CTU rows.
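
The staggered start can be illustrated with a small non-normative
scheduling sketch. It assumes an idealized model in which every CTU
takes one time step and one core is available per CTU row; the
function names are hypothetical. With a two-CTU delay per row, the
CTU at (row, col) is decoded at step 2 * row + col, which is always
later than the step of its above-right neighbour.

```python
def wpp_start_step(row, col, lag=2):
    """Earliest step at which CTU (row, col) can be decoded.

    Idealized model: one CTU per step, each row starts 'lag' CTUs
    after the row above it.
    """
    return row * lag + col


def wpp_schedule(rows, cols, lag=2):
    """Group CTU coordinates by the step in which they are decoded."""
    steps = {}
    for row in range(rows):
        for col in range(cols):
            step = wpp_start_step(row, col, lag)
            steps.setdefault(step, []).append((row, col))
    return [steps[step] for step in sorted(steps)]


schedule = wpp_schedule(rows=3, cols=8)
peak_parallelism = max(len(group) for group in schedule)
```

In this model the number of CTUs decoded in the same step never
exceeds the number of CTU rows, and reaches it only when rows are
wide enough for all of them to be active at once.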
576 Because in-picture prediction between neighboring CTU rows within 577 a picture is allowed, the required inter-processor/inter-core 578 communication to enable in-picture prediction can be substantial. 579 The WPP partitioning does not result in the creation of more NAL 580 units compared to when it is not applied, thus WPP cannot be used 581 for MTU size matching, though slices can be used in combination 582 for that purpose. 584 Tiles define horizontal and vertical boundaries that partition a 585 picture into tile columns and rows. The scan order of CTUs is 586 changed to be local within a tile (in the order of a CTU raster 587 scan of a tile), before decoding the top-left CTU of the next 588 tile in the order of tile raster scan of a picture. Similar to 589 slices, tiles break in-picture prediction dependencies (including 590 entropy decoding dependencies). However, they do not need to be 591 included into individual NAL units (same as WPP in this regard), 592 hence tiles cannot be used for MTU size matching, though slices 593 can be used in combination for that purpose. Each tile can be 594 processed by one processor/core, and the inter-processor/inter- 595 core communication required for in-picture prediction between 596 processing units decoding neighboring tiles is limited to 597 conveying the shared slice header in cases a slice is spanning 598 more than one tile, and loop filtering related sharing of 599 reconstructed samples and metadata. Insofar, tiles are less 600 demanding in terms of inter-processor communication bandwidth 601 compared to WPP due to the in-picture independence between two 602 neighboring partitions. 604 1.1.4 NAL Unit Header 606 HEVC maintains the NAL unit concept of H.264 with modifications. 607 HEVC uses a two-byte NAL unit header, as shown in Figure 1. The 608 payload of a NAL unit refers to the NAL unit excluding the NAL 609 unit header. 
611    +---------------+---------------+
612    |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
613    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
614    |F|   Type    |  LayerId  | TID |
615    +-------------+-----------------+

617 Figure 1 The structure of the HEVC NAL unit header

619 The semantics of the fields in the NAL unit header are as 620 specified in [HEVC] and described briefly below for convenience. 621 In addition to the name and size of each field, the corresponding 622 syntax element name in [HEVC] is also provided.

624 F: 1 bit 625 forbidden_zero_bit. Required to be zero in [HEVC]. Note that 626 this bit was included in the NAL unit header to enable 627 transport of HEVC video over MPEG-2 transport systems 628 (avoidance of start code emulations) [MPEG2S]. In the context 629 of this memo, the value 1 MAY be used to indicate a syntax 630 violation. For example, when RTP is transported over UDP-lite 631 [RFC3828] and the receiver decides to feed the video decoder 632 NAL unit(s) where the corresponding UDP-lite packet failed a 633 checksum test, then this bit can be set to alert the decoder 634 to possible bit errors.

636 Type: 6 bits 637 nal_unit_type. This field specifies the NAL unit type as 638 defined in Table 7-1 of [HEVC]. If the most significant bit 639 of this field of a NAL unit is equal to 0 (i.e. the value of 640 this field is less than 32), the NAL unit is a VCL NAL unit. 641 Otherwise, the NAL unit is a non-VCL NAL unit. For a 642 reference of all currently defined NAL unit types and their 643 semantics, please refer to Section 7.4.1 in [HEVC].

645 LayerId: 6 bits 646 nuh_layer_id. Required to be equal to zero in [HEVC]. It is 647 anticipated that in future scalable or 3D video coding 648 extensions of this specification, this syntax element will be 649 used to identify additional layers that may be present in the 650 CVS, wherein a layer may be, e.g. a spatial scalable layer, a 651 quality scalable layer, a texture view, or a depth view.
653 TID: 3 bits 654 nuh_temporal_id_plus1. This field specifies the temporal 655 identifier of the NAL unit plus 1. The value of TemporalId is 656 equal to TID minus 1. A TID value of 0 is illegal to ensure 657 that there is at least one bit in the NAL unit header equal to 658 1, so to enable independent considerations of start code 659 emulations in the NAL unit header and in the NAL unit payload 660 data. 662 1.2 Overview of the Payload Format 664 This payload format defines the following processes required for 665 transport of HEVC coded data over RTP [RFC3550]: 667 o Usage of RTP header with this payload format 669 o Packetization of HEVC coded NAL units into RTP packets using 670 three types of payload structures, namely single NAL unit 671 packet, aggregation packet, and fragment unit 673 o Transmission of HEVC NAL units of the same bitstream within a 674 single RTP stream or multiple RTP streams (within one or more 675 RTP sessions), where within an RTP stream transmission of NAL 676 units may be either non-interleaved (i.e. the transmission 677 order of NAL units is the same as their decoding order) or 678 interleaved (i.e. the transmission order of NAL units is 679 different from their decoding order) 681 o Media type parameters to be used with the Session Description 682 Protocol (SDP) [RFC4566] 684 o A payload header extension mechanism and data structures for 685 enhanced support of temporal scalability based on that 686 extension mechanism. 688 2 Conventions 690 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 691 NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 692 "OPTIONAL" in this document are to be interpreted as described in 693 BCP 14, RFC 2119 [RFC2119]. 695 In this document, these key words will appear with that 696 interpretation only when in ALL CAPS. Lower case uses of these 697 words are not to be interpreted as carrying the RFC 2119 698 significance. 
700 This specification uses the notion of setting and clearing a bit 701 when bit fields are handled. Setting a bit is the same as 702 assigning that bit the value of 1 (On). Clearing a bit is the 703 same as assigning that bit the value of 0 (Off). 705 3 Definitions and Abbreviations 707 3.1 Definitions 709 This document uses the terms and definitions of [HEVC]. Section 710 3.1.1 lists relevant definitions copied from [HEVC] for 711 convenience. Section 3.1.2 provides definitions specific to this 712 memo. 714 3.1.1 Definitions from the HEVC Specification 716 access unit: A set of NAL units that are associated with each 717 other according to a specified classification rule, are 718 consecutive in decoding order, and contain exactly one coded 719 picture. 721 BLA access unit: An access unit in which the coded picture is a 722 BLA picture. 724 BLA picture: An IRAP picture for which each VCL NAL unit has 725 nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. 727 coded video sequence (CVS): A sequence of access units that 728 consists, in decoding order, of an IRAP access unit with 729 NoRaslOutputFlag equal to 1, followed by zero or more access 730 units that are not IRAP access units with NoRaslOutputFlag equal 731 to 1, including all subsequent access units up to but not 732 including any subsequent access unit that is an IRAP access unit 733 with NoRaslOutputFlag equal to 1. 735 Informative note: An IRAP access unit may be an IDR access 736 unit, a BLA access unit, or a CRA access unit. The value of 737 NoRaslOutputFlag is equal to 1 for each IDR access unit, each 738 BLA access unit, and each CRA access unit that is the first 739 access unit in the bitstream in decoding order, is the first 740 access unit that follows an end of sequence NAL unit in 741 decoding order, or has HandleCraAsBlaFlag equal to 1. 743 CRA access unit: An access unit in which the coded picture is a 744 CRA picture. 
746 CRA picture: An IRAP picture for which each VCL NAL unit has 747 nal_unit_type equal to CRA_NUT. 749 IDR access unit: An access unit in which the coded picture is an 750 IDR picture. 752 IDR picture: An IRAP picture for which each VCL NAL unit has 753 nal_unit_type equal to IDR_W_RADL or IDR_N_LP. 755 IRAP access unit: An access unit in which the coded picture is an 756 IRAP picture. 758 IRAP picture: A coded picture for which each VCL NAL unit has 759 nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 760 (23), inclusive. 762 layer: A set of VCL NAL units that all have a particular value of 763 nuh_layer_id and the associated non-VCL NAL units, or one of a 764 set of syntactical structures having a hierarchical relationship. 766 operation point: A bitstream created from another bitstream by 767 operation of the sub-bitstream extraction process with the 768 other bitstream, a target highest TemporalId, and a target 769 layer identifier list as inputs. 771 random access: The act of starting the decoding process for a 772 bitstream at a point other than the beginning of the bitstream. 774 sub-layer: A temporal scalable layer of a temporal scalable 775 bitstream consisting of VCL NAL units with a particular value of 776 the TemporalId variable, and the associated non-VCL NAL units. 778 sub-layer representation: A subset of the bitstream consisting of 779 NAL units of a particular sub-layer and the lower sub-layers. 781 tile: A rectangular region of coding tree blocks within a 782 particular tile column and a particular tile row in a picture. 784 tile column: A rectangular region of coding tree blocks having a 785 height equal to the height of the picture and a width specified 786 by syntax elements in the picture parameter set. 788 tile row: A rectangular region of coding tree blocks having a 789 height specified by syntax elements in the picture parameter set 790 and a width equal to the width of the picture.
792 3.1.2 Definitions Specific to This Memo 794 dependee RTP stream: An RTP stream on which another RTP stream 795 depends. All RTP streams in an MRST or MRMT except for the 796 highest RTP stream are dependee RTP streams. 798 highest RTP stream: The RTP stream on which no other RTP stream 799 depends. The RTP stream in an SRST is the highest RTP stream. 801 media aware network element (MANE): A network element, such as a 802 middlebox, selective forwarding unit, or application layer 803 gateway that is capable of parsing certain aspects of the RTP 804 payload headers or the RTP payload and reacting to their 805 contents. 807 Informative note: The concept of a MANE goes beyond normal 808 routers or gateways in that a MANE has to be aware of the 809 signaling (e.g. to learn about the payload type mappings of 810 the media streams), and in that it has to be trusted when 811 working with SRTP. The advantage of using MANEs is that they 812 allow packets to be dropped according to the needs of the 813 media coding. For example, if a MANE has to drop packets due 814 to congestion on a certain link, it can identify and remove 815 those packets whose elimination produces the least adverse 816 effect on the user experience. After dropping packets, MANEs 817 must rewrite RTCP packets to match the changes to the RTP 818 stream as specified in Section 7 of [RFC3550]. 820 Media Transport: As used in the MRST, MRMT, and SRST definitions 821 below, Media Transport denotes the transport of packets over a 822 transport association identified by a 5-tuple (source address, 823 source port, destination address, destination port, transport 824 protocol). See also Section 2.1.13 of [I-D.ietf-avtext-rtp- 825 grouping-taxonomy]. 827 Informative note: The term "bitstream" in this document is 828 equivalent to the term "encoded stream" in [I-D.ietf-avtext- 829 rtp-grouping-taxonomy]. 
831 Multiple RTP streams on a Single Transport (MRST): Multiple RTP 832 streams carrying a single HEVC bitstream on a Single Transport. 833 See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. 835 Multiple RTP streams on Multiple Transports (MRMT): Multiple RTP 836 streams carrying a single HEVC bitstream on Multiple Transports. 837 See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. 839 NAL unit decoding order: A NAL unit order that conforms to the 840 constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. 842 NAL unit output order: A NAL unit order in which NAL units of 843 different access units are in the output order of the decoded 844 pictures corresponding to the access units, as specified in 845 [HEVC], and in which NAL units within an access unit are in their 846 decoding order. 848 NAL-unit-like structure: A data structure that is similar to NAL 849 units in the sense that it also has a NAL unit header and a 850 payload, with the difference that the payload does not follow the 851 start code emulation prevention mechanism required for the NAL 852 unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples 853 of NAL-unit-like structures defined in this memo are packet payloads 854 of AP, PACI, and FU packets. 856 NALU-time: The value that the RTP timestamp would have if the NAL 857 unit were transported in its own RTP packet. 859 RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within 860 the scope of this memo, one RTP stream is utilized to transport 861 one or more temporal sub-layers. 863 Single RTP stream on a Single Transport (SRST): Single RTP 864 stream carrying a single HEVC bitstream on a Single (Media) 865 Transport. See also Section 3.5 of [I-D.ietf-avtext-rtp- 866 grouping-taxonomy]. 868 transmission order: The order of packets in ascending RTP 869 sequence number order (in modulo arithmetic).
Within an 870 aggregation packet, the NAL unit transmission order is the same 871 as the order of appearance of NAL units in the packet.

873 3.2 Abbreviations

875 AP     Aggregation Packet
877 BLA    Broken Link Access
879 CRA    Clean Random Access
881 CTB    Coding Tree Block
883 CTU    Coding Tree Unit
885 CVS    Coded Video Sequence
887 DPH    Decoded Picture Hash
889 FU     Fragmentation Unit
891 HRD    Hypothetical Reference Decoder
893 IDR    Instantaneous Decoding Refresh
894 IRAP   Intra Random Access Point
896 MANE   Media Aware Network Element
898 MRMT   Multiple RTP streams on Multiple Transports
900 MRST   Multiple RTP streams on a Single Transport
902 MTU    Maximum Transfer Unit
904 NAL    Network Abstraction Layer
906 NALU   Network Abstraction Layer Unit
908 PACI   PAyload Content Information
910 PHES   Payload Header Extension Structure
912 PPS    Picture Parameter Set
914 RADL   Random Access Decodable Leading (Picture)
916 RASL   Random Access Skipped Leading (Picture)
918 RPS    Reference Picture Set
920 SEI    Supplemental Enhancement Information
922 SPS    Sequence Parameter Set
924 SRST   Single RTP stream on a Single Transport
926 STSA   Step-wise Temporal Sub-layer Access
928 TSA    Temporal Sub-layer Access
930 TSCI   Temporal Scalability Control Information
932 VCL    Video Coding Layer
934 VPS    Video Parameter Set

936 4 RTP Payload Format

938 4.1 RTP Header Usage

940 The format of the RTP header is specified in [RFC3550] and 941 reprinted in Figure 2 for convenience. This payload format uses 942 the fields of the header in a manner consistent with that 943 specification.

945 The RTP payload (and the settings for some RTP header bits) for 946 aggregation packets and fragmentation units are specified in 947 Sections 4.4.2 and 4.4.3, respectively.
949     0                   1                   2                   3
950     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
951    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
952    |V=2|P|X|  CC   |M|     PT      |       sequence number         |
953    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
954    |                           timestamp                           |
955    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
956    |           synchronization source (SSRC) identifier            |
957    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
958    |            contributing source (CSRC) identifiers             |
959    |                             ....                              |
960    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

962 Figure 2 RTP header according to [RFC3550]

964 The RTP header information to be set according to this RTP 965 payload format is set as follows:

967 Marker bit (M): 1 bit

969 Set for the last packet carried in the current RTP stream of 970 the access unit. This is in line with the normal use of the M 971 bit in video formats to allow an efficient playout buffer 972 handling. When MRST or MRMT is in use, if an access unit 973 appears in multiple RTP streams, the marker bit is set on each 974 RTP stream's last packet of the access unit.

976 Informative note: The content of a NAL unit does not tell 977 whether or not the NAL unit is the last NAL unit, in 978 decoding order, of an access unit. An RTP sender 979 implementation may obtain this information from the video 980 encoder. If, however, the implementation cannot obtain 981 this information directly from the encoder, e.g. when the 982 bitstream was pre-encoded, and also there is no timestamp 983 allocated for each NAL unit, then the sender implementation 984 can inspect subsequent NAL units in decoding order to 985 determine whether or not the NAL unit is the last NAL unit 986 of an access unit as follows. A NAL unit is determined to 987 be the last NAL unit of an access unit if it is the last 988 NAL unit of the bitstream.
A NAL unit naluX is also 989 determined to be the last NAL unit of an access unit if 990 both the following conditions are true: 1) the next VCL NAL 991 unit naluY in decoding order has the high-order bit of the 992 first byte after its NAL unit header equal to 1, and 2) all 993 NAL units between naluX and naluY, when present, have 994 nal_unit_type in the range of 32 to 35, inclusive, equal to 995 39, or in the ranges of 41 to 44, inclusive, or 48 to 55, 996 inclusive. 998 Payload type (PT): 7 bits 1000 The assignment of an RTP payload type for this new packet 1001 format is outside the scope of this document and will not be 1002 specified here. The assignment of a payload type has to be 1003 performed either through the profile used or in a dynamic way. 1005 Informative note: It is not required to use different 1006 payload type values for different RTP streams in MRST or 1007 MRMT. 1009 Sequence number (SN): 16 bits 1011 Set and used in accordance with RFC 3550 [RFC3550]. 1013 Timestamp: 32 bits 1015 The RTP timestamp is set to the sampling timestamp of the 1016 content. A 90 kHz clock rate MUST be used. 1018 If the NAL unit has no timing properties of its own (e.g. 1019 parameter set and SEI NAL units), the RTP timestamp MUST be 1020 set to the RTP timestamp of the coded picture of the access 1021 unit in which the NAL unit (according to Section 7.4.2.4.4 of 1022 [HEVC]) is included. 1024 Receivers MUST use the RTP timestamp for the display process, 1025 even when the bitstream contains picture timing SEI messages 1026 or decoding unit information SEI messages as specified in 1027 [HEVC]. However, this does not mean that picture timing SEI 1028 messages in the bitstream should be discarded, as picture 1029 timing SEI messages may contain frame-field information that 1030 is important in appropriately rendering interlaced video. 1032 Synchronization source (SSRC): 32-bits 1034 Used to identify the source of the RTP packets. 
When using 1035 SRST, by definition a single SSRC is used for all parts of a 1036 single bitstream. In MRST or MRMT, a different SSRC is used 1037 for each RTP stream containing a subset of the sub-layers of 1038 the single (temporally scalable) bitstream. A receiver is 1039 required to correctly associate the set of SSRCs that are 1040 parts of the same bitstream. 1042 4.2 Payload Header Usage 1044 The first two bytes of the payload of an RTP packet are referred 1045 to as the payload header. The payload header consists of the 1046 same fields (F, Type, LayerId, and TID) as the NAL unit header as 1047 shown in Section 1.1.4, irrespective of the type of the payload 1048 structure. 1050 The TID value indicates (among other things) the relative 1051 importance of an RTP packet, for example because NAL units 1052 belonging to higher temporal sub-layers are not used for the 1053 decoding of lower temporal sub-layers. A lower value of TID 1054 indicates a higher importance. More important NAL units MAY be 1055 better protected against transmission losses than less important 1056 NAL units. 1058 4.3 Transmission Modes 1060 This memo enables transmission of an HEVC bitstream over 1062 . a single RTP stream on a single Media Transport (SRST), 1063 . multiple RTP streams over a single Media Transport (MRST), 1064 or 1065 . multiple RTP streams over multiple Media Transports (MRMT). 1067 Informative Note: While this specification enables the use of 1068 MRST within the H.265 RTP payload, the signaling of MRST within 1069 SDP Offer/Answer is not fully specified at the time of this 1070 writing. See [RFC5576] and [RFC5583] for what is supported 1071 today as well as [I-D.ietf-avtcore-rtp-multi-stream] and [I- 1072 D.ietf-mmusic-sdp-bundle-negotiation] for future directions. 1074 When in MRMT, the dependency of one RTP stream on another RTP 1075 stream is typically indicated as specified in [RFC5583].
1076 [RFC5583] can also be utilized to specify dependencies within 1077 MRST, but only if the RTP streams utilize distinct payload types. 1078 When an RTP stream A depends on another RTP stream B, the RTP 1079 stream B is referred to as a dependee RTP stream of the RTP 1080 stream A. 1082 SRST or MRST SHOULD be used for point-to-point unicast scenarios, 1083 while MRMT SHOULD be used for point-to-multipoint multicast 1084 scenarios where different receivers require different operation 1085 points of the same HEVC bitstream, to improve bandwidth utilization 1086 efficiency. 1088 Informative note: A multicast may degrade to a unicast after 1089 all but one receiver have left (this is a justification of 1090 the first "SHOULD" instead of "MUST"), and there might be 1091 scenarios where MRMT is desirable but not possible, e.g. when 1092 IP multicast is not deployed in certain networks (this is a 1093 justification of the second "SHOULD" instead of "MUST"). 1095 The transmission mode is indicated by the tx-mode media parameter 1096 (see Section 7.1). If tx-mode is equal to "SRST", SRST MUST be 1097 used. Otherwise, if tx-mode is equal to "MRST", MRST MUST be 1098 used. Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used. 1100 Informative note: When an RTP stream does not depend on other 1101 RTP streams, any of SRST, MRST and MRMT may be in use for the 1102 RTP stream. 1104 Receivers MUST support all of SRST, MRST, and MRMT. 1106 Informative note: The required support of MRMT by receivers 1107 does not imply that multicast must be supported by receivers. 1109 4.4 Payload Structures 1111 Four different types of RTP packet payload structures are 1112 specified. A receiver can identify the type of an RTP packet 1113 payload through the Type field in the payload header.
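Informative example: A receiver's first step, reading the two-byte payload header and dispatching on the Type field, might look like the following Python sketch. The function names are hypothetical. The Type values 48 (AP) and 49 (FU) are those given in Sections 4.4.2 and 4.4.3; the value 50 for PACI is assumed here from Section 4.4.4, and every other value is treated as a single NAL unit packet.

```python
import struct

AP_TYPE = 48    # aggregation packet (Section 4.4.2)
FU_TYPE = 49    # fragmentation unit (Section 4.4.3)
PACI_TYPE = 50  # PACI carrying packet (assumed from Section 4.4.4)


def parse_payload_header(payload):
    """Split the two-byte payload header into F, Type, LayerId and TID."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    (hdr,) = struct.unpack("!H", payload[:2])
    return {
        "F": hdr >> 15,              # 1 bit
        "Type": (hdr >> 9) & 0x3F,   # 6 bits
        "LayerId": (hdr >> 3) & 0x3F,  # 6 bits
        "TID": hdr & 0x07,           # 3 bits
    }


def payload_structure(payload):
    """Name the payload structure indicated by the Type field."""
    nal_type = parse_payload_header(payload)["Type"]
    if nal_type == AP_TYPE:
        return "AP"
    if nal_type == FU_TYPE:
        return "FU"
    if nal_type == PACI_TYPE:
        return "PACI"
    return "single NAL unit"


print(parse_payload_header(bytes([0x60, 0x01])))
print(payload_structure(bytes([0x62, 0x01])))
```

The sketch does not validate reserved or unspecified Type values; a real receiver would need a policy for those.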
1115 The four different payload structures are as follows:

1117 o Single NAL unit packet: Contains a single NAL unit in the 1118 payload, and the NAL unit header of the NAL unit also serves 1119 as the payload header. This payload structure is specified in 1120 Section 4.4.1.

1122 o Aggregation packet (AP): Contains more than one NAL unit 1123 within one access unit. This payload structure is specified 1124 in Section 4.4.2.

1126 o Fragmentation unit (FU): Contains a subset of a single NAL 1127 unit. This payload structure is specified in Section 4.4.3.

1129 o PACI carrying RTP packet: Contains a payload header (that 1130 differs from other payload headers for efficiency), a Payload 1131 Header Extension Structure (PHES), and a PACI payload. This 1132 payload structure is specified in Section 4.4.4.

1134 4.4.1 Single NAL Unit Packets

1136 A single NAL unit packet contains exactly one NAL unit, and 1137 consists of a payload header (denoted as PayloadHdr), a 1138 conditional 16-bit DONL field (in network byte order), and the 1139 NAL unit payload data (the NAL unit excluding its NAL unit 1140 header) of the contained NAL unit, as shown in Figure 3.

1142     0                   1                   2                   3
1143     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1144    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1145    |           PayloadHdr          |      DONL (conditional)       |
1146    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1147    |                                                               |
1148    |                  NAL unit payload data                        |
1149    |                                                               |
1150    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1151    |                               :...OPTIONAL RTP padding        |
1152    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1154 Figure 3 The structure of a single NAL unit packet

1156 The payload header SHOULD be an exact copy of the NAL unit header 1157 of the contained NAL unit. However, the Type (i.e. 1158 nal_unit_type) field MAY be changed, e.g. when it is desirable to 1159 handle a CRA picture as a BLA picture [JCTVC-J0107].
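Informative example: Because the payload header doubles as the NAL unit header, recovering the NAL unit from a single NAL unit packet amounts to removing the DONL field when it is present. The Python sketch below assumes that RTP padding has already been stripped, that the caller knows whether DONL is present, and that the payload header is an exact copy of the NAL unit header. The function name is hypothetical.

```python
def depacketize_single_nal(payload, donl_present):
    """Return (nal_unit, donl) from a single NAL unit packet payload.

    Assumes RTP padding was already removed and that the payload
    header is an unmodified copy of the NAL unit header.
    """
    min_len = 4 if donl_present else 2
    if len(payload) < min_len:
        raise ValueError("payload too short")
    header = payload[:2]
    if donl_present:
        donl = int.from_bytes(payload[2:4], "big")  # network byte order
        data = payload[4:]
    else:
        donl = None
        data = payload[2:]
    return header + data, donl


packet = bytes([0x40, 0x01, 0x00, 0x05, 0xAA, 0xBB])
nal_unit, donl = depacketize_single_nal(packet, donl_present=True)
print(nal_unit.hex(), donl)
```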
1161 The DONL field, when present, specifies the value of the 16 least 1162 significant bits of the decoding order number of the contained 1163 NAL unit. If sprop-max-don-diff is greater than 0 for any of the 1164 RTP streams, the DONL field MUST be present, and the variable DON 1165 for the contained NAL unit is derived as equal to the value of 1166 the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for 1167 all the RTP streams), the DONL field MUST NOT be present.

1169 4.4.2 Aggregation Packets (APs)

1171 Aggregation packets (APs) are introduced to enable the reduction 1172 of packetization overhead for small NAL units, such as most of 1173 the non-VCL NAL units, which are often only a few octets in size.

1175 An AP aggregates NAL units within one access unit. Each NAL unit 1176 to be carried in an AP is encapsulated in an aggregation unit. 1177 NAL units aggregated in one AP are in NAL unit decoding order.

1179 An AP consists of a payload header (denoted as PayloadHdr) 1180 followed by two or more aggregation units, as shown in Figure 4.

1182     0                   1                   2                   3
1183     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1184    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1185    |    PayloadHdr (Type=48)       |                               |
1186    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1187    |                                                               |
1188    |             two or more aggregation units                     |
1189    |                                                               |
1190    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1191    |                               :...OPTIONAL RTP padding        |
1192    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1194 Figure 4 The structure of an aggregation packet

1196 The fields in the payload header are set as follows. The F bit 1197 MUST be equal to 0 if the F bit of each aggregated NAL unit is 1198 equal to zero; otherwise, it MUST be equal to 1. The Type field 1199 MUST be equal to 48. The value of LayerId MUST be equal to the 1200 lowest value of LayerId of all the aggregated NAL units. The 1201 value of TID MUST be the lowest value of TID of all the 1202 aggregated NAL units.
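Informative example: The rules for the AP payload header can be expressed as a short derivation over the NAL unit headers being aggregated. In the Python sketch below, the function name ap_payload_header is hypothetical and each input is assumed to be a complete NAL unit beginning with its two-byte NAL unit header.

```python
AP_TYPE = 48


def ap_payload_header(nal_units):
    """Derive the two-byte AP payload header from the aggregated NAL units.

    F is the logical OR of all F bits; LayerId and TID are the lowest
    values found among the aggregated NAL units.
    """
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f_bit, layer_id, tid = 0, 0x3F, 0x07
    for nal in nal_units:
        hdr = int.from_bytes(nal[:2], "big")
        f_bit |= hdr >> 15
        layer_id = min(layer_id, (hdr >> 3) & 0x3F)
        tid = min(tid, hdr & 0x07)
    value = (f_bit << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid
    return value.to_bytes(2, "big")


vps = bytes([0x40, 0x01, 0x0C])    # Type 32, LayerId 0, TID 1
slice_ = bytes([0x02, 0x02, 0xFF])  # Type 1, LayerId 0, TID 2
print(ap_payload_header([vps, slice_]).hex())
```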
1204 Informative Note: All VCL NAL units in an AP have the same TID 1205 value since they belong to the same access unit. However, an 1206 AP may contain non-VCL NAL units for which the TID value in 1207 the NAL unit header may be different from the TID value of the 1208 VCL NAL units in the same AP.

1210 An AP MUST carry at least two aggregation units and can carry as 1211 many aggregation units as necessary; however, the total amount of 1212 data in an AP MUST fit into an IP packet, and the size 1213 SHOULD be chosen so that the resulting IP packet is smaller than 1214 the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT 1215 contain Fragmentation Units (FUs) specified in Section 4.4.3. 1216 APs MUST NOT be nested; i.e. an AP must not contain another AP.

1218 The first aggregation unit in an AP consists of a conditional 16- 1219 bit DONL field (in network byte order) followed by a 16-bit 1220 unsigned size information (in network byte order) that indicates 1221 the size of the NAL unit in bytes (excluding these two octets, 1222 but including the NAL unit header), followed by the NAL unit 1223 itself, including its NAL unit header, as shown in Figure 5.

1225     0                   1                   2                   3
1226     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1227    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1228                    :       DONL (conditional)      |   NALU size   |
1229    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1230    |   NALU size   |                                               |
1231    +-+-+-+-+-+-+-+-+         NAL unit                              |
1232    |                                                               |
1233    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1234    |                               :
1235    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1237 Figure 5 The structure of the first aggregation unit in an AP

1239 The DONL field, when present, specifies the value of the 16 least 1240 significant bits of the decoding order number of the aggregated 1241 NAL unit.
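Informative example: Reading the first aggregation unit of an AP can be sketched as follows. The function name is hypothetical, the caller is assumed to know whether the DONL field is present, and the input is assumed to be the full AP payload starting at the two-byte payload header.

```python
def parse_first_aggregation_unit(ap_payload, donl_present):
    """Parse the aggregation unit that follows the AP payload header.

    Returns (donl, nal_unit, next_offset), where next_offset is the
    position of the following aggregation unit in ap_payload.
    """
    offset = 2  # skip PayloadHdr
    needed = 4 if donl_present else 2
    if len(ap_payload) < offset + needed:
        raise ValueError("truncated aggregation unit")
    donl = None
    if donl_present:
        donl = int.from_bytes(ap_payload[offset:offset + 2], "big")
        offset += 2
    size = int.from_bytes(ap_payload[offset:offset + 2], "big")
    offset += 2
    end = offset + size
    # The size counts the NAL unit header, so it is at least 2.
    if size < 2 or end > len(ap_payload):
        raise ValueError("invalid NALU size")
    return donl, ap_payload[offset:end], end


ap = bytes([0x60, 0x01,        # PayloadHdr, Type=48
            0x00, 0x07,        # DONL = 7
            0x00, 0x03,        # NALU size = 3
            0x40, 0x01, 0xAA,  # first NAL unit
            0x00])             # start of the next aggregation unit
print(parse_first_aggregation_unit(ap, donl_present=True))
```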
1243 If sprop-max-don-diff is greater than 0 for any of the RTP 1244 streams, the DONL field MUST be present in an aggregation unit 1245 that is the first aggregation unit in an AP, and the variable DON 1246 for the aggregated NAL unit is derived as equal to the value of 1247 the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for 1248 all the RTP streams), the DONL field MUST NOT be present in an 1249 aggregation unit that is the first aggregation unit in an AP.

1251 An aggregation unit that is not the first aggregation unit in an 1252 AP consists of a conditional 8-bit DOND field followed by a 16- 1253 bit unsigned size information (in network byte order) that 1254 indicates the size of the NAL unit in bytes (excluding these two 1255 octets, but including the NAL unit header), followed by the NAL 1256 unit itself, including its NAL unit header, as shown in Figure 6.

1258     0                   1                   2                   3
1259     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1260    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1261                    :  DOND (cond)  |           NALU size           |
1262    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1263    |                                                               |
1264    |                           NAL unit                            |
1265    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1266    |                               :
1267    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1269 Figure 6 The structure of an aggregation unit that is not the 1270 first aggregation unit in an AP

1272 When present, the DOND field plus 1 specifies the difference 1273 between the decoding order number values of the current 1274 aggregated NAL unit and the preceding aggregated NAL unit in the 1275 same AP.

1277 If sprop-max-don-diff is greater than 0 for any of the RTP 1278 streams, the DOND field MUST be present in an aggregation unit 1279 that is not the first aggregation unit in an AP, and the variable 1280 DON for the aggregated NAL unit is derived as equal to the DON of 1281 the preceding aggregated NAL unit in the same AP plus the value 1282 of the DOND field plus 1 modulo 65536.
Otherwise (sprop-max-don- 1283 diff is equal to 0 for all the RTP streams), the DOND field MUST 1284 NOT be present in an aggregation unit that is not the first 1285 aggregation unit in an AP, and in this case the transmission 1286 order and decoding order of NAL units carried in the AP are the 1287 same as the order the NAL units appear in the AP.

1289 Figure 7 presents an example of an AP that contains two 1290 aggregation units, labeled as 1 and 2 in the figure, without the 1291 DONL and DOND fields being present.

1293     0                   1                   2                   3
1294     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1295    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1296    |                          RTP Header                           |
1297    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1298    |     PayloadHdr (Type=48)      |          NALU 1 Size          |
1299    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1300    |          NALU 1 HDR           |                               |
1301    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1302    |                   . . .                                       |
1303    |                                                               |
1304    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1305    |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1306    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1307    | NALU 2 HDR    |                                               |
1308    +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1309    |                   . . .                                       |
1310    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1311    |                               :...OPTIONAL RTP padding        |
1312    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1314 Figure 7 An example of an AP containing two aggregation 1315 units without the DONL and DOND fields

1317 Figure 8 presents an example of an AP that contains two 1318 aggregation units, labeled as 1 and 2 in the figure, with the 1319 DONL and DOND fields being present.
1321 0 1 2 3 1322 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1323 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1324 | RTP Header | 1325 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1326 | PayloadHdr (Type=48) | NALU 1 DONL | 1327 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1328 | NALU 1 Size | NALU 1 HDR | 1329 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1330 | | 1331 | NALU 1 Data . . . | 1332 | | 1333 + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1334 | | NALU 2 DOND | NALU 2 Size | 1335 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1336 | NALU 2 HDR | | 1337 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | 1338 | | 1339 | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1340 | :...OPTIONAL RTP padding | 1341 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1343 Figure 8 An example of an AP containing two aggregation units 1344 with the DONL and DOND fields 1346 4.4.3 Fragmentation Units (FUs) 1348 Fragmentation units (FUs) are introduced to enable fragmenting a 1349 single NAL unit into multiple RTP packets, possibly without 1350 cooperation or knowledge of the HEVC encoder. A fragment of a 1351 NAL unit consists of an integer number of consecutive octets of 1352 that NAL unit. Fragments of the same NAL unit MUST be sent in 1353 consecutive order with ascending RTP sequence numbers (with no 1354 other RTP packets within the same RTP stream being sent between 1355 the first and last fragment). 1357 When a NAL unit is fragmented and conveyed within FUs, it is 1358 referred to as a fragmented NAL unit. APs MUST NOT be 1359 fragmented. FUs MUST NOT be nested; i.e. an FU must not contain 1360 a subset of another FU. 1362 The RTP timestamp of an RTP packet carrying an FU is set to the 1363 NALU-time of the fragmented NAL unit. 
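As an illustration of the aggregation unit layouts shown in Figures 6 to 8, the following Python sketch walks the payload of an AP and derives the DON of each aggregated NAL unit. It is a minimal sketch under stated assumptions, not part of the payload format: the function name parse_ap and the flag don_present (standing for "sprop-max-don-diff is greater than 0 for any of the RTP streams") are hypothetical, the input is assumed to be the RTP payload with any RTP padding already removed, and the check that a NAL unit is at least two octets long is a sanity check added for this example.

```python
import struct


def parse_ap(payload: bytes, don_present: bool):
    """Split an AP payload into (DON or None, NAL unit bytes) tuples.

    `payload` starts with the two-octet PayloadHdr (Type=48) and must not
    include RTP padding.  `don_present` is a hypothetical flag meaning
    that DONL/DOND fields are carried in the aggregation units.
    """
    if len(payload) < 2:
        raise ValueError("AP shorter than its payload header")
    # Type is the 6 bits that follow the F bit in the first octet.
    if (payload[0] >> 1) & 0x3F != 48:
        raise ValueError("not an aggregation packet")

    pos = 2
    don = None
    units = []
    while pos < len(payload):
        if don_present:
            if not units:
                # First aggregation unit: 16-bit DONL, network byte order.
                if pos + 2 > len(payload):
                    raise ValueError("truncated DONL")
                (don,) = struct.unpack_from("!H", payload, pos)
                pos += 2
            else:
                # Later aggregation units: 8-bit DOND; DOND + 1 is the
                # DON difference to the preceding aggregated NAL unit.
                if pos + 1 > len(payload):
                    raise ValueError("truncated DOND")
                don = (don + payload[pos] + 1) % 65536
                pos += 1
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        # The size counts the NAL unit header but not the size field.
        if size < 2 or pos + size > len(payload):
            raise ValueError("bad NALU size")
        units.append((don, payload[pos:pos + size]))
        pos += size
    return units
```

When don_present is false, the returned order is both the transmission order and the decoding order of the aggregated NAL units, and every DON entry is None.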
1365 An FU consists of a payload header (denoted as PayloadHdr), an FU 1366 header of one octet, a conditional 16-bit DONL field (in network 1367 byte order), and an FU payload, as shown in Figure 9. 1369 0 1 2 3 1370 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1371 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1372 | PayloadHdr (Type=49) | FU header | DONL (cond) | 1373 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| 1374 | DONL (cond) | | 1375 |-+-+-+-+-+-+-+-+ | 1376 | FU payload | 1377 | | 1378 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1379 | :...OPTIONAL RTP padding | 1380 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1382 Figure 9 The structure of an FU 1384 The fields in the payload header are set as follows. The Type 1385 field MUST be equal to 49. The fields F, LayerId, and TID MUST 1386 be equal to the fields F, LayerId, and TID, respectively, of the 1387 fragmented NAL unit. 1389 The FU header consists of an S bit, an E bit, and a 6-bit FuType 1390 field, as shown in Figure 10. 1392 +---------------+ 1393 |0|1|2|3|4|5|6|7| 1394 +-+-+-+-+-+-+-+-+ 1395 |S|E| FuType | 1396 +---------------+ 1398 Figure 10 The structure of FU header 1400 The semantics of the FU header fields are as follows: 1401 S: 1 bit 1402 When set to one, the S bit indicates the start of a fragmented 1403 NAL unit i.e. the first byte of the FU payload is also the 1404 first byte of the payload of the fragmented NAL unit. When 1405 the FU payload is not the start of the fragmented NAL unit 1406 payload, the S bit MUST be set to zero. 1408 E: 1 bit 1409 When set to one, the E bit indicates the end of a fragmented 1410 NAL unit, i.e. the last byte of the payload is also the last 1411 byte of the fragmented NAL unit. When the FU payload is not 1412 the last fragment of a fragmented NAL unit, the E bit MUST be 1413 set to zero. 
1415 FuType: 6 bits 1416 The field FuType MUST be equal to the field Type of the 1417 fragmented NAL unit. 1419 The DONL field, when present, specifies the value of the 16 least 1420 significant bits of the decoding order number of the fragmented 1421 NAL unit. 1423 If sprop-max-don-diff is greater than 0 for any of the RTP 1424 streams, and the S bit is equal to 1, the DONL field MUST be 1425 present in the FU, and the variable DON for the fragmented NAL 1426 unit is derived as equal to the value of the DONL field. 1427 Otherwise (sprop-max-don-diff is equal to 0 for all the RTP 1428 streams, or the S bit is equal to 0), the DONL field MUST NOT be 1429 present in the FU. 1431 A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e. 1432 the Start bit and End bit must not both be set to one in the same 1433 FU header. 1435 The FU payload consists of fragments of the payload of the 1436 fragmented NAL unit so that if the FU payloads of consecutive 1437 FUs, starting with an FU with the S bit equal to 1 and ending 1438 with an FU with the E bit equal to 1, are sequentially 1439 concatenated, the payload of the fragmented NAL unit can be 1440 reconstructed. The NAL unit header of the fragmented NAL unit is 1441 not included as such in the FU payload, but rather the 1442 information of the NAL unit header of the fragmented NAL unit is 1443 conveyed in F, LayerId, and TID fields of the FU payload headers 1444 of the FUs and the FuType field of the FU header of the FUs. An 1445 FU payload MUST NOT be empty. 1447 If an FU is lost, the receiver SHOULD discard all following 1448 fragmentation units in transmission order corresponding to the 1449 same fragmented NAL unit, unless the decoder in the receiver is 1450 known to be prepared to gracefully handle incomplete NAL units. 1452 A receiver in an endpoint or in a MANE MAY aggregate the first n- 1453 1 fragments of a NAL unit to an (incomplete) NAL unit, even if 1454 fragment n of that NAL unit is not received. 
In this case, the 1455 forbidden_zero_bit of the NAL unit MUST be set to one to indicate 1456 a syntax violation. 1458 4.4.4 PACI packets 1460 This section specifies the PACI packet structure. The basic 1461 payload header specified in this memo is intentionally limited to 1462 the 16 bits of the NAL unit header so as to keep the packetization 1463 overhead to a minimum. However, cases have been identified where 1464 it is advisable to include control information in an easily 1465 accessible position in the packet header, despite the additional 1466 overhead. One such piece of control information is the Temporal 1467 Scalability Control Information as specified in Section 4.5 1468 below. PACI packets carry this and future, similar structures. 1470 The PACI packet structure is based on a payload header extension 1471 mechanism that is generic and extensible to carry payload header 1472 extensions. In this section, the focus lies on the use within 1473 this specification. Section 4.4.4.2 below provides guidance for 1474 specification designers on how to employ the extension 1475 mechanism in future specifications. 1477 A PACI packet consists of a payload header (denoted as 1478 PayloadHdr), for which the structure follows what is described in 1479 Section 4.2 above. The payload header is followed by the fields 1480 A, cType, PHSsize, F[0..2] and Y. 1482 Figure 11 shows a PACI packet in compliance with this memo; that 1483 is, without any extensions. 1485 0 1 2 3 1486 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1487 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1488 | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| 1489 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1490 | Payload Header Extension Structure (PHES) | 1491 |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=| 1492 | | 1493 | PACI payload: NAL unit | 1494 | . . .
| 1495 | | 1496 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1497 | :...OPTIONAL RTP padding | 1498 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1500 Figure 11 The structure of a PACI 1502 The fields in the payload header are set as follows. The F bit 1503 MUST be equal to 0. The Type field MUST be equal to 50. The 1504 value of LayerId MUST be a copy of the LayerId field of the PACI 1505 payload NAL unit or NAL-unit-like structure. The value of TID 1506 MUST be a copy of the TID field of the PACI payload NAL unit or 1507 NAL-unit-like structure. 1509 The semantics of other fields are as follows: 1511 A: 1 bit 1512 Copy of the F bit of the PACI payload NAL unit or NAL-unit- 1513 like structure. 1515 cType: 6 bits 1516 Copy of the Type field of the PACI payload NAL unit or NAL- 1517 unit-like structure. 1519 PHSsize: 5 bits 1520 Indicates the length of the PHES field. The value is limited 1521 to be less than or equal to 32 octets, to simplify encoder 1522 design for MTU size matching. 1524 F0 1525 This field equal to 1 specifies the presence of a temporal 1526 scalability support extension in the PHES. 1528 F1, F2 1529 MUST be 0, available for future extensions, see Section 1530 4.4.4.2. Receivers compliant with this version of the HEVC 1531 payload format MUST ignore F1=1 and/or F2=1, and also ignore 1532 any information in the PHES indicated as present by F1=1 1533 and/or F2=1. 1535 Informative note: The receiver can do that by first 1536 decoding information associated with F0=1, and then 1537 skipping over any remaining bytes of the PHES based on the 1538 value of PHSsize. 1540 Y: 1 bit 1541 MUST be 0, available for future extensions, see Section 1542 4.4.4.2. Receivers compliant with this version of the HEVC 1543 payload format MUST ignore Y=1, and also ignore any 1544 information in the PHES indicated as present by Y. 1546 PHES: variable number of octets 1547 A variable number of octets as indicated by the value of 1548 PHSsize. 
1550 PACI Payload 1551 The single NAL unit packet or NAL-unit-like structure (such 1552 as: FU or AP) to be carried, not including the first two 1553 octets. 1555 Informative note: The first two octets of the NAL unit or 1556 NAL-unit-like structure carried in the PACI payload are not 1557 included in the PACI payload. Rather, the respective values 1558 are copied in locations of the PayloadHdr of the RTP 1559 packet. This design offers two advantages: first, the 1560 overall structure of the payload header is preserved, i.e. 1561 there is no special case of payload header structure that 1562 needs to be implemented for PACI. Second, no additional 1563 overhead is introduced. 1565 A PACI payload MAY be a single NAL unit, an FU, or an AP. 1566 PACIs MUST NOT be fragmented or aggregated. The following 1567 subsection documents the reasons for these design choices. 1569 4.4.4.1 Reasons for the PACI rules (informative) 1571 A PACI cannot be fragmented. If a PACI could be fragmented, and 1572 a fragment other than the first fragment would get lost, access 1573 to the information in the PACI would not be possible. Therefore, 1574 a PACI must not be fragmented. In other words, an FU must not 1575 carry (fragments of) a PACI. 1577 A PACI cannot be aggregated. Aggregation of PACIs is inadvisable 1578 from a compression viewpoint, as, in many cases, several to be 1579 aggregated NAL units would share identical PACI fields and values 1580 which would be carried redundantly for no reason. Most, if not 1581 all the practical effects of PACI aggregation can be achieved by 1582 aggregating NAL units and bundling them with a PACI (see below). 1583 Therefore, a PACI must not be aggregated. In other words, an AP 1584 must not contain a PACI. 1586 The payload of a PACI can be a fragment. 
Both middleboxes and 1587 sending systems with inflexible (often hardware-based) encoders 1588 occasionally find themselves in situations where a PACI and its 1589 headers, combined, are larger than the MTU size. In such a 1590 scenario, the middlebox or sender can fragment the NAL unit and 1591 encapsulate the fragment in a PACI. Doing so preserves the 1592 payload header extension information for all fragments, allowing 1593 downstream middleboxes and the receiver to take advantage of that 1594 information. Therefore, a sender may place a fragment into a 1595 PACI, and a receiver must be able to handle such a PACI. 1597 The payload of a PACI can be an aggregation NAL unit. HEVC 1598 bitstreams can contain unevenly sized and/or small (when compared 1599 to the MTU size) NAL units. In order to efficiently packetize 1600 such small NAL units, APs were introduced. The benefits of APs 1601 are independent of the need for a payload header extension. 1602 Therefore, a sender may place an AP into a PACI, and a receiver 1603 must be able to handle such a PACI. 1605 4.4.4.2 PACI extensions (informative) 1607 This section includes recommendations for future specification 1608 designers on how to extend the PACI syntax to accommodate future 1609 extensions. Obviously, designers are free to specify whatever 1610 appears to be appropriate to them at the time of their design. 1611 However, a lot of thought has been invested in the extension 1612 mechanism described below, and we suggest that deviations from it 1613 warrant a good explanation. 1615 This memo defines only a single payload header extension 1616 (Temporal Scalability Control Information, described below in 1617 Section 4.5), and, therefore, only the F0 bit carries semantics. 1618 F1 and F2 are already named (and not just marked as reserved, as 1619 a typical video spec designer would do). They are intended to 1620 signal two additional extensions.
The Y bit allows further F and 1621 Y bits to be added recursively, extending the mechanism 1622 beyond 3 possible payload header extensions. It is suggested to 1623 define a new packet type (using a different value for Type) when 1624 assigning the F1, F2, or Y bits different semantics than what is 1625 suggested below. 1627 When a Y bit is set, an 8-bit flag-extension is inserted after 1628 the Y bit. A flag-extension consists of 7 flags F[n..n+6], and 1629 another Y bit. 1631 The basic PACI header already includes F0, F1, and F2. 1632 Therefore, the Fx bits in the first flag-extension are numbered 1633 F3, F4, ..., F9, the Fx bits in the second flag-extension are 1634 numbered F10, F11, ..., F16, and so forth. As a result, at least 1635 3 Fx bits are always in the PACI, but the number of Fx bits (and 1636 associated types of extensions) can be increased by setting the 1637 next Y bit and adding an octet of flag-extensions, carrying 7 1638 flags and another Y bit. The size of this list of flags is 1639 subject to the limits specified in Section 4.4.4 (32 octets for 1640 all flag-extensions and the PHES information combined). 1642 Each of the F bits can indicate either the presence of 1643 information in the Payload Header Extension Structure (PHES), 1644 described below, or a given F bit can indicate a certain 1645 condition, without including additional information in the PHES. 1647 When a spec developer devises a new syntax that takes advantage 1648 of the PACI extension mechanism, they must follow the 1649 constraints listed below; otherwise the extension mechanism may 1650 break. 1652 1) The fields added for a particular Fx bit MUST be fixed in 1653 length and not depend on what other Fx bits are set (no 1654 parsing dependency). 1655 2) The Fx bits must be assigned in order.
1656 3) An implementation that supports the n-th Fn bit for any 1657 value of n must understand the syntax (though not 1658 necessarily the semantics) of the fields Fk (with k < n), so 1659 as to be able to either use those bits when present, or at 1660 least be able to skip over them. 1662 4.5 Temporal Scalability Control Information 1664 This section describes the single payload header extension 1665 defined in this specification, known as Temporal Scalability 1666 Control Information (TSCI). If, in the future, additional 1667 payload header extensions become necessary, they could be 1668 specified in this section of an updated version of this document, 1669 or in their own documents. 1671 When F0 is set to 1 in a PACI, this specifies that the PHES field 1672 includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as 1673 follows: 1675 0 1 2 3 1676 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1677 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1678 | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| 1679 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1680 | TL0PICIDX | IrapPicID |S|E| RES | | 1681 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1682 | .... | 1683 | PACI payload: NAL unit | 1684 | | 1685 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1686 | :...OPTIONAL RTP padding | 1687 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1689 Figure 12 The structure of a PACI with a PHES containing a TSCI 1691 TL0PICIDX (8 bits) 1692 When present, the TL0PICIDX field MUST be set equal to 1693 temporal_sub_layer_zero_idx as specified in Section D.3.22 of 1694 [H.265] for the access unit containing the NAL unit in the 1695 PACI. 1697 IrapPicID (8 bits) 1698 When present, the IrapPicID field MUST be set equal to 1699 irap_pic_id as specified in Section D.3.22 of [H.265] for the 1700 access unit containing the NAL unit in the PACI.
1702 S (1 bit) 1703 The S bit MUST be set to 1 if any of the following conditions 1704 is true and MUST be set to 0 otherwise: 1705 o The NAL unit in the payload of the PACI is the first VCL NAL 1706 unit, in decoding order, of a picture. 1707 o The NAL unit in the payload of the PACI is an AP and the NAL 1708 unit in the first contained aggregation unit is the first 1709 VCL NAL unit, in decoding order, of a picture. 1710 o The NAL unit in the payload of the PACI is an FU with its S 1711 bit equal to 1 and the FU payload containing a fragment of 1712 the first VCL NAL unit, in decoding order of a picture. 1714 E (1 bit) 1715 The E bit MUST be set to 1 if any of the following conditions 1716 is true and MUST be set to 0 otherwise: 1717 o The NAL unit in the payload of the PACI is the last VCL NAL 1718 unit, in decoding order, of a picture. 1719 o The NAL unit in the payload of the PACI is an AP and the NAL 1720 unit in the last contained aggregation unit is the last VCL 1721 NAL unit, in decoding order, of a picture. 1722 o The NAL unit in the payload of the PACI is an FU with its E 1723 bit equal to 1 and the FU payload containing a fragment of 1724 the last VCL NAL unit, in decoding order of a picture. 1726 RES (6 bits) 1727 MUST be equal to 0. Reserved for future extensions. 1729 The value of PHSsize MUST be set to 3. Receivers MUST allow 1730 other values of the fields F0, F1, F2, Y, and PHSsize, and MUST 1731 ignore any additional fields, when present, than specified above 1732 in the PHES. 1734 4.6 Decoding Order Number 1736 For each NAL unit, the variable AbsDon is derived, representing 1737 the decoding order number that is indicative of the NAL unit 1738 decoding order. 1740 Let NAL unit n be the n-th NAL unit in transmission order within 1741 an RTP stream. 1743 If sprop-max-don-diff is equal to 0 for all the RTP streams 1744 carrying the HEVC bitstream, AbsDon[n], the value of AbsDon for 1745 NAL unit n, is derived as equal to n. 
1747 Otherwise (sprop-max-don-diff is greater than 0 for any of the 1748 RTP streams), AbsDon[n] is derived as follows, where DON[n] is 1749 the value of the variable DON for NAL unit n: 1751 o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit 1752 in transmission order), AbsDon[0] is set equal to DON[0]. 1754 o Otherwise (n is greater than 0), the following applies for 1755 derivation of AbsDon[n]: 1757 If DON[n] == DON[n-1], 1758 AbsDon[n] = AbsDon[n-1] 1760 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), 1761 AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] 1763 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), 1764 AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] 1766 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), 1767 AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - 1768 DON[n]) 1770 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), 1771 AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) 1773 For any two NAL units m and n, the following applies: 1775 o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n 1776 follows NAL unit m in NAL unit decoding order. 1778 o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding 1779 order of the two NAL units can be in either order. 1781 o AbsDon[n] less than AbsDon[m] indicates that NAL unit n 1782 precedes NAL unit m in decoding order. 1784 Informative note: When two consecutive NAL units in the NAL 1785 unit decoding order have different values of AbsDon, the 1786 absolute difference between the two AbsDon values may be 1787 greater than or equal to 1. 1789 Informative note: There are multiple reasons to allow for the 1790 absolute difference of the values of AbsDon for two 1791 consecutive NAL units in the NAL unit decoding order to be 1792 greater than one. An increment by one is not required, as at 1793 the time of associating values of AbsDon to NAL units, it may 1794 not be known whether all NAL units are to be delivered to the 1795 receiver. 
For example, a gateway may not forward VCL NAL 1796 units of higher sub-layers or some SEI NAL units when there is 1797 congestion in the network. In another example, the first 1798 intra-coded picture of a pre-encoded clip is transmitted in 1799 advance to ensure that it is readily available in the 1800 receiver, and when transmitting the first intra-coded picture, 1801 the originator does not exactly know how many NAL units will 1802 be encoded before the first intra-coded picture of the pre- 1803 encoded clip follows in decoding order. Thus, the values of 1804 AbsDon for the NAL units of the first intra-coded picture of 1805 the pre-encoded clip have to be estimated when they are 1806 transmitted, and gaps in values of AbsDon may occur. Another 1807 example is MRST or MRMT with sprop-max-don-diff greater than 1808 0, where the AbsDon values must indicate cross-layer decoding 1809 order for NAL units conveyed in all the RTP streams. 1811 5 Packetization Rules 1813 The following packetization rules apply: 1815 o If sprop-max-don-diff is greater than 0 for any of the RTP 1816 streams, the transmission order of NAL units carried in the 1817 RTP stream MAY be different than the NAL unit decoding order 1818 and the NAL unit output order. Otherwise (sprop-max-don-diff 1819 is equal to 0 for all the RTP streams), the transmission order 1820 of NAL units carried in the RTP stream MUST be the same as the 1821 NAL unit decoding order, and, when tx-mode is equal to "MRST" 1822 or "MRMT", MUST also be the same as the NAL unit output order. 1824 o A NAL unit of a small size SHOULD be encapsulated in an 1825 aggregation packet together with one or more other NAL units 1826 in order to avoid the unnecessary packetization overhead for 1827 small NAL units. For example, non-VCL NAL units such as 1828 access unit delimiters, parameter sets, or SEI NAL units are 1829 typically small and can often be aggregated with VCL NAL units 1830 without violating MTU size constraints. 
1832 o Each non-VCL NAL unit SHOULD, when possible from an MTU size 1833 match viewpoint, be encapsulated in an aggregation packet 1834 together with its associated VCL NAL unit, as typically a non- 1835 VCL NAL unit would be meaningless without the associated VCL 1836 NAL unit being available. 1838 o For carrying exactly one NAL unit in an RTP packet, a single 1839 NAL unit packet MUST be used. 1841 6 De-packetization Process 1843 The general concept behind de-packetization is to get the NAL 1844 units out of the RTP packets in an RTP stream and all RTP streams 1845 the RTP stream depends on, if any, and pass them to the decoder 1846 in the NAL unit decoding order. 1848 The de-packetization process is implementation dependent. 1849 Therefore, the following description should be seen as an example 1850 of a suitable implementation. Other schemes may be used as well 1851 as long as the output for the same input is the same as the 1852 process described below. The output is the same when the set of 1853 output NAL units and their order are both identical. 1854 Optimizations relative to the described algorithms are possible. 1856 All normal RTP mechanisms related to buffer management apply. In 1857 particular, duplicated or outdated RTP packets (as indicated by 1858 the RTP sequence number and the RTP timestamp) are removed. To 1859 determine the exact time for decoding, factors such as a possible 1860 intentional delay to allow for proper inter-stream 1861 synchronization must be factored in. 1863 NAL units with NAL unit type values in the range of 0 to 47, 1864 inclusive, may be passed to the decoder. NAL-unit-like structures 1865 with NAL unit type values in the range of 48 to 63, inclusive, 1866 MUST NOT be passed to the decoder.
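The de-packetization process relies on the AbsDon values derived in Section 4.6. The following Python sketch restates that derivation for the case where sprop-max-don-diff is greater than 0 for any of the RTP streams. It is a minimal sketch: the function names next_abs_don and abs_don_sequence are hypothetical, and the input is assumed to be the list of 16-bit DON values of the NAL units in transmission order.

```python
def next_abs_don(prev_abs_don: int, prev_don: int, don: int) -> int:
    """Derive AbsDon[n] from AbsDon[n-1], DON[n-1] and DON[n]."""
    if don == prev_don:
        return prev_abs_don
    if don > prev_don:
        diff = don - prev_don
        if diff < 32768:
            # Plain forward step.
            return prev_abs_don + diff
        # Backward step across the 16-bit wrap-around.
        return prev_abs_don - (prev_don + 65536 - don)
    diff = prev_don - don
    if diff >= 32768:
        # Forward step across the 16-bit wrap-around.
        return prev_abs_don + 65536 - prev_don + don
    # Plain backward step.
    return prev_abs_don - diff


def abs_don_sequence(dons):
    """Return the AbsDon values for DON values given in transmission order."""
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)  # AbsDon[0] is set equal to DON[0]
        else:
            abs_dons.append(next_abs_don(abs_dons[-1], dons[n - 1], don))
    return abs_dons
```

For example, the DON sequence 65534, 65535, 0, 2, 1 yields the AbsDon sequence 65534, 65535, 65536, 65538, 65537: the third value steps forward across the wrap-around, and the last value steps back by one because its NAL unit was sent out of decoding order.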
1868 The receiver includes a receiver buffer, which is used to 1869 compensate for transmission delay jitter within individual RTP 1870 streams and across RTP streams, to reorder NAL units from 1871 transmission order to the NAL unit decoding order, and to recover 1872 the NAL unit decoding order in MRST or MRMT, when applicable. In 1873 this section, the receiver operation is described under the 1874 assumption that there is no transmission delay jitter within an 1875 RTP stream and across RTP streams. To distinguish it from a 1876 practical receiver buffer that is also used for compensation of 1877 transmission delay jitter, the receiver buffer is hereafter 1878 called the de-packetization buffer in this section. Receivers 1879 should also prepare for transmission delay jitter; i.e. either 1880 reserve separate buffers for transmission delay jitter buffering 1881 and de-packetization buffering or use a receiver buffer for both 1882 transmission delay jitter and de-packetization. Moreover, 1883 receivers should take transmission delay jitter into account in 1884 the buffering operation; e.g. by additional initial buffering 1885 before starting decoding and playback. 1887 When sprop-max-don-diff is equal to 0 for all the received RTP 1888 streams, the de-packetization buffer size is zero bytes and the 1889 process described in the remainder of this paragraph applies. 1890 When there is only one RTP stream received, the NAL units carried 1891 in the single RTP stream are directly passed to the decoder in 1892 their transmission order, which is identical to their decoding 1893 order. When there is more than one RTP stream received, the NAL 1894 units carried in the multiple RTP streams are passed to the 1895 decoder in their NTP timestamp order.
When there are several NAL 1896 units of different RTP streams with the same NTP timestamp, the 1897 order to pass them to the decoder is their dependency order, 1898 where NAL units of a dependee RTP stream are passed to the 1899 decoder prior to the NAL units of the dependent RTP stream. When 1900 there are several NAL units of the same RTP stream with the same 1901 NTP timestamp, the order to pass them to the decoder is their 1902 transmission order. 1904 Informative note: The mapping between RTP and NTP 1905 timestamps is conveyed in RTCP SR packets. In addition, 1906 the mechanisms for faster media timestamp synchronization 1907 discussed in [RFC6051] may be used to speed up the 1908 acquisition of the RTP-to-wall-clock mapping. 1910 When sprop-max-don-diff is greater than 0 for any of the received 1911 RTP streams, the process described in the remainder of this 1912 section applies. 1914 There are two buffering states in the receiver: initial buffering 1915 and buffering while playing. Initial buffering starts when the 1916 reception is initialized. After initial buffering, decoding and 1917 playback are started, and the buffering-while-playing mode is 1918 used. 1920 Regardless of the buffering state, the receiver stores incoming 1921 NAL units, in reception order, into the de-packetization buffer. 1922 NAL units carried in RTP packets are stored in the de- 1923 packetization buffer individually, and the value of AbsDon is 1924 calculated and stored for each NAL unit. When MRST or MRMT is in 1925 use, NAL units of all RTP streams of a bitstream are stored in 1926 the same de-packetization buffer. When NAL units carried in any 1927 two RTP streams are available to be placed into the de- 1928 packetization buffer, those NAL units carried in the RTP stream 1929 that is lower in the dependency tree are placed into the buffer 1930 first.
For example, if RTP stream A depends on RTP stream B, 1931 then NAL units carried in RTP stream B are placed into the buffer 1932 first. 1934 Initial buffering lasts until condition A (the difference between 1935 the greatest and smallest AbsDon values of the NAL units in the 1936 de-packetization buffer is greater than or equal to the value of 1937 sprop-max-don-diff of the highest RTP stream) or condition B (the 1938 number of NAL units in the de-packetization buffer is greater 1939 than the value of sprop-depack-buf-nalus) is true. 1941 After initial buffering, whenever condition A or condition B is 1942 true, the following operation is repeatedly applied until both 1943 condition A and condition B become false: 1945 o The NAL unit in the de-packetization buffer with the smallest 1946 value of AbsDon is removed from the de-packetization buffer 1947 and passed to the decoder. 1949 When no more NAL units are flowing into the de-packetization 1950 buffer, all NAL units remaining in the de-packetization buffer 1951 are removed from the buffer and passed to the decoder in the 1952 order of increasing AbsDon values. 1954 7 Payload Format Parameters 1956 This section specifies the parameters that MAY be used to select 1957 optional features of the payload format and certain features or 1958 properties of the bitstream or the RTP stream. The parameters 1959 are specified here as part of the media type registration for the 1960 HEVC codec. A mapping of the parameters into the Session 1961 Description Protocol (SDP) [RFC4566] is also provided for 1962 applications that use SDP. Equivalent parameters could be 1963 defined elsewhere for use with control protocols that do not use 1964 SDP. 1966 7.1 Media Type Registration 1968 The media subtype for the HEVC codec is allocated from the IETF 1969 tree. 1971 The receiver MUST ignore any unrecognized parameter. 
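Returning to the de-packetization process of Section 6, the buffering rules built on condition A and condition B can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a normative implementation: the class name DepackBuffer and its methods are hypothetical, max_don_diff stands for the sprop-max-don-diff value of the highest RTP stream, depack_buf_nalus stands for sprop-depack-buf-nalus, and the AbsDon of each NAL unit is assumed to have been computed already. The sketch does not model the two buffering states separately, on the assumption that both lead to the same removal loop once condition A or condition B first becomes true.

```python
import heapq


class DepackBuffer:
    """De-packetization buffer for sprop-max-don-diff > 0 (sketch)."""

    def __init__(self, max_don_diff: int, depack_buf_nalus: int):
        self.max_don_diff = max_don_diff
        self.depack_buf_nalus = depack_buf_nalus
        self._heap = []      # entries: (AbsDon, arrival index, NAL unit)
        self._arrivals = 0   # tie-breaker so equal AbsDon keeps arrival order
        self._max = None     # greatest AbsDon currently in the buffer

    def _pop(self):
        _, _, nal = heapq.heappop(self._heap)
        if not self._heap:
            self._max = None
        return nal

    def _condition_a(self) -> bool:
        # Greatest minus smallest AbsDon >= sprop-max-don-diff.
        return self._max - self._heap[0][0] >= self.max_don_diff

    def _condition_b(self) -> bool:
        # Number of buffered NAL units > sprop-depack-buf-nalus.
        return len(self._heap) > self.depack_buf_nalus

    def push(self, abs_don: int, nal):
        """Store one NAL unit; return the NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal))
        self._arrivals += 1
        if self._max is None or abs_don > self._max:
            self._max = abs_don
        released = []
        while self._heap and (self._condition_a() or self._condition_b()):
            released.append(self._pop())
        return released

    def flush(self):
        """No more input: release everything in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(self._pop())
        return released
```

Because only the entry with the smallest AbsDon is ever removed, the greatest buffered AbsDon changes only when a NAL unit is stored or the buffer becomes empty, which is why the sketch can track it in a single variable.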
1973 Media Type name: video 1975 Media subtype name: H265 1977 Required parameters: none 1979 OPTIONAL parameters: 1981 profile-space, tier-flag, profile-id, profile-compatibility- 1982 indicator, interop-constraints, and level-id: 1984 These parameters indicate the profile, tier, default level, 1985 and some constraints of the bitstream carried by the RTP 1986 stream and all RTP streams the RTP stream depends on, or a 1987 specific set of the profile, tier, default level, and some 1988 constraints the receiver supports. 1990 The profile and some constraints are indicated collectively 1991 by profile-space, profile-id, profile-compatibility- 1992 indicator, and interop-constraints. The profile specifies 1993 the subset of coding tools that may have been used to 1994 generate the bitstream or that the receiver supports. 1996 Informative note: There are 32 values of profile-id, and 1997 there are 32 flags in profile-compatibility-indicator, 1998 each flag corresponding to one value of profile-id. 1999 According to HEVC version 1 in [HEVC], when more than 2000 one of the 32 flags is set for a bitstream, the 2001 bitstream would comply with all the profiles 2002 corresponding to the set flags. However, in a draft of 2003 HEVC version 2 in [HEVC draft v2], subclause A.3.5, 19 2004 Format Range Extensions profiles have been specified, 2005 all using the same value of profile-id (4), 2006 differentiated by some of the 48 bits in interop- 2007 constraints - this (rather unexpected way of profile 2008 signalling) means that one of the 32 flags may 2009 correspond to multiple profiles. 
To be able to support 2010 whatever HEVC extension profile that might be specified 2011 and indicated using profile-space, profile-id, profile- 2012 compatibility-indicator, and interop-constraints in the 2013 future, it would be safe to require symmetric use of 2014 these parameters in SDP offer/answer unless recv-sub- 2015 layer-id is included in the SDP answer for choosing one 2016 of the sub-layers offered. 2018 The tier is indicated by tier-flag. The default level is 2019 indicated by level-id. The tier and the default level 2020 specify the limits on values of syntax elements or 2021 arithmetic combinations of values of syntax elements that 2022 are followed when generating the bitstream or that the 2023 receiver supports. 2025 A set of profile-space, tier-flag, profile-id, profile- 2026 compatibility-indicator, interop-constraints, and level-id 2027 parameters ptlA is said to be consistent with another set 2028 of these parameters ptlB if any decoder that conforms to 2029 the profile, tier, level, and constraints indicated by ptlB 2030 can decode any bitstream that conforms to the profile, 2031 tier, level, and constraints indicated by ptlA. 2033 In SDP offer/answer, when the SDP answer does not include 2034 the recv-sub-layer-id parameter that is less than the 2035 sprop-sub-layer-id parameter in the SDP offer, the 2036 following applies: 2038 o The profile-space, tier-flag, profile-id, profile- 2039 compatibility-indicator, and interop-constraints 2040 parameters MUST be used symmetrically, i.e. the value 2041 of each of these parameters in the offer MUST be the 2042 same as that in the answer, either explicitly 2043 signalled or implicitly inferred. 2045 o The level-id parameter is changeable as long as the 2046 highest level indicated by the answer is either equal 2047 to or lower than that in the offer. Note that the 2048 highest level is indicated by level-id and max-recv- 2049 level-id together. 
2051 In SDP offer/answer, when the SDP answer does include the 2052 recv-sub-layer-id parameter that is less than the sprop- 2053 sub-layer-id parameter in the SDP offer, the set of 2054 profile-space, tier-flag, profile-id, profile- 2055 compatibility-indicator, interop-constraints, and level-id 2056 parameters included in the answer MUST be consistent with 2057 that for the chosen sub-layer representation as indicated 2058 in the SDP offer, with the exception that the level-id 2059 parameter in the SDP answer is changeable as long as the 2060 highest level indicated by the answer is either lower than 2061 or equal to that in the offer. 2063 More specifications of these parameters, including how they 2064 relate to the values of the profile, tier, and level syntax 2065 elements specified in [HEVC], are provided below. 2067 profile-space, profile-id: 2069 The value of profile-space MUST be in the range of 0 to 3, 2070 inclusive. The value of profile-id MUST be in the range of 2071 0 to 31, inclusive. 2073 When profile-space is not present, a value of 0 MUST be 2074 inferred. When profile-id is not present, a value of 1 2075 (i.e. the Main profile) MUST be inferred.
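The offer/answer rules above, for the case in which the answer does not select a lower sub-layer through recv-sub-layer-id, can be illustrated with a minimal Python sketch. The function names and the dictionary representation are assumptions made for illustration and are not part of the payload format. The sketch also assumes that the caller has already filled in any inferred (default) values, so that every parameter except max-recv-level-id is present.

```python
# Parameters that must carry identical values in offer and answer.
SYMMETRIC_PARAMS = (
    "profile-space",
    "tier-flag",
    "profile-id",
    "profile-compatibility-indicator",
    "interop-constraints",
)


def highest_level(params):
    """Highest level indicated by level-id and max-recv-level-id together."""
    level_id = params["level-id"]
    # When max-recv-level-id is absent it is inferred to equal level-id.
    return max(level_id, params.get("max-recv-level-id", level_id))


def answer_is_acceptable(offer, answer):
    """Check the symmetry rule and the level rule (hypothetical helper)."""
    symmetric = all(offer[name] == answer[name] for name in SYMMETRIC_PARAMS)
    return symmetric and highest_level(answer) <= highest_level(offer)
```

An answer that keeps the five symmetric parameters and lowers level-id passes this check, whereas an answer that raises the highest level or changes profile-id does not.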
2077 When used to indicate properties of a bitstream, profile- 2078 space and profile-id are derived from the profile, tier, 2079 and level syntax elements in SPS or VPS NAL units as 2080 follows, where general_profile_space, general_profile_idc, 2081 sub_layer_profile_space[j], and sub_layer_profile_idc[j] 2082 are specified in [HEVC]: 2084 If the RTP stream is the highest RTP stream, the 2085 following applies: 2087 o profile_space = general_profile_space 2088 o profile_id = general_profile_idc 2090 Otherwise (the RTP stream is a dependee RTP stream), the 2091 following applies, with j being the value of the sprop- 2092 sub-layer-id parameter: 2094 o profile_space = sub_layer_profile_space[j] 2095 o profile_id = sub_layer_profile_idc[j] 2097 tier-flag, level-id: 2099 The value of tier-flag MUST be in the range of 0 to 1, 2100 inclusive. The value of level-id MUST be in the range of 0 2101 to 255, inclusive. 2103 If the tier-flag and level-id parameters are used to 2104 indicate properties of a bitstream, they indicate the tier 2105 and the highest level the bitstream complies with. 2107 If the tier-flag and level-id parameters are used for 2108 capability exchange, the following applies. If max-recv- 2109 level-id is not present, the default level defined by 2110 level-id indicates the highest level the codec wishes to 2111 support. Otherwise, max-recv-level-id indicates the 2112 highest level the codec supports for receiving. For either 2113 receiving or sending, all levels that are lower than the 2114 highest level supported MUST also be supported. 2116 If no tier-flag is present, a value of 0 MUST be inferred 2117 and if no level-id is present, a value of 93 (i.e. level 2118 3.1) MUST be inferred. 
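The defaults for tier-flag and level-id, and the relation between a level-id value and the level number, can be sketched in Python as follows. The sketch assumes that level-id equals 30 times the level number, which is consistent with the pairing of 93 and level 3.1 above; the function names are hypothetical.

```python
def resolve_tier_level(params):
    """Return (tier_flag, level_id), applying the inferred defaults."""
    tier_flag = int(params.get("tier-flag", 0))   # default: 0
    level_id = int(params.get("level-id", 93))    # default: 93 (level 3.1)
    if tier_flag not in (0, 1):
        raise ValueError("tier-flag must be 0 or 1")
    if not 0 <= level_id <= 255:
        raise ValueError("level-id must be in the range 0..255")
    return tier_flag, level_id


def level_name(level_id):
    """Map level-id to a level string such as '3.1'.

    Returns None when the value is not a whole number of tenths of a
    level, i.e. when it cannot be written as major.minor.
    """
    major, remainder = divmod(level_id, 30)
    if remainder % 3:
        return None
    minor = remainder // 3
    return f"{major}.{minor}" if minor else str(major)
```

For instance, level_name(93) gives "3.1" and level_name(120) gives "4".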
2120 When used to indicate properties of a bitstream, the tier- 2121 flag and level-id parameters are derived from the profile, 2122 tier, and level syntax elements in SPS or VPS NAL units as 2123 follows, where general_tier_flag, general_level_idc, 2124 sub_layer_tier_flag[j], and sub_layer_level_idc[j] are 2125 specified in [HEVC]: 2127 If the RTP stream is the highest RTP stream, the 2128 following applies: 2130 o tier-flag = general_tier_flag 2131 o level-id = general_level_idc 2133 Otherwise (the RTP stream is a dependee RTP stream), the 2134 following applies, with j being the value of the sprop- 2135 sub-layer-id parameter: 2137 o tier-flag = sub_layer_tier_flag[j] 2138 o level-id = sub_layer_level_idc[j] 2140 interop-constraints: 2142 A base16 [RFC4648] (hexadecimal) representation of six 2143 bytes of data, consisting of progressive_source_flag, 2144 interlaced_source_flag, non_packed_constraint_flag, 2145 frame_only_constraint_flag, and reserved_zero_44bits. 2147 If the interop-constraints parameter is not present, the 2148 following MUST be inferred: 2150 o progressive_source_flag = 1 2151 o interlaced_source_flag = 0 2152 o non_packed_constraint_flag = 1 2153 o frame_only_constraint_flag = 1 2154 o reserved_zero_44bits = 0 2156 When the interop-constraints parameter is used to indicate 2157 properties of a bitstream, the following applies, where 2158 general_progressive_source_flag, 2159 general_interlaced_source_flag, 2160 general_non_packed_constraint_flag, 2162 general_frame_only_constraint_flag, 2163 general_reserved_zero_44bits, 2164 sub_layer_progressive_source_flag[j], 2165 sub_layer_interlaced_source_flag[j], 2166 sub_layer_non_packed_constraint_flag[j], 2167 sub_layer_frame_only_constraint_flag[j], and 2168 sub_layer_reserved_zero_44bits[j] are specified in [HEVC]: 2170 If the RTP stream is the highest RTP stream, the 2171 following applies: 2173 o progressive_source_flag = 2174
general_progressive_source_flag 2175 o interlaced_source_flag = 2176 general_interlaced_source_flag 2177 o non_packed_constraint_flag = 2178 general_non_packed_constraint_flag 2179 o frame_only_constraint_flag = 2180 general_frame_only_constraint_flag 2181 o reserved_zero_44bits = general_reserved_zero_44bits 2183 Otherwise (the RTP stream is a dependee RTP stream), the 2184 following applies, with j being the value of the sprop- 2185 sub-layer-id parameter: 2187 o progressive_source_flag = 2188 sub_layer_progressive_source_flag[j] 2189 o interlaced_source_flag = 2190 sub_layer_interlaced_source_flag[j] 2191 o non_packed_constraint_flag = 2193 sub_layer_non_packed_constraint_flag[j] 2194 o frame_only_constraint_flag = 2196 sub_layer_frame_only_constraint_flag[j] 2197 o reserved_zero_44bits = 2198 sub_layer_reserved_zero_44bits[j] 2200 Using interop-constraints for capability exchange results 2201 in a requirement on any bitstream to be compliant with the 2202 interop-constraints. 2204 profile-compatibility-indicator: 2206 A base16 [RFC4648] representation of four bytes of data. 2208 When profile-compatibility-indicator is used to indicate 2209 properties of a bitstream, the following applies, where 2210 general_profile_compatibility_flag[j] and 2211 sub_layer_profile_compatibility_flag[i][j] are specified in 2212 [HEVC]: 2214 The profile-compatibility-indicator in this case 2215 indicates additional profiles to the profile defined by 2216 profile_space, profile_id, and interop-constraints the 2217 bitstream conforms to. A decoder that conforms to any 2218 of all the profiles the bitstream conforms to would be 2219 capable of decoding the bitstream. These additional 2220 profiles are defined by profile-space, each set bit of 2221 profile-compatibility-indicator, and interop- 2222 constraints. 
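As an illustration of the six-byte interop-constraints value, the following Python sketch packs the four flags and the 44 reserved bits into a base16 string and unpacks them again. It assumes that the flags occupy the most significant bits in the order listed above, followed by reserved_zero_44bits; that bit order and the function names are assumptions of this sketch.

```python
FLAG_NAMES = (
    "progressive_source_flag",
    "interlaced_source_flag",
    "non_packed_constraint_flag",
    "frame_only_constraint_flag",
)


def pack_interop_constraints(progressive=1, interlaced=0, non_packed=1,
                             frame_only=1, reserved=0):
    """Pack the flags into 12 hex digits; the defaults are the inferred values."""
    flags = (progressive, interlaced, non_packed, frame_only)
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("each flag must be 0 or 1")
    if not 0 <= reserved < (1 << 44):
        raise ValueError("reserved_zero_44bits must fit in 44 bits")
    value = reserved
    for position, flag in enumerate(flags):
        value |= flag << (47 - position)   # first flag lands in bit 47 (MSB)
    return f"{value:012X}"


def unpack_interop_constraints(hex_string):
    """Inverse of pack_interop_constraints (same assumed bit order)."""
    if len(hex_string) != 12:
        raise ValueError("expected six bytes, i.e. 12 hex digits")
    value = int(hex_string, 16)
    fields = {name: (value >> (47 - i)) & 1 for i, name in enumerate(FLAG_NAMES)}
    fields["reserved_zero_44bits"] = value & ((1 << 44) - 1)
    return fields
```

Under these assumptions the inferred default values correspond to the string "B00000000000".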
2224 If the RTP stream is the highest RTP stream, the 2225 following applies for each value of j in the range of 0 2226 to 31, inclusive: 2228 o bit j of profile-compatibility-indicator = 2229 general_profile_compatibility_flag[j] 2231 Otherwise (the RTP stream is a dependee RTP stream), the 2232 following applies for i equal to sprop-sub-layer-id and 2233 for each value of j in the range of 0 to 31, inclusive: 2235 o bit j of profile-compatibility-indicator = 2236 sub_layer_profile_compatibility_flag[i][j] 2238 Using profile-compatibility-indicator for capability 2239 exchange results in a requirement on any bitstream to be 2240 compliant with the profile-compatibility-indicator. This 2241 is intended to handle cases where any future HEVC profile 2242 is defined as an intersection of two or more profiles. 2244 If this parameter is not present, this parameter defaults 2245 to the following: bit j, with j equal to profile-id, of 2246 profile-compatibility-indicator is inferred to be equal to 2247 1, and all other bits are inferred to be equal to 0. 2249 sprop-sub-layer-id: 2251 This parameter MAY be used to indicate the highest allowed 2252 value of TID in the bitstream. When not present, the value 2253 of sprop-sub-layer-id is inferred to be equal to 6. 2255 The value of sprop-sub-layer-id MUST be in the range of 0 2256 to 6, inclusive. 2258 recv-sub-layer-id: 2260 This parameter MAY be used to signal a receiver's choice of 2261 the offered or declared sub-layer representations in the 2262 sprop-vps. The value of recv-sub-layer-id indicates the 2263 TID of the highest sub-layer of the bitstream that a 2264 receiver supports. When not present, the value of recv- 2265 sub-layer-id is inferred to be equal to the value of the 2266 sprop-sub-layer-id parameter in the SDP offer. 2268 The value of recv-sub-layer-id MUST be in the range of 0 to 2269 6, inclusive. 2271 max-recv-level-id: 2273 This parameter MAY be used to indicate the highest level a 2274 receiver supports. 
The highest level the receiver supports 2275 is equal to the value of max-recv-level-id divided by 30. 2277 The value of max-recv-level-id MUST be in the range of 0 2278 to 255, inclusive. 2280 When max-recv-level-id is not present, the value is 2281 inferred to be equal to level-id. 2283 max-recv-level-id MUST NOT be present when the highest 2284 level the receiver supports is not higher than the default 2285 level. 2287 tx-mode: 2289 This parameter indicates whether the transmission mode is 2290 SRST, MRST, or MRMT. 2292 The value of tx-mode MUST be equal to "SRST", "MRST" or 2293 "MRMT". When not present, the value of tx-mode is inferred 2294 to be equal to "SRST". 2296 If the value is equal to "MRST", MRST MUST be in use. 2297 Otherwise, if the value is equal to "MRMT", MRMT MUST be in 2298 use. Otherwise (the value is equal to "SRST"), SRST MUST 2299 be in use. 2301 The value of tx-mode MUST be equal to "MRST" for all RTP 2302 streams in an MRST. 2304 The value of tx-mode MUST be equal to "MRMT" for all RTP 2305 streams in an MRMT. 2307 sprop-vps: 2309 This parameter MAY be used to convey any video parameter 2310 set NAL unit of the bitstream for out-of-band transmission 2311 of video parameter sets. The parameter MAY also be used 2312 for capability exchange and to indicate sub-stream 2313 characteristics (i.e. properties of sub-layer 2314 representations as defined in [HEVC]). The value of the 2315 parameter is a comma-separated (',') list of base64 2316 [RFC4648] representations of the video parameter set NAL 2317 units as specified in Section 7.3.2.1 of [HEVC]. 2319 The sprop-vps parameter MAY contain one or more than one 2320 video parameter set NAL unit. However, all other video 2321 parameter sets contained in the sprop-vps parameter MUST be 2322 consistent with the first video parameter set in the sprop- 2323 vps parameter. 
A video parameter set vpsB is said to be 2324 consistent with another video parameter set vpsA if any 2325 decoder that conforms to the profile, tier, level, and 2326 constraints indicated by the 12 bytes of data starting from 2327 the syntax element general_profile_space to the syntax 2328 element general_level_idc, inclusive, in the first 2329 profile_tier_level( ) syntax structure in vpsA can decode 2330 any bitstream that conforms to the profile, tier, level, 2331 and constraints indicated by the 12 bytes of data starting 2332 from the syntax element general_profile_space to the syntax 2333 element general_level_idc, inclusive, in the first 2334 profile_tier_level( ) syntax structure in vpsB. 2336 sprop-sps: 2338 This parameter MAY be used to convey sequence parameter set 2339 NAL units of the bitstream for out-of-band transmission of 2340 sequence parameter sets. The value of the parameter is a 2341 comma-separated (',') list of base64 [RFC4648] 2342 representations of the sequence parameter set NAL units as 2343 specified in Section 7.3.2.2 of [HEVC]. 2345 sprop-pps: 2347 This parameter MAY be used to convey picture parameter set 2348 NAL units of the bitstream for out-of-band transmission of 2349 picture parameter sets. The value of the parameter is a 2350 comma-separated (',') list of base64 [RFC4648] 2351 representations of the picture parameter set NAL units as 2352 specified in Section 7.3.2.3 of [HEVC]. 2354 sprop-sei: 2356 This parameter MAY be used to convey one or more SEI 2357 messages that describe bitstream characteristics. When 2358 present, a decoder can rely on the bitstream 2359 characteristics that are described in the SEI messages for 2360 the entire duration of the session, independently from the 2361 persistence scopes of the SEI messages as specified in 2362 [HEVC]. 2364 The value of the parameter is a comma-separated (',') list 2365 of base64 [RFC4648] representations of SEI NAL units as 2366 specified in Section 7.3.2.4 of [HEVC].
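A receiver might decode the comma-separated base64 lists carried in sprop-vps, sprop-sps, sprop-pps, and sprop-sei along the lines of the Python sketch below. The function names are hypothetical. The NAL unit type is read from the six bits that follow the forbidden zero bit in the two-byte HEVC NAL unit header, and the type numbers in the table are those of [HEVC].

```python
import base64

# NAL unit types relevant to the sprop-* parameters, per [HEVC].
NAL_TYPE_NAMES = {32: "VPS", 33: "SPS", 34: "PPS",
                  39: "PREFIX_SEI", 40: "SUFFIX_SEI"}


def decode_sprop(value):
    """Split a sprop-* value on commas and base64-decode each NAL unit."""
    return [base64.b64decode(item.strip(), validate=True)
            for item in value.split(",")]


def nal_unit_type(nal_unit):
    """Extract nal_unit_type from the first byte of the NAL unit header."""
    if len(nal_unit) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    return (nal_unit[0] >> 1) & 0x3F


def check_sprop(value, expected_name):
    """Decode a sprop-* value and verify every NAL unit has the expected type."""
    nal_units = decode_sprop(value)
    for nal in nal_units:
        name = NAL_TYPE_NAMES.get(nal_unit_type(nal))
        if name != expected_name:
            raise ValueError(f"expected {expected_name}, found {name}")
    return nal_units
```

A caller could, for example, pass the sprop-sps value with the expected name "SPS" and reject the session description when a NAL unit of another type is found.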
2368 Informative note: Intentionally, no list of applicable 2369 or inapplicable SEI messages is specified here. 2370 Conveying certain SEI messages in sprop-sei may be 2371 sensible in some application scenarios and meaningless 2372 in others. However, a few examples are described below: 2374 1) In an environment where the bitstream was created 2375 from film-based source material, and no splicing is 2376 going to occur during the lifetime of the session, 2377 the film grain characteristics SEI message or the 2378 tone mapping information SEI message are likely 2379 meaningful, and sending them in sprop-sei rather than 2380 in the bitstream at each entry point may help save 2381 bits and allows the renderer to be configured only once, 2382 avoiding unwanted artifacts. 2383 2) The structure of pictures information SEI message in 2384 sprop-sei can be used to inform a decoder of 2385 information on the NAL unit types, picture order 2386 count values, and prediction dependencies of a 2387 sequence of pictures. Having such knowledge can be 2388 helpful for error recovery. 2389 3) Examples of SEI messages that would be meaningless 2390 to convey in sprop-sei include the decoded 2391 picture hash SEI message (it is close to impossible 2392 that all decoded pictures have the same hash value), 2393 the display orientation SEI message when the device 2394 is a handheld device (as the display orientation may 2395 change when the handheld device is turned around), or 2396 the filler payload SEI message (as there is no point 2397 in just having more bits in SDP). 2399 max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc: 2401 These parameters MAY be used to signal the capabilities of 2402 a receiver implementation. These parameters MUST NOT be 2403 used for any other purpose. The highest level (specified 2404 by max-recv-level-id) MUST be a level that the receiver is 2405 fully capable of supporting.
max-lsr, max-lps, max-cpb, 2406 max-dpb, max-br, max-tr, and max-tc MAY be used to indicate 2407 capabilities of the receiver that extend the required 2408 capabilities of the highest level, as specified below. 2410 When more than one parameter from the set (max-lsr, max- 2411 lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is present, 2412 the receiver MUST support all signaled capabilities 2413 simultaneously. For example, if both max-lsr and max-br 2414 are present, the highest level with the extension of both 2415 the picture rate and bitrate is supported. That is, the 2416 receiver is able to decode bitstreams in which the luma 2417 sample rate is up to max-lsr (inclusive), the bitrate is up 2418 to max-br (inclusive), the coded picture buffer size is 2419 derived as specified in the semantics of the max-br 2420 parameter below, and the other properties comply with the 2421 highest level specified by max-recv-level-id. 2423 Informative note: When the OPTIONAL media type 2424 parameters are used to signal the properties of a 2425 bitstream, and max-lsr, max-lps, max-cpb, max-dpb, max- 2426 br, max-tr, and max-tc are not present, the values of 2427 profile-space, tier-flag, profile-id, profile- 2428 compatibility-indicator, interop-constraints, and level- 2429 id must always be such that the bitstream complies fully 2430 with the specified profile, tier, and level. 2432 max-lsr: 2433 The value of max-lsr is an integer indicating the maximum 2434 processing rate in units of luma samples per second. The 2435 max-lsr parameter signals that the receiver is capable of 2436 decoding video at a higher rate than is required by the 2437 highest level. 2439 When max-lsr is signaled, the receiver MUST be able to 2440 decode bitstreams that conform to the highest level, with 2441 the exception that the MaxLumaSR value in Table A-2 of 2442 [HEVC] for the highest level is replaced with the value of 2443 max-lsr. 
Senders MAY use this knowledge to send pictures 2444 of a given size at a higher picture rate than is indicated 2445 in the highest level. 2447 When not present, the value of max-lsr is inferred to be 2448 equal to the value of MaxLumaSR given in Table A-2 of 2449 [HEVC] for the highest level. 2451 The value of max-lsr MUST be in the range of MaxLumaSR to 2452 16 * MaxLumaSR, inclusive, where MaxLumaSR is given in 2453 Table A-2 of [HEVC] for the highest level. 2455 max-lps: 2456 The value of max-lps is an integer indicating the maximum 2457 picture size in units of luma samples. The max-lps 2458 parameter signals that the receiver is capable of decoding 2459 larger picture sizes than are required by the highest 2460 level. When max-lps is signaled, the receiver MUST be able 2461 to decode bitstreams that conform to the highest level, 2462 with the exception that the MaxLumaPS value in Table A-1 of 2463 [HEVC] for the highest level is replaced with the value of 2464 max-lps. Senders MAY use this knowledge to send larger 2465 pictures at a proportionally lower picture rate than is 2466 indicated in the highest level. 2468 When not present, the value of max-lps is inferred to be 2469 equal to the value of MaxLumaPS given in Table A-1 of 2470 [HEVC] for the highest level. 2472 The value of max-lps MUST be in the range of MaxLumaPS to 2473 16 * MaxLumaPS, inclusive, where MaxLumaPS is given in 2474 Table A-1 of [HEVC] for the highest level. 2476 max-cpb: 2477 The value of max-cpb is an integer indicating the maximum 2478 coded picture buffer size in units of CpbBrVclFactor bits 2479 for the VCL HRD parameters and in units of CpbBrNalFactor 2480 bits for the NAL HRD parameters, where CpbBrVclFactor and 2481 CpbBrNalFactor are defined in Section A.4 of [HEVC]. The 2482 max-cpb parameter signals that the receiver has more memory 2483 than the minimum amount of coded picture buffer memory 2484 required by the highest level. 
When max-cpb is signaled, 2485 the receiver MUST be able to decode bitstreams that conform 2486 to the highest level, with the exception that the MaxCPB 2487 value in Table A-1 of [HEVC] for the highest level is 2488 replaced with the value of max-cpb. Senders MAY use this 2489 knowledge to construct coded bitstreams with greater 2490 variation of bitrate than can be achieved with the MaxCPB 2491 value in Table A-1 of [HEVC]. 2493 When not present, the value of max-cpb is inferred to be 2494 equal to the value of MaxCPB given in Table A-1 of [HEVC] 2495 for the highest level. 2497 The value of max-cpb MUST be in the range of MaxCPB to 2498 16 * MaxCPB, inclusive, where MaxCPB is given in Table 2499 A-1 of [HEVC] for the highest level. 2501 Informative note: The coded picture buffer is used in 2502 the hypothetical reference decoder (Annex C of HEVC). 2503 The use of the hypothetical reference decoder is 2504 recommended in HEVC encoders to verify that the produced 2505 bitstream conforms to the standard and to control the 2506 output bitrate. Thus, the coded picture buffer is 2507 conceptually independent of any other potential buffers 2508 in the receiver, including de-packetization and de- 2509 jitter buffers. The coded picture buffer need not be 2510 implemented in decoders as specified in Annex C of HEVC, 2511 but rather standard-compliant decoders can have any 2512 buffering arrangements provided that they can decode 2513 standard-compliant bitstreams. Thus, in practice, the 2514 input buffer for a video decoder can be integrated with 2515 de-packetization and de-jitter buffers of the receiver. 2517 max-dpb: 2518 The value of max-dpb is an integer indicating the maximum 2519 decoded picture buffer size in units of decoded pictures at 2520 the MaxLumaPS for the highest level, i.e. the number of 2521 decoded pictures at the maximum picture size defined by the 2522 highest level. The value of max-dpb MUST be in the range 2523 of 1 to 16, inclusive.
The max-dpb parameter signals 2524 that the receiver has more memory than the minimum amount 2525 of decoded picture buffer memory required by default, which 2526 is MaxDpbPicBuf as defined in [HEVC] (equal to 6). When 2527 max-dpb is signaled, the receiver MUST be able to decode 2528 bitstreams that conform to the highest level, with the 2529 exception that the MaxDpbPicBuf value defined in [HEVC] as 2530 6 is replaced with the value of max-dpb. Consequently, a 2531 receiver that signals max-dpb MUST be capable of storing 2532 the following number of decoded pictures (MaxDpbSize) in 2533 its decoded picture buffer: 2535 if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) ) 2536 MaxDpbSize = Min( 4 * max-dpb, 16 ) 2537 else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) ) 2538 MaxDpbSize = Min( 2 * max-dpb, 16 ) 2539 else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 2540 ) ) 2541 MaxDpbSize = Min( (4 * max-dpb) / 3, 16 ) 2542 else 2543 MaxDpbSize = max-dpb 2545 where MaxLumaPS is given in Table A-1 of [HEVC] for the 2546 highest level and PicSizeInSamplesY is the current size of 2547 each decoded picture in units of luma samples as defined in 2548 [HEVC]. 2550 The value of max-dpb MUST be greater than or equal to the 2551 value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. 2552 Senders MAY use this knowledge to construct coded 2553 bitstreams with improved compression. 2555 When not present, the value of max-dpb is inferred to be 2556 equal to the value of MaxDpbPicBuf (i.e. 6) as defined in 2557 [HEVC]. 2559 Informative note: This parameter was added primarily to 2560 complement a similar codepoint in the ITU-T 2561 Recommendation H.245, so as to facilitate signaling 2562 gateway designs. The decoded picture buffer stores 2563 reconstructed samples. There is no relationship between 2564 the size of the decoded picture buffer and the buffers 2565 used in RTP, especially de-packetization and de-jitter 2566 buffers.
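The MaxDpbSize derivation given for max-dpb can be transcribed into Python as a minimal sketch. It assumes that the division in (4 * max-dpb) / 3 is an integer division with truncation, as is usual for the arithmetic in [HEVC]; the function name is hypothetical, and the MaxLumaPS value has to be supplied by the caller from Table A-1 of [HEVC].

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures the receiver must be able to store.

    max_dpb defaults to 6, the value inferred when max-dpb is absent.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb) // 3, 16)   # assumed integer division
    return max_dpb
```

The smaller the picture is relative to MaxLumaPS, the more pictures fit into the same buffer memory, up to the cap of 16.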
2568 max-br: 2569 The value of max-br is an integer indicating the maximum 2570 video bitrate in units of CpbBrVclFactor bits per second 2571 for the VCL HRD parameters and in units of CpbBrNalFactor 2572 bits per second for the NAL HRD parameters, where 2573 CpbBrVclFactor and CpbBrNalFactor are defined in Section 2574 A.4 of [HEVC]. 2576 The max-br parameter signals that the video decoder of the 2577 receiver is capable of decoding video at a higher bitrate 2578 than is required by the highest level. 2580 When max-br is signaled, the video codec of the receiver 2581 MUST be able to decode bitstreams that conform to the 2582 highest level, with the following exceptions in the limits 2583 specified by the highest level: 2585 o The value of max-br replaces the MaxBR value in Table A- 2586 2 of [HEVC] for the highest level. 2587 o When the max-cpb parameter is not present, the result of 2588 the following formula replaces the value of MaxCPB in 2589 Table A-1 of [HEVC]: 2591 (MaxCPB of the highest level) * max-br / (MaxBR of 2592 the highest level) 2594 For example, if a receiver signals capability for Main 2595 profile Level 2 with max-br equal to 2000, this indicates a 2596 maximum video bitrate of 2000 kbits/sec for VCL HRD 2597 parameters, a maximum video bitrate of 2200 kbits/sec for 2598 NAL HRD parameters, and a CPB size of 2000000 bits (2000000 2599 / 1500000 * 1500000). 2601 Senders MAY use this knowledge to send higher bitrate video 2602 as allowed in the level definition of Annex A of HEVC to 2603 achieve improved video quality. 2605 When not present, the value of max-br is inferred to be 2606 equal to the value of MaxBR given in Table A-2 of [HEVC] 2607 for the highest level. 2609 The value of max-br MUST be in the range of MaxBR to 2610 16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of 2611 [HEVC] for the highest level. 
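The worked example for max-br can be reproduced with a short Python sketch. The level limits (MaxBR and MaxCPB of 1500) and the factors (CpbBrVclFactor of 1000, CpbBrNalFactor of 1100) are taken from the Main profile Level 2 example above and have to be replaced with the values from [HEVC] for other profiles and levels. The function name is hypothetical, and the use of integer division in the CPB formula is an assumption of this sketch.

```python
def extended_bitrate_limits(max_br, level_max_br, level_max_cpb,
                            max_cpb=None, vcl_factor=1000, nal_factor=1100):
    """Derive bitrate and CPB limits when max-br is signaled."""
    if not level_max_br <= max_br <= 16 * level_max_br:
        raise ValueError("max-br must be in the range MaxBR..16*MaxBR")
    if max_cpb is None:
        # max-cpb absent: scale the level's MaxCPB by max-br / MaxBR.
        max_cpb = level_max_cpb * max_br // level_max_br
    return {
        "vcl_bitrate_bps": max_br * vcl_factor,
        "nal_bitrate_bps": max_br * nal_factor,
        "vcl_cpb_bits": max_cpb * vcl_factor,
        "nal_cpb_bits": max_cpb * nal_factor,
    }
```

With max_br=2000 and level limits of 1500 this yields 2000000 bits/sec and a CPB of 2000000 bits for the VCL HRD parameters, and 2200000 bits/sec for the NAL HRD parameters, which matches the example.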
2613 Informative note: This parameter was added primarily to 2614 complement a similar codepoint in the ITU-T 2615 Recommendation H.245, so as to facilitate signaling 2616 gateway designs. The assumption that the network is 2617 capable of handling such bitrates at any given time 2618 cannot be made from the value of this parameter. In 2619 particular, no conclusion can be drawn that the signaled 2620 bitrate is possible under congestion control 2621 constraints. 2623 max-tr: 2624 The value of max-tr is an integer indicating the maximum 2625 number of tile rows. The max-tr parameter signals that the 2626 receiver is capable of decoding video with a larger number 2627 of tile rows than the value allowed by the highest level. 2629 When max-tr is signaled, the receiver MUST be able to 2630 decode bitstreams that conform to the highest level, with 2631 the exception that the MaxTileRows value in Table A-1 of 2632 [HEVC] for the highest level is replaced with the value of 2633 max-tr. 2635 Senders MAY use this knowledge to send pictures utilizing a 2636 larger number of tile rows than the value allowed by the 2637 highest level. 2639 When not present, the value of max-tr is inferred to be 2640 equal to the value of MaxTileRows given in Table A-1 of 2641 [HEVC] for the highest level. 2643 The value of max-tr MUST be in the range of MaxTileRows to 2644 16 * MaxTileRows, inclusive, where MaxTileRows is given in 2645 Table A-1 of [HEVC] for the highest level. 2647 max-tc: 2648 The value of max-tc is an integer indicating the maximum 2649 number of tile columns. The max-tc parameter signals that 2650 the receiver is capable of decoding video with a larger 2651 number of tile columns than the value allowed by the 2652 highest level.
2654 When max-tc is signaled, the receiver MUST be able to 2655 decode bitstreams that conform to the highest level, with 2656 the exception that the MaxTileCols value in Table A-1 of 2657 [HEVC] for the highest level is replaced with the value of 2658 max-tc. 2660 Senders MAY use this knowledge to send pictures utilizing a 2661 larger number of tile columns than the value allowed by the 2662 highest level. 2664 When not present, the value of max-tc is inferred to be 2665 equal to the value of MaxTileCols given in Table A-1 of 2666 [HEVC] for the highest level. 2668 The value of max-tc MUST be in the range of MaxTileCols to 2669 16 * MaxTileCols, inclusive, where MaxTileCols is given in 2670 Table A-1 of [HEVC] for the highest level. 2672 max-fps: 2674 The value of max-fps is an integer indicating the maximum 2675 picture rate in units of pictures per 100 seconds that can 2676 be effectively processed by the receiver. The max-fps 2677 parameter MAY be used to signal that the receiver has a 2678 constraint in that it is not capable of processing video 2679 effectively at the full picture rate that is implied by the 2680 highest level and, when present, one or more of the 2681 parameters max-lsr, max-lps, and max-br. 2683 The value of max-fps is not necessarily the picture rate at 2684 which the maximum picture size can be sent; it constitutes 2685 a constraint on the maximum picture rate for all resolutions. 2687 Informative note: The max-fps parameter is semantically 2688 different from max-lsr, max-lps, max-cpb, max-dpb, max- 2689 br, max-tr, and max-tc in that max-fps is used to signal 2690 a constraint, lowering the maximum picture rate from 2691 what is implied by other parameters. 2693 The encoder MUST use a picture rate equal to or less than 2694 this value. In cases where the max-fps parameter is absent, 2695 the encoder is free to choose any picture rate according to 2696 the highest level and any signaled optional parameters.
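How max-fps caps the picture rate an encoder may choose can be sketched in Python as follows. As a simplification, the sketch assumes that the picture rate implied by the level is bounded only by the luma sample rate (max-lsr, or MaxLumaSR when max-lsr is absent) divided by the picture size, and it ignores the other level limits. The function name is hypothetical.

```python
def max_picture_rate(pic_size_in_samples_y, max_luma_sample_rate, max_fps=None):
    """Upper bound on the picture rate, in pictures per second."""
    # Bound implied by the luma sample rate for this picture size.
    rate = max_luma_sample_rate / pic_size_in_samples_y
    if max_fps is not None:
        # max-fps is expressed in pictures per 100 seconds and applies
        # to all resolutions, so it can only lower the bound.
        rate = min(rate, max_fps / 100)
    return rate
```

For example, max-fps equal to 2500 limits the encoder to 25 pictures per second even when the luma sample rate alone would permit more.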
2698 The value of max-fps MUST be smaller than or equal to the 2699 full picture rate that is implied by the highest level and, 2700 when present, one or more of the parameters max-lsr, max- 2701 lps, and max-br. 2703 sprop-max-don-diff: 2705 If tx-mode is equal to "SRST" and there is no NAL unit 2706 naluA that is followed in transmission order by any NAL 2707 unit preceding naluA in decoding order (i.e. the 2708 transmission order of the NAL units is the same as the 2709 decoding order), the value of this parameter MUST be equal 2710 to 0. 2712 Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the 2713 decoding order of the NAL units of all the RTP streams is 2714 the same as the NAL unit transmission order and the NAL 2715 unit output order, the value of this parameter MUST be 2716 equal to either 0 or 1. 2718 Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the 2719 decoding order of the NAL units of all the RTP streams is 2720 the same as the NAL unit transmission order but not the 2721 same as the NAL unit output order, the value of this 2722 parameter MUST be equal to 1. 2724 Otherwise, this parameter specifies the maximum absolute 2725 difference between the decoding order number (i.e., AbsDon) 2726 values of any two NAL units naluA and naluB, where naluA 2727 follows naluB in decoding order and precedes naluB in 2728 transmission order. 2730 The value of sprop-max-don-diff MUST be an integer in the 2731 range of 0 to 32767, inclusive. 2733 When not present, the value of sprop-max-don-diff is 2734 inferred to be equal to 0. 2736 sprop-depack-buf-nalus: 2738 This parameter specifies the maximum number of NAL units 2739 that precede a NAL unit in transmission order and follow 2740 the NAL unit in decoding order. 2742 The value of sprop-depack-buf-nalus MUST be an integer in 2743 the range of 0 to 32767, inclusive. 2745 When not present, the value of sprop-depack-buf-nalus is 2746 inferred to be equal to 0.
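How a receiver might apply sprop-max-don-diff and sprop-depack-buf-nalus in the de-packetization process of Section 6 can be sketched in Python for a single RTP stream. The sketch assumes that AbsDon has already been computed for each NAL unit; the function name and the (AbsDon, NAL unit) tuple representation are hypothetical. Because conditions A and B are the same during and after initial buffering, one loop covers both phases.

```python
import heapq


def depacketize(received, max_don_diff, depack_buf_nalus):
    """Yield NAL units in decoding order.

    received: iterable of (abs_don, nal_unit) pairs in reception order.
    """
    heap = []        # min-heap keyed on AbsDon; seq breaks ties
    greatest = None  # greatest AbsDon currently in the buffer
    for seq, (abs_don, nal_unit) in enumerate(received):
        heapq.heappush(heap, (abs_don, seq, nal_unit))
        greatest = abs_don if greatest is None else max(greatest, abs_don)
        # Condition A: AbsDon spread >= sprop-max-don-diff.
        # Condition B: buffered NAL units > sprop-depack-buf-nalus.
        while heap and (greatest - heap[0][0] >= max_don_diff
                        or len(heap) > depack_buf_nalus):
            yield heapq.heappop(heap)[2]   # smallest AbsDon first
        if not heap:
            greatest = None
    # No more NAL units arrive: flush in order of increasing AbsDon.
    while heap:
        yield heapq.heappop(heap)[2]
```

A min-heap is used so that the NAL unit with the smallest AbsDon is always found cheaply; the greatest AbsDon only needs to be tracked separately because the unit that carries it is the last one to leave the buffer.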

2748           When sprop-max-don-diff is present and greater than 0, this
2749           parameter MUST be present and the value MUST be greater
2750           than 0.

2752        sprop-depack-buf-bytes:

2754           This parameter signals the required size of the de-
2755           packetization buffer in units of bytes. The value of the
2756           parameter MUST be greater than or equal to the maximum
2757           buffer occupancy (in units of bytes) of the de-
2758           packetization buffer as specified in Section 6.

2760           The value of sprop-depack-buf-bytes MUST be an integer in
2761           the range of 0 to 4294967295, inclusive.

2763           When sprop-max-don-diff is present and greater than 0, this
2764           parameter MUST be present and the value MUST be greater
2765           than 0. When not present, the value of sprop-depack-buf-
2766           bytes is inferred to be equal to 0.

2768              Informative note: The value of sprop-depack-buf-bytes
2769              indicates the required size of the de-packetization
2770              buffer only. When network jitter can occur, an
2771              appropriately sized jitter buffer has to be available as
2772              well.

2774        depack-buf-cap:

2776           This parameter signals the capabilities of a receiver
2777           implementation and indicates the amount of de-packetization
2778           buffer space in units of bytes that the receiver has
2779           available for reconstructing the NAL unit decoding order
2780           from NAL units carried in one or more RTP streams. A
2781           receiver is able to handle any RTP stream, and all RTP
2782           streams the RTP stream depends on, when present, for which
2783           the value of the sprop-depack-buf-bytes parameter is
2784           smaller than or equal to this parameter.

2786           When not present, the value of depack-buf-cap is inferred
2787           to be equal to 4294967295. The value of depack-buf-cap
2788           MUST be an integer in the range of 1 to 4294967295,
2789           inclusive.

2791              Informative note: depack-buf-cap indicates the maximum
2792              possible size of the de-packetization buffer of the
2793              receiver only. When network jitter can occur, an
2794              appropriately sized jitter buffer has to be available as
2795              well.

2797        sprop-segmentation-id:

2799           This parameter MAY be used to signal the segmentation tools
2800           present in the bitstream and that can be used for
2801           parallelization. The value of sprop-segmentation-id MUST
2802           be an integer in the range of 0 to 3, inclusive. When not
2803           present, the value of sprop-segmentation-id is inferred to
2804           be equal to 0.

2806           When sprop-segmentation-id is equal to 0, no information
2807           about the segmentation tools is provided. When sprop-
2808           segmentation-id is equal to 1, it indicates that slices are
2809           present in the bitstream. When sprop-segmentation-id is
2810           equal to 2, it indicates that tiles are present in the
2811           bitstream. When sprop-segmentation-id is equal to 3, it
2812           indicates that WPP is used in the bitstream.

2814        sprop-spatial-segmentation-idc:

2816           A base16 [RFC4648] representation of the syntax element
2817           min_spatial_segmentation_idc as specified in [HEVC]. This
2818           parameter MAY be used to describe parallelization
2819           capabilities of the bitstream.

2821        dec-parallel-cap:

2823           This parameter MAY be used to indicate the decoder's
2824           additional decoding capabilities given the presence of
2825           tools enabling parallel decoding, such as slices, tiles,
2826           and WPP, in the bitstream. The decoding capability of the
2827           decoder may vary with the setting of the parallel decoding
2828           tools present in the bitstream, e.g. the size of the tiles
2829           that are present in a bitstream. Therefore, multiple
2830           capability points may be provided, each indicating the
2831           minimum required decoding capability that is associated
2832           with a parallelism requirement, which is a requirement on
2833           the bitstream that enables parallel decoding.

2835           Each capability point is defined as a combination of 1) a
2836           parallelism requirement, 2) a profile (determined by
2837           profile-space and profile-id), 3) a highest level, and 4) a
2838           maximum processing rate, a maximum picture size, and a
2839           maximum video bitrate that may be equal to or greater than
2840           that determined by the highest level. The parameter's
2841           syntax in ABNF [RFC5234] is as follows:

2843           dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
2844                              cap-point) "}"

2846           cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
2847                       cap-parameter)

2849           spatial-seg-idc = 1*4DIGIT ; (1-4095)

2851           cap-parameter = tier-flag / level-id / max-lsr
2852                           / max-lps / max-br

2854           tier-flag = "tier-flag" EQ ("0" / "1")

2856           level-id = "level-id" EQ 1*3DIGIT ; (0-255)

2858           max-lsr = "max-lsr" EQ 1*20DIGIT
2859                     ; (0-18,446,744,073,709,551,615)

2861           max-lps = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)
2862           max-br = "max-br" EQ 1*20DIGIT
2863                    ; (0-18,446,744,073,709,551,615)

2865           EQ = "="

2867           The set of capability points expressed by the dec-parallel-
2868           cap parameter is enclosed in a pair of curly braces ("{}").
2869           Each pair of consecutive capability points is separated
2870           by a comma (','). Within each capability point, each pair
2871           of consecutive parameters, and when present, their
2872           values, is separated by a semicolon (';').

2874           The profile of all capability points is determined by
2875           profile-space and profile-id that are outside the dec-
2876           parallel-cap parameter.

2878           Each capability point starts with an indication of the
2879           parallelism requirement, which consists of a parallel tool
2880           type, which may be equal to 'w' or 't', and a decimal value
2881           of the spatial-seg-idc parameter. When the type is 'w',
2882           the capability point is valid only for H.265 bitstreams
2883           with WPP in use (i.e., entropy_coding_sync_enabled_flag
2884           equal to 1). When the type is 't', the capability point is
2885           valid only for H.265 bitstreams with WPP not in use (i.e.,
2886           entropy_coding_sync_enabled_flag equal to 0). The
2887           capability point is valid only for H.265 bitstreams with
2888           min_spatial_segmentation_idc equal to or greater than
2889           spatial-seg-idc.

2891           After the parallelism requirement indication, each
2892           capability point continues with one or more pairs of
2893           parameter and value in any order for any of the following
2894           parameters:

2896              o tier-flag
2897              o level-id
2898              o max-lsr
2899              o max-lps
2900              o max-br

2902           At most one occurrence of each of the above five parameters
2903           is allowed within each capability point.

2905           The values of dec-parallel-cap.tier-flag and dec-parallel-
2906           cap.level-id for a capability point indicate the highest
2907           level of the capability point. The values of dec-parallel-
2908           cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel-
2909           cap.max-br for a capability point indicate the maximum
2910           processing rate in units of luma samples per second, the
2911           maximum picture size in units of luma samples, and the
2912           maximum video bitrate (in units of CpbBrVclFactor bits per
2913           second for the VCL HRD parameters and in units of
2914           CpbBrNalFactor bits per second for the NAL HRD parameters
2915           where CpbBrVclFactor and CpbBrNalFactor are defined in
2916           Section A.4 of [HEVC]), respectively.

2918           When not present, the value of dec-parallel-cap.tier-flag
2919           is inferred to be equal to the value of tier-flag outside
2920           the dec-parallel-cap parameter. When not present, the
2921           value of dec-parallel-cap.level-id is inferred to be equal
2922           to the value of max-recv-level-id outside the dec-parallel-
2923           cap parameter. When not present, the value of dec-
2924           parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec-
2925           parallel-cap.max-br is inferred to be equal to the value of
2926           max-lsr, max-lps, or max-br, respectively, outside the dec-
2927           parallel-cap parameter.
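
           Informative example: The inference rules above can be
           sketched in Python as follows.  The function name and the
           dictionary layout are hypothetical.  A value that is absent
           both inside and outside dec-parallel-cap is returned as
           None here, because the defaults of the outer parameters are
           specified elsewhere in this memo and are not repeated in
           this sketch.

```python
# Which outer parameter supplies the default for each field of a
# capability point, following the inference rules stated above.
OUTER_SOURCE = {
    "tier-flag": "tier-flag",
    "level-id": "max-recv-level-id",
    "max-lsr": "max-lsr",
    "max-lps": "max-lps",
    "max-br": "max-br",
}


def resolve_cap_point(cap_params, outer_params):
    """Fill in fields that a capability point leaves out.

    cap_params holds the values given inside one capability point and
    outer_params holds the values signalled outside dec-parallel-cap
    (both are hypothetical dicts keyed by parameter name).
    """
    resolved = {}
    for name, source in OUTER_SOURCE.items():
        if name in cap_params:
            resolved[name] = cap_params[name]
        else:
            # None marks a value that this sketch cannot infer.
            resolved[name] = outer_params.get(source)
    return resolved
```

           For a capability point that carries only level-id=120, the
           sketch takes tier-flag, max-lsr, max-lps, and max-br from
           the outer parameters and keeps level-id equal to 120.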

2929           The general decoding capability, expressed by the set of
2930           parameters outside of dec-parallel-cap, is defined as the
2931           capability point that is determined by the following
2932           combination of parameters: 1) the parallelism requirement
2933           corresponding to the value of sprop-segmentation-id equal
2934           to 0 for a bitstream, 2) the profile determined by profile-
2935           space, profile-id, profile-compatibility-indicator, and
2936           interop-constraints, 3) the tier and the highest level
2937           determined by tier-flag and max-recv-level-id, and 4) the
2938           maximum processing rate, the maximum picture size, and the
2939           maximum video bitrate determined by the highest level. The
2940           general decoding capability MUST NOT be included as one of
2941           the set of capability points in the dec-parallel-cap
2942           parameter.

2944           For example, the following parameters express the general
2945           decoding capability of 720p30 (Level 3.1) plus an
2946           additional decoding capability of 1080p30 (Level 4) given
2947           that the spatially largest tile or slice used in the
2948           bitstream is equal to or less than 1/3 of the picture size:

2950              a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-
2951              id=120}

2953           For another example, the following parameters express an
2954           additional decoding capability of 1080p30, using dec-
2955           parallel-cap.max-lsr and dec-parallel-cap.max-lps, given
2956           that WPP is used in the bitstream:

2958              a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2959                        max-lsr=62668800;max-lps=2088960}

2961              Informative note: When min_spatial_segmentation_idc is
2962              present in a bitstream and WPP is not used, [HEVC]
2963              specifies that there is no slice or no tile in the
2964              bitstream containing more than 4 * PicSizeInSamplesY /
2965              ( min_spatial_segmentation_idc + 4 ) luma samples.
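
           Informative example: The following Python sketch parses the
           value of dec-parallel-cap according to the ABNF above and
           evaluates the bound from the informative note.  The function
           names are hypothetical.  Removing all whitespace before
           parsing is an assumption made so that examples folded across
           lines, as in this memo, can be pasted in directly.

```python
import re

CAP_PARAMETER_NAMES = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}


def parse_dec_parallel_cap(value):
    """Parse the text that follows 'dec-parallel-cap=' into a list of dicts."""
    value = "".join(value.split())  # assumption: whitespace is only folding
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in braces")
    points = []
    for text in value[1:-1].split(","):
        head, *params = text.split(";")
        match = re.fullmatch(r"([wt]):([0-9]{1,4})", head)
        if match is None or not params:
            raise ValueError(f"malformed capability point: {text!r}")
        point = {"type": match.group(1), "spatial-seg-idc": int(match.group(2))}
        if not 1 <= point["spatial-seg-idc"] <= 4095:
            raise ValueError("spatial-seg-idc out of range")
        for item in params:
            name, _, number = item.partition("=")
            if name not in CAP_PARAMETER_NAMES or name in point:
                raise ValueError(f"unknown or repeated parameter: {name!r}")
            if re.fullmatch(r"[0-9]+", number) is None:
                raise ValueError(f"value of {name!r} is not a decimal number")
            point[name] = int(number)
        points.append(point)
    return points


def max_segment_luma_samples(pic_size_in_samples_y, spatial_seg_idc):
    """Largest whole number of luma samples allowed in one slice or tile."""
    return 4 * pic_size_in_samples_y // (spatial_seg_idc + 4)
```

           With spatial-seg-idc equal to 8 and a 1920x1080 picture,
           max_segment_luma_samples(1920 * 1080, 8) gives 691200, which
           is one third of the picture, in line with the first example
           above.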

2967        include-dph:

2969           This parameter is used to indicate the capability and
2970           preference to utilize or include decoded picture hash (DPH)
2971           SEI messages (see Section D.3.19 of [HEVC]) in the
2972           bitstream. DPH SEI messages can be used to detect picture
2973           corruption so the receiver can request picture repair (see
2974           Section 8). The value is a comma-separated list of hash
2975           types that are supported or requested to be used, each hash
2976           type provided as an unsigned integer value (0-255), with
2977           the hash types listed from most preferred to the least
2978           preferred. Example: "include-dph=0,2", which indicates the
2979           capability for MD5 (most preferred) and Checksum (less
2980           preferred). If the parameter is not included or the value
2981           contains no hash types, then no capability to utilize DPH
2982           SEI messages is assumed. Note that DPH SEI messages MAY
2983           still be included in the bitstream even when there is no
2984           declaration of capability to use them, as in general SEI
2985           messages do not affect the normative decoding process and
2986           decoders are allowed to ignore SEI messages.

2988     Encoding considerations:

2990        This type is only defined for transfer via RTP (RFC 3550).

2992     Security considerations:

2994        See Section 9 of RFC XXXX.

2996     Public specification:

2998        Please refer to Section 13 of RFC XXXX.

3000     Additional information: None

3002     File extensions: none

3004     Macintosh file type code: none

3006     Object identifier or OID: none

3008     Person & email address to contact for further information:

3010        Ye-Kui Wang (yekuiw@qti.qualcomm.com).

3012     Intended usage: COMMON

3014     Author: See Section 14 of RFC XXXX.

3016     Change controller:

3018        IETF Audio/Video Transport Payloads working group delegated
3019        from the IESG.

3021  7.2 SDP Parameters

3023     The receiver MUST ignore any parameter unspecified in this memo.
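
     Informative example: A receiver could apply the rule above while
     reading the "a=fmtp" parameters, as in the following Python
     sketch.  The function names are hypothetical, and the set of
     names is copied from the parameters listed in Section 7.2.1 on
     the assumption that the list is exhaustive.  Semicolons inside
     the curly braces of dec-parallel-cap are not treated as
     separators between top-level parameters.

```python
KNOWN_PARAMETERS = frozenset({
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br",
    "max-tr", "max-tc", "max-fps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-cap",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
    "dec-parallel-cap", "include-dph", "sprop-vps", "sprop-sps", "sprop-pps",
})


def split_fmtp(params_text):
    """Split on ';' except where the ';' sits inside curly braces."""
    parts, current, depth = [], [], 0
    for ch in params_text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [part.strip() for part in parts if part.strip()]


def known_fmtp_params(params_text):
    """Return the recognised parameters and silently drop all others."""
    result = {}
    for part in split_fmtp(params_text):
        name, _, value = part.partition("=")
        if name.strip() in KNOWN_PARAMETERS:
            result[name.strip()] = value.strip()
    return result
```

     The argument is the text that follows the payload type on the
     "a=fmtp" line; parameter values are returned as unparsed
     strings.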

3025  7.2.1 Mapping of Payload Type Parameters to SDP

3027     The media type video/H265 string is mapped to fields in the
3028     Session Description Protocol (SDP) [RFC4566] as follows:

3030     o  The media name in the "m=" line of SDP MUST be video.

3032     o  The encoding name in the "a=rtpmap" line of SDP MUST be H265
3033        (the media subtype).

3035     o  The clock rate in the "a=rtpmap" line MUST be 90000.

3037     o  The OPTIONAL parameters "profile-space", "profile-id", "tier-
3038        flag", "level-id", "interop-constraints", "profile-
3039        compatibility-indicator", "sprop-sub-layer-id", "recv-sub-
3040        layer-id", "max-recv-level-id", "tx-mode", "max-lsr", "max-
3041        lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
3042        "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus",
3043        "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-
3044        segmentation-id", "sprop-spatial-segmentation-idc", "dec-
3045        parallel-cap", and "include-dph", when present, MUST be
3046        included in the "a=fmtp" line of SDP. These parameters are
3047        expressed as a media type string, in the form of a semicolon-
3048        separated list of parameter=value pairs.

3050     o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
3051        pps", when present, MUST be included in the "a=fmtp" line of
3052        SDP or conveyed using the "fmtp" source attribute as specified
3053        in Section 6.3 of [RFC5576]. For a particular media format
3054        (i.e., RTP payload type), "sprop-vps", "sprop-sps", or "sprop-
3055        pps" MUST NOT be both included in the "a=fmtp" line of SDP and
3056        conveyed using the "fmtp" source attribute. When included in
3057        the "a=fmtp" line of SDP, these parameters are expressed as a
3058        media type string, in the form of a semicolon-separated list
3059        of parameter=value pairs. When conveyed in the "a=fmtp" line
3060        of SDP for a particular payload type, the parameters "sprop-
3061        vps", "sprop-sps", and "sprop-pps" MUST be applied to each
3062        SSRC with the payload type. When conveyed using the "fmtp"
3063        source attribute, these parameters are only associated with
3064        the given source and payload type as parts of the "fmtp"
3065        source attribute.

3067           Informative note: Conveyance of "sprop-vps", "sprop-sps",
3068           and "sprop-pps" using the "fmtp" source attribute allows
3069           for out-of-band transport of parameter sets in topologies
3070           like Topo-Video-switch-MCU as specified in [RFC5117].

3072     An example of media representation in SDP is as follows:

3074        m=video 49170 RTP/AVP 98
3075        a=rtpmap:98 H265/90000
3076        a=fmtp:98 profile-id=1;
3077                  sprop-vps=