idnits 2.17.1 draft-ietf-payload-rtp-h265-15.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 1630 has weird spacing: '... This memo ...' == Line 1635 has weird spacing: '... signal two ...' -- The document date (November 5, 2015) is 3095 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '0' on line 1767 -- Possible downref: Non-RFC (?) normative reference: ref. 
'HEVC' ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) == Outdated reference: A later version (-11) exists of draft-ietf-avtcore-rtp-multi-stream-09 == Outdated reference: A later version (-54) exists of draft-ietf-mmusic-sdp-bundle-negotiation-23 -- Obsolete informational reference (is this intentional?): RFC 2326 (Obsoleted by RFC 7826) -- Obsolete informational reference (is this intentional?): RFC 5117 (Obsoleted by RFC 7667) Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Network Working Group Y.-K. Wang 2 Internet Draft Qualcomm 3 Intended status: Standards track Y. Sanchez 4 Expires: May 2016 T. Schierl 5 Fraunhofer HHI 6 S. Wenger 7 Vidyo 8 M. M. Hannuksela 9 Nokia 10 November 5, 2015 12 RTP Payload Format for H.265/HEVC Video 13 draft-ietf-payload-rtp-h265-15.txt 15 Abstract 17 This memo describes an RTP payload format for the video coding 18 standard ITU-T Recommendation H.265 and ISO/IEC International 19 Standard 23008-2, both also known as High Efficiency Video Coding 20 (HEVC) and developed by the Joint Collaborative Team on Video 21 Coding (JCT-VC). The RTP payload format allows for packetization 22 of one or more Network Abstraction Layer (NAL) units in each RTP 23 packet payload, as well as fragmentation of a NAL unit into 24 multiple RTP packets. Furthermore, it supports transmission of 25 an HEVC bitstream over a single as well as multiple RTP streams. 26 When multiple RTP streams are used, a single or multiple 27 transports may be utilized. The payload format has wide 28 applicability in videoconferencing, Internet video streaming, and 29 high bit-rate entertainment-quality video, among others. 31 Status of this Memo 33 This Internet-Draft is submitted to IETF in full conformance with 34 the provisions of BCP 78 and BCP 79. 
36 Internet-Drafts are working documents of the Internet Engineering 37 Task Force (IETF), its areas, and its working groups. Note that 38 other groups may also distribute working documents as Internet- 39 Drafts. 41 Internet-Drafts are draft documents valid for a maximum of six 42 months and may be updated, replaced, or obsoleted by other 43 documents at any time. It is inappropriate to use Internet- 44 Drafts as reference material or to cite them other than as "work 45 in progress." 47 The list of current Internet-Drafts can be accessed at 48 http://www.ietf.org/ietf/1id-abstracts.txt. 50 The list of Internet-Draft Shadow Directories can be accessed at 51 http://www.ietf.org/shadow.html. 53 This Internet-Draft will expire on May 5, 2016. 55 Copyright and License Notice 57 Copyright (c) 2015 IETF Trust and the persons identified as the 58 document authors. All rights reserved. 60 This document is subject to BCP 78 and the IETF Trust's Legal 61 Provisions Relating to IETF Documents 62 (http://trustee.ietf.org/license-info) in effect on the date of 63 publication of this document. Please review these documents 64 carefully, as they describe your rights and restrictions with 65 respect to this document. Code Components extracted from this 66 document must include Simplified BSD License text as described in 67 Section 4.e of the Trust Legal Provisions and are provided 68 without warranty as described in the Simplified BSD License. 
70 Table of Contents 72 Abstract..........................................................1 73 Status of this Memo...............................................1 74 Table of Contents.................................................3 75 1 Introduction....................................................5 76 1.1 Overview of the HEVC Codec.................................5 77 1.1.1 Coding-Tool Features..................................6 78 1.1.2 Systems and Transport Interfaces......................8 79 1.1.3 Parallel Processing Support..........................14 80 1.1.4 NAL Unit Header......................................17 81 1.2 Overview of the Payload Format............................18 82 2 Conventions....................................................19 83 3 Definitions and Abbreviations..................................19 84 3.1 Definitions...............................................19 85 3.1.1 Definitions from the HEVC Specification..............19 86 3.1.2 Definitions Specific to This Memo....................21 87 3.2 Abbreviations.............................................23 88 4 RTP Payload Format.............................................25 89 4.1 RTP Header Usage..........................................25 90 4.2 Payload Header Usage......................................27 91 4.3 Transmission Modes........................................28 92 4.4 Payload Structures........................................29 93 4.4.1 Single NAL Unit Packets..............................30 94 4.4.2 Aggregation Packets (APs)............................30 95 4.4.3 Fragmentation Units (FUs)............................35 96 4.4.4 PACI packets.........................................38 97 4.4.4.1 Reasons for the PACI rules (informative)........41 98 4.4.4.2 PACI extensions (Informative)...................42 99 4.5 Temporal Scalability Control Information..................43 100 4.6 Decoding Order Number.....................................45 101 5 Packetization 
Rules............................................47 102 6 De-packetization Process.......................................48 103 7 Payload Format Parameters......................................50 104 7.1 Media Type Registration...................................51 105 7.2 SDP Parameters............................................76 106 7.2.1 Mapping of Payload Type Parameters to SDP............76 107 7.2.2 Usage with SDP Offer/Answer Model....................78 108 7.2.3 Usage in Declarative Session Descriptions............87 109 7.2.4 Parameter Sets Considerations........................88 110 7.2.5 Dependency Signaling in Multi-Stream Mode............88 111 8 Use with Feedback Messages.....................................89 112 8.1 Picture Loss Indication (PLI).............................89 113 8.2 Slice Loss Indication (SLI)...............................89 114 8.3 Reference Picture Selection Indication (RPSI).............91 115 8.4 Full Intra Request (FIR)..................................91 116 9 Security Considerations........................................92 117 10 Congestion Control............................................94 118 11 IANA Consideration............................................95 119 12 Acknowledgements..............................................95 120 13 References....................................................96 121 13.1 Normative References.....................................96 122 13.2 Informative References...................................97 123 14 Authors' Addresses............................................99 125 1 Introduction 127 The High Efficiency Video Coding [HEVC], formally known as ITU-T 128 Recommendation H.265 and ISO/IEC International Standard 23008-2 129 was ratified by ITU-T in April 2013 and reportedly provides 130 significant coding efficiency gains over H.264 [H.264]. 132 This memo describes an RTP payload format for HEVC. 
It shares 133 its basic design with the RTP payload formats of [RFC6184] and 134 [RFC6190]. With respect to design philosophy, security, 135 congestion control, and overall implementation complexity, it has 136 similar properties to those earlier payload format 137 specifications. This is a conscious choice, as at least RFC6184 138 is widely deployed and generally known in the relevant 139 implementer communities. Mechanisms from RFC6190 were 140 incorporated because HEVC version 1 supports temporal scalability. 142 In order to help the overlapping implementer community, 143 frequently only the differences between RFC6184/RFC6190 and the 144 HEVC payload format are highlighted in non-normative, explanatory 145 parts of this memo. Basic familiarity with both specifications 146 is assumed for those parts. However, the normative parts of this 147 memo do not require study of RFC6184 or RFC6190. 149 1.1 Overview of the HEVC Codec 151 H.264 and HEVC share a similar hybrid video codec design. In 152 this memo, we provide a very brief overview of those features of 153 HEVC that are in some form addressed by the payload format 154 specified herein. Implementers have to read, understand, and 155 apply the ITU-T/ISO/IEC specifications pertaining to HEVC to 156 arrive at interoperable, well-performing implementations. 157 Implementers should consider testing their design (including the 158 interworking between the payload format implementation and the 159 core video codec) using the tools provided by ITU-T/ISO/IEC; for 160 example, conformance bitstreams as specified in [add conformance 161 spec]. Not doing so has historically led to badly performing and 162 insecure systems. 164 Conceptually, both H.264 and HEVC include a video coding layer 165 (VCL), which is often used to refer to the coding-tool features, 166 and a network abstraction layer (NAL), which is often used to 167 refer to the systems and transport interface aspects of the 168 codecs. 
170 1.1.1 Coding-Tool Features 172 Similar to earlier hybrid-video-coding-based standards, 173 including H.264, the following basic video coding design is 174 employed by HEVC. A prediction signal is first formed either by 175 intra or motion compensated prediction, and the residual (the 176 difference between the original and the prediction) is then 177 coded. The gains in coding efficiency are achieved by 178 redesigning and improving almost all parts of the codec over 179 earlier designs. In addition, HEVC includes several tools to 180 make the implementation on parallel architectures easier. Below 181 is a summary of HEVC coding-tool features. 183 Quad-tree block and transform structure 185 One of the major tools that contribute significantly to the 186 coding efficiency of HEVC is the usage of flexible coding blocks 187 and transforms, which are defined in a hierarchical quad-tree 188 manner. Unlike H.264, where the basic coding block is a 189 macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit 190 (CTU) of a maximum size of 64x64. Each CTU can be divided into 191 smaller units in a hierarchical quad-tree manner and can 192 represent smaller blocks down to size 4x4. Similarly, the 193 transforms used in HEVC can have different sizes, starting from 194 4x4 and going up to 32x32. Utilizing large blocks and transforms 195 contributes to the major gains of HEVC, especially at high 196 resolutions. 198 Entropy coding 200 HEVC uses a single entropy coding engine, which is based on 201 Context Adaptive Binary Arithmetic Coding (CABAC) [CABAC], 202 whereas H.264 uses two distinct entropy coding engines. CABAC in 203 HEVC shares many similarities with CABAC of H.264, but contains 204 several improvements. Those include improvements in coding 205 efficiency and lowered implementation complexity, especially for 206 parallel architectures. 
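The hierarchical quad-tree splitting described under "Quad-tree block and transform structure" above can be illustrated with a minimal, non-normative sketch. It assumes a 64x64 CTU and a smallest block size of 4x4, as stated in the text. The function name split_quadtree and the should_split callback are hypothetical and exist only for this illustration; in a real codec the split decisions are made by the encoder and signaled in the bitstream, which is not modeled here.

```python
from typing import Callable, List, Tuple

# A block is described by its top-left corner and its edge length.
Block = Tuple[int, int, int]  # (x, y, size)


def split_quadtree(
    x: int,
    y: int,
    size: int,
    min_size: int,
    should_split: Callable[[int, int, int], bool],
) -> List[Block]:
    """Recursively split a square block into four quadrants.

    should_split is a hypothetical stand-in for the encoder's decision;
    it is asked once per block that is still larger than min_size.
    Leaves are returned in z-scan order (top-left, top-right,
    bottom-left, bottom-right).
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_quadtree(x + dx, y + dy, half, min_size, should_split)
                )
        return leaves
    return [(x, y, size)]


if __name__ == "__main__":
    # Refine only the blocks whose top-left corner is the CTU origin.
    leaves = split_quadtree(0, 0, 64, 4, lambda x, y, s: (x, y) == (0, 0))
    for leaf in leaves:
        print(leaf)
```

Whatever the split decisions are, the leaves always tile the CTU exactly, so the sum of their areas equals the CTU area.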
208 In-loop filtering 210 H.264 includes an in-loop adaptive deblocking filter, where the 211 blocking artifacts around the transform edges in the 212 reconstructed picture are smoothed to improve the picture quality 213 and compression efficiency. In HEVC, a similar deblocking filter 214 is employed but with somewhat lower complexity. In addition, 215 pictures undergo a subsequent filtering operation called Sample 216 Adaptive Offset (SAO), which is a new design element in HEVC. 217 SAO basically adds a pixel-level offset in an adaptive manner and 218 usually acts as a de-ringing filter. It is observed that SAO 219 improves the picture quality, especially around sharp edges 220 contributing substantially to visual quality improvements of 221 HEVC. 223 Motion prediction and coding 225 There have been a number of improvements in this area that are 226 summarized as follows. The first category is motion merge and 227 advanced motion vector prediction (AMVP) modes. The motion 228 information of a prediction block can be inferred from the 229 spatially or temporally neighboring blocks. This is similar to 230 the DIRECT mode in H.264 but includes new aspects to incorporate 231 the flexible quad-tree structure and methods to improve the 232 parallel implementations. In addition, the motion vector 233 predictor can be signaled for improved efficiency. The second 234 category is high-precision interpolation. The interpolation 235 filter length is increased to 8-tap from 6-tap, which improves 236 the coding efficiency but also comes with increased complexity. 237 In addition, the interpolation filter is defined with higher 238 precision without any intermediate rounding operations to further 239 improve the coding efficiency. 241 Intra prediction and intra coding 243 Compared to 8 intra prediction modes in H.264, HEVC supports 244 angular intra prediction with 33 directions. 
This increased 245 flexibility improves both objective coding efficiency and visual 246 quality as the edges can be better predicted and ringing 247 artifacts around the edges can be reduced. In addition, the 248 reference samples are adaptively smoothed based on the prediction 249 direction. To avoid contouring artifacts a new interpolative 250 prediction generation is included to improve the visual quality. 251 Furthermore, discrete sine transform (DST) is utilized instead of 252 traditional discrete cosine transform (DCT) for 4x4 intra 253 transform blocks. 255 Other coding-tool features 257 HEVC includes some tools for lossless coding and efficient screen 258 content coding, such as skipping the transform for certain 259 blocks. These tools are particularly useful for example when 260 streaming the user-interface of a mobile device to a large 261 display. 263 1.1.2 Systems and Transport Interfaces 265 HEVC inherited the basic systems and transport interfaces 266 designs, such as the NAL-unit-based syntax structure, the 267 hierarchical syntax and data unit structure from sequence-level 268 parameter sets, multi-picture-level or picture-level parameter 269 sets, slice-level header parameters, lower-level parameters, the 270 supplemental enhancement information (SEI) message mechanism, the 271 hypothetical reference decoder (HRD) based video buffering model, 272 and so on. In the following, a list of differences in these 273 aspects compared to H.264 is summarized. 275 Video parameter set 277 A new type of parameter set, called video parameter set (VPS), 278 was introduced. For the first (2013) version of [HEVC], the 279 video parameter set NAL unit is required to be available prior to 280 its activation, while the information contained in the video 281 parameter set is not necessary for operation of the decoding 282 process. 
For future HEVC extensions, such as the 3D or scalable 283 extensions, the video parameter set is expected to include 284 information necessary for operation of the decoding process, e.g. 285 decoding dependency or information for reference picture set 286 construction of enhancement layers. The VPS provides a "big 287 picture" of a bitstream, including what types of operation points 288 are provided, the profile, tier, and level of the operation 289 points, and some other high-level properties of the bitstream 290 that can be used as the basis for session negotiation and content 291 selection, etc. (see Section 7.1). 293 Profile, tier and level 295 The profile, tier and level syntax structure that can be included 296 in both VPS and sequence parameter set (SPS) includes 12 bytes of 297 data to describe the entire bitstream (including all temporally 298 scalable layers, which are referred to as sub-layers in the HEVC 299 specification), and can optionally include more profile, tier and 300 level information pertaining to individual temporally scalable 301 layers. The profile indicator indicates the "best viewed as" 302 profile when the bitstream conforms to multiple profiles, similar 303 to the major brand concept in the ISO base media file format 304 (ISOBMFF) [ISOBMFF] and file formats derived based on ISOBMFF, 305 such as the 3GPP file format [3GPPFF]. The profile, tier and 306 level syntax structure also includes indications such as 1) 307 whether the bitstream is free of frame-packed content, 2) whether 308 the bitstream is free of interlaced source content, and 3) 309 whether the bitstream is free of field pictures. When the answer 310 is yes for both 2) and 3), the bitstream contains only frame 311 pictures of progressive source. 
Based on these indications, 312 clients/players without support of post-processing 313 functionalities for handling of frame-packed, interlaced source 314 content or field pictures can reject those bitstreams that 315 contain such pictures. 317 Bitstream and elementary stream 319 HEVC includes a definition of an elementary stream, which is new 320 compared to H.264. An elementary stream consists of a sequence 321 of one or more bitstreams. An elementary stream that consists of 322 two or more bitstreams has typically been formed by splicing 323 together two or more bitstreams (or parts thereof). When an 324 elementary stream contains more than one bitstream, the last NAL 325 unit of the last access unit of a bitstream (except the last 326 bitstream in the elementary stream) must contain an end of 327 bitstream NAL unit and the first access unit of the subsequent 328 bitstream must be an intra random access point (IRAP) access 329 unit. This IRAP access unit may be a clean random access (CRA), 330 broken link access (BLA), or instantaneous decoding refresh (IDR) 331 access unit. 333 Random access support 335 HEVC includes signaling in the NAL unit header, through NAL unit 336 types, of IRAP pictures beyond IDR pictures. Three types of IRAP 337 pictures, namely IDR, CRA and BLA pictures are supported, wherein 338 IDR pictures are conventionally referred to as closed group-of- 339 pictures (closed-GOP) random access points, and CRA and BLA 340 pictures are those conventionally referred to as open-GOP random 341 access points. BLA pictures usually originate from splicing of 342 two bitstreams or part thereof at a CRA picture, e.g. during 343 stream switching. 
To enable better systems usage of IRAP 344 pictures, altogether six different NAL units are defined to 345 signal the properties of the IRAP pictures, which can be used to 346 better match the stream access point (SAP) types as defined in 347 the ISOBMFF [ISOBMFF], which are utilized for random access 348 support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH]. 349 Pictures following an IRAP picture in decoding order and 350 preceding the IRAP picture in output order are referred to as 351 leading pictures associated with the IRAP picture. There are two 352 types of leading pictures, namely random access decodable leading 353 (RADL) pictures and random access skipped leading (RASL) 354 pictures. RADL pictures are decodable when the decoding started 355 at the associated IRAP picture, and RASL pictures are not 356 decodable when the decoding started at the associated IRAP 357 picture and are usually discarded. HEVC provides mechanisms to 358 enable the specification of conformance of bitstreams with RASL 359 pictures being discarded, thus to provide a standard-compliant 360 way to enable systems components to discard RASL pictures when 361 needed. 363 Temporal scalability support 365 HEVC includes an improved support of temporal scalability, by 366 inclusion of the signaling of TemporalId in the NAL unit header, 367 the restriction that pictures of a particular temporal sub-layer 368 cannot be used for inter prediction reference by pictures of a 369 lower temporal sub-layer, the sub-bitstream extraction process, 370 and the requirement that each sub-bitstream extraction output be 371 a conforming bitstream. Media-aware network elements (MANEs) can 372 utilize the TemporalId in the NAL unit header for stream 373 adaptation purposes based on temporal scalability. 
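As a non-normative illustration of the MANE behavior mentioned above, the following sketch reads the TemporalId from the two-byte NAL unit header and drops NAL units that belong to sub-layers above a target. It assumes the header layout given in Section 1.1.4 (1-bit F, 6-bit Type, 6-bit LayerId, 3-bit TID) and that the TID field carries TemporalId plus 1, as defined in [HEVC]. The names NalHeader, parse_nal_header and drop_higher_sublayers are hypothetical. This is a simplification and not the sub-bitstream extraction process specified in [HEVC].

```python
from typing import Iterable, Iterator, NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    layer_id: int
    temporal_id: int


def parse_nal_header(nal: bytes) -> NalHeader:
    """Decode the two-byte HEVC NAL unit header at the start of nal."""
    if len(nal) < 2:
        raise ValueError("an HEVC NAL unit needs at least a two-byte header")
    word = (nal[0] << 8) | nal[1]
    tid_plus1 = word & 0x7  # lowest 3 bits: TemporalId plus 1
    if tid_plus1 == 0:
        raise ValueError("the TID field must not be 0")
    return NalHeader(
        forbidden_zero_bit=word >> 15,     # 1 bit
        nal_unit_type=(word >> 9) & 0x3F,  # 6 bits
        layer_id=(word >> 3) & 0x3F,       # 6 bits
        temporal_id=tid_plus1 - 1,
    )


def drop_higher_sublayers(
    nal_units: Iterable[bytes], max_temporal_id: int
) -> Iterator[bytes]:
    """Yield only NAL units with TemporalId <= max_temporal_id."""
    for nal in nal_units:
        if parse_nal_header(nal).temporal_id <= max_temporal_id:
            yield nal


if __name__ == "__main__":
    stream = [
        bytes([0x40, 0x01]),        # Type 32, TemporalId 0
        bytes([0x02, 0x01, 0xAA]),  # Type 1, TemporalId 0
        bytes([0x00, 0x03, 0xBB]),  # Type 0, TemporalId 2
    ]
    kept = list(drop_higher_sublayers(stream, max_temporal_id=1))
    print(len(kept), "of", len(stream), "NAL units kept")
```

Because pictures of a lower sub-layer never reference pictures of a higher sub-layer, dropping every NAL unit above the chosen TemporalId leaves the remaining pictures decodable, which is what makes this kind of header-only filtering possible.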
375 Temporal sub-layer switching support 377 HEVC specifies, through NAL unit types present in the NAL unit 378 header, the signaling of temporal sub-layer access (TSA) and 379 stepwise temporal sub-layer access (STSA). A TSA picture and 380 pictures following the TSA picture in decoding order do not use 381 pictures prior to the TSA picture in decoding order with 382 TemporalId greater than or equal to that of the TSA picture for 383 inter prediction reference. A TSA picture enables up-switching, 384 at the TSA picture, to the sub-layer containing the TSA picture 385 or any higher sub-layer, from the immediately lower sub-layer. 386 An STSA picture does not use pictures with the same TemporalId as 387 the STSA picture for inter prediction reference. Pictures 388 following an STSA picture in decoding order with the same 389 TemporalId as the STSA picture do not use pictures prior to the 390 STSA picture in decoding order with the same TemporalId as the 391 STSA picture for inter prediction reference. An STSA picture 392 enables up-switching, at the STSA picture, to the sub-layer 393 containing the STSA picture, from the immediately lower sub- 394 layer. 396 Sub-layer reference or non-reference pictures 398 The concept and signaling of reference/non-reference pictures in 399 HEVC are different from H.264. In H.264, if a picture may be 400 used by any other picture for inter prediction reference, it is a 401 reference picture; otherwise it is a non-reference picture, and 402 this is signaled by two bits in the NAL unit header. In HEVC, a 403 picture is called a reference picture only when it is marked as 404 "used for reference". In addition, the concept of sub-layer 405 reference picture was introduced. If a picture may be used by 406 any other picture with the same TemporalId for inter 407 prediction reference, it is a sub-layer reference picture; 408 otherwise it is a sub-layer non-reference picture. 
Whether a 409 picture is a sub-layer reference picture or sub-layer non- 410 reference picture is signaled through NAL unit type values. 412 Extensibility 414 Besides the TemporalId in the NAL unit header, HEVC also includes 415 the signaling of a six-bit layer ID in the NAL unit header, which 416 must be equal to 0 for a single-layer bitstream. Extension 417 mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, 418 slice headers, and so on. All these extension mechanisms enable 419 future extensions in a backward compatible manner, such that 420 bitstreams encoded according to potential future HEVC extensions 421 can be fed to then-legacy decoders (e.g. HEVC version 1 decoders) 422 and the then-legacy decoders can decode and output the base layer 423 bitstream. 425 Bitstream extraction 427 HEVC includes a bitstream extraction process as an integral part 428 of the overall decoding process, as well as specification of the 429 use of the bitstream extraction process in description of 430 bitstream conformance tests as part of the hypothetical reference 431 decoder (HRD) specification. 433 Reference picture management 435 The reference picture management of HEVC, including reference 436 picture marking and removal from the decoded picture buffer (DPB) 437 as well as reference picture list construction (RPLC), differs 438 from that of H.264. Instead of the sliding window plus adaptive 439 memory management control operation (MMCO) based reference 440 picture marking mechanism in H.264, HEVC specifies a reference 441 picture set (RPS) based reference picture management and marking 442 mechanism, and the RPLC is consequently based on the RPS 443 mechanism. 
A reference picture set consists of a set of 444 reference pictures associated with a picture, consisting of all 445 reference pictures that are prior to the associated picture in 446 decoding order, that may be used for inter prediction of the 447 associated picture or any picture following the associated 448 picture in decoding order. The reference picture set consists of 449 five lists of reference pictures: RefPicSetStCurrBefore, 450 RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and 451 RefPicSetLtFoll. RefPicSetStCurrBefore, RefPicSetStCurrAfter and 452 RefPicSetLtCurr contain all reference pictures that may be used 453 in inter prediction of the current picture and that may be used 454 in inter prediction of one or more of the pictures following the 455 current picture in decoding order. RefPicSetStFoll and 456 RefPicSetLtFoll consist of all reference pictures that are not 457 used in inter prediction of the current picture but may be used 458 in inter prediction of one or more of the pictures following the 459 current picture in decoding order. RPS provides an "intra-coded" 460 signaling of the DPB status, instead of an "inter-coded" 461 signaling, mainly for improved error resilience. The RPLC 462 process in HEVC is based on the RPS, by signaling an index to an 463 RPS subset for each reference index; this process is simpler than 464 the RPLC process in H.264. 466 Ultra low delay support 468 HEVC specifies a sub-picture-level HRD operation, for support of 469 the so-called ultra-low delay. The mechanism specifies a 470 standard-compliant way to enable delay reduction below one 471 picture interval. Sub-picture-level coded picture buffer (CPB) 472 and DPB parameters may be signaled, and utilization of this 473 information for the derivation of CPB timing (wherein the CPB 474 removal time corresponds to decoding time) and DPB output timing 475 (display time) is specified. 
Decoders are allowed to operate the 476 HRD at the conventional access-unit-level, even when the sub- 477 picture-level HRD parameters are present. 479 New SEI messages 481 HEVC inherits many H.264 SEI messages with changes in syntax 482 and/or semantics making them applicable to HEVC. Additionally, 483 there are a few new SEI messages, reviewed briefly in the 484 following paragraphs. 486 The display orientation SEI message informs the decoder of a 487 transformation that is recommended to be applied to the cropped 488 decoded picture prior to display, such that the pictures can be 489 properly displayed, e.g. in an upside-up manner. 491 The structure of pictures SEI message provides information on the 492 NAL unit types, picture order count values, and prediction 493 dependencies of a sequence of pictures. The SEI message can be 494 used, for example, for concluding what impact a lost picture has on 495 other pictures. 497 The decoded picture hash SEI message provides a checksum derived 498 from the sample values of a decoded picture. It can be used for 499 detecting whether a picture was correctly received and decoded. 501 The active parameter sets SEI message includes the IDs of the 502 active video parameter set and the active sequence parameter set 503 and can be used to activate VPSs and SPSs. In addition, the SEI 504 message includes the following indications: 1) An indication of 505 whether "full random accessibility" is supported (when supported, 506 all parameter sets needed for decoding of the remainder of the 507 bitstream when random accessing from the beginning of the current 508 CVS by completely discarding all access units earlier in decoding 509 order are present in the remaining bitstream and all coded 510 pictures in the remaining bitstream can be correctly decoded); 2) 511 An indication of whether there is no parameter set within the 512 current CVS that updates another parameter set of the same type 513 preceding in decoding order. 
An update of a parameter set refers 514 to the use of the same parameter set ID but with some other 515 parameters changed. If this property is true for all CVSs in the 516 bitstream, then all parameter sets can be sent out-of-band before 517 session start. 519 The decoding unit information SEI message provides coded picture 520 buffer removal delay information for a decoding unit. The 521 message can be used in very-low-delay buffering operations. 523 The region refresh information SEI message can be used together 524 with the recovery point SEI message (present in both H.264 and 525 HEVC) for improved support of gradual decoding refresh. This 526 supports random access from inter-coded pictures, wherein 527 complete pictures can be correctly decoded or recovered after an 528 indicated number of pictures in output/display order. 530 1.1.3 Parallel Processing Support 532 The reportedly significantly higher encoding computational demand 533 of HEVC over H.264, in conjunction with the ever increasing video 534 resolution (both spatially and temporally) required by the 535 market, led to the adoption of VCL coding tools specifically 536 targeted to allow for parallelization on the sub-picture level. 537 That is, parallelization occurs, at the minimum, at the 538 granularity of an integer number of CTUs. The targets for this 539 type of high-level parallelization are multicore CPUs and DSPs as 540 well as multiprocessor systems. In a system design, to be 541 useful, these tools require signaling support, which is provided 542 in Section 7 of this memo. This section provides a brief 543 overview of the tools available in [HEVC]. 545 Many of the tools incorporated in HEVC were designed keeping in 546 mind the potential parallel implementations in multi-core/multi- 547 processor architectures. Specifically, for parallelization, four 548 picture partition strategies, as described below, are available. 
550 Slices are segments of the bitstream that can be reconstructed 551 independently from other slices within the same picture (though 552 there may still be interdependencies through loop filtering 553 operations). Slices are the only tool that can be used for 554 parallelization that is also available, in virtually identical 555 form, in H.264. Slice-based parallelization does not require 556 much inter-processor or inter-core communication (except for 557 inter-processor or inter-core data sharing for motion 558 compensation when decoding a predictively coded picture, which is 559 typically much heavier than inter-processor or inter-core data 560 sharing due to in-picture prediction), as slices are designed to 561 be independently decodable. However, for the same reason, slices 562 can require some coding overhead. Further, slices (in contrast 563 to some of the other tools mentioned below) also serve as the key 564 mechanism for bitstream partitioning to match Maximum Transfer 565 Unit (MTU) size requirements, due to the in-picture independence 566 of slices and the fact that each regular slice is encapsulated in 567 its own NAL unit. In many cases, the goal of parallelization and 568 the goal of MTU size matching can place conflicting demands on 569 the slice layout in a picture. The realization of this situation 570 led to the development of the more advanced tools mentioned 571 below. 573 Dependent slice segments allow for fragmentation of a coded slice 574 into fragments at CTU boundaries without breaking any in-picture 575 prediction mechanism. They are complementary to the 576 fragmentation mechanism described in this memo in that they need 577 the cooperation of the encoder. As a dependent slice segment 578 necessarily contains an integer number of CTUs, a decoder using 579 multiple cores operating on CTUs can process a dependent slice 580 segment without communicating parts of the slice segment's 581 bitstream to other cores. 
Fragmentation, as specified in this 582 memo, in contrast, does not guarantee that a fragment contains an 583 integer number of CTUs. 585 In wavefront parallel processing (WPP), the picture is 586 partitioned into rows of CTUs. Entropy decoding and prediction 587 are allowed to use data from CTUs in other partitions. Parallel 588 processing is possible through parallel decoding of CTU rows, 589 where the start of the decoding of a row is delayed by two CTUs, 590 so as to ensure that data related to a CTU above and to the right of 591 the subject CTU is available before the subject CTU is 592 decoded. Using this staggered start (which appears like a 593 wavefront when represented graphically), parallelization is 594 possible with up to as many processors/cores as the picture 595 contains CTU rows. 597 Because in-picture prediction between neighboring CTU rows within 598 a picture is allowed, the required inter-processor/inter-core 599 communication to enable in-picture prediction can be substantial. 600 The WPP partitioning does not result in the creation of more NAL 601 units compared to when it is not applied; thus, WPP cannot be used 602 for MTU size matching, though slices can be used in combination 603 for that purpose. 605 Tiles define horizontal and vertical boundaries that partition a 606 picture into tile columns and rows. The scan order of CTUs is 607 changed to be local within a tile (in the order of a CTU raster 608 scan of a tile), before decoding the top-left CTU of the next 609 tile in the order of tile raster scan of a picture. Similar to 610 slices, tiles break in-picture prediction dependencies (including 611 entropy decoding dependencies). However, they do not need to be 612 included in individual NAL units (same as WPP in this regard); 613 hence, tiles cannot be used for MTU size matching, though slices 614 can be used in combination for that purpose.
Each tile can be 615 processed by one processor/core, and the inter-processor/inter- 616 core communication required for in-picture prediction between 617 processing units decoding neighboring tiles is limited to 618 conveying the shared slice header in cases where a slice spans 619 more than one tile, and loop filtering related sharing of 620 reconstructed samples and metadata. In this respect, tiles are less 621 demanding in terms of inter-processor communication bandwidth 622 compared to WPP due to the in-picture independence between two 623 neighboring partitions. 625 1.1.4 NAL Unit Header 627 HEVC maintains the NAL unit concept of H.264 with modifications. 628 HEVC uses a two-byte NAL unit header, as shown in Figure 1. The 629 payload of a NAL unit refers to the NAL unit excluding the NAL 630 unit header. 632 +---------------+---------------+ 633 |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7| 634 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 635 |F| Type | LayerId | TID | 636 +-------------+-----------------+ 638 Figure 1 The structure of HEVC NAL unit header 640 The semantics of the fields in the NAL unit header are as 641 specified in [HEVC] and described briefly below for convenience. 642 In addition to the name and size of each field, the corresponding 643 syntax element name in [HEVC] is also provided. 645 F: 1 bit 646 forbidden_zero_bit. Required to be zero in [HEVC]. Note that 647 the inclusion of this bit in the NAL unit header was to enable 648 transport of HEVC video over MPEG-2 transport systems 649 (avoidance of start code emulations) [MPEG2S]. In the context 650 of this memo, the value 1 may be used to indicate a syntax 651 violation, e.g. for a NAL unit resulting from aggregating a 652 number of fragmented units of a NAL unit but missing the last 653 fragment, as described in Section 4.4.3. 655 Type: 6 bits 656 nal_unit_type. This field specifies the NAL unit type as 657 defined in Table 7-1 of [HEVC].
If the most significant bit 658 of this field of a NAL unit is equal to 0 (i.e. the value of 659 this field is less than 32), the NAL unit is a VCL NAL unit. 660 Otherwise, the NAL unit is a non-VCL NAL unit. For a 661 reference of all currently defined NAL unit types and their 662 semantics, please refer to Section 7.4.1 in [HEVC]. 664 LayerId: 6 bits 665 nuh_layer_id. Required to be equal to zero in [HEVC]. It is 666 anticipated that in future scalable or 3D video coding 667 extensions of this specification, this syntax element will be 668 used to identify additional layers that may be present in the 669 CVS, wherein a layer may be, e.g. a spatial scalable layer, a 670 quality scalable layer, a texture view, or a depth view. 672 TID: 3 bits 673 nuh_temporal_id_plus1. This field specifies the temporal 674 identifier of the NAL unit plus 1. The value of TemporalId is 675 equal to TID minus 1. A TID value of 0 is illegal to ensure 676 that there is at least one bit in the NAL unit header equal to 677 1, so to enable independent considerations of start code 678 emulations in the NAL unit header and in the NAL unit payload 679 data. 681 1.2 Overview of the Payload Format 683 This payload format defines the following processes required for 684 transport of HEVC coded data over RTP [RFC3550]: 686 o Usage of RTP header with this payload format 688 o Packetization of HEVC coded NAL units into RTP packets using 689 three types of payload structures, namely single NAL unit 690 packet, aggregation packet, and fragment unit 692 o Transmission of HEVC NAL units of the same bitstream within a 693 single RTP stream or multiple RTP streams (within one or more 694 RTP sessions), where within an RTP stream transmission of NAL 695 units may be either non-interleaved (i.e. the transmission 696 order of NAL units is the same as their decoding order) or 697 interleaved (i.e. 
the transmission order of NAL units is 698 different from their decoding order) 700 o Media type parameters to be used with the Session Description 701 Protocol (SDP) [RFC4566] 703 o A payload header extension mechanism and data structures for 704 enhanced support of temporal scalability based on that 705 extension mechanism. 707 2 Conventions 709 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL 710 NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and 711 "OPTIONAL" in this document are to be interpreted as described in 712 BCP 14, RFC 2119 [RFC2119]. 714 In this document, these key words will appear with that 715 interpretation only when in ALL CAPS. Lower case uses of these 716 words are not to be interpreted as carrying the RFC 2119 717 significance. 719 This specification uses the notion of setting and clearing a bit 720 when bit fields are handled. Setting a bit is the same as 721 assigning that bit the value of 1 (On). Clearing a bit is the 722 same as assigning that bit the value of 0 (Off). 724 3 Definitions and Abbreviations 726 3.1 Definitions 728 This document uses the terms and definitions of [HEVC]. Section 729 3.1.1 lists relevant definitions copied from [HEVC] (the April 730 2013 version of the H.265 specification) for convenience. 731 Section 3.1.2 provides definitions specific to this memo. 733 3.1.1 Definitions from the HEVC Specification 735 access unit: A set of NAL units that are associated with each 736 other according to a specified classification rule, are 737 consecutive in decoding order, and contain exactly one coded 738 picture. 740 BLA access unit: An access unit in which the coded picture is a 741 BLA picture. 743 BLA picture: An IRAP picture for which each VCL NAL unit has 744 nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP. 
746 coded video sequence (CVS): A sequence of access units that 747 consists, in decoding order, of an IRAP access unit with 748 NoRaslOutputFlag equal to 1, followed by zero or more access 749 units that are not IRAP access units with NoRaslOutputFlag equal 750 to 1, including all subsequent access units up to but not 751 including any subsequent access unit that is an IRAP access unit 752 with NoRaslOutputFlag equal to 1. 754 Informative note: An IRAP access unit may be an IDR access 755 unit, a BLA access unit, or a CRA access unit. The value of 756 NoRaslOutputFlag is equal to 1 for each IDR access unit, each 757 BLA access unit, and each CRA access unit that is the first 758 access unit in the bitstream in decoding order, is the first 759 access unit that follows an end of sequence NAL unit in 760 decoding order, or has HandleCraAsBlaFlag equal to 1. 762 CRA access unit: An access unit in which the coded picture is a 763 CRA picture. 765 CRA picture: An IRAP picture for which each VCL NAL unit has 766 nal_unit_type equal to CRA_NUT. 768 IDR access unit: An access unit in which the coded picture is an 769 IDR picture. 771 IDR picture: An IRAP picture for which each VCL NAL unit has 772 nal_unit_type equal to IDR_W_RADL or IDR_N_LP. 774 IRAP access unit: An access unit in which the coded picture is an 775 IRAP picture. 777 IRAP picture: A coded picture for which each VCL NAL unit has 778 nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 779 (23), inclusive. 781 layer: A set of VCL NAL units that all have a particular value of 782 nuh_layer_id and the associated non-VCL NAL units, or one of a 783 set of syntactical structures having a hierarchical relationship. 785 operation point: bitstream created from another bitstream by 786 operation of the sub-bitstream extraction process with the 787 another bitstream, a target highest TemporalId, and a target 788 layer identifier list as inputs.
790 random access: The act of starting the decoding process for a 791 bitstream at a point other than the beginning of the bitstream. 793 sub-layer: A temporal scalable layer of a temporal scalable 794 bitstream consisting of VCL NAL units with a particular value of 795 the TemporalId variable, and the associated non-VCL NAL units. 797 sub-layer representation: A subset of the bitstream consisting of 798 NAL units of a particular sub-layer and the lower sub-layers. 800 tile: A rectangular region of coding tree blocks within a 801 particular tile column and a particular tile row in a picture. 803 tile column: A rectangular region of coding tree blocks having a 804 height equal to the height of the picture and a width specified 805 by syntax elements in the picture parameter set. 807 tile row: A rectangular region of coding tree blocks having a 808 height specified by syntax elements in the picture parameter set 809 and a width equal to the width of the picture. 811 3.1.2 Definitions Specific to This Memo 813 dependee RTP stream: An RTP stream on which another RTP stream 814 depends. All RTP streams in an MRST or MRMT except for the 815 highest RTP stream are dependee RTP streams. 817 highest RTP stream: The RTP stream on which no other RTP stream 818 depends. The RTP stream in an SRST is the highest RTP stream. 820 media aware network element (MANE): A network element, such as a 821 middlebox, selective forwarding unit, or application layer 822 gateway that is capable of parsing certain aspects of the RTP 823 payload headers or the RTP payload and reacting to their 824 contents. 826 Informative note: The concept of a MANE goes beyond normal 827 routers or gateways in that a MANE has to be aware of the 828 signaling (e.g. to learn about the payload type mappings of 829 the media streams), and in that it has to be trusted when 830 working with SRTP. 
The advantage of using MANEs is that they 831 allow packets to be dropped according to the needs of the 832 media coding. For example, if a MANE has to drop packets due 833 to congestion on a certain link, it can identify and remove 834 those packets whose elimination produces the least adverse 835 effect on the user experience. After dropping packets, MANEs 836 must rewrite RTCP packets to match the changes to the RTP 837 stream as specified in Section 7 of [RFC3550]. 839 Media Transport: As used in the MRST, MRMT, and SRST definitions 840 below, Media Transport denotes the transport of packets over a 841 transport association identified by a 5-tuple (source address, 842 source port, destination address, destination port, transport 843 protocol). See also Section 2.1.13 of [I-D.ietf-avtext-rtp- 844 grouping-taxonomy]. 846 Informative note: The term "bitstream" in this document is 847 equivalent to the term "encoded stream" in [I-D.ietf-avtext- 848 rtp-grouping-taxonomy]. 850 Multiple RTP streams on a Single Transport (MRST): Multiple RTP 851 streams carrying a single HEVC bitstream on a Single Transport. 852 See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. 854 Multiple RTP streams on Multiple Transports (MRMT): Multiple RTP 855 streams carrying a single HEVC bitstream on Multiple Transports. 856 See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy]. 858 NAL unit decoding order: A NAL unit order that conforms to the 859 constraints on NAL unit order given in Section 7.4.2.4 in [HEVC]. 861 NAL unit output order: A NAL unit order in which NAL units of 862 different access units are in the output order of the decoded 863 pictures corresponding to the access units, as specified in 864 [HEVC], and in which NAL units within an access unit are in their 865 decoding order. 
867 NAL-unit-like structure: A data structure that is similar to NAL 868 units in the sense that it also has a NAL unit header and a 869 payload, with a difference that the payload does not follow the 870 start code emulation prevention mechanism required for the NAL 871 unit syntax as specified in Section 7.3.1.1 of [HEVC]. Examples of 872 NAL-unit-like structures defined in this memo are packet payloads 873 of AP, PACI, and FU packets. 875 NALU-time: The value that the RTP timestamp would have if the NAL 876 unit were transported in its own RTP packet. 878 RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy]. Within 879 the scope of this memo, one RTP stream is utilized to transport 880 one or more temporal sub-layers. 882 Single RTP stream on a Single Transport (SRST): Single RTP 883 stream carrying a single HEVC bitstream on a Single (Media) 884 Transport. See also Section 3.5 of [I-D.ietf-avtext-rtp- 885 grouping-taxonomy]. 887 transmission order: The order of packets in ascending RTP 888 sequence number order (in modulo arithmetic). Within an 889 aggregation packet, the NAL unit transmission order is the same 890 as the order of appearance of NAL units in the packet.
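The "modulo arithmetic" in the transmission order definition can be illustrated with a short, non-normative Python sketch. The function names and the half-range convention (a number counts as "later" if it is fewer than 32768 steps ahead) are assumptions made for this illustration and are not defined by this memo.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide


def seq_precedes(a: int, b: int) -> bool:
    """Return True if sequence number a comes before b, modulo 65536.

    Assumption: b is treated as later than a when it is between 1 and
    32767 steps ahead of a (half-range convention).
    """
    diff = (b - a) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


def sort_transmission_order(seqs):
    """Sort sequence numbers into transmission order.

    Assumption: all values lie within half the sequence number space
    of the first element, which is used as the reference point.
    """
    if not seqs:
        return []
    base = seqs[0]
    # Shift by half the range so that values slightly "behind" the
    # reference sort before it and values "ahead" sort after it.
    return sorted(seqs, key=lambda s: (s - base + SEQ_MOD // 2) % SEQ_MOD)
```

For example, under these assumptions a packet with sequence number 65535 precedes a packet with sequence number 0, because the counter has wrapped around.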
892 3.2 Abbreviations 894 AP Aggregation Packet 896 BLA Broken Link Access 898 CRA Clean Random Access 900 CTB Coding Tree Block 902 CTU Coding Tree Unit 903 CVS Coded Video Sequence 905 DPH Decoded Picture Hash 907 FU Fragmentation Unit 909 HRD Hypothetical Reference Decoder 911 IDR Instantaneous Decoding Refresh 913 IRAP Intra Random Access Point 915 MANE Media Aware Network Element 917 MRMT Multiple RTP streams on Multiple Transports 919 MRST Multiple RTP streams on a Single Transport 921 MTU Maximum Transfer Unit 923 NAL Network Abstraction Layer 925 NALU Network Abstraction Layer Unit 927 PACI PAyload Content Information 929 PHES Payload Header Extension Structure 931 PPS Picture Parameter Set 933 RADL Random Access Decodable Leading (Picture) 935 RASL Random Access Skipped Leading (Picture) 937 RPS Reference Picture Set 939 SEI Supplemental Enhancement Information 941 SPS Sequence Parameter Set 943 SRST Single RTP stream on a Single Transport 945 STSA Step-wise Temporal Sub-layer Access 946 TSA Temporal Sub-layer Access 948 TSCI Temporal Scalability Control Information 950 VCL Video Coding Layer 952 VPS Video Parameter Set 954 4 RTP Payload Format 956 4.1 RTP Header Usage 958 The format of the RTP header is specified in [RFC3550] and 959 reprinted in Figure 2 for convenience. This payload format uses 960 the fields of the header in a manner consistent with that 961 specification. 963 The RTP payload (and the settings for some RTP header bits) for 964 aggregation packets and fragmentation units are specified in 965 Sections 4.4.2 and 4.4.3, respectively. 
967 0 1 2 3 968 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 969 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 970 |V=2|P|X| CC |M| PT | sequence number | 971 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 972 | timestamp | 973 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 974 | synchronization source (SSRC) identifier | 975 +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ 976 | contributing source (CSRC) identifiers | 977 | .... | 978 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 980 Figure 2 RTP header according to [RFC3550] 982 The RTP header information to be set according to this RTP 983 payload format is set as follows: 985 Marker bit (M): 1 bit 987 Set for the last packet of the access unit, carried in the 988 current RTP stream. This is in line with the normal use of 989 the M bit in video formats to allow efficient playout 990 buffer handling. When MRST or MRMT is in use, if an access 991 unit appears in multiple RTP streams, the marker bit is set on 992 each RTP stream's last packet of the access unit. 994 Informative note: The content of a NAL unit does not tell 995 whether or not the NAL unit is the last NAL unit, in 996 decoding order, of an access unit. An RTP sender 997 implementation may obtain this information from the video 998 encoder. If, however, the implementation cannot obtain 999 this information directly from the encoder, e.g. when the 1000 bitstream was pre-encoded, and also there is no timestamp 1001 allocated for each NAL unit, then the sender implementation 1002 can inspect subsequent NAL units in decoding order to 1003 determine whether or not the NAL unit is the last NAL unit 1004 of an access unit as follows. A NAL unit is determined to 1005 be the last NAL unit of an access unit if it is the last 1006 NAL unit of the bitstream.
A NAL unit naluX is also 1007 determined to be the last NAL unit of an access unit if 1008 both the following conditions are true: 1) the next VCL NAL 1009 unit naluY in decoding order has the high-order bit of the 1010 first byte after its NAL unit header equal to 1, and 2) all 1011 NAL units between naluX and naluY, when present, have 1012 nal_unit_type in the range of 32 to 35, inclusive, equal to 1013 39, or in the ranges of 41 to 44, inclusive, or 48 to 55, 1014 inclusive. 1016 Payload type (PT): 7 bits 1018 The assignment of an RTP payload type for this new packet 1019 format is outside the scope of this document and will not be 1020 specified here. The assignment of a payload type has to be 1021 performed either through the profile used or in a dynamic way. 1023 Informative note: It is not required to use different 1024 payload type values for different RTP streams in MRST or 1025 MRMT. 1027 Sequence number (SN): 16 bits 1029 Set and used in accordance with RFC 3550 [RFC3550]. 1031 Timestamp: 32 bits 1033 The RTP timestamp is set to the sampling timestamp of the 1034 content. A 90 kHz clock rate MUST be used. 1036 If the NAL unit has no timing properties of its own (e.g. 1037 parameter set and SEI NAL units), the RTP timestamp MUST be 1038 set to the RTP timestamp of the coded picture of the access 1039 unit in which the NAL unit (according to Section 7.4.2.4.4 of 1040 [HEVC]) is included. 1042 Receivers MUST use the RTP timestamp for the display process, 1043 even when the bitstream contains picture timing SEI messages 1044 or decoding unit information SEI messages as specified in 1045 [HEVC]. However, this does not mean that picture timing SEI 1046 messages in the bitstream should be discarded, as picture 1047 timing SEI messages may contain frame-field information that 1048 is important in appropriately rendering interlaced video. 1050 Synchronization source (SSRC): 32-bits 1052 Used to identify the source of the RTP packets. 
When using 1053 SRST, by definition a single SSRC is used for all parts of a 1054 single bitstream. In MRST or MRMT, different SSRCs are used 1055 for each RTP stream containing a subset of the sub-layers of 1056 the single (temporally scalable) bitstream. A receiver is 1057 required to correctly associate the set of SSRCs that are 1058 parts of the same bitstream. 1060 4.2 Payload Header Usage 1062 The first two bytes of the payload of an RTP packet are referred 1063 to as the payload header. The payload header consists of the 1064 same fields (F, Type, LayerId, and TID) as the NAL unit header as 1065 shown in Section 1.1.4, irrespective of the type of the payload 1066 structure. 1068 The TID value indicates (among other things) the relative 1069 importance of an RTP packet, for example because NAL units 1070 belonging to higher temporal sub-layers are not used for the 1071 decoding of lower temporal sub-layers. A lower value of TID 1072 indicates a higher importance. More important NAL units MAY be 1073 better protected against transmission losses than less important 1074 NAL units. 1076 4.3 Transmission Modes 1078 This memo enables transmission of an HEVC bitstream over 1080 . a single RTP stream on a single Media Transport (SRST), 1081 . multiple RTP streams over a single Media Transport (MRST), 1082 or 1083 . multiple RTP streams over multiple Media Transports (MRMT). 1085 Informative Note: While this specification enables the use of 1086 MRST within the H.265 RTP payload, the signaling of MRST within 1087 SDP Offer/Answer is not fully specified at the time of this 1088 writing. See [RFC5576] and [RFC5583] for what is supported 1089 today as well as [I-D.ietf-avtcore-rtp-multi-stream] and 1090 [I-D.ietf-mmusic-sdp-bundle-negotiation] for future directions. 1092 When in MRMT, the dependency of one RTP stream on another RTP 1093 stream is typically indicated as specified in [RFC5583].
1094 [RFC5583] can also be utilized to specify dependencies within 1095 MRST, but only if the RTP streams utilize distinct payload types. 1097 SRST or MRST SHOULD be used for point-to-point unicast scenarios, 1098 while MRMT SHOULD be used for point-to-multipoint multicast 1099 scenarios where different receivers require different operation 1100 points of the same HEVC bitstream, to improve bandwidth utilization 1101 efficiency. 1103 Informative note: A multicast may degrade to a unicast after 1104 all but one receiver have left (this is a justification of 1105 the first "SHOULD" instead of "MUST"), and there might be 1106 scenarios where MRMT is desirable but not possible, e.g. when 1107 IP multicast is not deployed in a certain network (this is a 1108 justification of the second "SHOULD" instead of "MUST"). 1110 The transmission mode is indicated by the tx-mode media parameter 1111 (see Section 7.1). If tx-mode is equal to "SRST", SRST MUST be 1112 used. Otherwise, if tx-mode is equal to "MRST", MRST MUST be 1113 used. Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used. 1115 Informative note: When an RTP stream does not depend on other 1116 RTP streams, any of SRST, MRST and MRMT may be in use for the 1117 RTP stream. 1119 Receivers MUST support all of SRST, MRST, and MRMT. 1121 Informative note: The required support of MRMT by receivers 1122 does not imply that multicast must be supported by receivers. 1124 4.4 Payload Structures 1126 Four different types of RTP packet payload structures are 1127 specified. A receiver can identify the type of an RTP packet 1128 payload through the Type field in the payload header. 1130 The four different payload structures are as follows: 1132 o Single NAL unit packet: Contains a single NAL unit in the 1133 payload, and the NAL unit header of the NAL unit also serves 1134 as the payload header. This payload structure is specified in 1135 Section 4.4.1.
1137 o Aggregation packet (AP): Contains more than one NAL unit 1138 within one access unit. This payload structure is specified 1139 in Section 4.4.2. 1141 o Fragmentation unit (FU): Contains a subset of a single NAL 1142 unit. This payload structure is specified in Section 4.4.3. 1144 o PACI carrying RTP packet: Contains a payload header (that 1145 differs from other payload headers for efficiency), a Payload 1146 Header Extension Structure (PHES), and a PACI payload. This 1147 payload structure is specified in Section 4.4.4. 1149 4.4.1 Single NAL Unit Packets 1151 A single NAL unit packet contains exactly one NAL unit, and 1152 consists of a payload header (denoted as PayloadHdr), a 1153 conditional 16-bit DONL field (in network byte order), and the 1154 NAL unit payload data (the NAL unit excluding its NAL unit 1155 header) of the contained NAL unit, as shown in Figure 3. 1157 0 1 2 3 1158 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1159 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1160 | PayloadHdr | DONL (conditional) | 1161 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1162 | | 1163 | NAL unit payload data | 1164 | | 1165 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1166 | :...OPTIONAL RTP padding | 1167 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1169 Figure 3 The structure of a single NAL unit packet 1171 The payload header SHOULD be an exact copy of the NAL unit header 1172 of the contained NAL unit. However, the Type (i.e. 1173 nal_unit_type) field MAY be changed, e.g. when it is desirable to 1174 handle a CRA picture as a BLA picture [JCTVC-J0107]. 1176 The DONL field, when present, specifies the value of the 16 least 1177 significant bits of the decoding order number of the contained 1178 NAL unit.
If sprop-max-don-diff is greater than 0 for any of the 1179 RTP streams, the DONL field MUST be present, and the variable DON 1180 for the contained NAL unit is derived as equal to the value of 1181 the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for 1182 all the RTP streams), the DONL field MUST NOT be present. 1184 4.4.2 Aggregation Packets (APs) 1186 Aggregation packets (APs) are introduced to enable the reduction 1187 of packetization overhead for small NAL units, such as most of 1188 the non-VCL NAL units, which are often only a few octets in size. 1190 An AP aggregates NAL units within one access unit. Each NAL unit 1191 to be carried in an AP is encapsulated in an aggregation unit. 1192 NAL units aggregated in one AP are in NAL unit decoding order. 1194 An AP consists of a payload header (denoted as PayloadHdr) 1195 followed by two or more aggregation units, as shown in Figure 4. 1197 0 1 2 3 1198 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1199 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1200 | PayloadHdr (Type=48) | | 1201 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1202 | | 1203 | two or more aggregation units | 1204 | | 1205 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1206 | :...OPTIONAL RTP padding | 1207 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1209 Figure 4 The structure of an aggregation packet 1211 The fields in the payload header are set as follows. The F bit 1212 MUST be equal to 0 if the F bit of each aggregated NAL unit is 1213 equal to zero; otherwise, it MUST be equal to 1. The Type field 1214 MUST be equal to 48. The value of LayerId MUST be equal to the 1215 lowest value of LayerId of all the aggregated NAL units. The 1216 value of TID MUST be the lowest value of TID of all the 1217 aggregated NAL units. 1219 Informative Note: All VCL NAL units in an AP have the same TID 1220 value since they belong to the same access unit. 
However, an 1221 AP may contain non-VCL NAL units for which the TID value in 1222 the NAL unit header may be different than the TID value of the 1223 VCL NAL units in the same AP. 1225 An AP MUST carry at least two aggregation units and can carry as 1226 many aggregation units as necessary; however, the total amount of 1227 data in an AP obviously MUST fit into an IP packet, and the size 1228 SHOULD be chosen so that the resulting IP packet is smaller than 1229 the MTU size so as to avoid IP layer fragmentation. An AP MUST NOT 1230 contain Fragmentation Units (FUs) specified in Section 4.4.3. 1231 APs MUST NOT be nested; i.e. an AP must not contain another AP. 1233 The first aggregation unit in an AP consists of a conditional 16- 1234 bit DONL field (in network byte order) followed by a 16-bit 1235 unsigned size information (in network byte order) that indicates 1236 the size of the NAL unit in bytes (excluding these two octets, 1237 but including the NAL unit header), followed by the NAL unit 1238 itself, including its NAL unit header, as shown in Figure 5. 1240 0 1 2 3 1241 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1242 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1243 : DONL (conditional) | NALU size | 1244 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1245 | NALU size | | 1246 +-+-+-+-+-+-+-+-+ NAL unit | 1247 | | 1248 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1249 | : 1250 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1252 Figure 5 The structure of the first aggregation unit in an AP 1254 The DONL field, when present, specifies the value of the 16 least 1255 significant bits of the decoding order number of the aggregated 1256 NAL unit.
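The payload header rules for an AP given earlier in this section (F, Type, LayerId, and TID) can be illustrated with a short, non-normative Python sketch. It parses the two-byte NAL unit header layout of Figure 1 and derives the AP payload header from the aggregated NAL units. The names NalHeader, parse_nal_header, pack_nal_header, and ap_payload_header are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple, Sequence

AP_TYPE = 48  # Type value of an aggregation packet payload header


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit)
    nal_type: int  # nal_unit_type (6 bits)
    layer_id: int  # nuh_layer_id (6 bits)
    tid: int       # nuh_temporal_id_plus1 (3 bits)


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes into F, Type, LayerId, and TID."""
    if len(data) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    word = (data[0] << 8) | data[1]
    return NalHeader(
        f=word >> 15,
        nal_type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x07,
    )


def pack_nal_header(h: NalHeader) -> bytes:
    """Serialize the four fields back into two bytes (network order)."""
    word = (h.f << 15) | (h.nal_type << 9) | (h.layer_id << 3) | h.tid
    return word.to_bytes(2, "big")


def ap_payload_header(nal_units: Sequence[bytes]) -> bytes:
    """Derive the AP payload header from the NAL units to aggregate."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    headers = [parse_nal_header(n) for n in nal_units]
    return pack_nal_header(NalHeader(
        f=1 if any(h.f for h in headers) else 0,     # 0 only if all F are 0
        nal_type=AP_TYPE,
        layer_id=min(h.layer_id for h in headers),   # lowest LayerId
        tid=min(h.tid for h in headers),             # lowest TID
    ))
```

As a usage example, aggregating a NAL unit with header bytes 0x40 0x01 (Type 32, TID 1) and one with header bytes 0x42 0x01 (Type 33, TID 1) yields the payload header bytes 0x60 0x01 under this sketch.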
1258 If sprop-max-don-diff is greater than 0 for any of the RTP 1259 streams, the DONL field MUST be present in an aggregation unit 1260 that is the first aggregation unit in an AP, and the variable DON 1261 for the aggregated NAL unit is derived as equal to the value of 1262 the DONL field. Otherwise (sprop-max-don-diff is equal to 0 for 1263 all the RTP streams), the DONL field MUST NOT be present in an 1264 aggregation unit that is the first aggregation unit in an AP. 1266 An aggregation unit that is not the first aggregation unit in an 1267 AP consists of a conditional 8-bit DOND field followed by a 16- 1268 bit unsigned size information (in network byte order) that 1269 indicates the size of the NAL unit in bytes (excluding these two 1270 octets, but including the NAL unit header), followed by the NAL 1271 unit itself, including its NAL unit header, as shown in Figure 6. 1273 0 1 2 3 1274 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1275 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1276 : DOND (cond) | NALU size | 1277 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1278 | | 1279 | NAL unit | 1280 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1281 | : 1282 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1284 Figure 6 The structure of an aggregation unit that is not the 1285 first aggregation unit in an AP 1287 When present, the DOND field plus 1 specifies the difference 1288 between the decoding order number values of the current 1289 aggregated NAL unit and the preceding aggregated NAL unit in the 1290 same AP. 1292 If sprop-max-don-diff is greater than 0 for any of the RTP 1293 streams, the DOND field MUST be present in an aggregation unit 1294 that is not the first aggregation unit in an AP, and the variable 1295 DON for the aggregated NAL unit is derived as equal to the DON of 1296 the preceding aggregated NAL unit in the same AP plus the value 1297 of the DOND field plus 1 modulo 65536. 
Otherwise (sprop-max-don- 1298 diff is equal to 0 for all the RTP streams), the DOND field MUST 1299 NOT be present in an aggregation unit that is not the first 1300 aggregation unit in an AP, and in this case the transmission 1301 order and decoding order of NAL units carried in the AP are the 1302 same as the order the NAL units appear in the AP. 1304 Figure 7 presents an example of an AP that contains two 1305 aggregation units, labeled as 1 and 2 in the figure, without the 1306 DONL and DOND fields being present. 1308 0 1 2 3 1309 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1310 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1311 | RTP Header | 1312 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1313 | PayloadHdr (Type=48) | NALU 1 Size | 1314 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1315 | NALU 1 HDR | | 1316 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 1 Data | 1317 | . . . | 1318 | | 1319 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1320 | . . . | NALU 2 Size | NALU 2 HDR | 1321 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1322 | NALU 2 HDR | | 1323 +-+-+-+-+-+-+-+-+ NALU 2 Data | 1324 | . . . | 1325 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1326 | :...OPTIONAL RTP padding | 1327 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1329 Figure 7 An example of an AP packet containing two aggregation 1330 units without the DONL and DOND fields 1332 Figure 8 presents an example of an AP that contains two 1333 aggregation units, labeled as 1 and 2 in the figure, with the 1334 DONL and DOND fields being present. 
1336 0 1 2 3 1337 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1338 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1339 | RTP Header | 1340 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1341 | PayloadHdr (Type=48) | NALU 1 DONL | 1342 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1343 | NALU 1 Size | NALU 1 HDR | 1344 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1345 | | 1346 | NALU 1 Data . . . | 1347 | | 1348 + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1349 | | NALU 2 DOND | NALU 2 Size | 1350 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1351 | NALU 2 HDR | | 1352 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | 1353 | | 1354 | . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1355 | :...OPTIONAL RTP padding | 1356 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1358 Figure 8 An example of an AP containing two aggregation units 1359 with the DONL and DOND fields 1361 4.4.3 Fragmentation Units (FUs) 1363 Fragmentation units (FUs) are introduced to enable fragmenting a 1364 single NAL unit into multiple RTP packets, possibly without 1365 cooperation or knowledge of the HEVC encoder. A fragment of a 1366 NAL unit consists of an integer number of consecutive octets of 1367 that NAL unit. Fragments of the same NAL unit MUST be sent in 1368 consecutive order with ascending RTP sequence numbers (with no 1369 other RTP packets within the same RTP stream being sent between 1370 the first and last fragment). 1372 When a NAL unit is fragmented and conveyed within FUs, it is 1373 referred to as a fragmented NAL unit. APs MUST NOT be 1374 fragmented. FUs MUST NOT be nested; i.e. an FU must not contain 1375 a subset of another FU. 1377 The RTP timestamp of an RTP packet carrying an FU is set to the 1378 NALU-time of the fragmented NAL unit. 
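As a non-normative illustration of the fragmentation rules of this section, the following Python sketch splits one NAL unit into FUs. It assumes that sprop-max-don-diff is equal to 0 for all RTP streams, so that no DONL field is written, and it uses the FU layout defined in the remainder of this section (a PayloadHdr with Type equal to 49, followed by the one-octet FU header carrying the S bit, the E bit, and FuType). The names fragment_nal_unit and max_fu_payload are hypothetical and are not defined by this memo.

```python
FU_TYPE = 49  # Type value carried in the PayloadHdr of an FU


def fragment_nal_unit(nal: bytes, max_fu_payload: int) -> list:
    """Split one NAL unit (2-octet header plus payload) into FU packet payloads.

    Sketch only: assumes sprop-max-don-diff == 0, so no DONL field is emitted.
    """
    if max_fu_payload < 1:
        raise ValueError("max_fu_payload must be at least 1")
    if len(nal) < 2:
        raise ValueError("a NAL unit needs a 2-octet header")
    body = nal[2:]
    if len(body) <= max_fu_payload:
        # One FU must not carry a whole NAL unit (S and E both set).
        raise ValueError("fits in one packet; use a single NAL unit packet")

    header = int.from_bytes(nal[:2], "big")
    nal_type = (header >> 9) & 0x3F
    # Keep F (bit 15), LayerId and TID (bits 8..0); replace Type with 49.
    payload_hdr = (header & 0x81FF) | (FU_TYPE << 9)

    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    fus = []
    for index, chunk in enumerate(chunks):
        s_bit = 1 if index == 0 else 0
        e_bit = 1 if index == len(chunks) - 1 else 0
        fu_header = (s_bit << 7) | (e_bit << 6) | nal_type
        fus.append(payload_hdr.to_bytes(2, "big") + bytes([fu_header]) + chunk)
    return fus
```

Because the sketch refuses input that fits into one packet, every result holds at least two FUs, and the NAL unit header itself is never copied into an FU payload.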
1380 An FU consists of a payload header (denoted as PayloadHdr), an FU 1381 header of one octet, a conditional 16-bit DONL field (in network 1382 byte order), and an FU payload, as shown in Figure 9. 1384 0 1 2 3 1385 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1386 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1387 | PayloadHdr (Type=49) | FU header | DONL (cond) | 1388 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| 1389 | DONL (cond) | | 1390 |-+-+-+-+-+-+-+-+ | 1391 | FU payload | 1392 | | 1393 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1394 | :...OPTIONAL RTP padding | 1395 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1397 Figure 9 The structure of an FU 1399 The fields in the payload header are set as follows. The Type 1400 field MUST be equal to 49. The fields F, LayerId, and TID MUST 1401 be equal to the fields F, LayerId, and TID, respectively, of the 1402 fragmented NAL unit. 1404 The FU header consists of an S bit, an E bit, and a 6-bit FuType 1405 field, as shown in Figure 10. 1407 +---------------+ 1408 |0|1|2|3|4|5|6|7| 1409 +-+-+-+-+-+-+-+-+ 1410 |S|E| FuType | 1411 +---------------+ 1413 Figure 10 The structure of FU header 1415 The semantics of the FU header fields are as follows: 1416 S: 1 bit 1417 When set to one, the S bit indicates the start of a fragmented 1418 NAL unit i.e. the first byte of the FU payload is also the 1419 first byte of the payload of the fragmented NAL unit. When 1420 the FU payload is not the start of the fragmented NAL unit 1421 payload, the S bit MUST be set to zero. 1423 E: 1 bit 1424 When set to one, the E bit indicates the end of a fragmented 1425 NAL unit, i.e. the last byte of the payload is also the last 1426 byte of the fragmented NAL unit. When the FU payload is not 1427 the last fragment of a fragmented NAL unit, the E bit MUST be 1428 set to zero. 
1430 FuType: 6 bits 1431 The field FuType MUST be equal to the field Type of the 1432 fragmented NAL unit. 1434 The DONL field, when present, specifies the value of the 16 least 1435 significant bits of the decoding order number of the fragmented 1436 NAL unit. 1438 If sprop-max-don-diff is greater than 0 for any of the RTP 1439 streams, and the S bit is equal to 1, the DONL field MUST be 1440 present in the FU, and the variable DON for the fragmented NAL 1441 unit is derived as equal to the value of the DONL field. 1442 Otherwise (sprop-max-don-diff is equal to 0 for all the RTP 1443 streams, or the S bit is equal to 0), the DONL field MUST NOT be 1444 present in the FU. 1446 A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e. 1447 the Start bit and End bit must not both be set to one in the same 1448 FU header. 1450 The FU payload consists of fragments of the payload of the 1451 fragmented NAL unit so that if the FU payloads of consecutive 1452 FUs, starting with an FU with the S bit equal to 1 and ending 1453 with an FU with the E bit equal to 1, are sequentially 1454 concatenated, the payload of the fragmented NAL unit can be 1455 reconstructed. The NAL unit header of the fragmented NAL unit is 1456 not included as such in the FU payload, but rather the 1457 information of the NAL unit header of the fragmented NAL unit is 1458 conveyed in F, LayerId, and TID fields of the FU payload headers 1459 of the FUs and the FuType field of the FU header of the FUs. An 1460 FU payload MUST NOT be empty. 1462 If an FU is lost, the receiver SHOULD discard all following 1463 fragmentation units in transmission order corresponding to the 1464 same fragmented NAL unit, unless the decoder in the receiver is 1465 known to be prepared to gracefully handle incomplete NAL units. 1467 A receiver in an endpoint or in a MANE MAY aggregate the first n- 1468 1 fragments of a NAL unit to an (incomplete) NAL unit, even if 1469 fragment n of that NAL unit is not received. 
In this case, the 1470 forbidden_zero_bit of the NAL unit MUST be set to one to indicate 1471 a syntax violation. 1473 4.4.4 PACI packets 1475 This section specifies the PACI packet structure. The basic 1476 payload header specified in this memo is intentionally limited to 1477 the 16 bits of the NAL unit header so to keep the packetization 1478 overhead to a minimum. However, cases have been identified where 1479 it is advisable to include control information in an easily 1480 accessible position in the packet header, despite the additional 1481 overhead. One such control information is the Temporal 1482 Scalability Control Information as specified in Section 4.5 1483 below. PACI packets carry this and future, similar structures. 1485 The PACI packet structure is based on a payload header extension 1486 mechanism that is generic and extensible to carry payload header 1487 extensions. In this section, the focus lies on the use within 1488 this specification. Section 4.4.4.2 below provides guidance for 1489 the specification designers in how to employ the extension 1490 mechanism in future specifications. 1492 A PACI packet consists of a payload header (denoted as 1493 PayloadHdr), for which the structure follows what is described in 1494 Section 4.2 above. The payload header is followed by the fields 1495 A, cType, PHSsize, F[0..2] and Y. 1497 Figure 11 shows a PACI packet in compliance with this memo; that 1498 is, without any extensions. 1500 0 1 2 3 1501 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1502 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1503 | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| 1504 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1505 | Payload Header Extension Structure (PHES) | 1506 |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=| 1507 | | 1508 | PACI payload: NAL unit | 1509 | . . . 
| 1510 | | 1511 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1512 | :...OPTIONAL RTP padding | 1513 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1515 Figure 11 The structure of a PACI 1517 The fields in the payload header are set as follows. The F bit 1518 MUST be equal to 0. The Type field MUST be equal to 50. The 1519 value of LayerId MUST be a copy of the LayerId field of the PACI 1520 payload NAL unit or NAL-unit-like structure. The value of TID 1521 MUST be a copy of the TID field of the PACI payload NAL unit or 1522 NAL-unit-like structure. 1524 The semantics of other fields are as follows: 1526 A: 1 bit 1527 Copy of the F bit of the PACI payload NAL unit or NAL-unit- 1528 like structure. 1530 cType: 6 bits 1531 Copy of the Type field of the PACI payload NAL unit or NAL- 1532 unit-like structure. 1534 PHSsize: 5 bits 1535 Indicates the length of the PHES field. The value is limited 1536 to be less than or equal to 32 octets, to simplify encoder 1537 design for MTU size matching. 1539 F0 1540 This field equal to 1 specifies the presence of a temporal 1541 scalability support extension in the PHES. 1543 F1, F2 1544 MUST be 0, available for future extensions, see Section 1545 4.4.4.2. Receivers compliant with this version of the HEVC 1546 payload format MUST ignore F1=1 and/or F2=1, and also ignore 1547 any information in the PHES indicated as present by F1=1 1548 and/or F2=1. 1550 Informative note: The receiver can do that by first 1551 decoding information associated with F0=1, and then 1552 skipping over any remaining bytes of the PHES based on the 1553 value of PHSsize. 1555 Y: 1 bit 1556 MUST be 0, available for future extensions, see Section 1557 4.4.4.2. Receivers compliant with this version of the HEVC 1558 payload format MUST ignore Y=1, and also ignore any 1559 information in the PHES indicated as present by Y. 1561 PHES: variable number of octets 1562 A variable number of octets as indicated by the value of 1563 PHSsize. 
1565 PACI Payload 1566 The single NAL unit packet or NAL-unit-like structure (such 1567 as: FU or AP) to be carried, not including the first two 1568 octets. 1570 Informative note: The first two octets of the NAL unit or 1571 NAL-unit-like structure carried in the PACI payload are not 1572 included in the PACI payload. Rather, the respective values 1573 are copied in locations of the PayloadHdr of the RTP 1574 packet. This design offers two advantages: first, the 1575 overall structure of the payload header is preserved, i.e. 1576 there is no special case of payload header structure that 1577 needs to be implemented for PACI. Second, no additional 1578 overhead is introduced. 1580 A PACI payload MAY be a single NAL unit, an FU, or an AP. 1581 PACIs MUST NOT be fragmented or aggregated. The following 1582 subsection documents the reasons for these design choices. 1584 4.4.4.1 Reasons for the PACI rules (informative) 1586 A PACI cannot be fragmented. If a PACI could be fragmented, and 1587 a fragment other than the first fragment would get lost, access 1588 to the information in the PACI would not be possible. Therefore, 1589 a PACI must not be fragmented. In other words, an FU must not 1590 carry (fragments of) a PACI. 1592 A PACI cannot be aggregated. Aggregation of PACIs is inadvisable 1593 from a compression viewpoint, as, in many cases, several to be 1594 aggregated NAL units would share identical PACI fields and values 1595 which would be carried redundantly for no reason. Most, if not 1596 all the practical effects of PACI aggregation can be achieved by 1597 aggregating NAL units and bundling them with a PACI (see below). 1598 Therefore, a PACI must not be aggregated. In other words, an AP 1599 must not contain a PACI. 1601 The payload of a PACI can be a fragment. 
Both middleboxes and 1602 sending systems with inflexible (often hardware-based) encoders 1603 occasionally find themselves in situations where a PACI and its 1604 headers, combined, are larger than the MTU size. In such a 1605 scenario, the middlebox or sender can fragment the NAL unit and 1606 encapsulate the fragment in a PACI. Doing so preserves the 1607 payload header extension information for all fragments, allowing 1608 downstream middleboxes and the receiver to take advantage of that 1609 information. Therefore, a sender may place a fragment into a 1610 PACI, and a receiver must be able to handle such a PACI. 1612 The payload of a PACI can be an aggregation NAL unit. HEVC 1613 bitstreams can contain unevenly sized and/or small (when compared 1614 to the MTU size) NAL units. In order to efficiently packetize 1615 such small NAL units, APs were introduced. The benefits of APs 1616 are independent of the need for a payload header extension. 1617 Therefore, a sender may place an AP into a PACI, and a receiver 1618 must be able to handle such a PACI. 1620 4.4.4.2 PACI extensions (Informative) 1622 This section includes recommendations for future specification 1623 designers on how to extend the PACI syntax to accommodate future 1624 extensions. Obviously, designers are free to specify whatever 1625 appears to be appropriate to them at the time of their design. 1626 However, a lot of thought has been invested into the extension 1627 mechanism described below, and we suggest that deviations from it 1628 warrant a good explanation. 1630 This memo defines only a single payload header extension 1631 (Temporal Scalability Control Information, described below in 1632 Section 4.5), and, therefore, only the F0 bit carries semantics. 1633 F1 and F2 are already named (and not just marked as reserved, as 1634 a typical video spec designer would do). They are intended to 1635 signal two additional extensions.
The Y bit allows further F and Y bits 1636 to be added, recursively, to extend the mechanism 1637 beyond 3 possible payload header extensions. It is suggested to 1638 define a new packet type (using a different value for Type) when 1639 assigning the F1, F2, or Y bits semantics different from those 1640 suggested below. 1642 When a Y bit is set, an 8-bit flag-extension is inserted after 1643 the Y bit. A flag-extension consists of 7 flags F[n..n+6], and 1644 another Y bit. 1646 The basic PACI header already includes F0, F1, and F2. 1647 Therefore, the Fx bits in the first flag-extension are numbered 1648 F3, F4, ..., F9, the F bits in the second flag-extension are 1649 numbered F10, F11, ..., F16, and so forth. As a result, at least 1650 3 Fx bits are always in the PACI, but the number of Fx bits (and 1651 associated types of extensions) can be increased by setting the 1652 next Y bit and adding an octet of flag-extensions, carrying 7 1653 flags and another Y bit. The size of this list of flags is 1654 subject to the limits specified in Section 4.4.4 (32 octets for 1655 all flag-extensions and the PHES information combined). 1657 Each of the F bits can indicate either the presence of 1658 information in the Payload Header Extension Structure (PHES), 1659 described below, or a given F bit can indicate a certain 1660 condition, without including additional information in the PHES. 1662 When a spec developer devises a new syntax that takes advantage 1663 of the PACI extension mechanism, they must follow the 1664 constraints listed below; otherwise the extension mechanism may 1665 break. 1667 1) The fields added for a particular Fx bit MUST be fixed in 1668 length and not depend on what other Fx bits are set (no 1669 parsing dependency). 1670 2) The Fx bits must be assigned in order.
1671 3) An implementation that supports the n-th Fn bit for any 1672 value of n must understand the syntax (though not 1673 necessarily the semantics) of the fields Fk (with k < n), so 1674 as to be able either to use those bits when present, or at 1675 least to skip over them. 1677 4.5 Temporal Scalability Control Information 1679 This section describes the single payload header extension 1680 defined in this specification, known as Temporal Scalability 1681 Control Information (TSCI). If, in the future, additional 1682 payload header extensions become necessary, they could be 1683 specified in this section of an updated version of this document, 1684 or in their own documents. 1686 When F0 is set to 1 in a PACI, this specifies that the PHES field 1687 includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as 1688 follows: 1690 0 1 2 3 1691 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1692 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1693 | PayloadHdr (Type=50) |A| cType | PHSsize |F0..2|Y| 1694 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1695 | TL0PICIDX | IrapPicID |S|E| RES | | 1696 |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1697 | .... | 1698 | PACI payload: NAL unit | 1699 | | 1700 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1701 | :...OPTIONAL RTP padding | 1702 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1704 Figure 12 The structure of a PACI with a PHES containing a TSCI 1706 TL0PICIDX (8 bits) 1707 When present, the TL0PICIDX field MUST be set equal to 1708 temporal_sub_layer_zero_idx as specified in Section D.3.22 of 1709 [H.265] for the access unit containing the NAL unit in the 1710 PACI. 1712 IrapPicID (8 bits) 1713 When present, the IrapPicID field MUST be set equal to 1714 irap_pic_id as specified in Section D.3.22 of [H.265] for the 1715 access unit containing the NAL unit in the PACI.
1717 S (1 bit) 1718 The S bit MUST be set to 1 if any of the following conditions 1719 is true and MUST be set to 0 otherwise: 1720 o The NAL unit in the payload of the PACI is the first VCL NAL 1721 unit, in decoding order, of a picture. 1722 o The NAL unit in the payload of the PACI is an AP and the NAL 1723 unit in the first contained aggregation unit is the first 1724 VCL NAL unit, in decoding order, of a picture. 1725 o The NAL unit in the payload of the PACI is an FU with its S 1726 bit equal to 1 and the FU payload containing a fragment of 1727 the first VCL NAL unit, in decoding order of a picture. 1729 E (1 bit) 1730 The E bit MUST be set to 1 if any of the following conditions 1731 is true and MUST be set to 0 otherwise: 1732 o The NAL unit in the payload of the PACI is the last VCL NAL 1733 unit, in decoding order, of a picture. 1734 o The NAL unit in the payload of the PACI is an AP and the NAL 1735 unit in the last contained aggregation unit is the last VCL 1736 NAL unit, in decoding order, of a picture. 1737 o The NAL unit in the payload of the PACI is an FU with its E 1738 bit equal to 1 and the FU payload containing a fragment of 1739 the last VCL NAL unit, in decoding order of a picture. 1741 RES (6 bits) 1742 MUST be equal to 0. Reserved for future extensions. 1744 The value of PHSsize MUST be set to 3. Receivers MUST allow 1745 other values of the fields F0, F1, F2, Y, and PHSsize, and MUST 1746 ignore any additional fields, when present, than specified above 1747 in the PHES. 1749 4.6 Decoding Order Number 1751 For each NAL unit, the variable AbsDon is derived, representing 1752 the decoding order number that is indicative of the NAL unit 1753 decoding order. 1755 Let NAL unit n be the n-th NAL unit in transmission order within 1756 an RTP stream. 1758 If sprop-max-don-diff is equal to 0 for all the RTP streams 1759 carrying the HEVC bitstream, AbsDon[n], the value of AbsDon for 1760 NAL unit n, is derived as equal to n. 
1762 Otherwise (sprop-max-don-diff is greater than 0 for any of the 1763 RTP streams), AbsDon[n] is derived as follows, where DON[n] is 1764 the value of the variable DON for NAL unit n: 1766 o If n is equal to 0 (i.e. NAL unit n is the very first NAL unit 1767 in transmission order), AbsDon[0] is set equal to DON[0]. 1769 o Otherwise (n is greater than 0), the following applies for 1770 derivation of AbsDon[n]: 1772 If DON[n] == DON[n-1], 1773 AbsDon[n] = AbsDon[n-1] 1775 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), 1776 AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] 1778 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), 1779 AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] 1781 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), 1782 AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - 1783 DON[n]) 1785 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), 1786 AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) 1788 For any two NAL units m and n, the following applies: 1790 o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n 1791 follows NAL unit m in NAL unit decoding order. 1793 o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding 1794 order of the two NAL units can be in either order. 1796 o AbsDon[n] less than AbsDon[m] indicates that NAL unit n 1797 precedes NAL unit m in decoding order. 1799 Informative note: When two consecutive NAL units in the NAL 1800 unit decoding order have different values of AbsDon, the 1801 absolute difference between the two AbsDon values may be 1802 greater than or equal to 1. 1804 Informative note: There are multiple reasons to allow for the 1805 absolute difference of the values of AbsDon for two 1806 consecutive NAL units in the NAL unit decoding order to be 1807 greater than one. An increment by one is not required, as at 1808 the time of associating values of AbsDon to NAL units, it may 1809 not be known whether all NAL units are to be delivered to the 1810 receiver. 
For example, a gateway may not forward VCL NAL 1811 units of higher sub-layers or some SEI NAL units when there is 1812 congestion in the network. In another example, the first 1813 intra-coded picture of a pre-encoded clip is transmitted in 1814 advance to ensure that it is readily available in the 1815 receiver, and when transmitting the first intra-coded picture, 1816 the originator does not exactly know how many NAL units will 1817 be encoded before the first intra-coded picture of the pre- 1818 encoded clip follows in decoding order. Thus, the values of 1819 AbsDon for the NAL units of the first intra-coded picture of 1820 the pre-encoded clip have to be estimated when they are 1821 transmitted, and gaps in values of AbsDon may occur. Another 1822 example is MRST or MRMT with sprop-max-don-diff greater than 1823 0, where the AbsDon values must indicate cross-layer decoding 1824 order for NAL units conveyed in all the RTP streams. 1826 5 Packetization Rules 1828 The following packetization rules apply: 1830 o If sprop-max-don-diff is greater than 0 for any of the RTP 1831 streams, the transmission order of NAL units carried in the 1832 RTP stream MAY be different than the NAL unit decoding order 1833 and the NAL unit output order. Otherwise (sprop-max-don-diff 1834 is equal to 0 for all the RTP streams), the transmission order 1835 of NAL units carried in the RTP stream MUST be the same as the 1836 NAL unit decoding order, and, when tx-mode is equal to "MRST" 1837 or "MRMT", MUST also be the same as the NAL unit output order. 1839 o A NAL unit of a small size SHOULD be encapsulated in an 1840 aggregation packet together with one or more other NAL units 1841 in order to avoid the unnecessary packetization overhead for 1842 small NAL units. For example, non-VCL NAL units such as 1843 access unit delimiters, parameter sets, or SEI NAL units are 1844 typically small and can often be aggregated with VCL NAL units 1845 without violating MTU size constraints. 
1847 o Each non-VCL NAL unit SHOULD, when possible from an MTU size 1848 match viewpoint, be encapsulated in an aggregation packet 1849 together with its associated VCL NAL unit, as typically a non- 1850 VCL NAL unit would be meaningless without the associated VCL 1851 NAL unit being available. 1853 o For carrying exactly one NAL unit in an RTP packet, a single 1854 NAL unit packet MUST be used. 1856 6 De-packetization Process 1858 The general concept behind de-packetization is to get the NAL 1859 units out of the RTP packets in an RTP stream and all RTP streams 1860 the RTP stream depends on, if any, and pass them to the decoder 1861 in the NAL unit decoding order. 1863 The de-packetization process is implementation dependent. 1864 Therefore, the following description should be seen as an example 1865 of a suitable implementation. Other schemes may be used as well 1866 as long as the output for the same input is the same as that of the 1867 process described below. The output is the same when the set of 1868 output NAL units and their order are both identical. 1869 Optimizations relative to the described algorithms are possible. 1871 All normal RTP mechanisms related to buffer management apply. In 1872 particular, duplicated or outdated RTP packets (as indicated by 1873 the RTP sequence number and the RTP timestamp) are removed. To 1874 determine the exact time for decoding, factors such as a possible 1875 intentional delay to allow for proper inter-stream 1876 synchronization must be taken into account. 1878 NAL units with NAL unit type values in the range of 0 to 47, 1879 inclusive, may be passed to the decoder. NAL-unit-like structures 1880 with NAL unit type values in the range of 48 to 63, inclusive, 1881 MUST NOT be passed to the decoder.
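Recovering the NAL unit decoding order depends on the AbsDon values derived in Section 4.6. As a non-normative aid, the Python sketch below applies that derivation to a list of DON values given in transmission order within one RTP stream, for the case where sprop-max-don-diff is greater than 0. The function name abs_don_sequence is hypothetical; the five branches mirror the five conditions of Section 4.6 one for one.

```python
def abs_don_sequence(dons):
    """Map 16-bit DON values (transmission order) to AbsDon values.

    Sketch of the Section 4.6 derivation for sprop-max-don-diff > 0.
    """
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            # The very first NAL unit: AbsDon[0] = DON[0].
            abs_dons.append(don)
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[n - 1]
        if don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            # Forward step without wrap-around.
            cur = prev_abs + don - prev_don
        elif don < prev_don and prev_don - don >= 32768:
            # Forward step across the 65535 -> 0 wrap-around.
            cur = prev_abs + 65536 - prev_don + don
        elif don > prev_don and don - prev_don >= 32768:
            # Backward step across the wrap-around.
            cur = prev_abs - (prev_don + 65536 - don)
        else:
            # don < prev_don and prev_don - don < 32768: backward step.
            cur = prev_abs - (prev_don - don)
        abs_dons.append(cur)
    return abs_dons
```

For example, the DON values 65534, 65535, 0, 2 yield the AbsDon values 65534, 65535, 65536, 65538, so the wrap-around of the 16-bit DON does not disturb the ordering.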
1883 The receiver includes a receiver buffer, which is used to 1884 compensate for transmission delay jitter within individual RTP 1885 streams and across RTP streams, to reorder NAL units from 1886 transmission order to the NAL unit decoding order, and to recover 1887 the NAL unit decoding order in MRST or MRMT, when applicable. In 1888 this section, the receiver operation is described under the 1889 assumption that there is no transmission delay jitter within an 1890 RTP stream and across RTP streams. To distinguish it from a 1891 practical receiver buffer that is also used for compensation of 1892 transmission delay jitter, the receiver buffer is hereafter 1893 called the de-packetization buffer in this section. Receivers 1894 should also prepare for transmission delay jitter; i.e. either 1895 reserve separate buffers for transmission delay jitter buffering 1896 and de-packetization buffering or use a receiver buffer for both 1897 transmission delay jitter and de-packetization. Moreover, 1898 receivers should take transmission delay jitter into account in 1899 the buffering operation; e.g. by additional initial buffering 1900 before starting decoding and playback. 1902 When sprop-max-don-diff is equal to 0 for all the received RTP 1903 streams, the de-packetization buffer size is zero bytes and the 1904 process described in the remainder of this paragraph applies. 1905 When there is only one RTP stream received, the NAL units carried 1906 in the single RTP stream are directly passed to the decoder in 1907 their transmission order, which is identical to their decoding 1908 order. When there is more than one RTP stream received, the NAL 1909 units carried in the multiple RTP streams are passed to the 1910 decoder in their NTP timestamp order.
When there are several NAL 1911 units of different RTP streams with the same NTP timestamp, the 1912 order in which they are passed to the decoder is their dependency order, 1913 where NAL units of a dependee RTP stream are passed to the 1914 decoder prior to the NAL units of the dependent RTP stream. When 1915 there are several NAL units of the same RTP stream with the same 1916 NTP timestamp, the order in which they are passed to the decoder is their 1917 transmission order. 1919 Informative note: The mapping between RTP and NTP 1920 timestamps is conveyed in RTCP SR packets. In addition, 1921 the mechanisms for faster media timestamp synchronization 1922 discussed in [RFC6051] may be used to speed up the 1923 acquisition of the RTP-to-wall-clock mapping. 1925 When sprop-max-don-diff is greater than 0 for any of the received 1926 RTP streams, the process described in the remainder of this 1927 section applies. 1929 There are two buffering states in the receiver: initial buffering 1930 and buffering while playing. Initial buffering starts when the 1931 reception is initialized. After initial buffering, decoding and 1932 playback are started, and the buffering-while-playing mode is 1933 used. 1935 Regardless of the buffering state, the receiver stores incoming 1936 NAL units, in reception order, into the de-packetization buffer. 1937 NAL units carried in RTP packets are stored in the de- 1938 packetization buffer individually, and the value of AbsDon is 1939 calculated and stored for each NAL unit. When MRST or MRMT is in 1940 use, NAL units of all RTP streams of a bitstream are stored in 1941 the same de-packetization buffer. When NAL units carried in any 1942 two RTP streams are available to be placed into the de- 1943 packetization buffer, those NAL units carried in the RTP stream 1944 that is lower in the dependency tree are placed into the buffer 1945 first.
For example, if RTP stream A depends on RTP stream B, 1946 then NAL units carried in RTP stream B are placed into the buffer 1947 first. 1949 Initial buffering lasts until condition A (the difference between 1950 the greatest and smallest AbsDon values of the NAL units in the 1951 de-packetization buffer is greater than or equal to the value of 1952 sprop-max-don-diff of the highest RTP stream) or condition B (the 1953 number of NAL units in the de-packetization buffer is greater 1954 than the value of sprop-depack-buf-nalus) is true. 1956 After initial buffering, whenever condition A or condition B is 1957 true, the following operation is repeatedly applied until both 1958 condition A and condition B become false: 1960 o The NAL unit in the de-packetization buffer with the smallest 1961 value of AbsDon is removed from the de-packetization buffer 1962 and passed to the decoder. 1964 When no more NAL units are flowing into the de-packetization 1965 buffer, all NAL units remaining in the de-packetization buffer 1966 are removed from the buffer and passed to the decoder in the 1967 order of increasing AbsDon values. 1969 7 Payload Format Parameters 1971 This section specifies the parameters that MAY be used to select 1972 optional features of the payload format and certain features or 1973 properties of the bitstream or the RTP stream. The parameters 1974 are specified here as part of the media type registration for the 1975 HEVC codec. A mapping of the parameters into the Session 1976 Description Protocol (SDP) [RFC4566] is also provided for 1977 applications that use SDP. Equivalent parameters could be 1978 defined elsewhere for use with control protocols that do not use 1979 SDP. 1981 7.1 Media Type Registration 1983 The media subtype for the HEVC codec is allocated from the IETF 1984 tree. 1986 The receiver MUST ignore any unrecognized parameter. 
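The rule that a receiver MUST ignore any unrecognized parameter can be illustrated with a short, non-normative Python sketch. It assumes that the parameters arrive as a semicolon-separated list of name=value pairs, as they would in an SDP a=fmtp attribute. The function name parse_parameters and the KNOWN_PARAMETERS set are hypothetical, and the set is a deliberately incomplete subset of the parameter names used in this memo.

```python
# Illustrative subset only; a real receiver would list every parameter
# of the media type registration that it implements.
KNOWN_PARAMETERS = {
    "profile-space", "tier-flag", "profile-id",
    "profile-compatibility-indicator", "interop-constraints", "level-id",
    "sprop-max-don-diff", "sprop-depack-buf-nalus",
}


def parse_parameters(fmtp_value: str) -> dict:
    """Return the recognized name=value pairs; silently skip all others."""
    params = {}
    for item in fmtp_value.split(";"):
        name, sep, value = item.strip().partition("=")
        if not sep:
            continue  # not a name=value pair, nothing to record
        name = name.strip()
        if name in KNOWN_PARAMETERS:
            params[name] = value.strip()
        # Unrecognized names are ignored rather than treated as errors.
    return params
```

Skipping unknown names instead of rejecting the whole description keeps a receiver interoperable with senders that use parameters defined after the receiver was written.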
1988 Media Type name: video 1990 Media subtype name: H265 1992 Required parameters: none 1994 OPTIONAL parameters: 1996 profile-space, tier-flag, profile-id, profile-compatibility- 1997 indicator, interop-constraints, and level-id: 1999 These parameters indicate the profile, tier, default level, 2000 and some constraints of the bitstream carried by the RTP 2001 stream and all RTP streams the RTP stream depends on, or a 2002 specific set of the profile, tier, default level, and some 2003 constraints the receiver supports. 2005 The profile and some constraints are indicated collectively 2006 by profile-space, profile-id, profile-compatibility- 2007 indicator, and interop-constraints. The profile specifies 2008 the subset of coding tools that may have been used to 2009 generate the bitstream or that the receiver supports. 2011 Informative note: There are 32 values of profile-id, and 2012 there are 32 flags in profile-compatibility-indicator, 2013 each flag corresponding to one value of profile-id. 2014 According to HEVC version 1 in [HEVC], when more than 2015 one of the 32 flags is set for a bitstream, the 2016 bitstream would comply with all the profiles 2017 corresponding to the set flags. However, in a draft of 2018 HEVC version 2 in [HEVC draft v2], subclause A.3.5, 19 2019 Format Range Extensions profiles have been specified, 2020 all using the same value of profile-id (4), 2021 differentiated by some of the 48 bits in interop- 2022 constraints - this (rather unexpected way of profile 2023 signalling) means that one of the 32 flags may 2024 correspond to multiple profiles. 
            To be able to support any HEVC extension profile that
            might be specified and indicated using profile-space,
            profile-id, profile-compatibility-indicator, and
            interop-constraints in the future, it would be safe to
            require symmetric use of these parameters in SDP
            offer/answer unless recv-sub-layer-id is included in the
            SDP answer for choosing one of the sub-layers offered.

         The tier is indicated by tier-flag.  The default level is
         indicated by level-id.  The tier and the default level
         specify the limits on values of syntax elements or
         arithmetic combinations of values of syntax elements that
         are followed when generating the bitstream or that the
         receiver supports.

         A set of profile-space, tier-flag, profile-id,
         profile-compatibility-indicator, interop-constraints, and
         level-id parameters ptlA is said to be consistent with
         another set of these parameters ptlB if any decoder that
         conforms to the profile, tier, level, and constraints
         indicated by ptlB can decode any bitstream that conforms to
         the profile, tier, level, and constraints indicated by
         ptlA.

         In SDP offer/answer, when the SDP answer does not include a
         recv-sub-layer-id parameter that is less than the
         sprop-sub-layer-id parameter in the SDP offer, the
         following applies:

            o  The profile-space, tier-flag, profile-id,
               profile-compatibility-indicator, and
               interop-constraints parameters MUST be used
               symmetrically, i.e. the value of each of these
               parameters in the offer MUST be the same as that in
               the answer, either explicitly signalled or implicitly
               inferred.

            o  The level-id parameter is changeable as long as the
               highest level indicated by the answer is either equal
               to or lower than that in the offer.  Note that the
               highest level is indicated by level-id and
               max-recv-level-id together.
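         The two rules above can be sketched as a check on an SDP
         answer.  This is a minimal illustration that assumes the
         answer does not select a lower sub-layer through
         recv-sub-layer-id and that absent parameters have already
         been replaced by their inferred values.  The function names
         and the dictionary representation are hypothetical.

```python
# Parameters that must carry the same value in offer and answer.
SYMMETRIC = (
    "profile-space", "tier-flag", "profile-id",
    "profile-compatibility-indicator", "interop-constraints",
)


def highest_level(params):
    """Highest level, from level-id and max-recv-level-id together."""
    return params.get("max-recv-level-id", params["level-id"])


def answer_is_acceptable(offer, answer):
    """Check symmetric use and the 'equal or lower level' rule.

    Both arguments are dicts of already-resolved parameter values
    (an assumption of this sketch, not something the text mandates).
    """
    for name in SYMMETRIC:
        if offer[name] != answer[name]:
            return False
    return highest_level(answer) <= highest_level(offer)
```

         An answer that lowers level-id from 120 to 93 passes this
         check, while one that changes profile-id or raises the
         highest level does not.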

         In SDP offer/answer, when the SDP answer does include a
         recv-sub-layer-id parameter that is less than the
         sprop-sub-layer-id parameter in the SDP offer, the set of
         profile-space, tier-flag, profile-id,
         profile-compatibility-indicator, interop-constraints, and
         level-id parameters included in the answer MUST be
         consistent with that for the chosen sub-layer
         representation as indicated in the SDP offer, with the
         exception that the level-id parameter in the SDP answer is
         changeable as long as the highest level indicated by the
         answer is either lower than or equal to that in the offer.

         More specifications of these parameters, including how they
         relate to the values of the profile, tier, and level syntax
         elements specified in [HEVC], are provided below.

      profile-space, profile-id:

         The value of profile-space MUST be in the range of 0 to 3,
         inclusive.  The value of profile-id MUST be in the range of
         0 to 31, inclusive.

         When profile-space is not present, a value of 0 MUST be
         inferred.  When profile-id is not present, a value of 1
         (i.e. the Main profile) MUST be inferred.

         When used to indicate properties of a bitstream,
         profile-space and profile-id are derived from the profile,
         tier, and level syntax elements in SPS or VPS NAL units as
         follows, where general_profile_space, general_profile_idc,
         sub_layer_profile_space[j], and sub_layer_profile_idc[j]
         are specified in [HEVC]:

            If the RTP stream is the highest RTP stream, the
            following applies:

            o  profile-space = general_profile_space
            o  profile-id = general_profile_idc

            Otherwise (the RTP stream is a dependee RTP stream), the
            following applies, with j being the value of the
            sprop-sub-layer-id parameter:

            o  profile-space = sub_layer_profile_space[j]
            o  profile-id = sub_layer_profile_idc[j]

      tier-flag, level-id:

         The value of tier-flag MUST be in the range of 0 to 1,
         inclusive.  The value of level-id MUST be in the range of 0
         to 255, inclusive.

         If the tier-flag and level-id parameters are used to
         indicate properties of a bitstream, they indicate the tier
         and the highest level the bitstream complies with.

         If the tier-flag and level-id parameters are used for
         capability exchange, the following applies.  If
         max-recv-level-id is not present, the default level defined
         by level-id indicates the highest level the codec wishes to
         support.  Otherwise, max-recv-level-id indicates the
         highest level the codec supports for receiving.  For either
         receiving or sending, all levels that are lower than the
         highest level supported MUST also be supported.

         If no tier-flag is present, a value of 0 MUST be inferred,
         and if no level-id is present, a value of 93 (i.e. level
         3.1) MUST be inferred.
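         The inferred values and ranges stated above for
         profile-space, profile-id, tier-flag, and level-id can be
         sketched as follows.  This is an illustration only.  The
         function names are hypothetical, and the conversion from
         level-id to a level number assumes the relation "level
         number times 30" that the example of 93 for level 3.1
         suggests.

```python
# (default value, inclusive upper bound) as stated in the text above.
PTL_RULES = {
    "profile-space": (0, 3),
    "profile-id": (1, 31),    # 1 is the Main profile
    "tier-flag": (0, 1),
    "level-id": (93, 255),    # 93 is level 3.1
}


def resolve_ptl(params):
    """Fill in inferred values and reject out-of-range ones."""
    resolved = {}
    for name, (default, upper) in PTL_RULES.items():
        value = int(params.get(name, default))
        if not 0 <= value <= upper:
            raise ValueError(f"{name}={value} is out of range")
        resolved[name] = value
    return resolved


def level_number(level_id):
    """Level number, assuming level-id equals 30 times the level."""
    return level_id / 30
```

         For example, an empty parameter set resolves to
         profile-space 0, profile-id 1, tier-flag 0, and level-id
         93.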

         When used to indicate properties of a bitstream, the
         tier-flag and level-id parameters are derived from the
         profile, tier, and level syntax elements in SPS or VPS NAL
         units as follows, where general_tier_flag,
         general_level_idc, sub_layer_tier_flag[j], and
         sub_layer_level_idc[j] are specified in [HEVC]:

            If the RTP stream is the highest RTP stream, the
            following applies:

            o  tier-flag = general_tier_flag
            o  level-id = general_level_idc

            Otherwise (the RTP stream is a dependee RTP stream), the
            following applies, with j being the value of the
            sprop-sub-layer-id parameter:

            o  tier-flag = sub_layer_tier_flag[j]
            o  level-id = sub_layer_level_idc[j]

      interop-constraints:

         A base16 [RFC4648] (hexadecimal) representation of six
         bytes of data, consisting of progressive_source_flag,
         interlaced_source_flag, non_packed_constraint_flag,
         frame_only_constraint_flag, and reserved_zero_44bits.

         If the interop-constraints parameter is not present, the
         following MUST be inferred:

            o  progressive_source_flag = 1
            o  interlaced_source_flag = 0
            o  non_packed_constraint_flag = 1
            o  frame_only_constraint_flag = 1
            o  reserved_zero_44bits = 0

         When the interop-constraints parameter is used to indicate
         properties of a bitstream, the following applies, where
         general_progressive_source_flag,
         general_interlaced_source_flag,
         general_non_packed_constraint_flag,
         general_frame_only_constraint_flag,
         general_reserved_zero_44bits,
         sub_layer_progressive_source_flag[j],
         sub_layer_interlaced_source_flag[j],
         sub_layer_non_packed_constraint_flag[j],
         sub_layer_frame_only_constraint_flag[j], and
         sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:

            If the RTP stream is the highest RTP stream, the
            following applies:

            o  progressive_source_flag =
               general_progressive_source_flag
            o  interlaced_source_flag =
               general_interlaced_source_flag
            o  non_packed_constraint_flag =
               general_non_packed_constraint_flag
            o  frame_only_constraint_flag =
               general_frame_only_constraint_flag
            o  reserved_zero_44bits = general_reserved_zero_44bits

            Otherwise (the RTP stream is a dependee RTP stream), the
            following applies, with j being the value of the
            sprop-sub-layer-id parameter:

            o  progressive_source_flag =
               sub_layer_progressive_source_flag[j]
            o  interlaced_source_flag =
               sub_layer_interlaced_source_flag[j]
            o  non_packed_constraint_flag =
               sub_layer_non_packed_constraint_flag[j]
            o  frame_only_constraint_flag =
               sub_layer_frame_only_constraint_flag[j]
            o  reserved_zero_44bits =
               sub_layer_reserved_zero_44bits[j]

         Using interop-constraints for capability exchange results
         in a requirement on any bitstream to be compliant with the
         interop-constraints.

      profile-compatibility-indicator:

         A base16 [RFC4648] representation of four bytes of data.

         When profile-compatibility-indicator is used to indicate
         properties of a bitstream, the following applies, where
         general_profile_compatibility_flag[j] and
         sub_layer_profile_compatibility_flag[i][j] are specified in
         [HEVC]:

            The profile-compatibility-indicator in this case
            indicates profiles that the bitstream conforms to in
            addition to the profile defined by profile-space,
            profile-id, and interop-constraints.  A decoder that
            conforms to any of the profiles the bitstream conforms
            to would be capable of decoding the bitstream.  These
            additional profiles are defined by profile-space, each
            set bit of profile-compatibility-indicator, and
            interop-constraints.

            If the RTP stream is the highest RTP stream, the
            following applies for each value of j in the range of 0
            to 31, inclusive:

            o  bit j of profile-compatibility-indicator =
               general_profile_compatibility_flag[j]

            Otherwise (the RTP stream is a dependee RTP stream), the
            following applies for i equal to sprop-sub-layer-id and
            for each value of j in the range of 0 to 31, inclusive:

            o  bit j of profile-compatibility-indicator =
               sub_layer_profile_compatibility_flag[i][j]

         Using profile-compatibility-indicator for capability
         exchange results in a requirement on any bitstream to be
         compliant with the profile-compatibility-indicator.  This
         is intended to handle cases where any future HEVC profile
         is defined as an intersection of two or more profiles.

         If this parameter is not present, this parameter defaults
         to the following: bit j, with j equal to profile-id, of
         profile-compatibility-indicator is inferred to be equal to
         1, and all other bits are inferred to be equal to 0.

      sprop-sub-layer-id:

         This parameter MAY be used to indicate the highest allowed
         value of TID in the bitstream.  When not present, the value
         of sprop-sub-layer-id is inferred to be equal to 6.

         The value of sprop-sub-layer-id MUST be in the range of 0
         to 6, inclusive.

      recv-sub-layer-id:

         This parameter MAY be used to signal a receiver's choice of
         the offered or declared sub-layer representations in the
         sprop-vps.  The value of recv-sub-layer-id indicates the
         TID of the highest sub-layer of the bitstream that a
         receiver supports.  When not present, the value of
         recv-sub-layer-id is inferred to be equal to the value of
         the sprop-sub-layer-id parameter in the SDP offer.

         The value of recv-sub-layer-id MUST be in the range of 0 to
         6, inclusive.

      max-recv-level-id:

         This parameter MAY be used to indicate the highest level a
         receiver supports.
         The highest level the receiver supports is equal to the
         value of max-recv-level-id divided by 30.

         The value of max-recv-level-id MUST be in the range of 0
         to 255, inclusive.

         When max-recv-level-id is not present, the value is
         inferred to be equal to level-id.

         max-recv-level-id MUST NOT be present when the highest
         level the receiver supports is not higher than the default
         level.

      tx-mode:

         This parameter indicates whether the transmission mode is
         SRST, MRST, or MRMT.

         The value of tx-mode MUST be equal to "SRST", "MRST" or
         "MRMT".  When not present, the value of tx-mode is inferred
         to be equal to "SRST".

         If the value is equal to "MRST", MRST MUST be in use.
         Otherwise, if the value is equal to "MRMT", MRMT MUST be in
         use.  Otherwise (the value is equal to "SRST"), SRST MUST
         be in use.

         The value of tx-mode MUST be equal to "MRST" for all RTP
         streams in an MRST.

         The value of tx-mode MUST be equal to "MRMT" for all RTP
         streams in an MRMT.

      sprop-vps:

         This parameter MAY be used to convey any video parameter
         set NAL unit of the bitstream for out-of-band transmission
         of video parameter sets.  The parameter MAY also be used
         for capability exchange and to indicate sub-stream
         characteristics (i.e. properties of sub-layer
         representations as defined in [HEVC]).  The value of the
         parameter is a comma-separated (',') list of base64
         [RFC4648] representations of the video parameter set NAL
         units as specified in Section 7.3.2.1 of [HEVC].

         The sprop-vps parameter MAY contain one or more than one
         video parameter set NAL unit.  However, all other video
         parameter sets contained in the sprop-vps parameter MUST be
         consistent with the first video parameter set in the
         sprop-vps parameter.
         A video parameter set vpsB is said to be consistent with
         another video parameter set vpsA if any decoder that
         conforms to the profile, tier, level, and constraints
         indicated by the 12 bytes of data starting from the syntax
         element general_profile_space to the syntax element
         general_level_idc, inclusive, in the first
         profile_tier_level( ) syntax structure in vpsA can decode
         any bitstream that conforms to the profile, tier, level,
         and constraints indicated by the 12 bytes of data starting
         from the syntax element general_profile_space to the syntax
         element general_level_idc, inclusive, in the first
         profile_tier_level( ) syntax structure in vpsB.

      sprop-sps:

         This parameter MAY be used to convey sequence parameter set
         NAL units of the bitstream for out-of-band transmission of
         sequence parameter sets.  The value of the parameter is a
         comma-separated (',') list of base64 [RFC4648]
         representations of the sequence parameter set NAL units as
         specified in Section 7.3.2.2 of [HEVC].

      sprop-pps:

         This parameter MAY be used to convey picture parameter set
         NAL units of the bitstream for out-of-band transmission of
         picture parameter sets.  The value of the parameter is a
         comma-separated (',') list of base64 [RFC4648]
         representations of the picture parameter set NAL units as
         specified in Section 7.3.2.3 of [HEVC].

      sprop-sei:

         This parameter MAY be used to convey one or more SEI
         messages that describe bitstream characteristics.  When
         present, a decoder can rely on the bitstream
         characteristics that are described in the SEI messages for
         the entire duration of the session, independently from the
         persistence scopes of the SEI messages as specified in
         [HEVC].

         The value of the parameter is a comma-separated (',') list
         of base64 [RFC4648] representations of SEI NAL units as
         specified in Section 7.3.2.4 of [HEVC].
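         The value format shared by sprop-vps, sprop-sps, sprop-pps,
         and sprop-sei can be decoded as sketched below.  This is an
         illustration under stated assumptions: the function name is
         hypothetical, and the NAL unit type is read from the upper
         bits of the first header byte of the two-byte HEVC NAL unit
         header.

```python
import base64

# HEVC NAL unit types for parameter sets and SEI messages.
NAL_TYPE_NAMES = {32: "VPS", 33: "SPS", 34: "PPS",
                  39: "PREFIX_SEI", 40: "SUFFIX_SEI"}


def decode_sprop(value):
    """Decode a comma-separated list of base64 NAL units.

    Returns a list of (nal_unit_type, nal_unit_bytes) tuples.
    """
    nal_units = []
    for item in value.split(","):
        nal = base64.b64decode(item.strip())
        if len(nal) < 2:
            raise ValueError("NAL unit shorter than its 2-byte header")
        # Header layout: 1 forbidden bit, 6 type bits, then the
        # layer ID and temporal ID fields.
        nal_type = (nal[0] >> 1) & 0x3F
        nal_units.append((nal_type, nal))
    return nal_units
```

         A receiver could use the decoded type to verify, for
         example, that every entry in sprop-vps is indeed a video
         parameter set NAL unit (type 32).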

            Informative note: Intentionally, no list of applicable
            or inapplicable SEI messages is specified here.
            Conveying certain SEI messages in sprop-sei may be
            sensible in some application scenarios and meaningless
            in others.  However, a few examples are described below:

            1) In an environment where the bitstream was created
               from film-based source material, and no splicing is
               going to occur during the lifetime of the session,
               the film grain characteristics SEI message or the
               tone mapping information SEI message are likely
               meaningful, and sending them in sprop-sei rather than
               in the bitstream at each entry point may help save
               bits and allows the renderer to be configured only
               once, avoiding unwanted artifacts.
            2) The structure of pictures information SEI message in
               sprop-sei can be used to inform a decoder of
               information on the NAL unit types, picture order
               count values, and prediction dependencies of a
               sequence of pictures.  Having such knowledge can be
               helpful for error recovery.
            3) Examples of SEI messages that would be meaningless
               to convey in sprop-sei include the decoded picture
               hash SEI message (it is close to impossible that all
               decoded pictures have the same hash), the display
               orientation SEI message when the device is a handheld
               device (as the display orientation may change when
               the handheld device is turned around), or the filler
               payload SEI message (as there is no point in just
               having more bits in SDP).

      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

         These parameters MAY be used to signal the capabilities of
         a receiver implementation.  These parameters MUST NOT be
         used for any other purpose.  The highest level (specified
         by max-recv-level-id) MUST be the highest that the receiver
         is fully capable of supporting.
         max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and
         max-tc MAY be used to indicate capabilities of the receiver
         that extend the required capabilities of the highest level,
         as specified below.

         When more than one parameter from the set (max-lsr,
         max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is
         present, the receiver MUST support all signaled
         capabilities simultaneously.  For example, if both max-lsr
         and max-br are present, the highest level with the
         extension of both the picture rate and bitrate is
         supported.  That is, the receiver is able to decode
         bitstreams in which the luma sample rate is up to max-lsr
         (inclusive), the bitrate is up to max-br (inclusive), the
         coded picture buffer size is derived as specified in the
         semantics of the max-br parameter below, and the other
         properties comply with the highest level specified by
         max-recv-level-id.

            Informative note: When the OPTIONAL media type
            parameters are used to signal the properties of a
            bitstream, and max-lsr, max-lps, max-cpb, max-dpb,
            max-br, max-tr, and max-tc are not present, the values
            of profile-space, tier-flag, profile-id,
            profile-compatibility-indicator, interop-constraints,
            and level-id must always be such that the bitstream
            complies fully with the specified profile, tier, and
            level.

         max-lsr:
            The value of max-lsr is an integer indicating the
            maximum processing rate in units of luma samples per
            second.  The max-lsr parameter signals that the receiver
            is capable of decoding video at a higher rate than is
            required by the highest level.

            When max-lsr is signaled, the receiver MUST be able to
            decode bitstreams that conform to the highest level,
            with the exception that the MaxLumaSR value in Table A-2
            of [HEVC] for the highest level is replaced with the
            value of max-lsr.
            Senders MAY use this knowledge to send pictures of a
            given size at a higher picture rate than is indicated in
            the highest level.

            When not present, the value of max-lsr is inferred to be
            equal to the value of MaxLumaSR given in Table A-2 of
            [HEVC] for the highest level.

            The value of max-lsr MUST be in the range of MaxLumaSR
            to 16 * MaxLumaSR, inclusive, where MaxLumaSR is given
            in Table A-2 of [HEVC] for the highest level.

         max-lps:
            The value of max-lps is an integer indicating the
            maximum picture size in units of luma samples.  The
            max-lps parameter signals that the receiver is capable
            of decoding larger picture sizes than are required by
            the highest level.  When max-lps is signaled, the
            receiver MUST be able to decode bitstreams that conform
            to the highest level, with the exception that the
            MaxLumaPS value in Table A-1 of [HEVC] for the highest
            level is replaced with the value of max-lps.  Senders
            MAY use this knowledge to send larger pictures at a
            proportionally lower picture rate than is indicated in
            the highest level.

            When not present, the value of max-lps is inferred to be
            equal to the value of MaxLumaPS given in Table A-1 of
            [HEVC] for the highest level.

            The value of max-lps MUST be in the range of MaxLumaPS
            to 16 * MaxLumaPS, inclusive, where MaxLumaPS is given
            in Table A-1 of [HEVC] for the highest level.

         max-cpb:
            The value of max-cpb is an integer indicating the
            maximum coded picture buffer size in units of
            CpbBrVclFactor bits for the VCL HRD parameters and in
            units of CpbBrNalFactor bits for the NAL HRD parameters,
            where CpbBrVclFactor and CpbBrNalFactor are defined in
            Section A.4 of [HEVC].  The max-cpb parameter signals
            that the receiver has more memory than the minimum
            amount of coded picture buffer memory required by the
            highest level.
            When max-cpb is signaled, the receiver MUST be able to
            decode bitstreams that conform to the highest level,
            with the exception that the MaxCPB value in Table A-1 of
            [HEVC] for the highest level is replaced with the value
            of max-cpb.  Senders MAY use this knowledge to construct
            coded bitstreams with greater variation of bitrate than
            can be achieved with the MaxCPB value in Table A-1 of
            [HEVC].

            When not present, the value of max-cpb is inferred to be
            equal to the value of MaxCPB given in Table A-1 of
            [HEVC] for the highest level.

            The value of max-cpb MUST be in the range of MaxCPB to
            16 * MaxCPB, inclusive, where MaxCPB is given in Table
            A-1 of [HEVC] for the highest level.

               Informative note: The coded picture buffer is used in
               the hypothetical reference decoder (Annex C of HEVC).
               The use of the hypothetical reference decoder is
               recommended in HEVC encoders to verify that the
               produced bitstream conforms to the standard and to
               control the output bitrate.  Thus, the coded picture
               buffer is conceptually independent of any other
               potential buffers in the receiver, including
               de-packetization and de-jitter buffers.  The coded
               picture buffer need not be implemented in decoders as
               specified in Annex C of HEVC, but rather
               standard-compliant decoders can have any buffering
               arrangements provided that they can decode
               standard-compliant bitstreams.  Thus, in practice,
               the input buffer for a video decoder can be
               integrated with de-packetization and de-jitter
               buffers of the receiver.

         max-dpb:
            The value of max-dpb is an integer indicating the
            maximum decoded picture buffer size in units of decoded
            pictures at the MaxLumaPS for the highest level, i.e.
            the number of decoded pictures at the maximum picture
            size defined by the highest level.  The value of max-dpb
            MUST be in the range of 1 to 16, inclusive.
            The max-dpb parameter signals that the receiver has more
            memory than the minimum amount of decoded picture buffer
            memory required by default, which is MaxDpbPicBuf as
            defined in [HEVC] (equal to 6).  When max-dpb is
            signaled, the receiver MUST be able to decode bitstreams
            that conform to the highest level, with the exception
            that the MaxDpbPicBuf value defined in [HEVC] as 6 is
            replaced with the value of max-dpb.  Consequently, a
            receiver that signals max-dpb MUST be capable of storing
            the following number of decoded pictures (MaxDpbSize) in
            its decoded picture buffer:

               if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
                  MaxDpbSize = Min( 4 * max-dpb, 16 )
               else if( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
                  MaxDpbSize = Min( 2 * max-dpb, 16 )
               else if( PicSizeInSamplesY <=
                        ( ( 3 * MaxLumaPS ) >> 2 ) )
                  MaxDpbSize = Min( ( 4 * max-dpb ) / 3, 16 )
               else
                  MaxDpbSize = max-dpb

            where MaxLumaPS is given in Table A-1 of [HEVC] for the
            highest level and PicSizeInSamplesY is the current size
            of each decoded picture in units of luma samples as
            defined in [HEVC].

            The value of max-dpb MUST be greater than or equal to
            the value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].
            Senders MAY use this knowledge to construct coded
            bitstreams with improved compression.

            When not present, the value of max-dpb is inferred to be
            equal to the value of MaxDpbPicBuf (i.e. 6) as defined
            in [HEVC].

               Informative note: This parameter was added primarily
               to complement a similar codepoint in the ITU-T
               Recommendation H.245, so as to facilitate signaling
               gateway designs.  The decoded picture buffer stores
               reconstructed samples.  There is no relationship
               between the size of the decoded picture buffer and
               the buffers used in RTP, especially de-packetization
               and de-jitter buffers.
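            The MaxDpbSize derivation given above for max-dpb can be
            written as a small function.  This is a sketch only: the
            function name is hypothetical, the division is assumed
            to be an integer division, and the MaxLumaPS value used
            in the usage note is an assumed Level 4 figure that
            should be checked against Table A-1 of [HEVC].

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures the receiver must be able to store.

    max_dpb defaults to 6, the inferred value when max-dpb is absent.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Integer division is an assumption of this sketch.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

            With an assumed MaxLumaPS of 2228224, a 1280x720 picture
            (921600 luma samples) and max-dpb equal to 6 give a
            MaxDpbSize of 12, while a 1920x1080 picture gives 6.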

         max-br:
            The value of max-br is an integer indicating the maximum
            video bitrate in units of CpbBrVclFactor bits per second
            for the VCL HRD parameters and in units of
            CpbBrNalFactor bits per second for the NAL HRD
            parameters, where CpbBrVclFactor and CpbBrNalFactor are
            defined in Section A.4 of [HEVC].

            The max-br parameter signals that the video decoder of
            the receiver is capable of decoding video at a higher
            bitrate than is required by the highest level.

            When max-br is signaled, the video codec of the receiver
            MUST be able to decode bitstreams that conform to the
            highest level, with the following exceptions in the
            limits specified by the highest level:

            o  The value of max-br replaces the MaxBR value in Table
               A-2 of [HEVC] for the highest level.
            o  When the max-cpb parameter is not present, the result
               of the following formula replaces the value of MaxCPB
               in Table A-1 of [HEVC]:

                  (MaxCPB of the highest level) * max-br /
                  (MaxBR of the highest level)

            For example, if a receiver signals capability for Main
            profile Level 2 with max-br equal to 2000, this
            indicates a maximum video bitrate of 2000 kbits/sec for
            VCL HRD parameters, a maximum video bitrate of 2200
            kbits/sec for NAL HRD parameters, and a CPB size of
            2000000 bits (2000000 / 1500000 * 1500000).

            Senders MAY use this knowledge to send higher bitrate
            video as allowed in the level definition of Annex A of
            HEVC to achieve improved video quality.

            When not present, the value of max-br is inferred to be
            equal to the value of MaxBR given in Table A-2 of [HEVC]
            for the highest level.

            The value of max-br MUST be in the range of MaxBR to
            16 * MaxBR, inclusive, where MaxBR is given in Table A-2
            of [HEVC] for the highest level.
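            The Main profile Level 2 example above can be reproduced
            with the following sketch.  The function and key names
            are hypothetical.  The factors 1000 and 1100 and the
            Level 2 limits of 1500 are assumptions taken from the
            figures in that example, and the division is assumed to
            be an integer division.

```python
def limits_with_max_br(max_br, level_max_br, level_max_cpb,
                       max_cpb=None, vcl_factor=1000, nal_factor=1100):
    """Bitrate and CPB limits (in bits) once max-br is signaled.

    level_max_br and level_max_cpb are the MaxBR and MaxCPB table
    values of the highest level, in table units.
    """
    if max_cpb is None:
        # max-cpb absent: scale MaxCPB by the same ratio as the bitrate.
        max_cpb = level_max_cpb * max_br // level_max_br
    return {
        "vcl_bitrate": max_br * vcl_factor,
        "nal_bitrate": max_br * nal_factor,
        "vcl_cpb_bits": max_cpb * vcl_factor,
        "nal_cpb_bits": max_cpb * nal_factor,
    }
```

            Calling limits_with_max_br(2000, 1500, 1500) yields a
            VCL bitrate of 2000000 bits per second, a NAL bitrate of
            2200000 bits per second, and a VCL CPB size of 2000000
            bits, matching the example.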

               Informative note: This parameter was added primarily
               to complement a similar codepoint in the ITU-T
               Recommendation H.245, so as to facilitate signaling
               gateway designs.  The assumption that the network is
               capable of handling such bitrates at any given time
               cannot be made from the value of this parameter.  In
               particular, no conclusion can be drawn that the
               signaled bitrate is possible under congestion control
               constraints.

         max-tr:
            The value of max-tr is an integer indicating the maximum
            number of tile rows.  The max-tr parameter signals that
            the receiver is capable of decoding video with a larger
            number of tile rows than the value allowed by the
            highest level.

            When max-tr is signaled, the receiver MUST be able to
            decode bitstreams that conform to the highest level,
            with the exception that the MaxTileRows value in Table
            A-1 of [HEVC] for the highest level is replaced with the
            value of max-tr.

            Senders MAY use this knowledge to send pictures
            utilizing a larger number of tile rows than the value
            allowed by the highest level.

            When not present, the value of max-tr is inferred to be
            equal to the value of MaxTileRows given in Table A-1 of
            [HEVC] for the highest level.

            The value of max-tr MUST be in the range of MaxTileRows
            to 16 * MaxTileRows, inclusive, where MaxTileRows is
            given in Table A-1 of [HEVC] for the highest level.

         max-tc:
            The value of max-tc is an integer indicating the maximum
            number of tile columns.  The max-tc parameter signals
            that the receiver is capable of decoding video with a
            larger number of tile columns than the value allowed by
            the highest level.

            When max-tc is signaled, the receiver MUST be able to
            decode bitstreams that conform to the highest level,
            with the exception that the MaxTileCols value in Table
            A-1 of [HEVC] for the highest level is replaced with the
            value of max-tc.

            Senders MAY use this knowledge to send pictures
            utilizing a larger number of tile columns than the value
            allowed by the highest level.

            When not present, the value of max-tc is inferred to be
            equal to the value of MaxTileCols given in Table A-1 of
            [HEVC] for the highest level.

            The value of max-tc MUST be in the range of MaxTileCols
            to 16 * MaxTileCols, inclusive, where MaxTileCols is
            given in Table A-1 of [HEVC] for the highest level.

      max-fps:

         The value of max-fps is an integer indicating the maximum
         picture rate in units of pictures per 100 seconds that can
         be effectively processed by the receiver.  The max-fps
         parameter MAY be used to signal that the receiver has a
         constraint in that it is not capable of processing video
         effectively at the full picture rate that is implied by the
         highest level and, when present, one or more of the
         parameters max-lsr, max-lps, and max-br.

         The value of max-fps is not necessarily the picture rate at
         which the maximum picture size can be sent; it constitutes
         a constraint on the maximum picture rate for all
         resolutions.

            Informative note: The max-fps parameter is semantically
            different from max-lsr, max-lps, max-cpb, max-dpb,
            max-br, max-tr, and max-tc in that max-fps is used to
            signal a constraint, lowering the maximum picture rate
            from what is implied by other parameters.

         The encoder MUST use a picture rate equal to or less than
         this value.  In cases where the max-fps parameter is
         absent, the encoder is free to choose any picture rate
         according to the highest level and any signaled optional
         parameters.
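         How max-fps caps the picture rate an encoder may use can be
         sketched as follows.  This is a simplified illustration:
         the function name is hypothetical, and the rate implied by
         the highest level is approximated as the luma sample rate
         limit (MaxLumaSR, or max-lsr when signaled) divided by the
         picture size, ignoring any other level limits.

```python
def max_picture_rate(pic_size_in_samples_y, max_luma_sr, max_fps=None):
    """Upper bound on the picture rate in pictures per second.

    max_fps is in units of pictures per 100 seconds, as in the text.
    """
    # Simplification: rate implied by the luma sample rate limit only.
    rate = max_luma_sr / pic_size_in_samples_y
    if max_fps is not None:
        # max-fps only ever lowers the bound; it never raises it.
        rate = min(rate, max_fps / 100)
    return rate
```

         For example, with an assumed luma sample rate limit of
         66846720 and a 1280x720 picture, a max-fps value of 3000
         lowers the bound from roughly 72.5 to 30 pictures per
         second.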

         The value of max-fps MUST be smaller than or equal to the
         full picture rate that is implied by the highest level and,
         when present, one or more of the parameters max-lsr,
         max-lps, and max-br.

      sprop-max-don-diff:

         If tx-mode is equal to "SRST" and there is no NAL unit
         naluA that is followed in transmission order by any NAL
         unit preceding naluA in decoding order (i.e. the
         transmission order of the NAL units is the same as the
         decoding order), the value of this parameter MUST be equal
         to 0.

         Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the
         decoding order of the NAL units of all the RTP streams is
         the same as the NAL unit transmission order and the NAL
         unit output order, the value of this parameter MUST be
         equal to either 0 or 1.

         Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the
         decoding order of the NAL units of all the RTP streams is
         the same as the NAL unit transmission order but not the
         same as the NAL unit output order, the value of this
         parameter MUST be equal to 1.

         Otherwise, this parameter specifies the maximum absolute
         difference between the decoding order number (i.e., AbsDon)
         values of any two NAL units naluA and naluB, where naluA
         follows naluB in decoding order and precedes naluB in
         transmission order.

         The value of sprop-max-don-diff MUST be an integer in the
         range of 0 to 32767, inclusive.

         When not present, the value of sprop-max-don-diff is
         inferred to be equal to 0.

      sprop-depack-buf-nalus:

         This parameter specifies the maximum number of NAL units
         that precede a NAL unit in transmission order and follow
         the NAL unit in decoding order.

         The value of sprop-depack-buf-nalus MUST be an integer in
         the range of 0 to 32767, inclusive.

         When not present, the value of sprop-depack-buf-nalus is
         inferred to be equal to 0.

         When sprop-max-don-diff is present and greater than 0, this
         parameter MUST be present and the value MUST be greater
         than 0.

      sprop-depack-buf-bytes:

         This parameter signals the required size of the
         de-packetization buffer in units of bytes.  The value of
         the parameter MUST be greater than or equal to the maximum
         buffer occupancy (in units of bytes) of the
         de-packetization buffer as specified in Section 6.

         The value of sprop-depack-buf-bytes MUST be an integer in
         the range of 0 to 4294967295, inclusive.

         When sprop-max-don-diff is present and greater than 0, this
         parameter MUST be present and the value MUST be greater
         than 0.  When not present, the value of
         sprop-depack-buf-bytes is inferred to be equal to 0.

            Informative note: The value of sprop-depack-buf-bytes
            indicates the required size of the de-packetization
            buffer only.  When network jitter can occur, an
            appropriately sized jitter buffer has to be available as
            well.

      depack-buf-cap:

         This parameter signals the capabilities of a receiver
         implementation and indicates the amount of de-packetization
         buffer space in units of bytes that the receiver has
         available for reconstructing the NAL unit decoding order
         from NAL units carried in one or more RTP streams.  A
         receiver is able to handle any RTP stream, and all RTP
         streams the RTP stream depends on, when present, for which
         the value of the sprop-depack-buf-bytes parameter is
         smaller than or equal to this parameter.

         When not present, the value of depack-buf-cap is inferred
         to be equal to 4294967295.  The value of depack-buf-cap
         MUST be an integer in the range of 1 to 4294967295,
         inclusive.

            Informative note: depack-buf-cap indicates the maximum
            possible size of the de-packetization buffer of the
            receiver only, without allowing for network jitter.
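         The way sprop-max-don-diff and sprop-depack-buf-nalus drive
         the de-packetization buffer of Section 6 (conditions A and
         B) can be sketched as follows.  This is a simplified
         illustration, not a normative implementation: the class and
         method names are hypothetical, AbsDon values are assumed to
         be computed already, and buffer occupancy in bytes is not
         tracked.

```python
import heapq


class DepackBuffer:
    """Releases NAL units in increasing AbsDon order."""

    def __init__(self, max_don_diff, depack_buf_nalus):
        self.max_don_diff = max_don_diff
        self.depack_buf_nalus = depack_buf_nalus
        self._heap = []       # entries: (AbsDon, arrival index, NAL unit)
        self._max_abs_don = 0
        self._arrivals = 0

    def _condition_a(self):
        # Greatest minus smallest AbsDon >= sprop-max-don-diff.
        return self._max_abs_don - self._heap[0][0] >= self.max_don_diff

    def _condition_b(self):
        # Number of buffered NAL units > sprop-depack-buf-nalus.
        return len(self._heap) > self.depack_buf_nalus

    def push(self, abs_don, nal):
        """Buffer one NAL unit; return the NAL units now released."""
        if not self._heap or abs_don > self._max_abs_don:
            self._max_abs_don = abs_don
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal))
        self._arrivals += 1
        released = []
        while self._heap and (self._condition_a() or self._condition_b()):
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self):
        """No more input: release the rest in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

         With both parameters equal to 0, condition A holds as soon
         as a NAL unit is buffered, so every NAL unit is passed on
         immediately, which matches transmission in decoding order.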
2810 sprop-segmentation-id: 2812 This parameter MAY be used to signal the segmentation tools 2813 that are present in the bitstream and can be used for 2814 parallelization. The value of sprop-segmentation-id MUST 2815 be an integer in the range of 0 to 3, inclusive. When not 2816 present, the value of sprop-segmentation-id is inferred to 2817 be equal to 0. 2819 When sprop-segmentation-id is equal to 0, no information 2820 about the segmentation tools is provided. When sprop- 2821 segmentation-id is equal to 1, it indicates that slices are 2822 present in the bitstream. When sprop-segmentation-id is 2823 equal to 2, it indicates that tiles are present in the 2824 bitstream. When sprop-segmentation-id is equal to 3, it 2825 indicates that WPP is used in the bitstream. 2827 sprop-spatial-segmentation-idc: 2829 A base16 [RFC4648] representation of the syntax element 2830 min_spatial_segmentation_idc as specified in [HEVC]. This 2831 parameter MAY be used to describe parallelization 2832 capabilities of the bitstream. 2834 dec-parallel-cap: 2836 This parameter MAY be used to indicate the decoder's 2837 additional decoding capabilities given the presence of 2838 tools enabling parallel decoding, such as slices, tiles, 2839 and WPP, in the bitstream. The decoding capability of the 2840 decoder may vary with the setting of the parallel decoding 2841 tools present in the bitstream, e.g., the size of the tiles 2842 that are present in a bitstream. Therefore, multiple 2843 capability points may be provided, each indicating the 2844 minimum required decoding capability that is associated 2845 with a parallelism requirement, which is a requirement on 2846 the bitstream that enables parallel decoding.
2848 Each capability point is defined as a combination of 1) a 2849 parallelism requirement, 2) a profile (determined by 2850 profile-space and profile-id), 3) a highest level, and 4) a 2851 maximum processing rate, a maximum picture size, and a 2852 maximum video bitrate that may be equal to or greater than 2853 that determined by the highest level. The parameter's 2854 syntax in ABNF [RFC5234] is as follows: 2856 dec-parallel-cap = "dec-parallel-cap={" cap-point *("," 2857 cap-point) "}" 2859 cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";" 2860 cap-parameter) 2862 spatial-seg-idc = 1*4DIGIT ; (1-4095) 2864 cap-parameter = tier-flag / level-id / max-lsr 2865 / max-lps / max-br 2867 tier-flag = "tier-flag" EQ ("0" / "1") 2869 level-id = "level-id" EQ 1*3DIGIT ; (0-255) 2871 max-lsr = "max-lsr" EQ 1*20DIGIT ; (0- 2872 18,446,744,073,709,551,615) 2874 max-lps = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295) 2876 max-br = "max-br" EQ 1*20DIGIT ; (0- 2877 18,446,744,073,709,551,615) 2879 EQ = "=" 2881 The set of capability points expressed by the dec-parallel- 2882 cap parameter is enclosed in a pair of curly braces ("{}"). 2883 Each set of two consecutive capability points is separated 2884 by a comma (','). Within each capability point, each set 2885 of two consecutive parameters, and when present, their 2886 values, is separated by a semicolon (';'). 2888 The profile of all capability points is determined by 2889 profile-space and profile-id that are outside the dec- 2890 parallel-cap parameter. 2892 Each capability point starts with an indication of the 2893 parallelism requirement, which consists of a parallel tool 2894 type, which may be equal to 'w' or 't', and a decimal value 2895 of the spatial-seg-idc parameter. When the type is 'w', 2896 the capability point is valid only for H.265 bitstreams 2897 with WPP in use, i.e. entropy_coding_sync_enabled_flag 2898 equal to 1. 
When the type is 't', the capability point is 2899 valid only for H.265 bitstreams with WPP not in use (i.e., 2900 entropy_coding_sync_enabled_flag equal to 0). The 2901 capability point is valid only for H.265 bitstreams with 2902 min_spatial_segmentation_idc equal to or greater than 2903 spatial-seg-idc. 2905 After the parallelism requirement indication, each 2906 capability point continues with one or more pairs of 2907 parameter and value in any order for any of the following 2908 parameters: 2910 o tier-flag 2911 o level-id 2912 o max-lsr 2913 o max-lps 2914 o max-br 2916 At most one occurrence of each of the above five parameters 2917 is allowed within each capability point. 2919 The values of dec-parallel-cap.tier-flag and dec-parallel- 2920 cap.level-id for a capability point indicate the highest 2921 level of the capability point. The values of dec-parallel- 2922 cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel- 2923 cap.max-br for a capability point indicate, respectively, the maximum 2924 processing rate in units of luma samples per second, the 2925 maximum picture size in units of luma samples, and the 2926 maximum video bitrate (in units of CpbBrVclFactor bits per 2927 second for the VCL HRD parameters and in units of 2928 CpbBrNalFactor bits per second for the NAL HRD parameters, 2929 where CpbBrVclFactor and CpbBrNalFactor are defined in 2930 Section A.4 of [HEVC]). 2932 When not present, the value of dec-parallel-cap.tier-flag 2933 is inferred to be equal to the value of tier-flag outside 2934 the dec-parallel-cap parameter. When not present, the 2935 value of dec-parallel-cap.level-id is inferred to be equal 2936 to the value of max-recv-level-id outside the dec-parallel- 2937 cap parameter. When not present, the value of dec- 2938 parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec- 2939 parallel-cap.max-br is inferred to be equal to the value of 2940 max-lsr, max-lps, or max-br, respectively, outside the dec- 2941 parallel-cap parameter.
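The ABNF grammar, the at-most-once rule, and the inference rules above can be combined into a small parser. The Python sketch below is one possible reading, not a normative implementation. The function name, the dictionary layout of a capability point, and the "outer" argument (a mapping holding the parameters signalled outside dec-parallel-cap) are assumptions of this example. It expects the value to contain no whitespace, and it checks the numeric ranges given in the ABNF comments rather than the digit counts.

```python
import re
from typing import Dict, List, Optional

# Upper bounds taken from the comments in the ABNF.
LIMITS = {
    "tier-flag": 1,
    "level-id": 255,
    "max-lsr": 18446744073709551615,
    "max-lps": 4294967295,
    "max-br": 18446744073709551615,
}

# ("w" / "t") ":" spatial-seg-idc 1*(";" name "=" digits)
CAP_POINT_RE = re.compile(r"([wt]):([0-9]{1,4})((?:;[a-z-]+=[0-9]+)+)")


def parse_dec_parallel_cap(value: str,
                           outer: Dict[str, int]) -> List[Dict[str, object]]:
    """Parse the text after 'dec-parallel-cap=' into capability points."""
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in braces")
    points = []
    for text in value[1:-1].split(","):
        match = CAP_POINT_RE.fullmatch(text)
        if match is None:
            raise ValueError("malformed capability point: %r" % text)
        idc = int(match.group(2))
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc out of range: %d" % idc)
        given: Dict[str, int] = {}
        for pair in match.group(3)[1:].split(";"):
            name, _, digits = pair.partition("=")
            if name not in LIMITS or name in given:
                raise ValueError("unknown or repeated parameter: %r" % name)
            if int(digits) > LIMITS[name]:
                raise ValueError("%s out of range" % name)
            given[name] = int(digits)
        # Absent values are inferred from the parameters outside
        # dec-parallel-cap; level-id falls back to max-recv-level-id.
        point: Dict[str, Optional[object]] = {
            "tool": match.group(1),
            "spatial-seg-idc": idc,
            "tier-flag": given.get("tier-flag", outer.get("tier-flag")),
            "level-id": given.get("level-id", outer.get("max-recv-level-id")),
        }
        for name in ("max-lsr", "max-lps", "max-br"):
            point[name] = given.get(name, outer.get(name))
        points.append(point)
    return points
```

A value that is absent both inside and outside dec-parallel-cap is returned as None in this sketch, which leaves the decision about level-derived defaults to the caller.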
2943 The general decoding capability, expressed by the set of 2944 parameters outside of dec-parallel-cap, is defined as the 2945 capability point that is determined by the following 2946 combination of parameters: 1) the parallelism requirement 2947 corresponding to the value of sprop-segmentation-id equal 2948 to 0 for a bitstream, 2) the profile determined by profile- 2949 space, profile-id, profile-compatibility-indicator, and 2950 interop-constraints, 3) the tier and the highest level 2951 determined by tier-flag and max-recv-level-id, and 4) the 2952 maximum processing rate, the maximum picture size, and the 2953 maximum video bitrate determined by the highest level. The 2954 general decoding capability MUST NOT be included as one of 2955 the set of capability points in the dec-parallel-cap 2956 parameter. 2958 For example, the following parameters express the general 2959 decoding capability of 720p30 (Level 3.1) plus an 2960 additional decoding capability of 1080p30 (Level 4) given 2961 that the spatially largest tile or slice used in the 2962 bitstream is equal to or less than 1/3 of the picture size: 2964 a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level- 2965 id=120} 2967 For another example, the following parameters express an 2968 additional decoding capability of 1080p30, using dec- 2969 parallel-cap.max-lsr and dec-parallel-cap.max-lps, given 2970 that WPP is used in the bitstream: 2972 a=fmtp:98 level-id=93;dec-parallel-cap={w:8; 2973 max-lsr=62668800;max-lps=2088960} 2975 Informative note: When min_spatial_segmentation_idc is 2976 present in a bitstream and WPP is not used, [HEVC] 2977 specifies that there is no slice or no tile in the 2978 bitstream containing more than 4 * PicSizeInSamplesY / 2979 ( min_spatial_segmentation_idc + 4 ) luma samples. 
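The relation between spatial-seg-idc and the "1/3 of the picture size" figure in the first example follows directly from the bound quoted in the informative note. The short Python sketch below works through that arithmetic. The function names are assumptions of this example, and the picture size of 2088960 luma samples is the max-lps value used in the second example.

```python
from fractions import Fraction


def max_segment_fraction(min_spatial_segmentation_idc: int) -> Fraction:
    """Upper bound on the share of the picture that one slice or tile may
    cover when WPP is not used: 4 / (min_spatial_segmentation_idc + 4)."""
    return Fraction(4, min_spatial_segmentation_idc + 4)


def max_segment_luma_samples(pic_size_in_samples_y: int,
                             min_spatial_segmentation_idc: int) -> int:
    """Same bound in luma samples, rounded down because a segment holds a
    whole number of samples."""
    return (4 * pic_size_in_samples_y) // (min_spatial_segmentation_idc + 4)


# Capability point "t:8": each tile or slice covers at most 1/3 of the picture.
print(max_segment_fraction(8))               # 1/3
print(max_segment_luma_samples(2088960, 8))  # 696320
```

A larger spatial-seg-idc therefore guarantees smaller segments, which is why a capability point is valid for any bitstream whose min_spatial_segmentation_idc is equal to or greater than the signalled value.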
2981 include-dph: 2983 This parameter is used to indicate the capability and 2984 preference to utilize or include decoded picture hash (DPH) 2985 SEI messages (see Section D.3.19 of [HEVC]) in the 2986 bitstream. DPH SEI messages can be used to detect picture 2987 corruption so that the receiver can request picture repair (see 2988 Section 8). The value is a comma-separated list of hash 2989 types that are supported or requested to be used, each hash 2990 type provided as an unsigned integer value (0-255), with 2991 the hash types listed from the most preferred to the least 2992 preferred. Example: "include-dph=0,2", which indicates the 2993 capability for MD5 (most preferred) and Checksum (less 2994 preferred). If the parameter is not included or the value 2995 contains no hash types, then no capability to utilize DPH 2996 SEI messages is assumed. Note that DPH SEI messages MAY 2997 still be included in the bitstream even when there is no 2998 declaration of capability to use them, as in general SEI 2999 messages do not affect the normative decoding process and 3000 decoders are allowed to ignore SEI messages. 3002 Encoding considerations: 3004 This type is only defined for transfer via RTP (RFC 3550). 3006 Security considerations: 3008 See Section 9 of RFC XXXX. 3010 Public specification: 3012 Please refer to Section 13 of RFC XXXX. 3014 Additional information: None 3016 File extensions: none 3018 Macintosh file type code: none 3020 Object identifier or OID: none 3022 Person & email address to contact for further information: 3024 Ye-Kui Wang (yekuiw@qti.qualcomm.com). 3026 Intended usage: COMMON 3028 Author: See Section 14 of RFC XXXX. 3030 Change controller: 3032 IETF Audio/Video Transport Payloads working group delegated 3033 from the IESG. 3035 7.2 SDP Parameters 3037 The receiver MUST ignore any parameter unspecified in this memo.
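A receiver that follows the ignore-unknown rule needs to split the parameter list without breaking dec-parallel-cap, whose value contains semicolons inside its curly braces. The Python sketch below shows one way to do this and also decodes an include-dph list. It is illustrative only. The function names are assumptions of this example, the input is taken to be the text that follows the payload type in an "a=fmtp" line, and the set of known names is the list of parameters this memo maps to SDP.

```python
from typing import Dict, List

KNOWN_PARAMETERS = frozenset({
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br",
    "max-tr", "max-tc", "max-fps", "sprop-max-don-diff",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-cap",
    "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
    "dec-parallel-cap", "include-dph",
    "sprop-vps", "sprop-sps", "sprop-pps",
})


def split_fmtp(params: str) -> List[str]:
    """Split on ';' but not inside '{...}' (used by dec-parallel-cap)."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(params):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ";" and depth == 0:
            parts.append(params[start:i])
            start = i + 1
    parts.append(params[start:])
    return [p.strip() for p in parts if p.strip()]


def parse_fmtp(params: str) -> Dict[str, str]:
    """Return the known parameters; unspecified ones are silently ignored."""
    result = {}
    for item in split_fmtp(params):
        # Split at the first '=' only, since values may contain '='.
        name, _, value = item.partition("=")
        if name.strip() in KNOWN_PARAMETERS:
            result[name.strip()] = value.strip()
    return result


def parse_include_dph(value: str) -> List[int]:
    """Hash types in preference order; an empty list means no capability."""
    types = [int(t) for t in value.split(",") if t.strip()]
    if any(not 0 <= t <= 255 for t in types):
        raise ValueError("hash type out of range 0-255")
    return types
```

Values are kept as strings in this sketch so that each parameter can be validated separately according to its own definition.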
3039 7.2.1 Mapping of Payload Type Parameters to SDP 3041 The media type video/H265 string is mapped to fields in the 3042 Session Description Protocol (SDP) [RFC4566] as follows: 3044 o The media name in the "m=" line of SDP MUST be video. 3046 o The encoding name in the "a=rtpmap" line of SDP MUST be H265 3047 (the media subtype). 3049 o The clock rate in the "a=rtpmap" line MUST be 90000. 3051 o The OPTIONAL parameters "profile-space", "profile-id", "tier- 3052 flag", "level-id", "interop-constraints", "profile- 3053 compatibility-indicator", "sprop-sub-layer-id", "recv-sub- 3054 layer-id", "max-recv-level-id", "tx-mode", "max-lsr", "max- 3055 lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc", 3056 "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus", 3057 "sprop-depack-buf-bytes", "depack-buf-cap", "sprop- 3058 segmentation-id", "sprop-spatial-segmentation-idc", "dec- 3059 parallel-cap", and "include-dph", when present, MUST be 3060 included in the "a=fmtp" line of SDP. These parameters are 3061 expressed as a media type string, in the form of a semicolon- 3062 separated list of parameter=value pairs. 3064 o The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop- 3065 pps", when present, MUST be included in the "a=fmtp" line of 3066 SDP or conveyed using the "fmtp" source attribute as specified 3067 in Section 6.3 of [RFC5576]. For a particular media format 3068 (i.e., RTP payload type), "sprop-vps", "sprop-sps", or "sprop- 3069 pps" MUST NOT be both included in the "a=fmtp" line of SDP and 3070 conveyed using the "fmtp" source attribute. When included in 3071 the "a=fmtp" line of SDP, these parameters are expressed as a 3072 media type string, in the form of a semicolon-separated list 3073 of parameter=value pairs. When conveyed in the "a=fmtp" line 3074 of SDP for a particular payload type, the parameters "sprop- 3075 vps", "sprop-sps", and "sprop-pps" MUST be applied to each 3076 SSRC with the payload type.
When conveyed using the "fmtp" 3077 source attribute, these parameters are only associated with 3078 the given source and payload type as parts of the "fmtp" 3079 source attribute. 3081 Informative note: Conveyance of "sprop-vps", "sprop-sps", 3082 and "sprop-pps" using the "fmtp" source attribute allows 3083 for out-of-band transport of parameter sets in topologies 3084 like Topo-Video-switch-MCU as specified in [RFC5117]. 3086 An example of media representation in SDP is as follows: 3088 m=video 49170 RTP/AVP 98 3089 a=rtpmap:98 H265/90000 3090 a=fmtp:98 profile-id=1; 3091 sprop-vps=