idnits 2.17.1 draft-zhao-avtcore-rtp-vvc-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords -- however, there's a paragraph with a matching beginning. Boilerplate error? (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (September 23, 2019) is 1675 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '0' on line 1045 == Missing Reference: 'Wang05' is mentioned on line 1295, but not defined -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO23090-3' ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) -- Possible downref: Non-RFC (?) normative reference: ref. 'VVC' Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group S. Zhao 3 Internet-Draft S. Wenger 4 Intended status: Standards Track Tencent 5 Expires: March 26, 2020 September 23, 2019 7 RTP Payload Format for Versatile Video Coding (VVC) 8 draft-zhao-avtcore-rtp-vvc-00 10 Abstract 12 This memo describes an RTP payload format for the video coding 13 standard ITU-T Recommendation H.266 and ISO/IEC International 14 Standard 23090-3, both also known as Versatile Video Coding (VVC) and 15 developed by the Joint Video Experts Team (JVET). The RTP payload 16 format allows for packetization of one or more Network Abstraction 17 Layer (NAL) units in each RTP packet payload as well as fragmentation 18 of a NAL unit into multiple RTP packets. The payload format has wide 19 applicability in videoconferencing, Internet video streaming, and 20 high-bitrate entertainment-quality video, among others. 22 Status of This Memo 24 This Internet-Draft is submitted in full conformance with the 25 provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet Engineering 28 Task Force (IETF). Note that other groups may also distribute 29 working documents as Internet-Drafts. The list of current Internet- 30 Drafts is at https://datatracker.ietf.org/drafts/current/. 32 Internet-Drafts are draft documents valid for a maximum of six months 33 and may be updated, replaced, or obsoleted by other documents at any 34 time. It is inappropriate to use Internet-Drafts as reference 35 material or to cite them other than as "work in progress." 37 This Internet-Draft will expire on March 26, 2020. 39 Copyright Notice 41 Copyright (c) 2019 IETF Trust and the persons identified as the 42 document authors. All rights reserved. 
44 This document is subject to BCP 78 and the IETF Trust's Legal 45 Provisions Relating to IETF Documents 46 (https://trustee.ietf.org/license-info) in effect on the date of 47 publication of this document. Please review these documents 48 carefully, as they describe your rights and restrictions with respect 49 to this document. Code Components extracted from this document must 50 include Simplified BSD License text as described in Section 4.e of 51 the Trust Legal Provisions and are provided without warranty as 52 described in the Simplified BSD License. 54 Table of Contents 56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 57 1.1. Overview of the VVC Codec . . . . . . . . . . . . . . . . 3 58 1.1.1. Coding-Tool Features (informative) . . . . . . . . . 3 59 1.1.2. Systems and Transport Interfaces . . . . . . . . . . 6 60 1.1.3. Parallel Processing Support (informative) . . . . . . 10 61 1.1.4. NAL Unit Header . . . . . . . . . . . . . . . . . . . 10 62 1.2. Overview of the Payload Format . . . . . . . . . . . . . 11 63 2. Conventions . . . . . . . . . . . . . . . . . . . . . . . . . 11 64 3. Definitions and Abbreviations . . . . . . . . . . . . . . . . 12 65 3.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 12 66 3.1.1. Definitions from the VVC Specification . . . . . . . 12 67 3.1.2. Definitions Specific to This Memo . . . . . . . . . . 12 68 3.2. Abbreviations . . . . . . . . . . . . . . . . . . . . . . 12 69 4. RTP Payload Format . . . . . . . . . . . . . . . . . . . . . 12 70 4.1. RTP Header Usage . . . . . . . . . . . . . . . . . . . . 12 71 4.2. Payload Header Usage . . . . . . . . . . . . . . . . . . 14 72 4.3. Payload Structures . . . . . . . . . . . . . . . . . . . 15 73 4.3.1. Single NAL Unit Packets . . . . . . . . . . . . . . . 15 74 4.3.2. Aggregation Packets (APs) . . . . . . . . . . . . . . 16 75 4.3.3. Fragmentation Units . . . . . . . . . . . . . . . . . 21 76 4.4. Decoding Order Number . . . . . . . . . . . . . . . 
. . . 24 77 5. Packetization Rules . . . . . . . . . . . . . . . . . . . . 25 78 6. De-packetization Process . . . . . . . . . . . . . . . . . . 26 79 7. Payload Format Parameters . . . . . . . . . . . . . . . . . . 28 80 8. Use with Feedback Messages . . . . . . . . . . . . . . . . . 28 81 8.1. Picture Loss Indication (PLI) . . . . . . . . . . . . . . 28 82 8.2. Slice Loss Indication (SLI) . . . . . . . . . . . . . . . 29 83 8.3. Reference Picture Selection Indication (RPSI) . . . . . . 29 84 8.4. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 29 85 9. Security Considerations . . . . . . . . . . . . . . . . . . . 30 86 10. Congestion Control . . . . . . . . . . . . . . . . . . . . . 31 87 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 32 88 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 32 89 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 32 90 13.1. Normative References . . . . . . . . . . . . . . . . . . 32 91 13.2. Informative References . . . . . . . . . . . . . . . . . 34 92 Appendix A. Change History . . . . . . . . . . . . . . . . . . . 35 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35 95 1. Introduction 97 The VVC specification, formally published as both ITU-T 98 Recommendation H.266 and ISO/IEC International Standard 23090-3 99 [ISO23090-3], is planned for ratification in mid-2020. A draft 100 currently in the ISO/IEC approval process is available as 101 [VVC]. H.266 is reported to provide significant coding efficiency 102 gains over H.265 [H.265] and earlier video codec formats. 104 This memo describes an RTP payload format for [VVC]. It shares its 105 basic design with the NAL unit-based RTP payload formats of 106 [RFC7798], [RFC6184], and [RFC6190]. With respect to design 107 philosophy, security, congestion control, and overall implementation 108 complexity, it has similar properties to those earlier payload format 109 specifications.
This is a conscious choice, as at least RFC 6184 is 110 widely deployed and generally known in the relevant implementer 111 communities. Certain mechanisms known from RFC 6190 were 112 incorporated as [VVC] version 1 supports all temporal, spatial, and 113 SNR scalability. 115 1.1. Overview of the VVC Codec 117 [VVC] and H.265 share a similar hybrid video codec design. In this 118 memo, we provide a very brief overview of those features of [VVC] 119 that are, in some form, addressed by the payload format specified 120 herein. Implementers have to read, understand, and apply the ITU- 121 T/ISO/IEC specifications pertaining to [VVC] to arrive at 122 interoperable, well-performing implementations. 124 Conceptually, both [VVC] and HEVC include a Video Coding Layer (VCL), 125 which is often used to refer to the coding-tool features, and a 126 Network Abstraction Layer (NAL), which is often used to refer to the 127 systems and transport interface aspects of the codecs. 129 1.1.1. Coding-Tool Features (informative) 131 Coding tool features are described below with occasional reference to 132 the coding tool set of HEVC, which is believed to be well known in 133 the community. 135 Similar to earlier hybrid-video-coding-based standards, including 136 HEVC, the following basic video coding design is employed by [VVC]. 137 A prediction signal is first formed by either intra- or motion- 138 compensated prediction, and the residual (the difference between the 139 original and the prediction) is then coded. The gains in coding 140 efficiency are achieved by redesigning and improving almost all parts 141 of the codec over earlier designs. In addition, VVC includes several 142 tools to make the implementation on parallel architectures easier. 144 Finally, VVC includes temporal, spatial, and SNR scalability as well 145 as multiview coding support. 
147 Coding blocks and transform structure 149 Among the major coding-tool differences between HEVC and [VVC], one of 150 the important improvements is the more flexible coding tree structure 151 in VVC, i.e., the multi-type tree. In addition to the quadtree, binary and 152 ternary trees are also supported, which contributes significantly 153 to coding efficiency. Moreover, the maximum size of the 154 Coding Tree Unit (CTU) is increased from 64x64 to 128x128. To 155 improve the coding efficiency of the chroma signal, separate luma and chroma 156 trees at the CTU level may be employed for intra-slices. As to 157 transforms, the square transforms in HEVC are extended to non-square 158 transforms for rectangular blocks resulting from binary and ternary 159 tree splits. Besides, [VVC] supports multiple transform sets (MTS), 160 including DCT-2, DST-7, and DCT-8, as well as the non-separable 161 secondary transform. The transforms used in [VVC] can have different 162 sizes, with support for larger transform sizes. For DCT-2, the 163 transform sizes range from 2x2 to 64x64, and for DST-7 and DCT-8, the 164 transform sizes range from 4x4 to 32x32. In addition, [VVC] also 165 supports sub-block transforms for both intra- and inter-coded blocks. 166 For intra-coded blocks, intra sub-partitioning (ISP) may be used to 167 allow sub-block-based intra prediction and transform. For inter 168 blocks, a sub-block transform may be used assuming that only a part of 169 an inter-block has non-zero transform coefficients. 171 Entropy coding 173 Similar to HEVC, [VVC] uses a single entropy-coding engine, which is 174 based on Context Adaptive Binary Arithmetic Coding (CABAC) [CABAC], 175 but with support for multiple window sizes. The window sizes can be 176 initialized differently for different context models. Due to this 177 design, it adapts faster and achieves better coding 178 efficiency.
A joint chroma residual coding scheme is applied to 179 further exploit the correlation between the residuals of two colour 180 components. In [VVC], different residual coding schemes are applied 181 for regular transform coefficients and residual samples generated 182 using transform-skip mode. 184 In-loop filtering 186 [VVC] supports more in-loop filtering features than HEVC. The 187 deblocking filter in [VVC] is similar to that of HEVC but operates on a 188 smaller grid. After deblocking and sample adaptive offset (SAO), an 189 adaptive loop filter (ALF) may be used. As a Wiener filter, ALF 190 reduces distortion of decoded pictures. Besides, [VVC] introduces a 191 new module before deblocking called luma mapping with chroma scaling 192 to fully utilize the dynamic range of the signal so that the rate-distortion 193 performance of both SDR and HDR content is improved. 195 Motion prediction and coding 197 Compared to HEVC, [VVC] introduces several improvements in this area. 198 First, adaptive motion vector resolution (AMVR) 199 can save bit cost for motion vectors by adaptively signaling motion 200 vector resolution. Second, affine motion compensation is included 201 to capture complicated motion such as zooming and rotation. Meanwhile, 202 prediction refinement with optical flow for affine mode (PROF) 203 is further deployed to mimic affine motion at the pixel level. 204 Third, decoder-side motion vector refinement (DMVR) is a method 205 to derive motion vectors at the decoder side so that fewer bits may be spent 206 on motion vectors. Bi-directional optical flow (BDOF) is a similar 207 method to DMVR but operates at the 4x4 sub-block level. Another difference is 208 that DMVR is based on block matching while BDOF derives MVs with 209 equations. Furthermore, merge with motion vector difference (MMVD) 210 is a special mode that further signals a limited set of motion 211 vector differences on top of merge mode.
In addition to MMVD, there 212 are three other types of special merge modes, i.e., sub-block 213 merge, triangle, and combined intra-/inter- prediction (CIIP). The sub- 214 block merge list includes one candidate of sub-block temporal motion 215 vector prediction (SbTMVP) and up to four candidates of affine motion 216 vectors. Triangle is based on triangular block motion compensation. 217 CIIP combines intra- and inter- predictions with weighting. 218 Moreover, weighting in bi-prediction has more flexibility than in HEVC. 219 Adaptive weighting may be employed with a block-level tool called bi- 220 prediction with CU based weighting (BCW). 222 Intra prediction and intra-coding 224 To capture the diversified local image texture directions with finer 225 granularity, [VVC] supports 65 angular directions instead of the 33 226 directions in HEVC. The intra mode coding is based on a scheme of 6 most 227 probable modes, which are derived using 228 the neighboring intra prediction directions. In addition, to deal 229 with the different distributions of intra prediction angles for 230 different block aspect ratios, a wide-angle intra prediction (WAIP) 231 scheme is applied in [VVC] by including intra prediction angles 232 beyond those present in HEVC. Unlike HEVC, which only allows using 233 the most adjacent line of reference samples for intra prediction, 234 [VVC] also allows using two further reference lines, known as 235 multi-reference-line (MRL) intra prediction. The additional 236 reference lines can only be used for the 6 most probable intra prediction 237 modes. To capture the strong correlation between different colour 238 components, in [VVC], a cross-component linear mode (CCLM) is 239 utilized, which assumes a linear relationship between the luma sample 240 values and their associated chroma samples.
For intra prediction, 241 [VVC] also applies a position-dependent prediction combination (PDPC) 242 for refining the prediction samples closer to the intra prediction 243 block boundary. Matrix-based intra-prediction (MIP) modes are also 244 used in [VVC] which generates an up to 8x8 intra prediction block 245 using a weighted sum of downsampled neighboring reference samples, 246 and the weightings are hardcoded constants. 248 Other coding-tool feature 250 [VVC] introduces dependent quantization (DQ) to reduce quantization 251 error by state-based switching between two quantizers. 253 1.1.2. Systems and Transport Interfaces 255 [VVC] inherits the basic systems and transport interfaces designs 256 from HEVC and H.264. These include the NAL-unit-based syntax 257 structure, the hierarchical syntax and data unit structure, the 258 Supplemental Enhancement Information (SEI) message mechanism, and the 259 video buffering model based on the Hypothetical Reference Decoder 260 (HRD). The scalability features of [VVC] are conceptually similar to 261 the scalable variant of HEVC known as SHVC. The hierarchical syntax 262 and data unit structure consists of parameter sets at various levels 263 (decoder, sequence (including layers), sequence (per layer), 264 picture), slice-level header parameters, and lower-level parameters. 266 Below described are a number of key components that influenced the 267 Network Abstraction Layer design of VVC as well as this memo. 269 Decoder parameter set 271 The Decoder parameter set includes parameters that stay constant for 272 the lifetime of a Video Bitstream, which in IETF terms can translate 273 to the lifetime of a session. Decoder parameter sets can include 274 profile, level, and sub-profile information to determine a maximum 275 complexity interop point that is guaranteed to be never exceeded, 276 even if splicing of video sequences occurs within a session. 
It 277 further optionally includes constraint flags, which indicate that the 278 video bitstream is constrained in its use of certain features as 279 indicated by the values of those flags. With this, a bitstream can 280 be labelled as not using certain tools, which allows, among other 281 things, for resource allocation in a decoder implementation. Like all 282 parameter sets, the decoder parameter set (DPS) is required to be 283 present when first referenced, and it is necessarily referenced by 284 the very first picture in a video sequence, implying that it has to 285 be sent among the first NAL units in the bitstream (see section xxx 286 below). While multiple DPSs can be in the bitstream, the values of 287 the syntax elements therein cannot be inconsistent when being 288 referenced. 290 Video parameter set 292 The Video Parameter Set (VPS) includes decoding dependency or 293 information for reference picture set construction of enhancement 294 layers. The VPS provides a "big picture" of a scalable sequence, 295 including what types of operation points are provided, the profile, 296 tier, and level of the operation points, and some other high-level 297 properties of the bitstream that can be used as the basis for session 298 negotiation and content selection, etc. (see Section xxx). 300 Sequence parameter set 302 The Sequence Parameter Set (SPS) contains syntax elements pertaining 303 to a coded video sequence (CVS), which is a group of pictures, 304 starting with a random access point, and followed by pictures that 305 may depend on each other and the random access point picture. In 306 MPEG-2, the equivalent of a CVS was a Group of Pictures (GOP), which 307 normally started with an I frame and was followed by P and B frames. 308 While more complex in its options of random access points, [VVC] 309 retains this basic concept. In many TV-like applications, a CVS 310 contains a few hundred milliseconds to a few seconds of video.
In 311 video conferencing (without switching MCUs involved), a CVS can be as 312 long in duration as the whole session. 314 Picture and Adaptation parameter set 316 The Picture Parameter Set and the Adaptation Parameter Set (PPS and 317 APS, respectively) carry information pertaining to a single picture. 318 The PPS contains information that is likely to stay constant from 319 picture to picture (at least for pictures of a certain type), whereas 320 the APS contains information, such as adaptive loop filter 321 coefficients, that is likely to change from picture to picture. 323 Profile, tier, and level 325 The profile, tier, and level syntax structure can be included in all of the 326 DPS, VPS, and SPS. Somewhat oversimplified, they can be viewed as 327 providing information about maximum bitstream complexity in the 328 dimensions of tools used (profile), sample count (level), and maximum 329 bitrate (tier). Level and tier are onion shaped, in that a decoder 330 that can decode a certain level or tier can also decode lower levels 331 or tiers. Profiles are not necessarily onion shaped and do not 332 necessarily form a hierarchy. Therefore, the profile_tier_level 333 structure in the video bitstream contains a bitmask which allows an 334 encoder to mark a bitstream to be compatible with multiple profiles. 336 Sub-Profiles 338 Within the [VVC] specification, a sub-profile is simply a 32-bit 339 number coded according to ITU-T Rec. T.35 that does not carry 340 semantics of its own. It is carried in the profile_tier_level structure and 341 hence (potentially) present in the DPS, VPS, and SPS. External 342 registration bodies can register a T.35 codepoint with ITU-T 343 registration authorities and associate with their registration a 344 description of bitstream complexity restrictions beyond the profiles 345 defined by ITU-T and ISO/IEC. This would allow encoder manufacturers 346 to label the bitstreams generated by their encoder as complying with 347 such a sub-profile.
It is expected that upstream standardization 348 organizations (such as DVB and ATSC), as well as large walled-garden 349 video services, will take advantage of this labelling system. In 350 contrast to "normal" profiles, it is expected that sub-profiles may 351 indicate encoder choices traditionally left open in the (decoder- 352 centric) video coding specs, such as GOP structures, minimum/maximum 353 QP values, and the mandatory use of certain tools or SEI messages. 355 Constraint Flags 357 The profile_tier_level structure optionally carries a considerable 358 number of constraint flags, which an encoder can use to indicate to a 359 decoder that it will not use a certain tool or technology. They were 360 included in reaction to a perceived market need for labelling a 361 bitstream as not exercising a certain tool that has become 362 commercially unviable. 364 Temporal scalability support 366 Editor's note: this section may need adjustment as JVET work on bitstream 367 extraction is in progress. 369 [VVC] includes support of temporal scalability, by inclusion of the 370 signaling of TemporalId in the NAL unit header, the restriction that 371 pictures of a particular temporal sub-layer cannot be used for inter 372 prediction reference by pictures of a lower temporal sub-layer, the 373 sub-bitstream extraction process, and the requirement that each sub- 374 bitstream extraction output be a conforming bitstream. Media-Aware 375 Network Elements (MANEs) can utilize the TemporalId in the NAL unit 376 header for stream adaptation purposes based on temporal scalability. 378 Spatial, SNR, View Scalability 380 [VVC] includes support for spatial, SNR, and View scalability. 381 Scalable video coding is widely considered to have technical benefits 382 and enrich services for various video applications. Until recently, 383 however, the functionality has not been included in the main profiles 384 of video codecs and not widely deployed due to additional costs.
In 385 VVC, however, all those forms of scalability are supported natively 386 through the signaling of the layer_id in the NAL unit header, the VPS 387 which associates layers with given layer_ids to each other, reference 388 picture selection, reference picture resampling for spatial 389 scalability, and a number of other mechanisms not relevant for this 390 memo. Scalability support can be implemented in a single decoding 391 "loop" and is widely considered a comparatively lightweight 392 operation. 394 Spatial Scalability 396 With the existence of Reference Picture Resampling, likely in the 397 "main" profile of VVC, the additional burden for scalability 398 support is just a minor modification of the high-level syntax 399 (HLS). Technically, inter-layer prediction is 400 employed in a scalable system to improve the coding efficiency of 401 the enhancement layers. In addition to the spatial and temporal 402 motion-compensated predictions that are available in a single- 403 layer codec, the inter-layer prediction in [VVC] uses the 404 resampled video data of the reconstructed reference picture from a 405 reference layer to predict the current enhancement layer. 406 The resampling process for inter-layer prediction is performed at 407 the block level, by modifying the existing interpolation process 408 for motion compensation. This means that no additional resampling 409 process is needed to support scalability. 411 SNR Scalability 413 SNR scalability is similar to Spatial Scalability except that the 414 resampling factors are 1:1; in other words, there is no change in 415 resolution, but there is inter-layer prediction.
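As an illustration of the TemporalId-based stream adaptation mentioned under temporal scalability above, the following is a minimal sketch of how a MANE might thin a stream to a target temporal sub-layer. It assumes that nuh_temporal_id_plus1 occupies the three least significant bits of the second NAL unit header byte, as in the NAL unit header layout of Section 1.1.4. The function names are hypothetical, and the sub-bitstream extraction process of [VVC] involves more than this filter.

```python
from typing import Iterable, List


def temporal_id(nal_unit: bytes) -> int:
    """Return TemporalId (nuh_temporal_id_plus1 - 1) of a NAL unit."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than the two-byte header")
    # Assumption: TID is the low 3 bits of the second header byte.
    tid_plus1 = nal_unit[1] & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 equal to 0 is illegal")
    return tid_plus1 - 1


def thin_to_temporal_layer(nal_units: Iterable[bytes],
                           max_temporal_id: int) -> List[bytes]:
    """Keep only NAL units whose TemporalId does not exceed the target.

    This relies on the restriction that pictures of a higher temporal
    sub-layer are never used as reference by lower sub-layers.
    """
    return [n for n in nal_units if temporal_id(n) <= max_temporal_id]
```

Because only the fixed header bytes are inspected, such a filter can operate without parsing parameter sets or slice data.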
417 View Scalability 419 Placeholder 421 SEI Messages 423 Supplemental Enhancement Information (SEI) messages are codepoints 424 in the bitstream that do not influence the decoding process as 425 specified in the [VVC] spec, but address issues of representation/ 426 rendering of the decoded bitstream, label the bitstream for certain 427 applications, and serve other, similar purposes. The overall concept of SEI 428 messages and many of the messages themselves have been inherited from 429 the H.264 and HEVC specs. In the [VVC] environment, some of the SEI 430 messages considered to be generally useful also in other video coding 431 technologies have been moved out of the main specification into a 432 companion document (TO DO: add reference once ITU designation is 433 known). 435 1.1.3. Parallel Processing Support (informative) 437 Compared to HEVC [RFC7798], the [VVC] design to support 438 parallelization offers numerous improvements. Some of those 439 improvements are still undergoing changes in JVET. Information, to 440 the extent relevant for this memo, will be added in future versions 441 of this memo as the standardization in JVET progresses and the 442 technology stabilizes. 444 1.1.4. NAL Unit Header 446 [VVC] maintains the NAL unit concept of HEVC with modifications. VVC 447 uses a two-byte NAL unit header, as shown in Figure 1. The payload 448 of a NAL unit refers to the NAL unit excluding the NAL unit header.
450   +---------------+---------------+
451   |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
452   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
453   |F|Z|  LayerID  |  Type   | TID |
454   +---------------+---------------+
456 The Structure of the [VVC] NAL Unit Header. 458 Figure 1 460 The semantics of the fields in the NAL unit header are as specified 461 in [VVC] and described briefly below for convenience. In addition to 462 the name and size of each field, the corresponding syntax element 463 name in [VVC] is also provided. 465 F: 1 bit 467 forbidden_zero_bit.
Required to be zero in [VVC]. Note that the 468 inclusion of this bit in the NAL unit header was to enable 469 transport of [VVC] video over MPEG-2 transport systems (avoidance 470 of start code emulations) [MPEG2S]. In the context of this memo, 471 the value 1 may be used to indicate a syntax violation, e.g., for 472 a NAL unit resulting from aggregating a number of fragmented units 473 of a NAL unit but missing the last fragment, as described in 474 Section TBD. 476 Z: 1 bit 478 nuh_reserved_zero_bit. Required to be zero in [VVC], and reserved 479 for future extensions by ITU-T and ISO/IEC. This memo does not 480 overload the "Z" bit for local extensions, because a) overloading the 481 "F" bit is sufficient and b) leaving it untouched preserves the usefulness of this 482 memo to possible future versions of [VVC]. 484 LayerId: 6 bits 486 nuh_layer_id. Identifies the layer a NAL unit belongs to, wherein 487 a layer may be, e.g., a spatial scalable layer or a quality scalable 488 layer. 490 Type: 5 bits 492 nal_unit_type. This field specifies the NAL unit type as defined 493 in Table 7-1 of [VVC]. For a reference of all currently defined 494 NAL unit types and their semantics, please refer to 495 Section 7.4.2.2 in [VVC]. 497 TID: 3 bits 499 nuh_temporal_id_plus1. This field specifies the temporal 500 identifier of the NAL unit plus 1. The value of TemporalId is 501 equal to TID minus 1. A TID value of 0 is illegal to ensure that 502 there is at least one bit in the NAL unit header equal to 1, so as to 503 enable independent considerations of start code emulations in the 504 NAL unit header and in the NAL unit payload data. 506 1.2.
Overview of the Payload Format 508 This payload format defines the following processes required for 509 transport of [VVC] coded data over RTP [RFC3550]: 511 o Usage of RTP header with this payload format 513 o Packetization of [VVC] coded NAL units into RTP packets using 514 three types of payload structures: a single NAL unit packet, 515 aggregation packet, and fragmentation unit 517 o Transmission of [VVC] NAL units of the same bitstream within a 518 single RTP stream. 520 o Media type parameters to be used with the Session Description 521 Protocol (SDP) [RFC4566] 523 2. Conventions 525 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 526 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 527 document are to be interpreted as described in BCP 14 [RFC2119]. In 528 this document, the above key words will convey that interpretation 529 only when in ALL CAPS. Lowercase uses of these words are not to be 530 interpreted as carrying the significance described in RFC 2119. This 531 specification uses the notion of setting and clearing a bit when bit 532 fields are handled. Setting a bit is the same as assigning that bit 533 the value of 1 (On). Clearing a bit is the same as assigning that 534 bit the value of 0 (Off). 536 3. Definitions and Abbreviations 538 3.1. Definitions 540 This document uses the terms and definitions of [VVC]. Section 3.1.1 541 lists relevant definitions from [VVC] for convenience. Section 3.1.2 542 provides definitions specific to this memo. 544 3.1.1. Definitions from the VVC Specification 546 Placeholder 548 3.1.2. Definitions Specific to This Memo 550 Placeholder 552 3.2. Abbreviations 554 Placeholder 556 4. RTP Payload Format 558 4.1. RTP Header Usage 560 The format of the RTP header is specified in [RFC3550] (reprinted as 561 Figure 2 for convenience). This payload format uses the fields of 562 the header in a manner consistent with that specification.
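For illustration, the RTP fixed header of [RFC3550] can be unpacked as in the following minimal sketch. The class and function names are hypothetical and not part of this memo. The sketch only locates the header fields and the start of the payload; it does not strip padding or validate the payload type.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]
    payload_offset: int  # index of the first payload byte


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the RTP fixed header, CSRC list, and header extension length."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    version = b0 >> 6
    if version != 2:
        raise ValueError("unsupported RTP version %d" % version)
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count
    if len(packet) < offset:
        raise ValueError("packet truncated in CSRC list")
    csrcs = struct.unpack_from("!%dI" % csrc_count, packet, 12)
    extension = bool(b0 & 0x10)
    if extension:
        if len(packet) < offset + 4:
            raise ValueError("packet truncated in header extension")
        # The extension length counts 32-bit words after its 4-byte header.
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
        if len(packet) < offset:
            raise ValueError("packet truncated in header extension data")
    return RtpHeader(version, bool(b0 & 0x20), extension, bool(b1 & 0x80),
                     b1 & 0x7F, seq, ts, ssrc, csrcs, offset)
```

The returned payload_offset marks where the RTP payload begins, once any CSRC identifiers and header extension have been skipped.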
564 The RTP payload (and the settings for some RTP header bits) for 565 aggregation packets and fragmentation units are specified in Sections 566 4.3.2 and 4.3.3, respectively.
568    0                   1                   2                   3
569    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
570   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
571   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
572   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
573   |                           timestamp                           |
574   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
575   |           synchronization source (SSRC) identifier            |
576   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
577   |            contributing source (CSRC) identifiers             |
578   |                             ....                              |
579   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
581 RTP Header According to [RFC3550] 583 Figure 2 585 The RTP header fields are set as follows for this RTP payload 586 format: 588 Marker bit (M): 1 bit 590 Set for the last packet of the access unit, carried in the current 591 RTP stream. This is in line with the normal use of the M bit in 592 video formats to allow efficient playout buffer handling. 594 The informative note below needs updating once the NAL unit 595 type table is stable in the [VVC] spec 597 Informative note: The content of a NAL unit does not tell 598 whether or not the NAL unit is the last NAL unit, in decoding 599 order, of an access unit. An RTP sender implementation may 600 obtain this information from the video encoder. If, however, 601 the implementation cannot obtain this information directly from 602 the encoder, e.g., when the bitstream was pre-encoded, and also 603 there is no timestamp allocated for each NAL unit, then the 604 sender implementation can inspect subsequent NAL units in 605 decoding order to determine whether or not the NAL unit is the 606 last NAL unit of an access unit as follows.
A NAL unit is 607 determined to be the last NAL unit of an access unit if it is 608 the last NAL unit of the bitstream. A NAL unit naluX is also 609 determined to be the last NAL unit of an access unit if both 610 the following conditions are true: 1) the next VCL NAL unit 611 naluY in decoding order has the high-order bit of the first 612 byte after its NAL unit header equal to 1, and 2) all NAL units 613 between naluX and naluY, when present, have nal_unit_type in 614 the range of 32 to 35, inclusive, equal to 39, or in the ranges 615 of 41 to 44, inclusive, or 48 to 55, inclusive. 617 Payload Type (PT): 7 bits 619 The assignment of an RTP payload type for this new packet format 620 is outside the scope of this document and will not be specified 621 here. The assignment of a payload type has to be performed either 622 through the profile used or in a dynamic way. 624 Sequence Number (SN): 16 bits 626 Set and used in accordance with [RFC3550] . 628 Timestamp: 32 bits 630 The RTP timestamp is set to the sampling timestamp of the content. 631 A 90 kHz clock rate MUST be used. If the NAL unit has no timing 632 properties of its own (e.g., parameter set and SEI NAL units), the 633 RTP timestamp MUST be set to the RTP timestamp of the coded 634 picture of the access unit in which the NAL unit (according to 635 Section xxx of [VVC]) is included. Receivers MUST use the RTP 636 timestamp for the display process, even when the bitstream 637 contains picture timing SEI messages or decoding unit information 638 SEI messages as specified in [VVC]. However, this does not mean 639 that picture timing SEI messages in the bitstream should be 640 discarded, as picture timing SEI messages may contain frame-field 641 information that is important in appropriately rendering 642 interlaced video. 644 Synchronization source (SSRC): 32 bits 646 Used to identify the source of the RTP packets. 
When using SRST, 647 by definition a single SSRC is used for all parts of a single 648 bitstream. 650 4.2. Payload Header Usage 652 The first two bytes of the payload of an RTP packet are referred to 653 as the payload header. The payload header consists of the same 654 fields (F, Z, LayerId, Type, and TID) as the NAL unit header as shown 655 in Section 1.1.4, irrespective of the type of the payload structure. 657 The TID value indicates (among other things) the relative importance 658 of an RTP packet, for example, because NAL units belonging to higher 659 temporal sub-layers are not used for the decoding of lower temporal 660 sub-layers. A lower value of TID indicates a higher importance. 661 More-important NAL units MAY be better protected against transmission 662 losses than less-important NAL units. 664 For Discussion: quite possibly something similar can be said for the 665 Layer_id in layered coding, but perhaps not in multiview coding. 666 (The relevant part of the spec is relatively new, therefore the soft 667 language). However, for serious layer pruning, interpretation of the 668 VPS is required. We can add language about the need for stateful 669 interpretation of LayerID vis-a-vis stateless interpretation of TID 670 later. 672 4.3. Payload Structures 674 Three different types of RTP packet payload structures are specified. 675 A receiver can identify the type of an RTP packet payload through the 676 Type field in the payload header. 678 The three different payload structures are as follows: 680 o Single NAL unit packet: Contains a single NAL unit in the payload, 681 and the NAL unit header of the NAL unit also serves as the payload 682 header. This payload structure is specified in Section 4.3.1. 684 o Aggregation Packet (AP): Contains more than one NAL unit within 685 one access unit. This payload structure is specified in 686 Section 4.3.2. 688 o Fragmentation Unit (FU): Contains a subset of a single NAL unit.
689 This payload structure is specified in Section 4.3.3. 691 4.3.1. Single NAL Unit Packets 693 A single NAL unit packet contains exactly one NAL unit, and consists 694 of a payload header (denoted as PayloadHdr), a conditional 16-bit 695 DONL field (in network byte order), and the NAL unit payload data 696 (the NAL unit excluding its NAL unit header) of the contained NAL 697 unit, as shown in Figure 3. 699 0 1 2 3 700 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 701 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 702 | PayloadHdr | DONL (conditional) | 703 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 704 | | 705 | NAL unit payload data | 706 | | 707 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 708 | :...OPTIONAL RTP padding | 709 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 711 The Structure of a Single NAL Unit Packet 713 Figure 3 715 The DONL field, when present, specifies the value of the 16 least 716 significant bits of the decoding order number of the contained NAL 717 unit. If sprop-max-don-diff is greater than 0 for any of the RTP 718 streams, the DONL field MUST be present, and the variable DON for the 719 contained NAL unit is derived as equal to the value of the DONL 720 field. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP 721 streams), the DONL field MUST NOT be present. 723 4.3.2. Aggregation Packets (APs) 725 Aggregation Packets (APs) are introduced to enable the reduction of 726 packetization overhead for small NAL units, such as most of the non- 727 VCL NAL units, which are often only a few octets in size. 729 An AP aggregates NAL units within one access unit. Each NAL unit to 730 be carried in an AP is encapsulated in an aggregation unit. NAL 731 units aggregated in one AP are in NAL unit decoding order. 733 An AP consists of a payload header (denoted as PayloadHdr) followed 734 by two or more aggregation units, as shown in Figure 4.
736 0 1 2 3 737 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 738 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 739 | PayloadHdr (Type=48) | | 740 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 741 | | 742 | two or more aggregation units | 743 | | 744 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 745 | :...OPTIONAL RTP padding | 746 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 748 The Structure of an Aggregation Packet 750 Figure 4 752 The fields in the payload header are set as follows. The F bit MUST 753 be equal to 0 if the F bit of each aggregated NAL unit is equal to 754 zero; otherwise, it MUST be equal to 1. The Type field MUST be equal 755 to 48. 757 NOTE: double check #48 against post-geneva [VVC] spec 759 The value of LayerId MUST be equal to the lowest value of LayerId of 760 all the aggregated NAL units. The value of TID MUST be the lowest 761 value of TID of all the aggregated NAL units. 763 Informative note: All VCL NAL units in an AP have the same TID 764 value since they belong to the same access unit. However, an AP 765 may contain non-VCL NAL units for which the TID value in the NAL 766 unit header may be different than the TID value of the VCL NAL 767 units in the same AP. 769 An AP MUST carry at least two aggregation units and can carry as many 770 aggregation units as necessary; however, the total amount of data in 771 an AP MUST fit into an IP packet, and the size SHOULD be 772 chosen so that the resulting IP packet is smaller than the MTU size 773 so as to avoid IP layer fragmentation. An AP MUST NOT contain FUs 774 specified in Section 4.3.3. APs MUST NOT be nested; i.e., an AP must 775 not contain another AP.
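The payload header rules for an AP given above can be sketched as follows. This is a minimal illustration only, not part of the payload format: the NalHeaderFields type and the function name are hypothetical, the header fields are handled as already-parsed integers because the bit layout of the NAL unit header is defined elsewhere, and Type 48 is the provisional value that the note above flags for re-checking.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

AP_TYPE = 48  # provisional value; the draft itself flags it for re-checking


@dataclass(frozen=True)
class NalHeaderFields:
    """Hypothetical container for already-parsed NAL unit header fields."""
    f: int         # forbidden bit (0 or 1)
    layer_id: int  # LayerId
    tid: int       # TID


def ap_payload_header_fields(nal_headers: Sequence[NalHeaderFields]) -> Dict[str, int]:
    """Derive the AP payload header fields from the aggregated NAL units."""
    if len(nal_headers) < 2:
        raise ValueError("an AP MUST carry at least two aggregation units")
    return {
        # F is 0 only if the F bit of every aggregated NAL unit is 0.
        "F": 1 if any(h.f for h in nal_headers) else 0,
        "Type": AP_TYPE,
        # LayerId and TID take the lowest value among the aggregated NAL units.
        "LayerId": min(h.layer_id for h in nal_headers),
        "TID": min(h.tid for h in nal_headers),
    }
```

For example, aggregating a parameter set with TID 0 and a slice with TID 2 yields TID 0 in the AP payload header, which matches the informative note about non-VCL NAL units.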
777 The first aggregation unit in an AP consists of a conditional 16-bit 778 DONL field (in network byte order) followed by a 16-bit unsigned size 779 information (in network byte order) that indicates the size of the 780 NAL unit in bytes (excluding these two octets, but including the NAL 781 unit header), followed by the NAL unit itself, including its NAL unit 782 header, as shown in Figure 5. 784 0 1 2 3 785 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 786 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 787 | : DONL (conditional) | NALU size | 788 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 789 | NALU size | | 790 +-+-+-+-+-+-+-+-+ NAL unit | 791 | | 792 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 793 | : 794 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 796 The Structure of the First Aggregation Unit in an AP 798 Figure 5 800 The DONL field, when present, specifies the value of the 16 least 801 significant bits of the decoding order number of the aggregated NAL 802 unit. 804 If sprop-max-don-diff is greater than 0 for any of the RTP streams, 805 the DONL field MUST be present in an aggregation unit that is the 806 first aggregation unit in an AP, and the variable DON for the 807 aggregated NAL unit is derived as equal to the value of the DONL 808 field. Otherwise (sprop-max-don-diff is equal to 0 for all the RTP 809 streams), the DONL field MUST NOT be present in an aggregation unit 810 that is the first aggregation unit in an AP. 812 An aggregation unit that is not the first aggregation unit in an AP 813 consists of a conditional 8-bit DOND field followed by a 16-bit 814 unsigned size information (in network byte order) that indicates the 815 size of the NAL unit in bytes (excluding these two octets, but 816 including the NAL unit header), followed by the NAL unit itself, 817 including its NAL unit header, as shown in Figure 6. 
819 0 1 2 3 820 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 821 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 822 | : DOND (cond) | NALU size | 823 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 824 | | 825 | NAL unit | 826 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 827 | : 828 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 830 The Structure of an Aggregation Unit That Is Not the First 831 Aggregation Unit in an AP 833 Figure 6 835 When present, the DOND field plus 1 specifies the difference between 836 the decoding order number values of the current aggregated NAL unit 837 and the preceding aggregated NAL unit in the same AP. 839 If sprop-max-don-diff is greater than 0 for any of the RTP streams, 840 the DOND field MUST be present in an aggregation unit that is not the 841 first aggregation unit in an AP, and the variable DON for the 842 aggregated NAL unit is derived as equal to the DON of the preceding 843 aggregated NAL unit in the same AP plus the value of the DOND field 844 plus 1 modulo 65536. Otherwise (sprop-max-don-diff is equal to 0 for 845 all the RTP streams), the DOND field MUST NOT be present in an 846 aggregation unit that is not the first aggregation unit in an AP, and 847 in this case the transmission order and decoding order of NAL units 848 carried in the AP are the same as the order the NAL units appear in 849 the AP. 851 Figure 7 presents an example of an AP that contains two aggregation 852 units, labeled as 1 and 2 in the figure, without the DONL and DOND 853 fields being present. 
855 0 1 2 3 856 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 857 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 858 | RTP Header | 859 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 860 | PayloadHdr (Type=XX) | NALU 1 Size | 861 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 862 | NALU 1 HDR | | 863 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 1 Data | 864 | . . . | 865 | | 866 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 867 | . . . | NALU 2 Size | NALU 2 HDR | 868 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 869 | NALU 2 HDR | | 870 +-+-+-+-+-+-+-+-+ NALU 2 Data | 871 | . . . | 872 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 873 | :...OPTIONAL RTP padding | 874 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 876 An Example of an AP Packet Containing Two Aggregation Units without 877 the DONL and DOND Fields 879 Figure 7 881 Figure 8 presents an example of an AP that contains two aggregation 882 units, labeled as 1 and 2 in the figure, with the DONL and DOND 883 fields being present. 885 0 1 2 3 886 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 887 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 888 | RTP Header | 889 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 890 | PayloadHdr (Type=XX) | NALU 1 DONL | 891 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 892 | NALU 1 Size | NALU 1 HDR | 893 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 894 | | 895 | NALU 1 Data . . . | 896 | | 897 + . . . +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 898 | | NALU 2 DOND | NALU 2 Size | 899 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 900 | NALU 2 HDR | | 901 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ NALU 2 Data | 902 | | 903 | . . . 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 904 | :...OPTIONAL RTP padding | 905 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 907 An Example of an AP Containing Two Aggregation Units with the DONL 908 and DOND Fields 910 Figure 8 912 4.3.3. Fragmentation Units 914 Fragmentation Units (FUs) are introduced to enable fragmenting a 915 single NAL unit into multiple RTP packets, possibly without 916 cooperation or knowledge of the [VVC] encoder. A fragment 917 of a NAL unit consists of an integer number of consecutive octets of 918 that NAL unit. Fragments of the same NAL unit MUST be sent in 919 consecutive order with ascending RTP sequence numbers (with no other 920 RTP packets within the same RTP stream being sent between the first 921 and last fragment). 923 When a NAL unit is fragmented and conveyed within FUs, it is referred 924 to as a fragmented NAL unit. APs MUST NOT be fragmented. FUs MUST 925 NOT be nested; i.e., an FU must not contain a subset of another FU. 927 The RTP timestamp of an RTP packet carrying an FU is set to the NALU- 928 time of the fragmented NAL unit. 930 An FU consists of a payload header (denoted as PayloadHdr), an FU 931 header of one octet, a conditional 16-bit DONL field (in network byte 932 order), and an FU payload, as shown in Figure 9. 934 0 1 2 3 935 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 936 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 937 | PayloadHdr (Type=XX) | FU header | DONL (cond) | 938 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| 939 | DONL (cond) | | 940 |-+-+-+-+-+-+-+-+ | 941 | FU payload | 942 | | 943 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 944 | :...OPTIONAL RTP padding | 945 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 947 The Structure of an FU 949 Figure 9 951 The fields in the payload header are set as follows. The Type field 952 MUST be equal to XX.
The fields F, LayerId, and TID MUST be equal to 953 the fields F, LayerId, and TID, respectively, of the fragmented NAL 954 unit. 956 The FU header consists of an S bit, an E bit, and a 6-bit FuType 957 field, as shown in Figure 10. 959 +---------------+ 960 |0|1|2|3|4|5|6|7| 961 +-+-+-+-+-+-+-+-+ 962 |S|E| FuType | 963 +---------------+ 965 The Structure of FU Header 967 Figure 10 969 The semantics of the FU header fields are as follows: 971 S: 1 bit 973 When set to 1, the S bit indicates the start of a fragmented NAL 974 unit, i.e., the first byte of the FU payload is also the first 975 byte of the payload of the fragmented NAL unit. When the FU 976 payload is not the start of the fragmented NAL unit payload, the S 977 bit MUST be set to 0. 979 E: 1 bit 980 When set to 1, the E bit indicates the end of a fragmented NAL 981 unit, i.e., the last byte of the payload is also the last byte of 982 the fragmented NAL unit. When the FU payload is not the last 983 fragment of a fragmented NAL unit, the E bit MUST be set to 0. 985 FuType: 6 bits 987 The field FuType MUST be equal to the field Type of the fragmented 988 NAL unit. 990 The DONL field, when present, specifies the value of the 16 least 991 significant bits of the decoding order number of the fragmented NAL 992 unit. 994 If sprop-max-don-diff is greater than 0 for any of the RTP streams, 995 and the S bit is equal to 1, the DONL field MUST be present in the 996 FU, and the variable DON for the fragmented NAL unit is derived as 997 equal to the value of the DONL field. Otherwise (sprop-max-don-diff 998 is equal to 0 for all the RTP streams, or the S bit is equal to 0), 999 the DONL field MUST NOT be present in the FU. 1001 A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e., 1002 the Start bit and End bit must not both be set to 1 in the same FU 1003 header. 
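The one-octet FU header of Figure 10 and the DONL presence rule can be sketched as below. This is an illustrative sketch under stated assumptions, not a normative encoder: the function names are hypothetical, and bit 0 in the figure is taken to be the most significant bit of the octet, as is conventional for RTP header diagrams.

```python
from typing import Tuple


def pack_fu_header(start: bool, end: bool, fu_type: int) -> int:
    """Build the FU header octet: S (1 bit), E (1 bit), FuType (6 bits)."""
    if start and end:
        # A non-fragmented NAL unit MUST NOT be sent in one FU.
        raise ValueError("S and E must not both be set in the same FU header")
    if not 0 <= fu_type <= 0x3F:
        raise ValueError("FuType must fit in 6 bits")
    return (int(start) << 7) | (int(end) << 6) | fu_type


def parse_fu_header(octet: int) -> Tuple[bool, bool, int]:
    """Split an FU header octet into (S, E, FuType)."""
    return bool(octet & 0x80), bool(octet & 0x40), octet & 0x3F


def fu_has_donl(sprop_max_don_diff: int, start: bool) -> bool:
    """DONL is present only in the first FU, and only when sprop-max-don-diff > 0."""
    return sprop_max_don_diff > 0 and start
```

A receiver would call parse_fu_header on the octet that follows the two-byte payload header, then use fu_has_donl to decide whether the next two octets carry DONL or already belong to the FU payload.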
1005 The FU payload consists of fragments of the payload of the fragmented 1006 NAL unit so that if the FU payloads of consecutive FUs, starting with 1007 an FU with the S bit equal to 1 and ending with an FU with the E bit 1008 equal to 1, are sequentially concatenated, the payload of the 1009 fragmented NAL unit can be reconstructed. The NAL unit header of the 1010 fragmented NAL unit is not included as such in the FU payload, but 1011 rather the information of the NAL unit header of the fragmented NAL 1012 unit is conveyed in F, LayerId, and TID fields of the FU payload 1013 headers of the FUs and the FuType field of the FU header of the FUs. 1014 An FU payload MUST NOT be empty. 1016 If an FU is lost, the receiver SHOULD discard all following 1017 fragmentation units in transmission order corresponding to the same 1018 fragmented NAL unit, unless the decoder in the receiver is known to 1019 be prepared to gracefully handle incomplete NAL units. 1021 A receiver in an endpoint or in a MANE MAY aggregate the first n-1 1022 fragments of a NAL unit to an (incomplete) NAL unit, even if fragment 1023 n of that NAL unit is not received. In this case, the 1024 forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a 1025 syntax violation. 1027 4.4. Decoding Order Number 1029 For each NAL unit, the variable AbsDon is derived, representing the 1030 decoding order number that is indicative of the NAL unit decoding 1031 order. 1033 Let NAL unit n be the n-th NAL unit in transmission order within an 1034 RTP stream. 1036 If sprop-max-don-diff is equal to 0 for all the RTP streams carrying 1037 the [VVC] bitstream, AbsDon[n], the value of AbsDon for NAL unit n, is 1038 derived as equal to n.
1040 Otherwise (sprop-max-don-diff is greater than 0 for any of the RTP 1041 streams), AbsDon[n] is derived as follows, where DON[n] is the value 1042 of the variable DON for NAL unit n: 1044 o If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in 1045 transmission order), AbsDon[0] is set equal to DON[0]. 1047 o Otherwise (n is greater than 0), the following applies for 1048 derivation of AbsDon[n]: 1050 If DON[n] == DON[n-1], 1051 AbsDon[n] = AbsDon[n-1] 1053 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768), 1054 AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1] 1056 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768), 1057 AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n] 1059 If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768), 1060 AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - 1061 DON[n]) 1063 If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768), 1064 AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n]) 1066 For any two NAL units m and n, the following applies: 1068 o AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows 1069 NAL unit m in NAL unit decoding order. 1071 o When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order 1072 of the two NAL units can be in either order. 1074 o AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes 1075 NAL unit m in decoding order. 1077 Informative note: When two consecutive NAL units in the NAL 1078 unit decoding order have different values of AbsDon, the 1079 absolute difference between the two AbsDon values may be 1080 greater than or equal to 1. 1082 Informative note: There are multiple reasons to allow for the 1083 absolute difference of the values of AbsDon for two consecutive 1084 NAL units in the NAL unit decoding order to be greater than 1085 one. An increment by one is not required, as at the time of 1086 associating values of AbsDon to NAL units, it may not be known 1087 whether all NAL units are to be delivered to the receiver. 
For 1088 example, a gateway may not forward VCL NAL units of higher sub- 1089 layers or some SEI NAL units when there is congestion in the 1090 network. In another example, the first intra-coded picture of 1091 a pre-encoded clip is transmitted in advance to ensure that it 1092 is readily available in the receiver, and when transmitting the 1093 first intra-coded picture, the originator does not exactly know 1094 how many NAL units will be encoded before the first intra-coded 1095 picture of the pre-encoded clip follows in decoding order. 1096 Thus, the values of AbsDon for the NAL units of the first 1097 intra-coded picture of the pre-encoded clip have to be 1098 estimated when they are transmitted, and gaps in values of 1099 AbsDon may occur. Another example is MRST or MRMT with sprop- 1100 max-don-diff greater than 0, where the AbsDon values must 1101 indicate cross-layer decoding order for NAL units conveyed in 1102 all the RTP streams. 1104 5. Packetization Rules 1106 The following packetization rules apply: 1108 o If sprop-max-don-diff is greater than 0 for any of the RTP 1109 streams, the transmission order of NAL units carried in the RTP 1110 stream MAY be different than the NAL unit decoding order and the 1111 NAL unit output order. Otherwise (sprop-max-don-diff is equal to 1112 0 for all the RTP streams), the transmission order of NAL units 1113 carried in the RTP stream MUST be the same as the NAL unit 1114 decoding order and, when tx-mode is equal to "MRST" or "MRMT", 1115 MUST also be the same as the NAL unit output order. 1117 o A NAL unit of a small size SHOULD be encapsulated in an 1118 aggregation packet together with one or more other NAL units in 1119 order to avoid the unnecessary packetization overhead for small 1120 NAL units.
For example, non-VCL NAL units such as access unit 1121 delimiters, parameter sets, or SEI NAL units are typically small 1122 and can often be aggregated with VCL NAL units without violating 1123 MTU size constraints. 1125 o Each non-VCL NAL unit SHOULD, when possible from an MTU size match 1126 viewpoint, be encapsulated in an aggregation packet together with 1127 its associated VCL NAL unit, as typically a non-VCL NAL unit would 1128 be meaningless without the associated VCL NAL unit being 1129 available. 1131 o For carrying exactly one NAL unit in an RTP packet, a single NAL 1132 unit packet MUST be used. 1134 6. De-packetization Process 1136 The general concept behind de-packetization is to get the NAL units 1137 out of the RTP packets in an RTP stream and all RTP streams the RTP 1138 stream depends on, if any, and pass them to the decoder in the NAL 1139 unit decoding order. 1141 The de-packetization process is implementation dependent. Therefore, 1142 the following description should be seen as an example of a suitable 1143 implementation. Other schemes may be used as well, as long as the 1144 output for the same input is the same as the process described below. 1145 The output is the same when the set of output NAL units and their 1146 order are both identical. Optimizations relative to the described 1147 algorithms are possible. 1149 All normal RTP mechanisms related to buffer management apply. In 1150 particular, duplicated or outdated RTP packets (as indicated by the 1151 RTP sequence number and the RTP timestamp) are removed. To 1152 determine the exact time for decoding, factors such as a possible 1153 intentional delay to allow for proper inter-stream synchronization 1154 must be taken into account. 1156 NAL units with NAL unit type values in the range of 0 to XX, 1157 inclusive, may be passed to the decoder. NAL-unit-like structures 1158 with NAL unit type values in the range of XX to XX, inclusive, MUST 1159 NOT be passed to the decoder.
1161 The receiver includes a receiver buffer, which is used to compensate 1162 for transmission delay jitter within individual RTP streams and 1163 across RTP streams, to reorder NAL units from transmission order to 1164 the NAL unit decoding order, and to recover the NAL unit decoding 1165 order in MRST or MRMT, when applicable. In this section, the 1166 receiver operation is described under the assumption that there is no 1167 transmission delay jitter within an RTP stream and across RTP 1168 streams. To distinguish it from a practical receiver buffer that 1169 is also used for compensation of transmission delay jitter, the 1170 receiver buffer is hereafter called the de-packetization buffer in 1171 this section. Receivers should also prepare for transmission delay 1172 jitter; that is, either reserve separate buffers for transmission 1173 delay jitter buffering and de-packetization buffering or use a 1174 receiver buffer for both transmission delay jitter and de- 1175 packetization. Moreover, receivers should take transmission delay 1176 jitter into account in the buffering operation, e.g., by additional 1177 initial buffering before decoding and playback start. 1179 When sprop-max-don-diff is equal to 0 for all the received RTP 1180 streams, the de-packetization buffer size is zero bytes, and the 1181 process described in the remainder of this paragraph applies. When 1182 there is only one RTP stream received, the NAL units carried in the 1183 single RTP stream are directly passed to the decoder in their 1184 transmission order, which is identical to their decoding order. When 1185 there is more than one RTP stream received, the NAL units carried in 1186 the multiple RTP streams are passed to the decoder in their NTP 1187 timestamp order.
When there are several NAL units of different RTP 1188 streams with the same NTP timestamp, the order to pass them to the 1189 decoder is their dependency order, where NAL units of a dependee RTP 1190 stream are passed to the decoder prior to the NAL units of the 1191 dependent RTP stream. When there are several NAL units of the same 1192 RTP stream with the same NTP timestamp, the order to pass them to the 1193 decoder is their transmission order. 1195 Informative note: The mapping between RTP and NTP timestamps is 1196 conveyed in RTCP SR packets. In addition, the mechanisms for 1197 faster media timestamp synchronization discussed in [RFC6051] may 1198 be used to speed up the acquisition of the RTP-to-wall-clock 1199 mapping. 1201 When sprop-max-don-diff is greater than 0 for any of the received RTP 1202 streams, the process described in the remainder of this section 1203 applies. 1205 There are two buffering states in the receiver: initial buffering and 1206 buffering while playing. Initial buffering starts when the reception 1207 is initialized. After initial buffering, decoding and playback are 1208 started, and the buffering-while-playing mode is used. 1210 Regardless of the buffering state, the receiver stores incoming NAL 1211 units, in reception order, into the de-packetization buffer. NAL 1212 units carried in RTP packets are stored in the de-packetization 1213 buffer individually, and the value of AbsDon is calculated and stored 1214 for each NAL unit. When MRST or MRMT is in use, NAL units of all RTP 1215 streams of a bitstream are stored in the same de-packetization 1216 buffer. When NAL units carried in any two RTP streams are available 1217 to be placed into the de-packetization buffer, those NAL units 1218 carried in the RTP stream that is lower in the dependency tree are 1219 placed into the buffer first. For example, if RTP stream A depends 1220 on RTP stream B, then NAL units carried in RTP stream B are placed 1221 into the buffer first.
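The AbsDon value stored for each NAL unit is derived as specified in Section 4.4. A minimal sketch of that derivation, for the case where sprop-max-don-diff is greater than 0, is given here for illustration. The function name is hypothetical, and the input is assumed to be the list of 16-bit DON values in transmission order.

```python
from typing import List


def abs_don_sequence(dons: List[int]) -> List[int]:
    """Map 16-bit DON values (transmission order) to AbsDon values."""
    abs_dons: List[int] = []
    for n, don in enumerate(dons):
        if n == 0:
            # The very first NAL unit: AbsDon[0] = DON[0].
            abs_dons.append(don)
            continue
        prev_don, prev_abs = dons[n - 1], abs_dons[n - 1]
        if don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            cur = prev_abs + don - prev_don            # forward step, no wrap
        elif don < prev_don and prev_don - don >= 32768:
            cur = prev_abs + 65536 - prev_don + don    # forward step across wrap
        elif don > prev_don and don - prev_don >= 32768:
            cur = prev_abs - (prev_don + 65536 - don)  # backward step across wrap
        else:
            cur = prev_abs - (prev_don - don)          # backward step, no wrap
        abs_dons.append(cur)
    return abs_dons
```

The half-range threshold of 32768 is what lets the receiver tell a wraparound of the 16-bit counter from a NAL unit that was sent out of decoding order.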
1223 Initial buffering lasts until condition A (the difference between the 1224 greatest and smallest AbsDon values of the NAL units in the de- 1225 packetization buffer is greater than or equal to the value of sprop- 1226 max-don-diff of the highest RTP stream) or condition B (the number of 1227 NAL units in the de-packetization buffer is greater than the value of 1228 sprop-depack-buf-nalus) is true. 1230 After initial buffering, whenever condition A or condition B is true, 1231 the following operation is repeatedly applied until both condition A 1232 and condition B become false: 1234 o The NAL unit in the de-packetization buffer with the smallest 1235 value of AbsDon is removed from the de-packetization buffer and 1236 passed to the decoder. 1238 When no more NAL units are flowing into the de-packetization buffer, 1239 all NAL units remaining in the de-packetization buffer are removed 1240 from the buffer and passed to the decoder in the order of increasing 1241 AbsDon values. 1243 7. Payload Format Parameters 1245 Placeholder 1247 8. Use with Feedback Messages 1249 The following subsections define the use of the Picture Loss 1250 Indication (PLI), Slice Loss Indication (SLI), Reference Picture 1251 Selection Indication (RPSI), and Full Intra Request (FIR) feedback 1252 messages with [VVC]. The PLI, SLI, and RPSI messages are defined in 1253 [RFC4585], and the FIR message is defined in [RFC5104]. 1255 8.1. Picture Loss Indication (PLI) 1257 As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a 1258 media sender indicates "the loss of an undefined amount of coded 1259 video data belonging to one or more pictures".
Without having any 1260 specific knowledge of the setup of the bitstream (such as use and 1261 location of in-band parameter sets, non-IDR decoder refresh points, 1262 picture structures, and so forth), a reaction to the reception of a 1263 PLI by a [VVC] sender SHOULD be to send an IDR picture and relevant 1264 parameter sets, potentially with sufficient redundancy so as to ensure 1265 correct reception. However, sometimes information about the 1266 bitstream structure is known. For example, state could have been 1267 established outside of the mechanisms defined in this document that 1268 parameter sets are conveyed out of band only, and stay static for the 1269 duration of the session. In that case, it is obviously unnecessary 1270 to send them in-band as a result of the reception of a PLI. Other 1271 examples could be devised based on a priori knowledge of different 1272 aspects of the bitstream structure. In all cases, the timing and 1273 congestion control mechanisms of RFC 4585 MUST be observed. 1275 8.2. Slice Loss Indication (SLI) 1277 For further study. Maybe remove as there are no known 1278 implementations of SLI in H.265-based systems. 1280 8.3. Reference Picture Selection Indication (RPSI) 1282 Feedback-based reference picture selection has been shown as a 1283 powerful tool to stop temporal error propagation for improved error 1284 resilience [Girod99] [Wang05]. In one approach, the decoder side 1285 tracks errors in the decoded pictures and informs the encoder side 1286 that a particular picture that has been decoded relatively earlier is 1287 correct and still present in the decoded picture buffer; it requests 1288 the encoder to use that correct picture-availability information when 1289 encoding the next picture, so as to stop further temporal error 1290 propagation. For this approach, the decoder side should use the RPSI 1291 feedback message.
1293 Encoders can encode some long-term reference pictures as specified in 1294 [VVC] for purposes described in the previous paragraph without the 1295 need of a huge decoded picture buffer. As shown in [Wang05], with a 1296 flexible reference picture management scheme, as in [VVC], even a 1297 decoded picture buffer size of two picture storage buffers would work 1298 for the approach described in the previous paragraph. 1300 the text below is copy-paste from RFC 7798. If we keep the RPSI 1301 message, it needs adaptation to the [VVC] syntax. Doing so shouldn't 1302 be too hard as the [VVC] reference picture mechanism is not too 1303 different from the H.265 one. 1305 8.4. Full Intra Request (FIR) 1307 The purpose of the FIR message is to force an encoder to send an 1308 independent decoder refresh point as soon as possible (observing, for 1309 example, the congestion-control-related constraints set out in RFC 1310 5104). 1312 Upon reception of a FIR, a sender MUST send an IDR picture. 1313 Parameter sets MUST also be sent, except when there is a priori 1314 knowledge that the parameter sets have been correctly established. A 1315 typical example for that is an understanding between sender and 1316 receiver, established by means outside this document, that parameter 1317 sets are exclusively sent out-of-band. 1319 9. Security Considerations 1321 The scope of this Security Considerations section is limited to the 1322 payload format itself and to one feature of [VVC] that may pose a 1323 particularly serious security risk if implemented naively. The 1324 payload format, in isolation, does not form a complete system. 1325 Implementers are advised to read and understand relevant security- 1326 related documents, especially those pertaining to RTP (see the 1327 Security Considerations section in [RFC3550] ), and the security of 1328 the call-control stack chosen (that may make use of the media type 1329 registration of this memo). 
   Implementers should also consider known security vulnerabilities of
   video coding and decoding implementations in general and avoid those.

   Within this RTP payload format, and with the exception of the user
   data SEI message as described below, no security threats other than
   those common to RTP payload formats are known.  In other words,
   neither the various media-plane-based mechanisms, nor the signaling
   part of this memo, seems to pose a security risk beyond those common
   to all RTP-based systems.

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [RFC3550], and in any applicable RTP profile such as
   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or
   RTP/SAVPF [RFC5124].  However, as "Securing the RTP Framework: Why
   RTP Does Not Mandate a Single Media Security Solution" [RFC7202]
   discusses, it is not an RTP payload format's responsibility to
   discuss or mandate what solutions are used to meet the basic security
   goals like confidentiality, integrity, and source authenticity for
   RTP in general.  This responsibility lies with anyone using RTP in an
   application.  They can find guidance on available security mechanisms
   and important considerations in "Options for Securing RTP Sessions"
   [RFC7201].  Applications SHOULD use one or more appropriate strong
   security mechanisms.  The rest of this section discusses the
   security-impacting properties of the payload format itself.

   Because the data compression used with this payload format is applied
   end-to-end, any encryption needs to be performed after compression.
   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load.
   The attacker can inject pathological datagrams into the bitstream
   that are complex to decode and that cause the receiver to be
   overloaded.  [VVC] is particularly vulnerable to such attacks, as it
   is extremely simple to generate datagrams containing NAL units that
   affect the decoding process of many future NAL units.  Therefore, the
   usage of data origin authentication and data integrity protection of
   at least the RTP packet is RECOMMENDED, for example, with SRTP
   [RFC3711].

   Like HEVC [RFC7798], [VVC] includes a user data Supplemental
   Enhancement Information (SEI) message.  This SEI message allows
   inclusion of an arbitrary bitstring into the video bitstream.  Such a
   bitstring could include JavaScript, machine code, and other active
   content.  [VVC] leaves the handling of this SEI message to the
   receiving system.  In order to avoid harmful side effects of the user
   data SEI message, decoder implementations cannot naively trust its
   content.  For example, it would be a bad and insecure implementation
   practice to forward any JavaScript a decoder implementation detects
   to a web browser.  The safest way to deal with user data SEI messages
   is to simply discard them, but that can have negative side effects on
   the user's quality of experience.

   End-to-end security with authentication, integrity, or
   confidentiality protection will prevent a MANE from performing
   media-aware operations other than discarding complete packets.  In
   the case of confidentiality protection, it will even be prevented
   from discarding packets in a media-aware way.  To be allowed to
   perform such operations, a MANE is required to be a trusted entity
   that is included in the security context establishment.

10.  Congestion Control

   Congestion control for RTP SHALL be used in accordance with RTP
   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
   If best-effort service is being used, an additional requirement is
   that users of this payload format MUST monitor packet loss to ensure
   that the packet loss rate is within an acceptable range.  Packet loss
   is considered acceptable if a TCP flow across the same network path,
   and experiencing the same network conditions, would achieve an
   average throughput, measured on a reasonable timescale, that is not
   less than all RTP streams combined are achieving.  This condition can
   be satisfied by implementing congestion-control mechanisms to adapt
   the transmission rate, the number of layers subscribed for a layered
   multicast session, or by arranging for a receiver to leave the
   session if the loss rate is unacceptably high.

   The bitrate adaptation necessary for obeying the congestion control
   principle is easily achievable when real-time encoding is used, for
   example, by adequately tuning the quantization parameter.

   However, when pre-encoded content is being transmitted, bandwidth
   adaptation requires the pre-encoded bitstream to be tailored for such
   adaptivity.  The key mechanisms available in [VVC] are temporal
   scalability and spatial/SNR scalability.  A media sender can remove
   NAL units belonging to higher temporal sub-layers (i.e., those NAL
   units with a high value of TID) or higher spatial/SNR layers (as
   indicated by interpreting the VPS) until the sending bitrate drops to
   an acceptable range.

   The above mechanisms generally work within a defined profile and
   level and, therefore, no renegotiation of the channel is required.
   Only when non-downgradable parameters (such as profile) are required
   to be changed does it become necessary to terminate and restart the
   RTP stream(s).  This may be accomplished by using different RTP
   payload types.
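
   As an illustration of the temporal sub-layer removal described above
   for pre-encoded content, the following Python sketch drops the
   highest remaining temporal sub-layer until the sending bitrate fits
   a target.  The NalUnit record, the thin_to_bitrate function, and the
   simple byte-count bitrate estimate are hypothetical names and
   simplifications introduced for this example only; they are not
   defined by [VVC] or by this memo.  The sketch assumes that pictures
   in lower temporal sub-layers do not reference pictures in higher
   ones, so that a whole sub-layer can be removed at once.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NalUnit:
    """Hypothetical, simplified view of one pre-encoded NAL unit."""
    tid: int          # temporal sub-layer identifier (TID)
    size_bytes: int   # size of the NAL unit in bytes


def thin_to_bitrate(nal_units: List[NalUnit], duration_s: float,
                    target_bps: float) -> List[NalUnit]:
    """Drop the highest temporal sub-layers until the bitrate fits.

    Returns the NAL units to send, in their original order.  Sub-layer
    0 is never removed, so the result can still exceed target_bps.
    """
    if not nal_units:
        return []
    max_tid = max(n.tid for n in nal_units)
    kept = list(nal_units)
    while max_tid > 0:
        # Bitrate of what is currently kept, in bits per second.
        bitrate = 8 * sum(n.size_bytes for n in kept) / duration_s
        if bitrate <= target_bps:
            break
        # Remove the highest remaining sub-layer as a whole.
        kept = [n for n in kept if n.tid < max_tid]
        max_tid -= 1
    return kept
```

   A sender would call such a function per adaptation interval, e.g.,
   thin_to_bitrate(units, 1.0, 12000.0) for one second of content.
   Removal of spatial/SNR layers would additionally require
   interpreting the VPS and is not covered by this sketch.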

   MANEs MAY remove certain unusable packets from the RTP stream when
   that RTP stream was damaged due to previous packet losses.  This can
   help reduce the network load in certain special cases.  For example,
   MANEs can remove those FUs where the leading FUs belonging to the
   same NAL unit have been lost or those dependent slice segments when
   the leading slice segments belonging to the same slice have been
   lost, because the trailing FUs or dependent slice segments are
   meaningless to most decoders.  MANEs can also remove higher temporal
   scalable layers if the outbound transmission (from the MANE's
   viewpoint) experiences congestion.

11.  IANA Considerations

   Placeholder

12.  Acknowledgements

   Large parts of this specification share text with the RTP payload
   format for HEVC [RFC7798].  We thank the authors of that
   specification for their excellent work.  We also thank BD Choi for
   his contribution towards the [VVC] descriptive text.

13.  References

13.1.  Normative References

   [ISO23090-3]
              ISO and IEC, "Versatile video coding -- not yet
              published", August 2020.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              DOI 10.17487/RFC3551, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, DOI 10.17487/RFC3711, March 2004.

   [RFC4566]  Handley, M., Jacobson, V., and C.
              Perkins, "SDP: Session Description Protocol", RFC 4566,
              DOI 10.17487/RFC4566, July 2006.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              DOI 10.17487/RFC4585, July 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
              Real-time Transport Control Protocol (RTCP)-Based Feedback
              (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
              2008.

   [VVC]      ITU-T, "Versatile video coding - JVET-O2001-vE, available
              from http://phenix.it-sudparis.eu/jvet/doc_end_user/
              documents/15_Gothenburg/wg11/JVET-O2001-v14.zip", August
              2019.

13.2.  Informative References

   [CABAC]    Sole, J., Joshi, R., Nguyen, N., Ji, T., Karczewicz, M.,
              Clare, G., Henry, F., and A. Duenas, "Transform
              coefficient coding in HEVC", IEEE Transactions on Circuits
              and Systems for Video Technology Vol. 22 No. 12 pp.
              1765-1777, DOI 10.1109/TCSVT.2012.2223055, December 2012.

   [Girod99]  Girod, B. and F. Faerber, "Feedback-based error control
              for mobile video transmission", Proceedings of the
              IEEE Vol. 87, No. 10, pp. 1707-1723, DOI 10.1109/5.790632,
              October 1999.

   [MPEG2S]   ISO/IEC, "Information technology - Generic coding of
              moving pictures and associated audio information - Part 1:
              Systems", ISO International Standard 13818-1, 2013.

   [RFC6051]  Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
              Flows", RFC 6051, DOI 10.17487/RFC6051, November 2010.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R.
              Jesup, "RTP Payload Format for H.264 Video", RFC 6184,
              DOI 10.17487/RFC6184, May 2011.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              DOI 10.17487/RFC6190, May 2011.

   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014.

   [RFC7202]  Perkins, C. and M. Westerlund, "Securing the RTP
              Framework: Why RTP Does Not Mandate a Single Media
              Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
              2014.

   [RFC7798]  Wang, Y., Sanchez, Y., Schierl, T., Wenger, S., and M.
              Hannuksela, "RTP Payload Format for High Efficiency Video
              Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, March
              2016.

Appendix A.  Change History

   draft-zhao-payload-rtp-vvc-00 ........ initial version

Authors' Addresses

   Shuai Zhao
   Tencent
   2747 Park Blvd.
   Palo Alto, CA  94306
   US

   Email: shuaiizhao@tencent.com

   Stephan Wenger
   Tencent
   2747 Park Blvd.
   Palo Alto, CA  94306
   US

   Email: stewe@stewe.org