idnits 2.17.1 

draft-ietf-avtcore-rtp-vvc-12.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document date (25 October 2021) is 913 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 1381

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Downref: Normative reference to an Informational RFC: RFC 7656

  -- Possible downref: Non-RFC (?) normative reference: ref. 'VSEI'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'VVC'

  -- Obsolete informational reference (is this intentional?): RFC 2326
     (Obsoleted by RFC 7826)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	avtcore                                                          S. Zhao
3	Internet-Draft                                                 S. Wenger
4	Intended status: Standards Track                                 Tencent
5	Expires: 28 April 2022                                        Y. Sanchez
6	                                                          Fraunhofer HHI
7	                                                              Y.-K. Wang
8	                                                          Bytedance Inc.
9	                                                         25 October 2021

11	          RTP Payload Format for Versatile Video Coding (VVC)
12	                     draft-ietf-avtcore-rtp-vvc-12

14	Abstract

16	   This memo describes an RTP payload format for the video coding
17	   standard ITU-T Recommendation H.266 and ISO/IEC International
18	   Standard 23090-3, both also known as Versatile Video Coding (VVC) and
19	   developed by the Joint Video Experts Team (JVET).  The RTP payload
20	   format allows for packetization of one or more Network Abstraction
21	   Layer (NAL) units in each RTP packet payload as well as fragmentation
22	   of a NAL unit into multiple RTP packets.  The payload format has wide
23	   applicability in videoconferencing, Internet video streaming, and
24	   high-bitrate entertainment-quality video, among other applications.

26	Status of This Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at https://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on 28 April 2022.

43	Copyright Notice

45	   Copyright (c) 2021 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
50	   license-info) in effect on the date of publication of this document.
51	   Please review these documents carefully, as they describe your rights
52	   and restrictions with respect to this document.  Code Components
53	   extracted from this document must include Simplified BSD License text
54	   as described in Section 4.e of the Trust Legal Provisions and are
55	   provided without warranty as described in the Simplified BSD License.

57	Table of Contents

59	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
60	     1.1.  Overview of the VVC Codec . . . . . . . . . . . . . . . .   3
61	       1.1.1.  Coding-Tool Features (informative)  . . . . . . . . .   3
62	       1.1.2.  Systems and Transport Interfaces (informative)  . . .   6
63	       1.1.3.  High-Level Picture Partitioning (informative) . . . .  11
64	       1.1.4.  NAL Unit Header . . . . . . . . . . . . . . . . . . .  13
65	     1.2.  Overview of the Payload Format  . . . . . . . . . . . . .  15
66	   2.  Conventions . . . . . . . . . . . . . . . . . . . . . . . . .  15
67	   3.  Definitions and Abbreviations . . . . . . . . . . . . . . . .  15
68	     3.1.  Definitions . . . . . . . . . . . . . . . . . . . . . . .  15
69	       3.1.1.  Definitions from the VVC Specification  . . . . . . .  15
70	       3.1.2.  Definitions Specific to This Memo . . . . . . . . . .  18
71	     3.2.  Abbreviations . . . . . . . . . . . . . . . . . . . . . .  19
72	   4.  RTP Payload Format  . . . . . . . . . . . . . . . . . . . . .  20
73	     4.1.  RTP Header Usage  . . . . . . . . . . . . . . . . . . . .  20
74	     4.2.  Payload Header Usage  . . . . . . . . . . . . . . . . . .  22
75	     4.3.  Payload Structures  . . . . . . . . . . . . . . . . . . .  22
76	       4.3.1.  Single NAL Unit Packets . . . . . . . . . . . . . . .  23
77	       4.3.2.  Aggregation Packets (APs) . . . . . . . . . . . . . .  23
78	       4.3.3.  Fragmentation Units . . . . . . . . . . . . . . . . .  27
79	     4.4.  Decoding Order Number . . . . . . . . . . . . . . . . . .  30
80	   5.  Packetization Rules . . . . . . . . . . . . . . . . . . . . .  31
81	   6.  De-packetization Process  . . . . . . . . . . . . . . . . . .  32
82	   7.  Payload Format Parameters . . . . . . . . . . . . . . . . . .  34
83	     7.1.  Media Type Registration . . . . . . . . . . . . . . . . .  34
84	     7.2.  SDP Parameters  . . . . . . . . . . . . . . . . . . . . .  46
85	       7.2.1.  Mapping of Payload Type Parameters to SDP . . . . . .  46
86	       7.2.2.  Usage with SDP Offer/Answer Model . . . . . . . . . .  47
87	       7.2.3.  Usage in Declarative Session Descriptions . . . . . .  56
88	       7.2.4.  Considerations for Parameter Sets . . . . . . . . . .  57
89	   8.  Use with Feedback Messages  . . . . . . . . . . . . . . . . .  57
90	     8.1.  Picture Loss Indication (PLI) . . . . . . . . . . . . . .  58
91	     8.2.  Full Intra Request (FIR)  . . . . . . . . . . . . . . . .  58
92	   9.  Security Considerations . . . . . . . . . . . . . . . . . . .  58
93	   10. Congestion Control  . . . . . . . . . . . . . . . . . . . . .  60
94	   11. IANA Considerations . . . . . . . . . . . . . . . . . . . . .  61
95	   12. Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  61
96	   13. References  . . . . . . . . . . . . . . . . . . . . . . . . .  61
97	     13.1.  Normative References . . . . . . . . . . . . . . . . . .  61
98	     13.2.  Informative References . . . . . . . . . . . . . . . . .  63
99	   Appendix A.  Change History . . . . . . . . . . . . . . . . . . .  64
100	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  64

102	1.  Introduction

104	   The Versatile Video Coding [VVC] specification is published as
105	   both ITU-T Recommendation H.266 and ISO/IEC International Standard
106	   23090-3.  VVC is reported to provide significant coding
107	   efficiency gains over HEVC [HEVC], also known as H.265, and
108	   other earlier video codecs.

111	   This memo specifies an RTP payload format for VVC.  It shares its
112	   basic design with the NAL (Network Abstraction Layer) unit-based RTP
113	   payload formats of H.264 Video Coding [RFC6184], Scalable Video
114	   Coding (SVC) [RFC6190], High Efficiency Video Coding (HEVC) [RFC7798]
115	   and their respective predecessors.  With respect to design
116	   philosophy, security, congestion control, and overall implementation
117	   complexity, it has similar properties to those earlier payload format
118	   specifications.  This is a conscious choice, as at least RFC 6184 is
119	   widely deployed and generally known in the relevant implementer
120	   communities.  Certain scalability-related mechanisms known from
121	   [RFC6190] were incorporated into this document, as VVC version 1
122	   supports temporal, spatial, and signal-to-noise ratio (SNR)
123	   scalability.

125	1.1.  Overview of the VVC Codec

127	   VVC and HEVC share a similar hybrid video codec design.  In this
128	   memo, we provide a very brief overview of those features of VVC that
129	   are, in some form, addressed by the payload format specified herein.
130	   Implementers have to read, understand, and apply the ITU-T/ISO/IEC
131	   specifications pertaining to VVC to arrive at interoperable, well-
132	   performing implementations.

134	   Conceptually, both VVC and HEVC include a Video Coding Layer (VCL),
135	   which is often used to refer to the coding-tool features, and a NAL,
136	   which is often used to refer to the systems and transport interface
137	   aspects of the codecs.

139	1.1.1.  Coding-Tool Features (informative)

141	   Coding tool features are described below with occasional reference to
142	   the coding tool set of HEVC, which is well known in the community.

144	   Similar to earlier hybrid-video-coding-based standards, including
145	   HEVC, the following basic video coding design is employed by VVC.  A
146	   prediction signal is first formed by either intra- or motion-
147	   compensated prediction, and the residual (the difference between the
148	   original and the prediction) is then coded.  The gains in coding
149	   efficiency are achieved by redesigning and improving almost all parts
150	   of the codec over earlier designs.  In addition, VVC includes several
151	   tools to make the implementation on parallel architectures easier.

153	   Finally, VVC includes temporal, spatial, and SNR scalability as well
154	   as multiview coding support.

156	   Coding blocks and transform structure

158	   Among the major coding-tool differences between HEVC and VVC, one of
159	   the most important is the more flexible coding tree structure in
160	   VVC, i.e., the multi-type tree.  In addition to the quadtree, binary
161	   and ternary trees are also supported, which contributes significantly
162	   to coding efficiency.  Moreover, the maximum coding tree unit (CTU)
163	   size is increased from 64x64 to 128x128.  To improve the coding
164	   efficiency of the chroma signal, separate luma and chroma coding
165	   trees at the CTU level may be employed for intra slices.  The square
166	   transforms in HEVC are extended to non-square transforms for
167	   rectangular blocks resulting from binary and ternary tree splits.
168	   In addition, VVC supports multiple transform selection (MTS),
169	   including DCT-2, DST-7, and DCT-8, as well as a non-separable
170	   secondary transform.  The transforms used in VVC can have different
171	   sizes, including larger transform sizes than in HEVC.  For DCT-2, the
172	   transform sizes range from 2x2 to 64x64, and for DST-7 and DCT-8, the
173	   transform sizes range from 4x4 to 32x32.  VVC also supports sub-block
174	   transforms for both intra- and inter-coded blocks.  For intra-coded
175	   blocks, intra sub-partitioning (ISP) may be used to allow
176	   sub-block-based intra prediction and transform.  For inter-coded
177	   blocks, a sub-block transform may be used when only a part of the
178	   block has non-zero transform coefficients.
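   The block-splitting geometry and the transform size ranges described
   above can be made concrete with a short Python sketch.  The names
   Split, child_sizes, and kernel_allowed are hypothetical.  The sketch
   models only block dimensions and the kernel size ranges stated in
   the text.  It ignores the many other conditions (minimum block
   sizes, picture boundaries, luma/chroma differences) under which
   [VVC] allows or disallows a given split or transform.

```python
from enum import Enum


class Split(Enum):
    """Hypothetical labels for the split types named in the text."""
    QT = "quad"        # quadtree: four equal quadrants
    BT_HOR = "bt_hor"  # binary split with a horizontal boundary
    BT_VER = "bt_ver"  # binary split with a vertical boundary
    TT_HOR = "tt_hor"  # ternary split of the height in a 1:2:1 ratio
    TT_VER = "tt_ver"  # ternary split of the width in a 1:2:1 ratio


def child_sizes(width: int, height: int, split: Split) -> list[tuple[int, int]]:
    """Return the (width, height) of each child block produced by a split."""
    if split is Split.QT:
        return [(width // 2, height // 2)] * 4
    if split is Split.BT_HOR:
        return [(width, height // 2)] * 2
    if split is Split.BT_VER:
        return [(width // 2, height)] * 2
    if split is Split.TT_HOR:
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    if split is Split.TT_VER:
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    raise ValueError(f"unknown split: {split}")


# Per-dimension (min, max) transform sizes as stated in the text.
KERNEL_SIZE_RANGE = {"DCT-2": (2, 64), "DST-7": (4, 32), "DCT-8": (4, 32)}


def kernel_allowed(kernel: str, width: int, height: int) -> bool:
    """Check a (possibly non-square) power-of-two size against the ranges."""
    lo, hi = KERNEL_SIZE_RANGE[kernel]

    def ok(n: int) -> bool:
        return lo <= n <= hi and (n & (n - 1)) == 0

    return ok(width) and ok(height)


# A 128x128 CTU split by a quadtree, then one quadrant split by a vertical TT.
quadrants = child_sizes(128, 128, Split.QT)
print(quadrants[0])                              # (64, 64)
print(child_sizes(*quadrants[0], Split.TT_VER))  # [(16, 64), (32, 64), (16, 64)]
```

   The ternary split produces the non-square 16x64 and 32x64 blocks for
   which the non-square transforms mentioned above are needed.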

180	   Entropy coding

182	   Similar to HEVC, VVC uses a single entropy-coding engine, which is
183	   based on context-adaptive binary arithmetic coding [CABAC], but with
184	   support for multiple window sizes.  The window sizes can be
185	   initialized differently for different context models.  This design
186	   provides faster adaptation and better coding efficiency.  A joint
187	   chroma residual coding scheme is applied to further exploit the
188	   correlation between the residuals of the two chroma components.  In
189	   VVC, different residual coding schemes are applied to regular
190	   transform coefficients and to residual samples generated using
191	   transform-skip mode.
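   The idea of a probability estimate built from two windows of
   different lengths can be illustrated with the following simplified
   Python sketch.  It is not the normative VVC estimator, which uses
   specific state precisions, per-context window parameters, and
   arithmetic-coder tables defined in [VVC].  The class name, the
   15-bit precision, and the default shift values 4 and 7 are
   assumptions chosen for illustration.

```python
from dataclasses import dataclass

PROB_BITS = 15
PROB_MAX = (1 << PROB_BITS) - 1  # fixed-point value just below 1.0


@dataclass
class TwoWindowEstimator:
    """Hypothetical per-context estimator of P(bin == 1).

    Each context can be initialized with its own window sizes (shifts);
    a larger shift means a longer, more slowly adapting window.
    """
    fast_shift: int = 4
    slow_shift: int = 7
    p_fast: int = 1 << (PROB_BITS - 1)  # start at probability 0.5
    p_slow: int = 1 << (PROB_BITS - 1)

    def prob_one(self) -> int:
        """Combined estimate: the average of the two windows."""
        return (self.p_fast + self.p_slow) >> 1

    def update(self, bin_val: int) -> None:
        """Move both estimates toward the observed bin value."""
        target = PROB_MAX if bin_val else 0
        # Exponential moving average: p += (target - p) / 2^shift
        self.p_fast += (target - self.p_fast) >> self.fast_shift
        self.p_slow += (target - self.p_slow) >> self.slow_shift


ctx = TwoWindowEstimator()
for _ in range(20):
    ctx.update(1)
print(ctx.p_fast > ctx.p_slow > 1 << (PROB_BITS - 1))  # True
```

   The short window reacts quickly to a change in statistics, while the
   long window keeps the estimate stable.  Averaging the two trades off
   adaptation speed against noise.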

193	   In-loop filtering

195	   VVC supports more in-loop filtering tools than HEVC.  The deblocking
196	   filter in VVC is similar to that of HEVC but operates on a smaller
197	   grid.  After deblocking and sample adaptive offset (SAO), an adaptive
198	   loop filter (ALF) may be applied.  As a Wiener filter, ALF reduces
199	   the distortion of decoded pictures.  In addition, VVC introduces a
200	   new module before deblocking, called luma mapping with chroma
201	   scaling (LMCS), to fully utilize the dynamic range of the signal and
202	   improve the rate-distortion performance for SDR and HDR content.
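   The luma mapping part of LMCS is a piecewise-linear curve over 16
   equal-width input bins.  The following Python sketch builds such a
   forward mapping table from a per-bin codeword count.  It is loosely
   modelled on the forward mapping in [VVC], but it omits the inverse
   mapping, chroma residual scaling, and the signaling of the codeword
   counts.  The function name and the final clip to the sample range
   are assumptions of this sketch.

```python
NUM_BINS = 16  # the piecewise-linear model uses 16 equal-width input bins


def forward_luma_lut(bin_codewords: list[int], bit_depth: int = 10) -> list[int]:
    """Build a forward luma mapping table from per-bin codeword counts.

    bin_codewords[i] is the number of output codewords assigned to input
    bin i; a bin with more codewords than its input width is stretched.
    """
    if len(bin_codewords) != NUM_BINS:
        raise ValueError("expected 16 codeword counts")
    max_val = (1 << bit_depth) - 1
    if sum(bin_codewords) > max_val + 1:
        raise ValueError("codewords exceed the available sample range")
    org_cw = (1 << bit_depth) // NUM_BINS  # input width of each bin
    log2_org_cw = org_cw.bit_length() - 1
    # Start of each bin in the mapped domain.
    pivots = [0]
    for cw in bin_codewords:
        pivots.append(pivots[-1] + cw)
    # Slope of each linear piece, in 11-bit fixed point.
    scale = [(cw * (1 << 11) + (1 << (log2_org_cw - 1))) >> log2_org_cw
             for cw in bin_codewords]
    lut = []
    for y in range(1 << bit_depth):
        i = y >> log2_org_cw  # input bin of sample value y
        offset = y - (i << log2_org_cw)
        mapped = pivots[i] + ((scale[i] * offset + (1 << 10)) >> 11)
        lut.append(min(mapped, max_val))  # assumption: clip to the range
    return lut


# Give mid-tones more codewords than the darkest and brightest bins.
midtones = forward_luma_lut([32] * 4 + [96] * 8 + [32] * 4)
print(midtones[500], midtones[600])  # 494 644
```

   With 64 codewords in every bin (10-bit video), the table is the
   identity mapping.  The mid-tone example instead stretches the
   500..600 input range to 150 output codewords.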

204	   Motion prediction and coding

206	   Compared to HEVC, VVC introduces several improvements in this area.
207	   First, adaptive motion vector resolution (AMVR) can reduce the bit
208	   cost for motion vectors by adaptively signaling the motion vector
209	   resolution.  Second, affine motion compensation is included to
210	   capture complicated motion such as zooming and rotation.  Meanwhile,
211	   prediction refinement with optical flow (PROF) for the affine mode
212	   is further deployed to mimic affine motion at the pixel level.
213	   Third, decoder-side motion vector refinement (DMVR) derives refined
214	   motion vectors at the decoder side based on block matching, so that
215	   fewer bits may be spent on motion vectors.  Bi-directional optical
216	   flow (BDOF) is a method similar to PROF.  BDOF adds a sample-wise
217	   offset at the 4x4 sub-block level that is derived with equations
218	   based on gradients of the prediction samples and a motion difference
219	   relative to the CU motion vectors.  Furthermore, merge with motion
220	   vector difference (MMVD) is a special mode that additionally signals
221	   a limited set of motion vector differences on top of the merge mode.
222	   In addition to MMVD, there are three other types of special merge
223	   modes, i.e., sub-block merge, triangle, and combined
224	   intra-/inter-prediction (CIIP).  The sub-block merge list includes
225	   one candidate of sub-block temporal motion vector prediction (SbTMVP)
226	   and up to four candidates of affine motion vectors.  Triangle is
227	   based on triangular block motion compensation.  CIIP combines intra-
228	   and inter-predictions with weighting.  Adaptive weighting may also be
229	   employed with a block-level tool called bi-prediction with CU-based
230	   weighting (BCW), which provides more flexibility than in HEVC.
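   The following Python sketch shows the BCW weighting idea.  It assumes
   the weight set {-2, 3, 4, 5, 10} (in units of 1/8) and the blending
   rule P = ((8 - w) * P0 + w * P1 + 4) >> 3, as we understand them from
   [VVC].  The normative process additionally involves higher
   intermediate precision, clipping, and rules that restrict the usable
   weights for some pictures, none of which are modelled here.  The
   function name is hypothetical.

```python
BCW_WEIGHTS = (-2, 3, 4, 5, 10)  # weight w applied to P1, in units of 1/8


def bcw_blend(p0: list[int], p1: list[int], w: int) -> list[int]:
    """Blend two prediction signals as ((8 - w) * P0 + w * P1 + 4) >> 3."""
    if w not in BCW_WEIGHTS:
        raise ValueError(f"weight {w} not in {BCW_WEIGHTS}")
    if len(p0) != len(p1):
        raise ValueError("prediction blocks must have the same size")
    return [((8 - w) * a + w * b + 4) >> 3 for a, b in zip(p0, p1)]


print(bcw_blend([100, 100], [200, 200], 4))   # [150, 150]: equal weights
print(bcw_blend([100, 100], [200, 200], 10))  # [225, 225]: beyond P1
```

   With w = 4 the result is the plain average used for bi-prediction in
   HEVC.  The negative and greater-than-8 weights allow extrapolation,
   which is where the added flexibility comes from.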

232	   Intra prediction and intra-coding

234	   To capture the diversified local image texture directions with finer
235	   granularity, VVC supports 65 angular directions instead of the 33
236	   directions in HEVC.  Intra mode coding is based on a
237	   6-most-probable-mode scheme, and the 6 most probable modes are
238	   derived using the neighboring intra prediction directions.  In
239	   addition, to deal with the different distributions of intra
240	   prediction angles for different block aspect ratios, a wide-angle
241	   intra prediction (WAIP) scheme is applied in VVC by including intra
242	   prediction angles beyond those present in HEVC.  Unlike HEVC, which
243	   only allows using the nearest line of reference samples for intra
244	   prediction, VVC also allows using two further reference lines, known
245	   as multi-reference-line (MRL) intra prediction.  The additional
246	   reference lines can only be used for the 6 most probable intra
247	   prediction modes.  To capture the strong correlation between
248	   different color components, VVC uses a cross-component linear model
249	   (CCLM), which assumes a linear relationship between the luma sample
250	   values and their associated chroma samples.  For intra prediction,
251	   VVC also applies position-dependent prediction combination (PDPC) to
252	   refine the prediction samples closer to the intra prediction block
253	   boundary.  Matrix-based intra prediction (MIP) modes are also used in
254	   VVC; they generate an intra prediction block of up to 8x8 using a
255	   weighted sum of downsampled neighboring reference samples, where the
256	   weights are hardcoded constants.
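   The linear luma-to-chroma relationship that CCLM assumes can be
   sketched in floating-point Python.  Following the two-point approach
   as we understand it from [VVC], four neighboring (luma, chroma)
   pairs are sorted by luma.  The two smallest and the two largest are
   averaged, and a line is fitted through the two resulting points.
   The real process uses integer arithmetic, specific neighbor
   selection, and luma downsampling filters.  This sketch assumes the
   luma input is already at chroma resolution, and its function names
   and flat-neighborhood fallback are assumptions.

```python
def cclm_params(luma: list[float], chroma: list[float]) -> tuple[float, float]:
    """Derive (alpha, beta) for chroma = alpha * luma + beta from 4 pairs."""
    if len(luma) != 4 or len(chroma) != 4:
        raise ValueError("this sketch expects exactly 4 neighboring pairs")
    pairs = sorted(zip(luma, chroma))  # sort pairs by luma value
    x_min = (pairs[0][0] + pairs[1][0]) / 2
    y_min = (pairs[0][1] + pairs[1][1]) / 2
    x_max = (pairs[2][0] + pairs[3][0]) / 2
    y_max = (pairs[2][1] + pairs[3][1]) / 2
    if x_max == x_min:
        return 0.0, y_min  # flat neighborhood: fall back to a constant
    alpha = (y_max - y_min) / (x_max - x_min)
    return alpha, y_min - alpha * x_min


def cclm_predict(luma_block: list[list[float]], alpha: float,
                 beta: float) -> list[list[float]]:
    """Predict chroma samples from co-located luma samples."""
    return [[alpha * v + beta for v in row] for row in luma_block]


alpha, beta = cclm_params([100, 200, 150, 50], [60, 110, 85, 35])
print(alpha, beta)                              # 0.5 10.0
print(cclm_predict([[120, 160]], alpha, beta))  # [[70.0, 90.0]]
```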

258	   Other coding-tool feature

260	   VVC introduces dependent quantization (DQ) to reduce quantization
261	   error by state-based switching between two quantizers.
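   The state-based switching can be illustrated with a small Python
   sketch.  As we understand [VVC], the design has four states.
   States 0 and 1 select quantizer Q0, whose reconstruction levels are
   even multiples of a step size.  States 2 and 3 select quantizer Q1,
   whose levels are zero and odd multiples of the step size.  The
   parity of each decoded level drives the transition to the next
   state.  The transition table below reflects that understanding.
   Scaling lists, QP-derived step sizes, and the integer arithmetic of
   a real decoder are not modelled, and the function name is
   hypothetical.

```python
# Next state = STATE_TRANSITION[state][parity of the current level k].
STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))


def dq_reconstruct(levels: list[int], delta: float) -> list[float]:
    """Reconstruct coefficients from levels with dependent quantization.

    Levels must be given in coding order, because each level's parity
    determines which quantizer is used for the next one.
    """
    state = 0
    out = []
    for k in levels:
        sign = (k > 0) - (k < 0)
        if state < 2:
            out.append(2 * k * delta)           # Q0: even multiples
        else:
            out.append((2 * k - sign) * delta)  # Q1: zero and odd multiples
        state = STATE_TRANSITION[state][k & 1]  # k & 1 is the parity
    return out


print(dq_reconstruct([1, 1, -1, 2], 1.0))  # [2.0, 1.0, -1.0, 4.0]
```

   No extra side information selects the quantizer.  The encoder
   exploits the switching by choosing levels (for example, with a
   trellis search) so that the resulting sequence of quantizers
   minimizes distortion.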

263	1.1.2.  Systems and Transport Interfaces (informative)

265	   VVC inherits the basic systems and transport interface designs from
266	   HEVC and H.264.  These include the NAL-unit-based syntax structure,
267	   the hierarchical syntax and data unit structure, the supplemental
268	   enhancement information (SEI) message mechanism, and the video
269	   buffering model based on the hypothetical reference decoder (HRD).
270	   The scalability features of VVC are conceptually similar to the
271	   scalable variant of HEVC known as SHVC.  The hierarchical syntax and
272	   data unit structure consists of parameter sets at various levels
273	   (decoder; sequence, pertaining to all layers; sequence, pertaining
274	   to a single layer; and picture), picture-level header parameters,
275	   slice-level header parameters, and lower-level parameters.

277	   A number of key components that influenced the network abstraction
278	   layer design of VVC as well as this memo are described below.

280	   Decoding capability information

282	   The decoding capability information includes parameters that stay
283	   constant for the lifetime of a Video Bitstream, which in IETF terms
284	   can translate to the lifetime of a session.  Such information
285	   includes profile, level, and sub-profile information to determine a
286	   maximum capability interop point that is guaranteed never to be
287	   exceeded, even if splicing of video sequences occurs within a
288	   session.  It further includes constraint fields (most of which are
289	   flags), which can optionally be set to indicate that the video
290	   bitstream will be constrained in the use of certain features as
291	   indicated by the values of those fields.  With this, a bitstream can
292	   be labelled as not using certain tools, which, among other things,
293	   allows for resource allocation in a decoder implementation.

295	   Video parameter set

297	   The video parameter set (VPS) pertains to one or more coded video
298	   sequences (CVSs) of multiple layers covering the same range of access
299	   units, and includes, among other information, decoding dependency
300	   expressed as information for reference picture list construction of
301	   enhancement layers.  The VPS provides a "big picture" of a scalable
302	   sequence, including what types of operation points are provided, the
303	   profile, tier, and level of the operation points, and some other
304	   high-level properties of the bitstream that can be used as the basis
305	   for session negotiation and content selection, etc.  One VPS may be
306	   referenced by one or more sequence parameter sets.

308	   Sequence parameter set

310	   The sequence parameter set (SPS) contains syntax elements pertaining
311	   to a coded layer video sequence (CLVS), which is a group of pictures
312	   belonging to the same layer, starting with a random access point, and
313	   followed by pictures that may depend on each other, until the next
314	   random access point picture.  In MPEG-2, the equivalent of a CVS was
315	   a group of pictures (GOP), which normally started with an I frame and
316	   was followed by P and B frames.  Although its random access options
317	   are more complex, VVC retains this basic concept.  One notable
318	   difference in VVC is that a CLVS may start with a Gradual Decoding
319	   Refresh (GDR) picture, without requiring the presence of traditional
320	   random access points in the bitstream, such as instantaneous decoding
321	   refresh (IDR) or clean random access (CRA) pictures.  In many TV-like
322	   applications, a CVS contains a few hundred milliseconds to a few
323	   seconds of video.  In video conferencing (without switching MCUs
324	   involved), a CVS can be as long in duration as the whole session.

326	   Picture and adaptation parameter set

327	   The picture parameter set and the adaptation parameter set (PPS and
328	   APS, respectively) carry information pertaining to zero or more
329	   pictures and zero or more slices, respectively.  The PPS contains
330	   information that is likely to stay constant from picture to picture
331	   (at least for pictures of a certain type), whereas the APS contains
332	   information, such as adaptive loop filter coefficients, that is
333	   likely to change from picture to picture or even within a picture.  A
334	   single APS is referenced by all slices of the same picture if that
335	   APS contains information about luma mapping with chroma scaling
336	   (LMCS) or scaling lists.  Different APSs containing ALF parameters
337	   can be referenced by slices of the same picture.

339	   Picture header

341	   A picture header contains information that is common to all slices
342	   that belong to the same picture.  Being able to send that information
343	   as a separate NAL unit when pictures are split into several slices
344	   saves bitrate compared to repeating the same information in all
345	   slices.  However, there might be scenarios where low-bitrate video is
346	   transmitted using a single slice per picture.  In such scenarios,
347	   having a separate NAL unit to convey that information incurs
348	   overhead; therefore, the picture header syntax structure can instead
349	   be included directly in the slice header, rather than in its own NAL
350	   unit.  Whether the picture header syntax structure is carried in its
351	   own NAL unit can only be chosen for an entire CLVS, and carrying it
352	   in the slice header is only allowed when each picture in the CLVS
353	   contains exactly one slice.

355	   Profile, tier, and level

357	   The profile, tier, and level syntax structures in the DCI, VPS, and
358	   SPS contain profile, tier, and level information for all layers that
359	   refer to the DCI, for layers associated with one or more output layer
360	   sets specified by the VPS, and for any layer that refers to the SPS,
361	   respectively.

363	   Sub-profiles

365	   Within the VVC specification, a sub-profile is a 32-bit number, coded
366	   according to ITU-T Rec. T.35, that carries no semantics.  It is
367	   carried in the profile_tier_level structure and hence (potentially)
368	   present in the DCI, VPS, and SPS.  External registration bodies can
369	   register a T.35 codepoint with ITU-T registration authorities and
370	   associate with their registration a description of bitstream
371	   restrictions beyond the profiles defined by ITU-T and ISO/IEC.  This
372	   would allow encoder manufacturers to label the bitstreams generated
373	   by their encoders as complying with such a sub-profile.  It is
374	   expected that upstream standardization organizations (such as DVB
375	   and ATSC), as well as walled-garden video services, will take
376	   advantage of this labelling system.  In contrast to "normal"
377	   profiles, it is expected that sub-profiles may indicate encoder
378	   choices traditionally left open in the (decoder-centric) video coding
379	   specifications, such as GOP structures, minimum/maximum QP values,
380	   and the mandatory use of certain tools or SEI messages.

382	   General constraint fields

384	   The profile_tier_level structure carries a considerable number of
385	   constraint fields (most of which are flags), which an encoder can use
386	   to indicate to a decoder that it will not use a certain tool or
387	   technology.  They were included in reaction to a perceived market
388	   need for labelling a bitstream as not exercising a certain tool that
389	   has become commercially unviable.

391	   Temporal scalability support

393	   VVC includes support for temporal scalability, by inclusion of the
394	   signaling of TemporalId in the NAL unit header, the restriction that
395	   pictures of a particular temporal sublayer cannot be used for inter
396	   prediction reference by pictures of a lower temporal sublayer, the
397	   sub-bitstream extraction process, and the requirement that each sub-
398	   bitstream extraction output be a conforming bitstream.  Media-Aware
399	   Network Elements (MANEs) can utilize the TemporalId in the NAL unit
400	   header for stream adaptation purposes based on temporal scalability.
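   The following Python sketch shows how a MANE might read TemporalId
   from the NAL unit header and drop higher temporal sublayers.  It
   assumes the two-byte VVC NAL unit header layout from [VVC]:
   forbidden_zero_bit (1 bit), nuh_reserved_zero_bit (1),
   nuh_layer_id (6), nal_unit_type (5), and nuh_temporal_id_plus1 (3).
   The function names are hypothetical.  The normative sub-bitstream
   extraction process involves more than this filter, for example the
   handling of parameter sets and SEI messages and the choice of a
   valid operation point.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nuh_reserved_zero_bit: int
    nuh_layer_id: int
    nal_unit_type: int
    temporal_id: int  # TemporalId = nuh_temporal_id_plus1 - 1


def parse_nal_header(nal: bytes) -> NalHeader:
    """Parse the assumed 2-byte VVC NAL unit header."""
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    b0, b1 = nal[0], nal[1]
    tid_plus1 = b1 & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(
        forbidden_zero_bit=b0 >> 7,
        nuh_reserved_zero_bit=(b0 >> 6) & 0x01,
        nuh_layer_id=b0 & 0x3F,
        nal_unit_type=b1 >> 3,
        temporal_id=tid_plus1 - 1,
    )


def drop_higher_sublayers(nal_units: list[bytes], max_tid: int) -> list[bytes]:
    """Keep only NAL units whose TemporalId does not exceed max_tid."""
    return [n for n in nal_units if parse_nal_header(n).temporal_id <= max_tid]


stream = [bytes([0x00, 0x01]), bytes([0x00, 0x02]), bytes([0x00, 0x03])]
print([parse_nal_header(n).temporal_id for n in stream])  # [0, 1, 2]
print(len(drop_higher_sublayers(stream, 1)))              # 2
```

   Because pictures of a sublayer never reference pictures of a higher
   sublayer, this filter leaves the remaining pictures decodable.  It
   only needs the header bytes and never touches the payload.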

402	   Reference picture resampling (RPR)

404	   In AVC and HEVC, the spatial resolution of pictures cannot change
405	   unless a new sequence using a new SPS starts with an IRAP picture.
406	   VVC enables picture resolution changes within a sequence without
407	   encoding an IRAP picture, which is always intra-coded.  This feature
408	   is sometimes referred to as reference picture resampling (RPR), as
409	   it requires resampling of a reference picture used for inter
410	   prediction when that reference picture has a different resolution
411	   than the current picture being decoded.  RPR thus allows the
412	   resolution to change, e.g., to cope with changing network conditions
413	   in streaming or video conferencing scenarios, without coding an IRAP
414	   picture, which would cause a momentary bit rate spike.  RPR can also
415	   be used in application scenarios wherein zooming of the entire video
416	   region or some region of interest is needed.
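   The resampling ratio between a reference picture and the current
   picture can be expressed in fixed point, as in the following Python
   sketch.  The 14-bit precision and the allowed range follow our
   understanding of [VVC], and should be checked against it before
   being relied upon.  Under that range, the reference may be at most
   twice as large as the current picture and at most eight times
   smaller.  The function names are hypothetical, and ref_position
   ignores the sub-sample precision and offsets used by the actual
   interpolation process.

```python
FRAC_BITS = 14  # fixed-point precision of the scaling ratio


def rpr_scale(ref_size: int, cur_size: int) -> int:
    """Return ref_size / cur_size in 14-bit fixed point, rounded.

    Assumed limits: the reference is at most 2x larger than the current
    picture (2:1 downsampling) and at most 8x smaller (1:8 upsampling).
    """
    if ref_size <= 0 or cur_size <= 0:
        raise ValueError("sizes must be positive")
    if ref_size > 2 * cur_size or cur_size > 8 * ref_size:
        raise ValueError("resolution ratio outside the assumed RPR limits")
    return ((ref_size << FRAC_BITS) + (cur_size >> 1)) // cur_size


def ref_position(cur_pos: int, scale: int) -> int:
    """Approximate integer sample position in the reference picture."""
    return (cur_pos * scale) >> FRAC_BITS


print(rpr_scale(1920, 1920))  # 16384, i.e., 1.0: no resampling needed
print(rpr_scale(960, 1920))   # 8192, i.e., 0.5: half-width reference
print(rpr_scale(1280, 1920))  # 10923, approximately 2/3
```

   A 1:1 ratio yields exactly 1 << 14.  This is the case for SNR
   scalability described below, where inter-layer prediction is used
   without a change in resolution.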

418	   Spatial, SNR, and multiview scalability

420	   VVC includes support for spatial, SNR, and multiview scalability.
421	   Scalable video coding is widely considered to have technical benefits
422	   and to enrich services for various video applications.  Until
423	   recently, however, this functionality was not included in the first
424	   version of video coding specifications.  In VVC, by contrast, all of
425	   these forms of scalability are natively supported in the first
426	   version, through the signaling of the layer_id in the NAL unit
427	   header, the VPS, which associates layers with given layer_ids with
428	   each other, reference picture selection, reference picture
429	   resampling for spatial scalability, and a number of other mechanisms
430	   not relevant for this memo.

432	      Spatial scalability

434	         With the existence of Reference Picture Resampling (RPR), the
435	         additional burden for scalability support is just a
436	         modification of the high-level syntax (HLS).  Inter-layer
437	         prediction is employed in a scalable system to improve the
438	         coding efficiency of the enhancement layers.  In addition to
439	         the spatial and temporal motion-compensated predictions that
440	         are available in a single-layer codec, the inter-layer
441	         prediction in VVC uses the possibly resampled video data of the
442	         reconstructed reference picture from a reference layer to
443	         predict the current enhancement layer.  The resampling process
444	         for inter-layer prediction, when used, is performed at the
445	         block-level, reusing the existing interpolation process for
446	         motion compensation in single-layer coding.  This means that no
447	         additional resampling process is needed to support spatial
448	         scalability.

450	      SNR scalability

452	         SNR scalability is similar to spatial scalability except that
453	         the resampling factors are 1:1.  In other words, there is no
454	         change in resolution, but there is inter-layer prediction.

456	      Multiview scalability

458	         The first version of VVC also supports multiview scalability,
459	         wherein a multi-layer bitstream carries layers representing
460	         multiple views, and one or more of the represented views can be
461	         output at the same time.

463	   SEI messages

465	   Supplemental enhancement information (SEI) messages carry information
466	   in the bitstream that does not influence the decoding process as
467	   specified in the VVC specification but that addresses the
468	   representation/rendering of the decoded bitstream, labels the
469	   bitstream for certain applications, or serves other similar purposes.
470	   The overall concept of SEI messages and many of the messages
471	   themselves have been inherited from the H.264 and HEVC
472	   specifications.  Except for the SEI messages that affect the HRD,
473	   SEI messages for use in the VVC environment, which are generally
474	   also useful in other video coding technologies, are not included in
475	   the main VVC specification but in a companion specification [VSEI].

477	1.1.3.  High-Level Picture Partitioning (informative)

479	   VVC inherited the concept of tiles and wavefront parallel processing
480	   (WPP) from HEVC, with some minor to moderate differences.  The basic
481	   concept of slices was kept in VVC but designed in an essentially
482	   different form.  VVC is the first video coding standard that includes
483	   subpictures as a feature, which provides the same functionality as
484	   HEVC motion-constrained tile sets (MCTSs) but designed differently to
485	   have better coding efficiency and to be friendlier for usage in
486	   application systems.  More details of these differences are described
487	   below.

489	   Tiles and WPP

491	   As in HEVC, a picture in VVC can be split into tile rows and tile
492	   columns, and in-picture prediction across tile boundaries is
493	   disallowed.  However, the syntax for signaling of tile
494	   partitioning has been simplified, by using a unified syntax design
495	   for both the uniform and the non-uniform mode.  In addition,
496	   signaling of entry point offsets for tiles in the slice header is
497	   optional in VVC while it is mandatory in HEVC.  The WPP design in VVC
498	   has two differences compared to HEVC: i) The CTU row delay is reduced
499	   from two CTUs to one CTU; ii) Signaling of entry point offsets for
500	   WPP in the slice header is optional in VVC while it is mandatory in
501	   HEVC.
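   The following Python sketch illustrates both the unified tile
   syntax and the shorter WPP delay.  For tiles, it assumes the
   derivation as we understand it from [VVC].  Column widths are
   explicitly signaled for the first columns (coded in [VVC] as
   minus-one values).  The rest of the picture is filled with columns
   as wide as the last explicit one, plus a narrower remainder column
   if needed.  A single explicit width therefore yields uniform
   spacing.  For WPP, it uses an idealized timing model in which CTU
   (x, y) can start at step x + delay * y.  This model assumes one
   thread per CTU row and equal decoding time per CTU, which real
   decoders do not achieve.  All function names are hypothetical.

```python
def tile_column_widths(pic_width_ctbs: int, explicit: list[int]) -> list[int]:
    """Derive tile column widths (in CTBs) from explicitly signaled widths."""
    if not explicit or any(w <= 0 for w in explicit):
        raise ValueError("need at least one positive explicit width")
    if sum(explicit) > pic_width_ctbs:
        raise ValueError("explicit widths exceed the picture width")
    widths = list(explicit)
    remaining = pic_width_ctbs - sum(explicit)
    uniform = explicit[-1]  # repeat the last explicit width
    while remaining >= uniform:
        widths.append(uniform)
        remaining -= uniform
    if remaining > 0:  # narrower last column takes what is left
        widths.append(remaining)
    return widths


def wpp_steps(width_ctus: int, height_ctus: int, row_delay: int) -> int:
    """Idealized time steps to decode one picture with WPP.

    The last CTU (width - 1, height - 1) starts at step
    (width - 1) + row_delay * (height - 1) and takes one step.
    """
    return (width_ctus - 1) + row_delay * (height_ctus - 1) + 1


print(tile_column_widths(30, [4]))     # [4, 4, 4, 4, 4, 4, 4, 2]
print(tile_column_widths(20, [2, 6]))  # [2, 6, 6, 6]
# A 15x9 CTU grid, e.g., 1920x1080 with 128x128 CTUs: HEVC-like vs. VVC delay.
print(wpp_steps(15, 9, 2), wpp_steps(15, 9, 1))  # 31 23
```

   The same grid is used for both delays so that only the CTU row delay
   differs.  The one-CTU delay shortens the wavefront by one step per
   CTU row.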

503	   Slices

505	   In VVC, the conventional slices based on CTUs (as in HEVC) or
506	   macroblocks (as in AVC) have been removed.  The main reasoning behind
507	   this architectural change is as follows.  The advances in video
508	   coding since 2003 (the publication year of AVC v1) have been such
509	   that slice-based error concealment has become practically impossible,
510	   due to the ever-increasing number and efficiency of in-picture and
511	   inter-picture prediction mechanisms.  An error-concealed picture is
512	   the decoding result of a transmitted coded picture for which there is
513	   some data loss (e.g., loss of some slices) or for which a
514	   reference picture for at least some part of the coded picture is not
515	   error-free (e.g., that reference picture was an error-concealed
516	   picture).  For example, when one of the multiple slices of a picture
517	   is lost, it may be error-concealed using an interpolation of the
518	   neighboring slices.  While advanced video coding prediction
519	   mechanisms provide significantly higher coding efficiency, they also
520	   make it harder for machines to estimate the quality of an error-
521	   concealed picture, which was already a hard problem with the use of
522	   simpler prediction mechanisms.  Advanced in-picture prediction
523	   mechanisms also make the coding efficiency loss caused by splitting
524	   a picture into multiple slices more significant.  Furthermore,
525	   network conditions have become significantly better, while at the
526	   same time techniques for dealing with packet losses have improved
527	   significantly.  As a result, very few implementations have recently
528	   used slices for maximum transmission unit size matching.  Instead,
529	   substantially all applications where low-delay error resilience is
530	   required (e.g., video telephony and video conferencing) rely on
531	   system/transport-level error resilience (e.g., retransmission,
532	   forward error correction) and/or picture-based error resilience tools
533	   (feedback-based error resilience, insertion of IRAPs, scalability
534	   with higher protection level of the base layer, and so on).
535	   Considering all the above, nowadays it is very rare that a picture
536	   that cannot be correctly decoded is passed to the decoder, and when
537	   such a rare case occurs, the system can afford to wait for an error-
538	   free picture to be decoded and available for display without
539	   resulting in frequent and long periods of picture freezing seen by
540	   end users.

542	   Slices in VVC have two modes: rectangular slices and raster-scan
543	   slices.  The rectangular slice, as indicated by its name, covers a
544	   rectangular region of the picture.  Typically, a rectangular slice
545	   consists of several complete tiles.  However, it is also possible
546	   that a rectangular slice is a subset of a tile and consists of one or
547	   more consecutive, complete CTU rows within a tile.  A raster-scan
548	   slice consists of one or more complete tiles in a tile raster scan
549	   order; hence, the region covered by a raster-scan slice may have a
550	   non-rectangular shape, although it may also happen to be
551	   rectangular.  The concept of slices in VVC is therefore
552	   strongly linked to or based on tiles instead of CTUs (as in HEVC) or
553	   macroblocks (as in AVC).

555	   Subpictures

557	   VVC is the first video coding standard that includes support for
558	   subpictures as a feature.  Each subpicture consists of one or more
559	   complete rectangular slices that collectively cover a rectangular
560	   region of the picture.  A subpicture may be either specified to be
561	   extractable (i.e., coded independently of other subpictures of the
562	   same picture and of earlier pictures in decoding order) or not
563	   extractable.  Regardless of whether a subpicture is extractable or
564	   not, the encoder can control whether in-loop filtering (including
565	   deblocking, SAO, and ALF) is applied across the subpicture boundaries
566	   individually for each subpicture.

568	   Functionally, subpictures are similar to the motion-constrained tile
569	   sets (MCTSs) in HEVC.  They both allow independent coding and
570	   extraction of a rectangular subset of a sequence of coded pictures,
571	   for use cases like viewport-dependent 360-degree video streaming
572	   optimization and region of interest (ROI) applications.

574	   There are several important design differences between subpictures
575	   and MCTSs.  First, the subpictures feature in VVC allows motion
576	   vectors of a coding block to point outside of the subpicture even
577	   when the subpicture is extractable, by applying sample padding at
578	   the subpicture boundaries in the same way as at picture
579	   boundaries.  Second, additional changes were introduced for the
580	   selection and derivation of motion vectors in the merge mode and in
581	   the decoder side motion vector refinement process of VVC.  This
582	   allows higher coding efficiency compared to the non-normative motion
583	   constraints applied at the encoder-side for MCTSs.  Third, rewriting
584	   of SHs (and PH NAL units, when present) is not needed when extracting
585	   one or more extractable subpictures from a sequence of pictures to
586	   create a sub-bitstream that is a conforming bitstream.  In sub-
587	   bitstream extractions based on HEVC MCTSs, rewriting of SHs is
588	   needed.  Note that in both HEVC MCTS extraction and VVC subpicture
589	   extraction, rewriting of SPSs and PPSs is needed.  However, typically
590	   there are only a few parameter sets in a bitstream, while each
591	   picture has at least one slice; therefore, rewriting of SHs can be a
592	   significant burden for application systems.  Fourth, slices of
593	   different subpictures within a picture are allowed to have different
594	   NAL unit types.  Fifth, VVC specifies HRD and level definitions for
595	   subpicture sequences, thus the conformance of the sub-bitstream of
596	   each extractable subpicture sequence can be ensured by encoders.

598	1.1.4.  NAL Unit Header

600	   VVC maintains the NAL unit concept of HEVC with modifications.  VVC
601	   uses a two-byte NAL unit header, as shown in Figure 1.  The payload
602	   of a NAL unit refers to the NAL unit excluding the NAL unit header.

604	                     +---------------+---------------+
605	                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
606	                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
607	                     |F|Z| LayerID   |  Type   | TID |
608	                     +---------------+---------------+

610	                   The Structure of the VVC NAL Unit Header.

612	                                  Figure 1

614	   The semantics of the fields in the NAL unit header are as specified
615	   in VVC and described briefly below for convenience.  In addition to
616	   the name and size of each field, the corresponding syntax element
617	   name in VVC is also provided.

619	   F: 1 bit

621	      forbidden_zero_bit.  Required to be zero in VVC.  Note that the
622	      inclusion of this bit in the NAL unit header was to enable
623	      transport of VVC video over MPEG-2 transport systems (avoidance of
624	      start code emulations) [MPEG2S].  In the context of this memo, the
625	      value 1 may be used to indicate a syntax violation, e.g., for a
626	      NAL unit resulting from aggregating a number of fragmentation units
627	      of a NAL unit when the last fragment is missing, as described in
628	      the last sentence of Section 4.3.3.

630	   Z: 1 bit

632	      nuh_reserved_zero_bit.  Required to be zero in VVC, and reserved
633	      for future extensions by ITU-T and ISO/IEC.

635	      This memo does not overload the "Z" bit for local extensions,
636	      because a) overloading the "F" bit is sufficient, and b) leaving
637	      "Z" unused keeps this memo applicable to future versions of [VVC].

639	   LayerId: 6 bits

641	      nuh_layer_id.  Identifies the layer a NAL unit belongs to, wherein
642	      a layer may be, e.g., a spatial scalable layer, a quality scalable
643	      layer, a layer containing a different view, etc.

645	   Type: 5 bits

647	      nal_unit_type.  This field specifies the NAL unit type as defined
648	      in Table 5 of [VVC].  For a reference of all currently defined NAL
649	      unit types and their semantics, please refer to Section 7.4.2.2 in
650	      [VVC].

652	   TID: 3 bits

654	      nuh_temporal_id_plus1.  This field specifies the temporal
655	      identifier of the NAL unit plus 1.  The value of TemporalId is
656	      equal to TID minus 1.  A TID value of 0 is illegal; this ensures
657	      that at least one bit in the NAL unit header is equal to 1, so
658	      that start code emulations in the NAL unit payload data can be
659	      considered independently of the NAL unit header.
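   As an illustration only, the following Python sketch packs and
   unpacks the two-byte header of Figure 1.  The names NalUnitHeader,
   parse_nal_header, and build_nal_header are hypothetical and are not
   taken from [VVC] or from any library; the bit positions follow
   Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnitHeader:
    f: int         # forbidden_zero_bit, 1 bit
    z: int         # nuh_reserved_zero_bit, 1 bit
    layer_id: int  # nuh_layer_id, 6 bits
    nal_type: int  # nal_unit_type, 5 bits
    tid: int       # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is TID minus 1.
        return self.tid - 1


def parse_nal_header(data: bytes) -> NalUnitHeader:
    """Decode the first two bytes of a NAL unit (or of an RTP payload)."""
    if len(data) < 2:
        raise ValueError("the NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    hdr = NalUnitHeader(
        f=b0 >> 7,
        z=(b0 >> 6) & 0x01,
        layer_id=b0 & 0x3F,
        nal_type=b1 >> 3,
        tid=b1 & 0x07,
    )
    if hdr.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return hdr


def build_nal_header(hdr: NalUnitHeader) -> bytes:
    """Encode the header fields back into two bytes."""
    b0 = ((hdr.f & 0x01) << 7) | ((hdr.z & 0x01) << 6) | (hdr.layer_id & 0x3F)
    b1 = ((hdr.nal_type & 0x1F) << 3) | (hdr.tid & 0x07)
    return bytes([b0, b1])
```

   For example, the bytes 0x05 0x42 decode to LayerId 5, Type 8, and
   TID 2 (TemporalId 1).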

661	1.2.  Overview of the Payload Format

663	   This payload format defines the following processes required for
664	   transport of VVC coded data over RTP [RFC3550]:

666	   *  Usage of RTP header with this payload format

668	   *  Packetization of VVC coded NAL units into RTP packets using three
669	      types of payload structures: single NAL unit packet, aggregation
670	      packet, and fragmentation unit

672	   *  Transmission of VVC NAL units of the same bitstream within a
673	      single RTP stream

675	   *  Media type parameters to be used with the Session Description
676	      Protocol (SDP) [RFC4566]

678	   *  Usage of RTCP feedback messages

680	2.  Conventions

682	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
683	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
684	   "OPTIONAL" in this document are to be interpreted as described in BCP
685	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
686	   capitals, as shown above.

688	3.  Definitions and Abbreviations

690	3.1.  Definitions

692	   This document uses the terms and definitions of [VVC].  Section 3.1.1
693	   lists relevant definitions from [VVC] for convenience.  Section 3.1.2
694	   provides definitions specific to this memo.  The definitions in
695	   Section 3.1.1 are verbatim copies of those in the [VVC] specification.

697	3.1.1.  Definitions from the VVC Specification

699	   Access unit (AU): A set of PUs that belong to different layers and
700	   contain coded pictures associated with the same time for output from
701	   the DPB.

703	   Adaptation parameter set (APS): A syntax structure containing syntax
704	   elements that apply to zero or more slices as determined by zero or
705	   more syntax elements found in slice headers.

707	   Bitstream: A sequence of bits, in the form of a NAL unit stream or a
708	   byte stream, that forms the representation of a sequence of AUs
709	   forming one or more coded video sequences (CVSs).

711	   Coded picture: A coded representation of a picture comprising VCL NAL
712	   units with a particular value of nuh_layer_id within an AU and
713	   containing all CTUs of the picture.

715	   Clean random access (CRA) PU: A PU in which the coded picture is a
716	   CRA picture.

718	   Clean random access (CRA) picture: An IRAP picture for which each VCL
719	   NAL unit has nal_unit_type equal to CRA_NUT.

721	   Coded video sequence (CVS): A sequence of AUs that consists, in
722	   decoding order, of a CVSS AU, followed by zero or more AUs that are
723	   not CVSS AUs, including all subsequent AUs up to but not including
724	   any subsequent AU that is a CVSS AU.

726	   Coded video sequence start (CVSS) AU: An AU in which there is a PU
727	   for each layer in the CVS and the coded picture in each PU is a CLVSS
728	   picture.

730	   Coded layer video sequence (CLVS): A sequence of PUs with the same
731	   value of nuh_layer_id that consists, in decoding order, of a CLVSS
732	   PU, followed by zero or more PUs that are not CLVSS PUs, including
733	   all subsequent PUs up to but not including any subsequent PU that is
734	   a CLVSS PU.

736	   Coded layer video sequence start (CLVSS) PU: A PU in which the coded
737	   picture is a CLVSS picture.

739	   Coded layer video sequence start (CLVSS) picture: A coded picture
740	   that is an IRAP picture with NoOutputBeforeRecoveryFlag equal to 1 or
741	   a GDR picture with NoOutputBeforeRecoveryFlag equal to 1.

743	   Coding tree unit (CTU): A CTB of luma samples, two corresponding CTBs
744	   of chroma samples of a picture that has three sample arrays, or a CTB
745	   of samples of a monochrome picture or a picture that is coded using
746	   three separate colour planes and syntax structures used to code the
747	   samples.

749	   Decoding Capability Information (DCI): A syntax structure containing
750	   syntax elements that apply to the entire bitstream.

752	   Decoded picture buffer (DPB): A buffer holding decoded pictures for
753	   reference, output reordering, or output delay specified for the
754	   hypothetical reference decoder.

756	   Gradual decoding refresh (GDR) picture: A picture for which each VCL
757	   NAL unit has nal_unit_type equal to GDR_NUT.

759	   Instantaneous decoding refresh (IDR) PU: A PU in which the coded
760	   picture is an IDR picture.

762	   Instantaneous decoding refresh (IDR) picture: An IRAP picture for
763	   which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or
764	   IDR_N_LP.

766	   Intra random access point (IRAP) AU: An AU in which there is a PU for
767	   each layer in the CVS and the coded picture in each PU is an IRAP
768	   picture.

770	   Intra random access point (IRAP) PU: A PU in which the coded picture
771	   is an IRAP picture.

773	   Intra random access point (IRAP) picture: A coded picture for which
774	   all VCL NAL units have the same value of nal_unit_type in the range
775	   of IDR_W_RADL to CRA_NUT, inclusive.

777	   Layer: A set of VCL NAL units that all have a particular value of
778	   nuh_layer_id and the associated non-VCL NAL units.

780	   Network abstraction layer (NAL) unit: A syntax structure containing
781	   an indication of the type of data to follow and bytes containing that
782	   data in the form of an RBSP interspersed as necessary with emulation
783	   prevention bytes.

785	   Network abstraction layer (NAL) unit stream: A sequence of NAL units.

787	   Operation point (OP): A temporal subset of an OLS, identified by an
788	   OLS index and a highest value of TemporalId.

790	   Picture parameter set (PPS): A syntax structure containing syntax
791	   elements that apply to zero or more entire coded pictures as
792	   determined by a syntax element found in each slice header.

794	   Picture unit (PU): A set of NAL units that are associated with each
795	   other according to a specified classification rule, are consecutive
796	   in decoding order, and contain exactly one coded picture.

798	   Random access: The act of starting the decoding process for a
799	   bitstream at a point other than the beginning of the stream.

801	   Sequence parameter set (SPS): A syntax structure containing syntax
802	   elements that apply to zero or more entire CLVSs as determined by the
803	   content of a syntax element found in the PPS referred to by a syntax
804	   element found in each picture header.

806	   Slice: An integer number of complete tiles or an integer number of
807	   consecutive complete CTU rows within a tile of a picture that are
808	   exclusively contained in a single NAL unit.

810	   Slice header (SH): A part of a coded slice containing the data
811	   elements pertaining to all tiles or CTU rows within a tile
812	   represented in the slice.

814	   Sublayer: A temporal scalable layer of a temporal scalable bitstream
815	   consisting of VCL NAL units with a particular value of the TemporalId
816	   variable, and the associated non-VCL NAL units.

818	   Subpicture: A rectangular region of one or more slices within a
819	   picture.

821	   Sublayer representation: A subset of the bitstream consisting of NAL
822	   units of a particular sublayer and the lower sublayers.

824	   Tile: A rectangular region of CTUs within a particular tile column
825	   and a particular tile row in a picture.

827	   Tile column: A rectangular region of CTUs having a height equal to
828	   the height of the picture and a width specified by syntax elements in
829	   the picture parameter set.

831	   Tile row: A rectangular region of CTUs having a height specified by
832	   syntax elements in the picture parameter set and a width equal to the
833	   width of the picture.

835	   Video coding layer (VCL) NAL unit: A collective term for coded slice
836	   NAL units and the subset of NAL units that have reserved values of
837	   nal_unit_type that are classified as VCL NAL units in this
838	   Specification.

840	3.1.2.  Definitions Specific to This Memo

842	   Media-Aware Network Element (MANE): A network element, such as a
843	   middlebox, selective forwarding unit, or application-layer gateway
844	   that is capable of parsing certain aspects of the RTP payload headers
845	   or the RTP payload and reacting to their contents.

847	      Informative note: The concept of a MANE goes beyond normal routers
848	      or gateways in that a MANE has to be aware of the signaling (e.g.,
849	      to learn about the payload type mappings of the media streams),
850	      and in that it has to be trusted when working with Secure RTP
851	      (SRTP).  The advantage of using MANEs is that they allow packets
852	      to be dropped according to the needs of the media coding.  For
853	      example, if a MANE has to drop packets due to congestion on a
854	      certain link, it can identify and remove those packets whose
855	      elimination produces the least adverse effect on the user
856	      experience.  After dropping packets, MANEs must rewrite RTCP
857	      packets to match the changes to the RTP stream, as specified in
858	      Section 7 of [RFC3550].

860	   NAL unit decoding order: A NAL unit order that conforms to the
861	   constraints on NAL unit order given in Section 7.4.2.4 of [VVC],
862	   i.e., that follows the order of NAL units in the bitstream.

864	   RTP stream (See [RFC7656]): Within the scope of this memo, one RTP
865	   stream is utilized to transport a VVC bitstream, which may contain
866	   one or more layers, and each layer may contain one or more temporal
867	   sublayers.

869	   Transmission order: The order of packets in ascending RTP sequence
870	   number order (in modulo arithmetic).  Within an aggregation packet,
871	   the NAL unit transmission order is the same as the order of
872	   appearance of NAL units in the packet.
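   Because the 16-bit RTP sequence number wraps around after 65535, a
   receiver or MANE that compares or reorders packets has to use modulo
   arithmetic.  A minimal Python sketch follows, assuming that
   consecutively received packets are never 2^15 or more sequence
   numbers apart; the function names are hypothetical.

```python
from typing import List, Sequence


def seq_precedes(a: int, b: int) -> bool:
    """True if sequence number a comes before b in transmission order.

    A distance of exactly 2**15 is ambiguous and is reported as False.
    """
    diff = (b - a) & 0xFFFF
    return 0 < diff < 0x8000


def extend_seqs(seqs: Sequence[int]) -> List[int]:
    """Map received 16-bit sequence numbers onto unbounded integers."""
    extended: List[int] = []
    for s in seqs:
        if not extended:
            extended.append(s)
            continue
        prev = extended[-1]
        # Signed distance to the previous value, in [-2**15, 2**15).
        delta = ((s - prev + 0x8000) & 0xFFFF) - 0x8000
        extended.append(prev + delta)
    return extended


def transmission_order(seqs: Sequence[int]) -> List[int]:
    """Sort received sequence numbers into transmission order."""
    pairs = sorted(zip(extend_seqs(seqs), seqs))
    return [s for _, s in pairs]
```

   Extending the numbers first avoids sorting with the modular
   comparison directly, which is not a total order over all 2^16
   values.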

874	3.2.  Abbreviations

876	   AU         Access Unit

878	   AP         Aggregation Packet

880	   APS        Adaptation Parameter Set

882	   CTU        Coding Tree Unit

884	   CVS        Coded Video Sequence

886	   DPB        Decoded Picture Buffer

888	   DCI        Decoding Capability Information

890	   DON        Decoding Order Number

892	   FIR        Full Intra Request

894	   FU         Fragmentation Unit

895	   GDR        Gradual Decoding Refresh

897	   HRD        Hypothetical Reference Decoder

899	   IDR        Instantaneous Decoding Refresh

901	   MANE       Media-Aware Network Element

903	   MTU        Maximum Transmission Unit

905	   NAL        Network Abstraction Layer

907	   NALU       Network Abstraction Layer Unit

909	   PLI        Picture Loss Indication

911	   PPS        Picture Parameter Set

913	   RPS        Reference Picture Set

915	   RPSI       Reference Picture Selection Indication

917	   SEI        Supplemental Enhancement Information

919	   SLI        Slice Loss Indication

921	   SPS        Sequence Parameter Set

923	   VCL        Video Coding Layer

925	   VPS        Video Parameter Set

927	4.  RTP Payload Format

929	4.1.  RTP Header Usage

931	   The format of the RTP header is specified in [RFC3550] (reprinted as
932	   Figure 2 for convenience).  This payload format uses the fields of
933	   the header in a manner consistent with that specification.

935	   The RTP payload (and the settings for some RTP header bits) for
936	   aggregation packets and fragmentation units are specified in
937	   Section 4.3.2 and Section 4.3.3, respectively.

939	       0                   1                   2                   3
940	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
941	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
942	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
943	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
944	      |                           timestamp                           |
945	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
946	      |           synchronization source (SSRC) identifier            |
947	      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
948	      |            contributing source (CSRC) identifiers             |
949	      |                             ....                              |
950	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

952	                        RTP Header According to [RFC3550]

954	                                  Figure 2

956	   The RTP header fields are set as follows when this RTP payload
957	   format is used:

959	   Marker bit (M): 1 bit

961	      Set for the last packet, in transmission order, among each set of
962	      packets that contain NAL units of one access unit.  This is in
963	      line with the normal use of the M bit in video formats to allow
964	      efficient playout buffer handling.

966	   Payload Type (PT): 7 bits

968	      The assignment of an RTP payload type for this new packet format
969	      is outside the scope of this document and will not be specified
970	      here.  The assignment of a payload type has to be performed either
971	      through the profile used or in a dynamic way.

973	   Sequence Number (SN): 16 bits

975	      Set and used in accordance with [RFC3550].

977	   Timestamp: 32 bits

978	      The RTP timestamp is set to the sampling timestamp of the content.
979	      A 90 kHz clock rate MUST be used.  If the NAL unit has no timing
980	      properties of its own (e.g., parameter set and SEI NAL units), the
981	      RTP timestamp MUST be set to the RTP timestamp of the coded
982	      pictures of the access unit in which the NAL unit (according to
983	      Section 7.4.2.4 of [VVC]) is included.  Receivers MUST use the RTP
984	      timestamp for the display process, even when the bitstream
985	      contains picture timing SEI messages or decoding unit information
986	      SEI messages as specified in [VVC].

988	         Informative note: When picture timing SEI messages are present,
989	         the RTP sender is responsible for ensuring that the RTP timestamps
990	         are consistent with the timing information carried in the
991	         picture timing SEI messages.

993	   Synchronization source (SSRC): 32 bits

995	      Used to identify the source of the RTP packets.  A single SSRC is
996	      used for all parts of a single bitstream.
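   The timestamp and marker bit rules above can be sketched in Python
   as follows.  This is an illustrative sketch, not a normative
   algorithm: rtp_timestamp and stamp_access_unit are hypothetical
   names, the sampling time is assumed to be given in seconds as an
   exact fraction, and the random initial offset recommended by
   [RFC3550] is passed in by the caller.

```python
from fractions import Fraction
from typing import List, Tuple

RTP_CLOCK_RATE = 90_000  # Hz, required by this payload format


def rtp_timestamp(sampling_time: Fraction, initial_offset: int) -> int:
    """Convert a sampling time in seconds into a 32-bit RTP timestamp."""
    ticks = round(sampling_time * RTP_CLOCK_RATE)
    return (initial_offset + ticks) & 0xFFFFFFFF  # wrap modulo 2**32


def stamp_access_unit(num_packets: int, timestamp: int) -> List[Tuple[int, bool]]:
    """Return (timestamp, marker) for each packet of one access unit.

    Every packet of the AU, including packets that carry only parameter
    sets or SEI NAL units, gets the AU timestamp; the M bit is set only
    on the last packet in transmission order.
    """
    return [(timestamp, i == num_packets - 1) for i in range(num_packets)]
```

   At 30000/1001 pictures per second, for instance, consecutive
   pictures are 3003 ticks of the 90 kHz clock apart.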

998	4.2.  Payload Header Usage

1000	   The first two bytes of the payload of an RTP packet are referred to
1001	   as the payload header.  The payload header consists of the same
1002	   fields (F, Z, LayerId, Type, and TID) as the NAL unit header as shown
1003	   in Section 1.1.4, irrespective of the type of the payload structure.

1005	   The TID value indicates (among other things) the relative importance
1006	   of an RTP packet, for example, because NAL units belonging to higher
1007	   temporal sublayers are not used for the decoding of lower temporal
1008	   sublayers.  A lower value of TID indicates a higher importance.
1009	   More-important NAL units MAY be better protected against transmission
1010	   losses than less-important NAL units.
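   As one possible use of the TID field, a MANE thinning a stream down
   to a target highest TemporalId could decide per packet whether to
   forward it.  The Python helpers below are a hypothetical sketch, not
   a mechanism defined by this memo; they assume the RTP header has
   already been removed and that dropping all sublayers above the
   target is acceptable to the application.

```python
def temporal_id_of(payload: bytes) -> int:
    """Return TemporalId (TID minus 1) from the two-byte payload header."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the payload header")
    tid = payload[1] & 0x07  # TID is the low 3 bits of the second byte
    if tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return tid - 1


def forward_packet(payload: bytes, max_temporal_id: int) -> bool:
    """Forward only packets whose TemporalId does not exceed the target."""
    return temporal_id_of(payload) <= max_temporal_id
```

   Since the TID of an aggregation packet is the lowest TID of its
   aggregated NAL units, this check keeps an AP whenever any of its
   content is still needed, which errs on the side of forwarding.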

1012	4.3.  Payload Structures

1014	   Three different types of RTP packet payload structures are specified.
1015	   A receiver can identify the type of an RTP packet payload through the
1016	   Type field in the payload header.

1018	   The three different payload structures are as follows:

1020	   *  Single NAL unit packet: Contains a single NAL unit in the payload,
1021	      and the NAL unit header of the NAL unit also serves as the payload
1022	      header.  This payload structure is specified in Section 4.3.1.

1024	   *  Aggregation Packet (AP): Contains more than one NAL unit within
1025	      one access unit.  This payload structure is specified in
1026	      Section 4.3.2.

1028	   *  Fragmentation Unit (FU): Contains a subset of a single NAL unit.
1029	      This payload structure is specified in Section 4.3.3.
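   A receiver might dispatch on the Type field roughly as in the
   following Python sketch.  The function name is hypothetical, and the
   FU Type value of 29 is an assumption here because it is defined in
   Section 4.3.3 rather than in this section.

```python
AP_TYPE = 28  # Aggregation Packet, Section 4.3.2
FU_TYPE = 29  # Fragmentation Unit; value assumed, see Section 4.3.3


def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the two-byte payload header")
    nal_type = payload[1] >> 3  # Type: upper 5 bits of the second byte
    if nal_type == AP_TYPE:
        return "aggregation packet"
    if nal_type == FU_TYPE:
        return "fragmentation unit"
    # Any other value is the type of the single contained NAL unit.
    return "single NAL unit packet"
```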

1031	4.3.1.  Single NAL Unit Packets

1033	   A single NAL unit packet contains exactly one NAL unit, and consists
1034	   of a payload header (denoted as PayloadHdr), a conditional 16-bit
1035	   DONL field (in network byte order), and the NAL unit payload data
1036	   (the NAL unit excluding its NAL unit header) of the contained NAL
1037	   unit, as shown in Figure 3.

1039	      0                   1                   2                   3
1040	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1041	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1042	     |           PayloadHdr          |      DONL (conditional)       |
1043	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1044	     |                                                               |
1045	     |                  NAL unit payload data                        |
1046	     |                                                               |
1047	     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1048	     |                               :...OPTIONAL RTP padding        |
1049	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1051	                  The Structure of a Single NAL Unit Packet

1053	                                  Figure 3

1055	   The DONL field, when present, specifies the value of the 16 least
1056	   significant bits of the decoding order number of the contained NAL
1057	   unit.  If sprop-max-don-diff is greater than 0, the DONL field MUST
1058	   be present, and the variable DON for the contained NAL unit is
1059	   derived as equal to the value of the DONL field.  Otherwise (sprop-
1060	   max-don-diff is equal to 0), the DONL field MUST NOT be present.
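   The conditional DONL field can be handled as in the following Python
   sketch, which relies on the fact that PayloadHdr is the NAL unit
   header of the contained NAL unit.  The function names are
   hypothetical, and sprop-max-don-diff is assumed to be already known
   from the session description.

```python
from typing import Optional, Tuple


def parse_single_nal_packet(payload: bytes,
                            sprop_max_don_diff: int) -> Tuple[Optional[int], bytes]:
    """Return (DON, NAL unit); DON is None when DONL is absent."""
    if sprop_max_don_diff > 0:
        if len(payload) < 4:
            raise ValueError("missing DONL field")
        don = int.from_bytes(payload[2:4], "big")  # DONL, network byte order
        nal_unit = payload[:2] + payload[4:]       # header + payload data
    else:
        if len(payload) < 2:
            raise ValueError("missing payload header")
        don = None
        nal_unit = bytes(payload)
    return don, nal_unit


def build_single_nal_packet(nal_unit: bytes, don: Optional[int] = None) -> bytes:
    """Insert a DONL field after the NAL unit header when don is given."""
    if don is None:
        return bytes(nal_unit)
    return nal_unit[:2] + (don & 0xFFFF).to_bytes(2, "big") + nal_unit[2:]
```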

1062	4.3.2.  Aggregation Packets (APs)

1064	   Aggregation Packets (APs) can reduce packetization overhead for small
1065	   NAL units, such as most non-VCL NAL units, which are often
1066	   only a few octets in size.

1068	   An AP aggregates NAL units of exactly one access unit.  Each NAL
1069	   unit to be carried in an AP is
1070	   encapsulated in an aggregation unit.  NAL units aggregated in one AP
1071	   are included in NAL unit decoding order.

1073	   An AP consists of a payload header (denoted as PayloadHdr) followed
1074	   by two or more aggregation units, as shown in Figure 4.

1076	     0                   1                   2                   3
1077	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1078	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1079	    |    PayloadHdr (Type=28)       |                               |
1080	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1081	    |                                                               |
1082	    |             two or more aggregation units                     |
1083	    |                                                               |
1084	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1085	    |                               :...OPTIONAL RTP padding        |
1086	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1088	                   The Structure of an Aggregation Packet

1090	                                  Figure 4

1092	   The fields in the payload header of an AP are set as follows.  The F
1093	   bit MUST be equal to 0 if the F bit of each aggregated NAL unit is
1094	   equal to zero; otherwise, it MUST be equal to 1.  The Type field MUST
1095	   be equal to 28.

1097	   The value of LayerId MUST be equal to the lowest value of LayerId of
1098	   all the aggregated NAL units.  The value of TID MUST be the lowest
1099	   value of TID of all the aggregated NAL units.

1101	      Informative note: All VCL NAL units in an AP have the same TID
1102	      value since they belong to the same access unit.  However, an AP
1103	      may contain non-VCL NAL units for which the TID value in the NAL
1104	      unit header may be different from the TID value of the VCL NAL
1105	      units in the same AP.

1107	      Informative note: If a system envisions subpicture-level or
1108	      picture-level modifications, for example by removing subpictures
1109	      or pictures of a particular layer, a good design choice on the
1110	      sender's side would be to aggregate only NAL units belonging to
1111	      the same subpicture or to the same picture of a particular layer.
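   The derivation of the AP payload header fields can be sketched in
   Python as follows.  ap_payload_header is a hypothetical name.  This
   section does not state a value for the Z bit of an AP; the sketch
   assumes 0, since nuh_reserved_zero_bit is required to be zero in VVC.

```python
from typing import Sequence

AP_TYPE = 28


def ap_payload_header(nal_units: Sequence[bytes]) -> bytes:
    """Derive the two-byte PayloadHdr of an AP from its NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f = max(nal[0] >> 7 for nal in nal_units)           # 1 if any F bit is 1
    layer_id = min(nal[0] & 0x3F for nal in nal_units)  # lowest LayerId
    tid = min(nal[1] & 0x07 for nal in nal_units)       # lowest TID
    z = 0  # assumption: Z left at zero
    return bytes([(f << 7) | (z << 6) | layer_id, (AP_TYPE << 3) | tid])
```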

1113	   An AP MUST carry at least two aggregation units and can carry as many
1114	   aggregation units as necessary; however, the total amount of data in
1115	   an AP MUST fit into an IP packet, and the size SHOULD be chosen so
1116	   that the resulting IP packet is smaller than the MTU size, so as to
1117	   avoid IP-layer fragmentation.  An AP MUST NOT contain FUs (as
1118	   specified in Section 4.3.3).  APs MUST NOT be nested; i.e., an AP
1119	   cannot contain another AP.

1121	   The first aggregation unit in an AP consists of a conditional 16-bit
1122	   DONL field (in network byte order) followed by a 16-bit unsigned size
1123	   information (in network byte order) that indicates the size of the
1124	   NAL unit in bytes (excluding these two octets, but including the NAL
1125	   unit header), followed by the NAL unit itself, including its NAL unit
1126	   header, as shown in Figure 5.

1128	     0                   1                   2                   3
1129	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1130	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1131	    |               :       DONL (conditional)      |   NALU size   |
1132	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1133	    |   NALU size   |                                               |
1134	    +-+-+-+-+-+-+-+-+         NAL unit                              |
1135	    |                                                               |
1136	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1137	    |                               :
1138	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1140	           The Structure of the First Aggregation Unit in an AP

1142	                                  Figure 5

1144	   The DONL field, when present, specifies the value of the 16 least
1145	   significant bits of the decoding order number of the aggregated NAL
1146	   unit.

1148	   If sprop-max-don-diff is greater than 0, the DONL field MUST be
1149	   present in the first aggregation unit of an AP, and the variable DON
1150	   for the NAL unit aggregated in it is derived as equal to the value of
1151	   the DONL field.  The variable DON for a NAL unit aggregated in an
1152	   aggregation unit that is not the first aggregation unit of the AP is
1153	   derived as equal to the DON of the preceding aggregated NAL unit in
1154	   the same AP plus 1, modulo 65536.  Otherwise (sprop-max-don-diff is
1155	   equal to 0), the DONL field MUST NOT be present in the first
1156	   aggregation unit of an AP.

1159	   An aggregation unit that is not the first aggregation unit in an AP
1160	   consists of a 16-bit unsigned size information
1161	   (in network byte order) that indicates the size of the NAL unit in
1162	   bytes (excluding these two octets, but including the NAL unit
1163	   header), followed by the NAL unit itself, including its NAL unit
1164	   header, as shown in Figure 6.

1166	     0                   1                   2                   3
1167	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1168	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1169	    |               :       NALU size               |   NAL unit    |
1170	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1171	    |                                                               |
1172	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1173	    |                               :
1174	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1176	         The Structure of an Aggregation Unit That Is Not the First
1177	                          Aggregation Unit in an AP

1179	                                  Figure 6

1181	   Figure 7 presents an example of an AP that contains two aggregation
1182	   units, labeled as 1 and 2 in the figure, without the DONL field being
1183	   present.

1185	     0                   1                   2                   3
1186	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1187	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1188	    |                          RTP Header                           |
1189	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1190	    |   PayloadHdr (Type=28)        |         NALU 1 Size           |
1191	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1192	    |          NALU 1 HDR           |                               |
1193	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1194	    |                   . . .                                       |
1195	    |                                                               |
1196	    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1197	    |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1198	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1199	    | NALU 2 HDR    |                                               |
1200	    +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1201	    |                   . . .                                       |
1202	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1203	    |                               :...OPTIONAL RTP padding        |
1204	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1206	               An Example of an AP Packet Containing
1207	             Two Aggregation Units without the DONL Field

1209	                                  Figure 7

1211	   Figure 8 presents an example of an AP that contains two aggregation
1212	   units, labeled as 1 and 2 in the figure, with the DONL field being
1213	   present.

1215	     0                   1                   2                   3
1216	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1217	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1218	    |                          RTP Header                           |
1219	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1220	    |   PayloadHdr (Type=28)        |        NALU 1 DONL            |
1221	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1222	    |          NALU 1 Size          |            NALU 1 HDR         |
1223	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1224	    |                                                               |
1225	    |                 NALU 1 Data   . . .                           |
1226	    |                                                               |
1227	    +        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1228	    |                               :          NALU 2 Size          |
1229	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1230	    |          NALU 2 HDR           |                               |
1231	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1232	    |                                                               |
1233	    |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1234	    |                               :...OPTIONAL RTP padding        |
1235	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1237	                   An Example of an AP Containing
1238	                 Two Aggregation Units with the DONL Field

1240	                                  Figure 8
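The aggregation unit layout in Figures 5 to 8 can be parsed with a simple loop. The following Python sketch is illustrative only: the function names are hypothetical, the RTP header and any RTP padding are assumed to have been removed already, and the Type field is assumed to occupy the five most significant bits of the second PayloadHdr octet (mirroring the 2-octet VVC NAL unit header). It applies the DON rule given above: the first aggregated NAL unit takes its DON from the DONL field, and each later one takes the previous DON plus 1, modulo 65536.

```python
import struct

AP_TYPE = 28  # PayloadHdr Type value of an aggregation packet


def payload_type(payload: bytes) -> int:
    # Assumption: Type is the 5 most significant bits of the second octet.
    return payload[1] >> 3


def parse_ap(payload: bytes, sprop_max_don_diff: int) -> list:
    """Split an AP payload into (don, nal_unit) pairs in AP order.

    don is None when sprop-max-don-diff is 0, because no DONL field is sent.
    """
    if len(payload) < 2 or payload_type(payload) != AP_TYPE:
        raise ValueError("not an aggregation packet")
    pos = 2  # skip the 2-octet PayloadHdr
    don = None
    if sprop_max_don_diff > 0:
        # Only the first aggregation unit carries a 16-bit DONL field.
        (don,) = struct.unpack_from("!H", payload, pos)
        pos += 2
    units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        # A NAL unit holds at least its 2-octet header and must fit in the AP.
        if size < 2 or pos + size > len(payload):
            raise ValueError("invalid NALU size")
        units.append((don, payload[pos:pos + size]))
        pos += size
        if don is not None:
            don = (don + 1) % 65536  # DON of the next aggregated NAL unit
    return units
```

For example, an AP whose DONL field is 65535 and which carries two NAL units yields DON values 65535 and 0 for them, illustrating the modulo-65536 wrap.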

1242	4.3.3.  Fragmentation Units

1244	   Fragmentation Units (FUs) are introduced to enable fragmenting a
1245	   single NAL unit into multiple RTP packets, possibly without
1246	   cooperation or knowledge of the [VVC] encoder.  A fragment of a NAL
1247	   unit consists of an integer number of consecutive octets of that NAL
1248	   unit.  Fragments of the same NAL unit MUST be sent in consecutive
1249	   order with ascending RTP sequence numbers (with no other RTP packets
1250	   within the same RTP stream being sent between the first and last
1251	   fragment).

1253	   When a NAL unit is fragmented and conveyed within FUs, it is referred
1254	   to as a fragmented NAL unit.  APs MUST NOT be fragmented.  FUs MUST
1255	   NOT be nested; i.e., an FU cannot contain a subset of another FU.

1257	   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
1258	   time of the fragmented NAL unit.

1260	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1261	   header of one octet, a conditional 16-bit DONL field (in network byte
1262	   order), and an FU payload, as shown in Figure 9.

1264	     0                   1                   2                   3
1265	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1266	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1267	    |   PayloadHdr (Type=29)        |   FU header   | DONL (cond)   |
1268	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1269	    |   DONL (cond) |                                               |
1270	    |-+-+-+-+-+-+-+-+                                               |
1271	    |                         FU payload                            |
1272	    |                                                               |
1273	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1274	    |                               :...OPTIONAL RTP padding        |
1275	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1277	                          The Structure of an FU

1279	                                  Figure 9

1281	   The fields in the payload header are set as follows.  The Type field
1282	   MUST be equal to 29.  The fields F, LayerId, and TID MUST be equal to
1283	   the fields F, LayerId, and TID, respectively, of the fragmented NAL
1284	   unit.

1286	   The FU header consists of an S bit, an E bit, a P bit, and a 5-bit
1287	   FuType field, as shown in Figure 10.

1289	                           +---------------+
1290	                           |0|1|2|3|4|5|6|7|
1291	                           +-+-+-+-+-+-+-+-+
1292	                           |S|E|P|  FuType |
1293	                           +---------------+

1295	                       The Structure of FU Header

1297	                                 Figure 10

1299	   The semantics of the FU header fields are as follows:

1301	   S: 1 bit

1303	      When set to 1, the S bit indicates the start of a fragmented NAL
1304	      unit, i.e., the first byte of the FU payload is also the first
1305	      byte of the payload of the fragmented NAL unit.  When the FU
1306	      payload is not the start of the fragmented NAL unit payload, the S
1307	      bit MUST be set to 0.

1309	   E: 1 bit
1310	      When set to 1, the E bit indicates the end of a fragmented NAL
1311	      unit, i.e., the last byte of the FU payload is also the last byte
1312	      of the fragmented NAL unit.  When the FU payload is not the last
1313	      fragment of a fragmented NAL unit, the E bit MUST be set to 0.

1315	   P: 1 bit

1317	      When set to 1, the P bit indicates the last FU of the last VCL NAL
1318	      unit of a coded picture, i.e., the last byte of the FU payload is
1319	      also the last byte of the last VCL NAL unit of the coded picture.
1320	      When the FU payload is not the last fragment of the last VCL NAL
1321	      unit of a coded picture, the P bit MUST be set to 0.

1323	   FuType: 5 bits

1325	      The field FuType MUST be equal to the field Type of the fragmented
1326	      NAL unit.

1328	   The DONL field, when present, specifies the value of the 16 least
1329	   significant bits of the decoding order number of the fragmented NAL
1330	   unit.

1332	   If sprop-max-don-diff is greater than 0, and the S bit is equal to 1,
1333	   the DONL field MUST be present in the FU, and the variable DON for
1334	   the fragmented NAL unit is derived as equal to the value of the DONL
1335	   field.  Otherwise (sprop-max-don-diff is equal to 0, or the S bit is
1336	   equal to 0), the DONL field MUST NOT be present in the FU.

1338	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
1339	   the S bit and the E bit must not both be set to 1 in the same FU
1340	   header.
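The FU header is a single octet, so packing and unpacking it is plain bit manipulation. The sketch below assumes, following Figure 10 and the usual IETF bit numbering, that bit 0 (S) is the most significant bit; the class name FuHeader is hypothetical. It rejects headers with both S and E set, per the rule above.

```python
from dataclasses import dataclass

S_BIT = 0x80  # start of a fragmented NAL unit
E_BIT = 0x40  # end of a fragmented NAL unit
P_BIT = 0x20  # last FU of the last VCL NAL unit of a coded picture


@dataclass(frozen=True)
class FuHeader:
    start: bool
    end: bool
    last_of_picture: bool
    fu_type: int  # Type of the fragmented NAL unit (5 bits)

    def pack(self) -> int:
        if not 0 <= self.fu_type <= 31:
            raise ValueError("FuType must fit in 5 bits")
        if self.start and self.end:
            # A non-fragmented NAL unit must not be sent in a single FU.
            raise ValueError("S and E must not both be 1")
        return ((S_BIT if self.start else 0)
                | (E_BIT if self.end else 0)
                | (P_BIT if self.last_of_picture else 0)
                | self.fu_type)

    @classmethod
    def unpack(cls, octet: int) -> "FuHeader":
        hdr = cls(bool(octet & S_BIT), bool(octet & E_BIT),
                  bool(octet & P_BIT), octet & 0x1F)
        if hdr.start and hdr.end:
            raise ValueError("S and E must not both be 1")
        return hdr
```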

1342	   The FU payload consists of fragments of the payload of the fragmented
1343	   NAL unit so that if the FU payloads of consecutive FUs, starting with
1344	   an FU with the S bit equal to 1 and ending with an FU with the E bit
1345	   equal to 1, are sequentially concatenated, the payload of the
1346	   fragmented NAL unit can be reconstructed.  The NAL unit header of the
1347	   fragmented NAL unit is not included as such in the FU payload, but
1348	   rather the information of the NAL unit header of the fragmented NAL
1349	   unit is conveyed in the F, LayerId, and TID fields of the payload
1350	   headers of the FUs and the FuType field of the FU header of the FUs.
1351	   An FU payload MUST NOT be empty.
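A sender-side sketch of fragmentation follows. It is a minimal illustration under stated assumptions rather than a reference packetizer: the helper name fragment_nal_unit is hypothetical, and the NAL unit header is assumed to be two octets, the first holding F, a reserved bit, and LayerId, and the second holding the 5-bit Type followed by the 3-bit TID. The NAL unit header is not copied into the FU payloads; its fields travel in the PayloadHdr and FuType, and DONL is emitted only in the first FU.

```python
import struct

FU_TYPE = 29  # PayloadHdr Type value of a fragmentation unit


def fragment_nal_unit(nal: bytes, max_fu_payload: int, don=None,
                      last_vcl_of_picture: bool = False) -> list:
    """Split one NAL unit into FU payloads (RTP headers not included).

    don is the value to place in the DONL field of the first FU, or None
    when sprop-max-don-diff is 0 and no DONL field is sent.
    """
    if len(nal) < 2 or max_fu_payload < 1:
        raise ValueError("invalid NAL unit or fragment size")
    nal_type = nal[1] >> 3
    # PayloadHdr: keep octet 1 (F, reserved bit, LayerId) and TID (low 3
    # bits of octet 2) from the NAL unit header; set Type to 29.
    payload_hdr = bytes([nal[0], (FU_TYPE << 3) | (nal[1] & 0x07)])
    body = nal[2:]  # the NAL unit header itself is not carried in FU payloads
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    if len(chunks) < 2:
        raise ValueError("fits in one FU; use a single NAL unit packet instead")
    fus = []
    for idx, chunk in enumerate(chunks):
        start, end = idx == 0, idx == len(chunks) - 1
        fu_hdr = ((0x80 if start else 0)                          # S bit
                  | (0x40 if end else 0)                          # E bit
                  | (0x20 if end and last_vcl_of_picture else 0)  # P bit
                  | nal_type)                                     # FuType
        donl = struct.pack("!H", don % 65536) if start and don is not None else b""
        fus.append(payload_hdr + bytes([fu_hdr]) + donl + chunk)
    return fus
```

Splitting a NAL unit with a 10-octet payload using max_fu_payload = 4 yields three FUs carrying 4, 4, and 2 octets, with S set only in the first FU and E only in the last; no FU payload is empty.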

1353	   If an FU is lost, the receiver SHOULD discard all following
1354	   fragmentation units in transmission order corresponding to the same
1355	   fragmented NAL unit, unless the decoder in the receiver is known to
1356	   be prepared to gracefully handle incomplete NAL units.

1358	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1359	   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
1360	   n of that NAL unit is not received.  In this case, the
1361	   forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
1362	   syntax violation.

1364	4.4.  Decoding Order Number

1366	   For each NAL unit, the variable AbsDon is derived, representing the
1367	   decoding order number that is indicative of the NAL unit decoding
1368	   order.

1370	   Let NAL unit n be the n-th NAL unit in transmission order within an
1371	   RTP stream.

1373	   If sprop-max-don-diff is equal to 0, AbsDon[n], the value of AbsDon
1374	   for NAL unit n, is derived as equal to n.

1376	   Otherwise (sprop-max-don-diff is greater than 0), AbsDon[n] is
1377	   derived as follows, where DON[n] is the value of the variable DON for
1378	   NAL unit n:

1380	   *  If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
1381	      transmission order), AbsDon[0] is set equal to DON[0].

1383	   *  Otherwise (n is greater than 0), the following applies for
1384	      derivation of AbsDon[n]:

1386	         If DON[n] == DON[n-1],
1387	            AbsDon[n] = AbsDon[n-1]

1389	         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1390	            AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1392	         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1393	            AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1395	         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1396	            AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
1397	            DON[n])

1399	         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1400	            AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
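The five cases above translate directly into code. The Python function below is a sketch; its name and list-based interface are assumptions for illustration. DON values are given in transmission order; when sprop-max-don-diff is 0 the DON values are ignored and AbsDon[n] = n.

```python
def abs_don(dons, sprop_max_don_diff):
    """Derive AbsDon for NAL units listed in transmission order."""
    if sprop_max_don_diff == 0:
        return list(range(len(dons)))
    result = []
    for n, don in enumerate(dons):
        if n == 0:
            result.append(don)  # AbsDon[0] = DON[0]
            continue
        prev_don, prev_abs = dons[n - 1], result[-1]
        if don == prev_don:
            result.append(prev_abs)
        elif don > prev_don and don - prev_don < 32768:
            result.append(prev_abs + don - prev_don)
        elif don < prev_don and prev_don - don >= 32768:
            result.append(prev_abs + 65536 - prev_don + don)  # forward wrap
        elif don > prev_don and don - prev_don >= 32768:
            result.append(prev_abs - (prev_don + 65536 - don))  # backward wrap
        else:  # don < prev_don and prev_don - don < 32768
            result.append(prev_abs - (prev_don - don))
    return result
```

For DON values 65534, 65535, 0, 1, 65535 the sketch yields AbsDon values 65534, 65535, 65536, 65537, 65535: the step from 65535 to 0 is treated as a forward step, and the final value is interpreted as lying two positions before its predecessor.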

1402	   For any two NAL units m and n, the following applies:

1404	   *  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows
1405	      NAL unit m in NAL unit decoding order.

1407	   *  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
1408	      of the two NAL units can be in either order.

1410	   *  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
1411	      NAL unit m in decoding order.

1413	         Informative note: When two consecutive NAL units in the NAL
1414	         unit decoding order have different values of AbsDon, the
1415	         absolute difference between the two AbsDon values may be
1416	         greater than or equal to 1.

1418	         Informative note: There are multiple reasons to allow for the
1419	         absolute difference of the values of AbsDon for two consecutive
1420	         NAL units in the NAL unit decoding order to be greater than
1421	         one.  An increment by one is not required, as at the time of
1422	         associating values of AbsDon to NAL units, it may not be known
1423	         whether all NAL units are to be delivered to the receiver.  For
1424	         example, a gateway might not forward VCL NAL units of higher
1425	         sublayers or some SEI NAL units when there is congestion in the
1426	         network.  In another example, the first intra-coded picture of
1427	         a pre-encoded clip is transmitted in advance to ensure that it
1428	         is readily available in the receiver, and when transmitting the
1429	         first intra-coded picture, the originator does not exactly know
1430	         how many NAL units will be encoded before the first intra-coded
1431	         picture of the pre-encoded clip follows in decoding order.
1432	         Thus, the values of AbsDon for the NAL units of the first
1433	         intra-coded picture of the pre-encoded clip have to be
1434	         estimated when they are transmitted, and gaps in values of
1435	         AbsDon may occur.

1437	5.  Packetization Rules

1439	   The following packetization rules apply:

1441	   *  If sprop-max-don-diff is greater than 0, the transmission order of
1442	      NAL units carried in the RTP stream MAY be different from the NAL
1443	      unit decoding order.  Otherwise (sprop-max-don-diff is equal to
1444	      0), the transmission order of NAL units carried in the RTP stream
1445	      MUST be the same as the NAL unit decoding order.

1447	   *  A NAL unit of a small size SHOULD be encapsulated in an
1448	      aggregation packet together with one or more other NAL units in
1449	      order to avoid unnecessary packetization overhead for small NAL
1450	      units.  For example, non-VCL NAL units such as access unit
1451	      delimiters, parameter sets, or SEI NAL units are typically small
1452	      and can often be aggregated with VCL NAL units without violating
1453	      MTU size constraints.

1455	   *  Each non-VCL NAL unit SHOULD, when possible from an MTU size match
1456	      viewpoint, be encapsulated in an aggregation packet together with
1457	      its associated VCL NAL unit, as typically a non-VCL NAL unit would
1458	      be meaningless without the associated VCL NAL unit being
1459	      available.

1461	   *  For carrying exactly one NAL unit in an RTP packet, a single NAL
1462	      unit packet MUST be used.
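One way to apply these rules is a greedy pass over the NAL units of an access unit. The sketch below is a hypothetical planner, not a procedure defined by this memo: it assumes the NAL units belong to one access unit and are already in decoding order, works on sizes only, and treats max_payload as the RTP payload budget. A NAL unit too large for a single NAL unit packet is marked for FUs, small neighbors are grouped into APs while they fit, and any group of exactly one NAL unit becomes a single NAL unit packet.

```python
def plan_packets(nal_sizes, max_payload, donl=False):
    """Return a list of ("single" | "ap" | "fu", [NAL unit indices])."""
    extra = 2 if donl else 0  # conditional DONL field

    def ap_cost(sizes):
        # PayloadHdr + DONL (first unit only) + 16-bit size field per unit
        return 2 + extra + sum(2 + s for s in sizes)

    plan, group = [], []

    def flush():
        # Exactly one NAL unit must go into a single NAL unit packet.
        if len(group) == 1:
            plan.append(("single", list(group)))
        elif group:
            plan.append(("ap", list(group)))
        group.clear()

    for i, size in enumerate(nal_sizes):
        if size + extra > max_payload:
            flush()
            plan.append(("fu", [i]))  # too large: fragment into FUs
        elif group and ap_cost([nal_sizes[j] for j in group] + [size]) > max_payload:
            flush()
            group.append(i)
        else:
            group.append(i)
    flush()
    return plan
```

A greedy pass keeps small non-VCL NAL units next to the VCL NAL units that follow them in decoding order, which loosely matches the intent of the third rule; a real packetizer may weigh other factors.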

1464	6.  De-packetization Process

1466	   The general concept behind de-packetization is to get the NAL units
1467	   out of the RTP packets in an RTP stream and pass them to the decoder
1468	   in the NAL unit decoding order.

1470	   The de-packetization process is implementation dependent.  Therefore,
1471	   the following description should be seen as an example of a suitable
1472	   implementation.  Other schemes may be used as well, as long as the
1473	   output for the same input is the same as the process described below.
1474	   The output is the same when the set of output NAL units and their
1475	   order are both identical.  Optimizations relative to the described
1476	   algorithms are possible.

1478	   All normal RTP mechanisms related to buffer management apply.  In
1479	   particular, duplicated or outdated RTP packets (as indicated by the
1480	   RTP sequence number and the RTP timestamp) are removed.  To
1481	   determine the exact time for decoding, factors such as a possible
1482	   intentional delay to allow for proper inter-stream synchronization
1483	   MUST be factored in.

1485	   NAL units with NAL unit type values in the range of 0 to 27,
1486	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1487	   with NAL unit type values in the range of 28 to 31, inclusive, MUST
1488	   NOT be passed to the decoder.

1490	   The receiver includes a receiver buffer, which is used to compensate
1491	   for transmission delay jitter within an individual RTP stream and to
1492	   reorder NAL units from transmission order to the NAL unit decoding
1493	   order.  In this section, the receiver operation is described under
1494	   the assumption that there is no transmission delay jitter within an
1495	   RTP stream.  To distinguish it from a practical receiver buffer
1496	   that is also used for compensation of transmission delay jitter, the
1497	   receiver buffer is hereafter called the de-packetization buffer in
1498	   this section.  Receivers should also prepare for transmission delay
1499	   jitter; that is, either reserve separate buffers for transmission
1500	   delay jitter buffering and de-packetization buffering or use a
1501	   receiver buffer for both transmission delay jitter and de-
1502	   packetization.  Moreover, receivers should take transmission delay
1503	   jitter into account in the buffering operation, e.g., by additional
1504	   initial buffering before starting decoding and playback.

1506	   The de-packetization process extracts the NAL units from the RTP
1507	   packets in an RTP stream as follows.  When an RTP packet carries a
1508	   single NAL unit packet, the payload of the RTP packet is extracted as
1509	   a single NAL unit, excluding the DONL field, i.e., the third and
1510	   fourth bytes, when sprop-max-don-diff is greater than 0.  When an RTP
1511	   packet carries an Aggregation Packet, several NAL units are extracted
1512	   from the payload of the RTP packet.  In this case, each NAL unit
1513	   corresponds to the part of the payload of each aggregation unit that
1514	   follows the NALU size field as described in Section 4.3.2.  When an
1515	   RTP packet carries a Fragmentation Unit (FU), all RTP packets from
1516	   the first FU (with the S bit equal to 1) of the fragmented NAL unit
1517	   up to the last FU (with the E bit equal to 1) of the fragmented NAL
1518	   unit are collected.  The NAL unit is extracted from these RTP packets
1519	   by concatenating all FU payloads in the same order as the
1520	   corresponding RTP packets and prepending the NAL unit header with the
1521	   fields F, LayerId, and TID set equal to the values of the fields F,
1522	   LayerId, and TID in the payload header of the FUs, respectively, and
1523	   with the NAL unit type set equal to the value of the field FuType in
1524	   the FU header of the FUs, as described in Section 4.3.3.
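The extraction steps above can be sketched as follows for single NAL unit packets and FUs; APs are left out to keep the example short. The function name is hypothetical, the payloads are assumed to arrive complete and in RTP sequence number order with RTP padding removed, and the 2-octet header layout (Type in the upper five bits of the second octet, TID in the lower three) is an assumption. An FU that arrives without its starting fragment is simply dropped.

```python
import struct

AP, FU = 28, 29


def depacketize(payloads, sprop_max_don_diff):
    """Extract (don, nal_unit) pairs from RTP payloads in sequence order.

    Handles single NAL unit packets and FUs only; loss handling is omitted.
    """
    donl = sprop_max_don_diff > 0
    out, frag, frag_don = [], None, None
    for p in payloads:
        ptype = p[1] >> 3
        if ptype == AP:
            raise NotImplementedError("aggregation packets not handled here")
        if ptype in (30, 31):
            continue  # NAL-unit-like structures 28..31 never reach the decoder
        if ptype == FU:
            fu_hdr, pos = p[2], 3
            if fu_hdr & 0x80:  # S bit: first fragment
                if donl:
                    (frag_don,) = struct.unpack_from("!H", p, pos)
                    pos += 2
                # Prepend the NAL unit header: octet 1 and TID from the
                # PayloadHdr, Type from FuType.
                frag = bytearray([p[0], ((fu_hdr & 0x1F) << 3) | (p[1] & 0x07)])
            elif frag is None:
                continue  # fragment without its start; discard
            frag += p[pos:]
            if fu_hdr & 0x40:  # E bit: last fragment
                out.append((frag_don, bytes(frag)))
                frag, frag_don = None, None
        elif donl:
            (don,) = struct.unpack_from("!H", p, 2)
            out.append((don, p[:2] + p[4:]))  # drop octets 3 and 4 (DONL)
        else:
            out.append((None, bytes(p)))
    return out
```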

1526	   When sprop-max-don-diff is equal to 0, the de-packetization buffer
1527	   size is zero bytes, and the NAL units carried in the single RTP
1528	   stream are directly passed to the decoder in their transmission
1529	   order, which is identical to their decoding order.

1531	   When sprop-max-don-diff is greater than 0, the process described in
1532	   the remainder of this section applies.

1534	   There are two buffering states in the receiver: initial buffering and
1535	   buffering while playing.  Initial buffering starts when the reception
1536	   is initialized.  After initial buffering, decoding and playback are
1537	   started, and the buffering-while-playing mode is used.

1539	   Regardless of the buffering state, the receiver stores incoming NAL
1540	   units in reception order into the de-packetization buffer.  NAL units
1541	   carried in RTP packets are stored in the de-packetization buffer
1542	   individually, and the value of AbsDon is calculated and stored for
1543	   each NAL unit.

1545	   Initial buffering lasts until the difference between the greatest and
1546	   smallest AbsDon values of the NAL units in the de-packetization
1547	   buffer is greater than or equal to the value of sprop-max-don-diff.

1549	   After initial buffering, whenever the difference between the greatest
1550	   and smallest AbsDon values of the NAL units in the de-packetization
1551	   buffer is greater than or equal to the value of sprop-max-don-diff,
1552	   the following operation is repeatedly applied until this difference
1553	   is smaller than sprop-max-don-diff:

1555	   *  The NAL unit in the de-packetization buffer with the smallest
1556	      value of AbsDon is removed from the de-packetization buffer and
1557	      passed to the decoder.

1559	   When no more NAL units are flowing into the de-packetization buffer,
1560	   all NAL units remaining in the de-packetization buffer are removed
1561	   from the buffer and passed to the decoder in the order of increasing
1562	   AbsDon values.
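The buffering rule can be implemented with a priority queue keyed on AbsDon. In the sketch below (class and method names are hypothetical), initial buffering needs no separate state: it ends exactly when the difference between the greatest and smallest buffered AbsDon first reaches sprop-max-don-diff, which is also the condition for releasing NAL units while playing. NAL units with equal AbsDon are released in arrival order, which Section 4.4 permits.

```python
import heapq
import itertools


class DepacketizationBuffer:
    """Reorders NAL units by AbsDon when sprop-max-don-diff > 0."""

    def __init__(self, sprop_max_don_diff: int):
        if sprop_max_don_diff <= 0:
            raise ValueError("the buffer is only used when sprop-max-don-diff > 0")
        self.max_diff = sprop_max_don_diff
        self.heap = []  # entries: (abs_don, arrival index, nal_unit)
        self.arrival = itertools.count()  # tie-breaker for equal AbsDon
        self.max_abs_don = None  # greatest AbsDon currently buffered

    def push(self, abs_don: int, nal: bytes) -> list:
        """Store one NAL unit and return the NAL units passed to the decoder."""
        heapq.heappush(self.heap, (abs_don, next(self.arrival), nal))
        if self.max_abs_don is None or abs_don > self.max_abs_don:
            self.max_abs_don = abs_don
        released = []
        # Remove the smallest AbsDon while greatest - smallest >= the limit.
        while self.heap and self.max_abs_don - self.heap[0][0] >= self.max_diff:
            released.append(heapq.heappop(self.heap)[2])
        return released

    def flush(self) -> list:
        """No more input: release all NAL units in increasing AbsDon order."""
        out = [heapq.heappop(self.heap)[2] for _ in range(len(self.heap))]
        self.max_abs_don = None
        return out
```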

1564	7.  Payload Format Parameters

1566	   This section specifies the optional parameters.  A mapping of the
1567	   parameters with Session Description Protocol (SDP) [RFC4566] is also
1568	   provided for applications that use SDP.

1570	7.1.  Media Type Registration

1572	   The receiver MUST ignore any parameter unspecified in this memo.

1574	   Type name:            video

1576	   Subtype name:         H266

1578	   Required parameters:  none

1580	   Optional parameters:

1582	      profile-id, tier-flag, sub-profile-id, interop-constraints, and
1583	      level-id:

1585	         These parameters indicate the profile, tier, default level,
1586	         sub-profile, and some constraints of the bitstream carried by
1587	         the RTP stream, or a specific set of the profile, tier, default
1588	         level, sub-profile and some constraints the receiver supports.

1590	         The subset of coding tools that may have been used to generate
1591	         the bitstream or that the receiver supports, as well as some
1592	         additional constraints, are indicated collectively by
1593	         profile-id, sub-profile-id, and interop-constraints.

1595	            Informative note: There are 128 values of profile-id.  The
1596	            subset of coding tools identified by the profile-id can be
1597	            further constrained with up to 255 instances of
1598	            sub-profile-id.  In addition, the 68 bits included in
1599	            interop-constraints, which can be extended up to 324 bits,
1600	            provide the means to further restrict tools from existing
1601	            profiles.  To be able to support this fine-granular
1602	            signaling of coding tool subsets with profile-id,
1603	            sub-profile-id, and interop-constraints, it would be safe to
1604	            require symmetric use of these parameters in SDP
1605	            offer/answer unless recv-ols-id is included in the SDP
1606	            answer for choosing one of the layers offered.

1608	         The tier is indicated by tier-flag.  The default level is
1609	         indicated by level-id.  The tier and the default level specify
1610	         the limits on values of syntax elements or arithmetic
1611	         combinations of values of syntax elements that are followed
1612	         when generating the bitstream or that the receiver supports.

1614	         In SDP offer/answer, when the SDP answer does not include the
1615	         recv-ols-id parameter that is less than the sprop-ols-id
1616	         parameter in the SDP offer, the following applies:

1618	         o  The tier-flag, profile-id, sub-profile-id, and interop-
1619	            constraints parameters MUST be used symmetrically, i.e., the
1620	            value of each of these parameters in the offer MUST be the
1621	            same as that in the answer, either explicitly signaled or
1622	            implicitly inferred.

1624	         o  The level-id parameter is changeable as long as the highest
1625	            level indicated by the answer is either equal to or lower
1626	            than that in the offer.  Note that a highest level higher
1627	            than level-id in the offer for receiving can be included as
1628	            max-recv-level-id.

1630	         In SDP offer/answer, when the SDP answer does include the recv-
1631	         ols-id parameter that is less than the sprop-ols-id parameter
1632	         in the SDP offer, the set of tier-flag, profile-id,
1633	         sub-profile-id, interop-constraints, and level-id parameters
1634	         included in the answer MUST be consistent with that for the
1635	         chosen output layer set as indicated in the SDP offer, with the
1636	         exception that the level-id parameter in the SDP answer is
1637	         changeable as long as the highest level indicated by the answer
1638	         is either lower than or equal to that in the offer.

1640	         More specifications of these parameters, including how they
1641	         relate to syntax elements specified in [VVC], are provided
1642	         below.

1644	      profile-id:

1646	         When profile-id is not present, a value of 1 (i.e., the Main 10
1647	         profile) MUST be inferred.

1649	         When used to indicate properties of a bitstream, profile-id is
1650	         derived from the general_profile_idc syntax element that
1651	         applies to the bitstream in an instance of the
1652	         profile_tier_level( ) syntax structure.

1654	         VVC bitstreams transported over RTP using the technologies of
1655	         this memo SHOULD contain only a single PTL structure in the
1656	         DCI, unless the sender can assure that a receiver can correctly
1657	         decode the VVC bitstream regardless of what PTL structure
1658	         was used in the SDP O/A exchange.

1660	         As specified in [VVC], a profile_tier_level( ) syntax structure
1661	         may be contained in an SPS NAL unit, and one or more
1662	         profile_tier_level( ) syntax structures may be contained in a
1663	         VPS NAL unit and in a DCI NAL unit.  One of the following three
1664	         cases applies to the container NAL unit of the
1665	         profile_tier_level( ) syntax structure containing those PTL
1666	         syntax elements used to derive the values of profile-id, tier-
1667	         flag, level-id, sub-profile-id, or interop-constraints: 1) The
1668	         container NAL unit is an SPS, the bitstream is a single-layer
1669	         bitstream, and the profile_tier_level( ) syntax structures in
1670	         all SPSs referenced by the CVSs in the bitstream have the same
1671	         values respectively for those PTL syntax elements; 2) The
1672	         container NAL unit is a VPS, the profile_tier_level( ) syntax
1673	         structure is the one in the VPS that applies to the OLS
1674	         corresponding to the bitstream, and the profile_tier_level( )
1675	         syntax structures applicable to the OLS corresponding to the
1676	         bitstream in all VPSs referenced by the CVSs in the bitstream
1677	         have the same values respectively for those PTL syntax
1678	         elements; 3) The container NAL unit is a DCI NAL unit and the
1679	         profile_tier_level( ) syntax structures in all DCI NAL units in
1680	         the bitstream have the same values respectively for those PTL
1681	         syntax elements.

1683	         [VVC] allows for multiple profile_tier_level( ) structures in a
1684	         DCI NAL unit, which may contain different values for the syntax
1685	         elements used to derive the values of profile-id, tier-flag,
1686	         level-id, sub-profile-id, or interop-constraints in the
1687	         different entries.  However, this memo defines only a single
1688	         value each for profile-id, tier-flag, level-id, sub-profile-id,
1689	         and interop-constraints.  When signaling these parameters while a
1690	         DCI NAL unit with multiple profile_tier_level( ) structures is
1691	         present, these values SHOULD be the same as those in the first
1692	         profile_tier_level( ) structure in the DCI, unless the sender has
1693	         ensured that the receiver can decode the bitstream when a
1694	         different value is chosen.

1696	      tier-flag, level-id:

1698	         The value of tier-flag MUST be in the range of 0 to 1,
1699	         inclusive.  The value of level-id MUST be in the range of 0 to
1700	         255, inclusive.

1702	         If the tier-flag and level-id parameters are used to indicate
1703	         properties of a bitstream, they indicate the tier and the
1704	         highest level the bitstream complies with.

1706	         If the tier-flag and level-id parameters are used for
1707	         capability exchange, the following applies.  If max-recv-level-
1708	         id is not present, the default level defined by level-id
1709	         indicates the highest level the codec wishes to support.
1710	         Otherwise, max-recv-level-id indicates the highest level the
1711	         codec supports for receiving.  For either receiving or sending,
1712	         all levels that are lower than the highest level supported MUST
1713	         also be supported.

1715	         If no tier-flag is present, a value of 0 MUST be inferred; if
1716	         no level-id is present, a value of 51 (i.e., level 3.1) MUST be
1717	         inferred.

1719	            Informative note: The level values currently defined in the
1720	            VVC specification are in the form of "majorNum.minorNum",
1721	            and the value of the level-id for each of the levels is
1722	            equal to majorNum * 16 + minorNum * 3.  It is expected that
1723	            if any levels are defined in the future, the same convention
1724	            will be used, but this cannot be guaranteed.
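The informative convention can be checked with two small helper functions. They are a sketch only: the names are hypothetical, and restricting minorNum to 0 through 5 (so that minorNum * 3 stays below 16 and the mapping can be inverted) is an assumption of this example, not a rule of this memo.

```python
def level_id(major: int, minor: int) -> int:
    """Map level majorNum.minorNum to level-id = majorNum * 16 + minorNum * 3."""
    if not 0 <= minor <= 5:  # assumption: keeps minorNum * 3 below 16
        raise ValueError("minorNum outside the range assumed by this sketch")
    value = major * 16 + minor * 3
    if not 0 <= value <= 255:
        raise ValueError("level-id must be in the range of 0 to 255")
    return value


def level_from_id(value: int) -> str:
    """Inverse mapping; only defined for values that follow the convention."""
    major, rest = divmod(value, 16)
    if rest % 3:
        raise ValueError("level-id does not follow the majorNum.minorNum convention")
    return f"{major}.{rest // 3}"
```

level_id(3, 1) returns 51, matching the default level-id value of 51 (level 3.1) given above.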

1726	         When used to indicate properties of a bitstream, the tier-flag
1727	         and level-id parameters are derived respectively from the
1728	         syntax element general_tier_flag, and the syntax element
1729	         general_level_idc or sub_layer_level_idc[j], that apply to the
1730	         bitstream, in an instance of the profile_tier_level( ) syntax
1731	         structure.

1733	         If the tier-flag and level-id are derived from the
1734	         profile_tier_level( ) syntax structure in a DCI NAL unit, the
1735	         following applies:

1737	         o  tier-flag = general_tier_flag

1739	         o  level-id = general_level_idc

1741	         Otherwise, if the tier-flag and level-id are derived from the
1742	         profile_tier_level( ) syntax structure in an SPS or VPS NAL
1743	         unit, and the bitstream contains the highest sublayer
1744	         representation in the OLS corresponding to the bitstream, the
1745	         following applies:

1747	         o  tier-flag = general_tier_flag

1749	         o  level-id = general_level_idc

1751	         Otherwise, if the tier-flag and level-id are derived from the
1752	         profile_tier_level( ) syntax structure in an SPS or VPS NAL
1753	         unit, and the bitstream does not contain the highest sublayer
1754	         representation in the OLS corresponding to the bitstream, the
1755	         following applies, with j being the value of the sprop-
1756	         sublayer-id parameter:

1758	         o  tier-flag = general_tier_flag

1760	         o  level-id = sub_layer_level_idc[j]
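         The three derivation cases above can be sketched in Python.
         ProfileTierLevel is a hypothetical record of values already
         parsed from a profile_tier_level( ) syntax structure, and
         sub_layer_level_idc is assumed to be populated for every
         sublayer (including values inferred as specified in [VVC]).
         This is not a bitstream parser.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ProfileTierLevel:
    """Hypothetical container for already-parsed profile_tier_level( ) fields."""
    general_tier_flag: int
    general_level_idc: int
    # sub_layer_level_idc[j], assumed fully populated (present or inferred values).
    sub_layer_level_idc: List[int] = field(default_factory=list)


def derive_tier_and_level(ptl: ProfileTierLevel, source: str,
                          contains_highest_sublayer: bool = True,
                          sprop_sublayer_id: int = 6) -> Tuple[int, int]:
    """Return (tier-flag, level-id) for a bitstream, following the rules above."""
    if source == "DCI":
        return ptl.general_tier_flag, ptl.general_level_idc
    if source not in ("SPS", "VPS"):
        raise ValueError(f"unexpected parameter set type: {source}")
    if contains_highest_sublayer:
        return ptl.general_tier_flag, ptl.general_level_idc
    # Bitstream without the highest sublayer: j is the sprop-sublayer-id value.
    return ptl.general_tier_flag, ptl.sub_layer_level_idc[sprop_sublayer_id]
```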

1762	      sub-profile-id:

1764	         The value of the parameter is a comma-separated (',') list of
1765	         data using base64 [RFC4648] representation.

1767	         When used to indicate properties of a bitstream, sub-profile-id
1768	         is derived from each of the ptl_num_sub_profiles
1769	         general_sub_profile_idc[i] syntax elements that apply to the
1770	         bitstream in a profile_tier_level( ) syntax structure.
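         The Python sketch below shows one way a sender could form the
         sub-profile-id value.  It assumes that each 32-bit
         general_sub_profile_idc[i] value is serialized as four
         big-endian bytes before base64 encoding.  That byte layout and
         the function names are assumptions made for illustration.

```python
import base64
import struct
from typing import List


def encode_sub_profile_id(sub_profile_idcs: List[int]) -> str:
    """Build a sub-profile-id value from general_sub_profile_idc[i] values."""
    # Assumption: each 32-bit value is serialized as 4 big-endian bytes.
    return ",".join(base64.b64encode(struct.pack(">I", idc)).decode("ascii")
                    for idc in sub_profile_idcs)


def decode_sub_profile_id(value: str) -> List[int]:
    """Inverse of encode_sub_profile_id under the same byte-layout assumption."""
    return [struct.unpack(">I", base64.b64decode(item))[0]
            for item in value.split(",") if item]
```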

1772	      interop-constraints:

1774	         A base64 [RFC4648] representation of the data that includes the
1775	         syntax elements ptl_frame_only_constraint_flag and
1776	         ptl_multilayer_enabled_flag and the general_constraints_info( )
1777	         syntax structure that apply to the bitstream in an instance of
1778	         the profile_tier_level( ) syntax structure.

1780	         If the interop-constraints parameter is not present, the
1781	         following MUST be inferred:

1783	         o  ptl_frame_only_constraint_flag = 1

1785	         o  ptl_multilayer_enabled_flag = 0

1787	         o  gci_present_flag in the general_constraints_info( ) syntax
1788	            structure = 0

1790	         Using interop-constraints for capability exchange results in a
1791	         requirement on any bitstream to be compliant with the interop-
1792	         constraints.
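         The inference rule above, together with a reading of the
         leading flags when the parameter is present, can be sketched in
         Python.  The sketch assumes that the decoded data starts with
         ptl_frame_only_constraint_flag, ptl_multilayer_enabled_flag, and
         gci_present_flag (the first bit of general_constraints_info( )),
         most significant bit first, following the order of the
         profile_tier_level( ) syntax.  The function name is hypothetical.

```python
import base64
from typing import Dict, Optional


def interop_constraint_flags(value: Optional[str] = None) -> Dict[str, int]:
    """Return the leading flags of interop-constraints, or the inferred defaults."""
    if value is None:
        # Values that MUST be inferred when interop-constraints is absent.
        return {"ptl_frame_only_constraint_flag": 1,
                "ptl_multilayer_enabled_flag": 0,
                "gci_present_flag": 0}
    data = base64.b64decode(value, validate=True)
    if not data:
        raise ValueError("interop-constraints is empty")
    first = data[0]
    # Assumed bit order (MSB first): frame-only flag, multilayer flag,
    # then gci_present_flag as the first bit of general_constraints_info( ).
    return {"ptl_frame_only_constraint_flag": (first >> 7) & 1,
            "ptl_multilayer_enabled_flag": (first >> 6) & 1,
            "gci_present_flag": (first >> 5) & 1}
```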

1794	      sprop-sublayer-id:

1796	         This parameter MAY be used to indicate the highest allowed
1797	         value of TID in the highest layer present in the bitstream.
1798	         When not present, the value of sprop-sublayer-id is inferred to
1799	         be equal to 6.

1801	         The value of sprop-sublayer-id MUST be in the range of 0 to 6,
1802	         inclusive.

1804	      sprop-ols-id:

1806	         This parameter MAY be used to indicate the OLS that the
1807	         bitstream applies to.  When not present, the value of sprop-
1808	         ols-id is inferred to be equal to TargetOlsIdx as specified in
1809	         Section 8.1.1 of [VVC].  If this optional parameter is present,
1810	         sprop-vps MUST also be present or its content MUST be known a
1811	         priori at the receiver.

1813	         The value of sprop-ols-id MUST be in the range of 0 to 256,
1814	         inclusive.

1816	            Informative note: VVC allows having up to 257 output layer
1817	            sets indicated in the VPS, as the number of output layer sets
1818	            minus 2 is indicated with a field of 8 bits.

1820	      recv-sublayer-id:

1822	         This parameter MAY be used to signal a receiver's choice of the
1823	         offered or declared sublayer representations in the sprop-vps
1824	         and sprop-sps.  The value of recv-sublayer-id indicates the TID
1825	         of the highest sublayer in the highest layer of the bitstream
1826	         that a receiver supports.  When not present, the value of recv-
1827	         sublayer-id is inferred to be equal to the value of the sprop-
1828	         sublayer-id parameter in the SDP offer.

1830	         The value of recv-sublayer-id MUST be in the range of 0 to 6,
1831	         inclusive.
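         The default-inference chain that links sprop-sublayer-id in the
         offer to recv-sublayer-id in the answer can be sketched in
         Python.  The fmtp parameters are assumed to be available as
         dictionaries of name/value strings.  The function name is
         hypothetical.

```python
def resolve_recv_sublayer_id(offer_fmtp: dict, answer_fmtp: dict) -> int:
    """Return the effective recv-sublayer-id for a negotiated payload type."""
    # sprop-sublayer-id defaults to 6 when absent from the offer.
    sprop_sublayer_id = int(offer_fmtp.get("sprop-sublayer-id", 6))
    # recv-sublayer-id defaults to the offer's sprop-sublayer-id.
    recv_sublayer_id = int(answer_fmtp.get("recv-sublayer-id", sprop_sublayer_id))
    for name, value in (("sprop-sublayer-id", sprop_sublayer_id),
                        ("recv-sublayer-id", recv_sublayer_id)):
        if not 0 <= value <= 6:
            raise ValueError(f"{name}={value} is outside the range 0..6")
    return recv_sublayer_id
```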

1833	      recv-ols-id:

1835	         This parameter MAY be used to signal a receiver's choice of the
1836	         offered or declared output layer sets in the sprop-vps.  The
1837	         value of recv-ols-id indicates the OLS index of the bitstream
1838	         that a receiver supports.  When not present, the value of recv-
1839	         ols-id is inferred to be equal to the value of the sprop-ols-id
1840	         parameter inferred from or indicated in the SDP offer.  When
1841	         present, the value of recv-ols-id must be included only when
1842	         sprop-ols-id was received and must refer to an output layer set
1843	         in the VPS that includes no layers other than all or a subset
1844	         of the layers of the OLS referred to by sprop-ols-id.  If this
1845	         optional parameter is present, sprop-vps must have been
1846	         received or its content must be known a priori at the receiver.

1848	         The value of recv-ols-id MUST be in the range of 0 to 256,
1849	         inclusive.
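         The layer-subset condition on recv-ols-id can be checked as in
         the Python sketch below.  The mapping from OLS index to the
         nuh_layer_id values of its layers is hypothetical input that a
         receiver would have to derive from the VPS in sprop-vps.
         Parsing the VPS is outside the scope of the sketch.

```python
from typing import Dict, FrozenSet


def recv_ols_id_is_valid(ols_layers: Dict[int, FrozenSet[int]],
                         sprop_ols_id: int, recv_ols_id: int) -> bool:
    """Check that recv-ols-id names an OLS whose layers all belong to the sprop-ols-id OLS.

    ols_layers is a hypothetical mapping from OLS index to nuh_layer_id values,
    assumed to have been derived from the VPS conveyed in sprop-vps.
    """
    if sprop_ols_id not in ols_layers or recv_ols_id not in ols_layers:
        return False
    # "no layers other than all or a subset of the layers" is a subset test.
    return ols_layers[recv_ols_id] <= ols_layers[sprop_ols_id]
```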

1851	      max-recv-level-id:

1853	         This parameter MAY be used to indicate the highest level a
1854	         receiver supports.

1856	         The value of max-recv-level-id MUST be in the range of 0 to
1857	         255, inclusive.

1859	         When max-recv-level-id is not present, the value is inferred to
1860	         be equal to level-id.

1862	         max-recv-level-id MUST NOT be present when the highest level
1863	         the receiver supports is not higher than the default level.
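         A receiver deciding whether to include max-recv-level-id could
         proceed as in the Python sketch below.  The sketch interprets
         "the default level" as the level given by level-id, or level
         3.1 (level-id 51) when level-id is absent.  That interpretation
         and the function name are assumptions.

```python
from typing import Dict, Optional

DEFAULT_LEVEL_ID = 51  # level 3.1, inferred when level-id is absent


def receiver_level_params(highest_supported_level_id: int,
                          level_id: Optional[int] = None) -> Dict[str, int]:
    """Return the level-related fmtp parameters a receiver might include."""
    if not 0 <= highest_supported_level_id <= 255:
        raise ValueError("max-recv-level-id must be in the range 0..255")
    params: Dict[str, int] = {}
    if level_id is not None:
        params["level-id"] = level_id
    default_level = DEFAULT_LEVEL_ID if level_id is None else level_id
    # max-recv-level-id MUST NOT be present unless it exceeds the default level.
    if highest_supported_level_id > default_level:
        params["max-recv-level-id"] = highest_supported_level_id
    return params
```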

1865	      sprop-dci:

1867	         This parameter MAY be used to convey a decoding capability
1868	         information NAL unit of the bitstream for out-of-band
1869	         transmission.  The parameter MAY also be used for capability
1870	         exchange.  The value of the parameter is a base64 [RFC4648]
1871	         representation of the decoding capability information NAL unit
1872	         as specified in Section 7.3.2.1 of [VVC].

1874	      sprop-vps:

1876	         This parameter MAY be used to convey any video parameter set
1877	         NAL unit of the bitstream for out-of-band transmission of video
1878	         parameter sets.  The parameter MAY also be used for capability
1879	         exchange and to indicate sub-stream characteristics (i.e.,
1880	         properties of output layer sets and sublayer representations as
1881	         defined in [VVC]).  The value of the parameter is a comma-
1882	         separated (',') list of base64 [RFC4648] representations of the
1883	         video parameter set NAL units as specified in Section 7.3.2.3
1884	         of [VVC].

1886	         The sprop-vps parameter MAY contain one or more video parameter
1887	         set NAL units.  When more than one is present, all video
1888	         parameter sets other than the first one MUST be consistent
1889	         with the first video parameter set in the sprop-vps parameter.
1890	         A video parameter set vpsB is said to be consistent with
1891	         another video parameter set vpsA if the number of OLSs in vpsA
1892	         and vpsB is the same and any decoder that conforms to the
1893	         profile, tier, level, and constraints indicated by the data
1894	         starting from the syntax element general_profile_idc to the
1895	         syntax structure general_constraints_info(), inclusive, in the
1896	         profile_tier_level( ) syntax structure corresponding to any OLS
1897	         with index olsIdx in vpsA can decode any CVS(s) referencing
1898	         vpsB when TargetOlsIdx is equal to olsIdx that conforms to the
1899	         profile, tier, level, and constraints indicated by the data
1900	         starting from the syntax element general_profile_idc to the
1901	         syntax structure general_constraints_info(), inclusive, in the
1902	         profile_tier_level( ) syntax structure corresponding to the OLS
1903	         with index TargetOlsIdx in vpsB.

1905	      sprop-sps:

1907	         This parameter MAY be used to convey sequence parameter set NAL
1908	         units of the bitstream for out-of-band transmission of sequence
1909	         parameter sets.  The value of the parameter is a comma-
1910	         separated (',') list of base64 [RFC4648] representations of the
1911	         sequence parameter set NAL units as specified in
1912	         Section 7.3.2.4 of [VVC].

1914	         A sequence parameter set spsB is said to be consistent with
1915	         another sequence parameter set spsA if any decoder that
1916	         conforms to the profile, tier, level, and constraints indicated
1917	         by the data starting from the syntax element
1918	         general_profile_idc to the syntax structure
1919	         general_constraints_info(), inclusive, in the
1920	         profile_tier_level( ) syntax structure in spsA can decode any
1921	         CLVS(s) referencing spsB that conforms to the profile, tier,
1922	         level, and constraints indicated by the data starting from the
1923	         syntax element general_profile_idc to the syntax structure
1924	         general_constraints_info(), inclusive, in the
1925	         profile_tier_level( ) syntax structure in spsB.

1927	      sprop-pps:

1929	         This parameter MAY be used to convey picture parameter set NAL
1930	         units of the bitstream for out-of-band transmission of picture
1931	         parameter sets.  The value of the parameter is a comma-
1932	         separated (',') list of base64 [RFC4648] representations of the
1933	         picture parameter set NAL units as specified in Section 7.3.2.5
1934	         of [VVC].
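         Receivers typically split the sprop-dci, sprop-vps, sprop-sps,
         and sprop-pps values on commas and base64-decode each element.
         The Python sketch below also checks nal_unit_type, read from the
         five most significant bits of the second NAL unit header byte.
         The type values 13 (DCI), 14 (VPS), 15 (SPS), and 16 (PPS) are
         taken from the NAL unit type table of [VVC] and should be
         checked against it.  The function name is hypothetical.

```python
import base64
from typing import List

# nal_unit_type values for parameter set NAL units (from the [VVC] NAL unit type table).
EXPECTED_NAL_UNIT_TYPE = {"sprop-dci": 13, "sprop-vps": 14, "sprop-sps": 15, "sprop-pps": 16}


def parse_parameter_sets(name: str, value: str) -> List[bytes]:
    """Decode a comma-separated list of base64 NAL units and check their type."""
    nal_units = []
    for item in value.split(","):
        nal_unit = base64.b64decode(item.strip(), validate=True)
        if len(nal_unit) < 2:
            raise ValueError(f"{name}: NAL unit shorter than its 2-byte header")
        # Second header byte: nal_unit_type (5 bits) | nuh_temporal_id_plus1 (3 bits).
        nal_unit_type = nal_unit[1] >> 3
        if nal_unit_type != EXPECTED_NAL_UNIT_TYPE[name]:
            raise ValueError(f"{name}: unexpected nal_unit_type {nal_unit_type}")
        nal_units.append(nal_unit)
    return nal_units
```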

1936	      sprop-sei:

1938	         This parameter MAY be used to convey one or more SEI messages
1939	         that describe bitstream characteristics.  When present, a
1940	         decoder can rely on the bitstream characteristics that are
1941	         described in the SEI messages for the entire duration of the
1942	         session, independently from the persistence scopes of the SEI
1943	         messages as specified in [VSEI].

1945	         The value of the parameter is a comma-separated (',') list of
1946	         base64 [RFC4648] representations of SEI NAL units as specified
1947	         in [VSEI].

1949	            Informative note: Intentionally, no list of applicable or
1950	            inapplicable SEI messages is specified here.  Conveying
1951	            certain SEI messages in sprop-sei may be sensible in some
1952	            application scenarios and meaningless in others.  However, a
1953	            few examples are described below:

1955	            1) In an environment where the bitstream was created from
1956	            film-based source material, and no splicing is going to
1957	            occur during the lifetime of the session, the film grain
1958	            characteristics SEI message is likely meaningful, and
1959	            sending it in sprop-sei rather than in the bitstream at each
1960	            entry point may help with saving bits and allows one to
1961	            configure the renderer only once, avoiding unwanted
1962	            artifacts.

1964	            2) Examples of SEI messages that would be meaningless to
1965	            convey in sprop-sei include the decoded picture hash SEI
1966	            message (it is close to impossible that all decoded pictures
1967	            have the same hash value) or the filler payload SEI message
1968	            (as there is no point in just having more bits in SDP).

1970	      max-lsr:

1972	         The max-lsr parameter MAY be used to signal the capabilities
1973	         of a receiver implementation and MUST NOT be used for any other
1974	         purpose.  The value of max-lsr is an integer indicating the
1975	         maximum processing rate in units of luma samples per second.
1976	         The max-lsr parameter signals that the receiver is capable of
1977	         decoding video at a higher rate than is required by the highest
1978	         level.

1980	            Informative note: When the OPTIONAL media type parameters
1981	            are used to signal the properties of a bitstream, and max-
1982	            lsr is not present, the values of tier-flag, profile-id,
1983	            sub-profile-id, interop-constraints, and level-id must always
1984	            be such that the bitstream complies fully with the specified
1985	            profile, tier, and level.

1987	         When max-lsr is signaled, the receiver MUST be able to decode
1988	         bitstreams that conform to the highest level, with the
1989	         exception that the MaxLumaSr value in Table 136 of [VVC] for
1990	         the highest level is replaced with the value of max-lsr.
1991	         Senders MAY use this knowledge to send pictures of a given size
1992	         at a higher picture rate than is indicated in the highest
1993	         level.

1995	         When not present, the value of max-lsr is inferred to be equal
1996	         to the value of MaxLumaSr given in Table 136 of [VVC] for the
1997	         highest level.

1999	         The value of max-lsr MUST be in the range of MaxLumaSr to 16 *
2000	         MaxLumaSr, inclusive, where MaxLumaSr is given in Table 136 of
2001	         [VVC] for the highest level.
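         The max-lsr range check, and the picture rate it permits for a
         given picture size, can be sketched in Python.  The caller is
         assumed to look up MaxLumaSr for the highest level in Table 136
         of [VVC] (not reproduced here).  The rate bound considers only
         the luma sample rate, not the other level limits.  The function
         names are hypothetical.

```python
from typing import Optional


def effective_max_lsr(max_luma_sr: int, max_lsr: Optional[int] = None) -> int:
    """Return the luma sample rate (samples per second) a receiver accepts.

    max_luma_sr is MaxLumaSr for the highest level, supplied by the caller.
    """
    if max_lsr is None:
        # Absent max-lsr is inferred to equal MaxLumaSr.
        return max_luma_sr
    if not max_luma_sr <= max_lsr <= 16 * max_luma_sr:
        raise ValueError("max-lsr must be in the range MaxLumaSr..16 * MaxLumaSr")
    return max_lsr


def max_picture_rate(luma_sample_rate: int, width: int, height: int) -> float:
    """Upper bound on pictures per second from the luma sample rate alone."""
    return luma_sample_rate / (width * height)
```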

2003	      max-fps:

2005	         The value of max-fps is an integer indicating the maximum
2006	         picture rate in units of pictures per 100 seconds that can be
2007	         effectively processed by the receiver.  The max-fps parameter
2008	         MAY be used to signal that the receiver has a constraint in
2009	         that it is not capable of processing video effectively at the
2010	         full picture rate that is implied by the highest level and,
2011	         when present, max-lsr.

2013	         The value of max-fps is not necessarily the picture rate at
2014	         which the maximum picture size can be sent; it constitutes a
2015	         constraint on the maximum picture rate for all resolutions.

2017	            Informative note: The max-fps parameter is semantically
2018	            different from max-lsr in that max-fps is used to signal a
2019	            constraint, lowering the maximum picture rate from what is
2020	            implied by other parameters.

2022	         The encoder MUST use a picture rate equal to or less than this
2023	         value.  In cases where the max-fps parameter is absent, the
2024	         encoder is free to choose any picture rate according to the
2025	         highest level and any signaled optional parameters.

2027	         The value of max-fps MUST be smaller than or equal to the full
2028	         picture rate that is implied by the highest level and, when
2029	         present, max-lsr.
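         Because max-fps is expressed in pictures per 100 seconds, a
         value of 2997 corresponds to 29.97 pictures per second.  The
         Python sketch below shows the conversion and how a sender might
         cap its picture rate.  The function names are hypothetical, and
         the level-based limits are not checked.

```python
from typing import Optional


def max_fps_to_hz(max_fps: int) -> float:
    """Convert max-fps (pictures per 100 seconds) to pictures per second."""
    return max_fps / 100


def choose_picture_rate(desired_hz: float, max_fps: Optional[int] = None) -> float:
    """Cap a sender's desired picture rate at the receiver's max-fps, if signaled."""
    if max_fps is None:
        # No receiver-side cap from max-fps; level limits are not checked here.
        return desired_hz
    return min(desired_hz, max_fps_to_hz(max_fps))
```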

2031	      sprop-max-don-diff:

2033	         If there is no NAL unit naluA that is followed in transmission
2034	         order by any NAL unit preceding naluA in decoding order (i.e.,
2035	         the transmission order of the NAL units is the same as the
2036	         decoding order), the value of this parameter MUST be equal to
2037	         0.

2039	         Otherwise, this parameter specifies the maximum absolute
2040	         difference between the decoding order number (i.e., AbsDon)
2041	         values of any two NAL units naluA and naluB, where naluA
2042	         follows naluB in decoding order and precedes naluB in
2043	         transmission order.

2045	         The value of sprop-max-don-diff MUST be an integer in the range
2046	         of 0 to 32767, inclusive.

2048	         When not present, the value of sprop-max-don-diff is inferred
2049	         to be equal to 0.
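         A sender that knows the AbsDon value of each NAL unit in
         transmission order could compute sprop-max-don-diff as in the
         Python sketch below.  For each NAL unit naluB, the worst-case
         naluA is the earlier-transmitted NAL unit with the largest
         AbsDon, so a single pass with a running maximum suffices.  The
         function name is hypothetical.

```python
from typing import Iterable


def compute_sprop_max_don_diff(abs_don_in_transmission_order: Iterable[int]) -> int:
    """Maximum AbsDon difference over NAL unit pairs sent out of decoding order."""
    result = 0
    max_don_so_far = None
    for don in abs_don_in_transmission_order:
        # An earlier-sent NAL unit with a larger AbsDon follows this one in decoding order.
        if max_don_so_far is not None and max_don_so_far > don:
            result = max(result, max_don_so_far - don)
        max_don_so_far = don if max_don_so_far is None else max(max_don_so_far, don)
    if result > 32767:
        raise ValueError("sprop-max-don-diff must be in the range 0..32767")
    return result
```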

2051	      sprop-depack-buf-bytes:

2053	         This parameter signals the required size of the de-
2054	         packetization buffer in units of bytes.  The value of the
2055	         parameter MUST be greater than or equal to the maximum buffer
2056	         occupancy (in units of bytes) of the de-packetization buffer as
2057	         specified in Section 6.

2059	         The value of sprop-depack-buf-bytes MUST be an integer in the
2060	         range of 0 to 4294967295, inclusive.

2062	         When sprop-max-don-diff is present and greater than 0, this
2063	         parameter MUST be present and the value MUST be greater than 0.
2064	         When not present, the value of sprop-depack-buf-bytes is
2065	         inferred to be equal to 0.

2067	            Informative note: The value of sprop-depack-buf-bytes
2068	            indicates the required size of the de-packetization buffer
2069	            only.  When network jitter can occur, an appropriately sized
2070	            jitter buffer has to be available as well.

2072	      depack-buf-cap:

2074	         This parameter signals the capabilities of a receiver
2075	         implementation and indicates the amount of de-packetization
2076	         buffer space in units of bytes that the receiver has available
2077	         for reconstructing the NAL unit decoding order from NAL units
2078	         carried in the RTP stream.  A receiver is able to handle any
2079	         RTP stream for which the value of the sprop-depack-buf-bytes
2080	         parameter is smaller than or equal to this parameter.

2082	         When not present, the value of depack-buf-cap is inferred to be
2083	         equal to 4294967295.  The value of depack-buf-cap MUST be an
2084	         integer in the range of 1 to 4294967295, inclusive.

2086	            Informative note: depack-buf-cap indicates the maximum
2087	            possible size of the de-packetization buffer of the receiver
2088	            only, without allowing for network jitter.
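         The interplay between sprop-max-don-diff,
         sprop-depack-buf-bytes, and depack-buf-cap, including their
         inferred default values, can be sketched in Python.  The
         function name is hypothetical, and, as the informative notes
         point out, jitter buffering is not accounted for.

```python
from typing import Optional

MAX_UINT32 = 4294967295


def receiver_can_handle(sprop_max_don_diff: int = 0,
                        sprop_depack_buf_bytes: Optional[int] = None,
                        depack_buf_cap: Optional[int] = None) -> bool:
    """Compare a stream's declared de-packetization need with a receiver's capability."""
    if sprop_max_don_diff > 0 and not sprop_depack_buf_bytes:
        raise ValueError("sprop-depack-buf-bytes MUST be present and greater than 0 "
                         "when sprop-max-don-diff is greater than 0")
    # Defaults when absent: 0 bytes needed, 4294967295 bytes available.
    needed = 0 if sprop_depack_buf_bytes is None else sprop_depack_buf_bytes
    available = MAX_UINT32 if depack_buf_cap is None else depack_buf_cap
    return needed <= available
```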

2090	7.2.  SDP Parameters

2092	   The receiver MUST ignore any parameter unspecified in this memo.

2094	7.2.1.  Mapping of Payload Type Parameters to SDP

2096	   The media type video/H266 string is mapped to fields in the Session
2097	   Description Protocol (SDP) [RFC4566] as follows:

2099	   *  The media name in the "m=" line of SDP MUST be video.

2101	   *  The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the
2102	      media subtype).

2104	   *  The clock rate in the "a=rtpmap" line MUST be 90000.

2106	   *  The OPTIONAL parameters profile-id, tier-flag, sub-profile-id,
2107	      interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id,
2108	      recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max-
2109	      fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf-
2110	      cap, when present, MUST be included in the "a=fmtp" line of SDP.
2111	      These parameters are expressed as a media type string, in the
2112	      form of a semicolon-separated list of parameter=value pairs.

2114	   *  The OPTIONAL parameters sprop-vps, sprop-sps, sprop-pps, sprop-sei,
2115	      and sprop-dci, when present, MUST be included in the "a=fmtp" line
2116	      of SDP or conveyed using the "fmtp" source attribute as specified
2117	      in Section 6.3 of [RFC5576].  For a particular media format (i.e.,
2118	      RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or
2119	      sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP
2120	      and conveyed using the "fmtp" source attribute.  When included in
2121	      the "a=fmtp" line of SDP, those parameters are expressed as a
2122	      media type string, in the form of a semicolon-separated list of
2123	      parameter=value pairs.  When conveyed in the "a=fmtp" line of SDP
2124	      for a particular payload type, the parameters sprop-vps, sprop-
2125	      sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each
2126	      SSRC with the payload type.  When conveyed using the "fmtp" source
2127	      attribute, these parameters are only associated with the given
2128	      source and payload type as parts of the "fmtp" source attribute.

2130	   An example of media representation in SDP is as follows:

2132	           m=video 49170 RTP/AVP 98
2133	           a=rtpmap:98 H266/90000
2134	           a=fmtp:98 profile-id=1;
2135	             sprop-vps=<video parameter sets data>;
2136	             sprop-sps=<sequence parameter set data>;
2137	             sprop-pps=<picture parameter set data>
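   A minimal Python sketch of this mapping follows.  The helper name is
   hypothetical.  It emits the whole "a=fmtp" value on a single line
   with ';' separators (the example above is presumably wrapped for
   display) and ends each SDP line with CRLF, as specified in [RFC4566].

```python
from typing import Dict


def build_h266_media_section(port: int, payload_type: int, fmtp: Dict[str, object]) -> str:
    """Build the m=, a=rtpmap, and (optional) a=fmtp lines for video/H266."""
    lines = [
        f"m=video {port} RTP/AVP {payload_type}",  # media name MUST be "video"
        f"a=rtpmap:{payload_type} H266/90000",     # encoding name H266, clock rate 90000
    ]
    if fmtp:
        params = ";".join(f"{name}={value}" for name, value in fmtp.items())
        lines.append(f"a=fmtp:{payload_type} {params}")
    # Each SDP line is terminated with CRLF.
    return "".join(line + "\r\n" for line in lines)
```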

2139	7.2.2.  Usage with SDP Offer/Answer Model

2141	   This section describes the negotiation of unicast messages using the
2142	   offer-answer model as described in [RFC3264] and its updates.  The
2143	   section is split into subsections, covering a) media format
2144	   configurations not involving non-temporal scalability; b) scalable
2145	   media format configurations; c) the description of the use of those
2146	   parameters not involving the media configuration itself but rather
2147	   the parameters of the payload format design; and d) multicast.

2149	7.2.2.1.  Non-scalable media format configuration

2151	   A non-scalable VVC media configuration is such a configuration where
2152	   no non-temporal scalability mechanisms are allowed.  In [VVC] version
2153	   1, that implies that general_profile_idc indicates one of the
2154	   following profiles: Main 10, Main 10 Still Picture, Main 10 4:4:4,
2155	   and Main 10 4:4:4 Still Picture, with general_profile_idc values of
2156	   1, 65, 33, and 97, respectively.  Note that non-scalable media
2157	   configurations include temporal scalability, in line with VVC's
2158	   design philosophy and profile structure.

2160	   The following limitations and rules pertaining to the media
2161	   configuration apply:

2163	   *  The parameters identifying a media format configuration for VVC
2164	      are profile-id, tier-flag, sub-profile-id, level-id, and interop-
2165	      constraints.  These media configuration parameters, except level-
2166	      id, MUST be used symmetrically.

2168	      The answerer MUST structure its answer according to one of the
2169	      following three options:

2171	      1) maintain all configuration parameters with the values remaining
2172	      the same as in the offer for the media format (payload type), with
2173	      the exception that the value of level-id is changeable as long as
2174	      the highest level indicated by the answer is not higher than that
2175	      indicated by the offer;
2176	      2) include in the answer the recv-sublayer-id parameter, with a
2177	      value less than the sprop-sublayer-id parameter in the offer, for
2178	      the media format (payload type), and maintain all configuration
2179	      parameters with the values remaining the same as in the offer for
2180	      the media format (payload type), with the exception that the value
2181	      of level-id is changeable as long as the highest level indicated
2182	      by the answer is not higher than the level indicated by sprop-sps
2183	      or sprop-vps in the offer for the chosen sublayer representation;
2184	      or

2186	      3) remove the media format (payload type) completely (when one or
2187	      more of the parameter values are not supported).

2189	            Informative note: The above requirement for symmetric use
2190	            does not apply for level-id, and does not apply for the
2191	            other bitstream or RTP stream properties and capability
2192	            parameters as described in Section 7.2.2.3 below.

2194	   *  To simplify handling and matching of these configurations, the
2195	      same RTP payload type number used in the offer SHOULD also be used
2196	      in the answer, as specified in [RFC3264].

2198	   *  The same RTP payload type number used in the offer for the media
2199	      subtype H266 MUST be used in the answer when the answer includes
2200	      recv-sublayer-id.  When the answer does not include recv-sublayer-
2201	      id, the answer MUST NOT contain a payload type number used in the
2202	      offer for the media subtype H266 unless the configuration is
2203	      exactly the same as in the offer or the configuration in the
2204	      answer only differs from that in the offer with a different value
2205	      of level-id.  The answer MAY contain the recv-sublayer-id
2206	      parameter if a VVC bitstream contains multiple operation points
2207	      (using temporal scalability and sublayers) and sprop-sps or sprop-
2208	      vps is included in the offer where information on sublayers is
2209	      present in the first sequence parameter set or video parameter set
2210	      contained in sprop-sps or sprop-vps respectively.  If the sprop-
2211	      sps or sprop-vps is provided in an offer, an answerer MAY select a
2212	      particular operation point indicated in the first sequence
2213	      parameter set or video parameter set contained in sprop-sps or
2214	      sprop-vps respectively.  When the answer includes a recv-sublayer-
2215	      id that is less than a sprop-sublayer-id in the offer, the
2216	      following applies:

2218	      1) When sprop-sps parameter is present, all sequence parameter
2219	      sets contained in the sprop-sps parameter in the SDP answer and
2220	      all sequence parameter sets sent in-band for either the offerer-
2221	      to-answerer direction or the answerer-to-offerer direction MUST be
2222	      consistent with the first sequence parameter set in the sprop-sps
2223	      parameter of the offer (see the semantics of sprop-sps in
2224	      Section 7.1 of this document on one sequence parameter set being
2225	      consistent with another sequence parameter set).

2227	      2) When sprop-vps parameter is present, all video parameter sets
2228	      contained in the sprop-vps parameter in the SDP answer and all
2229	      video parameter sets sent in-band for either the offerer-to-
2230	      answerer direction or the answerer-to-offerer direction MUST be
2231	      consistent with the first video parameter set in the sprop-vps
2232	      parameter of the offer (see the semantics of sprop-vps in
2233	      Section 7.1 of this document on one video parameter set being
2234	      consistent with another video parameter set).

2236	      3) The bitstream sent in either direction MUST conform to the
2237	      profile, tier, level, and constraints of the chosen sublayer
2238	      representation as indicated by the profile_tier_level( ) syntax
2239	      structure in the first sequence parameter set in the sprop-sps
2240	      parameter or by the first profile_tier_level( ) syntax structure
2241	      in the first video parameter set in the sprop-vps parameter of the
2242	      offer.

2244	            Informative note: When an offerer receives an answer that
2245	            does not include recv-sublayer-id, it has to compare payload
2246	            types not declared in the offer based on the media type
2247	            (i.e., video/H266) and the above media configuration
2248	            parameters with any payload types it has already declared.
2249	            This will enable it to determine whether the configuration
2250	            in question is new or is equivalent to a configuration
2251	            already offered, since a different payload type number may
2252	            be used in the answer.  The ability to perform operation
2253	            point selection enables a receiver to utilize the temporal
2254	            scalable nature of a VVC bitstream.
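   The answer checks in this section can be sketched in Python for one
   offered payload type.  The sketch compares the symmetric parameters
   as raw strings (ignoring inferred defaults).  The level of the chosen
   sublayer representation is hypothetical input that the caller would
   have to parse from sprop-sps or sprop-vps.  Option 3 (removing the
   payload type) corresponds to not answering it at all, so it is not
   modeled.  The function name is hypothetical.

```python
from typing import Dict, Optional

DEFAULT_LEVEL_ID = 51  # level 3.1
DEFAULT_SUBLAYER_ID = 6
# Media configuration parameters that must be used symmetrically (all but level-id).
SYMMETRIC_PARAMS = ("profile-id", "tier-flag", "sub-profile-id", "interop-constraints")


def answer_is_acceptable(offer: Dict[str, str], answer: Dict[str, str],
                         sublayer_level_id: Optional[int] = None) -> bool:
    """Check an answer for one offered payload type against options 1 and 2."""
    # Raw string comparison; an absent parameter matches only if absent on both sides.
    if any(offer.get(p) != answer.get(p) for p in SYMMETRIC_PARAMS):
        return False
    answer_level = int(answer.get("level-id", DEFAULT_LEVEL_ID))
    if "recv-sublayer-id" in answer:  # option 2
        offered_sublayer = int(offer.get("sprop-sublayer-id", DEFAULT_SUBLAYER_ID))
        if int(answer["recv-sublayer-id"]) >= offered_sublayer:
            return False
        if sublayer_level_id is None:
            raise ValueError("level of the chosen sublayer representation is required")
        return answer_level <= sublayer_level_id
    # Option 1: level-id may change but must not exceed the offered level.
    return answer_level <= int(offer.get("level-id", DEFAULT_LEVEL_ID))
```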

2256	7.2.2.2.  Scalable media format configuration

2258	   A scalable VVC media configuration is such a configuration where non-
2259	   temporal scalability mechanisms are allowed.  In [VVC] version 1,
2260	   that implies that general_profile_idc indicates one of the following
2261	   profiles: Multilayer Main 10 and Multilayer Main 10 4:4:4, with
2262	   general_profile_idc values of 17 and 49, respectively.

2264	   The following limitations and rules pertaining to the media
2265	   configuration apply.  They are listed in an order that would be
2266	   logical for an implementation to follow:

2268	   *  The parameters identifying a media format configuration for
2269	      scalable VVC are profile-id, tier-flag, sub-profile-id, level-id,
2270	      interop-constraints, and sprop-vps.  These media configuration
2271	      parameters, except level-id, MUST be used symmetrically, except as
2272	      noted below.

2274	   *  The answerer MAY include a level-id that MUST be lower than or
2275	      equal to the level-id indicated in the offer (either expressed by
2276	      level-id in the offer or implied by the default level as specified
2277	      in Section 7.1).

2279	   *  The offerer MUST include sprop-vps including at least one valid
2280	      VPS, so as to allow the answerer to meaningfully interpret
2281	      sprop-ols-id and select recv-ols-id (see below).

2283	   *  The offerer MUST include sprop-ols-id.  The answerer MUST include
2284	      recv-ols-id, and recv-ols-id MUST indicate a supported output
2285	      layer set in the same dependency tree as sprop-ols-id.  If unable,
2286	      the answerer MUST remove the media format.

2288	         Informative note: If an offerer wants to offer more than one
2289	         output layer set, it can do so by offering multiple VVC media
2290	         formats with different payload types.

2292	   *  The offerer MAY include sprop-sublayer-id which, in case of
2293	      scalable VVC, is interpreted as the highest sublayer of the
2294	      highest enhancement layer in the OLS indicated by sprop-ols-id.
2295	      The answerer MAY include recv-sublayer-id which can be used to
2296	      downgrade the sublayer of the highest enhancement layer.  This
2297	      specification does not support downgrading the sublayer of any
2298	      layers in the OLS that are not the highest layer.

2300	   Editor-note-3: Miska agrees to provide text.  Currently we agreed to
2301	   use Stephan's option 1, which normatively disallows the O/A process
2302	   from arriving at signaling sublayer information.

2304	         Informative note: In other words, using this mechanism, an
2305	         answerer can downgrade only the frame rate for the highest
2306	         spatial/quality layer (typically corresponding to the highest
2307	         resolution or bitrate, hence the most complex to decode), but
2308	         not for lower spatial/quality layers.  The answerer must
2309	         support all sublayers for lower layers in the OLS, or reject
2310	         the offer.  This is not a big burden, as the receiver/decoder
2311	         has the option to discard any sublayers it cannot decode,
2312	         irrespective of what is being signaled through offer/answer.

2314	   *  The answerer MUST maintain all configuration parameters with the
2315	      values being the same as signaled in the sprop-vps for the
2316	      operating point with the largest number of sublayers for the
2317	      chosen output layer set, with the exception that the value of
2318	      level-id is changeable as long as the highest level indicated by
2319	      the answer is not higher than the level indicated by the sprop-vps
2320	      in offer for the operating point with the largest number of
2321	      sublayers for the chosen output layer set.
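   The answerer-side rules above can be strung together in a Python
   sketch.  Several simplifications are assumptions: ols_layers is a
   hypothetical mapping from OLS index to layer IDs derived from the
   offered VPS, "same dependency tree" is approximated by the
   layer-subset condition given for recv-ols-id in Section 7.1, and the
   answer's level-id is capped only by the offered level-id rather than
   by the per-operating-point level in sprop-vps.

```python
from typing import Dict, Iterable, Optional, Set

DEFAULT_LEVEL_ID = 51
# Configuration parameters echoed unchanged (symmetric use); level-id is handled separately.
SYMMETRIC_PARAMS = ("profile-id", "tier-flag", "sub-profile-id", "interop-constraints", "sprop-vps")


def build_scalable_answer(offer: Dict[str, str], ols_layers: Dict[int, Set[int]],
                          supported_ols: Iterable[int], supported_level_id: int,
                          max_sublayer_id: int = 6) -> Optional[Dict[str, object]]:
    """Return answer fmtp parameters, or None if the media format must be removed."""
    if "sprop-vps" not in offer or "sprop-ols-id" not in offer:
        raise ValueError("a scalable offer must carry sprop-vps and sprop-ols-id")
    offered_layers = ols_layers.get(int(offer["sprop-ols-id"]))
    if offered_layers is None:
        return None
    # Candidate OLSs: supported by the answerer and made only of offered layers.
    candidates = [ols for ols in supported_ols
                  if ols in ols_layers and ols_layers[ols] <= offered_layers]
    if not candidates:
        return None
    answer: Dict[str, object] = {k: offer[k] for k in SYMMETRIC_PARAMS if k in offer}
    # Prefer the candidate with the most layers (the first one wins on ties).
    answer["recv-ols-id"] = max(candidates, key=lambda ols: len(ols_layers[ols]))
    answer["level-id"] = min(int(offer.get("level-id", DEFAULT_LEVEL_ID)), supported_level_id)
    # Downgrade only the highest layer's sublayers, and only when needed.
    if max_sublayer_id < int(offer.get("sprop-sublayer-id", 6)):
        answer["recv-sublayer-id"] = max_sublayer_id
    return answer
```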

2323	7.2.2.3.  Payload format configuration

2325	   The following limitations and rules pertain to the configuration of
2326	   the payload format mechanisms (mostly buffer management) and apply
2327	   to both scalable and non-scalable VVC.

2329	   *  The parameters sprop-max-don-diff and sprop-depack-buf-bytes
2330	      describe the properties of an RTP stream that the offerer or the
2331	      answerer is sending for the media format configuration.  This
2332	      differs from the normal usage of the offer/answer parameters:
2333	      normally such parameters declare the properties of the bitstream
2334	      or RTP stream that the offerer or the answerer is able to receive.
2335	      When dealing with VVC, the offerer assumes that the answerer will
2336	      be able to receive media encoded using the configuration being
2337	      offered.

2339	         Informative note: The above parameters apply for any RTP
2340	         stream, when present, sent by a declaring entity with the same
2341	         configuration.  In other words, the applicability of the above
2342	         parameters to RTP streams depends on the source endpoint.
2343	         Rather than being bound to the payload type, the values may
2344	         have to be applied to another payload type when being sent, as
2345	         they apply for the configuration.

2347	   *  The capability parameter max-lsr MAY be used to declare further
2348	      capabilities of the offerer or answerer for receiving.  It MUST
2349	      NOT be present when the direction attribute is sendonly.

2351	   *  The capability parameter max-fps MAY be used to declare lower
2352	      capabilities of the offerer or answerer for receiving.  It MUST
2353	      NOT be present when the direction attribute is sendonly.

2355	   *  When an offerer offers an interleaved stream, indicated by the
2356	      presence of sprop-max-don-diff with a value larger than zero, the
2357	      offerer MUST include the size of the de-packetization buffer
2358	      sprop-depack-buf-bytes.

2360	   *  To enable the offerer and answerer to inform each other about
2361	      their capabilities for de-packetization buffering in receiving RTP
2362	      streams, both parties are RECOMMENDED to include depack-buf-cap.

2364	   *  The sprop-dci, sprop-vps, sprop-sps, or sprop-pps, when present
2365	      (included in the "a=fmtp" line of SDP or conveyed using the "fmtp"
2366	      source attribute as specified in Section 6.3 of [RFC5576]), are
2367	      used for out-of-band transport of the parameter sets (DCI, VPS,
2368	      SPS, or PPS, respectively).

2370	   *  The answerer MAY use either out-of-band or in-band transport of
2371	      parameter sets for the bitstream it is sending, regardless of
2372	      whether out-of-band parameter set transport has been used in the
2373	      offerer-to-answerer direction.  Parameter sets included in an
2374	      answer are independent of those parameter sets included in the
2375	      offer, as they are used for decoding two different bitstreams, one
2376	      from the answerer to the offerer and the other in the opposite
2377	      direction.  In case some RTP packets are sent before the SDP
2378	      offer/answer settles down, in-band parameter sets MUST be used for
2379	      those RTP stream parts sent before the SDP offer/answer.

2381	   *  The following rules apply to transport of parameter sets in the
2382	      offerer-to-answerer direction.

2384	      -  An offer MAY include sprop-dci, sprop-vps, sprop-sps, and/or
2385	         sprop-pps.  If none of these parameters is present in the
2386	         offer, then only in-band transport of parameter sets is used.

2388	      -  If the level to use in the offerer-to-answerer direction is
2389	         equal to the default level in the offer, the answerer MUST be
2390	         prepared to use the parameter sets included in sprop-vps,
2391	         sprop-sps, and sprop-pps (either included in the "a=fmtp" line
2392	         of SDP or conveyed using the "fmtp" source attribute) for
2393	         decoding the incoming bitstream, e.g., by passing these
2394	         parameter set NAL units to the video decoder before passing any
2395	         NAL units carried in the RTP streams.  Otherwise, the answerer
2396	         MUST ignore sprop-vps, sprop-sps, and sprop-pps (either
2397	         included in the "a=fmtp" line of SDP or conveyed using the
2398	         "fmtp" source attribute) and the offerer MUST transmit
2399	         parameter sets in-band.

2401	   *  The following rules apply to transport of parameter sets in the
2402	      answerer-to-offerer direction.

2404	      -  An answer MAY include sprop-dci, sprop-vps, sprop-sps, and/or
2405	         sprop-pps.  If none of these parameters is present in the
2406	         answer, then only in-band transport of parameter sets is used.

2408	      -  The offerer MUST be prepared to use the parameter sets included
2409	         in sprop-vps, sprop-sps, and sprop-pps (either included in the
2410	         "a=fmtp" line of SDP or conveyed using the "fmtp" source
2411	         attribute) for decoding the incoming bitstream, e.g., by
2412	         passing these parameter set NAL units to the video decoder
2413	         before passing any NAL units carried in the RTP streams.

2415	   *  When sprop-dci, sprop-vps, sprop-sps, and/or sprop-pps are
2416	      conveyed using the "fmtp" source attribute as specified in
2417	      Section 6.3 of [RFC5576], the receiver of the parameters MUST
2418	      store the parameter sets included in sprop-dci, sprop-vps, sprop-
2419	      sps, and/or sprop-pps and associate them with the source given as
2420	      part of the "fmtp" source attribute.  Parameter sets associated
2421	      with one source (given as part of the "fmtp" source attribute)
2422	      MUST only be used to decode NAL units conveyed in RTP packets from
2423	      the same source (given as part of the "fmtp" source attribute).
2424	      When this mechanism is in use, SSRC collision detection and
2425	      resolution MUST be performed as specified in [RFC5576].
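The following Python sketch illustrates how an implementation might check two of the offer rules above on a parsed "a=fmtp" parameter list, and how an answerer might decide whether to use out-of-band parameter sets.  It is non-normative.  The function names are hypothetical, the parameters are assumed to be already parsed into a dictionary of strings, and the sprop-vps, sprop-sps, and sprop-pps values are assumed to be comma-separated lists of base64-encoded [RFC4648] NAL units.

```python
import base64


def check_offer_fmtp(params: dict[str, str], direction: str) -> list[str]:
    """Return violations of two offer rules of this section (sketch only)."""
    problems = []
    # max-lsr and max-fps declare receiving capabilities, so they are not
    # allowed when the declaring entity only sends.
    if direction == "sendonly":
        for cap in ("max-lsr", "max-fps"):
            if cap in params:
                problems.append(f"{cap} MUST NOT be present when sendonly")
    # sprop-max-don-diff > 0 signals an interleaved stream, which requires
    # the de-packetization buffer size to be declared as well.
    interleaved = int(params.get("sprop-max-don-diff", "0")) > 0
    if interleaved and "sprop-depack-buf-bytes" not in params:
        problems.append("interleaved stream requires sprop-depack-buf-bytes")
    return problems


def out_of_band_parameter_sets(params: dict[str, str], level_in_use: int,
                               offered_default_level: int) -> list[bytes]:
    """NAL units the answerer passes to its decoder before any NAL unit
    carried in RTP; an empty list means in-band transport only."""
    if level_in_use != offered_default_level:
        # The answerer ignores sprop-vps/sps/pps; the offerer sends in-band.
        return []
    nal_units = []
    for key in ("sprop-vps", "sprop-sps", "sprop-pps"):
        for item in params.get(key, "").split(","):
            if item.strip():
                nal_units.append(base64.b64decode(item.strip()))
    return nal_units
```

The in-band fallback is modeled by returning an empty list.  A real receiver would also have to handle sprop-dci and the "fmtp" source attribute, which this sketch omits.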

2427	   Table 1 lists the interpretation of all the parameters that MAY be
2428	   used for the various combinations of offer, answer, and direction
2429	   attributes.  Note that the two columns wherein the recv-ols-id
2430	   parameter is used only apply to answers, whereas the other columns
2431	   apply to both offers and answers.

2433	   Table 1.  Interpretation of parameters for various combinations of
2434	   offers, answers, direction attributes, with and without recv-ols-id.
2435	   Columns that do not indicate offer or answer apply to both.

2437	                                       sendonly --+
2438	               answer: recvonly, recv-ols-id --+  |
2439	                 recvonly w/o recv-ols-id --+  |  |
2440	         answer: sendrecv, recv-ols-id --+  |  |  |
2441	           sendrecv w/o recv-ols-id --+  |  |  |  |
2442	                                      |  |  |  |  |
2443	   profile-id                         C  D  C  D  P
2444	   tier-flag                          C  D  C  D  P
2445	   level-id                           D  D  D  D  P
2446	   sub-profile-id                     C  D  C  D  P
2447	   interop-constraints                C  D  C  D  P
2448	   max-recv-level-id                  R  R  R  R  -
2449	   sprop-max-don-diff                 P  P  -  -  P
2450	   sprop-depack-buf-bytes             P  P  -  -  P
2451	   depack-buf-cap                     R  R  R  R  -
2452	   max-lsr                            R  R  R  R  -
2453	   max-fps                            R  R  R  R  -
2454	   sprop-dci                          P  P  -  -  P
2455	   sprop-sei                          P  P  -  -  P
2456	   sprop-vps                          P  P  -  -  P
2457	   sprop-sps                          P  P  -  -  P
2458	   sprop-pps                          P  P  -  -  P
2459	   sprop-sublayer-id                  P  P  -  -  P
2460	   recv-sublayer-id                   O  O  O  O  -
2461	   sprop-ols-id                       P  P  -  -  P
2462	   recv-ols-id                        X  O  X  O  -

2464	   Legend:

2466	    C: configuration for sending and receiving bitstreams
2467	    D: changeable configuration, same as C except that it is possible
2468	       to answer with a different but consistent value (see the
2469	       semantics of the six parameters related to profile, tier,
2470	       and level on these parameters being consistent)
2471	    P: properties of the bitstream to be sent
2472	    R: receiver capabilities
2473	    O: operation point selection
2474	    X: MUST NOT be present
2475	    -: not usable, when present MUST be ignored
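Table 1 can be encoded as data so that a single lookup yields the interpretation of a parameter in a given SDP context.  The Python sketch below is one possible, non-normative encoding.  The function name and the argument conventions (a direction string plus two flags) are illustrative assumptions.

```python
# Legend letters of Table 1.
LEGEND = {
    "C": "configuration for sending and receiving bitstreams",
    "D": "changeable configuration",
    "P": "properties of the bitstream to be sent",
    "R": "receiver capabilities",
    "O": "operation point selection",
    "X": "MUST NOT be present",
    "-": "not usable; MUST be ignored when present",
}

# One string per parameter; characters are the five columns of Table 1,
# left to right: sendrecv w/o recv-ols-id, answer sendrecv with
# recv-ols-id, recvonly w/o recv-ols-id, answer recvonly with
# recv-ols-id, sendonly.
TABLE_1 = {
    "profile-id": "CDCDP", "tier-flag": "CDCDP", "level-id": "DDDDP",
    "sub-profile-id": "CDCDP", "interop-constraints": "CDCDP",
    "max-recv-level-id": "RRRR-", "sprop-max-don-diff": "PP--P",
    "sprop-depack-buf-bytes": "PP--P", "depack-buf-cap": "RRRR-",
    "max-lsr": "RRRR-", "max-fps": "RRRR-", "sprop-dci": "PP--P",
    "sprop-sei": "PP--P", "sprop-vps": "PP--P", "sprop-sps": "PP--P",
    "sprop-pps": "PP--P", "sprop-sublayer-id": "PP--P",
    "recv-sublayer-id": "OOOO-", "sprop-ols-id": "PP--P",
    "recv-ols-id": "XOXO-",
}


def interpret(param: str, direction: str, is_answer: bool,
              has_recv_ols_id: bool) -> str:
    """Return the Table 1 legend letter for one parameter in one context."""
    if direction == "sendonly":
        column = 4
    elif direction in ("sendrecv", "recvonly"):
        # The recv-ols-id columns apply only to answers.
        with_ols = has_recv_ols_id and is_answer
        column = (0 if direction == "sendrecv" else 2) + (1 if with_ols else 0)
    else:
        raise ValueError(f"unsupported direction: {direction}")
    return TABLE_1[param][column]
```

For example, `LEGEND[interpret("max-fps", "sendonly", False, False)]` yields the "not usable" interpretation.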

2477	   Parameters used for declaring receiver capabilities are, in general,
2478	   downgradable; i.e., they express the upper limit for a sender's
2479	   possible behavior.  Thus, a sender MAY choose to configure its
2480	   encoder using only lower or equal values of these parameters.
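The downgrade rule can be expressed as a clamp: any encoder setting that is bounded by a declared receiver capability is limited to that capability.  The Python sketch below is a minimal illustration under stated assumptions.  The function name is hypothetical, and capability values such as max-fps or max-lsr are treated as plain integers in whatever units their definitions in this memo specify.

```python
def clamp_to_receiver_caps(desired: dict[str, int],
                           caps: dict[str, int]) -> dict[str, int]:
    """Return encoder settings that never exceed the receiver's limits.

    Both dictionaries are keyed by capability parameter name (for example
    "max-fps").  A setting with no declared capability is left unchanged.
    """
    return {
        name: min(value, caps[name]) if name in caps else value
        for name, value in desired.items()
    }
```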

2482	   When the answer does not include a recv-ols-id that is less than the
2483	   sprop-ols-id in the offer, parameters declaring a configuration point
2484	   are not changeable, with the exception of the level-id parameter for
2485	   unicast usage.  These parameters express values a receiver expects
2486	   to be used and MUST be used verbatim in the answer as in the offer.

2488	   When a sender's capabilities are declared with the configuration
2489	   parameters, these parameters express a configuration that is
2490	   acceptable for the sender to receive bitstreams.  In order to achieve
2491	   high interoperability levels, it is often advisable to offer multiple
2492	   alternative configurations.  It is impossible to offer multiple
2493	   configurations in a single payload type.  Thus, when multiple
2494	   configuration offers are made, each offer requires its own RTP
2495	   payload type associated with the offer.  However, it is possible to
2496	   offer multiple operation points using one configuration in a single
2497	   payload type by including sprop-vps in the offer and recv-ols-id in
2498	   the answer.

2500	   A receiver SHOULD understand all media type parameters, even if it
2501	   only supports a subset of the payload format's functionality.  This
2502	   ensures that a receiver is capable of understanding when an offer to
2503	   receive media can be downgraded to what is supported by the receiver
2504	   of the offer.

2506	   An answerer MAY extend the offer with additional media format
2507	   configurations.  However, to enable their usage, in most cases a
2508	   second offer is required from the offerer to provide the bitstream
2509	   property parameters that the media sender will use.  This also has
2510	   the effect that the offerer has to be able to receive this media
2511	   format configuration, not only to send it.

2513	7.2.2.4.  Multicast

2515	   For bitstreams being delivered over multicast, the following rules
2516	   apply:

2518	   *  The media format configuration is identified by profile-id, tier-
2519	      flag, sub-profile-id, level-id, and interop-constraints.  These
2520	      media format configuration parameters, including level-id, MUST be
2521	      used symmetrically; that is, the answerer MUST either maintain all
2522	      configuration parameters or remove the media format (payload type)
2523	      completely.  Note that this implies that the level-id for offer/
2524	      answer in multicast is not changeable.

2526	   *  To simplify the handling and matching of these configurations, the
2527	      same RTP payload type number used in the offer SHOULD also be used
2528	      in the answer, as specified in [RFC3264].  An answer MUST NOT
2529	      contain a payload type number used in the offer unless the
2530	      configuration is the same as in the offer.

2532	   *  Parameter sets received MUST be associated with the originating
2533	      source and MUST only be used in decoding the incoming bitstream
2534	      from the same source.

2536	   *  The rules for other parameters are the same as above for unicast
2537	      as long as the three rules above are obeyed.
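An offerer could verify the first two multicast rules on a received answer as sketched below.  This is a non-normative Python illustration.  The offer and answer are assumed to be already parsed into dictionaries that map payload type numbers to their fmtp parameters, and the function name is hypothetical.  Payload types that appear only in the answer are not examined.

```python
# Parameters identifying the media format configuration (Section 7.2.2.4).
CONFIG_PARAMS = ("profile-id", "tier-flag", "sub-profile-id", "level-id",
                 "interop-constraints")


def check_multicast_answer(offer: dict[int, dict[str, str]],
                           answer: dict[int, dict[str, str]]) -> list[str]:
    """Report answered payload types whose configuration differs from the
    offer.  In multicast, an answerer must keep every configuration
    parameter (including level-id) or remove the payload type."""
    problems = []
    for pt, answer_params in answer.items():
        if pt not in offer:
            continue  # a payload type number not used in the offer
        for name in CONFIG_PARAMS:
            # An absent parameter compares equal only to an absent one.
            if answer_params.get(name) != offer[pt].get(name):
                problems.append(f"PT {pt}: {name} differs from the offer")
    return problems
```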

2539	7.2.3.  Usage in Declarative Session Descriptions

2541	   When VVC over RTP is offered with SDP in a declarative style, as in
2542	   Real Time Streaming Protocol (RTSP) [RFC2326] or Session Announcement
2543	   Protocol (SAP) [RFC2974], the following considerations are necessary.

2545	   *  All parameters capable of indicating both bitstream properties and
2546	      receiver capabilities are used to indicate only bitstream
2547	      properties.  For example, in this case, the parameters profile-id,
2548	      tier-flag, and level-id declare the values used by the bitstream,
2549	      not the capabilities for receiving bitstreams.  As a result, the
2550	      following interpretation of the parameters MUST be used:

2552	      -  Declaring actual configuration or bitstream properties:

2554	         o  profile-id

2556	         o  tier-flag

2558	         o  level-id

2560	         o  interop-constraints

2562	         o  sub-profile-id

2564	         o  sprop-dci

2566	         o  sprop-vps

2568	         o  sprop-sps

2570	         o  sprop-pps

2572	         o  sprop-max-don-diff

2573	         o  sprop-depack-buf-bytes

2575	         o  sprop-sublayer-id

2577	         o  sprop-ols-id

2579	         o  sprop-sei

2581	      -  Not usable (when present, they MUST be ignored):

2583	         o  max-lsr

2585	         o  max-fps

2587	         o  max-recv-level-id

2589	         o  depack-buf-cap

2591	         o  recv-sublayer-id

2593	         o  recv-ols-id

2595	      -  A receiver of the SDP is required to support all parameters and
2596	         values of the parameters provided; otherwise, the receiver MUST
2597	         reject (RTSP) or not participate in (SAP) the session.  It
2598	         falls on the creator of the session to use values that are
2599	         expected to be supported by the receiving application.
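A receiver of a declarative session description might apply the rules above as in the following Python sketch.  It is non-normative.  The fmtp parameters are assumed to be already parsed into a dictionary, and `supports` is a hypothetical, application-provided predicate that tells whether the receiver supports a given parameter name and value.

```python
from typing import Callable, Optional

# Receiver-capability parameters that are not usable in declarative SDP.
NOT_USABLE_IN_DECLARATIVE = frozenset({
    "max-lsr", "max-fps", "max-recv-level-id",
    "depack-buf-cap", "recv-sublayer-id", "recv-ols-id",
})


def accept_declarative_fmtp(
        fmtp: dict[str, str],
        supports: Callable[[str, str], bool]) -> Optional[dict[str, str]]:
    """Return the parameters describing the bitstream, or None when the
    receiver must reject (RTSP) or not participate in (SAP) the session."""
    # Parameters that are not usable are ignored, not treated as errors.
    usable = {name: value for name, value in fmtp.items()
              if name not in NOT_USABLE_IN_DECLARATIVE}
    if all(supports(name, value) for name, value in usable.items()):
        return usable
    return None
```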

2601	7.2.4.  Considerations for Parameter Sets

2603	   When out-of-band transport of parameter sets is used, parameter sets
2604	   MAY still be additionally transported in-band unless explicitly
2605	   disallowed by an application, and some of these additional parameter
2606	   sets may update some of the out-of-band transported parameter sets.
2607	   Update of a parameter set refers to the sending of a parameter set of
2608	   the same type using the same parameter set ID but with different
2609	   values for at least one other parameter of the parameter set.
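A receiver has to keep parameter sets per source, as the rules on the "fmtp" source attribute and on multicast require, and it has to notice updates.  One way to do this is a table keyed by source, parameter set type, and parameter set ID, as in the non-normative Python sketch below.  The class name is hypothetical.  The sketch assumes that the caller has already parsed the parameter set type and ID from the NAL unit, and it approximates "different values" by comparing the raw NAL unit bytes.  Both are simplifications.

```python
from dataclasses import dataclass, field
from typing import Optional

# Key: (source SSRC, parameter set type such as "SPS", parameter set ID).
Key = tuple[int, str, int]


@dataclass
class ParameterSetStore:
    """Per-source parameter set storage (sketch, not a complete receiver)."""
    sets: dict[Key, bytes] = field(default_factory=dict)

    def put(self, ssrc: int, ps_type: str, ps_id: int,
            nal_unit: bytes) -> bool:
        """Store a parameter set and return True if it updates a stored
        one, i.e., same source, type, and ID but different content."""
        key = (ssrc, ps_type, ps_id)
        previous = self.sets.get(key)
        self.sets[key] = nal_unit
        return previous is not None and previous != nal_unit

    def get(self, ssrc: int, ps_type: str, ps_id: int) -> Optional[bytes]:
        """Look up a parameter set only among those of the same source."""
        return self.sets.get((ssrc, ps_type, ps_id))
```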

2611	8.  Use with Feedback Messages

2613	   The following subsections define the use of the Picture Loss
2614	   Indication (PLI) and Full Intra Request (FIR) feedback messages with
2615	   [VVC].  The PLI is defined in [RFC4585], and the FIR message is
2616	   defined in [RFC5104].  In accordance with this memo, and unlike the
2617	   HEVC payload format [RFC7798], a sender MUST NOT send Slice Loss
2618	   Indication (SLI) or Reference Picture Selection Indication (RPSI),
2619	   and a receiver SHOULD ignore RPSI and treat a received SLI as a PLI.

2621	8.1.  Picture Loss Indication (PLI)

2623	   As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a
2624	   media sender indicates "the loss of an undefined amount of coded
2625	   video data belonging to one or more pictures".  Without having any
2626	   specific knowledge of the setup of the bitstream (such as use and
2627	   location of in-band parameter sets, non-IRAP decoder refresh points,
2628	   picture structures, and so forth), a reaction to the reception of a
2629	   PLI by a VVC sender SHOULD be to send an IRAP picture and relevant
2630	   parameter sets, potentially with sufficient redundancy to ensure
2631	   correct reception.  However, sometimes information about the
2632	   bitstream structure is known.  For example, it could have been
2633	   established, outside of the mechanisms defined in this document,
2634	   that parameter sets are conveyed out of band only and stay static
2635	   for the duration of the session.  In that case, it is unnecessary
2636	   to send them in-band as a result of the reception of a PLI.  Other
2637	   examples could be devised based on a priori knowledge of different
2638	   aspects of the bitstream structure.  In all cases, the timing and
2639	   congestion control mechanisms of RFC 4585 MUST be observed.

2641	8.2.  Full Intra Request (FIR)

2643	   The purpose of the FIR message is to force an encoder to send an
2644	   independent decoder refresh point as soon as possible, while
2645	   observing applicable congestion-control-related constraints, such as
2646	   those set out in [RFC8082].

2648	   Upon reception of a FIR, a sender MUST send an IDR picture.
2649	   Parameter sets MUST also be sent, except when there is a priori
2650	   knowledge that the parameter sets have been correctly established.  A
2651	   typical example of that is an understanding between sender and
2652	   receiver, established by means outside this document, that parameter
2653	   sets are exclusively sent out-of-band.
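The default sender reactions described in this section can be summarized in a small decision function, sketched below in Python.  It is non-normative.  The names are hypothetical.  The flag `parameter_sets_known_good` models a priori knowledge, established outside this memo, that parameter sets are conveyed out-of-band only and stay static.  The timing and congestion control rules of [RFC4585] and [RFC8082] are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Feedback(Enum):
    PLI = "PLI"
    SLI = "SLI"
    RPSI = "RPSI"
    FIR = "FIR"


@dataclass(frozen=True)
class Reaction:
    picture: Optional[str]      # "IRAP", "IDR", or None (no refresh)
    send_parameter_sets: bool


def react_to_feedback(msg: Feedback,
                      parameter_sets_known_good: bool) -> Reaction:
    """Default media-sender reaction to a received feedback message."""
    if msg is Feedback.RPSI:
        # RPSI SHOULD be ignored.
        return Reaction(picture=None, send_parameter_sets=False)
    if msg in (Feedback.PLI, Feedback.SLI):
        # SLI is treated as PLI; the SHOULD-level reaction is an IRAP picture.
        return Reaction("IRAP", not parameter_sets_known_good)
    # FIR: an IDR picture MUST be sent.
    return Reaction("IDR", not parameter_sets_known_good)
```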

2655	9.  Security Considerations

2657	   The scope of this Security Considerations section is limited to the
2658	   payload format itself and to one feature of [VVC] that may pose a
2659	   particularly serious security risk if implemented naively.  The
2660	   payload format, in isolation, does not form a complete system.
2661	   Implementers are advised to read and understand relevant security-
2662	   related documents, especially those pertaining to RTP (see the
2663	   Security Considerations section in [RFC3550]), and the security of
2664	   the call-control stack chosen (which may make use of the media type
2665	   registration of this memo).  Implementers should also consider known
2666	   security vulnerabilities of video coding and decoding implementations
2667	   in general and avoid those.

2669	   Within this RTP payload format, and with the exception of the user
2670	   data SEI message as described below, no security threats other than
2671	   those common to RTP payload formats are known.  In other words,
2672	   neither the various media-plane-based mechanisms, nor the signaling
2673	   part of this memo, seems to pose a security risk beyond those common
2674	   to all RTP-based systems.

2676	   RTP packets using the payload format defined in this specification
2677	   are subject to the security considerations discussed in the RTP
2678	   specification [RFC3550], and in any applicable RTP profile such as
2679	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or
2680	   RTP/SAVPF [RFC5124].  However, as "Securing the RTP Framework: Why RTP
2681	   Does Not Mandate a Single Media Security Solution" [RFC7202]
2682	   discusses, it is not an RTP payload format's responsibility to
2683	   discuss or mandate what solutions are used to meet the basic security
2684	   goals like confidentiality, integrity, and source authenticity for
2685	   RTP in general.  This responsibility lies with anyone using RTP in an
2686	   application.  They can find guidance on available security mechanisms
2687	   and important considerations in "Options for Securing RTP Sessions"
2688	   [RFC7201].  The rest of this section discusses the security-impacting
2689	   properties of the payload format itself.

2691	   Because the data compression used with this payload format is applied
2692	   end-to-end, any encryption needs to be performed after compression.
2693	   A potential denial-of-service threat exists for data encodings using
2694	   compression techniques that have non-uniform receiver-end
2695	   computational load.  The attacker can inject pathological datagrams
2696	   into the bitstream that are complex to decode and that cause the
2697	   receiver to be overloaded.  [VVC] is particularly vulnerable to such
2698	   attacks, as it is extremely simple to generate datagrams containing
2699	   NAL units that affect the decoding process of many future NAL units.
2700	   Therefore, the usage of data origin authentication and data integrity
2701	   protection of at least the RTP packet is RECOMMENDED, for example,
2702	   with SRTP [RFC3711].

2704	   Like HEVC [RFC7798], [VVC] includes a user data Supplemental
2705	   Enhancement Information (SEI) message.  This SEI message allows
2706	   inclusion of an arbitrary bitstring into the video bitstream.  Such a
2707	   bitstring could include JavaScript, machine code, and other active
2708	   content.  [VVC] leaves the handling of this SEI message to the
2709	   receiving system.  In order to avoid harmful side effects of the
2710	   user data SEI message, decoder implementations cannot naively
2711	   trust its content.  For example, it would be a bad and insecure
2712	   implementation practice to forward any JavaScript that a decoder
2713	   implementation detects to a web browser.  The safest way to deal
2714	   with user data SEI messages is to simply discard them, but that can
2715	   have negative side effects on the user's quality of experience.

2717	   End-to-end security with authentication, integrity, or
2718	   confidentiality protection will prevent a MANE from performing media-
2719	   aware operations other than discarding complete packets.  In the case
2720	   of confidentiality protection, it will even be prevented from
2721	   discarding packets in a media-aware way.  To be allowed to perform
2722	   such operations, a MANE is required to be a trusted entity that is
2723	   included in the security context establishment.

2725	10.  Congestion Control

2727	   Congestion control for RTP SHALL be used in accordance with RTP
2728	   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
2729	   If best-effort service is being used, an additional requirement is
2730	   that users of this payload format MUST monitor packet loss to ensure
2731	   that the packet loss rate is within an acceptable range.  Packet loss
2732	   is considered acceptable if a TCP flow across the same network path,
2733	   and experiencing the same network conditions, would achieve an
2734	   average throughput, measured on a reasonable timescale, that is not
2735	   less than all RTP streams combined are achieving.  This condition can
2736	   be satisfied by implementing congestion-control mechanisms to adapt
2737	   the transmission rate, the number of layers subscribed for a layered
2738	   multicast session, or by arranging for a receiver to leave the
2739	   session if the loss rate is unacceptably high.

2741	   The bitrate adaptation necessary for obeying the congestion control
2742	   principle is easily achievable when real-time encoding is used, for
2743	   example, by adequately tuning the quantization parameter.  However,
2744	   when pre-encoded content is being transmitted, bandwidth adaptation
2745	   requires the pre-coded bitstream to be tailored for such adaptivity.
2746	   The key mechanisms available in [VVC] are temporal scalability and
2747	   spatial/SNR scalability.  A media sender can remove NAL units
2748	   belonging to higher temporal sublayers (i.e., those NAL units with a
2749	   high value of TID) or higher spatial/SNR layers until the sending
2750	   bitrate drops to an acceptable range.
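Temporal sublayer removal as described above can be sketched as follows.  This is a non-normative Python illustration, not the sub-bitstream extraction process of [VVC].  It assumes the two-byte VVC NAL unit header, whose second byte holds nal_unit_type (5 bits) followed by nuh_temporal_id_plus1 (3 bits).  The "acceptable range" is modeled as a hypothetical byte budget for one window of NAL units, and the function names are illustrative.  Spatial/SNR layer removal, which uses nuh_layer_id, is not covered.

```python
def temporal_id(nal_unit: bytes) -> int:
    """TemporalId (TID) from the two-byte VVC NAL unit header."""
    return (nal_unit[1] & 0x07) - 1


def prune_temporal_sublayers(nal_units: list[bytes],
                             budget_bytes: int) -> tuple[list[bytes], int]:
    """Drop the highest temporal sublayers until the NAL units of one
    window fit into budget_bytes.  Return the kept NAL units and the
    highest TID kept.  The base sublayer (TID 0) is never removed."""
    highest_tid = max((temporal_id(n) for n in nal_units), default=0)
    kept = list(nal_units)
    while highest_tid > 0 and sum(len(n) for n in kept) > budget_bytes:
        highest_tid -= 1
        kept = [n for n in nal_units if temporal_id(n) <= highest_tid]
    return kept, highest_tid
```

If the base sublayer alone still exceeds the budget, the sender has to fall back on other means, such as the ones listed in the first paragraph of this section.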

2752	   The mechanisms mentioned above generally work within a defined
2753	   profile and level and, therefore, no renegotiation of the channel is
2754	   required.  Only when non-downgradable parameters (such as profile)
2755	   are required to be changed does it become necessary to terminate and
2756	   restart the RTP stream(s).  This may be accomplished by using
2757	   different RTP payload types.

2759	   MANEs MAY remove certain unusable packets from the RTP stream when
2760	   that RTP stream was damaged due to previous packet losses.  This can
2761	   help reduce the network load in certain special cases.  For example,
2762	   MANEs can remove those FUs where the leading FUs belonging to the
2763	   same NAL unit have been lost or those dependent slice segments when
2764	   the leading slice segments belonging to the same slice have been
2765	   lost, because the trailing FUs or dependent slice segments are
2766	   meaningless to most decoders.  MANEs can also remove higher temporal
2767	   scalable layers if the outbound transmission (from the MANE's
2768	   viewpoint) experiences congestion.
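A MANE could discard FUs whose leading FUs were lost roughly as sketched below.  This is a non-normative Python illustration.  Each packet is reduced to its RTP sequence number and the start and end indications of the FU header (the S and E bits).  The sketch assumes that the packets arrive in sequence order and that all of them carry FUs of non-interleaved NAL units.  Both assumptions are simplifications, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuPacket:
    seq: int      # RTP sequence number (16 bits)
    start: bool   # S bit: first FU of the fragmented NAL unit
    end: bool     # E bit: last FU of the fragmented NAL unit


def drop_orphaned_fus(packets: list[FuPacket]) -> list[FuPacket]:
    """Forward only FUs whose preceding FUs of the same NAL unit arrived."""
    forwarded = []
    prev_seq = None
    intact = False  # True while the current NAL unit has no missing FU
    for pkt in packets:
        gap = prev_seq is not None and pkt.seq != (prev_seq + 1) & 0xFFFF
        prev_seq = pkt.seq
        if pkt.start:
            intact = True     # a new fragmented NAL unit begins
        elif gap:
            intact = False    # an earlier FU of this NAL unit was lost
        if intact:
            forwarded.append(pkt)
        if pkt.end:
            intact = False
    return forwarded
```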

2770	11.  IANA Considerations

2772	   Placeholder

2774	12.  Acknowledgements

2776	   Dr. Byeongdoo Choi is thanked for the video codec related technical
2777	   discussion and other aspects in this memo.  Xin Zhao and Dr. Xiang Li
2778	   are thanked for their contributions on [VVC] specification
2779	   descriptive content.  Spencer Dawkins is thanked for his valuable
2780	   review comments that led to great improvements of this memo.  Some
2781	   parts of this specification share text with the RTP payload format
2782	   for HEVC [RFC7798].  We thank the authors of that specification for
2783	   their excellent work.

2785	13.  References

2787	13.1.  Normative References

2789	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2790	              Requirement Levels", BCP 14, RFC 2119,
2791	              DOI 10.17487/RFC2119, March 1997,
2792	              <https://www.rfc-editor.org/info/rfc2119>.

2794	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
2795	              with Session Description Protocol (SDP)", RFC 3264,
2796	              DOI 10.17487/RFC3264, June 2002,
2797	              <https://www.rfc-editor.org/info/rfc3264>.

2799	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
2800	              Jacobson, "RTP: A Transport Protocol for Real-Time
2801	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
2802	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

2804	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
2805	              Video Conferences with Minimal Control", STD 65, RFC 3551,
2806	              DOI 10.17487/RFC3551, July 2003,
2807	              <https://www.rfc-editor.org/info/rfc3551>.

2809	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
2810	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
2811	              RFC 3711, DOI 10.17487/RFC3711, March 2004,
2812	              <https://www.rfc-editor.org/info/rfc3711>.

2814	   [RFC4556]  Zhu, L. and B. Tung, "Public Key Cryptography for Initial
2815	              Authentication in Kerberos (PKINIT)", RFC 4556,
2816	              DOI 10.17487/RFC4556, June 2006,
2817	              <https://www.rfc-editor.org/info/rfc4556>.

2819	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
2820	              Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
2821	              July 2006, <https://www.rfc-editor.org/info/rfc4566>.

2823	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
2824	              "Extended RTP Profile for Real-time Transport Control
2825	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
2826	              DOI 10.17487/RFC4585, July 2006,
2827	              <https://www.rfc-editor.org/info/rfc4585>.

2829	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
2830	              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
2831	              <https://www.rfc-editor.org/info/rfc4648>.

2833	   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
2834	              "Codec Control Messages in the RTP Audio-Visual Profile
2835	              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
2836	              February 2008, <https://www.rfc-editor.org/info/rfc5104>.

2838	   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
2839	              Real-time Transport Control Protocol (RTCP)-Based Feedback
2840	              (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
2841	              2008, <https://www.rfc-editor.org/info/rfc5124>.

2843	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
2844	              Media Attributes in the Session Description Protocol
2845	              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
2846	              <https://www.rfc-editor.org/info/rfc5576>.

2848	   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
2849	              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
2850	              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
2851	              DOI 10.17487/RFC7656, November 2015,
2852	              <https://www.rfc-editor.org/info/rfc7656>.

2854	   [RFC8082]  Wenger, S., Lennox, J., Burman, B., and M. Westerlund,
2855	              "Using Codec Control Messages in the RTP Audio-Visual
2856	              Profile with Feedback with Layered Codecs", RFC 8082,
2857	              DOI 10.17487/RFC8082, March 2017,
2858	              <https://www.rfc-editor.org/info/rfc8082>.

2860	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2861	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
2862	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

2864	   [VSEI]     "Versatile supplemental enhancement information messages
2865	              for coded video bitstreams", ITU-T Recommendation H.274,
2866	              2020, <http://handle.itu.int/11.1002/1000/14337>.

2868	   [VVC]      "Versatile Video Coding, ITU-T Recommendation H.266",
2869	              2020, <http://handle.itu.int/11.1002/1000/14336>.

2871	13.2.  Informative References

2873	   [CABAC]    Sole, J., et al., "Transform Coefficient Coding in HEVC",
2874	              IEEE Transactions on Circuits and Systems for Video
2875	              Technology, DOI 10.1109/TCSVT.2012.2223055, December
2876	              2012, <https://doi.org/10.1109/TCSVT.2012.2223055>.

2878	   [HEVC]     "High efficiency video coding, ITU-T Recommendation
2879	              H.265", 2019, <http://handle.itu.int/11.1002/1000/14107>.

2881	   [MPEG2S]   ISO/IEC, "Information technology - Generic coding of
2882	              moving pictures and associated audio information - Part
2883	              1: Systems", ISO/IEC International Standard 13818-1, 2013.

2885	   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
2886	              Streaming Protocol (RTSP)", RFC 2326,
2887	              DOI 10.17487/RFC2326, April 1998,
2888	              <https://www.rfc-editor.org/info/rfc2326>.

2890	   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
2891	              Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
2892	              October 2000, <https://www.rfc-editor.org/info/rfc2974>.

2894	   [RFC6184]  Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
2895	              Payload Format for H.264 Video", RFC 6184,
2896	              DOI 10.17487/RFC6184, May 2011,
2897	              <https://www.rfc-editor.org/info/rfc6184>.

2899	   [RFC6190]  Wenger, S., Wang, Y.-K., Schierl, T., and A.
2900	              Eleftheriadis, "RTP Payload Format for Scalable Video
2901	              Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011,
2902	              <https://www.rfc-editor.org/info/rfc6190>.

2904	   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
2905	              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
2906	              <https://www.rfc-editor.org/info/rfc7201>.

2908	   [RFC7202]  Perkins, C. and M. Westerlund, "Securing the RTP
2909	              Framework: Why RTP Does Not Mandate a Single Media
2910	              Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
2911	              2014, <https://www.rfc-editor.org/info/rfc7202>.

2913	   [RFC7798]  Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M.
2914	              M. Hannuksela, "RTP Payload Format for High Efficiency
2915	              Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798,
2916	              March 2016, <https://www.rfc-editor.org/info/rfc7798>.

2918	Appendix A.  Change History

2920	   draft-zhao-payload-rtp-vvc-00 ........ initial version

2922	   draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and
2923	   corrections

2925	   draft-ietf-payload-rtp-vvc-00 ........ initial WG draft

2927	   draft-ietf-payload-rtp-vvc-01 ........ VVC specification update

2929	   draft-ietf-payload-rtp-vvc-02 ........ VVC specification update

2931	   draft-ietf-payload-rtp-vvc-03 ........ VVC coding tool introduction
2932	   update

2934	   draft-ietf-payload-rtp-vvc-04 ........ VVC coding tool introduction
2935	   update

2937	   draft-ietf-payload-rtp-vvc-05 ........ reference update and adding
2938	   placement for open issues

2940	   draft-ietf-payload-rtp-vvc-06 ........ address editor's note

2942	   draft-ietf-payload-rtp-vvc-07 ........ address editor's notes

2944	   draft-ietf-payload-rtp-vvc-08 ........ address editor's notes

2946	   draft-ietf-payload-rtp-vvc-09 ........ address editor's notes

2948	   draft-ietf-payload-rtp-vvc-10 ........ address editor's notes

2950	Authors' Addresses
2951	   Shuai Zhao
2952	   Tencent
2953	   2747 Park Blvd
2954	   Palo Alto,  94588
2955	   United States of America

2957	   Email: shuai.zhao@ieee.org

2959	   Stephan Wenger
2960	   Tencent
2961	   2747 Park Blvd
2962	   Palo Alto,  94588
2963	   United States of America

2965	   Email: stewe@stewe.org

2967	   Yago Sanchez
2968	   Fraunhofer HHI
2969	   Einsteinufer 37
2970	   10587 Berlin
2971	   Germany

2973	   Email: yago.sanchez@hhi.fraunhofer.de

2975	   Ye-Kui Wang
2976	   Bytedance Inc.
2977	   8910 University Center Lane
2978	   San Diego,  92122
2979	   United States of America

2981	   Email: yekui.wang@bytedance.com