idnits 2.17.1 

draft-ietf-avtcore-rtp-vvc-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document date (18 November 2021) is 889 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 1390

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Downref: Normative reference to an Informational RFC: RFC 7656

  -- Possible downref: Non-RFC (?) normative reference: ref. 'VSEI'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'VVC'

  -- Obsolete informational reference (is this intentional?): RFC 2326
     (Obsoleted by RFC 7826)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	avtcore                                                          S. Zhao
3	Internet-Draft                                                 S. Wenger
4	Intended status: Standards Track                                 Tencent
5	Expires: 22 May 2022                                          Y. Sanchez
6	                                                          Fraunhofer HHI
7	                                                              Y.-K. Wang
8	                                                          Bytedance Inc.
9	                                                           M. Hannuksela
10	                                                      Nokia Technologies
11	                                                        18 November 2021

13	          RTP Payload Format for Versatile Video Coding (VVC)
14	                     draft-ietf-avtcore-rtp-vvc-13

16	Abstract

18	   This memo describes an RTP payload format for the video coding
19	   standard ITU-T Recommendation H.266 and ISO/IEC International
20	   Standard 23090-3, both also known as Versatile Video Coding (VVC) and
21	   developed by the Joint Video Experts Team (JVET).  The RTP payload
22	   format allows for packetization of one or more Network Abstraction
23	   Layer (NAL) units in each RTP packet payload as well as fragmentation
24	   of a NAL unit into multiple RTP packets.  The payload format has wide
25	   applicability in videoconferencing, Internet video streaming, and
26	   high-bitrate entertainment-quality video, among other applications.

28	Status of This Memo

30	   This Internet-Draft is submitted in full conformance with the
31	   provisions of BCP 78 and BCP 79.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF).  Note that other groups may also distribute
35	   working documents as Internet-Drafts.  The list of current Internet-
36	   Drafts is at https://datatracker.ietf.org/drafts/current/.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   This Internet-Draft will expire on 22 May 2022.

45	Copyright Notice

47	   Copyright (c) 2021 IETF Trust and the persons identified as the
48	   document authors.  All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
52	   license-info) in effect on the date of publication of this document.
53	   Please review these documents carefully, as they describe your rights
54	   and restrictions with respect to this document.  Code Components
55	   extracted from this document must include Simplified BSD License text
56	   as described in Section 4.e of the Trust Legal Provisions and are
57	   provided without warranty as described in the Simplified BSD License.

59	Table of Contents

61	   1.  Introduction
62	     1.1.  Overview of the VVC Codec
63	       1.1.1.  Coding-Tool Features (informative)
64	       1.1.2.  Systems and Transport Interfaces (informative)
65	       1.1.3.  High-Level Picture Partitioning (informative)
66	       1.1.4.  NAL Unit Header
67	     1.2.  Overview of the Payload Format
68	   2.  Conventions
69	   3.  Definitions and Abbreviations
70	     3.1.  Definitions
71	       3.1.1.  Definitions from the VVC Specification
72	       3.1.2.  Definitions Specific to This Memo
73	     3.2.  Abbreviations
74	   4.  RTP Payload Format
75	     4.1.  RTP Header Usage
76	     4.2.  Payload Header Usage
77	     4.3.  Payload Structures
78	       4.3.1.  Single NAL Unit Packets
79	       4.3.2.  Aggregation Packets (APs)
80	       4.3.3.  Fragmentation Units
81	     4.4.  Decoding Order Number
82	   5.  Packetization Rules
83	   6.  De-packetization Process
84	   7.  Payload Format Parameters
85	     7.1.  Media Type Registration
86	     7.2.  SDP Parameters
87	       7.2.1.  Mapping of Payload Type Parameters to SDP
88	       7.2.2.  Usage with SDP Offer/Answer Model
89	       7.2.3.  Usage in Declarative Session Descriptions
90	       7.2.4.  Considerations for Parameter Sets
91	   8.  Use with Feedback Messages
92	     8.1.  Picture Loss Indication (PLI)
93	     8.2.  Full Intra Request (FIR)
94	   9.  Security Considerations
95	   10. Congestion Control
96	   11. IANA Considerations
97	   12. Acknowledgements
98	   13. References
99	     13.1.  Normative References
100	     13.2.  Informative References
101	   Appendix A.  Change History
102	   Authors' Addresses

104	1.  Introduction

106	   The Versatile Video Coding [VVC] specification, formally published as
107	   both ITU-T Recommendation H.266 and ISO/IEC International Standard
108	   23090-3, is currently in the ITU-T publication process and the ISO/
109	   IEC approval process.  VVC is reported to provide significant coding
110	   efficiency gains over HEVC [HEVC], also known as H.265, and other
111	   earlier video codecs.

113	   This memo specifies an RTP payload format for VVC.  It shares its
114	   basic design with the NAL (Network Abstraction Layer) unit based RTP
115	   payload formats of Advanced Video Coding (AVC) [RFC6184], Scalable
116	   Video Coding (SVC) [RFC6190], High Efficiency Video Coding (HEVC)
117	   [RFC7798], and their respective predecessors.  With respect to design
118	   philosophy, security, congestion control, and overall implementation
119	   complexity, it has similar properties to those earlier payload format
120	   specifications.  This is a conscious choice, as at least RFC 6184 is
121	   widely deployed and generally known in the relevant implementer
122	   communities.  Certain scalability-related mechanisms known from
123	   [RFC6190] were incorporated into this document, as VVC version 1
124	   supports temporal, spatial, and signal-to-noise ratio (SNR)
125	   scalability.

127	1.1.  Overview of the VVC Codec

129	   VVC and HEVC share a similar hybrid video codec design.  In this
130	   memo, we provide a very brief overview of those features of VVC that
131	   are, in some form, addressed by the payload format specified herein.
132	   Implementers have to read, understand, and apply the ITU-T/ISO/IEC
133	   specifications pertaining to VVC to arrive at interoperable, well-
134	   performing implementations.

136	   Conceptually, both VVC and HEVC include a Video Coding Layer (VCL),
137	   which is often used to refer to the coding-tool features, and a NAL,
138	   which is often used to refer to the systems and transport interface
139	   aspects of the codecs.

141	1.1.1.  Coding-Tool Features (informative)

143	   Coding tool features are described below with occasional reference to
144	   the coding tool set of HEVC, which is well known in the community.

146	   Similar to earlier hybrid-video-coding-based standards, including
147	   HEVC, the following basic video coding design is employed by VVC.  A
148	   prediction signal is first formed by either intra- or motion-
149	   compensated prediction, and the residual (the difference between the
150	   original and the prediction) is then coded.  The gains in coding
151	   efficiency are achieved by redesigning and improving almost all parts
152	   of the codec over earlier designs.  In addition, VVC includes several
153	   tools to make the implementation on parallel architectures easier.

155	   Finally, VVC includes temporal, spatial, and SNR scalability as well
156	   as multiview coding support.

158	   Coding blocks and transform structure

160	   Among major coding-tool differences between HEVC and VVC, one of the
161	   important improvements is the more flexible coding tree structure in
162	   VVC, i.e., multi-type tree.  In addition to quadtree, binary and
163	   ternary trees are also supported, which contribute significant
164	   improvements in coding efficiency.  Moreover, the maximum size of
165	   coding tree unit (CTU) is increased from 64x64 to 128x128.  To
166	   improve the coding efficiency of chroma signal, luma chroma separated
167	   trees at CTU level may be employed for intra-slices.  The square
168	   transforms in HEVC are extended to non-square transforms for
169	   rectangular blocks resulting from binary and ternary tree splits.
170	   Besides, VVC supports multiple transform sets (MTS), including DCT-2,
171	   DST-7, and DCT-8 as well as the non-separable secondary transform.
172	   The transforms used in VVC can have different sizes with support for
173	   larger transform sizes.  For DCT-2, the transform sizes range from
174	   2x2 to 64x64, and for DST-7 and DCT-8, the transform sizes range from
175	   4x4 to 32x32.  In addition, VVC also supports sub-block transform for
176	   both intra and inter coded blocks.  For intra coded blocks, intra
177	   sub-partitioning (ISP) may be used to allow sub-block based intra
178	   prediction and transform.  For inter blocks, sub-block transform may
179	   be used assuming that only a part of an inter-block has non-zero
180	   transform coefficients.
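
   As an informal illustration of the multi-type tree described above,
   the following Python sketch lists the child block sizes produced by
   one split of a block.  The mode names are hypothetical labels chosen
   for this example, not VVC syntax elements, and the 1:2:1 ratio of
   the ternary split is stated here as an assumption to be checked
   against [VVC].

```python
def split_block(width, height, mode):
    """Return the (width, height) of each child after one split.

    Mode names are illustrative only:
      "QT"   quadtree split into four equal quadrants
      "BT_H" binary split with a horizontal boundary
      "BT_V" binary split with a vertical boundary
      "TT_H" ternary split with horizontal boundaries (1:2:1 heights)
      "TT_V" ternary split with vertical boundaries (1:2:1 widths)
    """
    if mode == "QT":
        return [(width // 2, height // 2)] * 4
    if mode == "BT_H":
        return [(width, height // 2)] * 2
    if mode == "BT_V":
        return [(width // 2, height)] * 2
    if mode == "TT_H":
        return [(width, height // 4), (width, height // 2),
                (width, height // 4)]
    if mode == "TT_V":
        return [(width // 4, height), (width // 2, height),
                (width // 4, height)]
    raise ValueError(f"unknown split mode: {mode}")


# Splits of a 128x128 CTU; binary and ternary splits yield the
# rectangular blocks that need non-square transforms.
for mode in ("QT", "BT_H", "BT_V", "TT_H", "TT_V"):
    print(mode, split_block(128, 128, mode))
```

   Every split covers the parent block exactly, so the child areas
   always sum to the parent area.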

182	   Entropy coding

184	   Similar to HEVC, VVC uses a single entropy-coding engine, which is
185	   based on context adaptive binary arithmetic coding [CABAC], but with
186	   the support of multi-window sizes.  The window sizes can be
187	   initialized differently for different context models.  Due to such a
188	   design, it has more efficient adaptation speed and better coding
189	   efficiency.  A joint chroma residual coding scheme is applied to
190	   further exploit the correlation between the residuals of two color
191	   components.  In VVC, different residual coding schemes are applied
192	   for regular transform coefficients and residual samples generated
193	   using transform-skip mode.

195	   In-loop filtering

197	   VVC has more feature support in loop filters than HEVC.  The
198	   deblocking filter in VVC is similar to HEVC but operates at a smaller
199	   grid.  After deblocking and sample adaptive offset (SAO), an adaptive
200	   loop filter (ALF) may be used.  As a Wiener filter, ALF reduces
201	   distortion of decoded pictures.  Besides, VVC introduces a new module
202	   called luma mapping with chroma scaling to fully utilize the dynamic
203	   range of signal so that rate-distortion performance of both Standard
204	   Dynamic Range (SDR) and High Dynamic Range (HDR) content is improved.

206	   Motion prediction and coding

208	   Compared to HEVC, VVC introduces several improvements in this area.
209	   First, there is the adaptive motion vector resolution (AMVR), which
210	   can save bit cost for motion vectors by adaptively signalling motion
211	   vector resolution.  Then the affine motion compensation is included
212	   to capture complicated motion like zooming and rotation.  Meanwhile,
213	   prediction refinement with the optical flow with affine mode (PROF)
214	   is further deployed to mimic affine motion at the pixel level.
215	   Thirdly, decoder-side motion vector refinement (DMVR) is a method
216	   to derive motion vectors at the decoder using block matching so that
217	   fewer bits may be spent on motion vectors.  Bi-directional optical
218	   flow (BDOF) is a similar method to PROF.  BDOF adds a sample wise
219	   offset at 4x4 sub-block level that is derived with equations based on
220	   gradients of the prediction samples and a motion difference relative
221	   to CU motion vectors.  Furthermore, merge with motion vector
222	   difference (MMVD) is a special mode, which further signals a limited
223	   set of motion vector differences on top of merge mode.  In addition
224	   to MMVD, there are another three types of special merge modes, i.e.,
225	   sub-block merge, triangle, and combined intra-/inter-prediction
226	   (CIIP).  Sub-block merge list includes one candidate of sub-block
227	   temporal motion vector prediction (SbTMVP) and up to four candidates
228	   of affine motion vectors.  Triangle is based on triangular block
229	   motion compensation.  CIIP combines intra- and inter- predictions
230	   with weighting.  Adaptive weighting may be employed with a block-
231	   level tool called bi-prediction with CU based weighting (BCW) which
232	   provides more flexibility than in HEVC.
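
   The CU-level weighting of BCW mentioned above can be sketched as
   follows.  This is an informal example: the candidate weight set and
   the rounding are assumptions recalled from the VVC design and need
   to be checked against [VVC], and clipping to the sample range is
   omitted.

```python
# Assumed BCW candidate weights w, expressed in units of 1/8.
BCW_WEIGHTS = (-2, 3, 4, 5, 10)


def bcw_blend(p0, p1, w):
    """Blend two prediction samples with weights (8 - w)/8 and w/8.

    The "+ 4" term rounds to nearest before the shift by 3 bits.
    """
    if w not in BCW_WEIGHTS:
        raise ValueError(f"unsupported weight: {w}")
    return ((8 - w) * p0 + w * p1 + 4) >> 3


# w = 4 is the plain average of the two predictions.
print(bcw_blend(100, 120, 4))
# Unequal weights favour one of the two prediction samples.
print([bcw_blend(100, 120, w) for w in BCW_WEIGHTS])
```

   With w = 4 the result is the conventional average of the two
   predictions, which is the only option this sketch shares with
   plain bi-prediction.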

234	   Intra prediction and intra-coding

236	   To capture the diversified local image texture directions with finer
237	   granularity, VVC supports 65 angular directions instead of 33
238	   directions in HEVC.  The intra mode coding is based on a 6-most-
239	   probable-mode scheme, and the 6 most probable modes are derived using
240	   the neighboring intra prediction directions.  In addition, to deal
241	   with the different distributions of intra prediction angles for
242	   different block aspect ratios, a wide-angle intra prediction (WAIP)
243	   scheme is applied in VVC by including intra prediction angles beyond
244	   those present in HEVC.  Unlike HEVC, which only allows using the most
245	   adjacent line of reference samples for intra prediction, VVC also
246	   allows using two further reference lines, also known as multi-
247	   reference-line (MRL) intra prediction.  The additional reference
248	   lines can only be used for the 6 most probable intra prediction
249	   modes.  To capture the strong correlation between different colour
250	   components, in VVC, a cross-component linear model (CCLM) is utilized
251	   which assumes a linear relationship between the luma sample values
252	   and their associated chroma samples.  For intra prediction, VVC also
253	   applies a position-dependent prediction combination (PDPC) for
254	   refining the prediction samples closer to the intra prediction block
255	   boundary.  Matrix-based intra prediction (MIP) modes are also used in
256	   VVC, which generate an up to 8x8 intra prediction block using a
257	   weighted sum of downsampled neighboring reference samples, and the
258	   weights are hardcoded constants.
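
   The linear luma-to-chroma relationship assumed by CCLM can be
   sketched as follows.  This is a simplified, informal example: it
   fits the line through the smallest and largest neighbouring luma
   samples using floating-point arithmetic, whereas a real codec
   operates on downsampled luma with integer arithmetic and a specific
   neighbour selection defined in [VVC].  All function names are
   hypothetical.

```python
def cclm_params(neighbour_luma, neighbour_chroma):
    """Fit chroma = alpha * luma + beta through two neighbour pairs.

    Simplification: uses the pairs with the smallest and the largest
    luma value among the already reconstructed neighbours.
    """
    pairs = sorted(zip(neighbour_luma, neighbour_chroma))
    (l_min, c_min), (l_max, c_max) = pairs[0], pairs[-1]
    if l_max == l_min:
        # Flat luma neighbourhood: fall back to a constant prediction.
        return 0.0, float(c_min)
    alpha = (c_max - c_min) / (l_max - l_min)
    beta = c_min - alpha * l_min
    return alpha, beta


def cclm_predict(rec_luma, alpha, beta):
    """Predict a chroma block from co-located reconstructed luma."""
    return [[round(alpha * s + beta) for s in row] for row in rec_luma]


alpha, beta = cclm_params([40, 80, 120], [30, 50, 70])
print(alpha, beta)
print(cclm_predict([[60, 100]], alpha, beta))
```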

260	   Other coding-tool features

262	   VVC introduces dependent quantization (DQ) to reduce quantization
263	   error by state-based switching between two quantizers.
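
   The state-based switching between two quantizers can be sketched on
   the reconstruction side as follows.  This is an informal example:
   the four-state transition table and the reconstruction rule are
   assumptions recalled from the VVC design and need to be checked
   against [VVC]; scaling lists, bit-depth shifts, and clipping are
   omitted.

```python
# Assumed transition table, indexed as [state][parity of the level].
STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))


def dq_reconstruct(levels, step):
    """Reconstruct coefficients from levels given in coding order.

    States 0 and 1 select quantizer Q0, whose reconstruction points
    are even multiples of `step`.  States 2 and 3 select quantizer Q1,
    whose points are zero and the odd multiples of `step`.
    """
    state = 0
    out = []
    for k in levels:
        sign = (k > 0) - (k < 0)
        offset = sign if state > 1 else 0
        out.append((2 * k - offset) * step)
        # The parity of the current level drives the quantizer choice
        # for the next coefficient.
        state = STATE_TRANSITION[state][k & 1]
    return out


print(dq_reconstruct([1, 0, 2, -1], step=1))
```

   Because the quantizer for each coefficient depends on the parities
   of the preceding levels, the decoder needs no extra signalling to
   follow the switching.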

265	1.1.2.  Systems and Transport Interfaces (informative)

267	   VVC inherits the basic systems and transport interfaces designs from
268	   HEVC and AVC.  These include the NAL-unit-based syntax structure, the
269	   hierarchical syntax and data unit structure, the supplemental
270	   enhancement information (SEI) message mechanism, and the video
271	   buffering model based on the hypothetical reference decoder (HRD).
272	   The scalability features of VVC are conceptually similar to the
273	   scalable variant of HEVC known as SHVC.  The hierarchical syntax and
274	   data unit structure consists of parameter sets at various levels
275	   (decoder, sequence (pertaining to all layers), sequence (pertaining
276	   to a single layer), picture), picture-level header parameters,
277	   slice-level header parameters, and lower-level parameters.

279	   A number of key components that influenced the network abstraction
280	   layer design of VVC as well as this memo are described below.

282	   Decoding capability information

284	   The decoding capability information includes parameters that stay
285	   constant for the lifetime of a VVC bitstream, which in IETF terms can
286	   translate to a session.  Such information includes profile, level,
287	   and sub-profile information to determine a maximum capability interop
288	   point that is guaranteed to be never exceeded, even if splicing of
289	   video sequences occurs within a session.  It further includes
290	   constraint fields (most of which are flags), which can optionally be
291	   set to indicate that the video bitstream will be constrained in the
292	   use of certain features as indicated by the values of those fields.
293	   With this, a bitstream can be labelled as not using certain tools,
294	   which allows among other things for resource allocation in a decoder
295	   implementation.

297	   Video parameter set

299	   The video parameter set (VPS) pertains to one or more coded video
300	   sequences (CVSs) of multiple layers covering the same range of access
301	   units, and includes, among other information, decoding dependency
302	   expressed as information for reference picture list construction of
303	   enhancement layers.  The VPS provides a "big picture" of a scalable
304	   sequence, including what types of operation points are provided, the
305	   profile, tier, and level of the operation points, and some other
306	   high-level properties of the bitstream that can be used as the basis
307	   for session negotiation and content selection, etc.  One VPS may be
308	   referenced by one or more sequence parameter sets.

310	   Sequence parameter set

312	   The sequence parameter set (SPS) contains syntax elements pertaining
313	   to a coded layer video sequence (CLVS), which is a group of pictures
314	   belonging to the same layer, starting with a random access point, and
315	   followed by pictures that may depend on each other, until the next
316	   random access point picture.  In MPEG-2, the equivalent of a CVS was
317	   a group of pictures (GOP), which normally started with an I frame and
318	   was followed by P and B frames.  While more complex in its options of
319	   random access points, VVC retains this basic concept.  One remarkable
320	   difference of VVC is that a CLVS may start with a Gradual Decoding
321	   Refresh (GDR) picture, without requiring presence of traditional
322	   random access points in the bitstream, such as instantaneous decoding
323	   refresh (IDR) or clean random access (CRA) pictures.  In many TV-like
324	   applications, a CVS contains a few hundred milliseconds to a few
325	   seconds of video.  In video conferencing (without switching MCUs
326	   involved), a CVS can be as long in duration as the whole session.

328	   Picture and adaptation parameter set

329	   The picture parameter set and the adaptation parameter set (PPS and
330	   APS, respectively) carry information pertaining to zero or more
331	   pictures and zero or more slices, respectively.  The PPS contains
332	   information that is likely to stay constant from picture to picture,
333	   at least for pictures of a certain type, whereas the APS contains
334	   information, such as adaptive loop filter coefficients, that is
335	   likely to change from picture to picture or even within a picture.  A
336	   single APS is referenced by all slices of the same picture if that
337	   APS contains information about luma mapping with chroma scaling
338	   (LMCS) or scaling list.  Different APSs containing ALF parameters can
339	   be referenced by slices of the same picture.

341	   Picture header

343	   A Picture Header contains information that is common to all slices
344	   that belong to the same picture.  Being able to send that information
345	   as a separate NAL unit when pictures are split into several slices
346	   allows for saving bitrate, compared to repeating the same information
347	   in all slices.  However, there might be scenarios where low-bitrate
348	   video is transmitted using a single slice per picture.  Having a
349	   separate NAL unit to convey that information incurs an overhead in
350	   such scenarios.  In those cases, the picture header syntax
351	   structure is directly included in the slice header, instead of its
352	   own NAL unit.  The mode of the picture header syntax structure being
353	   included in its own NAL unit or not can only be switched on/off for
354	   an entire CLVS, and can only be switched off when in the entire CLVS
355	   each picture contains only one slice.

357	   Profile, tier, and level

359	   The profile, tier and level syntax structures in DCI, VPS and SPS
360	   contain profile, tier, level information for all layers that refer to
361	   the DCI, for layers associated with one or more output layer sets
362	   specified by the VPS, and for any layer that refers to the SPS,
363	   respectively.

365	   Sub-profiles

367	   Within the VVC specification, a sub-profile is a 32-bit number, coded
368	   according to ITU-T Rec. T.35, that carries no semantics.  It is
369	   carried in the profile_tier_level structure and hence (potentially)
370	   present in the DCI, VPS, and SPS.  External registration bodies can
371	   register a T.35 codepoint with ITU-T registration authorities and
372	   associate with their registration a description of bitstream
373	   restrictions beyond the profiles defined by ITU-T and ISO/IEC.  This
374	   would allow encoder manufacturers to label the bitstreams generated
375	   by their encoder as complying with such sub-profile.  It is expected
376	   that upstream standardization organizations (such as DVB and ATSC),
377	   as well as walled-garden video services, will take advantage of this
378	   labelling system.  In contrast to "normal" profiles, it is expected
379	   that sub-profiles may indicate encoder choices traditionally left
380	   open in the (decoder-centric) video coding specs, such as GOP
381	   structures, minimum/maximum QP values, and the mandatory use of
382	   certain tools or SEI messages.

384	   General constraint fields

386	   The profile_tier_level structure carries a considerable number of
387	   constraint fields (most of which are flags), which an encoder can use
388	   to indicate to a decoder that it will not use a certain tool or
389	   technology.  They were included in reaction to a perceived market
390	   need for labelling a bitstream as not exercising a certain tool that
391	   has become commercially unviable.

393	   Temporal scalability support

395	   VVC includes support of temporal scalability, by inclusion of the
396	   signalling of TemporalId in the NAL unit header, the restriction that
397	   pictures of a particular temporal sublayer cannot be used for inter
398	   prediction reference by pictures of a lower temporal sublayer, the
399	   sub-bitstream extraction process, and the requirement that each sub-
400	   bitstream extraction output be a conforming bitstream.  Media-Aware
401	   Network Elements (MANEs) can utilize the TemporalId in the NAL unit
402	   header for stream adaptation purposes based on temporal scalability.
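
   As an informal illustration of how a MANE could thin a stream by
   temporal sublayer, the following Python sketch parses the two-byte
   VVC NAL unit header and drops NAL units above a target TemporalId.
   The function names are hypothetical, and the sketch deliberately
   ignores the remaining rules of the sub-bitstream extraction process
   in [VVC] as well as the RTP payload structures of this memo.

```python
def parse_nal_header(nal):
    """Parse the two-byte VVC NAL unit header.

    Assumed layout: forbidden_zero_bit (1), nuh_reserved_zero_bit (1),
    nuh_layer_id (6), nal_unit_type (5), nuh_temporal_id_plus1 (3).
    """
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its header")
    b0, b1 = nal[0], nal[1]
    return {
        "layer_id": b0 & 0x3F,
        "nal_unit_type": b1 >> 3,
        "temporal_id": (b1 & 0x07) - 1,
    }


def drop_higher_sublayers(nal_units, max_temporal_id):
    """Keep only NAL units whose TemporalId is at most the target."""
    return [n for n in nal_units
            if parse_nal_header(n)["temporal_id"] <= max_temporal_id]


# Three illustrative NAL units of type 1 with TemporalId 0, 1, and 2.
stream = [bytes([0x00, (1 << 3) | (tid + 1)]) for tid in (0, 1, 2)]
kept = drop_higher_sublayers(stream, max_temporal_id=1)
print([parse_nal_header(n)["temporal_id"] for n in kept])
```

   Dropping by TemporalId is safe for prediction because pictures of a
   lower sublayer never reference pictures of a higher one.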

404	   Reference picture resampling (RPR)

406	   In AVC and HEVC, the spatial resolution of pictures cannot change
407	   unless a new sequence using a new SPS starts, with an Intra random
408	   access point (IRAP) picture.  VVC enables picture resolution change
409	   within a sequence at a position without encoding an IRAP picture,
410	   which is always intra-coded.  This feature is sometimes referred to
411	   as reference picture resampling (RPR), as the feature needs
412	   resampling of a reference picture used for inter prediction when that
413	   reference picture has a different resolution than the current picture
414	   being decoded.  RPR allows resolution change without the need of
415	   coding an IRAP picture and hence avoids a momentary bit rate spike
416	   caused by an IRAP picture in streaming or video conferencing
417	   scenarios, e.g., to cope with network condition changes.  RPR can
418	   also be used in application scenarios wherein zooming of the entire
419	   video region or some region of interest is needed.
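
   Whether a reference picture needs resampling can be sketched as
   follows.  This is an informal example: the 14-bit fixed-point
   precision and the rounding are assumptions recalled from the VVC
   design and need to be checked against [VVC]; conformance window
   offsets and the permitted range of scaling ratios are ignored.

```python
SCALE_BITS = 14  # assumed fixed-point precision of the scaling ratio


def rpr_scale(ref_size, cur_size):
    """Fixed-point ratio ref_size / cur_size, rounded to nearest."""
    return ((ref_size << SCALE_BITS) + (cur_size >> 1)) // cur_size


def needs_resampling(ref_w, ref_h, cur_w, cur_h):
    """True if the reference and current picture sizes differ in scale."""
    one = 1 << SCALE_BITS
    return rpr_scale(ref_w, cur_w) != one or rpr_scale(ref_h, cur_h) != one


# Same resolution: the reference picture is used as is.
print(needs_resampling(1920, 1080, 1920, 1080))
# Resolution halved mid-sequence: the reference is resampled 2:1.
print(rpr_scale(1920, 960), needs_resampling(1920, 1080, 960, 540))
```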

421	   Spatial, SNR, and multiview scalability

422	   VVC includes support for spatial, SNR, and multiview scalability.
423	   Scalable video coding is widely considered to have technical benefits
424	   and enrich services for various video applications.  Until recently,
425	   however, the functionality was not included in the first version of
426	   video codec specifications.  In VVC, all those forms of
427	   scalability are supported natively in the first version through
428	   the signalling of the nuh_layer_id in the NAL unit header, the VPS
429	   which associates layers with given nuh_layer_id to each other,
430	   reference picture selection, reference picture resampling for
431	   spatial scalability, and a number of other mechanisms not relevant
432	   for this memo.

434	      Spatial scalability

436	         With the existence of Reference Picture Resampling (RPR), the
437	         additional burden for scalability support is just a
438	         modification of the high-level syntax (HLS).  The inter-layer
439	         prediction is employed in a scalable system to improve the
440	         coding efficiency of the enhancement layers.  In addition to
441	         the spatial and temporal motion-compensated predictions that
442	         are available in a single-layer codec, the inter-layer
443	         prediction in VVC uses the possibly resampled video data of the
444	         reconstructed reference picture from a reference layer to
445	         predict the current enhancement layer.  The resampling process
446	         for inter-layer prediction, when used, is performed at the
447	         block-level, reusing the existing interpolation process for
448	         motion compensation in single-layer coding.  It means that no
449	         additional resampling process is needed to support spatial
450	         scalability.

452	      SNR scalability

454	         SNR scalability is similar to spatial scalability except that
455	         the resampling factors are 1:1.  In other words, there is no
456	         change in resolution, but there is inter-layer prediction.

458	      Multiview scalability

460	         The first version of VVC also supports multiview scalability,
461	         wherein a multi-layer bitstream carries layers representing
462	         multiple views, and one or more of the represented views can be
463	         output at the same time.

465	   SEI messages

467	   Supplemental enhancement information (SEI) messages carry information
468	   in the bitstream that does not influence the decoding process as
469	   specified in the VVC spec but addresses issues of representation/
470	   rendering of the decoded bitstream, labels the bitstream for certain
471	   applications, and serves similar tasks.  The overall concept of SEI
472	   messages and many of the messages themselves have been inherited from
473	   the AVC and HEVC specs.  Except for the SEI messages that affect the
474	   specification of the hypothetical reference decoder (HRD), other SEI
475	   messages for use in the VVC environment, which are generally useful
476	   also in other video coding technologies, are not included in the main
477	   VVC specification but in a companion specification [VSEI].

479	1.1.3.  High-Level Picture Partitioning (informative)

481	   VVC inherited the concept of tiles and wavefront parallel processing
482	   (WPP) from HEVC, with some minor to moderate differences.  The basic
483	   concept of slices was kept in VVC but designed in an essentially
484	   different form.  VVC is the first video coding standard that includes
485	   subpictures as a feature, which provide the same functionality as
486	   HEVC motion-constrained tile sets (MCTSs) but are designed
487	   differently to have better coding efficiency and to be friendlier
488	   for usage in application systems.  More details of these
489	   differences are described below.

491	   Tiles and WPP

493	   As in HEVC, a picture in VVC can be split into tile rows and tile
494	   columns, in-picture prediction across tile boundaries is
495	   disallowed, and so on.  However, the syntax for signalling of tile
496	   partitioning has been simplified, by using a unified syntax design
497	   for both the uniform and the non-uniform mode.  In addition,
498	   signalling of entry point offsets for tiles in the slice header is
499	   optional in VVC while it is mandatory in HEVC.  The WPP design in VVC
500	   has two differences compared to HEVC: i) the CTU row delay is reduced
501	   from two CTUs to one CTU; ii) signalling of entry point offsets for
502	   WPP in the slice header is optional in VVC while it is mandatory in
503	   HEVC.
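
   The effect of the reduced CTU row delay on wavefront parallel
   processing can be sketched with a simple timing model.  This is an
   informal example under stated assumptions: every CTU takes one time
   slot, one thread is available per CTU row, and the function names
   are hypothetical.

```python
def wpp_start_slots(rows, cols, delay):
    """Earliest start slot of every CTU in a wavefront schedule.

    `delay` is the number of CTUs by which a row lags the row above
    it: 2 for the HEVC-style wavefront, 1 for the VVC-style one.
    """
    return [[c + r * delay for c in range(cols)] for r in range(rows)]


def wpp_total_slots(rows, cols, delay):
    """Slots needed until the last CTU of the last row has finished."""
    return cols + (rows - 1) * delay


# A picture of 4 CTU rows by 8 CTU columns.
print("HEVC-style:", wpp_total_slots(4, 8, delay=2))
print("VVC-style: ", wpp_total_slots(4, 8, delay=1))
```

   In this model the saving grows with the number of CTU rows, since
   each additional row starts one slot earlier than with a two-CTU
   delay.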

505	   Slices

507	   In VVC, the conventional slices based on CTUs (as in HEVC) or
508	   macroblocks (as in AVC) have been removed.  The main reasoning behind
509	   this architectural change is as follows.  The advances in video
510	   coding since 2003 (the publication year of AVC v1) have been such
511	   that slice-based error concealment has become practically impossible,
512	   due to the ever-increasing number and efficiency of in-picture and
513	   inter-picture prediction mechanisms.  An error-concealed picture is
514	   the decoding result of a transmitted coded picture for which either
515	   some data of the coded picture was lost (e.g., loss of some slices) or
516	   a reference picture for at least some part of the coded picture is not
517	   error-free (e.g., that reference picture was itself an error-concealed
518	   picture).  For example, when one of the multiple slices of a picture
519	   is lost, it may be error-concealed using an interpolation of the
520	   neighboring slices.  While advanced video coding prediction
521	   mechanisms provide significantly higher coding efficiency, they also
522	   make it harder for machines to estimate the quality of an error-
523	   concealed picture, which was already a hard problem with the use of
524	   simpler prediction mechanisms.  Advanced in-picture prediction
525	   mechanisms also cause the coding efficiency loss due to splitting a
526	   picture into multiple slices to be more significant.  Furthermore,
527	   network conditions have become significantly better, while techniques
528	   for dealing with packet losses have improved significantly at the same
529	   time.  As a result, very few implementations have recently used
530	   slices for maximum transmission unit size matching.  Instead,
531	   substantially all applications where low-delay error resilience is
532	   required (e.g., video telephony and video conferencing) rely on
533	   system/transport-level error resilience (e.g., retransmission,
534	   forward error correction) and/or picture-based error resilience tools
535	   (feedback-based error resilience, insertion of IRAPs, scalability
536	   with higher protection level of the base layer, and so on).
537	   Considering all the above, nowadays it is very rare that a picture
538	   that cannot be correctly decoded is passed to the decoder, and when
539	   such a rare case occurs, the system can afford to wait for an error-
540	   free picture to be decoded and available for display without
541	   resulting in frequent and long periods of picture freezing seen by
542	   end users.

544	   Slices in VVC have two modes: rectangular slices and raster-scan
545	   slices.  The rectangular slice, as indicated by its name, covers a
546	   rectangular region of the picture.  Typically, a rectangular slice
547	   consists of several complete tiles.  However, it is also possible
548	   that a rectangular slice is a subset of a tile and consists of one or
549	   more consecutive, complete CTU rows within a tile.  A raster-scan
550	   slice consists of one or more complete tiles in a tile raster scan
551	   order; hence, the region covered by a raster-scan slice can, but need
552	   not, have a non-rectangular shape; it may also happen to have
553	   the shape of a rectangle.  The concept of slices in VVC is therefore
554	   strongly linked to or based on tiles instead of CTUs (as in HEVC) or
555	   macroblocks (as in AVC).

557	   Subpictures

559	   VVC is the first video coding standard that includes the support of
560	   subpictures as a feature.  Each subpicture consists of one or more
561	   complete rectangular slices that collectively cover a rectangular
562	   region of the picture.  A subpicture may be either specified to be
563	   extractable (i.e., coded independently of other subpictures of the
564	   same picture and of earlier pictures in decoding order) or not
565	   extractable.  Regardless of whether a subpicture is extractable or
566	   not, the encoder can control whether in-loop filtering (including
567	   deblocking, SAO, and ALF) is applied across the subpicture boundaries
568	   individually for each subpicture.

570	   Functionally, subpictures are similar to the motion-constrained tile
571	   sets (MCTSs) in HEVC.  They both allow independent coding and
572	   extraction of a rectangular subset of a sequence of coded pictures,
573	   for use cases like viewport-dependent 360-degree video streaming
574	   optimization and region of interest (ROI) applications.

576	   There are several important design differences between subpictures
577	   and MCTSs.  First, the subpictures feature in VVC allows motion
578	   vectors of a coding block to point outside of the subpicture, even
579	   when the subpicture is extractable, by applying sample padding at
580	   subpicture boundaries in this case, as is done at picture
581	   boundaries.  Second, additional changes were introduced for the
582	   selection and derivation of motion vectors in the merge mode and in
583	   the decoder side motion vector refinement process of VVC.  This
584	   allows higher coding efficiency compared to the non-normative motion
585	   constraints applied at the encoder-side for MCTSs.  Third, rewriting
586	   of SHs (and PH NAL units, when present) is not needed when extracting
587	   one or more extractable subpictures from a sequence of pictures to
588	   create a sub-bitstream that is a conforming bitstream.  In sub-
589	   bitstream extractions based on HEVC MCTSs, rewriting of SHs is
590	   needed.  Note that in both HEVC MCTSs extraction and VVC subpictures
591	   extraction, rewriting of SPSs and PPSs is needed.  However, typically
592	   there are only a few parameter sets in a bitstream, while each
593	   picture has at least one slice, therefore rewriting of SHs can be a
594	   significant burden for application systems.  Fourth, slices of
595	   different subpictures within a picture are allowed to have different
596	   NAL unit types.  Fifth, VVC specifies HRD and level definitions for
597	   subpicture sequences, thus the conformance of the sub-bitstream of
598	   each extractable subpicture sequence can be ensured by encoders.

600	1.1.4.  NAL Unit Header

602	   VVC maintains the NAL unit concept of HEVC with modifications.  VVC
603	   uses a two-byte NAL unit header, as shown in Figure 1.  The payload
604	   of a NAL unit refers to the NAL unit excluding the NAL unit header.

606	                     +---------------+---------------+
607	                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
608	                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
609	                     |F|Z| LayerID   |  Type   | TID |
610	                     +---------------+---------------+

612	                   The Structure of the VVC NAL Unit Header.

614	                                  Figure 1

616	   The semantics of the fields in the NAL unit header are as specified
617	   in VVC and described briefly below for convenience.  In addition to
618	   the name and size of each field, the corresponding syntax element
619	   name in VVC is also provided.

621	   F: 1 bit

623	      forbidden_zero_bit.  Required to be zero in VVC.  Note that the
624	      inclusion of this bit in the NAL unit header was to enable
625	      transport of VVC video over MPEG-2 transport systems (avoidance of
626	      start code emulations) [MPEG2S].  In the context of this memo the
627	      value 1 may be used to indicate a syntax violation, e.g., for a
628	      NAL unit resulting from aggregating a number of fragmented units of
629	      a NAL unit but missing the last fragment, as described in the last
630	      sentence of section 4.3.3.

632	   Z: 1 bit

634	      nuh_reserved_zero_bit.  Required to be zero in VVC, and reserved
635	      for future extensions by ITU-T and ISO/IEC.

637	      This memo does not overload the "Z" bit for local extensions, as
638	      a) overloading the "F" bit is sufficient and b) to preserve the
639	      usefulness of this memo to possible future versions of [VVC].

641	   LayerId: 6 bits

643	      nuh_layer_id.  Identifies the layer a NAL unit belongs to, wherein
644	      a layer may be, e.g., a spatial scalable layer, a quality scalable
645	      layer, a layer containing a different view, etc.

647	   Type: 5 bits

649	      nal_unit_type.  This field specifies the NAL unit type as defined
650	      in Table 5 of [VVC].  For a reference of all currently defined NAL
651	      unit types and their semantics, please refer to Section 7.4.2.2 in
652	      [VVC].

654	   TID: 3 bits

656	      nuh_temporal_id_plus1.  This field specifies the temporal
657	      identifier of the NAL unit plus 1.  The value of TemporalId is
658	      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
659	      there is at least one bit in the NAL unit header equal to 1, so as
660	      to enable the consideration of start code emulations in the NAL unit
661	      payload data independent of the NAL unit header.
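
   As a non-normative illustration, the two-byte header layout of
   Figure 1 can be unpacked as in the following Python sketch.  The
   names NalUnitHeader and parse_nal_unit_header are hypothetical and
   are not defined by [VVC] or by this memo; the bit positions follow
   the field sizes listed above.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    f: int          # forbidden_zero_bit (1 bit)
    z: int          # nuh_reserved_zero_bit (1 bit)
    layer_id: int   # nuh_layer_id (6 bits)
    nal_type: int   # nal_unit_type (5 bits)
    tid: int        # nuh_temporal_id_plus1 (3 bits)

    @property
    def temporal_id(self) -> int:
        # TemporalId is TID minus 1.
        return self.tid - 1


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the first two bytes of a NAL unit into the Figure 1 fields."""
    if len(data) < 2:
        raise ValueError("NAL unit header needs two bytes")
    b0, b1 = data[0], data[1]
    hdr = NalUnitHeader(
        f=b0 >> 7,
        z=(b0 >> 6) & 0x01,
        layer_id=b0 & 0x3F,
        nal_type=b1 >> 3,
        tid=b1 & 0x07,
    )
    if hdr.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return hdr
```

   For example, the bytes 0x00 0x79 carry LayerId 0, Type 15, and TID 1
   (TemporalId 0).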

663	1.2.  Overview of the Payload Format

665	   This payload format defines the following processes required for
666	   transport of VVC coded data over RTP [RFC3550]:

668	   *  Usage of RTP header with this payload format

670	   *  Packetization of VVC coded NAL units into RTP packets using three
671	      types of payload structures: a single NAL unit packet, aggregation
672	      packet, and fragmentation unit

674	   *  Transmission of VVC NAL units of the same bitstream within a
675	      single RTP stream

677	   *  Media type parameters to be used with the Session Description
678	      Protocol (SDP) [RFC4566]

680	   *  Usage of RTCP feedback messages

682	2.  Conventions

684	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
685	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
686	   "OPTIONAL" in this document are to be interpreted as described in BCP
687	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
688	   capitals, as shown above.

690	3.  Definitions and Abbreviations

692	3.1.  Definitions

694	   This document uses the terms and definitions of VVC.  Section 3.1.1
695	   lists relevant definitions from [VVC] for convenience.  Section 3.1.2
696	   provides definitions specific to this memo.  The terms and definitions
697	   in Section 3.1.1 are verbatim copies from the [VVC] specification.

699	3.1.1.  Definitions from the VVC Specification

701	   Access unit (AU): A set of PUs that belong to different layers and
702	   contain coded pictures associated with the same time for output from
703	   the DPB.

705	   Adaptation parameter set (APS): A syntax structure containing syntax
706	   elements that apply to zero or more slices as determined by zero or
707	   more syntax elements found in slice headers.

709	   Bitstream: A sequence of bits, in the form of a NAL unit stream or a
710	   byte stream, that forms the representation of a sequence of AUs
711	   forming one or more coded video sequences (CVSs).

713	   Coded picture: A coded representation of a picture comprising VCL NAL
714	   units with a particular value of nuh_layer_id within an AU and
715	   containing all CTUs of the picture.

717	   Clean random access (CRA) PU: A PU in which the coded picture is a
718	   CRA picture.

720	   Clean random access (CRA) picture: An IRAP picture for which each VCL
721	   NAL unit has nal_unit_type equal to CRA_NUT.

723	   Coded video sequence (CVS): A sequence of AUs that consists, in
724	   decoding order, of a CVSS AU, followed by zero or more AUs that are
725	   not CVSS AUs, including all subsequent AUs up to but not including
726	   any subsequent AU that is a CVSS AU.

728	   Coded video sequence start (CVSS) AU: An AU in which there is a PU
729	   for each layer in the CVS and the coded picture in each PU is a CLVSS
730	   picture.

732	   Coded layer video sequence (CLVS): A sequence of PUs with the same
733	   value of nuh_layer_id that consists, in decoding order, of a CLVSS
734	   PU, followed by zero or more PUs that are not CLVSS PUs, including
735	   all subsequent PUs up to but not including any subsequent PU that is
736	   a CLVSS PU.

738	   Coded layer video sequence start (CLVSS) PU: A PU in which the coded
739	   picture is a CLVSS picture.

741	   Coded layer video sequence start (CLVSS) picture: A coded picture
742	   that is an IRAP picture with NoOutputBeforeRecoveryFlag equal to 1 or
743	   a GDR picture with NoOutputBeforeRecoveryFlag equal to 1.

745	   Coding tree unit (CTU): A CTB of luma samples, two corresponding CTBs
746	   of chroma samples of a picture that has three sample arrays, or a CTB
747	   of samples of a monochrome picture or a picture that is coded using
748	   three separate colour planes and syntax structures used to code the
749	   samples.

751	   Decoding Capability Information (DCI): A syntax structure containing
752	   syntax elements that apply to the entire bitstream.

754	   Decoded picture buffer (DPB): A buffer holding decoded pictures for
755	   reference, output reordering, or output delay specified for the
756	   hypothetical reference decoder.

758	   Gradual decoding refresh (GDR) picture: A picture for which each VCL
759	   NAL unit has nal_unit_type equal to GDR_NUT.

761	   Instantaneous decoding refresh (IDR) PU: A PU in which the coded
762	   picture is an IDR picture.

764	   Instantaneous decoding refresh (IDR) picture: An IRAP picture for
765	   which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or
766	   IDR_N_LP.

768	   Intra random access point (IRAP) AU: An AU in which there is a PU for
769	   each layer in the CVS and the coded picture in each PU is an IRAP
770	   picture.

772	   Intra random access point (IRAP) PU: A PU in which the coded picture
773	   is an IRAP picture.

775	   Intra random access point (IRAP) picture: A coded picture for which
776	   all VCL NAL units have the same value of nal_unit_type in the range
777	   of IDR_W_RADL to CRA_NUT, inclusive.

779	   Layer: A set of VCL NAL units that all have a particular value of
780	   nuh_layer_id and the associated non-VCL NAL units.

782	   Network abstraction layer (NAL) unit: A syntax structure containing
783	   an indication of the type of data to follow and bytes containing that
784	   data in the form of an RBSP interspersed as necessary with emulation
785	   prevention bytes.

787	   Network abstraction layer (NAL) unit stream: A sequence of NAL units.

789	   Output Layer Set (OLS): A set of layers for which one or more layers
790	   are specified as the output layers.

792	   Operation point (OP): A temporal subset of an OLS, identified by an
793	   OLS index and a highest value of TemporalId.

795	   Picture parameter set (PPS): A syntax structure containing syntax
796	   elements that apply to zero or more entire coded pictures as
797	   determined by a syntax element found in each slice header.

799	   Picture unit (PU): A set of NAL units that are associated with each
800	   other according to a specified classification rule, are consecutive
801	   in decoding order, and contain exactly one coded picture.

803	   Random access: The act of starting the decoding process for a
804	   bitstream at a point other than the beginning of the stream.

806	   Sequence parameter set (SPS): A syntax structure containing syntax
807	   elements that apply to zero or more entire CLVSs as determined by the
808	   content of a syntax element found in the PPS referred to by a syntax
809	   element found in each picture header.

811	   Slice: An integer number of complete tiles or an integer number of
812	   consecutive complete CTU rows within a tile of a picture that are
813	   exclusively contained in a single NAL unit.

815	   Slice header (SH): A part of a coded slice containing the data
816	   elements pertaining to all tiles or CTU rows within a tile
817	   represented in the slice.

819	   Sublayer: A temporal scalable layer of a temporal scalable bitstream
820	   consisting of VCL NAL units with a particular value of the TemporalId
821	   variable, and the associated non-VCL NAL units.

823	   Subpicture: A rectangular region of one or more slices within a
824	   picture.

826	   Sublayer representation: A subset of the bitstream consisting of NAL
827	   units of a particular sublayer and the lower sublayers.

829	   Tile: A rectangular region of CTUs within a particular tile column
830	   and a particular tile row in a picture.

832	   Tile column: A rectangular region of CTUs having a height equal to
833	   the height of the picture and a width specified by syntax elements in
834	   the picture parameter set.

836	   Tile row: A rectangular region of CTUs having a height specified by
837	   syntax elements in the picture parameter set and a width equal to the
838	   width of the picture.

840	   Video coding layer (VCL) NAL unit: A collective term for coded slice
841	   NAL units and the subset of NAL units that have reserved values of
842	   nal_unit_type that are classified as VCL NAL units in this
843	   Specification.

845	3.1.2.  Definitions Specific to This Memo

847	   Media-Aware Network Element (MANE): A network element, such as a
848	   middlebox, selective forwarding unit, or application-layer gateway
849	   that is capable of parsing certain aspects of the RTP payload headers
850	   or the RTP payload and reacting to their contents.

852	      Informative note: The concept of a MANE goes beyond normal routers
853	      or gateways in that a MANE has to be aware of the signalling
854	      (e.g., to learn about the payload type mappings of the media
855	      streams), and in that it has to be trusted when working with
856	      Secure RTP (SRTP).  The advantage of using MANEs is that they
857	      allow packets to be dropped according to the needs of the media
858	      coding.  For example, if a MANE has to drop packets due to
859	      congestion on a certain link, it can identify and remove those
860	      packets whose elimination produces the least adverse effect on the
861	      user experience.  After dropping packets, MANEs must rewrite RTCP
862	      packets to match the changes to the RTP stream, as specified in
863	      Section 7 of [RFC3550].

865	   NAL unit decoding order: A NAL unit order that conforms to the
866	   constraints on NAL unit order given in Section 7.4.2.4 of [VVC] and
867	   follows the order of NAL units in the bitstream.

869	   RTP stream (See [RFC7656]): Within the scope of this memo, one RTP
870	   stream is utilized to transport a VVC bitstream, which may contain
871	   one or more layers, and each layer may contain one or more temporal
872	   sublayers.

874	   Transmission order: The order of packets in ascending RTP sequence
875	   number order (in modulo arithmetic).  Within an aggregation packet,
876	   the NAL unit transmission order is the same as the order of
877	   appearance of NAL units in the packet.
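
   As a non-normative illustration of ordering "in modulo arithmetic",
   the following Python sketch compares two 16-bit RTP sequence
   numbers.  The half-range rule used here (a difference of less than
   32768 counts as "later") is a common convention and an assumption of
   this sketch; the function names are hypothetical.

```python
def seq_delta(a: int, b: int) -> int:
    """Signed distance from sequence number a to b, modulo 65536."""
    d = (b - a) & 0xFFFF
    # Distances of 0x8000 or more are read as b lying behind a.
    return d - 0x10000 if d >= 0x8000 else d


def precedes(a: int, b: int) -> bool:
    """True if a comes before b in transmission order (assumed rule)."""
    return seq_delta(a, b) > 0
```

   With this rule, sequence number 65535 precedes 0, so a wrap-around
   does not disturb the transmission order.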

879	3.2.  Abbreviations

881	   AU         Access Unit

883	   AP         Aggregation Packet

885	   APS        Adaptation Parameter Set

887	   CTU        Coding Tree Unit

889	   CVS        Coded Video Sequence

891	   DPB        Decoded Picture Buffer

893	   DCI        Decoding Capability Information

895	   DON        Decoding Order Number

897	   FIR        Full Intra Request

899	   FU         Fragmentation Unit

900	   GDR        Gradual Decoding Refresh

902	   HRD        Hypothetical Reference Decoder

904	   IDR        Instantaneous Decoding Refresh

906	   IRAP       Intra Random Access Point

908	   MANE       Media-Aware Network Element

910	   MTU        Maximum Transmission Unit

912	   NAL        Network Abstraction Layer

914	   NALU       Network Abstraction Layer Unit

916	   OLS        Output Layer Set

918	   PLI        Picture Loss Indication

920	   PPS        Picture Parameter Set

922	   RPS        Reference Picture Set

924	   RPSI       Reference Picture Selection Indication

926	   SEI        Supplemental Enhancement Information

928	   SLI        Slice Loss Indication

930	   SPS        Sequence Parameter Set

932	   VCL        Video Coding Layer

934	   VPS        Video Parameter Set

936	4.  RTP Payload Format

938	4.1.  RTP Header Usage

940	   The format of the RTP header is specified in [RFC3550] (reprinted as
941	   Figure 2 for convenience).  This payload format uses the fields of
942	   the header in a manner consistent with that specification.

944	   The RTP payload (and the settings for some RTP header bits) for
945	   aggregation packets and fragmentation units are specified in
946	   Section 4.3.2 and Section 4.3.3, respectively.

948	       0                   1                   2                   3
949	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
950	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
951	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
952	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
953	      |                           timestamp                           |
954	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
955	      |           synchronization source (SSRC) identifier            |
956	      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
957	      |            contributing source (CSRC) identifiers             |
958	      |                             ....                              |
959	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

961	                        RTP Header According to [RFC3550]

963	                                  Figure 2

965	   The RTP header information to be set according to this RTP payload
966	   format is set as follows:

968	   Marker bit (M): 1 bit

970	      Set for the last packet, in transmission order, among each set of
971	      packets that contain NAL units of one access unit.  This is in
972	      line with the normal use of the M bit in video formats to allow an
973	      efficient playout buffer handling.

975	   Payload Type (PT): 7 bits

977	      The assignment of an RTP payload type for this new packet format
978	      is outside the scope of this document and will not be specified
979	      here.  The assignment of a payload type has to be performed either
980	      through the profile used or in a dynamic way.

982	   Sequence Number (SN): 16 bits

984	      Set and used in accordance with [RFC3550].

986	   Timestamp: 32 bits

987	      The RTP timestamp is set to the sampling timestamp of the content.
988	      A 90 kHz clock rate MUST be used.  If the NAL unit has no timing
989	      properties of its own (e.g., parameter set and SEI NAL units), the
990	      RTP timestamp MUST be set to the RTP timestamp of the coded
991	      pictures of the access unit in which the NAL unit (according to
992	      Section 7.4.2.4 of [VVC]) is included.  Receivers MUST use the RTP
993	      timestamp for the display process, even when the bitstream
994	      contains picture timing SEI messages or decoding unit information
995	      SEI messages as specified in [VVC].

997	         Informative note: When picture timing SEI messages are present,
998	         the RTP sender is responsible for ensuring that the RTP
999	         timestamps are consistent with the timing information carried
1000	         in the picture timing SEI messages.

1002	   Synchronization source (SSRC): 32 bits

1004	      Used to identify the source of the RTP packets.  A single SSRC is
1005	      used for all parts of a single bitstream.
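
   As a non-normative illustration of the header fields above, the
   following Python sketch decodes the fixed part of the RTP header of
   Figure 2 and maps a sampling instant to a 90 kHz RTP timestamp.  The
   function names, the dictionary keys, and the initial_offset
   parameter (standing in for the random initial timestamp value that
   [RFC3550] recommends) are assumptions of this sketch.

```python
import struct
from fractions import Fraction

CLOCK_RATE_HZ = 90_000  # clock rate required by this payload format


def rtp_timestamp(sampling_time_s: Fraction, initial_offset: int = 0) -> int:
    """Map a sampling instant in seconds to a 32-bit RTP timestamp."""
    ticks = round(sampling_time_s * CLOCK_RATE_HZ)
    return (initial_offset + ticks) & 0xFFFFFFFF


def parse_rtp_fixed_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header shown in Figure 2."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": csrc_count,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        # Offset just past the CSRC list; a header extension (X=1)
        # is not skipped by this sketch.
        "csrc_end": 12 + 4 * csrc_count,
    }
```

   At 30 pictures per second, for instance, the timestamps of
   consecutive pictures are 3000 ticks apart.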

1007	4.2.  Payload Header Usage

1009	   The first two bytes of the payload of an RTP packet are referred to
1010	   as the payload header.  The payload header consists of the same
1011	   fields (F, Z, LayerId, Type, and TID) as the NAL unit header as shown
1012	   in Section 1.1.4, irrespective of the type of the payload structure.

1014	   The TID value indicates (among other things) the relative importance
1015	   of an RTP packet, for example, because NAL units belonging to higher
1016	   temporal sublayers are not used for the decoding of lower temporal
1017	   sublayers.  A lower value of TID indicates a higher importance.
1018	   More-important NAL units MAY be better protected against transmission
1019	   losses than less-important NAL units.
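
   As a non-normative illustration, a MANE that thins a stream by
   temporal sublayer could inspect only the TID field of the payload
   header, as in the following Python sketch.  The forwarding policy
   and the names temporal_id and should_forward are hypothetical; this
   memo does not mandate any particular dropping rule.

```python
def temporal_id(payload: bytes) -> int:
    """Return TemporalId (TID minus 1) from the two-byte payload header."""
    if len(payload) < 2:
        raise ValueError("payload header needs two bytes")
    tid = payload[1] & 0x07
    if tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return tid - 1


def should_forward(payload: bytes, max_temporal_id: int) -> bool:
    """Keep packets whose TemporalId does not exceed the chosen limit."""
    return temporal_id(payload) <= max_temporal_id
```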

1021	4.3.  Payload Structures

1023	   Three different types of RTP packet payload structures are specified.
1024	   A receiver can identify the type of an RTP packet payload through the
1025	   Type field in the payload header.

1027	   The three different payload structures are as follows:

1029	   *  Single NAL unit packet: Contains a single NAL unit in the payload,
1030	      and the NAL unit header of the NAL unit also serves as the payload
1031	      header.  This payload structure is specified in Section 4.3.1.

1033	   *  Aggregation Packet (AP): Contains more than one NAL unit within
1034	      one access unit.  This payload structure is specified in
1035	      Section 4.3.2.

1037	   *  Fragmentation Unit (FU): Contains a subset of a single NAL unit.
1038	      This payload structure is specified in Section 4.3.3.
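
   As a non-normative illustration, a receiver could classify an RTP
   payload by the Type field as in the following Python sketch.  The
   value 28 for APs is given in Section 4.3.2; the value 29 for FUs is
   taken from Section 4.3.3.  The function name and the returned labels
   are hypothetical.

```python
AP_TYPE = 28  # aggregation packet (Section 4.3.2)
FU_TYPE = 29  # fragmentation unit (value taken from Section 4.3.3)


def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload header needs two bytes")
    nal_type = payload[1] >> 3
    if nal_type == AP_TYPE:
        return "aggregation packet"
    if nal_type == FU_TYPE:
        return "fragmentation unit"
    # Any other Type value is treated here as a single NAL unit packet.
    return "single NAL unit packet"
```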

1040	4.3.1.  Single NAL Unit Packets

1042	   A single NAL unit packet contains exactly one NAL unit, and consists
1043	   of a payload header (denoted as PayloadHdr), a conditional 16-bit
1044	   DONL field (in network byte order), and the NAL unit payload data
1045	   (the NAL unit excluding its NAL unit header) of the contained NAL
1046	   unit, as shown in Figure 3.

1048	      0                   1                   2                   3
1049	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1050	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1051	     |           PayloadHdr          |      DONL (conditional)       |
1052	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1053	     |                                                               |
1054	     |                  NAL unit payload data                        |
1055	     |                                                               |
1056	     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1057	     |                               :...OPTIONAL RTP padding        |
1058	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1060	                  The Structure of a Single NAL Unit Packet

1062	                                  Figure 3

1064	   The DONL field, when present, specifies the value of the 16 least
1065	   significant bits of the decoding order number of the contained NAL
1066	   unit.  If sprop-max-don-diff is greater than 0, the DONL field MUST
1067	   be present, and the variable DON for the contained NAL unit is
1068	   derived as equal to the value of the DONL field.  Otherwise (sprop-
1069	   max-don-diff is equal to 0), the DONL field MUST NOT be present.
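
   As a non-normative illustration, the layout of Figure 3 can be
   parsed as in the following Python sketch.  The function name is
   hypothetical, and the payload is assumed to have had any RTP padding
   removed.  Because the payload header doubles as the NAL unit header,
   the NAL unit is rebuilt by joining the first two bytes with the NAL
   unit payload data.

```python
import struct
from typing import Optional, Tuple


def parse_single_nal_unit_packet(
    payload: bytes, sprop_max_don_diff: int
) -> Tuple[Optional[int], bytes]:
    """Return (DON or None, complete NAL unit) for a single NAL unit packet."""
    if len(payload) < 2:
        raise ValueError("payload header needs two bytes")
    header = payload[:2]
    if sprop_max_don_diff > 0:
        # DONL is present: 16 bits in network byte order after the header.
        if len(payload) < 4:
            raise ValueError("DONL field is missing")
        (don,) = struct.unpack_from("!H", payload, 2)
        body = payload[4:]
    else:
        # DONL is absent when sprop-max-don-diff is equal to 0.
        don = None
        body = payload[2:]
    return don, header + body
```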

1071	4.3.2.  Aggregation Packets (APs)

1073	   Aggregation Packets (APs) can reduce packetization overhead for small
1074	   NAL units, such as most of the non-VCL NAL units, which are often
1075	   only a few octets in size.

1077	   An AP aggregates NAL units of one access unit and it MUST NOT contain
1078	   NAL units from more than one AU.  Each NAL unit to be carried in an
1079	   AP is encapsulated in an aggregation unit.  NAL units aggregated in
1080	   one AP are included in NAL unit decoding order.

1082	   An AP consists of a payload header (denoted as PayloadHdr) followed
1083	   by two or more aggregation units, as shown in Figure 4.

1085	     0                   1                   2                   3
1086	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1087	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1088	    |    PayloadHdr (Type=28)       |                               |
1089	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1090	    |                                                               |
1091	    |             two or more aggregation units                     |
1092	    |                                                               |
1093	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1094	    |                               :...OPTIONAL RTP padding        |
1095	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1097	                   The Structure of an Aggregation Packet

1099	                                  Figure 4

1101	   The fields in the payload header of an AP are set as follows.  The F
1102	   bit MUST be equal to 0 if the F bit of each aggregated NAL unit is
1103	   equal to zero; otherwise, it MUST be equal to 1.  The Type field MUST
1104	   be equal to 28.

1106	   The value of LayerId MUST be equal to the lowest value of LayerId of
1107	   all the aggregated NAL units.  The value of TID MUST be the lowest
1108	   value of TID of all the aggregated NAL units.

1110	      Informative note: All VCL NAL units in an AP have the same TID
1111	      value since they belong to the same access unit.  However, an AP
1112	      may contain non-VCL NAL units for which the TID value in the NAL
1113	      unit header may be different than the TID value of the VCL NAL
1114	      units in the same AP.

1116	      Informative note: If a system envisions subpicture-level or
1117	      picture-level modifications, for example by removing subpictures
1118	      or pictures of a particular layer, a good design choice on the
1119	      sender's side would be to aggregate NAL units belonging to only
1120	      the same subpicture or picture of a particular layer.
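
   As a non-normative illustration, the rules above for the F, LayerId,
   Type, and TID fields of an AP payload header can be sketched in
   Python as follows.  The function name is hypothetical, and the Z bit
   is simply written as zero.

```python
from typing import Sequence

AP_TYPE = 28


def ap_payload_header(nal_units: Sequence[bytes]) -> bytes:
    """Build the two-byte payload header for an AP from its NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f, layer_id, tid = 0, 0x3F, 0x07
    for nalu in nal_units:
        if len(nalu) < 2:
            raise ValueError("NAL unit header needs two bytes")
        f |= nalu[0] >> 7                        # 1 if any aggregated F bit is 1
        layer_id = min(layer_id, nalu[0] & 0x3F)  # lowest LayerId
        tid = min(tid, nalu[1] & 0x07)            # lowest TID
    return bytes([(f << 7) | layer_id, (AP_TYPE << 3) | tid])
```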

1122	   An AP MUST carry at least two aggregation units and can carry as many
1123	   aggregation units as necessary; however, the total amount of data in
1124	   an AP MUST fit into an IP packet, and the size SHOULD be
1125	   chosen so that the resulting IP packet is smaller than the MTU size
1126	   so as to avoid IP-layer fragmentation.  An AP MUST NOT contain FUs
1127	   specified in Section 4.3.3.  APs MUST NOT be nested; i.e., an AP
1128	   cannot contain another AP.

1130	   The first aggregation unit in an AP consists of a conditional 16-bit
1131	   DONL field (in network byte order) followed by a 16-bit unsigned size
1132	   information (in network byte order) that indicates the size of the
1133	   NAL unit in bytes (excluding these two octets, but including the NAL
1134	   unit header), followed by the NAL unit itself, including its NAL unit
1135	   header, as shown in Figure 5.

1137	     0                   1                   2                   3
1138	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1139	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1140	    |               :       DONL (conditional)      |   NALU size   |
1141	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1142	    |   NALU size   |                                               |
1143	    +-+-+-+-+-+-+-+-+         NAL unit                              |
1144	    |                                                               |
1145	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1146	    |                               :
1147	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1149	           The Structure of the First Aggregation Unit in an AP

1151	                                  Figure 5

1153	   The DONL field, when present, specifies the value of the 16 least
1154	   significant bits of the decoding order number of the aggregated NAL
1155	   unit.

1157	   If sprop-max-don-diff is greater than 0, the DONL field MUST be
1158	   present in an aggregation unit that is the first aggregation unit in
1159	   an AP, and the variable DON for the aggregated NAL unit is derived as
1160	   equal to the value of the DONL field.  The variable DON for the NAL
1161	   unit in an aggregation unit that is not the first aggregation unit in
1162	   the AP is derived as equal to the DON of the preceding
1163	   aggregated NAL unit in the same AP plus 1 modulo 65536.  Otherwise
1164	   (sprop-max-don-diff is equal to 0), the DONL field MUST NOT be
1165	   present in an aggregation unit that is the first aggregation unit in
1166	   an AP.

1168	   An aggregation unit that is not the first aggregation unit in an AP
1169	   consists of a 16-bit unsigned size information
1170	   (in network byte order) that indicates the size of the NAL unit in
1171	   bytes (excluding these two octets, but including the NAL unit
1172	   header), followed by the NAL unit itself, including its NAL unit
1173	   header, as shown in Figure 6.

1175	     0                   1                   2                   3
1176	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1177	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1178	    |               :       NALU size               |   NAL unit    |
1179	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1180	    |                                                               |
1181	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1182	    |                               :
1183	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1185	         The Structure of an Aggregation Unit That Is Not the First
1186	                          Aggregation Unit in an AP

1188	                                  Figure 6

1190	   Figure 7 presents an example of an AP that contains two aggregation
1191	   units, labeled as 1 and 2 in the figure, without the DONL field being
1192	   present.

1194	     0                   1                   2                   3
1195	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1196	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1197	    |                          RTP Header                           |
1198	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1199	    |   PayloadHdr (Type=28)        |         NALU 1 Size           |
1200	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1201	    |          NALU 1 HDR           |                               |
1202	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1203	    |                   . . .                                       |
1204	    |                                                               |
1205	    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1206	    |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1207	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1208	    | NALU 2 HDR    |                                               |
1209	    +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1210	    |                   . . .                                       |
1211	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1212	    |                               :...OPTIONAL RTP padding        |
1213	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1215	               An Example of an AP Packet Containing
1216	             Two Aggregation Units without the DONL Field

1218	                                  Figure 7

1220	   Figure 8 presents an example of an AP that contains two aggregation
1221	   units, labeled as 1 and 2 in the figure, with the DONL field being
1222	   present.

1224	     0                   1                   2                   3
1225	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1226	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1227	    |                          RTP Header                           |
1228	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1229	    |   PayloadHdr (Type=28)        |        NALU 1 DONL            |
1230	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1231	    |          NALU 1 Size          |            NALU 1 HDR         |
1232	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1233	    |                                                               |
1234	    |                 NALU 1 Data   . . .                           |
1235	    |                                                               |
1236	    +        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1237	    |                               :          NALU 2 Size          |
1238	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1239	    |          NALU 2 HDR           |                               |
1240	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1241	    |                                                               |
1242	    |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1243	    |                               :...OPTIONAL RTP padding        |
1244	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1246	                   An Example of an AP Containing
1247	                 Two Aggregation Units with the DONL Field

1249	                                  Figure 8
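
	   Informative note: The following Python fragment is a non-normative
	   sketch of how a receiver might walk the aggregation units of an AP
	   as laid out in Figures 5 to 8.  The function name
	   parse_ap_payload and its arguments are hypothetical and not part
	   of this memo.  The sketch assumes that the 2-byte payload header
	   and any RTP padding have already been removed by the caller.

```python
import struct


def parse_ap_payload(payload: bytes, donl_present: bool):
    """Split the bytes that follow the AP PayloadHdr into (DON, NAL unit) pairs.

    payload      -- AP payload after the 2-byte payload header, with any
                    RTP padding already removed (assumption of this sketch)
    donl_present -- True when sprop-max-don-diff is greater than 0
    DON is None for every NAL unit when the DONL field is absent.
    """
    units = []
    pos = 0
    don = None
    while pos < len(payload):
        if donl_present:
            if not units:
                # First aggregation unit: DON comes from the 16-bit DONL field.
                if pos + 2 > len(payload):
                    raise ValueError("truncated DONL field")
                (don,) = struct.unpack_from("!H", payload, pos)
                pos += 2
            else:
                # Later aggregation units: previous DON plus 1 modulo 65536.
                don = (don + 1) % 65536
        if pos + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        # The size counts the 2-byte NAL unit header but not the size field.
        if size < 2 or pos + size > len(payload):
            raise ValueError("NALU size inconsistent with payload length")
        units.append((don, payload[pos:pos + size]))
        pos += size
    return units
```

	   The sketch rejects a NALU size that runs past the end of the
	   payload instead of reading beyond it; how a real receiver reacts
	   to such a packet is an implementation choice.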

1251	4.3.3.  Fragmentation Units

1253	   Fragmentation Units (FUs) are introduced to enable fragmenting a
1254	   single NAL unit into multiple RTP packets, possibly without
1255	   cooperation or knowledge of the [VVC] encoder.  A fragment of a NAL
1256	   unit consists of an integer number of consecutive octets of that NAL
1257	   unit.  Fragments of the same NAL unit MUST be sent in consecutive
1258	   order with ascending RTP sequence numbers (with no other RTP packets
1259	   within the same RTP stream being sent between the first and last
1260	   fragment).

1262	   When a NAL unit is fragmented and conveyed within FUs, it is referred
1263	   to as a fragmented NAL unit.  APs MUST NOT be fragmented.  FUs MUST
1264	   NOT be nested; i.e., an FU cannot contain a subset of another FU.

1266	   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
1267	   time of the fragmented NAL unit.

1269	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1270	   header of one octet, a conditional 16-bit DONL field (in network byte
1271	   order), and an FU payload, as shown in Figure 9.

1273	     0                   1                   2                   3
1274	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1275	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1276	    |   PayloadHdr (Type=29)        |   FU header   | DONL (cond)   |
1277	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1278	    |   DONL (cond) |                                               |
1279	    +-+-+-+-+-+-+-+-+                                               |
1280	    |                         FU payload                            |
1281	    |                                                               |
1282	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1283	    |                               :...OPTIONAL RTP padding        |
1284	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1286	                          The Structure of an FU

1288	                                  Figure 9

1290	   The fields in the payload header are set as follows.  The Type field
1291	   MUST be equal to 29.  The fields F, LayerId, and TID MUST be equal to
1292	   the fields F, LayerId, and TID, respectively, of the fragmented NAL
1293	   unit.

1295	   The FU header consists of an S bit, an E bit, a P bit, and a 5-bit
1296	   FuType field, as shown in Figure 10.

1298	                           +---------------+
1299	                           |0|1|2|3|4|5|6|7|
1300	                           +-+-+-+-+-+-+-+-+
1301	                           |S|E|P|  FuType |
1302	                           +---------------+

1304	                       The Structure of FU Header

1306	                                 Figure 10

1308	   The semantics of the FU header fields are as follows:

1310	   S: 1 bit

1312	      When set to 1, the S bit indicates the start of a fragmented NAL
1313	      unit, i.e., the first byte of the FU payload is also the first
1314	      byte of the payload of the fragmented NAL unit.  When the FU
1315	      payload is not the start of the fragmented NAL unit payload, the S
1316	      bit MUST be set to 0.

1318	   E: 1 bit

1319	      When set to 1, the E bit indicates the end of a fragmented NAL
1320	      unit, i.e., the last byte of the FU payload is also the last byte
1321	      of the fragmented NAL unit.  When the FU payload is not the last
1322	      fragment of a fragmented NAL unit, the E bit MUST be set to 0.

1324	   P: 1 bit

1326	      When set to 1, the P bit indicates the last FU of the last VCL NAL
1327	      unit of a coded picture, i.e., the last byte of the FU payload is
1328	      also the last byte of the last VCL NAL unit of the coded picture.
1329	      When the FU payload is not the last fragment of the last VCL NAL
1330	      unit of a coded picture, the P bit MUST be set to 0.

1332	   FuType: 5 bits

1334	      The field FuType MUST be equal to the field Type of the fragmented
1335	      NAL unit.

1337	   The DONL field, when present, specifies the value of the 16 least
1338	   significant bits of the decoding order number of the fragmented NAL
1339	   unit.

1341	   If sprop-max-don-diff is greater than 0, and the S bit is equal to 1,
1342	   the DONL field MUST be present in the FU, and the variable DON for
1343	   the fragmented NAL unit is derived as equal to the value of the DONL
1344	   field.  Otherwise (sprop-max-don-diff is equal to 0, or the S bit is
1345	   equal to 0), the DONL field MUST NOT be present in the FU.
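
	   Informative note: As a non-normative illustration, the FU header
	   of Figure 10 and the DONL presence rule above could be decoded as
	   sketched below in Python.  The function names parse_fu_header and
	   fu_has_donl are hypothetical.  Bit 0 in the figure is taken to be
	   the most significant bit of the octet.

```python
def parse_fu_header(octet: int):
    """Return (S, E, P, FuType) from the one-octet FU header."""
    s_bit = (octet >> 7) & 1      # start of the fragmented NAL unit
    e_bit = (octet >> 6) & 1      # end of the fragmented NAL unit
    p_bit = (octet >> 5) & 1      # last FU of the last VCL NAL unit of a picture
    fu_type = octet & 0x1F        # NAL unit type of the fragmented NAL unit
    if s_bit and e_bit:
        # A non-fragmented NAL unit must not be sent in a single FU.
        raise ValueError("S and E bits are both set")
    return s_bit, e_bit, p_bit, fu_type


def fu_has_donl(s_bit: int, sprop_max_don_diff: int) -> bool:
    """DONL is present only in the first FU and only when reordering is allowed."""
    return sprop_max_don_diff > 0 and s_bit == 1
```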

1347	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
1348	   the S bit and the E bit must not both be set to 1 in the same FU
1349	   header.

1351	   The FU payload consists of fragments of the payload of the fragmented
1352	   NAL unit so that if the FU payloads of consecutive FUs, starting with
1353	   an FU with the S bit equal to 1 and ending with an FU with the E bit
1354	   equal to 1, are sequentially concatenated, the payload of the
1355	   fragmented NAL unit can be reconstructed.  The NAL unit header of the
1356	   fragmented NAL unit is not included as such in the FU payload, but
1357	   rather the information of the NAL unit header of the fragmented NAL
1358	   unit is conveyed in F, LayerId, and TID fields of the FU payload
1359	   headers of the FUs and the FuType field of the FU header of the FUs.
1360	   An FU payload MUST NOT be empty.
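
	   Informative note: The sender side of the rules above can be
	   sketched in Python as follows.  The sketch is non-normative.  The
	   function name fragment_nal_unit and its arguments are
	   hypothetical.  It assumes the 2-byte layout of the NAL unit
	   header and payload header defined earlier in this memo (F, Z, and
	   LayerId in the first octet; Type and TID in the second octet).
	   For simplicity, max_fragment limits only the FU payload and does
	   not account for the header octets or the DONL field.

```python
import struct


def fragment_nal_unit(nal: bytes, max_fragment: int, don=None,
                      last_vcl_of_picture=False):
    """Split one NAL unit into a list of FU structures (PayloadHdr onwards).

    don -- decoding order number to signal in DONL, or None when
           sprop-max-don-diff is equal to 0
    """
    payload = nal[2:]                      # the NAL unit header is not copied
    if max_fragment < 1 or len(payload) < 2:
        raise ValueError("cannot produce at least two non-empty fragments")
    # Never place the whole payload in one FU: S and E must not both be 1.
    step = min(max_fragment, len(payload) - 1)
    chunks = [payload[i:i + step] for i in range(0, len(payload), step)]
    # PayloadHdr: F, Z, LayerId copied; Type = 29; TID copied.
    payload_hdr = bytes([nal[0], (29 << 3) | (nal[1] & 0x07)])
    fu_type = nal[1] >> 3                  # Type of the fragmented NAL unit
    fus = []
    for i, chunk in enumerate(chunks):
        s_bit = 1 if i == 0 else 0
        e_bit = 1 if i == len(chunks) - 1 else 0
        p_bit = 1 if (e_bit and last_vcl_of_picture) else 0
        fu = payload_hdr + bytes([(s_bit << 7) | (e_bit << 6) | (p_bit << 5) | fu_type])
        if s_bit and don is not None:
            fu += struct.pack("!H", don & 0xFFFF)   # DONL only in the first FU
        fus.append(fu + chunk)
    return fus
```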

1362	   If an FU is lost, the receiver SHOULD discard all following
1363	   fragmentation units in transmission order corresponding to the same
1364	   fragmented NAL unit, unless the decoder in the receiver is known to
1365	   be prepared to gracefully handle incomplete NAL units.

1367	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1368	   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
1369	   n of that NAL unit is not received.  In this case, the
1370	   forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
1371	   syntax violation.

1373	4.4.  Decoding Order Number

1375	   For each NAL unit, the variable AbsDon is derived, representing the
1376	   decoding order number that is indicative of the NAL unit decoding
1377	   order.

1379	   Let NAL unit n be the n-th NAL unit in transmission order within an
1380	   RTP stream.

1382	   If sprop-max-don-diff is equal to 0, AbsDon[n], the value of AbsDon
1383	   for NAL unit n, is derived as equal to n.

1385	   Otherwise (sprop-max-don-diff is greater than 0), AbsDon[n] is
1386	   derived as follows, where DON[n] is the value of the variable DON for
1387	   NAL unit n:

1389	   *  If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
1390	      transmission order), AbsDon[0] is set equal to DON[0].

1392	   *  Otherwise (n is greater than 0), the following applies for
1393	      derivation of AbsDon[n]:

1395	         If DON[n] == DON[n-1],
1396	            AbsDon[n] = AbsDon[n-1]

1398	         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1399	            AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1401	         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1402	            AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1404	         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1405	            AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

1407	         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1408	            AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
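
	   Informative note: The derivation above can be written compactly
	   in Python, as in the following non-normative sketch.  The
	   function name derive_abs_don is hypothetical.  The input is the
	   list of DON values in transmission order for the case where
	   sprop-max-don-diff is greater than 0.

```python
def derive_abs_don(dons):
    """Map 16-bit DON values (transmission order) to unwrapped AbsDon values."""
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)           # AbsDon[0] = DON[0]
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[n - 1]
        if don == prev_don:
            value = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            value = prev_abs + don - prev_don               # forward step
        elif don < prev_don and prev_don - don >= 32768:
            value = prev_abs + 65536 - prev_don + don       # forward, wrapped
        elif don > prev_don and don - prev_don >= 32768:
            value = prev_abs - (prev_don + 65536 - don)     # backward, wrapped
        else:
            value = prev_abs - (prev_don - don)             # backward step
        abs_dons.append(value)
    return abs_dons
```

	   For example, the DON sequence 65534, 65535, 0, 2, 1 yields the
	   AbsDon sequence 65534, 65535, 65536, 65538, 65537.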

1410	   For any two NAL units m and n, the following applies:

1412	   *  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows
1413	      NAL unit m in NAL unit decoding order.

1415	   *  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
1416	      of the two NAL units can be in either order.

1418	   *  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
1419	      NAL unit m in decoding order.

1421	         Informative note: When two consecutive NAL units in the NAL
1422	         unit decoding order have different values of AbsDon, the
1423	         absolute difference between the two AbsDon values may be
1424	         greater than or equal to 1.

1426	         Informative note: There are multiple reasons to allow for the
1427	         absolute difference of the values of AbsDon for two consecutive
1428	         NAL units in the NAL unit decoding order to be greater than
1429	         one.  An increment by one is not required, as at the time of
1430	         associating values of AbsDon to NAL units, it may not be known
1431	         whether all NAL units are to be delivered to the receiver.  For
1432	         example, a gateway might not forward VCL NAL units of higher
1433	         sublayers or some SEI NAL units when there is congestion in the
1434	         network.  In another example, the first intra-coded picture of
1435	         a pre-encoded clip is transmitted in advance to ensure that it
1436	         is readily available in the receiver, and when transmitting the
1437	         first intra-coded picture, the originator does not exactly know
1438	         how many NAL units will be encoded before the first intra-coded
1439	         picture of the pre-encoded clip follows in decoding order.
1440	         Thus, the values of AbsDon for the NAL units of the first
1441	         intra-coded picture of the pre-encoded clip have to be
1442	         estimated when they are transmitted, and gaps in values of
1443	         AbsDon may occur.

1445	5.  Packetization Rules

1447	   The following packetization rules apply:

1449	   *  If sprop-max-don-diff is greater than 0, the transmission order of
1450	      NAL units carried in the RTP stream MAY differ from the NAL
1451	      unit decoding order.  Otherwise (sprop-max-don-diff is equal to
1452	      0), the transmission order of NAL units carried in the RTP stream
1453	      MUST be the same as the NAL unit decoding order.

1455	   *  A NAL unit of a small size SHOULD be encapsulated in an
1456	      aggregation packet together with one or more other NAL units in
1457	      order to avoid the unnecessary packetization overhead for small
1458	      NAL units.  For example, non-VCL NAL units such as access unit
1459	      delimiters, parameter sets, or SEI NAL units are typically small
1460	      and can often be aggregated with VCL NAL units without violating
1461	      MTU size constraints.

1463	   *  Each non-VCL NAL unit SHOULD, when possible from an MTU size match
1464	      viewpoint, be encapsulated in an aggregation packet together with
1465	      its associated VCL NAL unit, as typically a non-VCL NAL unit would
1466	      be meaningless without the associated VCL NAL unit being
1467	      available.

1469	   *  For carrying exactly one NAL unit in an RTP packet, a single NAL
1470	      unit packet MUST be used.

1472	6.  De-packetization Process

1474	   The general concept behind de-packetization is to get the NAL units
1475	   out of the RTP packets in an RTP stream and pass them to the decoder
1476	   in the NAL unit decoding order.

1478	   The de-packetization process is implementation dependent.  Therefore,
1479	   the following description should be seen as an example of a suitable
1480	   implementation.  Other schemes may be used as well, as long as the
1481	   output for the same input is the same as the process described below.
1482	   The output is the same when the set of output NAL units and their
1483	   order are both identical.  Optimizations relative to the described
1484	   algorithms are possible.

1486	   All normal RTP mechanisms related to buffer management apply.  In
1487	   particular, duplicated or outdated RTP packets (as indicated by the
1488	   RTP sequence number and the RTP timestamp) are removed.  To
1489	   determine the exact time for decoding, factors such as a possible
1490	   intentional delay to allow for proper inter-stream synchronization
1491	   MUST be factored in.

1493	   NAL units with NAL unit type values in the range of 0 to 27,
1494	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1495	   with NAL unit type values in the range of 28 to 31, inclusive, MUST
1496	   NOT be passed to the decoder.

1498	   The receiver includes a receiver buffer, which is used to compensate
1499	   for transmission delay jitter within individual RTP streams, and to
1500	   reorder NAL units from transmission order to the NAL unit decoding
1501	   order.  In this section, the receiver operation is described under
1502	   the assumption that there is no transmission delay jitter within an
1503	   RTP stream.  To distinguish it from a practical receiver buffer
1504	   that is also used for compensation of transmission delay jitter, the
1505	   receiver buffer is hereafter called the de-packetization buffer in
1506	   this section.  Receivers should also prepare for transmission delay
1507	   jitter; that is, either reserve separate buffers for transmission
1508	   delay jitter buffering and de-packetization buffering or use a
1509	   receiver buffer for both transmission delay jitter and de-
1510	   packetization.  Moreover, receivers should take transmission delay
1511	   jitter into account in the buffering operation, e.g., by additional
1512	   initial buffering before starting of decoding and playback.

1514	   The de-packetization process extracts the NAL units from the RTP
1515	   packets in an RTP stream as follows.  When an RTP packet carries a
1516	   single NAL unit packet, the payload of the RTP packet is extracted as
1517	   a single NAL unit, excluding the DONL field, i.e., third and fourth
1518	   bytes, when sprop-max-don-diff is greater than 0.  When an RTP packet
1519	   carries an Aggregation Packet, several NAL units are extracted from
1520	   the payload of the RTP packet.  In this case, each NAL unit
1521	   corresponds to the part of the payload of each aggregation unit that
1522	   follows the NALU size field as described in Section 4.3.2.  When an
1523	   RTP packet carries a Fragmentation Unit (FU), all RTP packets from
1524	   the first FU (with the S field equal to 1) of the fragmented NAL unit
1525	   up to the last FU (with the E field equal to 1) of the fragmented NAL
1526	   unit are collected.  The NAL unit is extracted from these RTP packets
1527	   by concatenating all FU payloads in the same order as the
1528	   corresponding RTP packets and prepending the NAL unit header with the
1529	   fields F, LayerId, and TID set equal to the values of the fields
1530	   F, LayerId, and TID in the payload header of the FUs, respectively,
1531	   and with the NAL unit type set equal to the value of the field FuType
1532	   in the FU header of the FUs, as described in Section 4.3.3.
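
	   Informative note: The FU case of the extraction described above
	   is sketched below in Python.  The sketch is non-normative.  The
	   function name reassemble_fu is hypothetical.  The input is the
	   list of FU structures (starting at the payload header, RTP
	   padding removed) of one fragmented NAL unit in RTP sequence
	   number order, with no FU missing.  The rebuilt 2-byte NAL unit
	   header is placed in front of the concatenated FU payloads,
	   assuming the header layout defined earlier in this memo.

```python
def reassemble_fu(fus, donl_present: bool) -> bytes:
    """Rebuild one NAL unit from the complete, ordered list of its FUs."""
    if len(fus) < 2:
        raise ValueError("a fragmented NAL unit spans at least two FUs")
    first = fus[0]
    fu_type = first[2] & 0x1F
    # NAL unit header: F, Z, LayerId from PayloadHdr; Type from FuType; TID copied.
    nal_header = bytes([first[0], (fu_type << 3) | (first[1] & 0x07)])
    parts = []
    for i, fu in enumerate(fus):
        s_bit = (fu[2] >> 7) & 1
        e_bit = (fu[2] >> 6) & 1
        if s_bit != (1 if i == 0 else 0) or e_bit != (1 if i == len(fus) - 1 else 0):
            raise ValueError("unexpected S or E bit in FU sequence")
        # Skip PayloadHdr (2 octets), FU header (1 octet), and DONL if present.
        offset = 3 + (2 if (s_bit and donl_present) else 0)
        if len(fu) <= offset:
            raise ValueError("empty FU payload")
        parts.append(fu[offset:])
    return nal_header + b"".join(parts)
```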

1534	   When sprop-max-don-diff is equal to 0, the de-packetization buffer
1535	   size is zero bytes, and the NAL units carried in the single RTP
1536	   stream are directly passed to the decoder in their transmission
1537	   order, which is identical to their decoding order.

1539	   When sprop-max-don-diff is greater than 0, the process described in
1540	   the remainder of this section applies.

1542	   There are two buffering states in the receiver: initial buffering and
1543	   buffering while playing.  Initial buffering starts when the reception
1544	   is initialized.  After initial buffering, decoding and playback are
1545	   started, and the buffering-while-playing mode is used.

1547	   Regardless of the buffering state, the receiver stores incoming NAL
1548	   units in reception order into the de-packetization buffer.  NAL units
1549	   carried in RTP packets are stored in the de-packetization buffer
1550	   individually, and the value of AbsDon is calculated and stored for
1551	   each NAL unit.

1553	   Initial buffering lasts until the difference between the greatest and
1554	   smallest AbsDon values of the NAL units in the de-packetization
1555	   buffer is greater than or equal to the value of sprop-max-don-diff.

1557	   After initial buffering, whenever the difference between the greatest
1558	   and smallest AbsDon values of the NAL units in the de-packetization
1559	   buffer is greater than or equal to the value of sprop-max-don-diff,
1560	   the following operation is repeatedly applied until this difference
1561	   is smaller than sprop-max-don-diff:

1563	   *  The NAL unit in the de-packetization buffer with the smallest
1564	      value of AbsDon is removed from the de-packetization buffer and
1565	      passed to the decoder.

1567	   When no more NAL units are flowing into the de-packetization buffer,
1568	   all NAL units remaining in the de-packetization buffer are removed
1569	   from the buffer and passed to the decoder in the order of increasing
1570	   AbsDon values.
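
	   Informative note: The buffering process described above for
	   sprop-max-don-diff greater than 0 can be sketched in Python as
	   follows.  The sketch is non-normative and ignores transmission
	   delay jitter, as does this section.  The class name
	   DepacketizationBuffer and its methods are hypothetical.  A heap
	   keyed on AbsDon is one possible data structure, not a
	   requirement.

```python
import heapq


class DepacketizationBuffer:
    """Releases NAL units in AbsDon order once the AbsDon spread is large enough."""

    def __init__(self, sprop_max_don_diff: int):
        if sprop_max_don_diff <= 0:
            raise ValueError("buffering applies only when sprop-max-don-diff > 0")
        self.max_diff = sprop_max_don_diff
        self._heap = []          # entries are (AbsDon, arrival index, NAL unit)
        self._arrivals = 0
        self._greatest = None    # greatest AbsDon currently in the buffer

    def push(self, abs_don: int, nal: bytes):
        """Store one NAL unit; return the NAL units to pass to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal))
        self._arrivals += 1
        if self._greatest is None or abs_don > self._greatest:
            self._greatest = abs_don
        released = []
        # Remove the smallest AbsDon until the spread drops below the limit.
        # The entry with the greatest AbsDon is never removed here, because
        # a spread of 0 is always below a limit that is greater than 0.
        while self._greatest - self._heap[0][0] >= self.max_diff:
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self):
        """No more input: release everything in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        self._greatest = None
        return released
```

	   In this sketch, the first call to push that returns a non-empty
	   list corresponds to the end of initial buffering.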

1572	7.  Payload Format Parameters

1574	   This section specifies the optional parameters.  A mapping of the
1575	   parameters with Session Description Protocol (SDP) [RFC4566] is also
1576	   provided for applications that use SDP.

1578	7.1.  Media Type Registration

1580	   The receiver MUST ignore any parameter unspecified in this memo.

1582	   Type name:            video

1584	   Subtype name:         H266

1586	   Required parameters:  none

1588	   Optional parameters:

1590	      profile-id, tier-flag, sub-profile-id, interop-constraints, and
1591	      level-id:

1593	         These parameters indicate the profile, tier, default level,
1594	         sub-profile, and some constraints of the bitstream carried by
1595	         the RTP stream, or a specific set of the profile, tier, default
1596	         level, sub-profile and some constraints the receiver supports.

1598	         The subset of coding tools that may have been used to generate
1599	         the bitstream or that the receiver supports, as well as some
1600	         additional constraints are indicated collectively by profile-
1601	         id, sub-profile-id, and interop-constraints.

1603	            Informative note: There are 128 values of profile-id.  The
1604	            subset of coding tools identified by the profile-id can be
1605	            further constrained with up to 255 instances of sub-profile-
1606	            id.  In addition, 68 bits included in interop-constraints,
1607	            which can be extended up to 324 bits, provide means to
1608	            further restrict tools from existing profiles.  To be able
1609	            to support this fine-granular signalling of coding tool
1610	            subsets with profile-id, sub-profile-id and interop-
1611	            constraints, it would be safe to require symmetric use of
1612	            these parameters in SDP offer/answer unless recv-ols-id is
1613	            included in the SDP answer for choosing one of the layers
1614	            offered.

1616	         The tier is indicated by tier-flag.  The default level is
1617	         indicated by level-id.  The tier and the default level specify
1618	         the limits on values of syntax elements or arithmetic
1619	         combinations of values of syntax elements that are followed
1620	         when generating the bitstream or that the receiver supports.

1622	         In SDP offer/answer, when the SDP answer does not include the
1623	         recv-ols-id parameter that is less than the sprop-ols-id
1624	         parameter in the SDP offer, the following applies:

1626	         o  The tier-flag, profile-id, sub-profile-id, and interop-
1627	            constraints parameters MUST be used symmetrically, i.e., the
1628	            value of each of these parameters in the offer MUST be the
1629	            same as that in the answer, either explicitly signalled or
1630	            implicitly inferred.

1632	         o  The level-id parameter is changeable as long as the highest
1633	            level indicated by the answer is either equal to or lower
1634	            than that in the offer.  Note that a highest level higher
1635	            than level-id in the offer for receiving can be included as
1636	            max-recv-level-id.

1638	         In SDP offer/answer, when the SDP answer does include the
1639	         recv-ols-id parameter that is less than the sprop-ols-id
1640	         parameter in the SDP offer, the set of tier-flag, profile-id,
1641	         sub-profile-id, interop-constraints, and level-id parameters
1642	         included in the answer MUST be consistent with that for the
1643	         chosen output layer set as indicated in the SDP offer, with the
1644	         exception that the level-id parameter in the SDP answer is
1645	         changeable as long as the highest level indicated by the answer
1646	         is either lower than or equal to that in the offer.

1648	         More specifications of these parameters, including how they
1649	         relate to syntax elements specified in [VVC] are provided
1650	         below.

1652	      profile-id:

1654	         When profile-id is not present, a value of 1 (i.e., the Main 10
1655	         profile) MUST be inferred.

1657	         When used to indicate properties of a bitstream, profile-id is
1658	         derived from the general_profile_idc syntax element that
1659	         applies to the bitstream in an instance of the
1660	         profile_tier_level( ) syntax structure.

1662	         VVC bitstreams transported over RTP using the technologies of
1663	         this memo SHOULD contain only a single profile_tier_level( )
1664	         structure in the DCI, unless the sender can assure that a
1665	         receiver can correctly decode the VVC bitstream regardless of
1666	         which profile_tier_level( ) structure contained in the DCI was
1667	         used for deriving profile-id and other parameters for the SDP
1668	         O/A exchange.

1670	         As specified in [VVC], a profile_tier_level( ) syntax structure
1671	         may be contained in an SPS NAL unit, and one or more
1672	         profile_tier_level( ) syntax structures may be contained in a
1673	         VPS NAL unit and in a DCI NAL unit.  One of the following three
1674	         cases applies to the container NAL unit of the
1675	         profile_tier_level( ) syntax structure containing syntax
1676	         elements used to derive the values of profile-id, tier-flag,
1677	         level-id, sub-profile-id, or interop-constraints: 1) The
1678	         container NAL unit is an SPS, the bitstream is a single-layer
1679	         bitstream, and the profile_tier_level( ) syntax structures in
1680	         all SPSs referenced by the CVSs in the bitstream have the same
1681	         values respectively for those profile_tier_level( ) syntax
1682	         elements; 2) The container NAL unit is a VPS, the
1683	         profile_tier_level( ) syntax structure is the one in the VPS
1684	         that applies to the OLS corresponding to the bitstream, and the
1685	         profile_tier_level( ) syntax structures applicable to the OLS
1686	         corresponding to the bitstream in all VPSs referenced by the
1687	         CVSs in the bitstream have the same values respectively for
1688	         those profile_tier_level( ) syntax elements; 3) The container
1689	         NAL unit is a DCI NAL unit and the profile_tier_level( ) syntax
1690	         structures in all DCI NAL units in the bitstream have the same
1691	         values respectively for those profile_tier_level( ) syntax
1692	         elements.

1694	         [VVC] allows for multiple profile_tier_level( ) structures in a
1695	         DCI NAL unit, which may contain different values for the syntax
1696	         elements used to derive the values of profile-id, tier-flag,
1697	         level-id, sub-profile-id, or interop-constraints in the
1698	         different entries.  However, herein defined is only a single
1699	         profile-id, tier-flag, level-id, sub-profile-id, or interop-
1700	         constraints.  When signalling these parameters and a DCI NAL
1701	         unit is present with multiple profile_tier_level( ) structures,
1702	         these values SHOULD be the same as those in the first
1703	         profile_tier_level( ) structure in the DCI, unless the sender
1704	         has ensured that the receiver can decode the bitstream when a
1705	         different value is chosen.

1707	      tier-flag, level-id:

1709	         The value of tier-flag MUST be in the range of 0 to 1,
1710	         inclusive.  The value of level-id MUST be in the range of 0 to
1711	         255, inclusive.

1713	         If the tier-flag and level-id parameters are used to indicate
1714	         properties of a bitstream, they indicate the tier and the
1715	         highest level the bitstream complies with.

1717	         If the tier-flag and level-id parameters are used for
1718	         capability exchange, the following applies.  If max-recv-level-
1719	         id is not present, the default level defined by level-id
1720	         indicates the highest level the codec wishes to support.
1721	         Otherwise, max-recv-level-id indicates the highest level the
1722	         codec supports for receiving.  For either receiving or sending,
1723	         all levels that are lower than the highest level supported MUST
1724	         also be supported.

1726	         If no tier-flag is present, a value of 0 MUST be inferred; if
1727	         no level-id is present, a value of 51 (i.e., level 3.1) MUST be
1728	         inferred.
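
	         Informative note: The inferred values and value ranges
	         stated so far for profile-id, tier-flag, and level-id can
	         be illustrated with the following non-normative Python
	         sketch.  The function name parse_h266_fmtp is hypothetical.
	         The sketch assumes a semicolon-separated list of name=value
	         pairs as commonly carried in an SDP a=fmtp line.  It
	         handles only these three parameters and skips all others.

```python
def parse_h266_fmtp(fmtp: str) -> dict:
    """Return profile-id, tier-flag, and level-id with inferred defaults applied."""
    # Values inferred when a parameter is absent.
    params = {"profile-id": 1, "tier-flag": 0, "level-id": 51}
    for item in fmtp.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        name = name.strip()
        if name in params:
            params[name] = int(value)
        # Any other parameter is skipped by this sketch.
    if not 0 <= params["tier-flag"] <= 1:
        raise ValueError("tier-flag out of range")
    if not 0 <= params["level-id"] <= 255:
        raise ValueError("level-id out of range")
    return params
```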

1730	            Informative note: The level values currently defined in the
1731	            VVC specification are in the form of "majorNum.minorNum",
1732	            and the value of the level-id for each of the levels is
1733	            equal to majorNum * 16 + minorNum * 3.  It is expected that
1734	            if any levels are defined in the future, the same convention
1735	            will be used, but this cannot be guaranteed.
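
	         Informative note: The convention in the preceding note can
	         be checked with the following non-normative Python sketch.
	         The function names are hypothetical.  As stated above, the
	         convention is not guaranteed to hold for levels defined in
	         the future.

```python
def level_id_from_level(major: int, minor: int) -> int:
    """level-id for level "major.minor" under the current convention."""
    return major * 16 + minor * 3


def level_from_level_id(level_id: int):
    """Inverse mapping; raises if the value does not follow the convention."""
    major, remainder = divmod(level_id, 16)
    if remainder % 3 != 0:
        raise ValueError("level-id does not follow majorNum*16 + minorNum*3")
    return major, remainder // 3
```

	         For example, level 3.1 maps to level-id 51, which is the
	         value inferred when level-id is absent.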

1737	         When used to indicate properties of a bitstream, the tier-flag
1738	         and level-id parameters are derived respectively from the
1739	         syntax element general_tier_flag, and the syntax element
1740	         general_level_idc or sub_layer_level_idc[j], that apply to the
1741	         bitstream, in an instance of the profile_tier_level( ) syntax
1742	         structure.

1744	         If the tier-flag and level-id are derived from the
1745	         profile_tier_level( ) syntax structure in a DCI NAL unit, the
1746	         following applies:

1748	         o  tier-flag = general_tier_flag

1750	         o  level-id = general_level_idc

1752	         Otherwise, if the tier-flag and level-id are derived from the
1753	         profile_tier_level( ) syntax structure in an SPS or VPS NAL
1754	         unit, and the bitstream contains the highest sublayer
1755	         representation in the OLS corresponding to the bitstream, the
1756	         following applies:

1758	         o  tier-flag = general_tier_flag

1760	         o  level-id = general_level_idc

1762	         Otherwise, if the tier-flag and level-id are derived from the
1763	         profile_tier_level( ) syntax structure in an SPS or VPS NAL
1764	         unit, and the bitstream does not contain the highest sublayer
1765	         representation in the OLS corresponding to the bitstream, the
1766	         following applies, with j being the value of the sprop-
1767	         sublayer-id parameter:

1769	         o  tier-flag = general_tier_flag

1771	         o  level-id = sub_layer_level_idc[j]
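
         The three derivation cases above can be summarized with the
         following Python sketch.  It assumes the caller has already
         parsed the profile_tier_level( ) syntax structure; the function
         name, the argument names, and the string labels for the NAL
         unit type are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def derive_tier_and_level(
    source: str,
    general_tier_flag: int,
    general_level_idc: int,
    sub_layer_level_idc: Optional[Sequence[int]] = None,
    has_highest_sublayer: bool = True,
    sprop_sublayer_id: int = 6,
) -> Tuple[int, int]:
    """Return (tier-flag, level-id) for a bitstream.

    source is "DCI", "SPS", or "VPS", naming the NAL unit that carried
    the profile_tier_level( ) structure the values were read from.
    """
    if source == "DCI":
        return general_tier_flag, general_level_idc
    if source not in ("SPS", "VPS"):
        raise ValueError("unsupported source: " + source)
    if has_highest_sublayer:
        return general_tier_flag, general_level_idc
    if sub_layer_level_idc is None:
        raise ValueError("sub_layer_level_idc is needed in this case")
    # j is the value of the sprop-sublayer-id parameter.
    return general_tier_flag, sub_layer_level_idc[sprop_sublayer_id]
```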

1773	      sub-profile-id:

1775	         The value of the parameter is a comma-separated (',') list of
1776	         data using base64 [RFC4648] representation.

1778	         When used to indicate properties of a bitstream, sub-profile-id
1779	         is derived from each of the ptl_num_sub_profiles
1780	         general_sub_profile_idc[i] syntax elements that apply to the
1781	         bitstream in a profile_tier_level( ) syntax structure.

1783	      interop-constraints:

1785	         A base64 [RFC4648] representation of the data that includes the
1786	         syntax elements ptl_frame_only_constraint_flag and
1787	         ptl_multilayer_enabled_flag and the general_constraints_info( )
1788	         syntax structure that apply to the bitstream in an instance of
1789	         the profile_tier_level( ) syntax structure.

1791	         If the interop-constraints parameter is not present, the
1792	         following MUST be inferred:

1794	         o  ptl_frame_only_constraint_flag = 1

1796	         o  ptl_multilayer_enabled_flag = 0

1798	         o  gci_present_flag in the general_constraints_info( ) syntax
1799	            structure = 0

1801	         Using interop-constraints for capability exchange results in a
1802	         requirement on any bitstream to be compliant with the interop-
1803	         constraints.

1805	      sprop-sublayer-id:

1807	         This parameter MAY be used to indicate the highest allowed
1808	         value of TID in the bitstream.  When not present, the value of
1809	         sprop-sublayer-id is inferred to be equal to 6.

1811	         The value of sprop-sublayer-id MUST be in the range of 0 to 6,
1812	         inclusive.

1814	      sprop-ols-id:

1816	         This parameter MAY be used to indicate the OLS that the
1817	         bitstream applies to.  When not present, the value of sprop-
1818	         ols-id is inferred to be equal to TargetOlsIdx as specified in
1819	         8.1.1 in [VVC].  If this optional parameter is present, sprop-
1820	         vps MUST also be present or its content MUST be known a priori
1821	         at the receiver.

1823	         The value of sprop-ols-id MUST be in the range of 0 to 256,
1824	         inclusive.

1826	            Informative note: VVC allows having up to 257 output layer
1827	            sets indicated in the VPS as the number of output layer sets
1828	            minus 2 is indicated with a field of 8 bits.

1830	      recv-sublayer-id:

1832	         This parameter MAY be used to signal a receiver's choice of the
1833	         offered or declared sublayer representations in the sprop-vps
1834	         and sprop-sps.  The value of recv-sublayer-id indicates the TID
1835	         of the highest sublayer that a receiver supports.  When not
1836	         present, the value of recv-sublayer-id is inferred to be equal
1837	         to the value of the sprop-sublayer-id parameter in the SDP
1838	         offer.

1840	         The value of recv-sublayer-id MUST be in the range of 0 to 6,
1841	         inclusive.
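
         A minimal Python sketch of the inference and range rules for
         recv-sublayer-id follows.  The function name is hypothetical,
         and the sketch checks only the ranges stated above.

```python
from typing import Optional


def effective_recv_sublayer_id(
    offer_sprop_sublayer_id: Optional[int] = None,
    recv_sublayer_id: Optional[int] = None,
) -> int:
    """Resolve recv-sublayer-id, applying the inferred defaults."""
    # sprop-sublayer-id is inferred to be 6 when not present.
    sprop = 6 if offer_sprop_sublayer_id is None else offer_sprop_sublayer_id
    # recv-sublayer-id defaults to sprop-sublayer-id from the offer.
    recv = sprop if recv_sublayer_id is None else recv_sublayer_id
    for name, value in (("sprop-sublayer-id", sprop),
                        ("recv-sublayer-id", recv)):
        if not 0 <= value <= 6:
            raise ValueError(name + " must be in the range 0 to 6")
    return recv
```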

1843	      recv-ols-id:

1845	         This parameter MAY be used to signal a receiver's choice of the
1846	         offered or declared output layer sets in the sprop-vps.  The
1847	         value of recv-ols-id indicates the OLS index of the bitstream
1848	         that a receiver supports.  When not present, the value of recv-
1849	         ols-id is inferred to be equal to the value of the sprop-ols-id
1850	         parameter inferred from or indicated in the SDP offer.  When
1851	         present, the value of recv-ols-id must be included only when
1852	         sprop-ols-id was received and must refer to an output layer set
1853	         in the VPS that includes no layers other than all or a subset
1854	         of the layers of the OLS referred to by sprop-ols-id.  If this
1855	         optional parameter is present, sprop-vps must have been
1856	         received or its content must be known a priori at the receiver.

1858	         The value of recv-ols-id MUST be in the range of 0 to 256,
1859	         inclusive.

1861	      max-recv-level-id:

1863	         This parameter MAY be used to indicate the highest level a
1864	         receiver supports.

1866	         The value of max-recv-level-id MUST be in the range of 0 to
1867	         255, inclusive.

1869	         When max-recv-level-id is not present, the value is inferred to
1870	         be equal to level-id.

1872	         max-recv-level-id MUST NOT be present when the highest level
1873	         the receiver supports is not higher than the default level.

1875	      sprop-dci:

1877	         This parameter MAY be used to convey a decoding capability
1878	         information NAL unit of the bitstream for out-of-band
1879	         transmission.  The parameter MAY also be used for capability
1880	         exchange.  The value of the parameter is a base64 [RFC4648]
1881	         representation of the decoding capability information NAL unit
1882	         as specified in Section 7.3.2.1 of [VVC].

1884	      sprop-vps:

1886	         This parameter MAY be used to convey any video parameter set
1887	         NAL unit of the bitstream for out-of-band transmission of video
1888	         parameter sets.  The parameter MAY also be used for capability
1889	         exchange and to indicate sub-stream characteristics (i.e.,
1890	         properties of output layer sets and sublayer representations as
1891	         defined in [VVC]).  The value of the parameter is a comma-
1892	         separated (',') list of base64 [RFC4648] representations of the
1893	         video parameter set NAL units as specified in Section 7.3.2.3
1894	         of [VVC].

1896	         The sprop-vps parameter MAY contain one or more video
1897	         parameter set NAL units.  However, all other video parameter
1898	         sets contained in the sprop-vps parameter MUST be consistent
1899	         with the first video parameter set in the sprop-vps parameter.
1900	         A video parameter set vpsB is said to be consistent with
1901	         another video parameter set vpsA if the number of OLSs in vpsA
1902	         and vpsB is the same and any decoder that conforms to the
1903	         profile, tier, level, and constraints indicated by the data
1904	         starting from the syntax element general_profile_idc to the
1905	         syntax structure general_constraints_info(), inclusive, in the
1906	         profile_tier_level( ) syntax structure corresponding to any OLS
1907	         with index olsIdx in vpsA can decode any CVS(s) referencing
1908	         vpsB when TargetOlsIdx is equal to olsIdx that conforms to the
1909	         profile, tier, level, and constraints indicated by the data
1910	         starting from the syntax element general_profile_idc to the
1911	         syntax structure general_constraints_info(), inclusive, in the
1912	         profile_tier_level( ) syntax structure corresponding to the OLS
1913	         with index TargetOlsIdx in vpsB.

1915	      sprop-sps:

1917	         This parameter MAY be used to convey sequence parameter set NAL
1918	         units of the bitstream for out-of-band transmission of sequence
1919	         parameter sets.  The value of the parameter is a comma-
1920	         separated (',') list of base64 [RFC4648] representations of the
1921	         sequence parameter set NAL units as specified in
1922	         Section 7.3.2.4 of [VVC].

1924	         A sequence parameter set spsB is said to be consistent with
1925	         another sequence parameter set spsA if any decoder that
1926	         conforms to the profile, tier, level, and constraints indicated
1927	         by the data starting from the syntax element
1928	         general_profile_idc to the syntax structure
1929	         general_constraints_info(), inclusive, in the
1930	         profile_tier_level( ) syntax structure in spsA can decode any
1931	         CLVS(s) referencing spsB that conforms to the profile, tier,
1932	         level, and constraints indicated by the data starting from the
1933	         syntax element general_profile_idc to the syntax structure
1934	         general_constraints_info(), inclusive, in the
1935	         profile_tier_level( ) syntax structure in spsB.

1937	      sprop-pps:

1939	         This parameter MAY be used to convey picture parameter set NAL
1940	         units of the bitstream for out-of-band transmission of picture
1941	         parameter sets.  The value of the parameter is a comma-
1942	         separated (',') list of base64 [RFC4648] representations of the
1943	         picture parameter set NAL units as specified in Section 7.3.2.5
1944	         of [VVC].
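
         The comma-separated base64 lists used by sprop-vps, sprop-sps,
         and sprop-pps can be split and decoded as in the following
         Python sketch.  The function name is an assumption, and the
         sketch returns the raw NAL unit bytes without parsing them.

```python
import base64
from typing import List


def decode_sprop_list(value: str) -> List[bytes]:
    """Decode a comma-separated list of base64 items into byte strings."""
    nal_units = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a stray empty entry
        # validate=True rejects characters outside the base64 alphabet.
        nal_units.append(base64.b64decode(item, validate=True))
    return nal_units
```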

1946	      sprop-sei:

1948	         This parameter MAY be used to convey one or more SEI messages
1949	         that describe bitstream characteristics.  When present, a
1950	         decoder can rely on the bitstream characteristics that are
1951	         described in the SEI messages for the entire duration of the
1952	         session, independently from the persistence scopes of the SEI
1953	         messages as specified in [VSEI].

1955	         The value of the parameter is a comma-separated (',') list of
1956	         base64 [RFC4648] representations of SEI NAL units as specified
1957	         in [VSEI].

1959	            Informative note: Intentionally, no list of applicable or
1960	            inapplicable SEI messages is specified here.  Conveying
1961	            certain SEI messages in sprop-sei may be sensible in some
1962	            application scenarios and meaningless in others.  However, a
1963	            few examples are described below:

1965	            1) In an environment where the bitstream was created from
1966	            film-based source material, and no splicing is going to
1967	            occur during the lifetime of the session, the film grain
1968	            characteristics SEI message is likely meaningful, and
1969	            sending it in sprop-sei rather than in the bitstream at each
1970	            entry point may help with saving bits and allows one to
1971	            configure the renderer only once, avoiding unwanted
1972	            artifacts.

1974	            2) Examples of SEI messages that would be meaningless to
1975	            convey in sprop-sei include the decoded picture hash SEI
1976	            message (it is close to impossible that all decoded pictures
1977	            have the same hash) or the filler payload SEI message (as
1978	            there is no point in just having more bits in SDP).

1980	      max-lsr:

1982	         The max-lsr MAY be used to signal the capabilities of a
1983	         receiver implementation and MUST NOT be used for any other
1984	         purpose.  The value of max-lsr is an integer indicating the
1985	         maximum processing rate in units of luma samples per second.
1986	         The max-lsr parameter signals that the receiver is capable of
1987	         decoding video at a higher rate than is required by the highest
1988	         level.

1990	            Informative note: When the OPTIONAL media type parameters
1991	            are used to signal the properties of a bitstream, and max-
1992	            lsr is not present, the values of tier-flag, profile-id,
1993	            sub-profile-id, interop-constraints, and level-id must
1994	            always be such that the bitstream complies fully with the
1995	            specified profile, tier, and level.

1997	         When max-lsr is signalled, the receiver MUST be able to decode
1998	         bitstreams that conform to the highest level, with the
1999	         exception that the MaxLumaSr value in Table 136 of [VVC] for
2000	         the highest level is replaced with the value of max-lsr.
2001	         Senders MAY use this knowledge to send pictures of a given size
2002	         at a higher picture rate than is indicated in the highest
2003	         level.

2005	         When not present, the value of max-lsr is inferred to be equal
2006	         to the value of MaxLumaSr given in Table 136 of [VVC] for the
2007	         highest level.

2009	         The value of max-lsr MUST be in the range of MaxLumaSr to 16 *
2010	         MaxLumaSr, inclusive, where MaxLumaSr is given in Table 136 of
2011	         [VVC] for the highest level.
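
         The inference and range rules for max-lsr can be sketched in
         Python as follows.  The function name is hypothetical, and the
         caller is assumed to supply the MaxLumaSr value for the highest
         level from Table 136 of [VVC]; no table values are reproduced
         here.

```python
from typing import Optional


def effective_max_lsr(level_max_luma_sr: int,
                      max_lsr: Optional[int] = None) -> int:
    """Return the luma sample rate limit that applies to a receiver."""
    if max_lsr is None:
        # Inferred to be MaxLumaSr of the highest level.
        return level_max_luma_sr
    if not level_max_luma_sr <= max_lsr <= 16 * level_max_luma_sr:
        raise ValueError("max-lsr outside MaxLumaSr..16 * MaxLumaSr")
    return max_lsr
```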

2013	      max-fps:

2015	         The value of max-fps is an integer indicating the maximum
2016	         picture rate in units of pictures per 100 seconds that can be
2017	         effectively processed by the receiver.  The max-fps parameter
2018	         MAY be used to signal that the receiver has a constraint in
2019	         that it is not capable of processing video effectively at the
2020	         full picture rate that is implied by the highest level and,
2021	         when present, max-lsr.

2023	         The value of max-fps is not necessarily the picture rate at
2024	         which the maximum picture size can be sent; it constitutes a
2025	         constraint on maximum picture rate for all resolutions.

2027	            Informative note: The max-fps parameter is semantically
2028	            different from max-lsr in that max-fps is used to signal a
2029	            constraint, lowering the maximum picture rate from what is
2030	            implied by other parameters.

2032	         The encoder MUST use a picture rate equal to or less than this
2033	         value.  In cases where the max-fps parameter is absent, the
2034	         encoder is free to choose any picture rate according to the
2035	         highest level and any signalled optional parameters.

2037	         The value of max-fps MUST be smaller than or equal to the full
2038	         picture rate that is implied by the highest level and, when
2039	         present, max-lsr.
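
         Because max-fps is expressed in pictures per 100 seconds, a
         value of 3000 corresponds to 30 pictures per second.  The
         following Python sketch, with hypothetical function names,
         shows the conversion and the encoder-side check.  It covers
         only the max-fps constraint; limits implied by the level and by
         max-lsr apply separately.

```python
from fractions import Fraction
from typing import Optional


def max_fps_to_pictures_per_second(max_fps: int) -> Fraction:
    """Convert max-fps (pictures per 100 seconds) to pictures/second."""
    return Fraction(max_fps, 100)


def picture_rate_allowed(rate: Fraction, max_fps: Optional[int]) -> bool:
    """Check an encoder picture rate against max-fps, when signalled."""
    if max_fps is None:
        return True  # no max-fps constraint was signalled
    return rate <= max_fps_to_pictures_per_second(max_fps)
```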

2041	      sprop-max-don-diff:

2043	         If there is no NAL unit naluA that is followed in transmission
2044	         order by any NAL unit preceding naluA in decoding order (i.e.,
2045	         the transmission order of the NAL units is the same as the
2046	         decoding order), the value of this parameter MUST be equal to
2047	         0.

2049	         Otherwise, this parameter specifies the maximum absolute
2050	         difference between the decoding order number (i.e., AbsDon)
2051	         values of any two NAL units naluA and naluB, where naluA
2052	         follows naluB in decoding order and precedes naluB in
2053	         transmission order.

2055	         The value of sprop-max-don-diff MUST be an integer in the range
2056	         of 0 to 32767, inclusive.

2058	         When not present, the value of sprop-max-don-diff is inferred
2059	         to be equal to 0.
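
         As a sketch of the definition above, the following Python
         function computes the smallest value of sprop-max-don-diff that
         describes a given stream.  The input is assumed to be the
         AbsDon values of the NAL units listed in transmission order;
         the function name is hypothetical.

```python
from typing import Optional, Sequence


def max_don_diff(abs_don_in_transmission_order: Sequence[int]) -> int:
    """Largest AbsDon(naluA) - AbsDon(naluB) over all pairs in which
    naluA is sent before naluB but follows naluB in decoding order."""
    max_diff = 0
    highest_seen: Optional[int] = None
    for abs_don in abs_don_in_transmission_order:
        # Any earlier NAL unit with a larger AbsDon follows this one in
        # decoding order; the largest such AbsDon gives the widest gap.
        if highest_seen is not None and highest_seen > abs_don:
            max_diff = max(max_diff, highest_seen - abs_don)
        if highest_seen is None or abs_don > highest_seen:
            highest_seen = abs_don
    return max_diff
```

         A result of 0 means the transmission order equals the decoding
         order.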

2061	      sprop-depack-buf-bytes:

2063	         This parameter signals the required size of the de-
2064	         packetization buffer in units of bytes.  The value of the
2065	         parameter MUST be greater than or equal to the maximum buffer
2066	         occupancy (in units of bytes) of the de-packetization buffer as
2067	         specified in Section 6.

2069	         The value of sprop-depack-buf-bytes MUST be an integer in the
2070	         range of 0 to 4294967295, inclusive.

2072	         When sprop-max-don-diff is present and greater than 0, this
2073	         parameter MUST be present and the value MUST be greater than 0.
2074	         When not present, the value of sprop-depack-buf-bytes is
2075	         inferred to be equal to 0.

2077	            Informative note: The value of sprop-depack-buf-bytes
2078	            indicates the required size of the de-packetization buffer
2079	            only.  When network jitter can occur, an appropriately sized
2080	            jitter buffer has to be available as well.

2082	      depack-buf-cap:

2084	         This parameter signals the capabilities of a receiver
2085	         implementation and indicates the amount of de-packetization
2086	         buffer space in units of bytes that the receiver has available
2087	         for reconstructing the NAL unit decoding order from NAL units
2088	         carried in the RTP stream.  A receiver is able to handle any
2089	         RTP stream for which the value of the sprop-depack-buf-bytes
2090	         parameter is smaller than or equal to this parameter.

2092	         When not present, the value of depack-buf-cap is inferred to be
2093	         equal to 4294967295.  The value of depack-buf-cap MUST be an
2094	         integer in the range of 1 to 4294967295, inclusive.

2096	            Informative note: depack-buf-cap indicates the maximum
2097	            possible size of the de-packetization buffer of the receiver
2098	            only, without allowing for network jitter.
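
         The relationship between sprop-max-don-diff, sprop-depack-buf-
         bytes, and depack-buf-cap can be sketched in Python as follows.
         The function names and the dictionary representation of the
         parameters are assumptions made for illustration.

```python
from typing import Mapping, Optional

UINT32_MAX = 4294967295  # upper bound and inferred depack-buf-cap


def sender_buffer_params_valid(params: Mapping[str, str]) -> bool:
    """Check the sender-side rules for the two sprop buffer parameters."""
    don_diff = int(params.get("sprop-max-don-diff", 0))
    present = "sprop-depack-buf-bytes" in params
    buf_bytes = int(params.get("sprop-depack-buf-bytes", 0))
    if not 0 <= don_diff <= 32767:
        return False
    if not 0 <= buf_bytes <= UINT32_MAX:
        return False
    # An interleaved stream needs an explicit, non-zero buffer size.
    if don_diff > 0 and (not present or buf_bytes == 0):
        return False
    return True


def receiver_can_handle(params: Mapping[str, str],
                        depack_buf_cap: Optional[int] = None) -> bool:
    """Compare the stream's required buffer size with depack-buf-cap."""
    cap = UINT32_MAX if depack_buf_cap is None else depack_buf_cap
    return int(params.get("sprop-depack-buf-bytes", 0)) <= cap
```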

2100	7.2.  SDP Parameters

2102	   The receiver MUST ignore any parameter unspecified in this memo.

2104	7.2.1.  Mapping of Payload Type Parameters to SDP

2106	   The media type video/H266 string is mapped to fields in the Session
2107	   Description Protocol (SDP) [RFC4566] as follows:

2109	   *  The media name in the "m=" line of SDP MUST be video.

2111	   *  The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the
2112	      media subtype).

2114	   *  The clock rate in the "a=rtpmap" line MUST be 90000.

2116	   *  The OPTIONAL parameters profile-id, tier-flag, sub-profile-id,
2117	      interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id,
2118	      recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max-
2119	      fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf-
2120	      cap, when present, MUST be included in the "a=fmtp" line of SDP.
2121	      These parameters are expressed as a media type string, in the form
2122	      of a semicolon-separated list of parameter=value pairs.

2124	   *  The OPTIONAL parameters sprop-vps, sprop-sps, sprop-pps, sprop-sei,
2125	      and sprop-dci, when present, MUST be included in the "a=fmtp" line
2126	      of SDP or conveyed using the "fmtp" source attribute as specified
2127	      in Section 6.3 of [RFC5576].  For a particular media format (i.e.,
2128	      RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or
2129	      sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP
2130	      and conveyed using the "fmtp" source attribute.  When included in
2131	      the "a=fmtp" line of SDP, those parameters are expressed as a
2132	      media type string, in the form of a semicolon-separated list of
2133	      parameter=value pairs.  When conveyed in the "a=fmtp" line of SDP
2134	      for a particular payload type, the parameters sprop-vps, sprop-
2135	      sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each
2136	      SSRC with the payload type.  When conveyed using the "fmtp" source
2137	      attribute, these parameters are only associated with the given
2138	      source and payload type as parts of the "fmtp" source attribute.

2140	   An example of media representation in SDP is as follows:

2142	           m=video 49170 RTP/AVP 98
2143	           a=rtpmap:98 H266/90000
2144	           a=fmtp:98 profile-id=1;
2145	             sprop-vps=<video parameter sets data>;
2146	             sprop-sps=<sequence parameter set data>;
2147	             sprop-pps=<picture parameter set data>;
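
   A receiver might split such an "a=fmtp" attribute into its
   parameter=value pairs as in the following Python sketch.  The
   function name is hypothetical, and the sketch assumes the attribute
   has been unfolded into a single line (the example above is wrapped
   for readability only).

```python
from typing import Dict, Tuple


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split an "a=fmtp" line into a payload type and its parameters."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, rest = line[len(prefix):].partition(" ")
    params: Dict[str, str] = {}
    for pair in rest.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # allow a trailing semicolon
        # Split on the first "=" only, since base64 values may end
        # with "=" padding characters.
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("malformed parameter: " + pair)
        params[name.strip()] = value.strip()
    return int(payload_type), params
```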

2149	7.2.2.  Usage with SDP Offer/Answer Model

2151	   This section describes the negotiation of unicast messages using the
2152	   offer-answer model as described in [RFC3264] and its updates.  The
2153	   section is split into subsections, covering a) media format
2154	   configurations not involving non-temporal scalability; b) scalable
2155	   media format configurations; c) the description of the use of those
2156	   parameters not involving the media configuration itself but rather
2157	   the parameters of the payload format design; and d) multicast.

2159	7.2.2.1.  Non-scalable media format configuration

2161	   A non-scalable VVC media configuration is a configuration in which
2162	   no non-temporal scalability mechanisms are allowed.  In [VVC] version
2163	   1, that implies that general_profile_idc indicates one of the
2164	   following profiles: Main 10, Main 10 Still Picture, Main 10 4:4:4,
2165	   and Main 10 4:4:4 Still Picture, with general_profile_idc values of
2166	   1, 65, 33, and 97, respectively.  Note that non-scalable media
2167	   configurations include temporal scalability, in line with VVC's
2168	   design philosophy and profile structure.

2170	   The following limitations and rules pertaining to the media
2171	   configuration apply:

2173	   *  The parameters identifying a media format configuration for VVC
2174	      are profile-id, tier-flag, sub-profile-id, level-id, and interop-
2175	      constraints.  These media configuration parameters, except level-
2176	      id, MUST be used symmetrically.

2178	      The answerer MUST structure its answer according to one of the
2179	      following three options:

2181	      1) maintain all configuration parameters with the values remaining
2182	      the same as in the offer for the media format (payload type), with
2183	      the exception that the value of level-id is changeable as long as
2184	      the highest level indicated by the answer is not higher than that
2185	      indicated by the offer;

2187	      2) include in the answer the recv-sublayer-id parameter, with a
2188	      value less than the sprop-sublayer-id parameter in the offer, for
2189	      the media format (payload type), and maintain all configuration
2190	      parameters with the values remaining the same as in the offer for
2191	      the media format (payload type), with the exception that the value
2192	      of level-id is changeable as long as the highest level indicated
2193	      by the answer is not higher than the level indicated by the sprop-
2194	      sps or sprop-vps in the offer for the chosen sublayer
2195	      representation; or

2197	      3) remove the media format (payload type) completely (when one or
2198	      more of the parameter values are not supported).

2200	            Informative note: The above requirement for symmetric use
2201	            does not apply for level-id, and does not apply for the
2202	            other bitstream or RTP stream properties and capability
2203	            parameters as described in Section 7.2.2.3 below.

2205	   *  To simplify handling and matching of these configurations, the
2206	      same RTP payload type number used in the offer SHOULD also be used
2207	      in the answer, as specified in [RFC3264].

2209	   *  The same RTP payload type number used in the offer for the media
2210	      subtype H266 MUST be used in the answer when the answer includes
2211	      recv-sublayer-id.  When the answer does not include recv-sublayer-
2212	      id, the answer MUST NOT contain a payload type number used in the
2213	      offer for the media subtype H266 unless the configuration is
2214	      exactly the same as in the offer or the configuration in the
2215	      answer only differs from that in the offer with a different value
2216	      of level-id.  The answer MAY contain the recv-sublayer-id
2217	      parameter if a VVC bitstream contains multiple operation points
2218	      (using temporal scalability and sublayers) and sprop-sps or sprop-
2219	      vps is included in the offer where information on sublayers is
2220	      present in the first sequence parameter set or video parameter set
2221	      contained in sprop-sps or sprop-vps respectively.  If the sprop-
2222	      sps or sprop-vps is provided in an offer, an answerer MAY select a
2223	      particular operation point indicated in the first sequence
2224	      parameter set or video parameter set contained in sprop-sps or
2225	      sprop-vps respectively.  When the answer includes a recv-sublayer-
2226	      id that is less than a sprop-sublayer-id in the offer, the
2227	      following applies:

2229	      1) When sprop-sps parameter is present, all sequence parameter
2230	      sets contained in the sprop-sps parameter in the SDP answer and
2231	      all sequence parameter sets sent in-band for either the offerer-
2232	      to-answerer direction or the answerer-to-offerer direction MUST be
2233	      consistent with the first sequence parameter set in the sprop-sps
2234	      parameter of the offer (see the semantics of sprop-sps in
2235	      Section 7.1 of this document on one sequence parameter set being
2236	      consistent with another sequence parameter set).

2238	      2) When sprop-vps parameter is present, all video parameter sets
2239	      contained in the sprop-vps parameter in the SDP answer and all
2240	      video parameter sets sent in-band for either the offerer-to-
2241	      answerer direction or the answerer-to-offerer direction MUST be
2242	      consistent with the first video parameter set in the sprop-vps
2243	      parameter of the offer (see the semantics of sprop-vps in
2244	      Section 7.1 of this document on one video parameter set being
2245	      consistent with another video parameter set).

2247	      3) The bitstream sent in either direction MUST conform to the
2248	      profile, tier, level, and constraints of the chosen sublayer
2249	      representation as indicated by the profile_tier_level( ) syntax
2250	      structure in the first sequence parameter set in the sprop-sps
2251	      parameter or by the first profile_tier_level( ) syntax structure
2252	      in the first video parameter set in the sprop-vps parameter of the
2253	      offer.

2255	            Informative note: When an offerer receives an answer that
2256	            does not include recv-sublayer-id, it has to compare payload
2257	            types not declared in the offer based on the media type
2258	            (i.e., video/H266) and the above media configuration
2259	            parameters with any payload types it has already declared.
2260	            This will enable it to determine whether the configuration
2261	            in question is new or if it is equivalent to a configuration
2262	            already offered, since a different payload type number may
2263	            be used in the answer.  The ability to perform operation
2264	            point selection enables a receiver to utilize the temporal
2265	            scalable nature of a VVC bitstream.
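
   Option 1) of the answer rules above can be checked with a sketch such
   as the following.  The function names are hypothetical.  The sketch
   assumes that parameters have already been parsed into dictionaries,
   compares sub-profile-id and interop-constraints as literal strings,
   and does not cover option 2), which requires parsing sprop-sps or
   sprop-vps.

```python
from typing import Mapping

# Compared as literal strings; level-id is handled separately.
SYMMETRIC_PARAMS = ("profile-id", "sub-profile-id", "interop-constraints")
DEFAULT_LEVEL_ID = 51  # inferred level-id (level 3.1)


def _highest_level(params: Mapping[str, str]) -> int:
    """Highest level indicated: max-recv-level-id, else level-id."""
    if "max-recv-level-id" in params:
        return int(params["max-recv-level-id"])
    return int(params.get("level-id", DEFAULT_LEVEL_ID))


def answer_matches_option_1(offer: Mapping[str, str],
                            answer: Mapping[str, str]) -> bool:
    """True if the answer keeps the offered configuration unchanged,
    apart from a highest level that does not exceed the offered one."""
    for name in SYMMETRIC_PARAMS:
        if offer.get(name) != answer.get(name):
            return False
    # tier-flag is inferred to be 0 when absent.
    if int(offer.get("tier-flag", 0)) != int(answer.get("tier-flag", 0)):
        return False
    return _highest_level(answer) <= _highest_level(offer)
```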

2267	7.2.2.2.  Scalable media format configuration

2269	   A scalable VVC media configuration is a configuration in which non-
2270	   temporal scalability mechanisms are allowed.  In [VVC] version 1,
2271	   that implies that general_profile_idc indicates one of the following
2272	   profiles: Multilayer Main 10 and Multilayer Main 10 4:4:4, with
2273	   general_profile_idc values of 17 and 49, respectively.

2275	   The following limitations and rules pertaining to the media
2276	   configuration apply.  They are listed in an order that would be
2277	   logical for an implementation to follow:

2279	   *  The parameters identifying a media format configuration for
2280	      scalable VVC are profile-id, tier-flag, sub-profile-id, level-id,
2281	      interop-constraints, and sprop-vps.  These media configuration
2282	      parameters, except level-id, MUST be used symmetrically, except as
2283	      noted below.

2285	   *  The answerer MAY include a level-id that MUST be lower than or
2286	      equal to the level-id indicated in the offer (either expressed by
2287	      level-id in the offer, or implied by the default level as
2288	      specified in Section 7.1).

2290	   *  When sprop-ols-id is present in an offer, sprop-vps MUST also be
2291	      present in the same offer and include at least one valid VPS, so
2292	      that the answerer can meaningfully interpret sprop-ols-id and
2293	      select recv-ols-id (see below).

2295	   *  The answerer MUST NOT include recv-ols-id unless the offer
2296	      includes sprop-ols-id.  When present, recv-ols-id MUST indicate a
2297	      supported output layer set in the VPS that includes no layers
2298	      other than all or a subset of the layers of the OLS referred to by
2299	      sprop-ols-id.  If unable, the answerer MUST remove the media
2300	      format.

2302	         Informative note: If an offerer wants to offer more than one
2303	         output layer set, it can do so by offering multiple VVC media
2304	         with different payload types.

2306	   *  The offerer MAY include sprop-sublayer-id which indicates the
2307	      highest allowed value of TID in the bitstream.  The answerer MAY
2308	      include recv-sublayer-id which can be used to reduce the number of
2309	      sublayers from the value of sprop-sublayer-id.

2311	   *  When the answerer includes recv-ols-id and configuration
2312	      parameters profile-id, tier-flag, sub-profile-id, level-id, and
2313	      interop-constraints, it MUST use the configuration parameter
2314	      values as signaled in the sprop-vps for the operating point with
2315	      the largest number of sublayers for the chosen output layer set,
2316	      with the exception that the value of level-id is changeable as
2317	      long as the highest level indicated by the answer is not higher
2318	      than the level indicated by the sprop-vps in the offer for the
2319	      operating point with the largest number of sublayers for the
2320	      chosen output layer set.
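
   The layer-subset condition on recv-ols-id described above can be
   sketched in Python as follows.  The sketch assumes that the VPS in
   sprop-vps has already been parsed into a mapping from OLS index to
   the set of layer identifiers in that OLS; this mapping and the
   function name are hypothetical.

```python
from typing import AbstractSet, Mapping


def recv_ols_id_acceptable(
    ols_layers: Mapping[int, AbstractSet[int]],
    sprop_ols_id: int,
    recv_ols_id: int,
) -> bool:
    """True if recv-ols-id names an OLS in the VPS whose layers are all
    contained in the OLS referred to by sprop-ols-id."""
    for value in (sprop_ols_id, recv_ols_id):
        if not 0 <= value <= 256:
            return False
    if sprop_ols_id not in ols_layers or recv_ols_id not in ols_layers:
        return False
    # Subset test: no layers other than those of the offered OLS.
    return ols_layers[recv_ols_id] <= ols_layers[sprop_ols_id]
```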

2322	7.2.2.3.  Payload format configuration

2324	   The following limitations and rules pertain mostly to the
2325	   configuration of the payload format buffer management and apply to
2326	   both scalable and non-scalable VVC.

2328	   *  The parameters sprop-max-don-diff and sprop-depack-buf-bytes
2329	      describe the properties of an RTP stream that the offerer or the
2330	      answerer is sending for the media format configuration.  This
2331	      differs from the normal usage of the offer/answer parameters:
2332	      normally such parameters declare the properties of the bitstream
2333	      or RTP stream that the offerer or the answerer is able to receive.
2334	      When dealing with VVC, the offerer assumes that the answerer will
2335	      be able to receive media encoded using the configuration being
2336	      offered.

2338	         Informative note: The above parameters apply for any RTP
2339	         stream, when present, sent by a declaring entity with the same
2340	         configuration.  In other words, the applicability of the above
2341	         parameters to RTP streams depends on the source endpoint.
2342	         Rather than being bound to the payload type, the values may
2343	         have to be applied to another payload type when being sent, as
2344	         they apply for the configuration.

2346	   *  The capability parameter max-lsr MAY be used to declare further
2347	      capabilities of the offerer or answerer for receiving.  It MUST
2348	      NOT be present when the direction attribute is sendonly.

2350	   *  The capability parameter max-fps MAY be used to declare lower
2351	      capabilities of the offerer or answerer for receiving.  It MUST
2352	      NOT be present when the direction attribute is sendonly.

2354	   *  When an offerer offers an interleaved stream, indicated by the
2355	      presence of sprop-max-don-diff with a value larger than zero, the
2356	      offerer MUST include the size of the de-packetization buffer
2357	      sprop-depack-buf-bytes.

2359	   *  To enable the offerer and answerer to inform each other about
2360	      their capabilities for de-packetization buffering in receiving RTP
2361	      streams, both parties are RECOMMENDED to include depack-buf-cap.

2363	   *  The sprop-dci, sprop-vps, sprop-sps, or sprop-pps, when present
2364	      (included in the "a=fmtp" line of SDP or conveyed using the "fmtp"
2365	      source attribute as specified in Section 6.3 of [RFC5576]), are
2366	      used for out-of-band transport of the parameter sets (DCI, VPS,
2367	      SPS, or PPS, respectively).

2369	   *  The answerer MAY use either out-of-band or in-band transport of
2370	      parameter sets for the bitstream it is sending, regardless of
2371	      whether out-of-band parameter sets transport has been used in the
2372	      offerer-to-answerer direction.  Parameter sets included in an
2373	      answer are independent of those parameter sets included in the
2374	      offer, as they are used for decoding two different bitstreams, one
2375	      from the answerer to the offerer and the other in the opposite
2376	      direction.  In case some RTP packets are sent before the SDP
2377	      offer/answer settles down, in-band parameter sets MUST be used for
2378	      those RTP stream parts sent before the SDP offer/answer.

2380	   *  The following rules apply to transport of parameter sets in the
2381	      offerer-to-answerer direction.

2383	      -  An offer MAY include sprop-dci, sprop-vps, sprop-sps, and/or
2384	         sprop-pps.  If none of these parameters is present in the
2385	         offer, then only in-band transport of parameter sets is used.

2387	      -  If the level to use in the offerer-to-answerer direction is
2388	         equal to the default level in the offer, the answerer MUST be
2389	         prepared to use the parameter sets included in sprop-vps,
2390	         sprop-sps, and sprop-pps (either included in the "a=fmtp" line
2391	         of SDP or conveyed using the "fmtp" source attribute) for
2392	         decoding the incoming bitstream, e.g., by passing these
2393	         parameter set NAL units to the video decoder before passing any
2394	         NAL units carried in the RTP streams.  Otherwise, the answerer
2395	         MUST ignore sprop-vps, sprop-sps, and sprop-pps (either
2396	         included in the "a=fmtp" line of SDP or conveyed using the
2397	         "fmtp" source attribute) and the offerer MUST transmit
2398	         parameter sets in-band.

2400	   *  The following rules apply to transport of parameter sets in the
2401	      answerer-to-offerer direction.

2403	      -  An answer MAY include sprop-dci, sprop-vps, sprop-sps, and/or
2404	         sprop-pps.  If none of these parameters is present in the
2405	         answer, then only in-band transport of parameter sets is used.

2407	      -  The offerer MUST be prepared to use the parameter sets included
2408	         in sprop-vps, sprop-sps, and sprop-pps (either included in the
2409	         "a=fmtp" line of SDP or conveyed using the "fmtp" source
2410	         attribute) for decoding the incoming bitstream, e.g., by
2411	         passing these parameter set NAL units to the video decoder
2412	         before passing any NAL units carried in the RTP streams.

2414	   *  When sprop-dci, sprop-vps, sprop-sps, and/or sprop-pps are
2415	      conveyed using the "fmtp" source attribute as specified in
2416	      Section 6.3 of [RFC5576], the receiver of the parameters MUST
2417	      store the parameter sets included in sprop-dci, sprop-vps, sprop-
2418	      sps, and/or sprop-pps and associate them with the source given as
2419	      part of the "fmtp" source attribute.  Parameter sets associated
2420	      with one source (given as part of the "fmtp" source attribute)
2421	      MUST only be used to decode NAL units conveyed in RTP packets from
2422	      the same source (given as part of the "fmtp" source attribute).
2423	      When this mechanism is in use, SSRC collision detection and
2424	      resolution MUST be performed as specified in [RFC5576].
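
   As an illustration of two of the rules above, the following Python
   sketch checks a parsed "a=fmtp" parameter list.  It is a minimal
   example under stated assumptions, not a complete validator: the
   helper names parse_fmtp and check_offer are hypothetical, and only
   the interleaving rule (sprop-max-don-diff greater than zero requires
   sprop-depack-buf-bytes) and the sendonly rule for max-lsr and
   max-fps are covered.

```python
def parse_fmtp(params: str) -> dict[str, str]:
    """Split a 'name=value;name=value' fmtp parameter string into a dict."""
    result: dict[str, str] = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        result[name.strip()] = value.strip()
    return result


def check_offer(params: dict[str, str], direction: str) -> list[str]:
    """Return a list of rule violations (empty when none were found).

    Hypothetical helper: it covers only two of the rules in this section.
    """
    problems: list[str] = []
    # An interleaved stream is indicated by sprop-max-don-diff > 0 and
    # then requires the de-packetization buffer size to be declared.
    interleaved = int(params.get("sprop-max-don-diff", "0")) > 0
    if interleaved and "sprop-depack-buf-bytes" not in params:
        problems.append("sprop-depack-buf-bytes missing for interleaved stream")
    # Receiver capability parameters must not appear with sendonly.
    if direction == "sendonly":
        for name in ("max-lsr", "max-fps"):
            if name in params:
                problems.append(f"{name} present with sendonly")
    return problems
```

   A caller would typically run check_offer on each payload type of an
   offer before sending it and treat a non-empty result as an error.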

2426	   Table 1 lists the interpretation of all the parameters that MAY be
2427	   used for the various combinations of offer, answer, and direction
2428	   attributes.  Note that the two columns wherein the recv-ols-id
2429	   parameter is used only apply to answers, whereas the other columns
2430	   apply to both offers and answers.

2432	                                       sendonly --+
2433	               answer: recvonly, recv-ols-id --+  |
2434	                 recvonly w/o recv-ols-id --+  |  |
2435	         answer: sendrecv, recv-ols-id --+  |  |  |
2436	           sendrecv w/o recv-ols-id --+  |  |  |  |
2437	                                      |  |  |  |  |
2438	   profile-id                         C  D  C  D  P
2439	   tier-flag                          C  D  C  D  P
2440	   level-id                           D  D  D  D  P
2441	   sub-profile-id                     C  D  C  D  P
2442	   interop-constraints                C  D  C  D  P
2443	   max-recv-level-id                  R  R  R  R  -
2444	   sprop-max-don-diff                 P  P  -  -  P
2445	   sprop-depack-buf-bytes             P  P  -  -  P
2446	   depack-buf-cap                     R  R  R  R  -
2447	   max-lsr                            R  R  R  R  -
2448	   max-fps                            R  R  R  R  -
2449	   sprop-dci                          P  P  -  -  P
2450	   sprop-sei                          P  P  -  -  P
2451	   sprop-vps                          P  P  -  -  P
2452	   sprop-sps                          P  P  -  -  P
2453	   sprop-pps                          P  P  -  -  P
2454	   sprop-sublayer-id                  P  P  -  -  P
2455	   recv-sublayer-id                   O  O  O  O  -
2456	   sprop-ols-id                       P  P  -  -  P
2457	   recv-ols-id                        X  O  X  O  -

2459	   Table 1.  Interpretation of parameters for various combinations of
2460	   offers, answers, direction attributes, with and without recv-ols-id.
2461	   Columns that do not indicate offer or answer apply to both.

2463	   Legend:

2465	    C: configuration for sending and receiving bitstreams
2466	    D: changeable configuration, same as C except possible
2467	       to answer with a different but consistent value (see the
2468	       semantics of the six parameters related to profile, tier,
2469	       and level on these parameters being consistent)
2470	    P: properties of the bitstream to be sent
2471	    R: receiver capabilities
2472	    O: operation point selection
2473	    X: MUST NOT be present
2474	    -: not usable, when present MUST be ignored
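
   For implementers who prefer a machine-readable form, Table 1 can be
   transcribed into a lookup structure.  The Python sketch below is one
   possible encoding; the names TABLE_1 and interpretation_code are
   hypothetical, and the five characters of each entry follow the
   column order of Table 1 from left to right.

```python
# Column order, left to right, as in Table 1:
#   0: sendrecv without recv-ols-id
#   1: answer, sendrecv, with recv-ols-id
#   2: recvonly without recv-ols-id
#   3: answer, recvonly, with recv-ols-id
#   4: sendonly
TABLE_1 = {
    "profile-id": "CDCDP",
    "tier-flag": "CDCDP",
    "level-id": "DDDDP",
    "sub-profile-id": "CDCDP",
    "interop-constraints": "CDCDP",
    "max-recv-level-id": "RRRR-",
    "sprop-max-don-diff": "PP--P",
    "sprop-depack-buf-bytes": "PP--P",
    "depack-buf-cap": "RRRR-",
    "max-lsr": "RRRR-",
    "max-fps": "RRRR-",
    "sprop-dci": "PP--P",
    "sprop-sei": "PP--P",
    "sprop-vps": "PP--P",
    "sprop-sps": "PP--P",
    "sprop-pps": "PP--P",
    "sprop-sublayer-id": "PP--P",
    "recv-sublayer-id": "OOOO-",
    "sprop-ols-id": "PP--P",
    "recv-ols-id": "XOXO-",
}


def interpretation_code(param: str, direction: str,
                        with_recv_ols_id: bool = False) -> str:
    """Return the legend letter of Table 1 for one parameter.

    Hypothetical helper; 'with_recv_ols_id' selects the answer-only
    columns and has no effect for sendonly.
    """
    if direction == "sendonly":
        column = 4
    elif direction == "sendrecv":
        column = 1 if with_recv_ols_id else 0
    elif direction == "recvonly":
        column = 3 if with_recv_ols_id else 2
    else:
        raise ValueError(f"unsupported direction: {direction}")
    return TABLE_1[param][column]
```

   For example, interpretation_code("max-fps", "sendonly") yields "-",
   meaning that the parameter is not usable and is ignored when present.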

2476	   Parameters used for declaring receiver capabilities are, in general,
2477	   downgradable; i.e., they express the upper limit for a sender's
2478	   possible behavior.  Thus, a sender MAY select to set its encoder
2479	   using only lower/lesser or equal values of these parameters.
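
   The downgradable nature of receiver capabilities can be sketched as
   a simple clamp.  The Python function below is illustrative only: the
   name clamp_to_receiver is hypothetical, and keying the encoder
   settings by the capability parameter names is an assumption made for
   brevity.

```python
def clamp_to_receiver(desired: dict[str, int],
                      receiver_caps: dict[str, int]) -> dict[str, int]:
    """Limit each desired value to the receiver's declared upper bound.

    Values for which the receiver declared no capability are kept as-is.
    """
    return {
        name: min(value, receiver_caps.get(name, value))
        for name, value in desired.items()
    }
```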

2481	   When the answer does not include a recv-ols-id that is less than the
2482	   sprop-ols-id in the offer, parameters declaring a configuration point
2483	   are not changeable, with the exception of the level-id parameter for
2484	   unicast usage, and these parameters express values a receiver expects
2485	   to be used and MUST be used verbatim in the answer as in the offer.

2487	   When a sender's capabilities are declared with the configuration
2488	   parameters, these parameters express a configuration that is
2489	   acceptable for the sender to receive bitstreams.  In order to achieve
2490	   high interoperability levels, it is often advisable to offer multiple
2491	   alternative configurations.  It is impossible to offer multiple
2492	   configurations in a single payload type.  Thus, when multiple
2493	   configuration offers are made, each offer requires its own RTP
2494	   payload type associated with the offer.  However, it is possible to
2495	   offer multiple operation points using one configuration in a single
2496	   payload type by including sprop-vps in the offer and recv-ols-id in
2497	   the answer.

2499	   A receiver SHOULD understand all media type parameters, even if it
2500	   only supports a subset of the payload format's functionality.  This
2501	   ensures that a receiver is capable of understanding when an offer to
2502	   receive media can be downgraded to what is supported by the receiver
2503	   of the offer.

2505	   An answerer MAY extend the offer with additional media format
2506	   configurations.  However, to enable their usage, in most cases a
2507	   second offer is required from the offerer to provide the bitstream
2508	   property parameters that the media sender will use.  This also has
2509	   the effect that the offerer has to be able to receive this media
2510	   format configuration, not only to send it.

2512	7.2.2.4.  Multicast

2514	   For bitstreams being delivered over multicast, the following rules
2515	   apply:

2517	   *  The media format configuration is identified by profile-id, tier-
2518	      flag, sub-profile-id, level-id, and interop-constraints.  These
2519	      media format configuration parameters, including level-id, MUST be
2520	      used symmetrically; that is, the answerer MUST either maintain all
2521	      configuration parameters or remove the media format (payload type)
2522	      completely.  Note that this implies that the level-id for offer/
2523	      answer in multicast is not changeable.

2525	   *  To simplify the handling and matching of these configurations, the
2526	      same RTP payload type number used in the offer SHOULD also be used
2527	      in the answer, as specified in [RFC3264].  An answer MUST NOT
2528	      contain a payload type number used in the offer unless the
2529	      configuration is the same as in the offer.

2531	   *  Parameter sets received MUST be associated with the originating
2532	      source and MUST only be used in decoding the incoming bitstream
2533	      from the same source.

2535	   *  The rules for other parameters are the same as above for unicast
2536	      as long as the three above rules are obeyed.
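
   The symmetry rule for multicast can be checked mechanically.  The
   following Python sketch assumes that offer and answer have already
   been parsed into mappings from payload type number to parameter
   dictionaries; the names config_of and multicast_answer_ok are
   hypothetical.  As a simplification, a parameter that is absent is
   compared as absent, so default values are not expanded.

```python
CONFIG_PARAMS = (
    "profile-id",
    "tier-flag",
    "sub-profile-id",
    "level-id",
    "interop-constraints",
)


def config_of(params: dict[str, str]) -> tuple:
    """Extract the media format configuration from parsed fmtp parameters."""
    return tuple(params.get(name) for name in CONFIG_PARAMS)


def multicast_answer_ok(offer: dict[int, dict[str, str]],
                        answer: dict[int, dict[str, str]]) -> bool:
    """True when every payload type reused from the offer keeps the
    configuration unchanged, including level-id."""
    return all(
        config_of(answer[pt]) == config_of(offer[pt])
        for pt in answer
        if pt in offer
    )
```

   An answerer that cannot support an offered configuration would
   remove that payload type from the answer, which this check accepts.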

2538	7.2.3.  Usage in Declarative Session Descriptions

2540	   When VVC over RTP is offered with SDP in a declarative style, as in
2541	   Real Time Streaming Protocol (RTSP) [RFC2326] or Session Announcement
2542	   Protocol (SAP) [RFC2974], the following considerations are necessary.

2544	   *  All parameters capable of indicating both bitstream properties and
2545	      receiver capabilities are used to indicate only bitstream
2546	      properties.  For example, in this case, the parameters profile-id,
2547	      tier-flag, and level-id declare the values used by the bitstream,
2548	      not the capabilities for receiving bitstreams.  As a result, the
2549	      following interpretation of the parameters MUST be used:

2551	      -  Declaring actual configuration or bitstream properties:

2553	         o  profile-id

2555	         o  tier-flag

2557	         o  level-id

2559	         o  interop-constraints

2561	         o  sub-profile-id

2563	         o  sprop-dci

2565	         o  sprop-vps

2567	         o  sprop-sps

2569	         o  sprop-pps

2571	         o  sprop-max-don-diff
2572	         o  sprop-depack-buf-bytes

2574	         o  sprop-sublayer-id

2576	         o  sprop-ols-id

2578	         o  sprop-sei

2580	      -  Not usable (when present, they MUST be ignored):

2582	         o  max-lsr

2584	         o  max-fps

2586	         o  max-recv-level-id

2588	         o  depack-buf-cap

2590	         o  recv-sublayer-id

2592	         o  recv-ols-id

2594	      -  A receiver of the SDP is required to support all parameters and
2595	         values of the parameters provided; otherwise, the receiver MUST
2596	         reject (RTSP) or not participate in (SAP) the session.  It
2597	         falls on the creator of the session to use values that are
2598	         expected to be supported by the receiving application.
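
   The classification above can be applied when processing a
   declarative session description.  The Python sketch below drops the
   parameters that are not usable in this context; the names
   NOT_USABLE_DECLARATIVE and declarative_view are hypothetical, and
   the input is assumed to be an already parsed parameter dictionary.

```python
# Parameters that MUST be ignored in declarative session descriptions.
NOT_USABLE_DECLARATIVE = frozenset({
    "max-lsr",
    "max-fps",
    "max-recv-level-id",
    "depack-buf-cap",
    "recv-sublayer-id",
    "recv-ols-id",
})


def declarative_view(params: dict[str, str]) -> dict[str, str]:
    """Return only the parameters that declare bitstream properties."""
    return {
        name: value
        for name, value in params.items()
        if name not in NOT_USABLE_DECLARATIVE
    }
```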

2600	7.2.4.  Considerations for Parameter Sets

2602	   When out-of-band transport of parameter sets is used, parameter sets
2603	   MAY still be additionally transported in-band unless explicitly
2604	   disallowed by an application, and some of these additional parameter
2605	   sets may update some of the out-of-band transported parameter sets.
2606	   Update of a parameter set refers to the sending of a parameter set of
2607	   the same type using the same parameter set ID but with different
2608	   values for at least one other parameter of the parameter set.
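
   The notion of a parameter set update, together with the per-source
   association required in Section 7.2.2.3, can be sketched as a small
   store.  The Python class below is a minimal illustration under
   stated assumptions: the class name ParameterSetStore is
   hypothetical, and the caller is assumed to have already extracted
   the parameter set type and ID from the NAL unit, which this sketch
   does not attempt.

```python
from typing import Optional


class ParameterSetStore:
    """Keeps the latest parameter set per (source, type, ID)."""

    def __init__(self) -> None:
        self._sets: dict[tuple[int, str, int], bytes] = {}

    def put(self, ssrc: int, kind: str, ps_id: int, payload: bytes) -> str:
        """Store a parameter set and classify it.

        Returns "new" for a first occurrence, "repeat" for identical
        content, and "update" when the same type and ID arrive with
        different content.
        """
        key = (ssrc, kind, ps_id)
        previous = self._sets.get(key)
        self._sets[key] = payload
        if previous is None:
            return "new"
        return "repeat" if previous == payload else "update"

    def get(self, ssrc: int, kind: str, ps_id: int) -> Optional[bytes]:
        """Look up a parameter set for one source only."""
        return self._sets.get((ssrc, kind, ps_id))
```

   Keying the store by source means that a parameter set received for
   one source is never returned when decoding NAL units from another.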

2610	8.  Use with Feedback Messages

2612	   The following subsections define the use of the Picture Loss
2613	   Indication (PLI) and Full Intra Request (FIR) feedback messages with
2614	   [VVC].  The PLI is defined in [RFC4585], and the FIR message is
2615	   defined in [RFC5104].  In accordance with this memo, unlike [HEVC], a
2616	   sender MUST NOT send Slice Loss Indication (SLI) or Reference Picture
2617	   Selection Indication (RPSI), and a receiver SHOULD ignore RPSI and
2618	   treat a received SLI as a PLI.

2620	8.1.  Picture Loss Indication (PLI)

2622	   As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a
2623	   media sender indicates "the loss of an undefined amount of coded
2624	   video data belonging to one or more pictures".  Without having any
2625	   specific knowledge of the setup of the bitstream (such as use and
2626	   location of in-band parameter sets, non-IRAP decoder refresh points,
2627	   picture structures, and so forth), a reaction to the reception of a
2628	   PLI by a VVC sender SHOULD be to send an IRAP picture and relevant
2629	   parameter sets, potentially with sufficient redundancy to ensure
2630	   correct reception.  However, sometimes information about the
2631	   bitstream structure is known.  For example, state could have been
2632	   established outside of the mechanisms defined in this document that
2633	   parameter sets are conveyed out of band only, and stay static for the
2634	   duration of the session.  In that case, it is obviously unnecessary
2635	   to send them in-band as a result of the reception of a PLI.  Other
2636	   examples could be devised based on a priori knowledge of different
2637	   aspects of the bitstream structure.  In all cases, the timing and
2638	   congestion control mechanisms of RFC 4585 MUST be observed.

2640	8.2.  Full Intra Request (FIR)

2642	   The purpose of the FIR message is to force an encoder to send an
2643	   independent decoder refresh point as soon as possible, while
2644	   observing applicable congestion-control-related constraints, such as
2645	   those set out in [RFC8082].

2647	   Upon reception of a FIR, a sender MUST send an IDR picture.
2648	   Parameter sets MUST also be sent, except when there is a priori
2649	   knowledge that the parameter sets have been correctly established.  A
2650	   typical example for that is an understanding between sender and
2651	   receiver, established by means outside this document, that parameter
2652	   sets are exclusively sent out-of-band.
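
   The feedback handling described in Sections 8, 8.1, and 8.2 can be
   summarized as a dispatch function at the media sender.  The Python
   sketch below is illustrative only: the function name
   react_to_feedback and the action strings are hypothetical, and the
   timing and congestion control rules of RFC 4585 are not modelled.

```python
def react_to_feedback(message: str,
                      parameter_sets_known: bool = False) -> list[str]:
    """Map a received feedback message type to sender actions.

    'parameter_sets_known' stands for a priori knowledge that the
    receiver already holds the parameter sets, for example because
    they are conveyed out of band only and stay static.
    """
    if message == "RPSI":
        return []              # RPSI is ignored
    if message == "SLI":
        message = "PLI"        # a received SLI is treated as a PLI
    if message == "PLI":
        actions = ["send IRAP picture"]
    elif message == "FIR":
        actions = ["send IDR picture"]
    else:
        raise ValueError(f"unknown feedback message: {message}")
    if not parameter_sets_known:
        actions.append("send parameter sets")
    return actions
```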

2654	9.  Security Considerations

2656	   The scope of this Security Considerations section is limited to the
2657	   payload format itself and to one feature of [VVC] that may pose a
2658	   particularly serious security risk if implemented naively.  The
2659	   payload format, in isolation, does not form a complete system.
2660	   Implementers are advised to read and understand relevant security-
2661	   related documents, especially those pertaining to RTP (see the
2662	   Security Considerations section in [RFC3550]), and the security of
2663	   the call-control stack chosen (that may make use of the media type
2664	   registration of this memo).  Implementers should also consider known
2665	   security vulnerabilities of video coding and decoding implementations
2666	   in general and avoid those.

2668	   Within this RTP payload format, and with the exception of the user
2669	   data SEI message as described below, no security threats other than
2670	   those common to RTP payload formats are known.  In other words,
2671	   neither the various media-plane-based mechanisms, nor the signalling
2672	   part of this memo, seems to pose a security risk beyond those common
2673	   to all RTP-based systems.

2675	   RTP packets using the payload format defined in this specification
2676	   are subject to the security considerations discussed in the RTP
2677	   specification [RFC3550], and in any applicable RTP profile such as
2678	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
2679	   SAVPF [RFC5124].  However, as "Securing the RTP Framework: Why RTP
2680	   Does Not Mandate a Single Media Security Solution" [RFC7202]
2681	   discusses, it is not an RTP payload format's responsibility to
2682	   discuss or mandate what solutions are used to meet the basic security
2683	   goals like confidentiality, integrity and source authenticity for RTP
2684	   in general.  This responsibility lies with anyone using RTP in an
2685	   application.  They can find guidance on available security mechanisms
2686	   and important considerations in "Options for Securing RTP Sessions"
2687	   [RFC7201].  The rest of this section discusses the security-impacting
2688	   properties of the payload format itself.

2690	   Because the data compression used with this payload format is applied
2691	   end-to-end, any encryption needs to be performed after compression.
2692	   A potential denial-of-service threat exists for data encodings using
2693	   compression techniques that have non-uniform receiver-end
2694	   computational load.  The attacker can inject pathological datagrams
2695	   into the bitstream that are complex to decode and that cause the
2696	   receiver to be overloaded.  [VVC] is particularly vulnerable to such
2697	   attacks, as it is extremely simple to generate datagrams containing
2698	   NAL units that affect the decoding process of many future NAL units.
2699	   Therefore, the usage of data origin authentication and data integrity
2700	   protection of at least the RTP packet is RECOMMENDED, for example,
2701	   with SRTP [RFC3711].

2703	   Like HEVC [RFC7798], [VVC] includes a user data Supplemental
2704	   Enhancement Information (SEI) message.  This SEI message allows
2705	   inclusion of an arbitrary bitstring into the video bitstream.  Such a
2706	   bitstring could include JavaScript, machine code, and other active
2707	   content.  [VVC] leaves the handling of this SEI message to the
2708	   receiving system.  In order to avoid harmful side effects of the user
2709	   data SEI message, decoder implementations cannot naively trust its
2710	   content.  For example, it would be a bad and insecure implementation
2711	   practice to forward any JavaScript a decoder implementation detects
2712	   to a web browser.  The safest way to deal with user data SEI messages
2713	   is to simply discard them, but that can have negative side effects on
2714	   the quality of experience by the user.

2716	   End-to-end security with authentication, integrity, or
2717	   confidentiality protection will prevent a MANE from performing media-
2718	   aware operations other than discarding complete packets.  In the case
2719	   of confidentiality protection, it will even be prevented from
2720	   discarding packets in a media-aware way.  To be allowed to perform
2721	   such operations, a MANE is required to be a trusted entity that is
2722	   included in the security context establishment.

2724	10.  Congestion Control

2726	   Congestion control for RTP SHALL be used in accordance with RTP
2727	   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
2728	   If best-effort service is being used, an additional requirement is
2729	   that users of this payload format MUST monitor packet loss to ensure
2730	   that the packet loss rate is within an acceptable range.  Packet loss
2731	   is considered acceptable if a TCP flow across the same network path,
2732	   and experiencing the same network conditions, would achieve an
2733	   average throughput, measured on a reasonable timescale, that is not
2734	   less than all RTP streams combined are achieving.  This condition can
2735	   be satisfied by implementing congestion-control mechanisms to adapt
2736	   the transmission rate, the number of layers subscribed for a layered
2737	   multicast session, or by arranging for a receiver to leave the
2738	   session if the loss rate is unacceptably high.

2740	   The bitrate adaptation necessary for obeying the congestion control
2741	   principle is easily achievable when real-time encoding is used, for
2742	   example, by adequately tuning the quantization parameter.  However,
2743	   when pre-encoded content is being transmitted, bandwidth adaptation
2744	   requires the pre-coded bitstream to be tailored for such adaptivity.
2745	   The key mechanisms available in [VVC] are temporal scalability and
2746	   spatial/SNR scalability.  A media sender can remove NAL units
2747	   belonging to higher temporal sublayers (i.e., those NAL units with a
2748	   high value of TID) or higher spatio-SNR layers until the sending
2749	   bitrate drops to an acceptable range.
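
   The sublayer removal described above can be sketched as follows.
   This Python example is a simplified illustration, not a rate
   controller: the NalUnit type and the function thin_to_budget are
   hypothetical, the budget is expressed in bytes for one fixed time
   window, and only temporal sublayers (TID) are considered.

```python
from dataclasses import dataclass


@dataclass
class NalUnit:
    tid: int          # temporal sublayer identifier (TID)
    size_bytes: int   # size of the NAL unit in bytes


def thin_to_budget(nal_units: list[NalUnit],
                   budget_bytes: int) -> list[NalUnit]:
    """Drop the highest remaining temporal sublayer until the total
    size fits the budget or only the base sublayer (TID 0) is left."""
    kept = list(nal_units)
    while kept and sum(n.size_bytes for n in kept) > budget_bytes:
        highest = max(n.tid for n in kept)
        if highest == 0:
            break  # the base sublayer cannot be removed
        kept = [n for n in kept if n.tid < highest]
    return kept
```

   Removing whole sublayers from the top keeps the remaining bitstream
   decodable, since lower sublayers do not depend on higher ones.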

2751	   The mechanisms mentioned above generally work within a defined
2752	   profile and level and, therefore, no renegotiation of the channel is
2753	   required.  Only when non-downgradable parameters (such as profile)
2754	   are required to be changed does it become necessary to terminate and
2755	   restart the RTP stream(s).  This may be accomplished by using
2756	   different RTP payload types.

2758	   MANEs MAY remove certain unusable packets from the RTP stream when
2759	   that RTP stream was damaged due to previous packet losses.  This can
2760	   help reduce the network load in certain special cases.  For example,
2761	   MANEs can remove those FUs where the leading FUs belonging to the
2762	   same NAL unit have been lost or those dependent slice segments when
2763	   the leading slice segments belonging to the same slice have been
2764	   lost, because the trailing FUs or dependent slice segments are
2765	   meaningless to most decoders.  MANEs can also remove higher temporal
2766	   scalable layers if the outbound transmission (from the MANE's
2767	   viewpoint) experiences congestion.
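
   The FU example above can be sketched as a filter in a MANE.  The
   Python code below is a minimal illustration under stated
   assumptions: the FuPacket type and the function forwardable are
   hypothetical, the input contains only FU packets of one RTP stream
   in sequence-number order, and lost packets are simply absent from
   the input.

```python
from dataclasses import dataclass


@dataclass
class FuPacket:
    seq: int      # RTP sequence number (16 bits)
    start: bool   # first fragment of a NAL unit
    end: bool     # last fragment of a NAL unit


def forwardable(packets: list[FuPacket]) -> list[FuPacket]:
    """Keep only FUs whose leading FUs of the same NAL unit arrived."""
    out: list[FuPacket] = []
    in_unit = False   # True while an unbroken fragment run is open
    expected = None   # sequence number of the next fragment in the run
    for p in packets:
        if p.start:
            in_unit = True
        elif not in_unit or p.seq != expected:
            # A leading fragment of this NAL unit was lost: drop this
            # fragment and all remaining fragments of the unit.
            in_unit = False
            continue
        out.append(p)
        expected = (p.seq + 1) & 0xFFFF
        if p.end:
            in_unit = False
    return out
```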

2769	11.  IANA Considerations

2771	   Placeholder

2773	12.  Acknowledgements

2775	   Dr. Byeongdoo Choi is thanked for the video codec related technical
2776	   discussion and other aspects in this memo.  Xin Zhao and Dr. Xiang Li
2777	   are thanked for their contributions on [VVC] specification
2778	   descriptive content.  Spencer Dawkins is thanked for his valuable
2779	   review comments that led to great improvements of this memo.  Some
2780	   parts of this specification share text with the RTP payload format
2781	   for HEVC [RFC7798].  We thank the authors of that specification for
2782	   their excellent work.

2784	13.  References

2786	13.1.  Normative References

2788	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2789	              Requirement Levels", BCP 14, RFC 2119,
2790	              DOI 10.17487/RFC2119, March 1997,
2791	              <https://www.rfc-editor.org/info/rfc2119>.

2793	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
2794	              with Session Description Protocol (SDP)", RFC 3264,
2795	              DOI 10.17487/RFC3264, June 2002,
2796	              <https://www.rfc-editor.org/info/rfc3264>.

2798	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
2799	              Jacobson, "RTP: A Transport Protocol for Real-Time
2800	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
2801	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

2803	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
2804	              Video Conferences with Minimal Control", STD 65, RFC 3551,
2805	              DOI 10.17487/RFC3551, July 2003,
2806	              <https://www.rfc-editor.org/info/rfc3551>.

2808	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
2809	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
2810	              RFC 3711, DOI 10.17487/RFC3711, March 2004,
2811	              <https://www.rfc-editor.org/info/rfc3711>.

2813	   [RFC4556]  Zhu, L. and B. Tung, "Public Key Cryptography for Initial
2814	              Authentication in Kerberos (PKINIT)", RFC 4556,
2815	              DOI 10.17487/RFC4556, June 2006,
2816	              <https://www.rfc-editor.org/info/rfc4556>.

2818	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
2819	              Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
2820	              July 2006, <https://www.rfc-editor.org/info/rfc4566>.

2822	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
2823	              "Extended RTP Profile for Real-time Transport Control
2824	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
2825	              DOI 10.17487/RFC4585, July 2006,
2826	              <https://www.rfc-editor.org/info/rfc4585>.

2828	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
2829	              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
2830	              <https://www.rfc-editor.org/info/rfc4648>.

2832	   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
2833	              "Codec Control Messages in the RTP Audio-Visual Profile
2834	              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
2835	              February 2008, <https://www.rfc-editor.org/info/rfc5104>.

2837	   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
2838	              Real-time Transport Control Protocol (RTCP)-Based Feedback
2839	              (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
2840	              2008, <https://www.rfc-editor.org/info/rfc5124>.

2842	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
2843	              Media Attributes in the Session Description Protocol
2844	              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
2845	              <https://www.rfc-editor.org/info/rfc5576>.

2847	   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
2848	              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
2849	              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
2850	              DOI 10.17487/RFC7656, November 2015,
2851	              <https://www.rfc-editor.org/info/rfc7656>.

2853	   [RFC8082]  Wenger, S., Lennox, J., Burman, B., and M. Westerlund,
2854	              "Using Codec Control Messages in the RTP Audio-Visual
2855	              Profile with Feedback with Layered Codecs", RFC 8082,
2856	              DOI 10.17487/RFC8082, March 2017,
2857	              <https://www.rfc-editor.org/info/rfc8082>.

2859	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2860	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
2861	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

2863	   [VSEI]     "Versatile supplemental enhancement information messages
2864	              for coded video bitstreams", 2020,
2865	              <http://handle.itu.int/11.1002/1000/14337>.

2867	   [VVC]      "Versatile Video Coding, ITU-T Recommendation H.266",
2868	              2020, <http://handle.itu.int/11.1002/1000/14336>.

2870	13.2.  Informative References

2872	   [CABAC]    Sole, J., et al., "Transform coefficient coding in HEVC",
2873	              IEEE Transactions on Circuits and Systems for Video
2874	              Technology, DOI 10.1109/TCSVT.2012.2223055, December
2875	              2012, <https://doi.org/10.1109/TCSVT.2012.2223055>.

2877	   [HEVC]     "High efficiency video coding, ITU-T Recommendation
2878	              H.265", 2019, <http://handle.itu.int/11.1002/1000/14107>.

2880	   [MPEG2S]   ISO/IEC, "Information technology - Generic coding of
2881	              moving pictures and associated audio information - Part
2882	              1: Systems", ISO International Standard 13818-1, 2013.

2884	   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
2885	              Streaming Protocol (RTSP)", RFC 2326,
2886	              DOI 10.17487/RFC2326, April 1998,
2887	              <https://www.rfc-editor.org/info/rfc2326>.

2889	   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
2890	              Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
2891	              October 2000, <https://www.rfc-editor.org/info/rfc2974>.

2893	   [RFC6184]  Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
2894	              Payload Format for H.264 Video", RFC 6184,
2895	              DOI 10.17487/RFC6184, May 2011,
2896	              <https://www.rfc-editor.org/info/rfc6184>.

2898	   [RFC6190]  Wenger, S., Wang, Y.-K., Schierl, T., and A.
2899	              Eleftheriadis, "RTP Payload Format for Scalable Video
2900	              Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011,
2901	              <https://www.rfc-editor.org/info/rfc6190>.

2903	   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
2904	              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
2905	              <https://www.rfc-editor.org/info/rfc7201>.

2907	   [RFC7202]  Perkins, C. and M. Westerlund, "Securing the RTP
2908	              Framework: Why RTP Does Not Mandate a Single Media
2909	              Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
2910	              2014, <https://www.rfc-editor.org/info/rfc7202>.

2912	   [RFC7798]  Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M.
2913	              M. Hannuksela, "RTP Payload Format for High Efficiency
2914	              Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798,
2915	              March 2016, <https://www.rfc-editor.org/info/rfc7798>.

2917	Appendix A.  Change History

2919	   draft-zhao-payload-rtp-vvc-00 ........ initial version

2921	   draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and
2922	   corrections

2924	   draft-ietf-payload-rtp-vvc-00 ........ initial WG draft

2926	   draft-ietf-payload-rtp-vvc-01 ........ VVC specification update

2928	   draft-ietf-payload-rtp-vvc-02 ........ VVC specification update

2930	   draft-ietf-payload-rtp-vvc-03 ........ VVC coding tool introduction
2931	   update

2933	   draft-ietf-payload-rtp-vvc-04 ........ VVC coding tool introduction
2934	   update

2936	   draft-ietf-payload-rtp-vvc-05 ........ reference update and adding
2937	   placement for open issues

2939	   draft-ietf-payload-rtp-vvc-06 ........ address editor's note

2941	   draft-ietf-payload-rtp-vvc-07 ........ address editor's notes

2943	   draft-ietf-payload-rtp-vvc-08 ........ address editor's notes

2945	   draft-ietf-payload-rtp-vvc-09 ........ address editor's notes

2947	   draft-ietf-payload-rtp-vvc-10 ........ address editor's notes

2949	   draft-ietf-payload-rtp-vvc-11 ........ address editor's notes

2951	   draft-ietf-payload-rtp-vvc-12 ........ address editor's notes

2953	   draft-ietf-payload-rtp-vvc-13 ........ address editor's notes

2955	Authors' Addresses

2957	   Shuai Zhao
2958	   Tencent
2959	   2747 Park Blvd
2960	   Palo Alto,  94588
2961	   United States of America

2963	   Email: shuai.zhao@ieee.org

2965	   Stephan Wenger
2966	   Tencent
2967	   2747 Park Blvd
2968	   Palo Alto,  94588
2969	   United States of America

2971	   Email: stewe@stewe.org

2973	   Yago Sanchez
2974	   Fraunhofer HHI
2975	   Einsteinufer 37
2976	   10587 Berlin
2977	   Germany

2979	   Email: yago.sanchez@hhi.fraunhofer.de

2981	   Ye-Kui Wang
2982	   Bytedance Inc.
2983	   8910 University Center Lane
2984	   San Diego,  92122
2985	   United States of America

2987	   Email: yekui.wang@bytedance.com

2989	   Miska M. Hannuksela
2990	   Nokia Technologies
2991	   Hatanpaeaen valtatie 30
2992	   FI-33100 Tampere
2993	   Finland

2995	   Email: miska.hannuksela@nokia.com