idnits 2.17.1 

draft-ietf-avtcore-rtp-vvc-16.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == There is 1 instance of lines with non-ascii characters in the document.


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (5 May 2022) is 714 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 1389

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO23090-3'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'VSEI'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'VVC'


     Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	avtcore                                                          S. Zhao
3	Internet-Draft                                                 S. Wenger
4	Intended status: Standards Track                                 Tencent
5	Expires: 6 November 2022                                      Y. Sanchez
6	                                                          Fraunhofer HHI
7	                                                                 Y. Wang
8	                                                          Bytedance Inc.
9	                                                         M. M Hannuksela
10	                                                      Nokia Technologies
11	                                                              5 May 2022

13	          RTP Payload Format for Versatile Video Coding (VVC)
14	                     draft-ietf-avtcore-rtp-vvc-16

16	Abstract

18	   This memo describes an RTP payload format for the video coding
19	   standard ITU-T Recommendation H.266 and ISO/IEC International
20	   Standard 23090-3, both also known as Versatile Video Coding (VVC) and
21	   developed by the Joint Video Experts Team (JVET).  The RTP payload
22	   format allows for packetization of one or more Network Abstraction
23	   Layer (NAL) units in each RTP packet payload as well as fragmentation
24	   of a NAL unit into multiple RTP packets.  The payload format has wide
25	   applicability in videoconferencing, Internet video streaming, and
26	   high-bitrate entertainment-quality video, among other applications.

28	Status of This Memo

30	   This Internet-Draft is submitted in full conformance with the
31	   provisions of BCP 78 and BCP 79.

33	   Internet-Drafts are working documents of the Internet Engineering
34	   Task Force (IETF).  Note that other groups may also distribute
35	   working documents as Internet-Drafts.  The list of current Internet-
36	   Drafts is at https://datatracker.ietf.org/drafts/current/.

38	   Internet-Drafts are draft documents valid for a maximum of six months
39	   and may be updated, replaced, or obsoleted by other documents at any
40	   time.  It is inappropriate to use Internet-Drafts as reference
41	   material or to cite them other than as "work in progress."

43	   This Internet-Draft will expire on 6 November 2022.

45	Copyright Notice

47	   Copyright (c) 2022 IETF Trust and the persons identified as the
48	   document authors.  All rights reserved.

50	   This document is subject to BCP 78 and the IETF Trust's Legal
51	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
52	   license-info) in effect on the date of publication of this document.
53	   Please review these documents carefully, as they describe your rights
54	   and restrictions with respect to this document.  Code Components
55	   extracted from this document must include Revised BSD License text as
56	   described in Section 4.e of the Trust Legal Provisions and are
57	   provided without warranty as described in the Revised BSD License.

59	Table of Contents

61	   1.  Introduction
62	     1.1.  Overview of the VVC Codec
63	       1.1.1.  Coding-Tool Features (informative)
64	       1.1.2.  Systems and Transport Interfaces (informative)
65	       1.1.3.  High-Level Picture Partitioning (informative)
66	       1.1.4.  NAL Unit Header
67	     1.2.  Overview of the Payload Format
68	   2.  Conventions
69	   3.  Definitions and Abbreviations
70	     3.1.  Definitions
71	       3.1.1.  Definitions from the VVC Specification
72	       3.1.2.  Definitions Specific to This Memo
73	     3.2.  Abbreviations
74	   4.  RTP Payload Format
75	     4.1.  RTP Header Usage
76	     4.2.  Payload Header Usage
77	     4.3.  Payload Structures
78	       4.3.1.  Single NAL Unit Packets
79	       4.3.2.  Aggregation Packets (APs)
80	       4.3.3.  Fragmentation Units
81	     4.4.  Decoding Order Number
82	   5.  Packetization Rules
83	   6.  De-packetization Process
84	   7.  Payload Format Parameters
85	     7.1.  Media Type Registration
86	     7.2.  Optional Parameters Definition
87	     7.3.  SDP Parameters
88	       7.3.1.  Mapping of Payload Type Parameters to SDP
89	       7.3.2.  Usage with SDP Offer/Answer Model
90	       7.3.3.  Usage in Declarative Session Descriptions
91	       7.3.4.  Considerations for Parameter Sets
92	   8.  Use with Feedback Messages
93	     8.1.  Picture Loss Indication (PLI)
94	     8.2.  Full Intra Request (FIR)
95	   9.  Security Considerations
96	   10. Congestion Control
97	   11. IANA Considerations
98	   12. Acknowledgements
99	   13. References
100	     13.1.  Normative References
101	     13.2.  Informative References
102	   Appendix A.  Change History
103	   Authors' Addresses

105	1.  Introduction

107	   The Versatile Video Coding specification was formally published as
108	   both ITU-T Recommendation H.266 [VVC] and ISO/IEC International
109	   Standard 23090-3 [ISO23090-3].  VVC is reported to provide
110	   significant coding efficiency gains over High Efficiency Video Coding
111	   [HEVC], also known as H.265, and other earlier video codecs.

113	   This memo specifies an RTP payload format for VVC.  It shares its
114	   basic design with the NAL (Network Abstraction Layer) unit based RTP
115	   payload formats of Advanced Video Coding (AVC) [RFC6184], Scalable
116	   Video Coding (SVC) [RFC6190], High Efficiency Video Coding (HEVC)
117	   [RFC7798], and their respective predecessors.  With respect to design
118	   philosophy, security, congestion control, and overall implementation
119	   complexity, it has similar properties to those earlier payload format
120	   specifications.  This is a conscious choice, as at least RFC 6184 is
121	   widely deployed and generally known in the relevant implementer
122	   communities.  Certain scalability-related mechanisms known from
123	   [RFC6190] were incorporated into this document, as VVC version 1
124	   supports temporal, spatial, and signal-to-noise ratio (SNR)
125	   scalability.

127	1.1.  Overview of the VVC Codec

129	   VVC and HEVC share a similar hybrid video codec design.  In this
130	   memo, we provide a very brief overview of those features of VVC that
131	   are, in some form, addressed by the payload format specified herein.
132	   Implementers have to read, understand, and apply the ITU-T/ISO/IEC
133	   specifications pertaining to VVC to arrive at interoperable, well-
134	   performing implementations.

136	   Conceptually, both VVC and HEVC include a Video Coding Layer (VCL),
137	   which is often used to refer to the coding-tool features, and a NAL,
138	   which is often used to refer to the systems and transport interface
139	   aspects of the codecs.

141	1.1.1.  Coding-Tool Features (informative)

143	   Coding tool features are described below with occasional reference to
144	   the coding tool set of HEVC, which is well known in the community.

146	   Similar to earlier hybrid-video-coding-based standards, including
147	   HEVC, the following basic video coding design is employed by VVC.  A
148	   prediction signal is first formed by either intra- or motion-
149	   compensated prediction, and the residual (the difference between the
150	   original and the prediction) is then coded.  The gains in coding
151	   efficiency are achieved by redesigning and improving almost all parts
152	   of the codec over earlier designs.  In addition, VVC includes several
153	   tools to make the implementation on parallel architectures easier.

155	   Finally, VVC includes temporal, spatial, and SNR scalability as well
156	   as multiview coding support.

158	   Coding blocks and transform structure

160	   Among major coding-tool differences between HEVC and VVC, one of the
161	   important improvements is the more flexible coding tree structure in
162	   VVC, i.e., the multi-type tree.  In addition to the quadtree, binary
163	   and ternary trees are also supported, which contributes significantly
164	   to coding efficiency.  Moreover, the maximum size of a coding tree
165	   unit (CTU) is increased from 64x64 to 128x128.  To improve the coding
166	   efficiency of the chroma signal, separate luma and chroma trees at
167	   the CTU level may be employed for intra slices.  The square
168	   transforms in HEVC are extended to non-square transforms for
169	   rectangular blocks resulting from binary and ternary tree splits.
170	   Besides, VVC supports multiple transform sets (MTS), including DCT-2,
171	   DST-7, and DCT-8 as well as the non-separable secondary transform.
172	   The transforms used in VVC can have different sizes with support for
173	   larger transform sizes.  For DCT-2, the transform sizes range from
174	   2x2 to 64x64, and for DST-7 and DCT-8, the transform sizes range from
175	   4x4 to 32x32.  In addition, VVC also supports sub-block transform for
176	   both intra and inter coded blocks.  For intra coded blocks, intra
177	   sub-partitioning (ISP) may be used to allow sub-block based intra
178	   prediction and transform.  For inter blocks, sub-block transform may
179	   be used assuming that only a part of an inter-block has non-zero
180	   transform coefficients.
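
As a non-normative illustration of the multi-type tree described above,
the following sketch computes the child block sizes produced by one
split.  The mode names ("QT", "BT_H", and so on) are hypothetical labels
chosen for this example and are not VVC syntax elements.  The 1:2:1
ratio used for ternary splits is an assumption of this sketch; this
memo does not state it.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (width, height) in luma samples


def split_block(width: int, height: int, mode: str) -> List[Block]:
    """Return the child block sizes for one split of a width x height block.

    Mode names are illustrative only:
      "QT"    quadtree split into four equal quadrants
      "BT_H"  binary split by a horizontal boundary (two stacked halves)
      "BT_V"  binary split by a vertical boundary (two side-by-side halves)
      "TT_H"  ternary split by horizontal boundaries, heights 1:2:1 (assumed)
      "TT_V"  ternary split by vertical boundaries, widths 1:2:1 (assumed)
    """
    if mode == "QT":
        return [(width // 2, height // 2)] * 4
    if mode == "BT_H":
        return [(width, height // 2)] * 2
    if mode == "BT_V":
        return [(width // 2, height)] * 2
    if mode == "TT_H":
        quarter = height // 4
        return [(width, quarter), (width, 2 * quarter), (width, quarter)]
    if mode == "TT_V":
        quarter = width // 4
        return [(quarter, height), (2 * quarter, height), (quarter, height)]
    raise ValueError(f"unknown split mode: {mode}")
```

Binary and ternary splits of a square block yield the rectangular
blocks for which the non-square transforms mentioned above are needed.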

182	   Entropy coding

184	   Similar to HEVC, VVC uses a single entropy-coding engine, which is
185	   based on context adaptive binary arithmetic coding [CABAC], but with
186	   the support of multi-window sizes.  The window sizes can be
187	   initialized differently for different context models.  With this
188	   design, the engine adapts more efficiently and achieves better coding
189	   efficiency.  A joint chroma residual coding scheme is applied to
190	   further exploit the correlation between the residuals of two color
191	   components.  In VVC, different residual coding schemes are applied
192	   for regular transform coefficients and residual samples generated
193	   using transform-skip mode.

195	   In-loop filtering

197	   VVC supports more in-loop filtering features than HEVC.  The
198	   deblocking filter in VVC is similar to HEVC but operates on a smaller
199	   grid.  After deblocking and sample adaptive offset (SAO), an adaptive
200	   loop filter (ALF) may be used.  As a Wiener filter, ALF reduces
201	   distortion of decoded pictures.  Besides, VVC introduces a new module
202	   called luma mapping with chroma scaling to fully utilize the dynamic
203	   range of signal so that rate-distortion performance of both Standard
204	   Dynamic Range (SDR) and High Dynamic Range (HDR) content is improved.

206	   Motion prediction and coding

208	   Compared to HEVC, VVC introduces several improvements in this area.
209	   First, adaptive motion vector resolution (AMVR) can save bit cost for
210	   motion vectors by adaptively signaling motion vector resolution.
211	   Second, affine motion compensation is included to capture complicated
212	   motion such as zooming and rotation.  In addition, prediction
213	   refinement with optical flow for affine mode (PROF) is deployed to
214	   mimic affine motion at the pixel level.  Third, decoder-side motion
215	   vector refinement (DMVR) is a method to derive motion vectors at the
216	   decoder side based on block matching so that fewer bits may be spent
217	   on motion vectors.  Bi-directional optical flow (BDOF) is a method
218	   similar to PROF.  BDOF adds a sample-wise offset at the 4x4 sub-block
219	   level that is derived with equations based on gradients of the
220	   prediction samples and a motion difference relative to CU motion
221	   vectors.  Furthermore, merge with motion vector difference (MMVD) is
222	   a special mode that signals a limited set of motion vector
223	   differences on top of merge mode.  In addition to MMVD, there are
224	   three other special merge modes, i.e., sub-block merge, triangle, and
225	   combined intra-/inter-prediction (CIIP).  The sub-block merge list
226	   includes one candidate of sub-block temporal motion vector prediction
227	   (SbTMVP) and up to four candidates of affine motion vectors.
228	   Triangle is based on triangular block motion compensation.  CIIP
229	   combines intra- and inter-predictions with weighting.  Adaptive
230	   weighting may be employed with a block-level tool called
231	   bi-prediction with CU-based weighting (BCW), which provides more
232	   flexibility than in HEVC.

234	   Intra prediction and intra-coding

236	   To capture the diversified local image texture directions with finer
237	   granularity, VVC supports 65 angular directions instead of 33
238	   directions in HEVC.  The intra mode coding is based on a 6-most-
239	   probable-mode scheme, and the 6 most probable modes are derived using
240	   the neighboring intra prediction directions.  In addition, to deal
241	   with the different distributions of intra prediction angles for
242	   different block aspect ratios, a wide-angle intra prediction (WAIP)
243	   scheme is applied in VVC by including intra prediction angles beyond
244	   those present in HEVC.  Unlike HEVC, which only allows using the
245	   nearest line of reference samples for intra prediction, VVC also
246	   allows using two further reference lines, known as multi-reference-
247	   line (MRL) intra prediction.  The additional reference lines can only
248	   be used for the 6 most probable intra prediction modes.  To capture
249	   the strong correlation between different colour components, VVC uses
250	   a cross-component linear mode (CCLM), which assumes a linear
251	   relationship between the luma sample values and their associated
252	   chroma samples.  For intra prediction, VVC also applies a position-
253	   dependent prediction combination (PDPC) for refining the prediction
254	   samples closer to the intra prediction block boundary.  Matrix-based
255	   intra prediction (MIP) modes are also used in VVC; they generate an
256	   intra prediction block of size up to 8x8 using a weighted sum of
257	   downsampled neighboring reference samples, where the weights are
258	   hardcoded constants.

260	   Other coding-tool features

262	   VVC introduces dependent quantization (DQ) to reduce quantization
263	   error by state-based switching between two quantizers.

265	1.1.2.  Systems and Transport Interfaces (informative)

267	   VVC inherits the basic systems and transport interfaces designs from
268	   HEVC and AVC.  These include the NAL-unit-based syntax structure, the
269	   hierarchical syntax and data unit structure, the supplemental
270	   enhancement information (SEI) message mechanism, and the video
271	   buffering model based on the hypothetical reference decoder (HRD).
272	   The scalability features of VVC are conceptually similar to the
273	   scalable variant of HEVC known as SHVC.  The hierarchical syntax and
274	   data unit structure consists of parameter sets at various levels
275	   (decoder, sequence (pertaining to all layers), sequence (pertaining
276	   to a single layer), picture), picture-level header parameters,
277	   slice-level header parameters, and lower-level parameters.

279	   A number of key components that influenced the network abstraction
280	   layer design of VVC as well as this memo are described below.

282	   Decoding capability information

284	   The decoding capability information includes parameters that stay
285	   constant for the lifetime of a VVC bitstream, which in IETF terms can
286	   translate to a session.  Such information includes profile, level,
287	   and sub-profile information to determine a maximum capability interop
288	   point that is guaranteed to be never exceeded, even if splicing of
289	   video sequences occurs within a session.  It further includes
290	   constraint fields (most of which are flags), which can optionally be
291	   set to indicate that the video bitstream will be constrained in the
292	   use of certain features as indicated by the values of those fields.
293	   With this, a bitstream can be labeled as not using certain tools,
294	   which allows among other things for resource allocation in a decoder
295	   implementation.

297	   Video parameter set

299	   The video parameter set (VPS) pertains to one or more coded video
300	   sequences (CVSs) of multiple layers covering the same range of access
301	   units, and includes, among other information, decoding dependency
302	   expressed as information for reference picture list construction of
303	   enhancement layers.  The VPS provides a "big picture" of a scalable
304	   sequence, including what types of operation points are provided, the
305	   profile, tier, and level of the operation points, and some other
306	   high-level properties of the bitstream that can be used as the basis
307	   for session negotiation and content selection, etc.  One VPS may be
308	   referenced by one or more sequence parameter sets.

310	   Sequence parameter set

312	   The sequence parameter set (SPS) contains syntax elements pertaining
313	   to a coded layer video sequence (CLVS), which is a group of pictures
314	   belonging to the same layer, starting with a random access point, and
315	   followed by pictures that may depend on each other, until the next
316	   random access point picture.  In MPEG-2, the equivalent of a CVS was
317	   a group of pictures (GOP), which normally started with an I frame and
318	   was followed by P and B frames.  While more complex in its options of
319	   random access points, VVC retains this basic concept.  One remarkable
320	   difference of VVC is that a CLVS may start with a Gradual Decoding
321	   Refresh (GDR) picture, without requiring presence of traditional
322	   random access points in the bitstream, such as instantaneous decoding
323	   refresh (IDR) or clean random access (CRA) pictures.  In many TV-like
324	   applications, a CVS contains a few hundred milliseconds to a few
325	   seconds of video.  In video conferencing (without switching MCUs
326	   involved), a CVS can be as long in duration as the whole session.

328	   Picture and adaptation parameter set

330	   The picture parameter set and the adaptation parameter set (PPS and
331	   APS, respectively) carry information pertaining to zero or more
332	   pictures and zero or more slices, respectively.  The PPS contains
333	   information that is likely to stay constant from picture to picture,
334	   at least for pictures of a certain type, whereas the APS contains
335	   information, such as adaptive loop filter coefficients, that is
336	   likely to change from picture to picture or even within a picture.  A
337	   single APS is referenced by all slices of the same picture if that
338	   APS contains information about luma mapping with chroma scaling
339	   (LMCS) or scaling list.  Different APSs containing ALF parameters can
340	   be referenced by slices of the same picture.

342	   Picture header

344	   A Picture Header contains information that is common to all slices
345	   that belong to the same picture.  Being able to send that information
346	   as a separate NAL unit when pictures are split into several slices
347	   allows for saving bitrate, compared to repeating the same information
348	   in all slices.  However, there might be scenarios where low-bitrate
349	   video is transmitted using a single slice per picture.  Having a
350	   separate NAL unit to convey that information incurs overhead in such
351	   scenarios.  In those cases, the picture header syntax structure is
352	   included directly in the slice header instead of in its own NAL
353	   unit.  The mode of the picture header syntax structure being
354	   included in its own NAL unit or not can only be switched on/off for
355	   an entire CLVS, and can only be switched off when in the entire CLVS
356	   each picture contains only one slice.
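
The switching constraint above can be sketched as a simple check.  This
is a non-normative illustration; the function name and its input (a
list with the number of slices of each picture in one CLVS) are
hypothetical and chosen only for this example.

```python
from typing import Iterable


def ph_in_slice_header_allowed(slices_per_picture: Iterable[int]) -> bool:
    """Return True if the picture header may be carried in the slice header.

    Per the text above, that mode is only possible when every picture of
    the entire CLVS contains exactly one slice; otherwise the picture
    header has to be sent in its own NAL unit.
    """
    return all(count == 1 for count in slices_per_picture)
```

An encoder producing low-bitrate, single-slice pictures could run such
a check once per CLVS, since the mode cannot change within a CLVS.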

358	   Profile, tier, and level

360	   The profile, tier and level syntax structures in DCI, VPS and SPS
361	   contain profile, tier, level information for all layers that refer to
362	   the DCI, for layers associated with one or more output layer sets
363	   specified by the VPS, and for any layer that refers to the SPS,
364	   respectively.

366	   Sub-profiles

368	   Within the VVC specification, a sub-profile is a 32-bit number, coded
369	   according to ITU-T Rec. T.35, that does not carry semantics.  It is
370	   carried in the profile_tier_level structure and hence (potentially)
371	   present in the DCI, VPS, and SPS.  External registration bodies can
372	   register a T.35 codepoint with ITU-T registration authorities and
373	   associate with their registration a description of bitstream
374	   restrictions beyond the profiles defined by ITU-T and ISO/IEC.  This
375	   would allow encoder manufacturers to label the bitstreams generated
376	   by their encoder as complying with such a sub-profile.  It is
377	   expected that upstream standardization organizations (such as DVB and
378	   ATSC), as well as walled-garden video services, will take advantage
379	   of this labeling system.  In contrast to "normal" profiles, it is
380	   expected that sub-profiles may indicate encoder choices traditionally
381	   left open in the (decoder-centric) video coding specs, such as GOP
382	   structures, minimum/maximum QP values, and the mandatory use of
383	   certain tools or SEI messages.

385	   General constraint fields

387	   The profile_tier_level structure carries a considerable number of
388	   constraint fields (most of which are flags), which an encoder can use
389	   to indicate to a decoder that it will not use a certain tool or
390	   technology.  They were included in reaction to a perceived market
391	   need to label a bitstream as not exercising a certain tool that
392	   has become commercially unviable.

394	   Temporal scalability support

396	   VVC includes support of temporal scalability, by inclusion of the
397	   signaling of TemporalId in the NAL unit header, the restriction that
398	   pictures of a particular temporal sublayer cannot be used for inter
399	   prediction reference by pictures of a lower temporal sublayer, the
400	   sub-bitstream extraction process, and the requirement that each sub-
401	   bitstream extraction output be a conforming bitstream.  Media-Aware
402	   Network Elements (MANEs) can utilize the TemporalId in the NAL unit
403	   header for stream adaptation purposes based on temporal scalability.
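
As a non-normative sketch of how a MANE might use TemporalId, the code
below parses the two-byte VVC NAL unit header and drops NAL units above
a target temporal sublayer.  The bit layout used here (F, Z, 6-bit
LayerId, 5-bit Type, 3-bit TemporalId plus 1) is taken from the NAL unit
header description in Section 1.1.4 and the VVC specification, not from
this subsection.  The function and type names are hypothetical, and the
sketch operates on bare NAL units; it ignores RTP packetization
(aggregation packets and fragmentation units).

```python
from typing import Iterable, Iterator, NamedTuple


class NalHeader(NamedTuple):
    layer_id: int       # nuh_layer_id
    nal_unit_type: int  # nal_unit_type
    temporal_id: int    # nuh_temporal_id_plus1 - 1


def parse_nal_header(nal: bytes) -> NalHeader:
    """Parse the two-byte NAL unit header at the start of a NAL unit."""
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its two-byte header")
    byte0, byte1 = nal[0], nal[1]
    if byte0 & 0x80:
        raise ValueError("forbidden_zero_bit is set")
    tid_plus1 = byte1 & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(
        layer_id=byte0 & 0x3F,
        nal_unit_type=byte1 >> 3,
        temporal_id=tid_plus1 - 1,
    )


def drop_above_temporal_id(nals: Iterable[bytes], max_tid: int) -> Iterator[bytes]:
    """Yield only the NAL units whose TemporalId is at most max_tid."""
    for nal in nals:
        if parse_nal_header(nal).temporal_id <= max_tid:
            yield nal
```

Dropping higher sublayers is safe for decoding the remaining ones
because, as stated above, a lower temporal sublayer never references
pictures of a higher one.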

405	   Reference picture resampling (RPR)

407	   In AVC and HEVC, the spatial resolution of pictures cannot change
408	   unless a new sequence using a new SPS starts, with an Intra random
409	   access point (IRAP) picture.  VVC enables picture resolution change
410	   within a sequence at a position without encoding an IRAP picture,
411	   which is always intra-coded.  This feature is sometimes referred to
412	   as reference picture resampling (RPR), as the feature needs
413	   resampling of a reference picture used for inter prediction when that
414	   reference picture has a different resolution than the current picture
415	   being decoded.  RPR allows resolution change without the need of
416	   coding an IRAP picture and hence avoids a momentary bit rate spike
417	   caused by an IRAP picture in streaming or video conferencing
418	   scenarios, e.g., to cope with network condition changes.  RPR can
419	   also be used in application scenarios wherein zooming of the entire
420	   video region or some region of interest is needed.
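
The condition that triggers resampling can be illustrated with a small,
non-normative sketch.  It compares picture sizes only and uses exact
fractions; the function names are hypothetical.  As simplifying
assumptions, it ignores the scaling windows, the fixed-point arithmetic,
and the limits on scaling ratios that the VVC specification defines.

```python
from fractions import Fraction
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in luma samples


def needs_resampling(ref_size: Size, cur_size: Size) -> bool:
    """True when the reference picture resolution differs from the current one."""
    return ref_size != cur_size


def rpr_scale(ref_size: Size, cur_size: Size) -> Tuple[Fraction, Fraction]:
    """Return (horizontal, vertical) reference-to-current scale factors."""
    ref_w, ref_h = ref_size
    cur_w, cur_h = cur_size
    return Fraction(ref_w, cur_w), Fraction(ref_h, cur_h)
```

For example, after a switch from 1920x1080 to 960x540, a reference
picture coded before the switch has a factor of 2 in each dimension
relative to the current picture.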

422	   Spatial, SNR, and multiview scalability

424	   VVC includes support for spatial, SNR, and multiview scalability.
425	   Scalable video coding is widely considered to have technical benefits
426	   and to enrich services for various video applications.  Until
427	   recently, however, the functionality was not included in the first
428	   version of video codec specifications.  In VVC, all those forms of
429	   scalability are supported natively in the first version through the
430	   signaling of the nuh_layer_id in the NAL unit header, the VPS that
431	   associates layers with given nuh_layer_id values to each other,
432	   reference picture selection, reference picture resampling for spatial
433	   scalability, and a number of other mechanisms not relevant for this
434	   memo.

436	      Spatial scalability

438	         With the existence of Reference Picture Resampling (RPR), the
439	         additional burden for scalability support is just a
440	         modification of the high-level syntax (HLS).  The inter-layer
441	         prediction is employed in a scalable system to improve the
442	         coding efficiency of the enhancement layers.  In addition to
443	         the spatial and temporal motion-compensated predictions that
444	         are available in a single-layer codec, the inter-layer
445	         prediction in VVC uses the possibly resampled video data of the
446	         reconstructed reference picture from a reference layer to
447	         predict the current enhancement layer.  The resampling process
448	         for inter-layer prediction, when used, is performed at the
449	         block-level, reusing the existing interpolation process for
450	         motion compensation in single-layer coding.  It means that no
451	         additional resampling process is needed to support spatial
452	         scalability.

454	      SNR scalability

456	         SNR scalability is similar to spatial scalability except that
457	         the resampling factors are 1:1.  In other words, there is no
458	         change in resolution, but there is inter-layer prediction.

460	      Multiview scalability

462	         The first version of VVC also supports multiview scalability,
463	         wherein a multi-layer bitstream carries layers representing
464	         multiple views, and one or more of the represented views can be
465	         output at the same time.

467	   SEI messages

469	   Supplemental enhancement information (SEI) messages carry information
470	   in the bitstream that does not influence the decoding process as
471	   specified in the VVC spec but addresses issues of representation/
472	   rendering of the decoded bitstream, labels the bitstream for certain
473	   applications, or serves similar purposes.  The overall concept of SEI
474	   messages and many of the messages themselves have been inherited from
475	   the AVC and HEVC specs.  Except for the SEI messages that affect the
476	   specification of the hypothetical reference decoder (HRD), other SEI
477	   messages for use in the VVC environment, which are generally useful
478	   also in other video coding technologies, are not included in the main
479	   VVC specification but in a companion specification [VSEI].

481	1.1.3.  High-Level Picture Partitioning (informative)

483	   VVC inherited the concept of tiles and wavefront parallel processing
484	   (WPP) from HEVC, with some minor to moderate differences.  The basic
485	   concept of slices was kept in VVC but redesigned in an essentially
486	   different form.  VVC is the first video coding standard that includes
487	   subpictures as a feature, which provide the same functionality as
488	   HEVC motion-constrained tile sets (MCTSs) but are designed
489	   differently to have better coding efficiency and to be friendlier for
490	   usage in application systems.  More details of these differences are
491	   described below.

493	   Tiles and WPP

495	   As in HEVC, a picture can be split into tile rows and tile columns in
496	   VVC, in-picture prediction across tile boundaries is disallowed, and
497	   so on.  However, the syntax for signaling of tile
498	   partitioning has been simplified, by using a unified syntax design
499	   for both the uniform and the non-uniform mode.  In addition,
500	   signaling of entry point offsets for tiles in the slice header is
501	   optional in VVC while it is mandatory in HEVC.  The WPP design in VVC
502	   has two differences compared to HEVC: i) The CTU row delay is reduced
503	   from two CTUs to one CTU; ii) signaling of entry point offsets for
504	   WPP in the slice header is optional in VVC while it is mandatory in
505	   HEVC.
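
The effect of the reduced CTU row delay can be illustrated with an
idealized, non-normative wavefront schedule.  The sketch assumes one
thread per CTU row, one time step per CTU, and enough threads for all
rows; these assumptions and the function names are ours, not part of
VVC or of this memo.

```python
def wpp_start_step(row: int, col: int, delay_ctus: int) -> int:
    """Earliest time step at which CTU (row, col) can start.

    Each row starts once the row above is delay_ctus CTUs ahead:
    2 for HEVC and 1 for VVC, per the text above.
    """
    return row * delay_ctus + col


def wpp_total_steps(width_ctus: int, height_ctus: int, delay_ctus: int) -> int:
    """Number of time steps needed to process the whole picture."""
    last_start = wpp_start_step(height_ctus - 1, width_ctus - 1, delay_ctus)
    return last_start + 1
```

Under these assumptions, a picture of 30x17 CTUs needs 62 steps with a
two-CTU delay and 46 steps with a one-CTU delay.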

507	   Slices

509	   In VVC, the conventional slices based on CTUs (as in HEVC) or
510	   macroblocks (as in AVC) have been removed.  The main reasoning behind
511	   this architectural change is as follows.  The advances in video
512	   coding since 2003 (the publication year of AVC v1) have been such
513	   that slice-based error concealment has become practically impossible,
514	   due to the ever-increasing number and efficiency of in-picture and
515	   inter-picture prediction mechanisms.  An error-concealed picture is
516	   the decoding result of a transmitted coded picture for which there is
517	   some data loss (e.g., loss of some slices) of the coded picture or a
518	   reference picture for at least some part of the coded picture is not
519	   error-free (e.g., that reference picture was an error-concealed
520	   picture).  For example, when one of the multiple slices of a picture
521	   is lost, it may be error-concealed using an interpolation of the
522	   neighboring slices.  While advanced video coding prediction
523	   mechanisms provide significantly higher coding efficiency, they also
524	   make it harder for machines to estimate the quality of an error-
525	   concealed picture, which was already a hard problem with the use of
526	   simpler prediction mechanisms.  Advanced in-picture prediction
527	   mechanisms also cause the coding efficiency loss due to splitting a
528	   picture into multiple slices to be more significant.  Furthermore,
529	   network conditions have become significantly better, while at the
530	   same time techniques for dealing with packet losses have become much
531	   improved.  As a result, very few implementations have recently used
532	   slices for maximum transmission unit size matching.  Instead,
533	   substantially all applications where low-delay error resilience is
534	   required (e.g., video telephony and video conferencing) rely on
535	   system/transport-level error resilience (e.g., retransmission,
536	   forward error correction) and/or picture-based error resilience tools
537	   (feedback-based error resilience, insertion of IRAPs, scalability
538	   with higher protection level of the base layer, and so on).
539	   Considering all the above, nowadays it is very rare that a picture
540	   that cannot be correctly decoded is passed to the decoder, and when
541	   such a rare case occurs, the system can afford to wait for an error-
542	   free picture to be decoded and available for display without
543	   resulting in frequent and long periods of picture freezing seen by
544	   end users.

546	   Slices in VVC have two modes: rectangular slices and raster-scan
547	   slices.  The rectangular slice, as indicated by its name, covers a
548	   rectangular region of the picture.  Typically, a rectangular slice
549	   consists of several complete tiles.  However, it is also possible
550	   that a rectangular slice is a subset of a tile and consists of one or
551	   more consecutive, complete CTU rows within a tile.  A raster-scan
552	   slice consists of one or more complete tiles in a tile raster scan
553	   order; hence, the region covered by a raster-scan slice need not be
554	   rectangular, although it may also happen to have
555	   the shape of a rectangle.  The concept of slices in VVC is therefore
556	   strongly linked to or based on tiles instead of CTUs (as in HEVC) or
557	   macroblocks (as in AVC).

559	   Subpictures

561	   VVC is the first video coding standard that includes the support of
562	   subpictures as a feature.  Each subpicture consists of one or more
563	   complete rectangular slices that collectively cover a rectangular
564	   region of the picture.  A subpicture may be either specified to be
565	   extractable (i.e., coded independently of other subpictures of the
566	   same picture and of earlier pictures in decoding order) or not
567	   extractable.  Regardless of whether a subpicture is extractable or
568	   not, the encoder can control whether in-loop filtering (including
569	   deblocking, SAO, and ALF) is applied across the subpicture boundaries
570	   individually for each subpicture.

572	   Functionally, subpictures are similar to the motion-constrained tile
573	   sets (MCTSs) in HEVC.  They both allow independent coding and
574	   extraction of a rectangular subset of a sequence of coded pictures,
575	   for use cases like viewport-dependent 360-degree video streaming
576	   optimization and region of interest (ROI) applications.

578	   There are several important design differences between subpictures
579	   and MCTSs.  First, the subpictures feature in VVC allows motion
580	   vectors of a coding block to point outside of the subpicture, even
581	   when the subpicture is extractable, by applying sample padding at
582	   subpicture boundaries in this case, similar to the padding at picture
583	   boundaries.  Second, additional changes were introduced for the
584	   selection and derivation of motion vectors in the merge mode and in
585	   the decoder side motion vector refinement process of VVC.  This
586	   allows higher coding efficiency compared to the non-normative motion
587	   constraints applied at the encoder-side for MCTSs.  Third, rewriting
588	   of SHs (and PH NAL units, when present) is not needed when extracting
589	   one or more extractable subpictures from a sequence of pictures to
590	   create a sub-bitstream that is a conforming bitstream.  In sub-
591	   bitstream extractions based on HEVC MCTSs, rewriting of SHs is
592	   needed.  Note that in both HEVC MCTSs extraction and VVC subpictures
593	   extraction, rewriting of SPSs and PPSs is needed.  However, typically
594	   there are only a few parameter sets in a bitstream, while each
595	   picture has at least one slice, therefore rewriting of SHs can be a
596	   significant burden for application systems.  Fourth, slices of
597	   different subpictures within a picture are allowed to have different
598	   NAL unit types.  Fifth, VVC specifies HRD and level definitions for
599	   subpicture sequences, thus the conformance of the sub-bitstream of
600	   each extractable subpicture sequence can be ensured by encoders.

602	1.1.4.  NAL Unit Header

604	   VVC maintains the NAL unit concept of HEVC with modifications.  VVC
605	   uses a two-byte NAL unit header, as shown in Figure 1.  The payload
606	   of a NAL unit refers to the NAL unit excluding the NAL unit header.

608	                     +---------------+---------------+
609	                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
610	                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
611	                     |F|Z| LayerID   |  Type   | TID |
612	                     +---------------+---------------+

614	                   The Structure of the VVC NAL Unit Header.

616	                                  Figure 1

618	   The semantics of the fields in the NAL unit header are as specified
619	   in VVC and described briefly below for convenience.  In addition to
620	   the name and size of each field, the corresponding syntax element
621	   name in VVC is also provided.

623	   F: 1 bit

625	      forbidden_zero_bit.  Required to be zero in VVC.  Note that the
626	      inclusion of this bit in the NAL unit header was to enable
627	      transport of VVC video over MPEG-2 transport systems (avoidance of
628	      start code emulations) [MPEG2S].  In the context of this memo the
629	      value 1 may be used to indicate a syntax violation, e.g., for a
630	      NAL unit resulting from aggregating a number of fragmented units
631	      of a NAL unit but missing the last fragment, as described in the
632	      last sentence of Section 4.3.3.

634	   Z: 1 bit

636	      nuh_reserved_zero_bit.  Required to be zero in VVC, and reserved
637	      for future extensions by ITU-T and ISO/IEC.
638	      This memo does not overload the "Z" bit for local extensions, as
639	      a) overloading the "F" bit is sufficient and b) to preserve the
640	      usefulness of this memo to possible future versions of [VVC].

642	   LayerId: 6 bits

644	      nuh_layer_id.  Identifies the layer a NAL unit belongs to, wherein
645	      a layer may be, e.g., a spatial scalable layer, a quality scalable
646	      layer, a layer containing a different view, etc.

648	   Type: 5 bits

650	      nal_unit_type.  This field specifies the NAL unit type as defined
651	      in Table 5 of [VVC].  For a reference of all currently defined NAL
652	      unit types and their semantics, please refer to Section 7.4.2.2 in
653	      [VVC].

655	   TID: 3 bits

657	      nuh_temporal_id_plus1.  This field specifies the temporal
658	      identifier of the NAL unit plus 1.  The value of TemporalId is
659	      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
660	      there is at least one bit in the NAL unit header equal to 1, so
661	      as to enable the consideration of start code emulations in the NAL
662	      unit payload data independent of the NAL unit header.
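
	   Informative example: The field layout above can be illustrated
	   with a short Python sketch that splits the two header bytes
	   into F, Z, LayerId, Type, and TID.  The class and function
	   names are hypothetical and chosen only for this illustration;
	   the sketch is not part of VVC or of this memo.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    """Fields of the two-byte VVC NAL unit header (illustrative names)."""
    f: int          # forbidden_zero_bit, 1 bit
    z: int          # nuh_reserved_zero_bit, 1 bit
    layer_id: int   # nuh_layer_id, 6 bits
    nal_type: int   # nal_unit_type, 5 bits
    tid: int        # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is equal to TID minus 1.
        return self.tid - 1


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the first two bytes of a NAL unit into header fields."""
    if len(data) < 2:
        raise ValueError("a VVC NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    header = NalUnitHeader(
        f=b0 >> 7,
        z=(b0 >> 6) & 0x01,
        layer_id=b0 & 0x3F,
        nal_type=b1 >> 3,
        tid=b1 & 0x07,
    )
    if header.tid == 0:
        # A TID value of 0 is illegal.
        raise ValueError("TID value 0 is illegal")
    return header
```

	   For instance, the bytes 0x00 0x41 decode to LayerId 0, Type 8,
	   and TID 1 (TemporalId 0).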

664	1.2.  Overview of the Payload Format

666	   This payload format defines the following processes required for
667	   transport of VVC coded data over RTP [RFC3550]:

669	   *  Usage of RTP header with this payload format
670	   *  Packetization of VVC coded NAL units into RTP packets using three
671	      types of payload structures: a single NAL unit packet, aggregation
672	      packet, and fragment unit

674	   *  Transmission of VVC NAL units of the same bitstream within a
675	      single RTP stream

677	   *  Media type parameters to be used with the Session Description
678	      Protocol (SDP) [RFC8866]

680	   *  Usage of RTCP feedback messages

682	2.  Conventions

684	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
685	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
686	   "OPTIONAL" in this document are to be interpreted as described in BCP
687	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
688	   capitals, as shown here.

690	3.  Definitions and Abbreviations

692	3.1.  Definitions

694	   This document uses the terms and definitions of VVC.  Section 3.1.1
695	   lists relevant definitions from [VVC] for convenience.  Section 3.1.2
696	   provides definitions specific to this memo.  All terms and
697	   definitions in Section 3.1.1 are verbatim copies from [VVC].

699	3.1.1.  Definitions from the VVC Specification

701	   Access unit (AU): A set of PUs that belong to different layers and
702	   contain coded pictures associated with the same time for output from
703	   the DPB.

705	   Adaptation parameter set (APS): A syntax structure containing syntax
706	   elements that apply to zero or more slices as determined by zero or
707	   more syntax elements found in slice headers.

709	   Bitstream: A sequence of bits, in the form of a NAL unit stream or a
710	   byte stream, that forms the representation of a sequence of AUs
711	   forming one or more coded video sequences (CVSs).

713	   Coded picture: A coded representation of a picture comprising VCL NAL
714	   units with a particular value of nuh_layer_id within an AU and
715	   containing all CTUs of the picture.

717	   Clean random access (CRA) PU: A PU in which the coded picture is a
718	   CRA picture.

720	   Clean random access (CRA) picture: An IRAP picture for which each VCL
721	   NAL unit has nal_unit_type equal to CRA_NUT.

723	   Coded video sequence (CVS): A sequence of AUs that consists, in
724	   decoding order, of a CVSS AU, followed by zero or more AUs that are
725	   not CVSS AUs, including all subsequent AUs up to but not including
726	   any subsequent AU that is a CVSS AU.

728	   Coded video sequence start (CVSS) AU: An AU in which there is a PU
729	   for each layer in the CVS and the coded picture in each PU is a CLVSS
730	   picture.

732	   Coded layer video sequence (CLVS): A sequence of PUs with the same
733	   value of nuh_layer_id that consists, in decoding order, of a CLVSS
734	   PU, followed by zero or more PUs that are not CLVSS PUs, including
735	   all subsequent PUs up to but not including any subsequent PU that is
736	   a CLVSS PU.

738	   Coded layer video sequence start (CLVSS) PU: A PU in which the coded
739	   picture is a CLVSS picture.

741	   Coded layer video sequence start (CLVSS) picture: A coded picture
742	   that is an IRAP picture with NoOutputBeforeRecoveryFlag equal to 1 or
743	   a GDR picture with NoOutputBeforeRecoveryFlag equal to 1.

745	   Coding tree unit (CTU): A CTB of luma samples, two corresponding CTBs
746	   of chroma samples of a picture that has three sample arrays, or a CTB
747	   of samples of a monochrome picture or a picture that is coded using
748	   three separate colour planes and syntax structures used to code the
749	   samples.

751	   Decoding Capability Information (DCI): A syntax structure containing
752	   syntax elements that apply to the entire bitstream.

754	   Decoded picture buffer (DPB): A buffer holding decoded pictures for
755	   reference, output reordering, or output delay specified for the
756	   hypothetical reference decoder.

758	   Gradual decoding refresh (GDR) picture: A picture for which each VCL
759	   NAL unit has nal_unit_type equal to GDR_NUT.

761	   Instantaneous decoding refresh (IDR) PU: A PU in which the coded
762	   picture is an IDR picture.

764	   Instantaneous decoding refresh (IDR) picture: An IRAP picture for
765	   which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or
766	   IDR_N_LP.

768	   Intra random access point (IRAP) AU: An AU in which there is a PU for
769	   each layer in the CVS and the coded picture in each PU is an IRAP
770	   picture.

772	   Intra random access point (IRAP) PU: A PU in which the coded picture
773	   is an IRAP picture.

775	   Intra random access point (IRAP) picture: A coded picture for which
776	   all VCL NAL units have the same value of nal_unit_type in the range
777	   of IDR_W_RADL to CRA_NUT, inclusive.

779	   Layer: A set of VCL NAL units that all have a particular value of
780	   nuh_layer_id and the associated non-VCL NAL units.

782	   Network abstraction layer (NAL) unit: A syntax structure containing
783	   an indication of the type of data to follow and bytes containing that
784	   data in the form of an RBSP interspersed as necessary with emulation
785	   prevention bytes.

787	   Network abstraction layer (NAL) unit stream: A sequence of NAL units.

789	   Output Layer Set (OLS): A set of layers for which one or more layers
790	   are specified as the output layers.

792	   Operation point (OP): A temporal subset of an OLS, identified by an
793	   OLS index and a highest value of TemporalId.

795	   Picture parameter set (PPS): A syntax structure containing syntax
796	   elements that apply to zero or more entire coded pictures as
797	   determined by a syntax element found in each slice header.

799	   Picture unit (PU): A set of NAL units that are associated with each
800	   other according to a specified classification rule, are consecutive
801	   in decoding order, and contain exactly one coded picture.

803	   Random access: The act of starting the decoding process for a
804	   bitstream at a point other than the beginning of the stream.

806	   Sequence parameter set (SPS): A syntax structure containing syntax
807	   elements that apply to zero or more entire CLVSs as determined by the
808	   content of a syntax element found in the PPS referred to by a syntax
809	   element found in each picture header.

811	   Slice: An integer number of complete tiles or an integer number of
812	   consecutive complete CTU rows within a tile of a picture that are
813	   exclusively contained in a single NAL unit.

815	   Slice header (SH): A part of a coded slice containing the data
816	   elements pertaining to all tiles or CTU rows within a tile
817	   represented in the slice.

819	   Sublayer: A temporal scalable layer of a temporal scalable bitstream
820	   consisting of VCL NAL units with a particular value of the TemporalId
821	   variable, and the associated non-VCL NAL units.

823	   Subpicture: A rectangular region of one or more slices within a
824	   picture.

826	   Sublayer representation: A subset of the bitstream consisting of NAL
827	   units of a particular sublayer and the lower sublayers.

829	   Tile: A rectangular region of CTUs within a particular tile column
830	   and a particular tile row in a picture.

832	   Tile column: A rectangular region of CTUs having a height equal to
833	   the height of the picture and a width specified by syntax elements in
834	   the picture parameter set.

836	   Tile row: A rectangular region of CTUs having a height specified by
837	   syntax elements in the picture parameter set and a width equal to the
838	   width of the picture.

840	   Video coding layer (VCL) NAL unit: A collective term for coded slice
841	   NAL units and the subset of NAL units that have reserved values of
842	   nal_unit_type that are classified as VCL NAL units in this
843	   Specification.

845	3.1.2.  Definitions Specific to This Memo

847	   Media-Aware Network Element (MANE): A network element, such as a
848	   middlebox, selective forwarding unit, or application-layer gateway
849	   that is capable of parsing certain aspects of the RTP payload headers
850	   or the RTP payload and reacting to their contents.

852	      Informative note: The concept of a MANE goes beyond normal routers
853	      or gateways in that a MANE has to be aware of the signaling (e.g.,
854	      to learn about the payload type mappings of the media streams),
855	      and in that it has to be trusted when working with Secure RTP
856	      (SRTP).  The advantage of using MANEs is that they allow packets
857	      to be dropped according to the needs of the media coding.  For
858	      example, if a MANE has to drop packets due to congestion on a
859	      certain link, it can identify and remove those packets whose
860	      elimination produces the least adverse effect on the user
861	      experience.  After dropping packets, MANEs must rewrite RTCP
862	      packets to match the changes to the RTP stream, as specified in
863	      Section 7 of [RFC3550].

865	   NAL unit decoding order: A NAL unit order that conforms to the
866	   constraints on NAL unit order given in Section 7.4.2.4 of [VVC] and
867	   follows the order of NAL units in the bitstream.

869	   RTP stream (See [RFC7656]): Within the scope of this memo, one RTP
870	   stream is utilized to transport a VVC bitstream, which may contain
871	   one or more layers, and each layer may contain one or more temporal
872	   sublayers.

874	   Transmission order: The order of packets in ascending RTP sequence
875	   number order (in modulo arithmetic).  Within an aggregation packet,
876	   the NAL unit transmission order is the same as the order of
877	   appearance of NAL units in the packet.

879	3.2.  Abbreviations

881	   AU         Access Unit

883	   AP         Aggregation Packet

885	   APS        Adaptation Parameter Set

887	   CTU        Coding Tree Unit

889	   CVS        Coded Video Sequence

891	   DPB        Decoded Picture Buffer

893	   DCI        Decoding Capability Information

895	   DON        Decoding Order Number

897	   FIR        Full Intra Request

899	   FU         Fragmentation Unit

901	   GDR        Gradual Decoding Refresh

903	   HRD        Hypothetical Reference Decoder

905	   IDR        Instantaneous Decoding Refresh
906	   IRAP       Intra Random Access Point

908	   MANE       Media-Aware Network Element

910	   MTU        Maximum Transmission Unit

912	   NAL        Network Abstraction Layer

914	   NALU       Network Abstraction Layer Unit

916	   OLS        Output Layer Set

918	   PLI        Picture Loss Indication

920	   PPS        Picture Parameter Set

922	   RPSI       Reference Picture Selection Indication

924	   SEI        Supplemental Enhancement Information

926	   SLI        Slice Loss Indication

928	   SPS        Sequence Parameter Set

930	   VCL        Video Coding Layer

932	   VPS        Video Parameter Set

934	4.  RTP Payload Format

936	4.1.  RTP Header Usage

938	   The format of the RTP header is specified in [RFC3550] (reprinted as
939	   Figure 2 for convenience).  This payload format uses the fields of
940	   the header in a manner consistent with that specification.

942	   The RTP payload (and the settings for some RTP header bits) for
943	   aggregation packets and fragmentation units are specified in
944	   Section 4.3.2 and Section 4.3.3, respectively.

946	       0                   1                   2                   3
947	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
948	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
949	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
950	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
951	      |                           timestamp                           |
952	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
953	      |           synchronization source (SSRC) identifier            |
954	      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
955	      |            contributing source (CSRC) identifiers             |
956	      |                             ....                              |
957	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

959	                        RTP Header According to [RFC3550]

961	                                  Figure 2

963	   For this RTP payload format, the RTP header fields are set as
964	   follows:

966	   Marker bit (M): 1 bit

968	      Set for the last packet, in transmission order, among each set of
969	      packets that contain NAL units of one access unit.  This is in
970	      line with the normal use of the M bit in video formats to allow an
971	      efficient playout buffer handling.

973	   Payload Type (PT): 7 bits

975	      The assignment of an RTP payload type for this new packet format
976	      is outside the scope of this document and will not be specified
977	      here.  The assignment of a payload type has to be performed either
978	      through the profile used or in a dynamic way.

980	   Sequence Number (SN): 16 bits

982	      Set and used in accordance with [RFC3550].

984	   Timestamp: 32 bits
985	      The RTP timestamp is set to the sampling timestamp of the content.
986	      A 90 kHz clock rate MUST be used.  If the NAL unit has no timing
987	      properties of its own (e.g., parameter set and SEI NAL units), the
988	      RTP timestamp MUST be set to the RTP timestamp of the coded
989	      pictures of the access unit in which the NAL unit (according to
990	      Section 7.4.2.4 of [VVC]) is included.  Receivers MUST use the RTP
991	      timestamp for the display process, even when the bitstream
992	      contains picture timing SEI messages or decoding unit information
993	      SEI messages as specified in [VVC].

995	      Informative note: When picture timing SEI messages are present,
996	         the RTP sender is responsible for ensuring that the RTP
997	         timestamps are consistent with the timing information carried
998	         in the picture timing SEI messages.

1000	   Synchronization source (SSRC): 32 bits

1002	      Used to identify the source of the RTP packets.  A single SSRC is
1003	      used for all parts of a single bitstream.
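
	   Informative example: The header fields discussed above can be
	   read from the 12-byte fixed RTP header as in the following
	   Python sketch, which also converts a sampling time in seconds
	   to 90 kHz timestamp ticks.  All names are hypothetical.  The
	   initial_offset parameter is an assumption standing in for the
	   sender's initial timestamp value, which this section does not
	   discuss.

```python
import struct
from typing import NamedTuple

VVC_RTP_CLOCK_RATE = 90_000  # A 90 kHz clock rate MUST be used.


class RtpFixedHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_fixed_header(packet: bytes) -> RtpFixedHeader:
    """Parse the fixed part of the RTP header; CSRC entries are not read."""
    if len(packet) < 12:
        raise ValueError("the fixed RTP header is 12 bytes long")
    # Network byte order: two single bytes, 16-bit SN, 32-bit TS, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpFixedHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),      # M bit: last packet of an access unit
        payload_type=b1 & 0x7F,      # 7-bit PT
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


def rtp_timestamp(sampling_time_seconds: float, initial_offset: int = 0) -> int:
    """Map a sampling time to a 32-bit RTP timestamp at 90 kHz."""
    ticks = round(sampling_time_seconds * VVC_RTP_CLOCK_RATE)
    return (initial_offset + ticks) % (1 << 32)
```

	   With this sketch, pictures sampled one second apart differ by
	   90000 timestamp ticks, and the timestamp wraps modulo 2^32.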

1005	4.2.  Payload Header Usage

1007	   The first two bytes of the payload of an RTP packet are referred to
1008	   as the payload header.  The payload header consists of the same
1009	   fields (F, Z, LayerId, Type, and TID) as the NAL unit header as shown
1010	   in Section 1.1.4, irrespective of the type of the payload structure.

1012	   The TID value indicates (among other things) the relative importance
1013	   of an RTP packet, for example, because NAL units belonging to higher
1014	   temporal sublayers are not used for the decoding of lower temporal
1015	   sublayers.  A lower value of TID indicates a higher importance.
1016	   More-important NAL units MAY be better protected against transmission
1017	   losses than less-important NAL units.

1019	4.3.  Payload Structures

1021	   Three different types of RTP packet payload structures are specified.
1022	   A receiver can identify the type of an RTP packet payload through the
1023	   Type field in the payload header.

1025	   The three different payload structures are as follows:

1027	   *  Single NAL unit packet: Contains a single NAL unit in the payload,
1028	      and the NAL unit header of the NAL unit also serves as the payload
1029	      header.  This payload structure is specified in Section 4.3.1.

1031	   *  Aggregation Packet (AP): Contains more than one NAL unit within
1032	      one access unit.  This payload structure is specified in
1033	      Section 4.3.2.

1035	   *  Fragmentation Unit (FU): Contains a subset of a single NAL unit.
1036	      This payload structure is specified in Section 4.3.3.
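
	   Informative example: A receiver might select the payload
	   structure from the Type field as in the following Python
	   sketch.  The Type value 28 for APs is stated in
	   Section 4.3.2; the value 29 for FUs is taken from
	   Section 4.3.3.  The function name and the returned labels are
	   hypothetical, and treating every other Type value as a single
	   NAL unit packet is a simplifying assumption of this sketch.

```python
AP_TYPE = 28  # aggregation packet, Section 4.3.2
FU_TYPE = 29  # fragmentation unit, Section 4.3.3


def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the two-byte payload header")
    # Type occupies the five most significant bits of the second byte.
    nal_type = payload[1] >> 3
    if nal_type == AP_TYPE:
        return "AP"
    if nal_type == FU_TYPE:
        return "FU"
    # Assumption: values that this memo does not assign are not checked here.
    return "single NAL unit packet"
```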

1038	4.3.1.  Single NAL Unit Packets

1040	   A single NAL unit packet contains exactly one NAL unit, and consists
1041	   of a payload header (denoted as PayloadHdr), a conditional 16-bit
1042	   DONL field (in network byte order), and the NAL unit payload data
1043	   (the NAL unit excluding its NAL unit header) of the contained NAL
1044	   unit, as shown in Figure 3.

1046	      0                   1                   2                   3
1047	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1048	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1049	     |           PayloadHdr          |      DONL (conditional)       |
1050	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1051	     |                                                               |
1052	     |                  NAL unit payload data                        |
1053	     |                                                               |
1054	     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1055	     |                               :...OPTIONAL RTP padding        |
1056	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1058	                  The Structure of a Single NAL Unit Packet

1060	                                  Figure 3

1062	   The DONL field, when present, specifies the value of the 16 least
1063	   significant bits of the decoding order number of the contained NAL
1064	   unit.  If sprop-max-don-diff is greater than 0, the DONL field MUST
1065	   be present, and the variable DON for the contained NAL unit is
1066	   derived as equal to the value of the DONL field.  Otherwise (sprop-
1067	   max-don-diff is equal to 0), the DONL field MUST NOT be present.
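
	   Informative example: The following Python sketch recovers the
	   NAL unit and, when present, the DON from a single NAL unit
	   packet.  It assumes that the RTP header and any RTP padding
	   have already been removed and that the value of
	   sprop-max-don-diff is known from signaling.  The function name
	   is hypothetical.

```python
from typing import Optional, Tuple


def parse_single_nal_unit_packet(
    payload: bytes, sprop_max_don_diff: int
) -> Tuple[bytes, Optional[int]]:
    """Return (NAL unit, DON); DON is None when no DONL field is present."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the two-byte payload header")
    payload_hdr = payload[:2]  # also serves as the NAL unit header
    offset = 2
    don = None
    if sprop_max_don_diff > 0:
        # The DONL field MUST be present; DON equals the DONL value.
        if len(payload) < 4:
            raise ValueError("DONL field is missing")
        don = int.from_bytes(payload[2:4], "big")
        offset = 4
    # The NAL unit is the header followed by the NAL unit payload data.
    nal_unit = payload_hdr + payload[offset:]
    return nal_unit, don
```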

1069	4.3.2.  Aggregation Packets (APs)

1071	   Aggregation Packets (APs) can reduce packetization overhead for small
1072	   NAL units, such as most of the non-VCL NAL units, which are often
1073	   only a few octets in size.

1075	   An AP aggregates NAL units of one access unit and it MUST NOT contain
1076	   NAL units from more than one AU.  Each NAL unit to be carried in an
1077	   AP is encapsulated in an aggregation unit.  NAL units aggregated in
1078	   one AP are included in NAL unit decoding order.

1080	   An AP consists of a payload header (denoted as PayloadHdr) followed
1081	   by two or more aggregation units, as shown in Figure 4.

1083	     0                   1                   2                   3
1084	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1085	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1086	    |    PayloadHdr (Type=28)       |                               |
1087	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1088	    |                                                               |
1089	    |             two or more aggregation units                     |
1090	    |                                                               |
1091	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1092	    |                               :...OPTIONAL RTP padding        |
1093	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1095	                   The Structure of an Aggregation Packet

1097	                                  Figure 4

1099	   The fields in the payload header of an AP are set as follows.  The F
1100	   bit MUST be equal to 0 if the F bit of each aggregated NAL unit is
1101	   equal to zero; otherwise, it MUST be equal to 1.  The Type field MUST
1102	   be equal to 28.

1104	   The value of LayerId MUST be equal to the lowest value of LayerId of
1105	   all the aggregated NAL units.  The value of TID MUST be the lowest
1106	   value of TID of all the aggregated NAL units.

1108	      Informative note: All VCL NAL units in an AP have the same TID
1109	      value since they belong to the same access unit.  However, an AP
1110	      may contain non-VCL NAL units for which the TID value in the NAL
1111	      unit header may be different than the TID value of the VCL NAL
1112	      units in the same AP.

1114	      Informative Note: If a system envisions sub-picture level or
1115	      picture level modifications, for example by removing sub-pictures
1116	      or pictures of a particular layer, a good design choice on the
1117	      sender's side would be to aggregate NAL units belonging to only
1118	      the same sub-picture or picture of a particular layer.
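
	   Informative example: The rules above for the F, Type, LayerId,
	   and TID fields of the AP payload header can be sketched in
	   Python as follows.  The function name is hypothetical, the
	   input is assumed to be a list of complete NAL units (each
	   starting with its two-byte NAL unit header), and the Z bit is
	   left at zero.

```python
from typing import Sequence

AP_TYPE = 28


def build_ap_payload_header(nal_units: Sequence[bytes]) -> bytes:
    """Derive the two-byte AP payload header from the aggregated NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f = 0
    layer_id = 0x3F  # largest 6-bit value, lowered below
    tid = 0x07       # largest 3-bit value, lowered below
    for nalu in nal_units:
        if len(nalu) < 2:
            raise ValueError("NAL unit is shorter than its header")
        f |= nalu[0] >> 7                        # 1 if any F bit is 1
        layer_id = min(layer_id, nalu[0] & 0x3F)  # lowest LayerId
        tid = min(tid, nalu[1] & 0x07)            # lowest TID
    byte0 = (f << 7) | layer_id                  # Z bit stays 0
    byte1 = (AP_TYPE << 3) | tid
    return bytes([byte0, byte1])
```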

1120	   An AP MUST carry at least two aggregation units and can carry as many
1121	   aggregation units as necessary; however, the total amount of data in
1122	   an AP obviously MUST fit into an IP packet, and the size SHOULD be
1123	   chosen so that the resulting IP packet is smaller than the MTU size
1124	   so as to avoid IP layer fragmentation.  An AP MUST NOT contain FUs
1125	   specified in Section 4.3.3.  APs MUST NOT be nested; i.e., an AP
1126	   cannot contain another AP.

1128	   The first aggregation unit in an AP consists of a conditional 16-bit
1129	   DONL field (in network byte order) followed by a 16-bit unsigned size
1130	   information (in network byte order) that indicates the size of the
1131	   NAL unit in bytes (excluding these two octets, but including the NAL
1132	   unit header), followed by the NAL unit itself, including its NAL unit
1133	   header, as shown in Figure 5.

1135	     0                   1                   2                   3
1136	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1137	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1138	    |               :       DONL (conditional)      |   NALU size   |
1139	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1140	    |   NALU size   |                                               |
1141	    +-+-+-+-+-+-+-+-+         NAL unit                              |
1142	    |                                                               |
1143	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1144	    |                               :
1145	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1147	           The Structure of the First Aggregation Unit in an AP

1149	                                  Figure 5

1151	   The DONL field, when present, specifies the value of the 16 least
1152	   significant bits of the decoding order number of the aggregated NAL
1153	   unit.

1155	   If sprop-max-don-diff is greater than 0, the DONL field MUST be
1156	   present in an aggregation unit that is the first aggregation unit in
1157	   an AP, and the variable DON for the aggregated NAL unit is derived as
1158	   equal to the value of the DONL field.  The variable DON for the NAL
1159	   unit in an aggregation unit that is not the first aggregation unit
1160	   in the AP is derived as equal to the DON of the preceding
1161	   aggregated NAL unit in the same AP plus 1 modulo 65536.  Otherwise
1162	   (sprop-max-don-diff is equal to 0), the DONL field MUST NOT be
1163	   present in an aggregation unit that is the first aggregation unit in
1164	   an AP.

1166	   An aggregation unit that is not the first aggregation unit in an AP
1167	   consists of a 16-bit unsigned size field (in network byte order)
1168	   that indicates the size of the NAL unit in bytes (excluding these
1169	   two octets, but including the NAL unit header), followed by the
1170	   NAL unit itself, including its NAL unit header, as shown in
1171	   Figure 6.

1173	     0                   1                   2                   3
1174	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1175	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1176	    |               :       NALU size               |   NAL unit    |
1177	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1178	    |                                                               |
1179	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1180	    |                               :
1181	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1183	         The Structure of an Aggregation Unit That Is Not the First
1184	                          Aggregation Unit in an AP

1186	                                  Figure 6

1188	   Figure 7 presents an example of an AP that contains two aggregation
1189	   units, labeled as 1 and 2 in the figure, without the DONL field being
1190	   present.

1192	     0                   1                   2                   3
1193	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1194	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1195	    |                          RTP Header                           |
1196	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1197	    |   PayloadHdr (Type=28)        |         NALU 1 Size           |
1198	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1199	    |          NALU 1 HDR           |                               |
1200	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1201	    |                   . . .                                       |
1202	    |                                                               |
1203	    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1204	    |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1205	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1206	    | NALU 2 HDR    |                                               |
1207	    +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1208	    |                   . . .                                       |
1209	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1210	    |                               :...OPTIONAL RTP padding        |
1211	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1213	               An Example of an AP Packet Containing
1214	             Two Aggregation Units without the DONL Field

1216	                                  Figure 7

1218	   Figure 8 presents an example of an AP that contains two aggregation
1219	   units, labeled as 1 and 2 in the figure, with the DONL field being
1220	   present.

1222	     0                   1                   2                   3
1223	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1224	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1225	    |                          RTP Header                           |
1226	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1227	    |   PayloadHdr (Type=28)        |        NALU 1 DONL            |
1228	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1229	    |          NALU 1 Size          |            NALU 1 HDR         |
1230	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1231	    |                                                               |
1232	    |                 NALU 1 Data   . . .                           |
1233	    |                                                               |
1234	    +        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1235	    |                               :          NALU 2 Size          |
1236	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1237	    |          NALU 2 HDR           |                               |
1238	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1239	    |                                                               |
1240	    |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1241	    |                               :...OPTIONAL RTP padding        |
1242	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1244	                   An Example of an AP Containing
1245	                 Two Aggregation Units with the DONL Field

1247	                                  Figure 8
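
   The aggregation unit layouts in Figures 5 through 8 can be walked
   with a short parser.  The following Python sketch is illustrative
   only.  The function name parse_aggregation_packet and its
   donl_present argument (standing in for sprop-max-don-diff being
   greater than 0) are assumptions of this example, and the input is
   assumed to be the RTP payload with any RTP padding already removed.

```python
import struct
from typing import List, Optional, Tuple


def parse_aggregation_packet(
    payload: bytes, donl_present: bool
) -> List[Tuple[Optional[int], bytes]]:
    """Split an AP payload into (DON, NAL unit) pairs.

    `payload` starts at the two-octet PayloadHdr (Type=28).
    `donl_present` is a hypothetical flag meaning sprop-max-don-diff > 0.
    DON is None when no DONL field is carried.
    """
    offset = 2  # skip the PayloadHdr
    don: Optional[int] = None
    units: List[Tuple[Optional[int], bytes]] = []
    while offset < len(payload):
        if donl_present:
            if not units:
                # First aggregation unit: DON comes from the DONL field.
                if offset + 2 > len(payload):
                    raise ValueError("truncated DONL field")
                (don,) = struct.unpack_from("!H", payload, offset)
                offset += 2
            else:
                # Later aggregation units: previous DON plus 1 modulo 65536.
                don = (don + 1) % 65536
        if offset + 2 > len(payload):
            raise ValueError("truncated NALU size field")
        (size,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        if size == 0 or offset + size > len(payload):
            raise ValueError("NALU size does not fit in the payload")
        units.append((don, payload[offset:offset + size]))
        offset += size
    return units
```

   The sketch returns the NAL units in the order in which they appear
   in the AP and does not check that the AP carries at least two
   aggregation units.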

1249	4.3.3.  Fragmentation Units

1251	   Fragmentation Units (FUs) are introduced to enable fragmenting a
1252	   single NAL unit into multiple RTP packets, possibly without
1253	   cooperation or knowledge of the [VVC] encoder.  A fragment of a NAL
1254	   unit consists of an integer number of consecutive octets of that NAL
1255	   unit.  Fragments of the same NAL unit MUST be sent in consecutive
1256	   order with ascending RTP sequence numbers (with no other RTP packets
1257	   within the same RTP stream being sent between the first and last
1258	   fragment).

1260	   When a NAL unit is fragmented and conveyed within FUs, it is referred
1261	   to as a fragmented NAL unit.  APs MUST NOT be fragmented.  FUs MUST
1262	   NOT be nested; i.e., an FU cannot contain a subset of another FU.

1264	   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
1265	   time of the fragmented NAL unit.

1267	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1268	   header of one octet, a conditional 16-bit DONL field (in network byte
1269	   order), and an FU payload, as shown in Figure 9.

1271	     0                   1                   2                   3
1272	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1273	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1274	    |   PayloadHdr (Type=29)        |   FU header   | DONL (cond)   |
1275	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1276	    |   DONL (cond) |                                               |
1277	    +-+-+-+-+-+-+-+-+                                               |
1278	    |                         FU payload                            |
1279	    |                                                               |
1280	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1281	    |                               :...OPTIONAL RTP padding        |
1282	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1284	                          The Structure of an FU

1286	                                  Figure 9

1288	   The fields in the payload header are set as follows.  The Type field
1289	   MUST be equal to 29.  The fields F, LayerId, and TID MUST be equal to
1290	   the fields F, LayerId, and TID, respectively, of the fragmented NAL
1291	   unit.

1293	   The FU header consists of an S bit, an E bit, a P bit, and a 5-bit
1294	   FuType field, as shown in Figure 10.

1296	                             +---------------+
1297	                             |0|1|2|3|4|5|6|7|
1298	                             +-+-+-+-+-+-+-+-+
1299	                             |S|E|P|  FuType |
1300	                             +---------------+

1302	                       The Structure of FU Header

1304	                                 Figure 10

1306	   The semantics of the FU header fields are as follows:

1308	   S: 1 bit

1310	      When set to 1, the S bit indicates the start of a fragmented NAL
1311	      unit, i.e., the first byte of the FU payload is also the first
1312	      byte of the payload of the fragmented NAL unit.  When the FU
1313	      payload is not the start of the fragmented NAL unit payload, the S
1314	      bit MUST be set to 0.

1316	   E: 1 bit

1318	      When set to 1, the E bit indicates the end of a fragmented NAL
1319	      unit, i.e., the last byte of the payload is also the last byte of
1320	      the fragmented NAL unit.  When the FU payload is not the last
1321	      fragment of a fragmented NAL unit, the E bit MUST be set to 0.

1323	   P: 1 bit

1325	      When set to 1, the P bit indicates the last FU of the last VCL NAL
1326	      unit of a coded picture, i.e., the last byte of the FU payload is
1327	      also the last byte of the last VCL NAL unit of the coded picture.
1328	      When the FU payload is not the last fragment of the last VCL NAL
1329	      unit of a coded picture, the P bit MUST be set to 0.

1331	   FuType: 5 bits

1333	      The field FuType MUST be equal to the field Type of the fragmented
1334	      NAL unit.
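
   The FU header semantics above amount to a few bit masks on a single
   octet.  The following Python sketch is one way to decode it.  The
   names FuHeader and parse_fu_header are illustrative and not part of
   this memo.  Bit 0 in Figure 10 is taken to be the most significant
   bit, as is conventional in RFC bit diagrams.

```python
from typing import NamedTuple


class FuHeader(NamedTuple):
    start: bool            # S bit
    end: bool              # E bit
    last_of_picture: bool  # P bit
    fu_type: int           # FuType, 5 bits


def parse_fu_header(octet: int) -> FuHeader:
    """Decode the one-octet FU header (hypothetical helper)."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("FU header must be a single octet")
    header = FuHeader(
        start=bool(octet & 0x80),
        end=bool(octet & 0x40),
        last_of_picture=bool(octet & 0x20),
        fu_type=octet & 0x1F,
    )
    # A NAL unit is never carried in a single FU, so S and E
    # cannot both be set in the same FU header.
    if header.start and header.end:
        raise ValueError("S and E bits are both set")
    return header
```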

1336	   The DONL field, when present, specifies the value of the 16 least
1337	   significant bits of the decoding order number of the fragmented NAL
1338	   unit.

1340	   If sprop-max-don-diff is greater than 0, and the S bit is equal to 1,
1341	   the DONL field MUST be present in the FU, and the variable DON for
1342	   the fragmented NAL unit is derived as equal to the value of the DONL
1343	   field.  Otherwise (sprop-max-don-diff is equal to 0, or the S bit is
1344	   equal to 0), the DONL field MUST NOT be present in the FU.

1346	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
1347	   the S bit and the E bit must not both be set to 1 in the same FU
1348	   header.

1350	   The FU payload consists of fragments of the payload of the fragmented
1351	   NAL unit so that if the FU payloads of consecutive FUs, starting with
1352	   an FU with the S bit equal to 1 and ending with an FU with the E bit
1353	   equal to 1, are sequentially concatenated, the payload of the
1354	   fragmented NAL unit can be reconstructed.  The NAL unit header of the
1355	   fragmented NAL unit is not included as such in the FU payload, but
1356	   rather the information of the NAL unit header of the fragmented NAL
1357	   unit is conveyed in F, LayerId, and TID fields of the FU payload
1358	   headers of the FUs and the FuType field of the FU header of the FUs.
1359	   An FU payload MUST NOT be empty.
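
   The sender side of fragmentation can be sketched as follows.  This
   Python example is a minimal illustration under stated assumptions,
   not a reference implementation.  It assumes the two-octet payload
   header layout defined earlier in this memo, with Type in the five
   most significant bits of the second octet and TID in the three
   least significant bits.  The function name fragment_nal_unit and
   its parameters are hypothetical; passing don=None stands for
   sprop-max-don-diff being equal to 0.

```python
from typing import List, Optional

FU_TYPE = 29  # payload header Type value for FUs


def fragment_nal_unit(
    nal: bytes,
    max_fu_payload: int,
    don: Optional[int] = None,
    last_vcl_of_picture: bool = False,
) -> List[bytes]:
    """Return the RTP payloads of the FUs carrying `nal`."""
    if max_fu_payload < 1:
        raise ValueError("max_fu_payload must be positive")
    if len(nal) < 3:
        raise ValueError("NAL unit has no payload to fragment")
    body = nal[2:]  # the NAL unit header itself is not copied into FUs
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    if len(chunks) < 2:
        # S and E must not both be set in one FU header.
        raise ValueError("fits in one packet; use a single NAL unit packet")
    # PayloadHdr: F and LayerId copied from the NAL unit header,
    # Type replaced by 29, TID copied.
    payload_hdr = bytes([nal[0], (FU_TYPE << 3) | (nal[1] & 0x07)])
    fu_type = nal[1] >> 3
    fus = []
    for index, chunk in enumerate(chunks):
        first = index == 0
        last = index == len(chunks) - 1
        fu_header = fu_type
        if first:
            fu_header |= 0x80  # S bit
        if last:
            fu_header |= 0x40  # E bit
            if last_vcl_of_picture:
                fu_header |= 0x20  # P bit
        fu = payload_hdr + bytes([fu_header])
        if first and don is not None:
            fu += (don & 0xFFFF).to_bytes(2, "big")  # DONL only with S=1
        fus.append(fu + chunk)
    return fus
```

   Each returned payload would then be sent in its own RTP packet, with
   consecutive sequence numbers and the same RTP timestamp.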

1361	   If an FU is lost, the receiver SHOULD discard all following
1362	   fragmentation units in transmission order corresponding to the same
1363	   fragmented NAL unit, unless the decoder in the receiver is known to
1364	   be prepared to gracefully handle incomplete NAL units.

1366	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1367	   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
1368	   n of that NAL unit is not received.  In this case, the
1369	   forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
1370	   syntax violation.

1372	4.4.  Decoding Order Number

1374	   For each NAL unit, the variable AbsDon is derived, representing the
1375	   decoding order number that is indicative of the NAL unit decoding
1376	   order.

1378	   Let NAL unit n be the n-th NAL unit in transmission order within an
1379	   RTP stream.

1381	   If sprop-max-don-diff is equal to 0, AbsDon[n], the value of AbsDon
1382	   for NAL unit n, is derived as equal to n.

1384	   Otherwise (sprop-max-don-diff is greater than 0), AbsDon[n] is
1385	   derived as follows, where DON[n] is the value of the variable DON for
1386	   NAL unit n:

1388	   *  If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
1389	      transmission order), AbsDon[0] is set equal to DON[0].

1391	   *  Otherwise (n is greater than 0), the following applies for
1392	      derivation of AbsDon[n]:

1394	         If DON[n] == DON[n-1],
1395	            AbsDon[n] = AbsDon[n-1]

1397	         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1398	            AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1400	         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1401	            AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1403	         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1404	            AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

1406	         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1407	            AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
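
   The derivation above is a 16-bit wraparound extension.  The
   following Python sketch transcribes the five cases directly.  The
   function names next_abs_don and abs_dons are illustrative only, and
   the input is assumed to be the list of DON values in transmission
   order.

```python
from typing import List


def next_abs_don(prev_abs_don: int, prev_don: int, don: int) -> int:
    """Derive AbsDon[n] from AbsDon[n-1], DON[n-1], and DON[n]."""
    if don == prev_don:
        return prev_abs_don
    if don > prev_don and don - prev_don < 32768:
        return prev_abs_don + don - prev_don
    if don < prev_don and prev_don - don >= 32768:
        # DON wrapped around going forward.
        return prev_abs_don + 65536 - prev_don + don
    if don > prev_don and don - prev_don >= 32768:
        # DON wrapped around going backward.
        return prev_abs_don - (prev_don + 65536 - don)
    # don < prev_don and prev_don - don < 32768
    return prev_abs_don - (prev_don - don)


def abs_dons(dons: List[int]) -> List[int]:
    """Map DON values in transmission order to AbsDon values."""
    result: List[int] = []
    for n, don in enumerate(dons):
        if n == 0:
            result.append(don)  # AbsDon[0] = DON[0]
        else:
            result.append(next_abs_don(result[-1], dons[n - 1], don))
    return result
```

   For instance, the DON sequence 65534, 65535, 0, 1 maps to the AbsDon
   sequence 65534, 65535, 65536, 65537.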

1409	   For any two NAL units m and n, the following applies:

1411	   *  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows
1412	      NAL unit m in NAL unit decoding order.

1414	   *  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
1415	      of the two NAL units can be in either order.

1417	   *  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
1418	      NAL unit m in decoding order.

1420	      Informative note: When two consecutive NAL units in the NAL
1421	         unit decoding order have different values of AbsDon, the
1422	         absolute difference between the two AbsDon values may be
1423	         greater than or equal to 1.

1425	      Informative note: There are multiple reasons to allow for the
1426	         absolute difference of the values of AbsDon for two consecutive
1427	         NAL units in the NAL unit decoding order to be greater than
1428	         one.  An increment by one is not required, as at the time of
1429	         associating values of AbsDon to NAL units, it may not be known
1430	         whether all NAL units are to be delivered to the receiver.  For
1431	         example, a gateway might not forward VCL NAL units of higher
1432	         sublayers or some SEI NAL units when there is congestion in the
1433	         network.  In another example, the first intra-coded picture of
1434	         a pre-encoded clip is transmitted in advance to ensure that it
1435	         is readily available in the receiver, and when transmitting the
1436	         first intra-coded picture, the originator does not exactly know
1437	         how many NAL units will be encoded before the first intra-coded
1438	         picture of the pre-encoded clip follows in decoding order.
1439	         Thus, the values of AbsDon for the NAL units of the first
1440	         intra-coded picture of the pre-encoded clip have to be
1441	         estimated when they are transmitted, and gaps in values of
1442	         AbsDon may occur.

1444	5.  Packetization Rules

1446	   The following packetization rules apply:

1448	   *  If sprop-max-don-diff is greater than 0, the transmission order of
1449	      NAL units carried in the RTP stream MAY be different from the NAL
1450	      unit decoding order.  Otherwise (sprop-max-don-diff is equal to
1451	      0), the transmission order of NAL units carried in the RTP stream
1452	      MUST be the same as the NAL unit decoding order.

1454	   *  A NAL unit of a small size SHOULD be encapsulated in an
1455	      aggregation packet together with one or more other NAL units in
1456	      order to avoid the unnecessary packetization overhead for small
1457	      NAL units.  For example, non-VCL NAL units such as access unit
1458	      delimiters, parameter sets, or SEI NAL units are typically small
1459	      and can often be aggregated with VCL NAL units without violating
1460	      MTU size constraints.

1462	   *  Each non-VCL NAL unit SHOULD, when MTU size constraints permit,
1463	      be encapsulated in an aggregation packet together with its
1464	      associated VCL NAL unit, as typically a non-VCL NAL unit would
1465	      be meaningless without the associated VCL NAL unit being
1466	      available.

1468	   *  For carrying exactly one NAL unit in an RTP packet, a single NAL
1469	      unit packet MUST be used.
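
   One possible way to apply these rules is a greedy grouping pass over
   the NAL units in transmission order.  The Python sketch below is an
   assumption-laden illustration, not a required algorithm.  It assumes
   sprop-max-don-diff is equal to 0 (no DONL fields), treats
   max_payload as a hypothetical limit on the RTP payload size in
   bytes, and aggregates any adjacent NAL units that fit together.  The
   name plan_packets is not part of this memo.

```python
from typing import List, Tuple


def plan_packets(nal_sizes: List[int], max_payload: int) -> List[Tuple[str, List[int]]]:
    """Choose a packet type for each NAL unit, identified by its index.

    Returns ("single", [i]), ("ap", [i, j, ...]), or ("fu", [i]) entries.
    """
    plans: List[Tuple[str, List[int]]] = []
    group: List[int] = []  # indices waiting to be aggregated
    group_bytes = 2        # an AP starts with a two-octet PayloadHdr

    def flush() -> None:
        nonlocal group, group_bytes
        if len(group) == 1:
            # Exactly one NAL unit: a single NAL unit packet is used.
            plans.append(("single", group))
        elif group:
            plans.append(("ap", group))
        group, group_bytes = [], 2

    for index, size in enumerate(nal_sizes):
        if size > max_payload:
            # Too large for one packet: fragment it.
            flush()
            plans.append(("fu", [index]))
            continue
        if group_bytes + 2 + size > max_payload:
            flush()
        group.append(index)
        group_bytes += 2 + size  # two-octet NALU size field plus the NAL unit
    flush()
    return plans
```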

1471	6.  De-packetization Process

1473	   The general concept behind de-packetization is to get the NAL units
1474	   out of the RTP packets in an RTP stream and pass them to the decoder
1475	   in the NAL unit decoding order.

1477	   The de-packetization process is implementation dependent.  Therefore,
1478	   the following description should be seen as an example of a suitable
1479	   implementation.  Other schemes may be used as well, as long as the
1480	   output for the same input is the same as the process described below.
1481	   The output is the same when the set of output NAL units and their
1482	   order are both identical.  Optimizations relative to the described
1483	   algorithms are possible.

1485	   All normal RTP mechanisms related to buffer management apply.  In
1486	   particular, duplicated or outdated RTP packets (as indicated by the
1487	   RTP sequence number and the RTP timestamp) are removed.  To determine
1488	   the exact time for decoding, factors such as a possible intentional
1489	   delay to allow for proper inter-stream synchronization MUST be
1490	   factored in.

1492	   NAL units with NAL unit type values in the range of 0 to 27,
1493	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1494	   with NAL unit type values in the range of 28 to 31, inclusive, MUST
1495	   NOT be passed to the decoder.

1497	   The receiver includes a receiver buffer, which is used to compensate
1498	   for transmission delay jitter within an individual RTP stream and to
1499	   reorder NAL units from transmission order to the NAL unit decoding
1500	   order.  In this section, the receiver operation is described under
1501	   the assumption that there is no transmission delay jitter within an
1502	   RTP stream.  To distinguish it from a practical receiver buffer
1503	   that is also used for compensation of transmission delay jitter, the
1504	   receiver buffer is hereafter called the de-packetization buffer in
1505	   this section.  Receivers should also prepare for transmission delay
1506	   jitter; that is, either reserve separate buffers for transmission
1507	   delay jitter buffering and de-packetization buffering or use a
1508	   receiver buffer for both transmission delay jitter and de-
1509	   packetization.  Moreover, receivers should take transmission delay
1510	   jitter into account in the buffering operation, e.g., by additional
1511	   initial buffering before starting of decoding and playback.

1513	   The de-packetization process extracts the NAL units from the RTP
1514	   packets in an RTP stream as follows.  When an RTP packet carries a
1515	   single NAL unit packet, the payload of the RTP packet is extracted as
1516	   a single NAL unit, excluding the DONL field, i.e., third and fourth
1517	   bytes, when sprop-max-don-diff is greater than 0.  When an RTP packet
1518	   carries an Aggregation Packet, several NAL units are extracted from
1519	   the payload of the RTP packet.  In this case, each NAL unit
1520	   corresponds to the part of the payload of each aggregation unit that
1521	   follows the NALU size field as described in Section 4.3.2.  When an
1522	   RTP packet carries a Fragmentation Unit (FU), all RTP packets from
1523	   the first FU (with the S field equal to 1) of the fragmented NAL unit
1524	   up to the last FU (with the E field equal to 1) of the fragmented NAL
1525	   unit are collected.  The NAL unit is extracted from these RTP packets
1526	   by concatenating all FU payloads in the same order as the
1527	   corresponding RTP packets and prepending the NAL unit header with the
1528	   fields F, LayerId, and TID set equal to the values of the fields
1529	   F, LayerId, and TID in the payload header of the FUs, respectively,
1530	   and with the NAL unit type set equal to the value of the field FuType
1531	   in the FU header of the FUs, as described in Section 4.3.3.
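
   The FU case of the extraction step can be sketched in Python as
   follows.  This is a minimal illustration that assumes all FUs of the
   NAL unit were received, are supplied in RTP sequence number order,
   and have had any RTP padding removed.  It also assumes the payload
   header layout defined earlier in this memo (Type in the five most
   significant bits of the second octet, TID in the three least
   significant bits).  The name reassemble_fus and the donl_present
   flag (standing in for sprop-max-don-diff being greater than 0) are
   hypothetical.

```python
from typing import Sequence


def reassemble_fus(fu_payloads: Sequence[bytes], donl_present: bool) -> bytes:
    """Rebuild a fragmented NAL unit from its FU RTP payloads."""
    if len(fu_payloads) < 2:
        raise ValueError("a fragmented NAL unit spans at least two FUs")
    body = bytearray()
    last_index = len(fu_payloads) - 1
    for index, fu in enumerate(fu_payloads):
        if len(fu) < 4:
            raise ValueError("FU too short")
        if fu[1] >> 3 != 29:
            raise ValueError("not an FU (Type != 29)")
        s_bit = bool(fu[2] & 0x80)
        e_bit = bool(fu[2] & 0x40)
        if s_bit != (index == 0) or e_bit != (index == last_index):
            raise ValueError("unexpected S/E bits")
        offset = 3  # PayloadHdr (2 octets) + FU header (1 octet)
        if s_bit and donl_present:
            offset += 2  # skip the DONL field of the first FU
        if offset >= len(fu):
            raise ValueError("empty FU payload")
        body += fu[offset:]
    first = fu_payloads[0]
    # Rebuild the NAL unit header: F and LayerId from the payload
    # header, the NAL unit type from FuType, TID from the payload header.
    nal_header = bytes([first[0], ((first[2] & 0x1F) << 3) | (first[1] & 0x07)])
    return nal_header + bytes(body)
```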

1533	   When sprop-max-don-diff is equal to 0, the de-packetization buffer
1534	   size is zero bytes, and the NAL units carried in the single RTP
1535	   stream are directly passed to the decoder in their transmission
1536	   order, which is identical to their decoding order.

1538	   When sprop-max-don-diff is greater than 0, the process described in
1539	   the remainder of this section applies.

1541	   There are two buffering states in the receiver: initial buffering and
1542	   buffering while playing.  Initial buffering starts when the reception
1543	   is initialized.  After initial buffering, decoding and playback are
1544	   started, and the buffering-while-playing mode is used.

1546	   Regardless of the buffering state, the receiver stores incoming NAL
1547	   units in reception order into the de-packetization buffer.  NAL units
1548	   carried in RTP packets are stored in the de-packetization buffer
1549	   individually, and the value of AbsDon is calculated and stored for
1550	   each NAL unit.

1552	   Initial buffering lasts until the difference between the greatest and
1553	   smallest AbsDon values of the NAL units in the de-packetization
1554	   buffer is greater than or equal to the value of sprop-max-don-diff.

1556	   After initial buffering, whenever the difference between the greatest
1557	   and smallest AbsDon values of the NAL units in the de-packetization
1558	   buffer is greater than or equal to the value of sprop-max-don-diff,
1559	   the following operation is repeatedly applied until this difference
1560	   is smaller than sprop-max-don-diff:

1562	   *  The NAL unit in the de-packetization buffer with the smallest
1563	      value of AbsDon is removed from the de-packetization buffer and
1564	      passed to the decoder.

1566	   When no more NAL units are flowing into the de-packetization buffer,
1567	   all NAL units remaining in the de-packetization buffer are removed
1568	   from the buffer and passed to the decoder in the order of increasing
1569	   AbsDon values.
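
   The buffering behavior described above for sprop-max-don-diff
   greater than 0 can be modeled with a priority queue keyed on AbsDon.
   The following Python sketch is one possible realization, given as an
   example only.  The class name DepacketizationBuffer and its methods
   are assumptions of this example, and jitter handling is left out, as
   in the rest of this section.

```python
import heapq
from typing import List, Optional, Tuple


class DepacketizationBuffer:
    """Reorders NAL units by AbsDon (illustrative sketch)."""

    def __init__(self, sprop_max_don_diff: int) -> None:
        if sprop_max_don_diff <= 0:
            raise ValueError("only needed when sprop-max-don-diff > 0")
        self._max_diff = sprop_max_don_diff
        self._heap: List[Tuple[int, int, bytes]] = []
        self._greatest: Optional[int] = None  # greatest AbsDon in the buffer
        self._arrival = 0  # tie-breaker that keeps reception order

    def push(self, abs_don: int, nal: bytes) -> List[bytes]:
        """Store one NAL unit; return NAL units now due for decoding."""
        heapq.heappush(self._heap, (abs_don, self._arrival, nal))
        self._arrival += 1
        if self._greatest is None or abs_don > self._greatest:
            self._greatest = abs_don
        released = []
        # Remove the smallest AbsDon while the spread is too large.  The
        # entry with the greatest AbsDon is never removed here, because
        # once it is the only entry left the spread is 0.
        while self._greatest - self._heap[0][0] >= self._max_diff:
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self) -> List[bytes]:
        """Drain the buffer in increasing AbsDon order at end of stream."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        self._greatest = None
        return released
```

   Nothing is released until the spread of AbsDon values first reaches
   sprop-max-don-diff, which corresponds to the end of initial
   buffering.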

1571	7.  Payload Format Parameters

1573	   This section specifies the optional parameters.  A mapping of the
1574	   parameters to the Session Description Protocol (SDP) [RFC8866] is
1575	   also provided for applications that use SDP.

1577	7.1.  Media Type Registration

1579	   The receiver MUST ignore any parameter unspecified in this memo.

1581	   Type name:            video

1583	   Subtype name:         H266

1585	   Required parameters:  N/A

1587	   Optional parameters:

1589	      profile-id, tier-flag, sub-profile-id, interop-constraints, level-
1590	      id, sprop-sublayer-id, sprop-ols-id, recv-sublayer-id, recv-ols-
1591	      id, max-recv-level-id, sprop-dci, sprop-vps, sprop-sps, sprop-pps,
1592	      sprop-sei, max-lsr, max-fps, sprop-max-don-diff, sprop-depack-buf-
1593	      bytes, depack-buf-cap (Refer to Section 7.2 for definitions).

1595	   Encoding considerations:

1597	      This type is only defined for transfer via RTP (RFC 3550).

1599	   Security considerations:

1601	      See Section 9 of RFC XXXX.

1603	   Interoperability considerations: N/A

1605	   Published specification:

1607	      Please refer to RFC XXXX and its Section 13.

1609	   Applications that use this media type:

1611	      Any application that relies on VVC-based video services over RTP

1613	   Fragment identifier considerations: N/A

1615	   Additional information: N/A

1617	   Person & email address to contact for further information:

1619	      Stephan Wenger (stewe@stewe.org)

1621	   Intended usage: COMMON

1623	   Restrictions on usage: N/A

1625	   Author: See Authors' Addresses section of RFC XXXX.

1627	   Change controller:

1629	      IETF Audio/Video Transport Core Maintenance Working Group
1630	      delegated from the IESG.

1632	7.2.  Optional Parameters Definition

1634	   profile-id, tier-flag, sub-profile-id, interop-constraints, and
1635	   level-id:

1637	      These parameters indicate the profile, tier, default level, sub-
1638	      profile, and some constraints of the bitstream carried by the RTP
1639	      stream, or a specific set of the profile, tier, default level,
1640	      sub-profile and some constraints the receiver supports.

1642	      The subset of coding tools that may have been used to generate the
1643	      bitstream or that the receiver supports, as well as some
1644	      additional constraints are indicated collectively by profile-id,
1645	      sub-profile-id, and interop-constraints.

1647	      Informative note: There are 128 values of profile-id.  The
1648	         subset of coding tools identified by the profile-id can be
1649	         further constrained with up to 255 instances of sub-profile-id.
1650	         In addition, 68 bits included in interop-constraints, which can
1651	         be extended up to 324 bits, provide means to further restrict
1652	         tools from existing profiles.  To be able to support this fine-
1653	         granular signaling of coding tool subsets with profile-id, sub-
1654	         profile-id and interop-constraints, it would be safe to require
1655	         symmetric use of these parameters in SDP offer/answer unless
1656	         recv-ols-id is included in the SDP answer for choosing one of
1657	         the layers offered.

1659	      The tier is indicated by tier-flag.  The default level is
1660	      indicated by level-id.  The tier and the default level specify the
1661	      limits on values of syntax elements or arithmetic combinations of
1662	      values of syntax elements that are followed when generating the
1663	      bitstream or that the receiver supports.

1665	      In SDP offer/answer, when the SDP answer does not include the
1666	      recv-ols-id parameter that is less than the sprop-ols-id parameter
1667	      in the SDP offer, the following applies:

1669	      -  The tier-flag, profile-id, sub-profile-id, and interop-
1670	         constraints parameters MUST be used symmetrically, i.e., the
1671	         value of each of these parameters in the offer MUST be the same
1672	         as that in the answer, either explicitly signaled or implicitly
1673	         inferred.

1675	      -  The level-id parameter is changeable as long as the highest
1676	         level indicated by the answer is either equal to or lower than
1677	         that in the offer.  Note that a highest level higher than
1678	         level-id in the offer for receiving can be included as max-
1679	         recv-level-id.

1681	      In SDP offer/answer, when the SDP answer does include the recv-
1682	      ols-id parameter that is less than the sprop-ols-id parameter
1683	      in the SDP offer, the set of tier-flag, profile-id, sub-
1684	      profile-id, interop-constraints, and level-id parameters
1685	      included in the answer MUST be consistent with that for the
1686	      chosen output layer set as indicated in the SDP offer, with the
1687	      exception that the level-id parameter in the SDP answer is
1688	      changeable as long as the highest level indicated by the answer
1689	      is either lower than or equal to that in the offer.

1691	      Further details of these parameters, including how they relate to
1692	      syntax elements specified in [VVC], are provided below.

1694	   profile-id:

1696	      When profile-id is not present, a value of 1 (i.e., the Main 10
1697	      profile) MUST be inferred.

1699	      When used to indicate properties of a bitstream, profile-id is
1700	      derived from the general_profile_idc syntax element that applies
1701	      to the bitstream in an instance of the profile_tier_level( )
1702	      syntax structure.

1704	      VVC bitstreams transported over RTP using the technologies of this
1705	      memo SHOULD contain only a single profile_tier_level( ) structure
1706	      in the DCI, unless the sender can assure that a receiver can
1707	      correctly decode the VVC bitstream regardless of which
1708	      profile_tier_level( ) structure contained in the DCI was used for
1709	      deriving profile-id and other parameters for the SDP O/A exchange.

1711	      As specified in [VVC], a profile_tier_level( ) syntax structure
1712	      may be contained in an SPS NAL unit, and one or more
1713	      profile_tier_level( ) syntax structures may be contained in a VPS
1714	      NAL unit and in a DCI NAL unit.  One of the following three cases
1715	      applies to the container NAL unit of the profile_tier_level( )
1716	      syntax structure containing syntax elements used to derive the
1717	      values of profile-id, tier-flag, level-id, sub-profile-id, or
1718	      interop-constraints: 1) The container NAL unit is an SPS, the
1719	      bitstream is a single-layer bitstream, and the profile_tier_level(
1720	      ) syntax structures in all SPSs referenced by the CVSs in the
1721	      bitstream have the same values respectively for those
1722	      profile_tier_level( ) syntax elements; 2) The container NAL unit
1723	      is a VPS, the profile_tier_level( ) syntax structure is the one in
1724	      the VPS that applies to the OLS corresponding to the bitstream,
1725	      and the profile_tier_level( ) syntax structures applicable to the
1726	      OLS corresponding to the bitstream in all VPSs referenced by the
1727	      CVSs in the bitstream have the same values respectively for those
1728	      profile_tier_level( ) syntax elements; 3) The container NAL unit
1729	      is a DCI NAL unit and the profile_tier_level( ) syntax structures
1730	      in all DCI NAL units in the bitstream have the same values
1731	      respectively for those profile_tier_level( ) syntax elements.

1733	      [VVC] allows for multiple profile_tier_level( ) structures in a
1734	      DCI NAL unit, which may contain different values for the syntax
1735	      elements used to derive the values of profile-id, tier-flag,
1736	      level-id, sub-profile-id, or interop-constraints in the different
1737	      entries.  However, this memo defines only a single profile-id,
1738	      tier-flag, level-id, sub-profile-id, or interop-constraints.  When
1739	      signaling these parameters and a DCI NAL unit is present with
1740	      multiple profile_tier_level( ) structures, these values SHOULD be
1741	      the same as the first profile_tier_level structure in the DCI,
1742	      unless the sender has ensured that the receiver can decode the
1743	      bitstream when a different value is chosen.

1745	   tier-flag, level-id:

1747	      The value of tier-flag MUST be in the range of 0 to 1, inclusive.
1748	      The value of level-id MUST be in the range of 0 to 255, inclusive.

1750	      If the tier-flag and level-id parameters are used to indicate
1751	      properties of a bitstream, they indicate the tier and the highest
1752	      level the bitstream complies with.

1754	      If the tier-flag and level-id parameters are used for capability
1755	      exchange, the following applies.  If max-recv-level-id is not
1756	      present, the default level defined by level-id indicates the
1757	      highest level the codec wishes to support.  Otherwise, max-recv-
1758	      level-id indicates the highest level the codec supports for
1759	      receiving.  For either receiving or sending, all levels that are
1760	      lower than the highest level supported MUST also be supported.

1762	      If no tier-flag is present, a value of 0 MUST be inferred; if no
1763	      level-id is present, a value of 51 (i.e., level 3.1) MUST be
1764	      inferred.

1766	      Informative note: The level values currently defined in the VVC
1767	         specification are in the form of "majorNum.minorNum", and the
1768	         value of the level-id for each of the levels is equal to
1769	         majorNum * 16 + minorNum * 3.  It is expected that if any
1770	         levels are defined in the future, the same convention will be
1771	         used, but this cannot be guaranteed.
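
      The convention in the note above can be checked with a few lines
      of Python.  The helper names level_id_from_level and
      level_from_level_id are illustrative only, and the sketch applies
      solely to levels that follow the stated convention.

```python
from typing import Tuple


def level_id_from_level(major: int, minor: int) -> int:
    """Compute level-id as majorNum * 16 + minorNum * 3."""
    return major * 16 + minor * 3


def level_from_level_id(level_id: int) -> Tuple[int, int]:
    """Invert the convention; reject values that do not follow it."""
    major, remainder = divmod(level_id, 16)
    if remainder % 3 != 0:
        raise ValueError("level-id does not follow the convention")
    return major, remainder // 3
```

      With this convention, level 3.1 corresponds to a level-id of 51,
      which is the value inferred when level-id is absent.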

1773	      When used to indicate properties of a bitstream, the tier-flag and
1774	      level-id parameters are derived respectively from the syntax
1775	      element general_tier_flag, and the syntax element
1776	      general_level_idc or sub_layer_level_idc[j], that apply to the
1777	      bitstream, in an instance of the profile_tier_level( ) syntax
1778	      structure.

1780	      If the tier-flag and level-id are derived from the
1781	      profile_tier_level( ) syntax structure in a DCI NAL unit, the
1782	      following applies:

1784	      -  tier-flag = general_tier_flag

1786	      -  level-id = general_level_idc

1788	      Otherwise, if the tier-flag and level-id are derived from the
1789	      profile_tier_level( ) syntax structure in an SPS or VPS NAL unit,
1790	      and the bitstream contains the highest sublayer representation in
1791	      the OLS corresponding to the bitstream, the following applies:

1793	      -  tier-flag = general_tier_flag

1795	      -  level-id = general_level_idc

1797	      Otherwise, if the tier-flag and level-id are derived from the
1798	      profile_tier_level( ) syntax structure in an SPS or VPS NAL
1799	      unit, and the bitstream does not contain the highest sublayer
1800	      representation in the OLS corresponding to the bitstream, the
1801	      following applies, with j being the value of the sprop-
1802	      sublayer-id parameter:

1804	      -  tier-flag = general_tier_flag

1806	      -  level-id = sub_layer_level_idc[j]

1808	   sub-profile-id:

1810	      The value of the parameter is a comma-separated (',') list of data
1811	      using base64 [RFC4648] representation.

1813	      When used to indicate properties of a bitstream, sub-profile-id is
1814	      derived from each of the ptl_num_sub_profiles
1815	      general_sub_profile_idc[i] syntax elements that apply to the
1816	      bitstream in a profile_tier_level( ) syntax structure.

1818	   interop-constraints:

1820	      A base64 [RFC4648] representation of the data that includes the
1821	      syntax elements ptl_frame_only_constraint_flag and
1822	      ptl_multilayer_enabled_flag and the general_constraints_info( )
1823	      syntax structure that apply to the bitstream in an instance of the
1824	      profile_tier_level( ) syntax structure.

1826	      If the interop-constraints parameter is not present, the following
1827	      MUST be inferred:

1829	      -  ptl_frame_only_constraint_flag = 1

1831	      -  ptl_multilayer_enabled_flag = 0

1833	      -  gci_present_flag in the general_constraints_info( ) syntax
1834	         structure = 0

1836	      Using interop-constraints for capability exchange results in a
1837	      requirement on any bitstream to be compliant with the interop-
1838	      constraints.

1840	   sprop-sublayer-id:

1842	      This parameter MAY be used to indicate the highest allowed value
1843	      of TID in the bitstream.  When not present, the value of sprop-
1844	      sublayer-id is inferred to be equal to 6.

1846	      The value of sprop-sublayer-id MUST be in the range of 0 to 6,
1847	      inclusive.

1849	   sprop-ols-id:

1851	      This parameter MAY be used to indicate the OLS that the bitstream
1852	      applies to.  When not present, the value of sprop-ols-id is
1853	      inferred to be equal to TargetOlsIdx as specified in 8.1.1 in
1854	      [VVC].  If this optional parameter is present, sprop-vps MUST also
1855	      be present or its content MUST be known a priori at the receiver.

1857	      The value of sprop-ols-id MUST be in the range of 0 to 256,
1858	      inclusive.

1860	      Informative note: VVC allows having up to 257 output layer sets
1861	         indicated in the VPS as the number of output layer sets minus 2
1862	         is indicated with a field of 8 bits.

1864	   recv-sublayer-id:

1866	      This parameter MAY be used to signal a receiver's choice of the
1867	      offered or declared sublayer representations in the sprop-vps and
1868	      sprop-sps.  The value of recv-sublayer-id indicates the TID of the
1869	      highest sublayer that a receiver supports.  When not present, the
1870	      value of recv-sublayer-id is inferred to be equal to the value of
1871	      the sprop-sublayer-id parameter in the SDP offer.

1873	      The value of recv-sublayer-id MUST be in the range of 0 to 6,
1874	      inclusive.

1876	   recv-ols-id:

1878	      This parameter MAY be used to signal a receiver's choice of the
1879	      offered or declared output layer sets in the sprop-vps.  The value
1880	      of recv-ols-id indicates the OLS index of the bitstream that a
1881	      receiver supports.  When not present, the value of recv-ols-id is
1882	      inferred to be equal to the value of the sprop-ols-id parameter
1883	      inferred from or indicated in the SDP offer.  When present, the
1884	      value of recv-ols-id must be included only when sprop-ols-id was
1885	      received and must refer to an output layer set in the VPS that
1886	      includes no layers other than all or a subset of the layers of the
1887	      OLS referred to by sprop-ols-id.  If this optional parameter is
1888	      present, sprop-vps must have been received or its content must be
1889	      known a priori at the receiver.

1891	      The value of recv-ols-id MUST be in the range of 0 to 256,
1892	      inclusive.

1894	   max-recv-level-id:

1896	      This parameter MAY be used to indicate the highest level a
1897	      receiver supports.

1899	      The value of max-recv-level-id MUST be in the range of 0 to 255,
1900	      inclusive.

1902	      When max-recv-level-id is not present, the value is inferred to be
1903	      equal to level-id.

1905	      max-recv-level-id MUST NOT be present when the highest level the
1906	      receiver supports is not higher than the default level.
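
      A minimal sketch of how a peer might resolve the highest receive
      level from level-id and max-recv-level-id, using the defaults
      stated in this section.  The function name and the use of None
      for "not present" are assumptions made for illustration.

```python
DEFAULT_LEVEL_ID = 51  # inferred when level-id is absent (level 3.1)


def highest_recv_level(level_id=None, max_recv_level_id=None):
    """Return the highest level-id value the receiver supports."""
    default_level = DEFAULT_LEVEL_ID if level_id is None else level_id
    if not 0 <= default_level <= 255:
        raise ValueError("level-id MUST be in the range of 0 to 255")
    if max_recv_level_id is None:
        # Absent: inferred to be equal to level-id.
        return default_level
    if not 0 <= max_recv_level_id <= 255:
        raise ValueError("max-recv-level-id MUST be in the range of 0 to 255")
    if max_recv_level_id <= default_level:
        # The parameter MUST NOT be present in this case.
        raise ValueError("max-recv-level-id is not higher than the default level")
    return max_recv_level_id
```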

1908	   sprop-dci:

1910	      This parameter MAY be used to convey a decoding capability
1911	      information NAL unit of the bitstream for out-of-band
1912	      transmission.  The parameter MAY also be used for capability
1913	      exchange.  The value of the parameter is a base64 [RFC4648]
1914	      representation of the decoding capability information NAL unit as
1915	      specified in Section 7.3.2.1 of [VVC].

1917	   sprop-vps:

1919	      This parameter MAY be used to convey any video parameter set NAL
1920	      unit of the bitstream for out-of-band transmission of video
1921	      parameter sets.  The parameter MAY also be used for capability
1922	      exchange and to indicate sub-stream characteristics (i.e.,
1923	      properties of output layer sets and sublayer representations as
1924	      defined in [VVC]).  The value of the parameter is a comma-
1925	      separated (',') list of base64 [RFC4648] representations of the
1926	      video parameter set NAL units as specified in Section 7.3.2.3 of
1927	      [VVC].

1929	      The sprop-vps parameter MAY contain one or more than one video
1930	      parameter set NAL units.  However, all other video parameter sets
1931	      contained in the sprop-vps parameter MUST be consistent with the
1932	      first video parameter set in the sprop-vps parameter.  A video
1933	      parameter set vpsB is said to be consistent with another video
1934	      parameter set vpsA if the number of OLSs in vpsA and vpsB is the
1935	      same and any decoder that conforms to the profile, tier, level,
1936	      and constraints indicated by the data starting from the syntax
1937	      element general_profile_idc to the syntax structure
1938	      general_constraints_info(), inclusive, in the profile_tier_level(
1939	      ) syntax structure corresponding to any OLS with index olsIdx in
1940	      vpsA can decode any CVS(s) referencing vpsB when TargetOlsIdx is
1941	      equal to olsIdx that conforms to the profile, tier, level, and
1942	      constraints indicated by the data starting from the syntax element
1943	      general_profile_idc to the syntax structure
1944	      general_constraints_info(), inclusive, in the profile_tier_level(
1945	      ) syntax structure corresponding to the OLS with index
1946	      TargetOlsIdx in vpsB.

1948	   sprop-sps:

1950	      This parameter MAY be used to convey sequence parameter set NAL
1951	      units of the bitstream for out-of-band transmission of sequence
1952	      parameter sets.  The value of the parameter is a comma-separated
1953	      (',') list of base64 [RFC4648] representations of the sequence
1954	      parameter set NAL units as specified in Section 7.3.2.4 of [VVC].

1956	      A sequence parameter set spsB is said to be consistent with
1957	      another sequence parameter set spsA if any decoder that conforms
1958	      to the profile, tier, level, and constraints indicated by the data
1959	      starting from the syntax element general_profile_idc to the syntax
1960	      structure general_constraints_info(), inclusive, in the
1961	      profile_tier_level( ) syntax structure in spsA can decode any
1962	      CLVS(s) referencing spsB that conforms to the profile, tier,
1963	      level, and constraints indicated by the data starting from the
1964	      syntax element general_profile_idc to the syntax structure
1965	      general_constraints_info(), inclusive, in the profile_tier_level(
1966	      ) syntax structure in spsB.

1968	   sprop-pps:

1970	      This parameter MAY be used to convey picture parameter set NAL
1971	      units of the bitstream for out-of-band transmission of picture
1972	      parameter sets.  The value of the parameter is a comma-separated
1973	      (',') list of base64 [RFC4648] representations of the picture
1974	      parameter set NAL units as specified in Section 7.3.2.5 of [VVC].
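
      The sprop-vps, sprop-sps, and sprop-pps values share the same
      shape: a comma-separated list of base64 strings.  A minimal
      sketch of decoding such a value is given below.  The function
      name is hypothetical, and the sketch does not parse or validate
      the NAL units themselves.

```python
import base64
from typing import List


def decode_sprop_list(value: str) -> List[bytes]:
    """Split a comma-separated base64 list into raw NAL unit byte strings."""
    nal_units = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a stray empty entry in this sketch
        # validate=True rejects characters outside the base64 alphabet.
        nal_units.append(base64.b64decode(item, validate=True))
    return nal_units
```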

1976	   sprop-sei:

1978	      This parameter MAY be used to convey one or more SEI messages that
1979	      describe bitstream characteristics.  When present, a decoder can
1980	      rely on the bitstream characteristics that are described in the
1981	      SEI messages for the entire duration of the session, independently
1982	      from the persistence scopes of the SEI messages as specified in
1983	      [VSEI].

1985	      The value of the parameter is a comma-separated (',') list of
1986	      base64 [RFC4648] representations of SEI NAL units as specified in
1987	      [VSEI].

1989	      Informative note: Intentionally, no list of applicable or
1990	         inapplicable SEI messages is specified here.  Conveying certain
1991	         SEI messages in sprop-sei may be sensible in some application
1992	         scenarios and meaningless in others.  However, a few examples
1993	         are described below:

1995	      1) In an environment where the bitstream was created from film-
1996	         based source material, and no splicing is going to occur during
1997	         the lifetime of the session, the film grain characteristics SEI
1998	         message is likely meaningful, and sending it in sprop-sei
1999	         rather than in the bitstream at each entry point may help with
2000	         saving bits and allows one to configure the renderer only once,
2001	         avoiding unwanted artifacts.

2003	      2) Examples for SEI messages that would be meaningless to be
2004	         conveyed in sprop-sei include the decoded picture hash SEI
2005	         message (it is close to impossible that all decoded pictures
2006	         have the same hash) or the filler payload SEI message (as
2007	         there is no point in just having more bits in SDP).

2009	   max-lsr:

2011	      The max-lsr MAY be used to signal the capabilities of a receiver
2012	      implementation and MUST NOT be used for any other purpose.  The
2013	      value of max-lsr is an integer indicating the maximum processing
2014	      rate in units of luma samples per second.  The max-lsr parameter
2015	      signals that the receiver is capable of decoding video at a higher
2016	      rate than is required by the highest level.

2018	      Informative note: When the OPTIONAL media type parameters are
2019	         used to signal the properties of a bitstream, and max-lsr is
2020	         not present, the values of tier-flag, profile-id, sub-profile-
2021	         id, interop-constraints, and level-id must always be such that
2022	         the bitstream complies fully with the specified profile, tier,
2023	         and level.

2025	      When max-lsr is signaled, the receiver MUST be able to decode
2026	      bitstreams that conform to the highest level, with the exception
2027	      that the MaxLumaSr value in Table 136 of [VVC] for the highest
2028	      level is replaced with the value of max-lsr.  Senders MAY use this
2029	      knowledge to send pictures of a given size at a higher picture
2030	      rate than is indicated in the highest level.

2032	      When not present, the value of max-lsr is inferred to be equal to
2033	      the value of MaxLumaSr given in Table 136 of [VVC] for the highest
2034	      level.

2036	      The value of max-lsr MUST be in the range of MaxLumaSr to 16 *
2037	      MaxLumaSr, inclusive, where MaxLumaSr is given in Table 136 of
2038	      [VVC] for the highest level.
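
      The inference and range rules for max-lsr can be sketched as
      follows.  The function name is hypothetical.  The MaxLumaSr value
      for the highest level has to be taken from Table 136 of [VVC] and
      is passed in by the caller; no table values are reproduced here.

```python
def effective_max_lsr(level_max_luma_sr, max_lsr=None):
    """Return the luma sample rate limit to apply for the highest level."""
    if max_lsr is None:
        # Absent: inferred to be equal to MaxLumaSr for the highest level.
        return level_max_luma_sr
    if not level_max_luma_sr <= max_lsr <= 16 * level_max_luma_sr:
        raise ValueError("max-lsr MUST be in the range MaxLumaSr to 16 * MaxLumaSr")
    return max_lsr
```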

2040	   max-fps:

2042	      The value of max-fps is an integer indicating the maximum picture
2043	      rate in units of pictures per 100 seconds that can be effectively
2044	      processed by the receiver.  The max-fps parameter MAY be used to
2045	      signal that the receiver has a constraint in that it is not
2046	      capable of processing video effectively at the full picture rate
2047	      that is implied by the highest level and, when present, max-lsr.

2049	      The value of max-fps is not necessarily the picture rate at which
2050	      the maximum picture size can be sent; it constitutes a constraint
2051	      on maximum picture rate for all resolutions.

2053	      Informative note: The max-fps parameter is semantically
2054	         different from max-lsr in that max-fps is used to signal a
2055	         constraint, lowering the maximum picture rate from what is
2056	         implied by other parameters.

2058	      The encoder MUST use a picture rate equal to or less than this
2059	      value.  In cases where the max-fps parameter is absent, the
2060	      encoder is free to choose any picture rate according to the
2061	      highest level and any signaled optional parameters.

2063	      The value of max-fps MUST be smaller than or equal to the full
2064	      picture rate that is implied by the highest level and, when
2065	      present, max-lsr.
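
      Because max-fps is expressed in pictures per 100 seconds, a
      sender has to scale it before comparing it with a picture rate in
      pictures per second.  A minimal sketch, with hypothetical
      function names, is shown below.  It checks only the max-fps
      constraint, not the limits implied by the level or max-lsr.

```python
from fractions import Fraction


def max_fps_to_pictures_per_second(max_fps: int) -> Fraction:
    """Convert max-fps (pictures per 100 seconds) to pictures per second."""
    return Fraction(max_fps, 100)


def picture_rate_allowed(pictures_per_second, max_fps=None) -> bool:
    """Check a sender picture rate against max-fps, if it was signaled."""
    if max_fps is None:
        return True  # no max-fps constraint; other limits still apply
    return Fraction(pictures_per_second) <= max_fps_to_pictures_per_second(max_fps)
```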

2067	   sprop-max-don-diff:

2069	      If there is no NAL unit naluA that is followed in transmission
2070	      order by any NAL unit preceding naluA in decoding order (i.e., the
2071	      transmission order of the NAL units is the same as the decoding
2072	      order), the value of this parameter MUST be equal to 0.

2074	      Otherwise, this parameter specifies the maximum absolute
2075	      difference between the decoding order number (i.e., AbsDon) values
2076	      of any two NAL units naluA and naluB, where naluA follows naluB in
2077	      decoding order and precedes naluB in transmission order.

2079	      The value of sprop-max-don-diff MUST be an integer in the range of
2080	      0 to 32767, inclusive.

2082	      When not present, the value of sprop-max-don-diff is inferred to
2083	      be equal to 0.
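
      The definition above can be illustrated with a short sketch that
      computes the value from the AbsDon values of NAL units listed in
      transmission order.  The function name is hypothetical, and the
      sketch assumes that a larger AbsDon value means a later position
      in decoding order.

```python
def max_don_diff(abs_dons_in_transmission_order):
    """Largest AbsDon(naluA) - AbsDon(naluB) over pairs where naluA is
    transmitted before naluB but follows it in decoding order."""
    largest_seen = None
    result = 0
    for abs_don in abs_dons_in_transmission_order:
        if largest_seen is not None and largest_seen > abs_don:
            # An earlier-transmitted NAL unit follows this one in decoding order.
            result = max(result, largest_seen - abs_don)
        if largest_seen is None or abs_don > largest_seen:
            largest_seen = abs_don
    return result
```

      A result of 0 corresponds to the case where the transmission
      order is the same as the decoding order.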

2085	   sprop-depack-buf-bytes:

2087	      This parameter signals the required size of the de-packetization
2088	      buffer in units of bytes.  The value of the parameter MUST be
2089	      greater than or equal to the maximum buffer occupancy (in units of
2090	      bytes) of the de-packetization buffer as specified in Section 6.

2092	      The value of sprop-depack-buf-bytes MUST be an integer in the
2093	      range of 0 to 4294967295, inclusive.

2095	      When sprop-max-don-diff is present and greater than 0, this
2096	      parameter MUST be present and the value MUST be greater than 0.
2097	      When not present, the value of sprop-depack-buf-bytes is inferred
2098	      to be equal to 0.

2100	      Informative note: The value of sprop-depack-buf-bytes indicates
2101	         the required size of the de-packetization buffer only.  When
2102	         network jitter can occur, an appropriately sized jitter buffer
2103	         has to be available as well.

2105	   depack-buf-cap:

2107	      This parameter signals the capabilities of a receiver
2108	      implementation and indicates the amount of de-packetization buffer
2109	      space in units of bytes that the receiver has available for
2110	      reconstructing the NAL unit decoding order from NAL units carried
2111	      in the RTP stream.  A receiver is able to handle any RTP stream
2112	      for which the value of the sprop-depack-buf-bytes parameter is
2113	      smaller than or equal to this parameter.

2115	      When not present, the value of depack-buf-cap is inferred to be
2116	      equal to 4294967295.  The value of depack-buf-cap MUST be an
2117	      integer in the range of 1 to 4294967295, inclusive.

2119	      Informative note: depack-buf-cap indicates the maximum possible
2120	         size of the de-packetization buffer of the receiver only,
2121	         without allowing for network jitter.
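
      The relationship between sprop-depack-buf-bytes and depack-buf-
      cap, including the inferred values when either is absent, can be
      sketched as follows.  The function name and the use of None for
      "not present" are assumptions made for illustration.

```python
UINT32_MAX = 4294967295


def receiver_can_handle(sprop_depack_buf_bytes=None, depack_buf_cap=None) -> bool:
    """True if the stream's required buffer fits the receiver's capacity."""
    # Absent sprop-depack-buf-bytes is inferred to be 0.
    required = 0 if sprop_depack_buf_bytes is None else sprop_depack_buf_bytes
    # Absent depack-buf-cap is inferred to be 4294967295.
    available = UINT32_MAX if depack_buf_cap is None else depack_buf_cap
    if not 0 <= required <= UINT32_MAX:
        raise ValueError("sprop-depack-buf-bytes out of range")
    if not 1 <= available <= UINT32_MAX:
        raise ValueError("depack-buf-cap out of range")
    return required <= available
```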

2123	7.3.  SDP Parameters

2125	   The receiver MUST ignore any parameter unspecified in this memo.

2127	7.3.1.  Mapping of Payload Type Parameters to SDP

2129	   The media type video/H266 string is mapped to fields in the Session
2130	   Description Protocol (SDP) [RFC8866] as follows:

2132	   *  The media name in the "m=" line of SDP MUST be video.

2134	   *  The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the
2135	      media subtype).

2137	   *  The clock rate in the "a=rtpmap" line MUST be 90000.

2139	   *  The OPTIONAL parameters profile-id, tier-flag, sub-profile-id,
2140	      interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id,
2141	      recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max-
2142	      fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf-
2143	      cap, when present, MUST be included in the "a=fmtp" line of SDP.
2144	      The fmtp line is expressed as a media type string, in the form of
2145	      a semicolon-separated list of parameter=value pairs.

2147	   *  The OPTIONAL parameters sprop-vps, sprop-sps, sprop-pps, sprop-sei,
2148	      and sprop-dci, when present, MUST be included in the "a=fmtp" line
2149	      of SDP or conveyed using the "fmtp" source attribute as specified
2150	      in Section 6.3 of [RFC5576].  For a particular media format (i.e.,
2151	      RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or
2152	      sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP
2153	      and conveyed using the "fmtp" source attribute.  When included in
2154	      the "a=fmtp" line of SDP, those parameters are expressed as a
2155	      media type string, in the form of a semicolon-separated list of
2156	      parameter=value pairs.  When conveyed in the "a=fmtp" line of SDP
2157	      for a particular payload type, the parameters sprop-vps, sprop-
2158	      sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each
2159	      SSRC with the payload type.  When conveyed using the "fmtp" source
2160	      attribute, these parameters are only associated with the given
2161	      source and payload type as parts of the "fmtp" source attribute.

2163	      Informative note: Conveyance of sprop-vps, sprop-sps, and
2164	         sprop-pps using the "fmtp" source attribute allows for out-of-
2165	         band transport of parameter sets in topologies like Topo-Video-
2166	         switch-MCU as specified in [RFC7667].

2168	   A general usage of media representation in SDP is as follows:

2170	           m=video 49170 RTP/AVP 98
2171	           a=rtpmap:98 H266/90000
2172	           a=fmtp:98 profile-id=1;
2173	             sprop-vps=<video parameter sets data>;
2174	             sprop-sps=<sequence parameter set data>;
2175	             sprop-pps=<picture parameter set data>;
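
   A receiving implementation might split such an "a=fmtp" line into
   parameter=value pairs roughly as sketched below.  The function name
   is hypothetical.  The sketch expects the attribute on a single
   unwrapped line and does not interpret the values.

```python
def parse_fmtp(line: str):
    """Return (payload_type, {parameter: value}) for one a=fmtp line."""
    prefix, _, params = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue  # trailing semicolon or empty entry
        # Split on the first "=" only; base64 values may end in "=" padding.
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected parameter=value, got %r" % item)
        result[name.strip()] = value.strip()
    return payload_type, result
```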

2177	   A SIP Offer/Answer exchange wherein both parties are expected to both
2178	   send and receive could look like the following.  Only the media
2179	   codec-specific parts of the SDP are shown.  Some lines are wrapped
2180	   due to text constraints.

2182	     Offerer->Answerer:
2183	           m=video 49170 RTP/AVP 98
2184	           a=rtpmap:98 H266/90000
2185	           a=fmtp:98 profile-id=1; level-id=83;

2187	   The above represents an offer for symmetric video communication using
2188	   [VVC] and its payload specification, at the main profile and level
2189	   5.1 (and, as the levels are downgradable, all lower levels).
2190	   Informally speaking, this offer tells the receiver of the offer that
2191	   the sender is willing to receive up to 4Kp60 resolution at the
2192	   maximum bitrates specified in [VVC].  At the same time, if this offer
2193	   were accepted "as is", the offerer can expect that the answerer would
2194	   be able to receive and properly decode H.266 media up to and
2195	   including level 5.1.

2197	     Answerer->Offerer:
2198	           m=video 49170 RTP/AVP 98
2199	           a=rtpmap:98 H266/90000
2200	           a=fmtp:98 profile-id=1; level-id=67

2202	   With this answer to the offer above, the system receiving the offer
2203	   advises the offerer that it is incapable of handling H.266 at level
2204	   5.1 but is capable of decoding 1080p60.  As H.266 video codecs must
2205	   support decoding at all levels below the maximum level they
2206	   implement, the resulting user experience would likely be that both
2207	   systems send video at 1080p60.  However, nothing prevents an encoder
2208	   from further downgrading its sending to, for example, 720p30 if it
2209	   were short of cycles, bandwidth, or for other reasons.

2211	7.3.2.  Usage with SDP Offer/Answer Model

2213	   This section describes the negotiation of unicast messages using the
2214	   offer-answer model as described in [RFC3264] and its updates.  The
2215	   section is split into subsections, covering a) media format
2216	   configurations not involving non-temporal scalability; b) scalable
2217	   media format configurations; c) the description of the use of those
2218	   parameters not involving the media configuration itself but rather
2219	   the parameters of the payload format design; and d) multicast.

2221	7.3.2.1.  Non-scalable media format configuration

2223	   A non-scalable VVC media configuration is such a configuration where
2224	   no non-temporal scalability mechanisms are allowed.  In [VVC] version
2225	   1, that implies that general_profile_idc indicates one of the
2226	   following profiles: Main10, Main10 Still Picture, Main10 4:4:4,
2227	   Main10 4:4:4 Still Picture, with general_profile_idc values of 1, 65,
2228	   33, and 97, respectively.  Note that non-scalable media
2229	   configurations include temporal scalability, in line with VVC's
2230	   design philosophy and profile structure.

2232	   The following limitations and rules pertaining to the media
2233	   configuration apply:

2235	   *  The parameters identifying a media format configuration for VVC
2236	      are profile-id, tier-flag, sub-profile-id, level-id, and interop-
2237	      constraints.  These media configuration parameters, except level-
2238	      id, MUST be used symmetrically.

2240	      The answerer MUST structure its answer according to one of the
2241	      following three options:

2243	      1) maintain all configuration parameters with the values remaining
2244	      the same as in the offer for the media format (payload type), with
2245	      the exception that the value of level-id is changeable as long as
2246	      the highest level indicated by the answer is not higher than that
2247	      indicated by the offer;

2249	      2) include in the answer the recv-sublayer-id parameter, with a
2250	      value less than the sprop-sublayer-id parameter in the offer, for
2251	      the media format (payload type), and maintain all configuration
2252	      parameters with the values remaining the same as in the offer for
2253	      the media format (payload type), with the exception that the value
2254	      of level-id is changeable as long as the highest level indicated
2255	      by the answer is not higher than the level indicated by the sprop-
2256	      sps or sprop-vps in the offer for the chosen sublayer representation;
2257	      or
2258	      3) remove the media format (payload type) completely (when one or
2259	      more of the parameter values are not supported).

2261	      Informative note: The above requirement for symmetric use
2262	            does not apply for level-id, and does not apply for the
2263	            other bitstream or RTP stream properties and capability
2264	            parameters as described in Section 7.3.2.3 below.

2266	   *  To simplify handling and matching of these configurations, the
2267	      same RTP payload type number used in the offer SHOULD also be used
2268	      in the answer, as specified in [RFC3264].

2270	   *  The same RTP payload type number used in the offer for the media
2271	      subtype H266 MUST be used in the answer when the answer includes
2272	      recv-sublayer-id.  When the answer does not include recv-sublayer-
2273	      id, the answer MUST NOT contain a payload type number used in the
2274	      offer for the media subtype H266 unless the configuration is
2275	      exactly the same as in the offer or the configuration in the
2276	      answer only differs from that in the offer with a different value
2277	      of level-id.  The answer MAY contain the recv-sublayer-id
2278	      parameter if a VVC bitstream contains multiple operation points
2279	      (using temporal scalability and sublayers) and sprop-sps or sprop-
2280	      vps is included in the offer where information on sublayers is
2281	      present in the first sequence parameter set or video parameter set
2282	      contained in sprop-sps or sprop-vps respectively.  If the sprop-
2283	      sps or sprop-vps is provided in an offer, an answerer MAY select a
2284	      particular operation point indicated in the first sequence
2285	      parameter set or video parameter set contained in sprop-sps or
2286	      sprop-vps respectively.  When the answer includes a recv-sublayer-
2287	      id that is less than a sprop-sublayer-id in the offer, the
2288	      following applies:

2290	      1) When sprop-sps parameter is present, all sequence parameter
2291	      sets contained in the sprop-sps parameter in the SDP answer and
2292	      all sequence parameter sets sent in-band for either the offerer-
2293	      to-answerer direction or the answerer-to-offerer direction MUST be
2294	      consistent with the first sequence parameter set in the sprop-sps
2295	      parameter of the offer (see the semantics of sprop-sps in
2296	      Section 7.1 of this document on one sequence parameter set being
2297	      consistent with another sequence parameter set).

2299	      2) When sprop-vps parameter is present, all video parameter sets
2300	      contained in the sprop-vps parameter in the SDP answer and all
2301	      video parameter sets sent in-band for either the offerer-to-
2302	      answerer direction or the answerer-to-offerer direction MUST be
2303	      consistent with the first video parameter set in the sprop-vps
2304	      parameter of the offer (see the semantics of sprop-vps in
2305	      Section 7.1 of this document on one video parameter set being
2306	      consistent with another video parameter set).

2308	      3) The bitstream sent in either direction MUST conform to the
2309	      profile, tier, level, and constraints of the chosen sublayer
2310	      representation as indicated by the profile_tier_level( ) syntax
2311	      structure in the first sequence parameter set in the sprop-sps
2312	      parameter or by the first profile_tier_level( ) syntax structure
2313	      in the first video parameter set in the sprop-vps parameter of the
2314	      offer.

2316	      Informative note: When an offerer receives an answer that
2317	            does not include recv-sublayer-id, it has to compare payload
2318	            types not declared in the offer based on the media type
2319	            (i.e., video/H266) and the above media configuration
2320	            parameters with any payload types it has already declared.
2321	            This will enable it to determine whether the configuration
2322	            in question is new or if it is equivalent to a configuration
2323	            already offered, since a different payload type number may
2324	            be used in the answer.  The ability to perform operation
2325	            point selection enables a receiver to utilize the temporal
2326	            scalable nature of a VVC bitstream.
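
   As a rough illustration of option 1) in the list of answer options
   above, an offerer could check a non-scalable answer as sketched
   below.  The function name and the dictionary representation of the
   fmtp parameters are assumptions.  Parameter values are compared as
   strings, so equivalent but differently encoded values are not
   recognized, and options 2) and 3) are not covered.

```python
SYMMETRIC_PARAMS = ("profile-id", "tier-flag", "sub-profile-id", "interop-constraints")
INFERRED = {"tier-flag": "0"}  # value inferred when tier-flag is absent
DEFAULT_LEVEL_ID = 51          # value inferred when level-id is absent


def answer_keeps_configuration(offer: dict, answer: dict) -> bool:
    """Check symmetric parameters and the level-id downgrade rule."""
    for name in SYMMETRIC_PARAMS:
        offered = offer.get(name, INFERRED.get(name))
        answered = answer.get(name, INFERRED.get(name))
        if offered != answered:
            return False
    offer_level = int(offer.get("level-id", DEFAULT_LEVEL_ID))
    answer_level = int(answer.get("level-id", DEFAULT_LEVEL_ID))
    # The answer may lower level-id but must not raise it.
    return answer_level <= offer_level
```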

2328	7.3.2.2.  Scalable media format configuration

2330	   A scalable VVC media configuration is such a configuration where non-
2331	   temporal scalability mechanisms are allowed.  In [VVC] version 1,
2332	   that implies that general_profile_idc indicates one of the following
2333	   profiles: Multilayer Main 10, and Multilayer Main 10 4:4:4, with
2334	   general_profile_idc values of 17 and 49, respectively.

2336	   The following limitations and rules pertaining to the media
2337	   configuration apply.  They are listed in an order that would be
2338	   logical for an implementation to follow:

2340	   *  The parameters identifying a media format configuration for
2341	      scalable VVC are profile-id, tier-flag, sub-profile-id, level-id,
2342	      interop-constraints, and sprop-vps.  These media configuration
2343	      parameters, except level-id, MUST be used symmetrically, except as
2344	      noted below.

2346	   *  The answerer MAY include a level-id that MUST be lower than or
2347	      equal to the level-id indicated in the offer (either expressed by
2348	      level-id in the offer, or implied by the default level as specified
2349	      in Section 7.1).

2351	   *  When sprop-ols-id is present in an offer, sprop-vps MUST also be
2352	      present in the same offer and include at least one valid VPS, so
2353	      as to allow the answerer to meaningfully interpret sprop-ols-id and
2354	      select recv-ols-id (see below).

2356	   *  The answerer MUST NOT include recv-ols-id unless the offer
2357	      includes sprop-ols-id.  When present, recv-ols-id MUST indicate a
2358	      supported output layer set in the VPS that includes no layers
2359	      other than all or a subset of the layers of the OLS referred to by
2360	      sprop-ols-id.  If unable, the answerer MUST remove the media
2361	      format.

2363	      Informative note: if an offerer wants to offer more than one
2364	         output layer set, it can do so by offering multiple VVC media
2365	         with different payload types.

2367	   *  The offerer MAY include sprop-sublayer-id which indicates the
2368	      highest allowed value of TID in the bitstream.  The answerer MAY
2369	      include recv-sublayer-id which can be used to reduce the number of
2370	      sublayers from the value of sprop-sublayer-id.

2372	   *  When the answerer includes recv-ols-id and configuration
2373	      parameters profile-id, tier-flag, sub-profile-id, level-id, and
2374	      interop-constraints, it MUST use the configuration parameter
2375	      values as signaled in the sprop-vps for the operating point with
2376	      the largest number of sublayers for the chosen output layer set,
2377	      with the exception that the value of level-id is changeable as
2378	      long as the highest level indicated by the answer is not higher
2379	      than the level indicated by the sprop-vps in the offer for the
2380	      operating point with the largest number of sublayers for the
2381	      chosen output layer set.
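
   The recv-ols-id rule in the list above can be sketched as a subset
   test.  The function name is hypothetical, and the mapping from OLS
   index to the set of layer identifiers in that OLS is assumed to
   have been extracted from the VPS in sprop-vps by other means.

```python
def recv_ols_id_allowed(recv_ols_id, sprop_ols_id, ols_layers) -> bool:
    """Check whether an answerer may signal recv_ols_id.

    ols_layers maps an OLS index to a set of layer ids (assumed input).
    """
    if sprop_ols_id is None:
        return False  # recv-ols-id requires sprop-ols-id in the offer
    if not (0 <= recv_ols_id <= 256 and 0 <= sprop_ols_id <= 256):
        return False
    if recv_ols_id not in ols_layers or sprop_ols_id not in ols_layers:
        return False
    # The chosen OLS may contain all or a subset of the offered OLS's layers.
    return set(ols_layers[recv_ols_id]) <= set(ols_layers[sprop_ols_id])
```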

2383	7.3.2.3.  Payload format configuration

2385	   The following limitations and rules pertain mostly to the
2386	   configuration of the payload format buffer management and apply to
2387	   both scalable and non-scalable VVC.

2389	   *  The parameters sprop-max-don-diff and sprop-depack-buf-bytes
2390	      describe the properties of an RTP stream that the offerer or the
2391	      answerer is sending for the media format configuration.  This
2392	      differs from the normal usage of the offer/answer parameters:
2393	      normally such parameters declare the properties of the bitstream
2394	      or RTP stream that the offerer or the answerer is able to receive.
2395	      When dealing with VVC, the offerer assumes that the answerer will
2396	      be able to receive media encoded using the configuration being
2397	      offered.

2399	      Informative note: The above parameters apply for any RTP
2400	         stream, when present, sent by a declaring entity with the same
2401	         configuration.  In other words, the applicability of the above
2402	         parameters to RTP streams depends on the source endpoint.
2403	         Rather than being bound to the payload type, the values may
2404	         have to be applied to another payload type when being sent, as
2405	         they apply for the configuration.

2407	   *  The capability parameter max-lsr MAY be used to declare further
2408	      capabilities of the offerer or answerer for receiving.  It MUST
2409	      NOT be present when the direction attribute is sendonly.

2411	   *  The capability parameter max-fps MAY be used to declare lower
2412	      capabilities of the offerer or answerer for receiving.  It MUST
2413	      NOT be present when the direction attribute is sendonly.

2415	   *  When an offerer offers an interleaved stream, indicated by the
2416	      presence of sprop-max-don-diff with a value larger than zero, the
2417	      offerer MUST include the size of the de-packetization buffer
2418	      sprop-depack-buf-bytes.

2420	   *  To enable the offerer and answerer to inform each other about
2421	      their capabilities for de-packetization buffering in receiving RTP
2422	      streams, both parties are RECOMMENDED to include depack-buf-cap.

2424	   *  The sprop-dci, sprop-vps, sprop-sps, or sprop-pps, when present
2425	      (included in the "a=fmtp" line of SDP or conveyed using the "fmtp"
2426	      source attribute as specified in Section 6.3 of [RFC5576]), are
2427	      used for out-of-band transport of the parameter sets (DCI, VPS,
2428	      SPS, or PPS, respectively).

2430	   *  The answerer MAY use either out-of-band or in-band transport of
2431	      parameter sets for the bitstream it is sending, regardless of
2432	      whether out-of-band parameter sets transport has been used in the
2433	      offerer-to-answerer direction.  Parameter sets included in an
2434	      answer are independent of those parameter sets included in the
2435	      offer, as they are used for decoding two different bitstreams, one
2436	      from the answerer to the offerer and the other in the opposite
2437	      direction.  In case some RTP packets are sent before the SDP
2438	      offer/answer settles down, in-band parameter sets MUST be used for
2439	      those RTP stream parts sent before the SDP offer/answer.

2441	   *  The following rules apply to transport of parameter sets in the
2442	      offerer-to-answerer direction.

2444	      -  An offer MAY include sprop-dci, sprop-vps, sprop-sps, and/or
2445	         sprop-pps.  If none of these parameters is present in the
2446	         offer, then only in-band transport of parameter sets is used.

2448	      -  If the level to use in the offerer-to-answerer direction is
2449	         equal to the default level in the offer, the answerer MUST be
2450	         prepared to use the parameter sets included in sprop-vps,
2451	         sprop-sps, and sprop-pps (either included in the "a=fmtp" line
2452	         of SDP or conveyed using the "fmtp" source attribute) for
2453	         decoding the incoming bitstream, e.g., by passing these
2454	         parameter set NAL units to the video decoder before passing any
2455	         NAL units carried in the RTP streams.  Otherwise, the answerer
2456	         MUST ignore sprop-vps, sprop-sps, and sprop-pps (either
2457	         included in the "a=fmtp" line of SDP or conveyed using the
2458	         "fmtp" source attribute) and the offerer MUST transmit
2459	         parameter sets in-band.

2461	   *  The following rules apply to transport of parameter sets in the
2462	      answerer-to-offerer direction.

2464	      -  An answer MAY include sprop-dci, sprop-vps, sprop-sps, and/or
2465	         sprop-pps.  If none of these parameters is present in the
2466	         answer, then only in-band transport of parameter sets is used.

2468	      -  The offerer MUST be prepared to use the parameter sets included
2469	         in sprop-vps, sprop-sps, and sprop-pps (either included in the
2470	         "a=fmtp" line of SDP or conveyed using the "fmtp" source
2471	         attribute) for decoding the incoming bitstream, e.g., by
2472	         passing these parameter set NAL units to the video decoder
2473	         before passing any NAL units carried in the RTP streams.

2475	   *  When sprop-dci, sprop-vps, sprop-sps, and/or sprop-pps are
2476	      conveyed using the "fmtp" source attribute as specified in
2477	      Section 6.3 of [RFC5576], the receiver of the parameters MUST
2478	      store the parameter sets included in sprop-dci, sprop-vps, sprop-
2479	      sps, and/or sprop-pps and associate them with the source given as
2480	      part of the "fmtp" source attribute.  Parameter sets associated
2481	      with one source (given as part of the "fmtp" source attribute)
2482	      MUST only be used to decode NAL units conveyed in RTP packets from
2483	      the same source (given as part of the "fmtp" source attribute).
2484	      When this mechanism is in use, SSRC collision detection and
2485	      resolution MUST be performed as specified in [RFC5576].
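Two of the rules in the list above lend themselves to a mechanical check:
max-lsr and max-fps must be absent when the direction attribute is
sendonly, and an interleaved stream (sprop-max-don-diff greater than zero)
must carry sprop-depack-buf-bytes.  The following is a minimal sketch,
assuming a hypothetical dict of fmtp parameters with string values; the
function name is not taken from this memo.

```python
def check_payload_config(params, direction):
    """Return violations of two payload format configuration rules.

    params:    dict of fmtp parameter name -> string value (hypothetical).
    direction: "sendonly", "recvonly", or "sendrecv".
    """
    problems = []
    # Receiver capability parameters make no sense for a send-only stream.
    if direction == "sendonly":
        for name in ("max-lsr", "max-fps"):
            if name in params:
                problems.append(f"{name} present with sendonly")
    # An interleaved stream requires the de-packetization buffer size.
    interleaved = int(params.get("sprop-max-don-diff", 0)) > 0
    if interleaved and "sprop-depack-buf-bytes" not in params:
        problems.append("interleaved stream without sprop-depack-buf-bytes")
    return problems
```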

2487	   Table 1 lists the interpretation of all the parameters that MAY be
2488	   used for the various combinations of offer, answer, and direction
2489	   attributes.  Note that the two columns wherein the recv-ols-id
2490	   parameter is used only apply to answers, whereas the other columns
2491	   apply to both offers and answers.

2493	                                       sendonly --+
2494	               answer: recvonly, recv-ols-id --+  |
2495	                 recvonly w/o recv-ols-id --+  |  |
2496	         answer: sendrecv, recv-ols-id --+  |  |  |
2497	           sendrecv w/o recv-ols-id --+  |  |  |  |
2498	                                      |  |  |  |  |
2499	   profile-id                         C  D  C  D  P
2500	   tier-flag                          C  D  C  D  P
2501	   level-id                           D  D  D  D  P
2502	   sub-profile-id                     C  D  C  D  P
2503	   interop-constraints                C  D  C  D  P
2504	   max-recv-level-id                  R  R  R  R  -
2505	   sprop-max-don-diff                 P  P  -  -  P
2506	   sprop-depack-buf-bytes             P  P  -  -  P
2507	   depack-buf-cap                     R  R  R  R  -
2508	   max-lsr                            R  R  R  R  -
2509	   max-fps                            R  R  R  R  -
2510	   sprop-dci                          P  P  -  -  P
2511	   sprop-sei                          P  P  -  -  P
2512	   sprop-vps                          P  P  -  -  P
2513	   sprop-sps                          P  P  -  -  P
2514	   sprop-pps                          P  P  -  -  P
2515	   sprop-sublayer-id                  P  P  -  -  P
2516	   recv-sublayer-id                   O  O  O  O  -
2517	   sprop-ols-id                       P  P  -  -  P
2518	   recv-ols-id                        X  O  X  O  -

2520	   Table 1.  Interpretation of parameters for various combinations of
2521	   offers, answers, direction attributes, with and without recv-ols-id.
2522	   Columns that do not indicate offer or answer apply to both.

2524	   Legend:

2526	    C: configuration for sending and receiving bitstreams
2527	    D: changeable configuration, same as C except possible
2528	       to answer with a different but consistent value (see the
2529	       semantics of the six parameters related to profile, tier,
2530	       and level on these parameters being consistent)
2531	    P: properties of the bitstream to be sent
2532	    R: receiver capabilities
2533	    O: operation point selection
2534	    X: MUST NOT be present
2535	    -: not usable, when present MUST be ignored
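For implementers, Table 1 can be held as a simple lookup structure.  The
sketch below transcribes the table rows into strings whose five characters
follow the column order of Table 1, from "sendrecv w/o recv-ols-id" on the
left to "sendonly" on the right.  The names TABLE_1 and interpretation are
illustrative assumptions, not part of this memo.

```python
# One string per parameter; character i is the legend code for column i:
# 0: sendrecv w/o recv-ols-id     1: answer, sendrecv, recv-ols-id
# 2: recvonly w/o recv-ols-id     3: answer, recvonly, recv-ols-id
# 4: sendonly
TABLE_1 = {
    "profile-id": "CDCDP",
    "tier-flag": "CDCDP",
    "level-id": "DDDDP",
    "sub-profile-id": "CDCDP",
    "interop-constraints": "CDCDP",
    "max-recv-level-id": "RRRR-",
    "sprop-max-don-diff": "PP--P",
    "sprop-depack-buf-bytes": "PP--P",
    "depack-buf-cap": "RRRR-",
    "max-lsr": "RRRR-",
    "max-fps": "RRRR-",
    "sprop-dci": "PP--P",
    "sprop-sei": "PP--P",
    "sprop-vps": "PP--P",
    "sprop-sps": "PP--P",
    "sprop-pps": "PP--P",
    "sprop-sublayer-id": "PP--P",
    "recv-sublayer-id": "OOOO-",
    "sprop-ols-id": "PP--P",
    "recv-ols-id": "XOXO-",
}


def interpretation(param, direction, with_recv_ols_id=False):
    """Return the legend code for a parameter in a given column of Table 1."""
    if direction == "sendonly":
        column = 4
    else:
        column = {"sendrecv": 0, "recvonly": 2}[direction]
        if with_recv_ols_id:  # only meaningful for answers
            column += 1
    return TABLE_1[param][column]
```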

2537	   Parameters used for declaring receiver capabilities are, in general,
2538	   downgradable; i.e., they express the upper limit for a sender's
2539	   possible behavior.  Thus, a sender MAY select to set its encoder
2540	   using only lower/lesser or equal values of these parameters.

2542	   When the answer does not include a recv-ols-id that is less than the
2543	   sprop-ols-id in the offer, parameters declaring a configuration point
2544	   are not changeable, with the exception of the level-id parameter for
2545	   unicast usage, and these parameters express values a receiver expects
2546	   to be used and MUST be used verbatim in the answer as in the offer.

2548	   When a sender's capabilities are declared with the configuration
2549	   parameters, these parameters express a configuration that is
2550	   acceptable for the sender to receive bitstreams.  In order to achieve
2551	   high interoperability levels, it is often advisable to offer multiple
2552	   alternative configurations.  It is impossible to offer multiple
2553	   configurations in a single payload type.  Thus, when multiple
2554	   configuration offers are made, each offer requires its own RTP
2555	   payload type associated with the offer.  However, it is possible to
2556	   offer multiple operation points using one configuration in a single
2557	   payload type by including sprop-vps in the offer and recv-ols-id in
2558	   the answer.

2560	   An implementation SHOULD be able to understand all media type
2561	   parameters (including all optional media type parameters), even if it
2562	   doesn't support the functionality related to the parameter.  This, in
2563	   conjunction with proper application logic in the implementation
2564	   allows the implementation, after having received an offer, to create
2565	   an answer by potentially downgrading one or more of the optional
2566	   parameters to the point where the implementation can cope, leading to
2567	   higher chances of interoperability beyond the most basic interop
2568	   points (for which, as described above, no optional parameters are
2569	   necessary).

2571	      Informative note: in implementations of previous H.26x payload
2572	      formats it was occasionally observed that implementations were
2573	      incapable of parsing most (or all) of the optional parameters.  As
2574	      a result, the offer-answer exchange resulted in a baseline
2575	      performance (using the default values for the optional parameters)
2576	      with the resulting suboptimal user experience.  However, there are
2577	      valid reasons to forego the implementation complexity of
2578	      implementing the parsing of some or all of the optional
2579	      parameters, for example, when there is pre-determined knowledge,
2580	      not negotiated by an SDP-based offer/answer process, of the
2581	      capabilities of the involved systems (walled gardens, baseline
2582	      requirements defined in application standards higher up in the
2583	      stack, and similar).

2585	   An answerer MAY extend the offer with additional media format
2586	   configurations.  However, to enable their usage, in most cases a
2587	   second offer is required from the offerer to provide the bitstream
2588	   property parameters that the media sender will use.  This also has
2589	   the effect that the offerer has to be able to receive this media
2590	   format configuration, not only to send it.

2592	7.3.2.4.  Multicast

2594	   For bitstreams being delivered over multicast, the following rules
2595	   apply:

2597	   *  The media format configuration is identified by profile-id, tier-
2598	      flag, sub-profile-id, level-id, and interop-constraints.  These
2599	      media format configuration parameters, including level-id, MUST be
2600	      used symmetrically; that is, the answerer MUST either maintain all
2601	      configuration parameters or remove the media format (payload type)
2602	      completely.  Note that this implies that the level-id for offer/
2603	      answer in multicast is not changeable.

2605	   *  To simplify the handling and matching of these configurations, the
2606	      same RTP payload type number used in the offer SHOULD also be used
2607	      in the answer, as specified in [RFC3264].  An answer MUST NOT
2608	      contain a payload type number used in the offer unless the
2609	      configuration is the same as in the offer.

2611	   *  Parameter sets received MUST be associated with the originating
2612	      source and MUST only be used in decoding the incoming bitstream
2613	      from the same source.

2615	   *  The rules for other parameters are the same as above for unicast
2616	      as long as the three above rules are obeyed.
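The symmetry rule for multicast can be illustrated as follows.  This is a
simplified sketch: it compares the parameters as literally signaled in a
hypothetical dict representation and does not substitute default values
for absent parameters, which a real implementation would need to do.

```python
CONFIG_PARAMS = (
    "profile-id",
    "tier-flag",
    "sub-profile-id",
    "level-id",
    "interop-constraints",
)


def multicast_answer_ok(offer, answer):
    """Check the multicast rule for one media format (payload type).

    offer:  dict of fmtp parameters from the offer (hypothetical layout).
    answer: dict of fmtp parameters from the answer, or None when the
            answerer removed the media format completely.
    """
    if answer is None:
        return True  # removing the media format is always allowed
    # Otherwise every configuration parameter, including level-id,
    # has to be maintained unchanged.
    return all(offer.get(name) == answer.get(name) for name in CONFIG_PARAMS)
```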

2618	7.3.3.  Usage in Declarative Session Descriptions

2620	   When VVC over RTP is offered with SDP in a declarative style, as in
2621	   Real Time Streaming Protocol (RTSP) [RFC7826] or Session Announcement
2622	   Protocol (SAP) [RFC2974], the following considerations are necessary.

2624	   *  All parameters capable of indicating both bitstream properties and
2625	      receiver capabilities are used to indicate only bitstream
2626	      properties.  For example, in this case, the parameters profile-id,
2627	      tier-flag, and level-id declare the values used by the bitstream,
2628	      not the capabilities for receiving bitstreams.  As a result, the
2629	      following interpretation of the parameters MUST be used:

2631	      -  Declaring actual configuration or bitstream properties:

2633	         o  profile-id

2635	         o  tier-flag

2637	         o  level-id

2639	         o  interop-constraints

2641	         o  sub-profile-id

2643	         o  sprop-dci

2645	         o  sprop-vps

2647	         o  sprop-sps

2649	         o  sprop-pps

2651	         o  sprop-max-don-diff

2653	         o  sprop-depack-buf-bytes

2655	         o  sprop-sublayer-id

2657	         o  sprop-ols-id

2659	         o  sprop-sei

2661	      -  Not usable (when present, they MUST be ignored):

2663	         o  max-lsr

2665	         o  max-fps

2667	         o  max-recv-level-id

2669	         o  depack-buf-cap

2671	         o  recv-sublayer-id

2673	         o  recv-ols-id

2675	      -  A receiver of the SDP is required to support all parameters and
2676	         values of the parameters provided; otherwise, the receiver MUST
2677	         reject (RTSP) or not participate in (SAP) the session.  It
2678	         falls on the creator of the session to use values that are
2679	         expected to be supported by the receiving application.
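A receiver of a declarative session description could apply the
classification above by discarding the parameters listed as not usable
before interpreting the rest as bitstream properties.  The following is a
minimal sketch; the constant and function names, and the dict
representation of the fmtp parameters, are assumptions made for
illustration.

```python
# Parameters that MUST be ignored in declarative session descriptions.
IGNORED_IN_DECLARATIVE = frozenset({
    "max-lsr",
    "max-fps",
    "max-recv-level-id",
    "depack-buf-cap",
    "recv-sublayer-id",
    "recv-ols-id",
})


def declarative_params(params):
    """Return only the parameters that declare bitstream properties."""
    return {
        name: value
        for name, value in params.items()
        if name not in IGNORED_IN_DECLARATIVE
    }
```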

2681	7.3.4.  Considerations for Parameter Sets

2683	   When out-of-band transport of parameter sets is used, parameter sets
2684	   MAY still be additionally transported in-band unless explicitly
2685	   disallowed by an application, and some of these additional parameter
2686	   sets may update some of the out-of-band transported parameter sets.
2687	   Update of a parameter set refers to the sending of a parameter set of
2688	   the same type using the same parameter set ID but with different
2689	   values for at least one other parameter of the parameter set.
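The notion of a parameter set update can be sketched with a store keyed by
parameter set type and ID.  As a simplifying assumption, the example
treats any difference in the raw payload bytes as "different values for at
least one other parameter"; the function name and the store layout are
hypothetical.

```python
def apply_parameter_set(store, ps_type, ps_id, payload):
    """Insert a parameter set and report how it relates to the stored one.

    store:   dict mapping (type, id) to the parameter set payload bytes;
             it may have been pre-filled from out-of-band parameter sets.
    ps_type: e.g. "VPS", "SPS", or "PPS".
    Returns "new", "unchanged", or "updated".
    """
    key = (ps_type, ps_id)
    previous = store.get(key)
    store[key] = payload
    if previous is None:
        return "new"
    # Same type and same ID but different content is an update.
    return "unchanged" if previous == payload else "updated"
```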

2691	8.  Use with Feedback Messages

2693	   The following subsections define the use of the Picture Loss
2694	   Indication (PLI) and Full Intra Request (FIR) feedback messages with
2695	   [VVC].  The PLI is defined in [RFC4585], and the FIR message is
2696	   defined in [RFC5104].  In accordance with this memo, unlike [HEVC], a
2697	   sender MUST NOT send Slice Loss Indication (SLI) or Reference Picture
2698	   Selection Indication (RPSI), and a receiver SHOULD ignore RPSI and
2699	   treat a received SLI as a PLI.

2701	8.1.  Picture Loss Indication (PLI)

2703	   As specified in RFC 4585, Section 6.3.1, the reception of a PLI by a
2704	   media sender indicates "the loss of an undefined amount of coded
2705	   video data belonging to one or more pictures".  Without having any
2706	   specific knowledge of the setup of the bitstream (such as use and
2707	   location of in-band parameter sets, non-IRAP decoder refresh points,
2708	   picture structures, and so forth), a reaction to the reception of a
2709	   PLI by a VVC sender SHOULD be to send an IRAP picture and relevant
2710	   parameter sets, potentially with sufficient redundancy so as to
2711	   ensure correct reception.  However, sometimes information about the
2712	   bitstream structure is known.  For example, state could have been
2713	   established outside of the mechanisms defined in this document that
2714	   parameter sets are conveyed out of band only, and stay static for the
2715	   duration of the session.  In that case, it is obviously unnecessary
2716	   to send them in-band as a result of the reception of a PLI.  Other
2717	   examples could be devised based on a priori knowledge of different
2718	   aspects of the bitstream structure.  In all cases, the timing and
2719	   congestion control mechanisms of RFC 4585 MUST be observed.

2721	8.2.  Full Intra Request (FIR)

2723	   The purpose of the FIR message is to force an encoder to send an
2724	   independent decoder refresh point as soon as possible, while
2725	   observing applicable congestion-control-related constraints, such as
2726	   those set out in [RFC8082].

2728	   Upon reception of a FIR, a sender MUST send an IDR picture.
2729	   Parameter sets MUST also be sent, except when there is a priori
2730	   knowledge that the parameter sets have been correctly established.  A
2731	   typical example for that is an understanding between sender and
2732	   receiver, established by means outside this document, that parameter
2733	   sets are exclusively sent out-of-band.
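The reactions described in this section can be summarized in a small
dispatch function.  This is a sketch only: the action strings and the
parameter_sets_known flag (standing for a priori knowledge that the
parameter sets are already correctly established) are illustrative
assumptions, and the timing and congestion control rules of RFC 4585 are
not modeled.

```python
def feedback_reaction(message, parameter_sets_known=False):
    """Return the actions a media sender takes for a feedback message.

    message: "PLI", "FIR", "SLI", or "RPSI".
    """
    if message == "RPSI":
        return []  # RPSI is ignored
    if message == "FIR":
        actions = ["send IDR picture"]
    elif message in ("PLI", "SLI"):
        # A received SLI is treated as a PLI.
        actions = ["send IRAP picture"]
    else:
        raise ValueError(f"unsupported feedback message: {message}")
    if not parameter_sets_known:
        actions.append("send parameter sets")
    return actions
```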

2735	9.  Security Considerations

2737	   The scope of this Security Considerations section is limited to the
2738	   payload format itself and to one feature of [VVC] that may pose a
2739	   particularly serious security risk if implemented naively.  The
2740	   payload format, in isolation, does not form a complete system.
2741	   Implementers are advised to read and understand relevant security-
2742	   related documents, especially those pertaining to RTP (see the
2743	   Security Considerations section in [RFC3550]), and the security of
2744	   the call-control stack chosen (that may make use of the media type
2745	   registration of this memo).  Implementers should also consider known
2746	   security vulnerabilities of video coding and decoding implementations
2747	   in general and avoid those.

2749	   Within this RTP payload format, and with the exception of the user
2750	   data SEI message as described below, no security threats other than
2751	   those common to RTP payload formats are known.  In other words,
2752	   neither the various media-plane-based mechanisms, nor the signaling
2753	   part of this memo, seems to pose a security risk beyond those common
2754	   to all RTP-based systems.

2756	   RTP packets using the payload format defined in this specification
2757	   are subject to the security considerations discussed in the RTP
2758	   specification [RFC3550], and in any applicable RTP profile such as
2759	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
2760	   SAVPF [RFC5124].  However, as "Securing the RTP Framework: Why RTP
2761	   Does Not Mandate a Single Media Security Solution" [RFC7202]
2762	   discusses, it is not an RTP payload format's responsibility to
2763	   discuss or mandate what solutions are used to meet the basic security
2764	   goals like confidentiality, integrity and source authenticity for RTP
2765	   in general.  This responsibility lies on anyone using RTP in an
2766	   application.  They can find guidance on available security mechanisms
2767	   and important considerations in "Options for Securing RTP Sessions"
2768	   [RFC7201].  The rest of this section discusses the security impacting
2769	   properties of the payload format itself.

2771	   Because the data compression used with this payload format is applied
2772	   end-to-end, any encryption needs to be performed after compression.
2773	   A potential denial-of-service threat exists for data encodings using
2774	   compression techniques that have non-uniform receiver-end
2775	   computational load.  The attacker can inject pathological datagrams
2776	   into the bitstream that are complex to decode and that cause the
2777	   receiver to be overloaded.  [VVC] is particularly vulnerable to such
2778	   attacks, as it is extremely simple to generate datagrams containing
2779	   NAL units that affect the decoding process of many future NAL units.
2780	   Therefore, the usage of data origin authentication and data integrity
2781	   protection of at least the RTP packet is RECOMMENDED, for example,
2782	   with SRTP [RFC3711].

2784	   Like HEVC [RFC7798], [VVC] includes a user data Supplemental
2785	   Enhancement Information (SEI) message.  This SEI message allows
2786	   inclusion of an arbitrary bitstring into the video bitstream.  Such a
2787	   bitstring could include JavaScript, machine code, and other active
2788	   content.  [VVC] leaves the handling of this SEI message to the
2789	   receiving system.  In order to avoid harmful side effects of the user
2790	   data SEI message, decoder implementations cannot naively trust its
2791	   content.  For example, it would be a bad and insecure implementation
2792	   practice to forward any JavaScript a decoder implementation detects
2793	   to a web browser.  The safest way to deal with user data SEI messages
2794	   is to simply discard them, but that can have negative side effects on
2795	   the quality of experience by the user.

2797	   End-to-end security with authentication, integrity, or
2798	   confidentiality protection will prevent a MANE from performing media-
2799	   aware operations other than discarding complete packets.  In the case
2800	   of confidentiality protection, it will even be prevented from
2801	   discarding packets in a media-aware way.  To be allowed to perform
2802	   such operations, a MANE is required to be a trusted entity that is
2803	   included in the security context establishment.

2805	10.  Congestion Control

2807	   Congestion control for RTP SHALL be used in accordance with RTP
2808	   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
2809	   If best-effort service is being used, an additional requirement is
2810	   that users of this payload format MUST monitor packet loss to ensure
2811	   that the packet loss rate is within an acceptable range.  Packet loss
2812	   is considered acceptable if a TCP flow across the same network path,
2813	   and experiencing the same network conditions, would achieve an
2814	   average throughput, measured on a reasonable timescale, that is not
2815	   less than all RTP streams combined are achieving.  This condition can
2816	   be satisfied by implementing congestion-control mechanisms to adapt
2817	   the transmission rate, the number of layers subscribed for a layered
2818	   multicast session, or by arranging for a receiver to leave the
2819	   session if the loss rate is unacceptably high.

2821	   The bitrate adaptation necessary for obeying the congestion control
2822	   principle is easily achievable when real-time encoding is used, for
2823	   example, by adequately tuning the quantization parameter.  However,
2824	   when pre-encoded content is being transmitted, bandwidth adaptation
2825	   requires the pre-coded bitstream to be tailored for such adaptivity.
2826	   The key mechanisms available in [VVC] are temporal scalability, and
2827	   spatial/SNR scalability.  A media sender can remove NAL units
2828	   belonging to higher temporal sublayers (i.e., those NAL units with a
2829	   high value of TID) or higher spatio-SNR layers until the sending
2830	   bitrate drops to an acceptable range.
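The temporal-scalability part of this adaptation can be sketched as
follows, assuming the sender knows the bitrate contributed by each
temporal sublayer.  The function name and the per-TID bitrate list are
hypothetical; the lowest sublayer is never removed, so the returned
bitrate may still exceed the budget.

```python
def highest_tid_within_budget(bitrate_per_tid, budget):
    """Pick the highest TID to keep so the total bitrate fits the budget.

    bitrate_per_tid: bitrate_per_tid[t] is the bitrate of the NAL units
                     with TID equal to t (same unit as budget).
    Returns (highest TID kept, resulting bitrate).
    """
    total = sum(bitrate_per_tid)
    highest = len(bitrate_per_tid) - 1
    # Remove the highest temporal sublayer first, then the next one, ...
    while highest > 0 and total > budget:
        total -= bitrate_per_tid[highest]
        highest -= 1
    return highest, total
```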

2832	   The mechanisms mentioned above generally work within a defined
2833	   profile and level and, therefore, no renegotiation of the channel is
2834	   required.  Only when non-downgradable parameters (such as profile)
2835	   are required to be changed does it become necessary to terminate and
2836	   restart the RTP stream(s).  This may be accomplished by using
2837	   different RTP payload types.

2839	   MANEs MAY remove certain unusable packets from the RTP stream when
2840	   that RTP stream was damaged due to previous packet losses.  This can
2841	   help reduce the network load in certain special cases.  For example,
2842	   MANEs can remove those FUs where the leading FUs belonging to the
2843	   same NAL unit have been lost or those dependent slice segments when
2844	   the leading slice segments belonging to the same slice have been
2845	   lost, because the trailing FUs or dependent slice segments are
2846	   meaningless to most decoders.  MANEs can also remove higher temporal
2847	   scalable layers if the outbound transmission (from the MANE's
2848	   viewpoint) experiences congestion.
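The FU example above can be sketched for a single RTP stream as follows.
The tuple layout is a simplified, hypothetical model of received packets:
it carries only the RTP sequence number, whether the packet is an FU, and
the start and end indications from the FU header.  A gap in the sequence
numbers is taken to mean packet loss.

```python
def filter_orphan_fus(packets):
    """Return the sequence numbers of packets worth forwarding.

    packets: iterable of (seq, is_fu, start, end) tuples in sequence order.
    Non-start FUs are dropped when an earlier FU of the same NAL unit
    was lost.
    """
    kept = []
    prev_seq = None
    fu_intact = False  # True while inside an FU chain with no loss so far
    for seq, is_fu, start, end in packets:
        contiguous = prev_seq is not None and seq == (prev_seq + 1) % 65536
        if not is_fu or start:
            keep = True
        else:
            keep = fu_intact and contiguous
        # The chain stays intact only while FUs keep arriving without gaps.
        fu_intact = is_fu and keep and not end
        if keep:
            kept.append(seq)
        prev_seq = seq
    return kept
```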

2850	11.  IANA Considerations

2852	   A new media type, as specified in Section 7.1 of this memo, has been
2853	   registered with IANA.

2855	12.  Acknowledgements

2857	   Dr. Byeongdoo Choi is thanked for the video codec related technical
2858	   discussion and other aspects in this memo.  Xin Zhao and Dr. Xiang Li
2859	   are thanked for their contributions on [VVC] specification
2860	   descriptive content.  Spencer Dawkins is thanked for his valuable
2861	   review comments that led to great improvements of this memo.  Some
2862	   parts of this specification share text with the RTP payload format
2863	   for HEVC [RFC7798].  We thank the authors of that specification for
2864	   their excellent work.

2866	13.  References

2868	13.1.  Normative References

2870	   [ISO23090-3]
2871	              ISO/IEC 23090-3, "Information technology - Coded
2872	              representation of immersive media Part 3 Versatile Video
2873	              Coding", 2021, <https://www.iso.org/standard/73022.html>.

2875	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
2876	              Requirement Levels", BCP 14, RFC 2119,
2877	              DOI 10.17487/RFC2119, March 1997,
2878	              <https://www.rfc-editor.org/info/rfc2119>.

2880	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
2881	              with Session Description Protocol (SDP)", RFC 3264,
2882	              DOI 10.17487/RFC3264, June 2002,
2883	              <https://www.rfc-editor.org/info/rfc3264>.

2885	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
2886	              Jacobson, "RTP: A Transport Protocol for Real-Time
2887	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
2888	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

2890	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
2891	              Video Conferences with Minimal Control", STD 65, RFC 3551,
2892	              DOI 10.17487/RFC3551, July 2003,
2893	              <https://www.rfc-editor.org/info/rfc3551>.

2895	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
2896	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
2897	              RFC 3711, DOI 10.17487/RFC3711, March 2004,
2898	              <https://www.rfc-editor.org/info/rfc3711>.

2900	   [RFC4556]  Zhu, L. and B. Tung, "Public Key Cryptography for Initial
2901	              Authentication in Kerberos (PKINIT)", RFC 4556,
2902	              DOI 10.17487/RFC4556, June 2006,
2903	              <https://www.rfc-editor.org/info/rfc4556>.

2905	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
2906	              "Extended RTP Profile for Real-time Transport Control
2907	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
2908	              DOI 10.17487/RFC4585, July 2006,
2909	              <https://www.rfc-editor.org/info/rfc4585>.

2911	   [RFC4648]  Josefsson, S., "The Base16, Base32, and Base64 Data
2912	              Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006,
2913	              <https://www.rfc-editor.org/info/rfc4648>.

2915	   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
2916	              "Codec Control Messages in the RTP Audio-Visual Profile
2917	              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
2918	              February 2008, <https://www.rfc-editor.org/info/rfc5104>.

2920	   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
2921	              Real-time Transport Control Protocol (RTCP)-Based Feedback
2922	              (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
2923	              2008, <https://www.rfc-editor.org/info/rfc5124>.

2925	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
2926	              Media Attributes in the Session Description Protocol
2927	              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009,
2928	              <https://www.rfc-editor.org/info/rfc5576>.

2930	   [RFC8082]  Wenger, S., Lennox, J., Burman, B., and M. Westerlund,
2931	              "Using Codec Control Messages in the RTP Audio-Visual
2932	              Profile with Feedback with Layered Codecs", RFC 8082,
2933	              DOI 10.17487/RFC8082, March 2017,
2934	              <https://www.rfc-editor.org/info/rfc8082>.

2936	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
2937	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
2938	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

2940	   [RFC8866]  Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP:
2941	              Session Description Protocol", RFC 8866,
2942	              DOI 10.17487/RFC8866, January 2021,
2943	              <https://www.rfc-editor.org/info/rfc8866>.

2945	   [VSEI]     "Versatile supplemental enhancement information messages
2946	              for coded video bitstreams", 2020,
2947	              <https://www.itu.int/rec/T-REC-H.274>.

2949	   [VVC]      "Versatile Video Coding, ITU-T Recommendation H.266",
2950	              2020, <http://www.itu.int/rec/T-REC-H.266>.

2952	13.2.  Informative References

2954	   [CABAC]    Sole, J., et al., "Transform coefficient coding in HEVC",
2955	              IEEE Transactions on Circuits and Systems for Video
2956	              Technology, DOI 10.1109/TCSVT.2012.2223055, December
2957	              2012, <https://doi.org/10.1109/TCSVT.2012.2223055>.

2959	   [HEVC]     "High efficiency video coding, ITU-T Recommendation
2960	              H.265", 2019, <https://www.itu.int/rec/T-REC-H.265>.

2962	   [MPEG2S]   ISO/IEC, "Information technology - Generic coding of
2963	              moving pictures and associated audio information - Part 1:
2964	              Systems, ISO International Standard 13818-1", 2013.

2966	   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
2967	              Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
2968	              October 2000, <https://www.rfc-editor.org/info/rfc2974>.

2970	   [RFC6184]  Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
2971	              Payload Format for H.264 Video", RFC 6184,
2972	              DOI 10.17487/RFC6184, May 2011,
2973	              <https://www.rfc-editor.org/info/rfc6184>.

2975	   [RFC6190]  Wenger, S., Wang, Y.-K., Schierl, T., and A.
2976	              Eleftheriadis, "RTP Payload Format for Scalable Video
2977	              Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011,
2978	              <https://www.rfc-editor.org/info/rfc6190>.

2980	   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
2981	              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
2982	              <https://www.rfc-editor.org/info/rfc7201>.

2984	   [RFC7202]  Perkins, C. and M. Westerlund, "Securing the RTP
2985	              Framework: Why RTP Does Not Mandate a Single Media
2986	              Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
2987	              2014, <https://www.rfc-editor.org/info/rfc7202>.

2989	   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
2990	              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
2991	              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
2992	              DOI 10.17487/RFC7656, November 2015,
2993	              <https://www.rfc-editor.org/info/rfc7656>.

2995	   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
2996	              DOI 10.17487/RFC7667, November 2015,
2997	              <https://www.rfc-editor.org/info/rfc7667>.

2999	   [RFC7798]  Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M.
3000	              M. Hannuksela, "RTP Payload Format for High Efficiency
3001	              Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798,
3002	              March 2016, <https://www.rfc-editor.org/info/rfc7798>.

3004	   [RFC7826]  Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M.,
3005	              and M. Stiemerling, Ed., "Real-Time Streaming Protocol
3006	              Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December
3007	              2016, <https://www.rfc-editor.org/info/rfc7826>.

3009	Appendix A.  Change History

3011	   To RFC Editor: PLEASE REMOVE THIS SECTION BEFORE PUBLICATION

3013	   draft-zhao-payload-rtp-vvc-00 ........ initial version

3015	   draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and
3016	   corrections

3018	   draft-ietf-payload-rtp-vvc-00 ........ initial WG draft

3020	   draft-ietf-payload-rtp-vvc-01 ........ VVC specification update

3022	   draft-ietf-payload-rtp-vvc-02 ........ VVC specification update

3024	   draft-ietf-payload-rtp-vvc-03 ........ VVC coding tool introduction
3025	   update

3027	   draft-ietf-payload-rtp-vvc-04 ........ VVC coding tool introduction
3028	   update

3030	   draft-ietf-payload-rtp-vvc-05 ........ reference update and added
3031	   placeholders for open issues

3033	   draft-ietf-payload-rtp-vvc-06 ........ address editor's note

3035	   draft-ietf-payload-rtp-vvc-07 ........ address editor's notes

3037	   draft-ietf-payload-rtp-vvc-08 ........ address editor's notes

3039	   draft-ietf-payload-rtp-vvc-09 ........ address editor's notes

3041	   draft-ietf-payload-rtp-vvc-10 ........ address editor's notes

3043	   draft-ietf-payload-rtp-vvc-11 ........ address editor's notes

3045	   draft-ietf-payload-rtp-vvc-12 ........ address editor's notes

3047	   draft-ietf-payload-rtp-vvc-13 ........ address editor's notes

3049	   draft-ietf-payload-rtp-vvc-14 ........ address 2nd WGLC comments

3051	Authors' Addresses
3052	   Shuai Zhao
3053	   Tencent
3054	   2747 Park Blvd
3055	   Palo Alto,  94588
3056	   United States of America
3057	   Email: shuai.zhao@ieee.org

3059	   Stephan Wenger
3060	   Tencent
3061	   2747 Park Blvd
3062	   Palo Alto,  94588
3063	   United States of America
3064	   Email: stewe@stewe.org

3066	   Yago Sanchez
3067	   Fraunhofer HHI
3068	   Einsteinufer 37
3069	   10587 Berlin
3070	   Germany
3071	   Email: yago.sanchez@hhi.fraunhofer.de

3073	   Ye-Kui Wang
3074	   Bytedance Inc.
3075	   8910 University Center Lane
3076	   San Diego,  92122
3077	   United States of America
3078	   Email: yekui.wang@bytedance.com

3080	   Miska M. Hannuksela
3081	   Nokia Technologies
3082	   Hatanpään valtatie 30
3083	   FI-33100 Tampere
3084	   Finland
3085	   Email: miska.hannuksela@nokia.com