idnits 2.17.1 

draft-ietf-avtcore-rtp-evc-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([EVC]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document date (4 February 2021) is 1170 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 1113

  -- Possible downref: Non-RFC (?) normative reference: ref. 'EVC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO23094-1'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Downref: Normative reference to an Informational RFC: RFC 7656

  == Outdated reference: A later version (-18) exists of
     draft-ietf-avtcore-rtp-vvc-07


     Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	avtcore                                                          S. Zhao
3	Internet-Draft                                                 S. Wenger
4	Intended status: Standards Track                                 Tencent
5	Expires: 8 August 2021                                            Y. Lim
6	                                                     Samsung Electronics
7	                                                         4 February 2021

9	          RTP Payload Format for Essential Video Coding (EVC)
10	                     draft-ietf-avtcore-rtp-evc-01

12	Abstract

14	   This memo describes an RTP payload format for the video coding
15	   standard ISO/IEC International Standard 23094-1 [EVC], also known as
16	   Essential Video Coding [EVC] and developed by ISO/IEC JTC1/SC29/WG11
17	   (MPEG).  The RTP payload format allows for packetization of one or
18	   more Network Abstraction Layer (NAL) units in each RTP packet payload
19	   as well as fragmentation of a NAL unit into multiple RTP packets.
20	   The payload format has wide applicability in videoconferencing,
21	   Internet video streaming, and high-bitrate entertainment-quality
22	   video, among other applications.

24	Status of This Memo

26	   This Internet-Draft is submitted in full conformance with the
27	   provisions of BCP 78 and BCP 79.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF).  Note that other groups may also distribute
31	   working documents as Internet-Drafts.  The list of current Internet-
32	   Drafts is at https://datatracker.ietf.org/drafts/current/.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   This Internet-Draft will expire on 8 August 2021.

41	Copyright Notice

43	   Copyright (c) 2021 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents (https://trustee.ietf.org/
48	   license-info) in effect on the date of publication of this document.
49	   Please review these documents carefully, as they describe your rights
50	   and restrictions with respect to this document.  Code Components
51	   extracted from this document must include Simplified BSD License text
52	   as described in Section 4.e of the Trust Legal Provisions and are
53	   provided without warranty as described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	     1.1.  Overview of the EVC Codec
59	       1.1.1.  Coding-Tool Features (informative)
60	       1.1.2.  Systems and Transport Interfaces
61	       1.1.3.  Parallel Processing Support (informative)
62	       1.1.4.  NAL Unit Header
63	     1.2.  Overview of the Payload Format
64	   2.  Conventions
65	   3.  Definitions and Abbreviations
66	     3.1.  Definitions
67	       3.1.1.  Definitions from the EVC Specification
68	       3.1.2.  Definitions Specific to This Memo
69	     3.2.  Abbreviations
70	   4.  RTP Payload Format
71	     4.1.  RTP Header Usage
72	     4.2.  Payload Header Usage
73	     4.3.  Payload Structures
74	       4.3.1.  Single NAL Unit Packets
75	       4.3.2.  Aggregation Packets (APs)
76	       4.3.3.  Fragmentation Units
77	     4.4.  Decoding Order Number
78	   5.  Packetization Rules
79	   6.  De-packetization Process
80	   7.  Payload Format Parameters
81	     7.1.  Media Type Registration
82	     7.2.  SDP Parameters
83	       7.2.1.  Mapping of Payload Type Parameters to SDP
84	       7.2.2.  Usage with SDP Offer/Answer Model
85	       7.2.3.  SDP Example
86	   8.  Use with Feedback Messages
87	     8.1.  Picture Loss Indication (PLI)
88	     8.2.  Full Intra Request (FIR)
89	   9.  Security Considerations
90	   10. Congestion Control
91	   11. IANA Considerations
92	   12. Acknowledgements
93	   13. References
94	     13.1.  Normative References
95	     13.2.  Informative References
96	   Authors' Addresses

98	1.  Introduction

100	   The [EVC] specification, which is formally designated as ISO/IEC
101	   International Standard 23094-1 [ISO23094-1], was published in
102	   October 2020.  One goal of MPEG is to keep [EVC]'s Baseline profile
103	   essentially royalty free by using technologies published more than
104	   20 years ago or otherwise freely available for use, whereas more
105	   advanced profiles follow a reasonable and non-discriminatory
106	   licensing terms policy.  Both Baseline profile and higher profiles of
107	   [EVC] are reported to provide coding efficiency gains over [HEVC] and
108	   [AVC] under certain configurations.

110	   This memo describes an RTP payload format for [EVC].  It shares its
111	   basic design with the NAL unit-based RTP payload formats of H.264
112	   Video Coding [RFC6184], Scalable Video Coding (SVC) [RFC6190], High
113	   Efficiency Video Coding (HEVC) [RFC7798], and Versatile Video Coding
114	   (VVC) [I-D.ietf-avtcore-rtp-vvc].  With respect to design philosophy,
115	   security, congestion control, and overall implementation complexity,
116	   it has similar properties to those earlier payload format
117	   specifications.  This is a conscious choice, as at least RFC 6184 is
118	   widely deployed and generally known in the relevant implementer
119	   communities.  Certain mechanisms known from [RFC6190] were
120	   incorporated as EVC supports temporal scalability.  [EVC] currently
121	   does not offer higher forms of scalability.

123	1.1.  Overview of the EVC Codec

125	   [EVC], [AVC], [HEVC] and [VVC] share a similar hybrid video codec
126	   design.  In this memo, we provide a very brief overview of those
127	   features of [EVC] that are, in some form, addressed by the payload
128	   format specified herein.  Implementers have to read, understand, and
129	   apply the ISO/IEC specifications pertaining to [EVC] to arrive at
130	   interoperable, well-performing implementations.  The EVC standard has
131	   a Baseline profile and, on top of that, a Main profile, the latter
132	   including more advanced features.  Syntax elements are provided that
133	   allow encoders to mark which of the many independent coding tools are
134	   exercised in a bitstream, in a spirit similar to the
135	   general_constraint_flags of [VVC].

137	   Conceptually, [EVC], [AVC], [HEVC], and [VVC] all include a Video
138	   Coding Layer (VCL), which is often used to refer to the coding-tool
139	   features, and a Network Abstraction Layer (NAL), which is often used
140	   to refer to the systems and transport interface aspects of the
141	   codecs.

143	1.1.1.  Coding-Tool Features (informative)

145	   Coding blocks and transform structure

147	   [EVC] uses a traditional quad-tree coding structure, which divides
148	   the encoded image into blocks of up to 128x128 luma samples, which
149	   can be recursively divided into smaller blocks.  The Main profile
150	   adds two advanced coding structure tools: Binary Ternary Tree (BTT)
151	   partitioning, which allows non-square coding units, and Split Unit
152	   Coding Order (SUCO), which allows the processing order of split
153	   units to change from the traditional left-to-right scanning order to
154	   a right-to-left one.  In the Main profile, the picture can be divided
155	   into slices and tiles, and these slices can be independently encoded
156	   and/or decoded in parallel.

158	   When a data block is predicted using intra prediction or inter
159	   prediction, residual data is usually added to the prediction block
160	   in order to reconstruct the samples of that block.  The
161	   residual data is obtained by applying an inverse quantization process
162	   and an inverse transform.  [EVC] includes integer discrete cosine
163	   transform (DCT2) and scalar quantization.  For the Main profile,
164	   Improved Quantization and Transform (IQT) uses a different mapping/
165	   clipping function for quantization.  An inverse zig-zag scanning
166	   order is used for coefficient coding.  Advanced Coefficient Coding
167	   (ADCC) in the Main profile can code coefficient values more
168	   efficiently, for example, by signaling the position of the last
169	   non-zero coefficient.  In the Main profile, Adaptive Transform
170	   Selection (ATS) is also available, allowing integer versions of DST7
171	   or DCT8 to be applied, and not just DCT2.

173	   Entropy coding

175	   [EVC] uses a binary arithmetic coding mechanism similar to that of
176	   [AVC].  The mechanism includes a binarization step and a probability
177	   update defined by a lookup table.  In the Main profile, the
178	   derivation process of syntax elements based on adjacent blocks makes
179	   the context modeling and initialization process more efficient.

181	   In-loop filtering

183	   The Baseline profile of [EVC] uses the deblocking filter defined in
184	   H.263 Annex J.  In the Main profile, compared to the deblocking
185	   filter in the Baseline profile, an Advanced Deblocking Filter (ADDB)
186	   can be used, which can further reduce artifacts.  The Main profile
187	   also defines two additional in-loop filters that can be used to
188	   improve the quality of decoded pictures before output and/or for
189	   inter prediction.  A Walsh-Hadamard Transform Domain Filter (HTDF) is
190	   applied to the luma samples before deblocking, and the scanning
191	   process is used to determine 4 adjacent samples for filtering.  An
192	   Adaptive Loop Filter (ALF) allows up to 25 different filters to be
193	   signaled for the luma components, and the best filter can be
194	   selected through a classification process for each 4x4 block.  The
195	   filter parameters of the ALF are signaled in the Adaptation
196	   Parameter Set (APS).

198	   Inter-prediction

200	   The basis of [EVC] inter prediction is motion compensation using
201	   interpolation filters with a quarter sample resolution.  In the
202	   Baseline profile, a motion vector is signaled using one of three
203	   spatially neighboring motion vectors and a temporally collocated
204	   motion vector as a predictor.  A motion vector difference may be
205	   signaled relative to the selected predictor; for the case where no
206	   motion vector difference is signaled and there is no residual data
207	   in the block, there is a specific mode called skip mode.  The
208	   Main profile includes six additional tools to provide improved inter
209	   prediction.  With Advanced Motion Interpolation and Signaling (AMIS),
210	   adjacent blocks can be conceptually merged to indicate that they use
211	   the same motion, but more advanced schemes can also be used to create
212	   predictions from the basic model list of candidate predictors.  The
213	   Merge with Motion Vector Difference (MMVD) tool uses a process
214	   similar to the concept of merging neighboring blocks, but also allows
215	   the use of expressions that include a starting point, motion
216	   amplitude, and direction of motion to send a motion vector signal.

218	   Using Advanced Motion Vector Prediction (AMVP), candidate motion
219	   vector predictions for the block can be derived from its neighboring
220	   blocks in the same picture and collocated blocks in the reference
221	   picture.  The Adaptive Motion Vector Resolution (AMVR) tool provides
222	   a way to reduce the accuracy of a motion vector from a quarter sample
223	   to half sample, full sample, double sample, or quad sample, which
224	   provides an efficiency advantage, such as when sending large motion
225	   vector differences.  The Main profile also includes the Decoder-side
226	   Motion Vector Refinement (DMVR), which uses a bilateral template
227	   matching process to refine the motion vectors in a bidirectional
228	   fashion.
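
	   As an illustration of the AMVR resolutions listed above, the
	   following minimal sketch rounds one motion vector component,
	   stored in quarter-sample units, to a coarser resolution.  The
	   function name, the resolution labels, and the rounding rule
	   (nearest multiple, ties away from zero) are assumptions made for
	   this example; the normative derivation is specified in [EVC].

```python
# Step size of each AMVR resolution, expressed in quarter-sample units.
AMVR_STEP_IN_QUARTER_SAMPLES = {
    "quarter": 1,
    "half": 2,
    "full": 4,
    "double": 8,
    "quad": 16,
}


def round_mv_component(mv_quarter: int, resolution: str) -> int:
    """Round a quarter-sample MV component to the given resolution.

    Assumption: round to the nearest multiple of the step size, with
    ties rounded away from zero.  This is illustrative only.
    """
    step = AMVR_STEP_IN_QUARTER_SAMPLES[resolution]
    sign = -1 if mv_quarter < 0 else 1
    return sign * ((abs(mv_quarter) + step // 2) // step) * step
```

	   A coarser resolution leaves fewer distinct values to signal,
	   which is where the efficiency advantage for large motion vector
	   differences comes from.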

230	   Intra prediction and intra-coding

232	   Intra prediction in [EVC] is performed on adjacent samples of coding
233	   units in a partitioned structure.  For the Baseline profile, all
234	   coding units are square, and there are five different prediction
235	   modes: DC (mean value of the neighborhood), horizontal, vertical, and
236	   two different diagonal directions.  In the Main profile, intra
237	   prediction can be applied to any rectangular coding unit, and there
238	   are 28 additional direction modes available in the so-called Enhanced
239	   Intra Prediction Directions (EIPD).  In the Main profile, an encoder
240	   can also use Intra Block Copy (IBC), where a previously decoded
241	   sample block of the same picture is used as a predictor.  A
242	   displacement vector in integer sample precision is signaled to
243	   indicate the location of the prediction block in the current picture
244	   for this mode.

246	   Decoded picture buffer management

248	   In [EVC], decoded pictures can be stored in a decoded picture buffer
249	   (DPB) for predicting pictures that follow them in decoding order.  In
250	   the Baseline profile, the management of the DPB (i.e., the process of
251	   adding and deleting reference pictures) is controlled by the
252	   information in the SPS.  For the Main profile, if a Reference Picture
253	   List (RPL) scheme is used, DPB management can be controlled by
254	   information that is signaled at the picture level.

256	1.1.2.  Systems and Transport Interfaces

258	   [EVC] inherited the basic systems and transport interface designs
259	   from [AVC] and [HEVC].  These include the NAL-unit-based syntax
260	   structure, the hierarchical syntax and data unit structure and the
261	   Supplemental Enhancement Information (SEI) message mechanism.  The
262	   hierarchical syntax and data unit structure consists of a sequence-
263	   level parameter set (SPS), two picture-level parameter sets (PPS and
264	   APS, each of which can apply to one or more pictures), slice-level
265	   header parameters, and lower-level parameters.

267	   A number of key components that influenced the Network Abstraction
268	   Layer design of [EVC] as well as this memo are described below.

270	   Sequence parameter set

272	   The Sequence Parameter Set (SPS) contains syntax elements pertaining
273	   to a coded video sequence (CVS), which is a group of pictures,
274	   starting with a random access point, and followed by pictures that
275	   may depend on each other and the random access point picture.  In
276	   MPEG-2, the equivalent of a CVS was a Group of Pictures (GOP), which
277	   normally started with an I frame and was followed by P and B frames.
278	   While more complex in its options of random access points, EVC
279	   retains this basic concept.  In many TV-like applications, a CVS
280	   contains a few hundred milliseconds to a few seconds of video.  In
281	   video conferencing (without switching MCUs involved), a CVS can be as
282	   long in duration as the whole session.

284	   Picture and adaptation parameter set

285	   The Picture Parameter Set and the Adaptation Parameter Set (PPS and
286	   APS, respectively) carry information pertaining to a single picture.
287	   The PPS contains information that is likely to stay constant from
288	   picture to picture (at least for pictures of a certain type), whereas
289	   the APS contains information, such as adaptive loop filter
290	   coefficients, that is likely to change from picture to picture.

292	   Profile, level and toolsets

294	   Profiles and levels follow the same design considerations as known
295	   from [AVC], [HEVC], and in fact video codecs as old as MPEG-1 Visual.
296	   A profile defines a set of tools (not to be confused with the
297	   "toolset" discussed below) that a decoder compliant with this
298	   profile has to support.  In [EVC], profiles are defined in Annex A.
299	   Formally, they are defined as a set of constraints that a bitstream
300	   needs to conform to.  In [EVC], the Baseline profile is much more
301	   severely constrained than the Main profile, reducing implementation
302	   complexity.  Levels relate to bitstream complexity in dimensions such
303	   as maximum sample decoding rate, maximum picture size, and similar
304	   parameters that are directly related to computational complexity.

306	   Profiles and levels are signaled in the highest parameter set
307	   available, the SPS.

309	   [EVC] contains another mechanism related to the use of coding tools,
310	   known as the toolset syntax element.  This syntax element, carried as
311	   toolset_idc_h and toolset_idc_l in the SPS, is a bitmask that
312	   allows encoders to indicate which coding tools they are using, within
313	   the menu of tools offered by the profile that is also signaled.
314	   No decoder conformance point is associated with the toolset, but a
315	   bitstream that uses a coding tool that is indicated as not used
316	   in the toolset syntax element would obviously be non-compliant.
317	   While MPEG specifically rules out the use of the toolset syntax
318	   element as a conformance point, walled garden implementations could
319	   do so without incurring the interoperability problems MPEG fears, and
320	   create bitstreams and decoders that do not support one or more given
321	   tools.  That, in turn, may be useful to mitigate certain patent
322	   related risks.
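
	   The consistency rule described above (a bitstream must not use a
	   tool that the toolset marks as unused) amounts to a bitmask
	   subset test.  The sketch below illustrates this under stated
	   assumptions: the tool-to-bit assignments and function names are
	   hypothetical and are not the bit positions defined in [EVC], and
	   a single combined mask is used instead of the separate
	   toolset_idc_h and toolset_idc_l fields.

```python
# Hypothetical bit positions, for illustration only (not from [EVC]).
HYPOTHETICAL_TOOL_BITS = {"BTT": 0, "SUCO": 1, "ALF": 2, "DMVR": 3}


def mask_of(tools):
    """Build a bitmask with one bit set per named tool."""
    mask = 0
    for name in tools:
        mask |= 1 << HYPOTHETICAL_TOOL_BITS[name]
    return mask


def undeclared_tools(declared_mask, used_mask):
    """Return names of tools that are used but not declared in the toolset."""
    extra = used_mask & ~declared_mask
    return sorted(
        name
        for name, bit in HYPOTHETICAL_TOOL_BITS.items()
        if (extra >> bit) & 1
    )
```

	   An empty result means the tools exercised are a subset of those
	   declared; any returned name marks the bitstream as inconsistent
	   with its own toolset signaling.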

324	   Bitstream and elementary stream

326	   Above the Coded Video Sequence (CVS), [EVC] defines a video bitstream
327	   that can be used in the MPEG systems context as an elementary stream.
328	   For the purpose of this memo, this is not relevant.

330	   Random access support

332	   [EVC] supports a random access mechanism based solely on IDR access
333	   units.

335	   Temporal scalability support

337	   [EVC] includes support for temporal scalability through the
338	   generalized reference picture selection approach known since
339	   [AVC]/SVC.  Up to six temporal layers are supported.  The temporal
340	   layer is signaled in the NAL unit header (which co-serves as the
341	   payload header in this memo), in the nuh_temporal_id field.
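
	   Because the temporal layer is carried in the NAL unit header, a
	   receiver or middlebox could thin a stream by inspecting only the
	   first two bytes of each NAL unit.  The following is a minimal
	   sketch, assuming the two-byte header layout of Figure 1 in
	   Section 1.1.4 (F: 1 bit, Type: 6 bits, TID: 3 bits, Reserve: 5
	   bits, E: 1 bit); the function names are hypothetical and not
	   part of this memo.

```python
def temporal_id(nal_unit: bytes) -> int:
    """Extract nuh_temporal_id (TID) from a two-byte EVC NAL unit header."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit is shorter than its two-byte header")
    word = (nal_unit[0] << 8) | nal_unit[1]
    # TID occupies bits 8..6 of the 16-bit header, counting from bit 0 (LSB).
    return (word >> 6) & 0x7


def keep_temporal_layers(nal_units, max_tid):
    """Keep only NAL units whose TID does not exceed max_tid."""
    return [nal for nal in nal_units if temporal_id(nal) <= max_tid]
```

	   Dropping all NAL units above a chosen TID relies on the property
	   of temporal scalability that lower temporal layers do not depend
	   on higher ones.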

343	   Reference picture management

345	      placeholder

347	   SEI Message

349	   [EVC] inherits many of [HEVC]'s SEI Messages, occasionally with
350	   changes in syntax and/or semantics making them applicable to EVC.

352	1.1.3.  Parallel Processing Support (informative)

354	      Placeholder

356	1.1.4.  NAL Unit Header

358	   [EVC] maintains the NAL unit concept of [HEVC] with different
359	   parameter options.  EVC also uses a two-byte NAL unit header, as
360	   shown in Figure 1.  The payload of a NAL unit refers to the NAL unit
361	   excluding the NAL unit header.

363	                       +---------------+---------------+
364	                       |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
365	                       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
366	                       |F|   Type    | TID | Reserve |E|
367	                       +-------------+-----------------+

369	                     The Structure of the EVC NAL Unit Header

371	                                  Figure 1

373	   The semantics of the fields in the NAL unit header are as specified
374	   in [EVC] and described briefly below for convenience.  In addition to
375	   the name and size of each field, the corresponding syntax element
376	   name in [EVC] is also provided.

378	   F: 1 bit
379	      forbidden_zero_bit.  Required to be zero in [EVC].  Note that
380	      this bit was included in the NAL unit header in order to
381	      enable transport of EVC video over MPEG-2 transport systems
382	      (avoidance of start code emulations) [MPEG2S].  In the context of
383	      this memo, the value 1 may be used to indicate a syntax violation,
384	      e.g., for a NAL unit resulting from aggregating a number of
385	      fragmented units of a NAL unit but missing the last fragment, as
386	      described in Section xxx. (section # placeholder)

388	   Type: 6 bits

390	      nal_unit_type_plus1.  This field specifies the NAL unit type as
391	      defined in Table 4 of [EVC].  If the value of this field is less
392	      than or equal to 23, the NAL unit is a VCL NAL unit.  Otherwise,
393	      the NAL unit is a non-VCL NAL unit.  For a reference of all
394	      currently defined NAL unit types and their semantics, please refer
395	      to Section 7.4.2.2 in [EVC].

397	   TID: 3 bits

399	      nuh_temporal_id.  This field specifies the temporal identifier of
400	      the NAL unit.  The value of TemporalId is equal to TID.
401	      TemporalId shall be equal to 0 if the NAL unit is an IDR NAL unit
402	      (NAL unit type 1).

404	   Reserve: 5 bits

406	      nuh_reserved_zero_5bits.  This field shall be equal to the version
407	      of the [EVC] specification.  Values of nuh_reserved_zero_5bits
408	      greater than 0 are reserved for future use by ISO/IEC.  Decoders
409	      conforming to a profile specified in [EVC] Annex A shall ignore
410	      (i.e., remove from the bitstream and discard) all NAL units with
411	      values of nuh_reserved_zero_5bits greater than 0.

413	   E: 1 bit

415	      nuh_extension_flag.  This field shall be equal to the version of
416	      the [EVC] specification.  A value of nuh_extension_flag equal to 1
417	      is reserved for future use by ISO/IEC.  Decoders conforming to a
418	      profile specified in [EVC] Annex A shall ignore (i.e., remove from
419	      the bitstream and discard) all NAL units with values of
420	      nuh_extension_flag equal to 1.
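
	   The field layout of Figure 1 can be illustrated with a short
	   parser.  This is a minimal sketch rather than a reference
	   implementation: the class and function names are hypothetical,
	   and the VCL test and the ignore rule simply restate the field
	   descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvcNalHeader:
    forbidden_zero_bit: int       # F, 1 bit
    nal_unit_type_plus1: int      # Type, 6 bits
    nuh_temporal_id: int          # TID, 3 bits
    nuh_reserved_zero_5bits: int  # Reserve, 5 bits
    nuh_extension_flag: int       # E, 1 bit


def parse_nal_header(data: bytes) -> EvcNalHeader:
    """Split the first two bytes of a NAL unit into the Figure 1 fields."""
    if len(data) < 2:
        raise ValueError("an EVC NAL unit header is two bytes long")
    word = (data[0] << 8) | data[1]  # network byte order, F is the MSB
    return EvcNalHeader(
        forbidden_zero_bit=(word >> 15) & 0x1,
        nal_unit_type_plus1=(word >> 9) & 0x3F,
        nuh_temporal_id=(word >> 6) & 0x7,
        nuh_reserved_zero_5bits=(word >> 1) & 0x1F,
        nuh_extension_flag=word & 0x1,
    )


def is_vcl(header: EvcNalHeader) -> bool:
    """Type values up to and including 23 denote VCL NAL units."""
    return header.nal_unit_type_plus1 <= 23


def must_be_ignored(header: EvcNalHeader) -> bool:
    """Reserved bits greater than 0 or E equal to 1 are for future use."""
    return header.nuh_reserved_zero_5bits > 0 or header.nuh_extension_flag == 1
```

	   For example, the header bytes 0x04 0xC0 parse to Type 2 and TID
	   3, with F, Reserve, and E all equal to 0.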

422	1.2.  Overview of the Payload Format

424	   This payload format defines the following processes required for
425	   transport of [EVC] coded data over RTP [RFC3550]:

427	   *  Usage of RTP header with this payload format

429	   *  Packetization of [EVC] coded NAL units into RTP packets using
430	      three types of payload structures: single NAL unit packets,
431	      aggregation packets, and fragmentation units

433	   *  Transmission of [EVC] NAL units of the same bitstream within a
434	      single RTP stream.

436	   *  Media type parameters to be used with the Session Description
437	      Protocol (SDP) [RFC4566]

439	2.  Conventions

441	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
442	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
443	   "OPTIONAL" in this document are to be interpreted as described in BCP
444	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
445	   capitals, as shown above.

447	3.  Definitions and Abbreviations

449	3.1.  Definitions

451	   This document uses the terms and definitions of EVC.  Section 3.1.1
452	   lists relevant definitions from [EVC] for convenience.  Section 3.1.2
453	   provides definitions specific to this memo.

455	3.1.1.  Definitions from the EVC Specification

457	   Access Unit: A set of NAL units that are associated with each other
458	   according to a specified classification rule, are consecutive in
459	   decoding order, and contain exactly one coded picture.

461	   Bitstream: A sequence of bits, in the form of a NAL unit stream or a
462	   byte stream, that forms the representation of coded pictures and
463	   associated data forming one or more coded video sequences (CVSs).

465	   Coded Picture: A coded representation of a picture containing all
466	   CTUs of the picture.

468	   Coded Video Sequence (CVS): A sequence of access units that consists,
469	   in decoding order, of an IDR access unit, followed by zero or more
470	   access units that are not IDR access units, including all subsequent
471	   access units up to but not including any subsequent access unit that
472	   is an IDR access unit.

474	   Coding Tree Block (CTB): An NxN block of samples for some value of N
475	   such that the division of a component into CTBs is a partitioning.

477	   Coding Tree Unit (CTU): A CTB of luma samples, two corresponding CTBs
478	   of chroma samples of a picture that has three sample arrays, or a CTB
479	   of samples of a monochrome picture or a picture that is coded using
480	   three separate colour planes and syntax structures used to code the
481	   samples.

483	   Decoded Picture: A decoded picture is derived by decoding a coded
484	   picture.

486	   Decoded Picture Buffer (DPB): A buffer holding decoded pictures for
487	   reference, output reordering, or output delay specified for the
488	   hypothetical reference decoder in Annex C of the [EVC] specification.

490	   Dynamic Range Adjustment (DRA): A mapping process that is applied to
491	   decoded picture prior to cropping and output as part of the decoding
492	   process and is controlled by parameters conveyed in an Adaptation
493	   Parameter Set (APS).

495	   Hypothetical Reference Decoder (HRD): A hypothetical decoder model
496	   that specifies constraints on the variability of conforming NAL unit
497	   streams or conforming byte streams that an encoding process may
498	   produce.

500	   Instantaneous Decoding Refresh (IDR) access unit: An access unit in
501	   which the coded picture is an IDR picture.

503	   Instantaneous Decoding Refresh (IDR) picture: A coded picture for
504	   which each VCL NAL unit has NalUnitType equal to IDR_NUT.

506	   Level: A defined set of constraints on the values that may be taken
507	   by the syntax elements and variables of this document, or the value
508	   of a transform coefficient prior to scaling.

510	   Network Abstraction Layer (NAL) unit: A syntax structure containing
511	   an indication of the type of data to follow and bytes containing that
512	   data in the form of an RBSP interspersed as necessary.

514	   Network Abstraction Layer (NAL) Unit Stream: A sequence of NAL units.

516	   Non-IDR Picture: A coded picture that is not an IDR picture.

518	   Non-VCL NAL Unit: A NAL unit that is not a VCL NAL unit.

520	   Picture Parameter Set (PPS): A syntax structure containing syntax
521	   elements that apply to zero or more entire coded pictures as
522	   determined by a syntax element found in each slice header.

524	   Picture Order Count (POC): A variable that is associated with each
525	   picture, uniquely identifies the associated picture among all
526	   pictures in the CVS, and, when the associated picture is to be output
527	   from the decoded picture buffer, indicates the position of the
528	   associated picture in output order relative to the output order
529	   positions of the other pictures in the same CVS that are to be output
530	   from the decoded picture buffer.

532	   Raw Byte Sequence Payload (RBSP): A syntax structure containing an
533	   integer number of bytes that is encapsulated in a NAL unit and that
534	   is either empty or has the form of a string of data bits containing
535	   syntax elements followed by an RBSP stop bit and zero or more
536	   subsequent bits equal to 0.

538	   Sequence Parameter Set (SPS): A syntax structure containing syntax
539	   elements that apply to zero or more entire CVSs as determined by the
540	   content of a syntax element found in the PPS referred to by a syntax
541	   element found in each slice header.

543	   Tile row: A rectangular region of CTUs having a height specified by
544	   syntax elements in the PPS and a width equal to the width of the
545	   picture.

547	   Tile scan: A specific sequential ordering of CTUs partitioning a
548	   picture in which the CTUs are ordered consecutively in CTU raster
549	   scan in a tile whereas tiles in a picture are ordered consecutively
550	   in a raster scan of the tiles of the picture.

552	   Video coding layer (VCL) NAL unit: A collective term for coded slice
553	   NAL units and the subset of NAL units that have reserved values of
554	   NalUnitType that are classified as VCL NAL units in this document.
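
	   The Coded Video Sequence definition above implies a simple
	   grouping rule: every IDR access unit starts a new CVS, and each
	   following non-IDR access unit belongs to the most recent one.
	   The sketch below illustrates that rule; the AccessUnit type and
	   its fields are hypothetical stand-ins, not structures defined by
	   [EVC] or this memo.

```python
from typing import List, NamedTuple


class AccessUnit(NamedTuple):
    label: str    # hypothetical identifier, for illustration only
    is_idr: bool  # True when the coded picture is an IDR picture


def split_into_cvss(access_units: List[AccessUnit]) -> List[List[AccessUnit]]:
    """Group access units, given in decoding order, into CVSs."""
    sequences: List[List[AccessUnit]] = []
    for au in access_units:
        if au.is_idr:
            # An IDR access unit always opens a new CVS.
            sequences.append([au])
        elif sequences:
            sequences[-1].append(au)
        # Assumption: access units seen before the first IDR access unit
        # belong to no CVS and are skipped.
    return sequences
```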

556	3.1.2.  Definitions Specific to This Memo

558	   Media-Aware Network Element (MANE): A network element, such as a
559	   middlebox, selective forwarding unit, or application-layer gateway
560	   that is capable of parsing certain aspects of the RTP payload headers
561	   or the RTP payload and reacting to their contents.

563	      Informative note: The concept of a MANE goes beyond normal routers
564	      or gateways in that a MANE has to be aware of the signaling (e.g.,
565	      to learn about the payload type mappings of the media streams),
566	      and in that it has to be trusted when working with Secure RTP
567	      (SRTP).  The advantage of using MANEs is that they allow packets
568	      to be dropped according to the needs of the media coding.  For
569	      example, if a MANE has to drop packets due to congestion on a
570	      certain link, it can identify and remove those packets whose
571	      elimination produces the least adverse effect on the user
572	      experience.  After dropping packets, MANEs must rewrite RTCP
573	      packets to match the changes to the RTP stream, as specified in
574	      Section 7 of [RFC3550].

576	   NAL unit decoding order: A NAL unit order that conforms to the
577	   constraints on NAL unit order given in Sections 8.2 and 8.3 of [EVC],
578	   which follow the order of NAL units in the bitstream.

580	   NAL unit output order: A NAL unit order in which NAL units of
581	   different access units are in the output order of the decoded
582	   pictures corresponding to the access units, as specified in [EVC],
583	   and in which NAL units within an access unit are in their decoding
584	   order.

586	   RTP stream: See [RFC7656].  Within the scope of this memo, one RTP
587	   stream is utilized to transport one or more temporal sub-layers.

589	   Transmission order: The order of packets in ascending RTP sequence
590	   number order (in modulo arithmetic).  Within an aggregation packet,
591	   the NAL unit transmission order is the same as the order of
592	   appearance of NAL units in the packet.

594	3.2.  Abbreviations

596	   APS        Adaptation Parameter Set

598	   ATS        Adaptive Transform Selection

600	   B          Bi-predictive

602	   CBR        Constant Bit Rate

604	   CPB        Coded Picture Buffer

606	   CTB        Coding Tree Block

608	   CTU        Coding Tree Unit

610	   CVS        Coded Video Sequence

612	   DPB        Decoded Picture Buffer

614	   HRD        Hypothetical Reference Decoder
615	   HSS        Hypothetical Stream Scheduler

617	   I          Intra

619	   IDR        Instantaneous Decoding Refresh

621	   LSB        Least Significant Bit

623	   LTRP       Long-Term Reference Picture

625	   MMVD       Merge with Motion Vector Difference

627	   MSB        Most Significant Bit

629	   NAL        Network Abstraction Layer

631	   P          Predictive

633	   POC        Picture Order Count

635	   PPS        Picture Parameter Set

637	   QP         Quantization Parameter

639	   RBSP       Raw Byte Sequence Payload

641	   RGB        Same as GBR

643	   SAR        Sample Aspect Ratio

645	   SEI        Supplemental Enhancement Information

647	   SODB       String Of Data Bits

649	   SPS        Sequence Parameter Set

651	   STRP       Short-Term Reference Picture

653	   VBR        Variable Bit Rate

655	   VCL        Video Coding Layer

657	4.  RTP Payload Format
658	4.1.  RTP Header Usage

660	   The format of the RTP header is specified in [RFC3550] (reprinted as
661	   Figure 2 for convenience).  This payload format uses the fields of
662	   the header in a manner consistent with that specification.

664	   The RTP payload (and the settings for some RTP header bits) for
665	   aggregation packets and fragmentation units are specified in
666	   Section 4.3.2 and Section 4.3.3, respectively.

668	       0                   1                   2                   3
669	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
670	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
671	      |V=2|P|X|  CC   |M|     PT      |       sequence number         |
672	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
673	      |                           timestamp                           |
674	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
675	      |           synchronization source (SSRC) identifier            |
676	      +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
677	      |            contributing source (CSRC) identifiers             |
678	      |                             ....                              |
679	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

681	                        RTP Header According to [RFC3550]

683	                                  Figure 2

685	   The RTP header information to be set according to this RTP payload
686	   format is set as follows:

688	   Marker bit (M): 1 bit

690	      Set for the last packet of the access unit, carried in the current
691	      RTP stream.  This is in line with the normal use of the M bit in
692	      video formats to allow efficient playout buffer handling.

694	         editor-note 4: The informative note below needs updating once
695	         the NAL unit type table is stable in the [EVC] spec.

697	         Informative note: The content of a NAL unit does not tell
698	         whether or not the NAL unit is the last NAL unit, in decoding
699	         order, of an access unit.  An RTP sender implementation may
700	         obtain this information from the video encoder.  If, however,
701	         the implementation cannot obtain this information directly from
702	         the encoder, e.g., when the bitstream was pre-encoded, and also
703	         there is no timestamp allocated for each NAL unit, then the
704	         sender implementation can inspect subsequent NAL units in
705	         decoding order to determine whether or not the NAL unit is the
706	         last NAL unit of an access unit as follows.  A NAL unit is
707	         determined to be the last NAL unit of an access unit if it is
708	         the last NAL unit of the bitstream.  A NAL unit naluX is also
709	         determined to be the last NAL unit of an access unit if both
710	         the following conditions are true: 1) the next VCL NAL unit
711	         naluY in decoding order has the high-order bit of the first
712	         byte after its NAL unit header equal to 1 or nal_unit_type
713	         equal to 27, and 2) all NAL units between naluX and naluY, when
714	         present, have nal_unit_type in the range of 24 to 26,
715	         inclusive, equal to 28, or in the range of 29 to 55.
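
   The look-ahead test in the informative note above can be sketched
   as follows.  This is only an illustration: the NalInfo record, its
   field names, and the is_vcl flag are hypothetical and would be
   filled in by the caller, the nal_unit_type values are copied from
   the note as written (which the editor's note marks as subject to
   change), and the input list is assumed to hold the whole bitstream.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NalInfo:
    """Hypothetical per-NAL-unit summary supplied by the caller."""
    nal_unit_type: int
    is_vcl: bool
    first_payload_byte: int  # first byte after the NAL unit header


def _allowed_between(nal_unit_type: int) -> bool:
    # Types permitted between naluX and naluY, as listed in the note.
    return (24 <= nal_unit_type <= 26
            or nal_unit_type == 28
            or 29 <= nal_unit_type <= 55)


def is_last_nal_of_access_unit(nalus: Sequence[NalInfo], x: int) -> bool:
    """Apply the note's test to nalus[x]; nalus is in decoding order."""
    if x == len(nalus) - 1:
        return True  # last NAL unit of the bitstream
    for y in range(x + 1, len(nalus)):
        cand = nalus[y]
        if cand.is_vcl:
            # Condition 1: high-order bit set, or nal_unit_type == 27.
            cond1 = bool(cand.first_payload_byte & 0x80) or cand.nal_unit_type == 27
            # Condition 2: every NAL unit strictly between has an allowed type.
            cond2 = all(_allowed_between(n.nal_unit_type) for n in nalus[x + 1:y])
            return cond1 and cond2
    return False  # no later VCL NAL unit; the note gives no rule for this case
```

   When the sender can obtain access unit boundaries from the encoder,
   that information is preferable to this kind of inspection.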

717	   Payload Type (PT): 7 bits

719	      The assignment of an RTP payload type for this new payload format
720	      is outside the scope of this document and will not be specified
721	      here.  The assignment of a payload type has to be performed either
722	      through the profile used or in a dynamic way.

724	   Sequence Number (SN): 16 bits

726	      Set and used in accordance with [RFC3550].

728	   Timestamp: 32 bits

730	      The RTP timestamp is set to the sampling timestamp of the content.
731	      A 90 kHz clock rate MUST be used.  If the NAL unit has no timing
732	      properties of its own (e.g., parameter sets or certain SEI NAL
733	      units), the RTP timestamp MUST be set to the RTP timestamp of the
734	      coded picture of the access unit in which the NAL unit (according
735	      to Annex D of [EVC]) is included.  Receivers MUST use the RTP
736	      timestamp for the display process, even when the bitstream
737	      contains picture timing SEI messages or decoding unit information
738	      SEI messages as specified in [EVC].
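
   As an illustration of the 90 kHz rule, the sketch below maps a
   sampling instant to a 32-bit RTP timestamp.  The function name and
   the initial_offset parameter (standing in for the random initial
   timestamp value of [RFC3550]) are assumptions made for this
   example.

```python
RTP_CLOCK_HZ = 90_000      # clock rate required by this payload format
RTP_TS_MODULUS = 1 << 32   # the RTP timestamp field is 32 bits wide


def rtp_timestamp(sampling_time_s: float, initial_offset: int = 0) -> int:
    """Map a sampling instant (seconds) to a 32-bit RTP timestamp."""
    ticks = round(sampling_time_s * RTP_CLOCK_HZ)
    # The timestamp wraps around modulo 2^32.
    return (initial_offset + ticks) % RTP_TS_MODULUS
```

   A NAL unit without timing properties of its own would simply reuse
   the value computed for the coded picture of its access unit.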

740	   Synchronization source (SSRC): 32 bits

742	      Used to identify the source of the RTP packets.  When using SRST,
743	      by definition a single SSRC is used for all parts of a single
744	      bitstream.

746	4.2.  Payload Header Usage

748	   The first two bytes of the payload of an RTP packet are referred to
749	   as the payload header.  The payload header consists of the same
750	   fields (F, TID, Reserve and E) as the NAL unit header as shown in
751	   Section 1.1.4, irrespective of the type of the payload structure.

753	   The TID value indicates (among other things) the relative importance
754	   of an RTP packet, for example, because NAL units belonging to higher
755	   temporal sub-layers are not used for the decoding of lower temporal
756	   sub-layers.  A lower value of TID indicates a higher importance.
757	   More-important NAL units MAY be better protected against transmission
758	   losses than less-important NAL units.

760	4.3.  Payload Structures

762	   Three different types of RTP packet payload structures are specified.
763	   A receiver can identify the type of an RTP packet payload through the
764	   Type field in the payload header.

766	   The three different payload structures are as follows:

768	   *  Single NAL unit packet: Contains a single NAL unit in the payload,
769	      and the NAL unit header of the NAL unit also serves as the payload
770	      header.  This payload structure is specified in Section 4.3.1.

772	   *  Aggregation Packet (AP): Contains more than one NAL unit within
773	      one access unit.  This payload structure is specified in
774	      Section 4.3.2.

776	   *  Fragmentation Unit (FU): Contains a subset of a single NAL unit.
777	      This payload structure is specified in Section 4.3.3.

779	4.3.1.  Single NAL Unit Packets

781	   A single NAL unit packet contains exactly one NAL unit, and consists
782	   of a payload header (denoted as PayloadHdr), a conditional 16-bit
783	   DONL field (in network byte order), and the NAL unit payload data
784	   (the NAL unit excluding its NAL unit header) of the contained NAL
785	   unit, as shown in Figure 3.

787	      0                   1                   2                   3
788	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
789	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
790	     |           PayloadHdr          |      DONL (conditional)       |
791	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
792	     |                                                               |
793	     |                  NAL unit payload data                        |
794	     |                                                               |
795	     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
796	     |                               :...OPTIONAL RTP padding        |
797	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

799	                  The Structure of a Single NAL Unit Packet
800	                                  Figure 3

802	   The DONL field, when present, specifies the value of the 16 least
803	   significant bits of the decoding order number of the contained NAL
804	   unit.  If sprop-max-don-diff is greater than 0 for any of the RTP
805	   streams, the DONL field MUST be present, and the variable DON for the
806	   contained NAL unit is derived as equal to the value of the DONL
807	   field.  Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
808	   streams), the DONL field MUST NOT be present.
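
   A minimal parsing sketch for the layout in Figure 3 follows.  It
   assumes that RTP padding has already been removed and that the
   caller has derived donl_present from sprop-max-don-diff; the type
   and function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional


class SingleNalPacket(NamedTuple):
    payload_hdr: bytes     # 2-byte payload header (also the NAL unit header)
    donl: Optional[int]    # 16 LSBs of the decoding order number, if present
    nal_payload: bytes     # NAL unit payload data, without the NAL unit header


def parse_single_nal_packet(payload: bytes, donl_present: bool) -> SingleNalPacket:
    """Split an RTP payload into PayloadHdr, optional DONL, and NAL payload."""
    if len(payload) < 2 + (2 if donl_present else 0):
        raise ValueError("RTP payload too short")
    hdr = payload[:2]
    offset = 2
    donl = None
    if donl_present:
        # DONL is a 16-bit field in network byte order.
        (donl,) = struct.unpack_from("!H", payload, offset)
        offset += 2
    return SingleNalPacket(hdr, donl, payload[offset:])
```

   To rebuild the NAL unit for the decoder, the receiver would
   concatenate payload_hdr and nal_payload, skipping the DONL field.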

810	4.3.2.  Aggregation Packets (APs)

812	   Aggregation Packets (APs) enable the reduction of packetization
813	   overhead for small NAL units, such as most of the non-VCL NAL units,
814	   which are often only a few octets in size.

816	   An AP aggregates NAL units within one access unit.  Each NAL unit to
817	   be carried in an AP is encapsulated in an aggregation unit.  NAL
818	   units aggregated in one AP are in NAL unit decoding order.

820	   An AP consists of a payload header (denoted as PayloadHdr) followed
821	   by two or more aggregation units, as shown in Figure 4.

823	     0                   1                   2                   3
824	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
825	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
826	    |    PayloadHdr (Type=56)       |                               |
827	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
828	    |                                                               |
829	    |             two or more aggregation units                     |
830	    |                                                               |
831	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
832	    |                               :...OPTIONAL RTP padding        |
833	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

835	                   The Structure of an Aggregation Packet

837	                                  Figure 4

839	   The fields in the payload header are set as follows.  The F bit MUST
840	   be equal to 0 if the F bit of each aggregated NAL unit is equal to
841	   zero; otherwise, it MUST be equal to 1.  The Type field MUST be equal
842	   to 56.

844	   The value of TID MUST be the lowest value of TID of all the
845	   aggregated NAL units.  The values of Reserve and E MUST match the
846	   version of the [EVC] specification.

848	      Informative note: All VCL NAL units in an AP have the same TID
849	      value since they belong to the same access unit.  However, an AP
850	      may contain non-VCL NAL units for which the TID value in the NAL
851	      unit header may be different than the TID value of the VCL NAL
852	      units in the same AP.
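
   The F and TID rules for the AP payload header can be sketched as
   below.  The function name and the (F, TID) tuple input are
   assumptions for illustration; the Reserve and E fields are not
   covered.

```python
from typing import Iterable, Tuple


def ap_f_and_tid(nal_fields: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Derive the AP header F and TID from (F, TID) of each aggregated NAL unit."""
    fields = list(nal_fields)
    if len(fields) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    # F is 0 only if every aggregated NAL unit has F equal to 0.
    f_bit = 0 if all(f == 0 for f, _ in fields) else 1
    # TID is the lowest TID among the aggregated NAL units.
    tid = min(t for _, t in fields)
    return f_bit, tid
```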

854	   An AP MUST carry at least two aggregation units and can carry as many
855	   aggregation units as necessary; however, the total amount of data in
856	   an AP obviously MUST fit into an IP packet, and the size SHOULD be
857	   chosen so that the resulting IP packet is smaller than the path MTU
858	   size so as to avoid IP layer fragmentation.  An AP MUST NOT contain
859	   FUs specified in Section 4.3.3.  APs MUST NOT be nested; i.e., an AP
860	   cannot contain another AP.

862	   The first aggregation unit in an AP consists of a conditional 16-bit
863	   DONL field (in network byte order) followed by a 16-bit unsigned size
864	   information (in network byte order) that indicates the size of the
865	   NAL unit in bytes (excluding these two octets, but including the NAL
866	   unit header), followed by the NAL unit itself, including its NAL unit
867	   header, as shown in Figure 5.

869	     0                   1                   2                   3
870	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
871	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
872	    |               :       DONL (conditional)      |   NALU size   |
873	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
874	    |   NALU size   |                                               |
875	    +-+-+-+-+-+-+-+-+         NAL unit                              |
876	    |                                                               |
877	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
878	    |                               :
879	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

881	           The Structure of the First Aggregation Unit in an AP

883	                                  Figure 5

885	   The DONL field, when present, specifies the value of the 16 least
886	   significant bits of the decoding order number of the aggregated NAL
887	   unit.

889	   If sprop-max-don-diff is greater than 0 for any of the RTP streams,
890	   the DONL field MUST be present in an aggregation unit that is the
891	   first aggregation unit in an AP, and the variable DON for the
892	   aggregated NAL unit is derived as equal to the value of the DONL
893	   field.  Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
894	   streams), the DONL field MUST NOT be present in an aggregation unit
895	   that is the first aggregation unit in an AP.

897	   An aggregation unit that is not the first aggregation unit in an AP
898	   consists of a 16-bit unsigned size information
899	   (in network byte order) that indicates the size of the NAL unit in
900	   bytes (excluding these two octets, but including the NAL unit
901	   header), followed by the NAL unit itself, including its NAL unit
902	   header, as shown in Figure 6.

904	     0                   1                   2                   3
905	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
906	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
907	     |               :       NALU size               |   NAL unit    |
908	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
909	     |                                                               |
910	     |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
911	     |                               :
912	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

914	         The Structure of an Aggregation Unit That Is Not the First
915	                           Aggregation Unit in an AP

917	                                  Figure 6

919	   Figure 7 presents an example of an AP that contains two aggregation
920	   units, labeled as NALU 1 and NALU 2 in the figure, without the DONL
921	   field being present.

923	     0                   1                   2                   3
924	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
925	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
926	    |                          RTP Header                           |
927	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
928	    |   PayloadHdr (Type=56)        |         NALU 1 Size           |
929	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
930	    |          NALU 1 HDR           |                               |
931	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
932	    |                   . . .                                       |
933	    |                                                               |
934	    +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
935	    |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
936	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
937	    | NALU 2 HDR    |                                               |
938	    +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
939	    |                   . . .                                       |
940	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
941	    |                               :...OPTIONAL RTP padding        |
942	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

944	               An Example of an AP Packet Containing
945	             Two Aggregation Units without the DONL Field

947	                                  Figure 7

949	   Figure 8 presents an example of an AP that contains two aggregation
950	   units, labeled as NALU 1 and NALU 2 in the figure, with the DONL
951	   field being present.

953	     0                   1                   2                   3
954	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
955	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
956	    |                          RTP Header                           |
957	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
958	    |   PayloadHdr (Type=56)        |        NALU 1 DONL            |
959	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
960	    |          NALU 1 Size          |            NALU 1 HDR         |
961	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
962	    |                                                               |
963	    |                 NALU 1 Data   . . .                           |
964	    |                                                               |
965	    +        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
966	    |                               :          NALU 2 Size          |
967	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
968	    |          NALU 2 HDR           |                               |
969	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
970	    |                                                               |
971	    |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
972	    |                               :...OPTIONAL RTP padding        |
973	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

975	                   An Example of an AP Containing
976	                 Two Aggregation Units with the DONL Field

978	                                  Figure 8
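
   The aggregation unit layouts of Figures 5 through 8 can be
   exercised with the sketch below.  It handles only the bytes that
   follow the two-byte PayloadHdr, assumes RTP padding has been
   removed, and uses hypothetical function names.

```python
import struct
from typing import List, Optional, Tuple


def pack_aggregation_units(nal_units: List[bytes],
                           first_donl: Optional[int] = None) -> bytes:
    """Serialize NAL units (with their headers) as aggregation units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    out = bytearray()
    if first_donl is not None:
        # Conditional DONL, present only in the first aggregation unit.
        out += struct.pack("!H", first_donl & 0xFFFF)
    for nalu in nal_units:
        if len(nalu) > 0xFFFF:
            raise ValueError("NAL unit too large for a 16-bit size field")
        out += struct.pack("!H", len(nalu)) + nalu
    return bytes(out)


def unpack_aggregation_units(data: bytes,
                             donl_present: bool) -> Tuple[Optional[int], List[bytes]]:
    """Recover the optional DONL and the aggregated NAL units."""
    offset = 0
    donl = None
    if donl_present:
        if len(data) < 2:
            raise ValueError("truncated DONL field")
        (donl,) = struct.unpack_from("!H", data, 0)
        offset = 2
    nal_units = []
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated NALU size field")
        (size,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if offset + size > len(data):
            raise ValueError("truncated NAL unit")
        nal_units.append(data[offset:offset + size])
        offset += size
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return donl, nal_units
```

   The size checks matter for robustness: a receiver should not trust
   a NALU size value that points past the end of the packet.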

980	4.3.3.  Fragmentation Units

982	   Fragmentation Units (FUs) are introduced to enable fragmenting a
983	   single NAL unit into multiple RTP packets, possibly without
984	   cooperation or knowledge of the EVC [EVC] encoder.  A fragment of a
985	   NAL unit consists of an integer number of consecutive octets of that
986	   NAL unit.  Fragments of the same NAL unit MUST be sent in consecutive
987	   order with ascending RTP sequence numbers (with no other RTP packets
988	   within the same RTP stream being sent between the first and last
989	   fragment).

991	   When a NAL unit is fragmented and conveyed within FUs, it is referred
992	   to as a fragmented NAL unit.  APs MUST NOT be fragmented.  FUs MUST
993	   NOT be nested; i.e., an FU must not contain a subset of another FU.

995	   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
996	   time of the fragmented NAL unit.

998	   An FU consists of a payload header (denoted as PayloadHdr), an FU
999	   header of one octet, a conditional 16-bit DONL field (in network byte
1000	   order), and an FU payload, as shown in Figure 9.

1002	     0                   1                   2                   3
1003	     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1004	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1005	    |    PayloadHdr (Type=57)       |   FU header   | DONL (cond)   |
1006	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1007	    | DONL (cond)   |                                               |
1008	    +-+-+-+-+-+-+-+-+                                               |
1009	    |                         FU payload                            |
1010	    |                                                               |
1011	    |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1012	    |                               :...OPTIONAL RTP padding        |
1013	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1015	                          The Structure of an FU

1017	                                  Figure 9

1019	   The fields in the payload header are set as follows.  The Type field
1020	   MUST be equal to 57.  The fields F, TID, Reserve and E MUST be equal
1021	   to the fields F, TID, Reserve and E, respectively, of the fragmented
1022	   NAL unit.

1024	   The FU header consists of an S bit, an E bit, and a 6-bit FuType
1025	   field, as shown in Figure 10.

1027	                             +---------------+
1028	                             |0|1|2|3|4|5|6|7|
1029	                             +-+-+-+-+-+-+-+-+
1030	                             |S|E|  FuType   |
1031	                             +---------------+

1033	                         The Structure of FU Header

1035	                                 Figure 10

1037	   The semantics of the FU header fields are as follows:

1039	   S: 1 bit

1041	      When set to 1, the S bit indicates the start of a fragmented NAL
1042	      unit, i.e., the first byte of the FU payload is also the first
1043	      byte of the payload of the fragmented NAL unit.  When the FU
1044	      payload is not the start of the fragmented NAL unit payload, the S
1045	      bit MUST be set to 0.

1047	   E: 1 bit
1048	      When set to 1, the E bit indicates the end of a fragmented NAL
1049	      unit, i.e., the last byte of the payload is also the last byte of
1050	      the fragmented NAL unit.  When the FU payload is not the last
1051	      fragment of a fragmented NAL unit, the E bit MUST be set to 0.

1053	   FuType: 6 bits

1055	      The field FuType MUST be equal to the field Type of the fragmented
1056	      NAL unit.

1058	   The DONL field, when present, specifies the value of the 16 least
1059	   significant bits of the decoding order number of the fragmented NAL
1060	   unit.

1062	   If sprop-max-don-diff is greater than 0 for any of the RTP streams,
1063	   and the S bit is equal to 1, the DONL field MUST be present in the
1064	   FU, and the variable DON for the fragmented NAL unit is derived as
1065	   equal to the value of the DONL field.  Otherwise (sprop-max-don-diff
1066	   is equal to 0 for all the RTP streams, or the S bit is equal to 0),
1067	   the DONL field MUST NOT be present in the FU.

1069	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
1070	   the Start bit and End bit must not both be set to 1 in the same FU
1071	   header.

1073	   The FU payload consists of fragments of the payload of the fragmented
1074	   NAL unit so that if the FU payloads of consecutive FUs, starting with
1075	   an FU with the S bit equal to 1 and ending with an FU with the E bit
1076	   equal to 1, are sequentially concatenated, the payload of the
1077	   fragmented NAL unit can be reconstructed.  The NAL unit header of the
1078	   fragmented NAL unit is not included as such in the FU payload, but
1079	   rather the information of the NAL unit header of the fragmented NAL
1080	   unit is conveyed in F, TID, Reserve and E fields of the FU payload
1081	   headers of the FUs and the FuType field of the FU header of the FUs.
1082	   An FU payload MUST NOT be empty.

1084	   If an FU is lost, the receiver SHOULD discard all following
1085	   fragmentation units in transmission order corresponding to the same
1086	   fragmented NAL unit, unless the decoder in the receiver is known to
1087	   gracefully handle incomplete NAL units.

1089	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1090	   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
1091	   n of that NAL unit is not received.  In this case, the
1092	   forbidden_zero_bit of the NAL unit MUST be set to 1 to indicate a
1093	   syntax violation.
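
   The FU header of Figure 10 and the S/E bit rules can be illustrated
   with the sketch below.  It covers only the one-octet FU header and
   the splitting of the NAL unit payload (the NAL unit without its
   header); the PayloadHdr and DONL fields are left out, and all
   function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pack_fu_header(start: bool, end: bool, fu_type: int) -> int:
    """Build the FU header octet: S (bit 0), E (bit 1), FuType (6 bits)."""
    if start and end:
        raise ValueError("S and E must not both be set in the same FU header")
    if not 0 <= fu_type <= 0x3F:
        raise ValueError("FuType is a 6-bit field")
    return (int(start) << 7) | (int(end) << 6) | fu_type


def unpack_fu_header(octet: int) -> Tuple[bool, bool, int]:
    """Return (S, E, FuType) from an FU header octet."""
    return bool(octet & 0x80), bool(octet & 0x40), octet & 0x3F


def fragment_nal_payload(nal_payload: bytes, fu_type: int,
                         max_fragment: int) -> List[Tuple[int, bytes]]:
    """Split a NAL unit payload into (FU header octet, FU payload) pairs."""
    if max_fragment < 1:
        raise ValueError("max_fragment must be positive")
    chunks = [nal_payload[i:i + max_fragment]
              for i in range(0, len(nal_payload), max_fragment)]
    if len(chunks) < 2:
        raise ValueError("a non-fragmented NAL unit is not sent in one FU")
    last = len(chunks) - 1
    return [(pack_fu_header(i == 0, i == last, fu_type), chunk)
            for i, chunk in enumerate(chunks)]


def reassemble_nal_payload(fus: Sequence[Tuple[int, bytes]]) -> bytes:
    """Concatenate FU payloads from the S fragment through the E fragment."""
    if not fus or not unpack_fu_header(fus[0][0])[0]:
        raise ValueError("first fragment must have the S bit set")
    if not unpack_fu_header(fus[-1][0])[1]:
        raise ValueError("last fragment must have the E bit set")
    return b"".join(payload for _, payload in fus)
```

   A receiver that detects a missing fragment would normally discard
   the remaining fragments instead of calling the reassembly step.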

1095	4.4.  Decoding Order Number

1097	   For each NAL unit, the variable AbsDon is derived, representing the
1098	   decoding order number that is indicative of the NAL unit decoding
1099	   order.

1101	   Let NAL unit n be the n-th NAL unit in transmission order within an
1102	   RTP stream.

1104	   If sprop-max-don-diff is equal to 0 for all the RTP streams carrying
1105	   the EVC bitstream, AbsDon[n], the value of AbsDon for NAL unit n, is
1106	   derived as equal to n.

1108	   Otherwise (sprop-max-don-diff is greater than 0 for any of the RTP
1109	   streams), AbsDon[n] is derived as follows, where DON[n] is the value
1110	   of the variable DON for NAL unit n:

1112	   *  If n is equal to 0 (i.e., NAL unit n is the very first NAL unit in
1113	      transmission order), AbsDon[0] is set equal to DON[0].

1115	   *  Otherwise (n is greater than 0), the following applies for
1116	      derivation of AbsDon[n]:

1118	         If DON[n] == DON[n-1],
1119	            AbsDon[n] = AbsDon[n-1]

1121	         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1122	            AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1124	         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1125	            AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1127	         If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1128	            AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
1129	            DON[n])

1131	         If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1132	            AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
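
   The derivation above, for the case where sprop-max-don-diff is
   greater than 0, can be written as the following sketch.  The
   function name is hypothetical, and the input is assumed to be the
   list of 16-bit DON values in transmission order.

```python
from typing import List


def derive_abs_don(don: List[int]) -> List[int]:
    """Derive AbsDon[n] from DON[n] for NAL units in transmission order."""
    abs_don: List[int] = []
    for n, cur in enumerate(don):
        if n == 0:
            abs_don.append(cur)          # AbsDon[0] = DON[0]
            continue
        prev = don[n - 1]
        base = abs_don[n - 1]
        if cur == prev:
            abs_don.append(base)
        elif cur > prev and cur - prev < 32768:
            abs_don.append(base + cur - prev)            # forward step
        elif cur < prev and prev - cur >= 32768:
            abs_don.append(base + 65536 - prev + cur)    # forward, wrapped
        elif cur > prev and cur - prev >= 32768:
            abs_don.append(base - (prev + 65536 - cur))  # backward, wrapped
        else:  # cur < prev and prev - cur < 32768
            abs_don.append(base - (prev - cur))          # backward step
    return abs_don
```

   For example, the DON sequence 65534, 65535, 0, 1 yields the AbsDon
   sequence 65534, 65535, 65536, 65537, so the wrap of the 16-bit
   counter does not disturb the ordering.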

1134	   For any two NAL units m and n, the following applies:

1136	   *  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n follows
1137	      NAL unit m in NAL unit decoding order.

1139	   *  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
1140	      of the two NAL units can be in either order.

1142	   *  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
1143	      NAL unit m in decoding order.

1145	         Informative note: When two consecutive NAL units in the NAL
1146	         unit decoding order have different values of AbsDon, the
1147	         absolute difference between the two AbsDon values may be
1148	         greater than or equal to 1.

1150	         Informative note: There are multiple reasons to allow for the
1151	         absolute difference of the values of AbsDon for two consecutive
1152	         NAL units in the NAL unit decoding order to be greater than
1153	         one.  An increment by one is not required, as at the time of
1154	         associating values of AbsDon to NAL units, it may not be known
1155	         whether all NAL units are to be delivered to the receiver.  For
1156	         example, a gateway might not forward VCL NAL units of higher
1157	         sub-layers or some SEI NAL units when there is congestion in
1158	         the network.  In another example, the first intra-coded picture
1159	         of a pre-encoded clip is transmitted in advance to ensure that
1160	         it is readily available in the receiver, and when transmitting
1161	         the first intra-coded picture, the originator does not exactly
1162	         know how many NAL units will be encoded before the first intra-
1163	         coded picture of the pre-encoded clip follows in decoding
1164	         order.  Thus, the values of AbsDon for the NAL units of the
1165	         first intra-coded picture of the pre-encoded clip have to be
1166	         estimated when they are transmitted, and gaps in values of
1167	         AbsDon may occur.

1169	5.  Packetization Rules

1171	   The following packetization rules apply:

1173	   *  If sprop-max-don-diff is greater than 0 for any of the RTP
1174	      streams, the transmission order of NAL units carried in the RTP
1175	      stream MAY be different than the NAL unit decoding order and the
1176	      NAL unit output order.

1178	   *  A NAL unit of a small size SHOULD be encapsulated in an
1179	      aggregation packet together with one or more other NAL units in
1180	      order to avoid unnecessary packetization overhead for small NAL
1181	      units.  For example, non-VCL NAL units such as access unit
1182	      delimiters, parameter sets, or SEI NAL units are typically small
1183	      and can often be aggregated with VCL NAL units without violating
1184	      MTU size constraints.

1186	   *  Each non-VCL NAL unit SHOULD, when possible from an MTU size match
1187	      viewpoint, be encapsulated in an aggregation packet together with
1188	      its associated VCL NAL unit, as typically a non-VCL NAL unit would
1189	      be meaningless without the associated VCL NAL unit being
1190	      available.

1192	   *  For carrying exactly one NAL unit in an RTP packet, a single NAL
1193	      unit packet MUST be used.
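
   One possible way to apply these rules is the greedy planner
   sketched below.  It is an illustration only: it assumes that all
   NAL units belong to one access unit and are given in decoding
   order, that no DONL fields are present, and that max_payload is the
   RTP payload budget chosen by the sender.  The function name and the
   "single", "ap", and "fu" labels are hypothetical.

```python
from typing import List, Tuple

PAYLOAD_HDR_LEN = 2   # two-byte payload header of an AP
NALU_SIZE_LEN = 2     # 16-bit NALU size field of each aggregation unit


def plan_packets(nal_sizes: List[int],
                 max_payload: int) -> List[Tuple[str, List[int]]]:
    """Group NAL unit indices into single NAL unit packets, APs, and FUs."""
    plan: List[Tuple[str, List[int]]] = []
    group: List[int] = []
    group_len = PAYLOAD_HDR_LEN

    def flush() -> None:
        nonlocal group, group_len
        if len(group) == 1:
            plan.append(("single", group))   # exactly one NAL unit
        elif group:
            plan.append(("ap", group))       # two or more NAL units
        group, group_len = [], PAYLOAD_HDR_LEN

    for i, size in enumerate(nal_sizes):
        if size > max_payload:
            flush()
            plan.append(("fu", [i]))         # too large: fragment it
            continue
        cost = NALU_SIZE_LEN + size
        if group and group_len + cost > max_payload:
            flush()                          # AP would exceed the budget
        group.append(i)
        group_len += cost
    flush()
    return plan
```

   With sizes 10, 20, 5000, and 30 bytes and a budget of 1200 bytes,
   the planner aggregates the first two NAL units, fragments the
   third, and sends the fourth in a single NAL unit packet.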

1195	6.  De-packetization Process

1197	   The general concept behind de-packetization is to get the NAL units
1198	   out of the RTP packets in an RTP stream and pass them to the decoder
1199	   in the NAL unit decoding order.

1201	   The de-packetization process is implementation dependent.  Therefore,
1202	   the following description should be seen as an example of a suitable
1203	   implementation.  Other schemes may be used as well, as long as
1204	   their output for the same input is the same as that of the process
1205	   described below.  The output is the same when the set of output NAL
1206	   units and their order are both identical.  Optimizations relative to
1207	   the described algorithms are possible.

1209	   All normal RTP mechanisms related to buffer management apply.  In
1210	   particular, duplicated or outdated RTP packets (as indicated by the
1211	   RTP sequence number and the RTP timestamp) are removed.  To
1212	   determine the exact time for decoding, factors such as a possible
1213	   intentional delay to allow for proper inter-stream synchronization
1214	   must be taken into account.
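
   A minimal sketch of duplicate and outdated packet removal based on
   the 16-bit RTP sequence number follows.  The class name SeqFilter
   and the window size are assumptions of this sketch; a practical
   receiver would also consider the RTP timestamp and playout state.

```python
def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a, modulo 2**16 (handles wrap-around)."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


class SeqFilter:
    """Rejects duplicated packets and packets older than `window`."""

    def __init__(self, window: int = 512) -> None:
        self.window = window   # assumed tolerance, in sequence numbers
        self.highest = None    # highest sequence number accepted so far
        self.seen = set()      # recently accepted sequence numbers

    def accept(self, seq: int) -> bool:
        if self.highest is None:
            self.highest = seq
            self.seen.add(seq)
            return True
        delta = seq_delta(seq, self.highest)
        if delta <= -self.window:
            return False       # outdated
        if seq in self.seen:
            return False       # duplicated
        self.seen.add(seq)
        if delta > 0:
            self.highest = seq
            # Forget sequence numbers that fell out of the window.
            self.seen = {s for s in self.seen
                         if seq_delta(seq, s) < self.window}
        return True
```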

1216	   NAL units with NAL unit type values in the range of 0 to 55,
1217	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1218	   with NAL unit type values in the range of 56 to 63, inclusive, MUST
1219	   NOT be passed to the decoder.
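
   The rule above can be expressed as a simple predicate.  The function
   name is hypothetical, and the caller is assumed to have already
   extracted the NAL unit type value from the NAL unit header.

```python
def may_pass_to_decoder(nal_unit_type: int) -> bool:
    """True for types 0..55; False for NAL-unit-like structures 56..63."""
    if not 0 <= nal_unit_type <= 63:
        raise ValueError("NAL unit type out of range")
    return nal_unit_type <= 55
```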

1221	   The receiver includes a receiver buffer, which is used to compensate
1222	   for transmission delay jitter within individual RTP streams and
1223	   across RTP streams, to reorder NAL units from transmission order to
1224	   the NAL unit decoding order.  In this section, the receiver operation
1225	   is described under the assumption that there is no transmission delay
1226	   jitter within an RTP stream.  To distinguish it from a practical
1227	   receiver buffer that is also used to compensate for transmission
1228	   delay jitter, the receiver buffer is hereafter called the de-
1229	   packetization buffer in this section.  Receivers should also prepare
1230	   for transmission delay jitter; that is, either reserve separate
1231	   buffers for transmission delay jitter buffering and de-packetization
1232	   buffering or use a receiver buffer for both transmission delay jitter
1233	   and de-packetization.  Moreover, receivers should take transmission
1234	   delay jitter into account in the buffering operation, e.g., by
1235	   additional initial buffering before decoding and playback are
1236	   started.

1238	   When sprop-max-don-diff is equal to 0 for the received RTP stream,
1239	   the de-packetization buffer size is zero bytes, and the process
1240	   described in the remainder of this paragraph applies.  The NAL units
1241	   carried in the RTP stream are directly passed to the decoder in their
1242	   transmission order, which is identical to their decoding order.  When
1243	   there are several NAL units of the same RTP stream with the same NTP
1244	   timestamp, they are passed to the decoder in their transmission
1245	   order.

1247	      Informative note: The mapping between RTP and NTP timestamps is
1248	      conveyed in RTCP SR packets.  In addition, the mechanisms for
1249	      faster media timestamp synchronization discussed in [RFC6051] may
1250	      be used to speed up the acquisition of the RTP-to-wall-clock
1251	      mapping.

1253	   When sprop-max-don-diff is greater than 0 for the received RTP stream,
1254	   the process described in the remainder of this section applies.

1256	   There are two buffering states in the receiver: initial buffering and
1257	   buffering while playing.  Initial buffering starts when the reception
1258	   is initialized.  After initial buffering, decoding and playback are
1259	   started, and the buffering-while-playing mode is used.

1261	   Regardless of the buffering state, the receiver stores incoming NAL
1262	   units, in reception order, into the de-packetization buffer.  NAL
1263	   units carried in RTP packets are stored in the de-packetization
1264	   buffer individually, and the value of AbsDon is calculated and stored
1265	   for each NAL unit.

1267	   Initial buffering lasts until condition A (the difference between the
1268	   greatest and smallest AbsDon values of the NAL units in the de-
1269	   packetization buffer is greater than or equal to the value of sprop-
1270	   max-don-diff) or condition B (the number of NAL units in the de-
1271	   packetization buffer is greater than the value of sprop-depack-buf-
1272	   nalus) is true.

1274	   After initial buffering, whenever condition A or condition B is true,
1275	   the following operation is repeatedly applied until both condition A
1276	   and condition B become false:

1278	   *  The NAL unit in the de-packetization buffer with the smallest
1279	      value of AbsDon is removed from the de-packetization buffer and
1280	      passed to the decoder.

1282	   When no more NAL units are flowing into the de-packetization buffer,
1283	   all NAL units remaining in the de-packetization buffer are removed
1284	   from the buffer and passed to the decoder in the order of increasing
1285	   AbsDon values.
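
   The buffering process described in this section can be sketched as
   follows.  This is one possible arrangement, not a required
   implementation.  The function name and the representation of a NAL
   unit as an (AbsDon, payload) pair are assumptions of the sketch; the
   calculation of AbsDon is assumed to have been done by the caller.
   Because initial buffering ends exactly when condition A or condition
   B first becomes true, a single loop covers both buffering states.

```python
import heapq
from typing import Iterable, Iterator, Tuple


def depacketize(nal_units: Iterable[Tuple[int, bytes]],
                max_don_diff: int,
                depack_buf_nalus: int) -> Iterator[bytes]:
    """Yield NAL unit payloads in the order they are passed to the decoder.

    nal_units holds (AbsDon, payload) pairs in reception order.
    """
    if max_don_diff == 0:
        # No de-packetization buffer: transmission order is decoding order.
        for _, nalu in nal_units:
            yield nalu
        return

    heap = []        # entries: (AbsDon, arrival index, payload)
    greatest = None  # greatest AbsDon currently in the buffer
    for arrival, (abs_don, nalu) in enumerate(nal_units):
        heapq.heappush(heap, (abs_don, arrival, nalu))
        greatest = abs_don if greatest is None else max(greatest, abs_don)
        # Condition A: AbsDon spread >= sprop-max-don-diff.
        # Condition B: buffered NAL units > sprop-depack-buf-nalus.
        while heap and (greatest - heap[0][0] >= max_don_diff
                        or len(heap) > depack_buf_nalus):
            yield heapq.heappop(heap)[2]
        if not heap:
            greatest = None
    # No more NAL units are arriving: drain in increasing AbsDon order.
    while heap:
        yield heapq.heappop(heap)[2]
```

   For example, NAL units received with AbsDon values 0, 2, 1, 4, 3, 5
   and sprop-max-don-diff equal to 2 are passed to the decoder in the
   order 0, 1, 2, 3, 4, 5.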

1287	7.  Payload Format Parameters

1289	   This section specifies the optional parameters.  A mapping of the
1290	   parameters with Session Description Protocol (SDP) [RFC4566] is also
1291	   provided for applications that use SDP.

1293	7.1.  Media Type Registration

1295	   The receiver MUST ignore any parameter unspecified in this memo.

1297	   Type name:            video

1299	   Subtype name:         evc

1301	   Required parameters:  none

1303	   Optional parameters:

1305	      editor-note 5: To be updated

1307	7.2.  SDP Parameters

1309	   The receiver MUST ignore any parameter unspecified in this memo.

1311	7.2.1.  Mapping of Payload Type Parameters to SDP

1313	   The media type video/evc string is mapped to fields in the Session
1314	   Description Protocol (SDP) [RFC4566] as follows:

1316	   *  The media name in the "m=" line of SDP MUST be video.

1318	   *  The encoding name in the "a=rtpmap" line of SDP MUST be evc (the
1319	      media subtype).

1321	   *  The clock rate in the "a=rtpmap" line MUST be 90000.

1323	   *  OPTIONAL PARAMETERS:

1325	      editor-note 6: To be updated
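
   As a hedged illustration of the three mandatory mappings listed
   above, the following sketch builds the "m=" and "a=rtpmap" lines.
   The function name, the port, the RTP/AVP profile, and the dynamic
   payload type are assumptions of this sketch.  Optional parameters
   are deliberately omitted, as they are not yet specified.

```python
from typing import List


def evc_sdp_media_lines(port: int, payload_type: int) -> List[str]:
    """Return the m= and a=rtpmap lines for a video/evc stream."""
    if not 96 <= payload_type <= 127:
        # Assumption: a dynamic payload type is used.
        raise ValueError("expected a dynamic payload type (96-127)")
    return [
        f"m=video {port} RTP/AVP {payload_type}",  # media name is video
        f"a=rtpmap:{payload_type} evc/90000",      # encoding name, clock rate
    ]
```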

1327	7.2.2.  Usage with SDP Offer/Answer Model

1329	   When [EVC] is offered over RTP using SDP in an offer/answer model
1330	   [RFC3264] for negotiation for unicast usage, the following
1331	   limitations and rules apply:

1333	      editor-note 7: to be updated

1335	7.2.3.  SDP Example

1337	      editor-note 8: to be updated

1339	8.  Use with Feedback Messages

1341	      Placeholder

1343	8.1.  Picture Loss Indication (PLI)

1345	      Placeholder

1347	8.2.  Full Intra Request (FIR)

1349	      Placeholder

1351	9.  Security Considerations

1353	   The scope of this Security Considerations section is limited to the
1354	   payload format itself and to one feature of [EVC] that may pose a
1355	   particularly serious security risk if implemented naively.  The
1356	   payload format, in isolation, does not form a complete system.
1357	   Implementers are advised to read and understand relevant security-
1358	   related documents, especially those pertaining to RTP (see the
1359	   Security Considerations section in [RFC3550]), and the security of
1360	   the call-control stack chosen (that may make use of the media type
1361	   registration of this memo).  Implementers should also consider known
1362	   security vulnerabilities of video coding and decoding implementations
1363	   in general and avoid those.

1365	   Within this RTP payload format, neither the various media-plane-based
1366	   mechanisms, nor the signaling part of this memo, seems to pose a
1367	   security risk beyond those common to all RTP-based systems.

1369	   RTP packets using the payload format defined in this specification
1370	   are subject to the security considerations discussed in the RTP
1371	   specification [RFC3550], and in any applicable RTP profile such as
1372	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
1373	   SAVPF [RFC5124].  However, as "Securing the RTP Framework: Why RTP
1374	   Does Not Mandate a Single Media Security Solution" [RFC7202]
1375	   discusses, it is not an RTP payload format's responsibility to
1376	   discuss or mandate what solutions are used to meet the basic security
1377	   goals like confidentiality, integrity and source authenticity for RTP
1378	   in general.  This responsibility lies with anyone using RTP in an
1379	   application.  They can find guidance on available security mechanisms
1380	   and important considerations in "Options for Securing RTP Sessions"
1381	   [RFC7201].  Applications SHOULD use one or more appropriate strong
1382	   security mechanisms.  The rest of this section discusses the
1383	   security-impacting properties of the payload format itself.

1385	   Because the data compression used with this payload format is applied
1386	   end-to-end, any encryption needs to be performed after compression.
1387	   A potential denial-of-service threat exists for data encodings using
1388	   compression techniques that have non-uniform receiver-end
1389	   computational load.  The attacker can inject pathological datagrams
1390	   into the bitstream that are complex to decode and that cause the
1391	   receiver to be overloaded.  EVC is particularly vulnerable to such
1392	   attacks, as it is extremely simple to generate datagrams containing
1393	   NAL units that affect the decoding process of many future NAL units.
1394	   Therefore, the usage of data origin authentication and data integrity
1395	   protection of at least the RTP packet is RECOMMENDED, for example,
1396	   with SRTP [RFC3711].

1398	   End-to-end security with authentication, integrity, or
1399	   confidentiality protection will prevent a MANE from performing media-
1400	   aware operations other than discarding complete packets.  In the case
1401	   of confidentiality protection, it will even be prevented from
1402	   discarding packets in a media-aware way.  To be allowed to perform
1403	   such operations, a MANE is required to be a trusted entity that is
1404	   included in the security context establishment.

1406	10.  Congestion Control

1408	   Congestion control for RTP SHALL be used in accordance with RTP
1409	   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
1410	   If best-effort service is being used, an additional requirement is
1411	   that users of this payload format MUST monitor packet loss to ensure
1412	   that the packet loss rate is within an acceptable range.  Packet loss
1413	   is considered acceptable if a TCP flow across the same network path,
1414	   and experiencing the same network conditions, would achieve an
1415	   average throughput, measured on a reasonable timescale, that is not
1416	   less than that achieved by all RTP streams combined.  This condition
1417	   can be satisfied by implementing congestion-control mechanisms to
1418	   adapt the transmission rate or the number of layers subscribed for a
1419	   layered multicast session, or by arranging for a receiver to leave
1420	   the session if the loss rate is unacceptably high.

1422	   The bitrate adaptation necessary for obeying the congestion control
1423	   principle is easily achievable when real-time encoding is used, for
1424	   example, by adequately tuning the quantization parameter.  However,
1425	   when pre-encoded content is being transmitted, bandwidth adaptation
1426	   requires the pre-coded bitstream to be tailored for such adaptivity.
1427	   The key mechanism available in [EVC] is temporal scalability.  A
1428	   media sender can remove NAL units belonging to higher temporal sub-
1429	   layers (i.e., those NAL units with a high value of TID) until the
1430	   sending bitrate drops to an acceptable range.
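
   The sub-layer removal described above might be sketched as follows
   for a sender working on one measurement interval at a time.  The
   names Nalu and thin_to_bitrate, and the idea of evaluating the
   bitrate over a fixed interval, are assumptions of this sketch.

```python
from typing import List, NamedTuple


class Nalu(NamedTuple):
    tid: int   # temporal ID (TID) of the NAL unit
    size: int  # size in bytes


def thin_to_bitrate(nalus: List[Nalu], duration_s: float,
                    target_bps: float) -> List[Nalu]:
    """Drop the highest temporal sub-layers until the bitrate fits."""
    kept = list(nalus)
    while kept:
        bitrate = 8 * sum(n.size for n in kept) / duration_s
        if bitrate <= target_bps:
            break
        top = max(n.tid for n in kept)
        if top == 0:
            break  # only the lowest sub-layer remains; nothing to remove
        kept = [n for n in kept if n.tid < top]
    return kept
```

   Note that the sketch removes whole sub-layers and stops at the
   lowest one, so the target may remain unmet for some content.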

1432	   The mechanisms mentioned above generally work within a defined
1433	   profile and level and, therefore, no renegotiation of the channel is
1434	   required.  Only when non-downgradable parameters (such as profile)
1435	   are required to be changed does it become necessary to terminate and
1436	   restart the RTP stream(s).  This may be accomplished by using
1437	   different RTP payload types.

1439	   MANEs MAY remove certain unusable packets from the RTP stream when
1440	   that RTP stream was damaged due to previous packet losses.  This can
1441	   help reduce the network load in certain special cases.  For example,
1442	   MANEs can remove those FUs where the leading FUs belonging to the
1443	   same NAL unit have been lost or those dependent slice segments when
1444	   the leading slice segments belonging to the same slice have been
1445	   lost, because the trailing FUs or dependent slice segments are
1446	   meaningless to most decoders.  MANEs can also remove higher temporal
1447	   scalable layers if the outbound transmission (from the MANE's
1448	   viewpoint) experiences congestion.
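
   The FU case above can be sketched as a filter that forwards an FU
   only when all earlier FUs of the same NAL unit were received.  The
   record type Fu and the function name are hypothetical.  The sketch
   assumes that the input contains only FU packets and that the FUs of
   one NAL unit occupy consecutive RTP sequence numbers.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Fu(NamedTuple):
    seq: int     # RTP sequence number of the packet
    start: bool  # True for the first FU of a NAL unit
    end: bool    # True for the last FU of a NAL unit


def forwardable_fus(fus: Iterable[Fu]) -> Iterator[Fu]:
    """Yield FUs whose leading FUs of the same NAL unit all arrived."""
    expected: Optional[int] = None  # next seq of an intact NAL unit
    for fu in fus:
        if fu.start:
            intact = True
        else:
            intact = expected is not None and fu.seq == expected
        if intact:
            yield fu
            expected = None if fu.end else (fu.seq + 1) & 0xFFFF
        else:
            # A leading FU was lost: drop this and later FUs of the unit.
            expected = None
```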

1450	11.  IANA Considerations

1452	   Placeholder

1454	12.  Acknowledgements

1456	   Large parts of this specification share text with the RTP payload
1457	   format for HEVC [RFC7798].  We thank the authors of that
1458	   specification for their excellent work.

1460	13.  References

1462	13.1.  Normative References

1464	   [EVC]      "ISO/IEC FDIS 23094-1 Essential Video Coding", 2020,
1465	              <https://www.iso.org/standard/57797.html>.

1467	   [ISO23094-1]
1468	              "ISO/IEC DIS Information technology --- General video
1469	              coding --- Part 1 Essential video coding", n.d.,
1470	              <https://www.iso.org/standard/57797.html>.

1472	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1473	              Requirement Levels", BCP 14, RFC 2119,
1474	              DOI 10.17487/RFC2119, March 1997,
1475	              <https://www.rfc-editor.org/info/rfc2119>.

1477	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
1478	              with Session Description Protocol (SDP)", RFC 3264,
1479	              DOI 10.17487/RFC3264, June 2002,
1480	              <https://www.rfc-editor.org/info/rfc3264>.

1482	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1483	              Jacobson, "RTP: A Transport Protocol for Real-Time
1484	              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
1485	              July 2003, <https://www.rfc-editor.org/info/rfc3550>.

1487	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1488	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1489	              DOI 10.17487/RFC3551, July 2003,
1490	              <https://www.rfc-editor.org/info/rfc3551>.

1492	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1493	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1494	              RFC 3711, DOI 10.17487/RFC3711, March 2004,
1495	              <https://www.rfc-editor.org/info/rfc3711>.

1497	   [RFC4556]  Zhu, L. and B. Tung, "Public Key Cryptography for Initial
1498	              Authentication in Kerberos (PKINIT)", RFC 4556,
1499	              DOI 10.17487/RFC4556, June 2006,
1500	              <https://www.rfc-editor.org/info/rfc4556>.

1502	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1503	              Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
1504	              July 2006, <https://www.rfc-editor.org/info/rfc4566>.

1506	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
1507	              "Extended RTP Profile for Real-time Transport Control
1508	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
1509	              DOI 10.17487/RFC4585, July 2006,
1510	              <https://www.rfc-editor.org/info/rfc4585>.

1512	   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
1513	              Real-time Transport Control Protocol (RTCP)-Based Feedback
1514	              (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February
1515	              2008, <https://www.rfc-editor.org/info/rfc5124>.

1517	   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
1518	              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
1519	              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
1520	              DOI 10.17487/RFC7656, November 2015,
1521	              <https://www.rfc-editor.org/info/rfc7656>.

1523	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
1524	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
1525	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

1527	13.2.  Informative References

1529	   [AVC]      "ITU-T Recommendation H.264 - Advanced video coding for
1530	              generic audiovisual services", 2014,
1531	              <https://www.iso.org/standard/66069.html>.

1533	   [HEVC]     "High efficiency video coding, ITU-T Recommendation
1534	              H.265", 2017, <https://www.iso.org/standard/69668.html>.

1536	   [I-D.ietf-avtcore-rtp-vvc]
1537	              Zhao, S., Wenger, S., Sanchez, Y., and Y. Wang, "RTP
1538	              Payload Format for Versatile Video Coding (VVC)", Work in
1539	              Progress, Internet-Draft, draft-ietf-avtcore-rtp-vvc-07,
1540	              19 January 2021, <http://www.ietf.org/internet-drafts/
1541	              draft-ietf-avtcore-rtp-vvc-07.txt>.

1543	   [MPEG2S]   ISO/IEC, "Information technology - Generic coding of
1544	              moving pictures and associated audio information - Part
1545	              1: Systems, ISO International Standard 13818-1", 2013.

1547	   [RFC6051]  Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
1548	              Flows", RFC 6051, DOI 10.17487/RFC6051, November 2010,
1549	              <https://www.rfc-editor.org/info/rfc6051>.

1551	   [RFC6184]  Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
1552	              Payload Format for H.264 Video", RFC 6184,
1553	              DOI 10.17487/RFC6184, May 2011,
1554	              <https://www.rfc-editor.org/info/rfc6184>.

1556	   [RFC6190]  Wenger, S., Wang, Y.-K., Schierl, T., and A.
1557	              Eleftheriadis, "RTP Payload Format for Scalable Video
1558	              Coding", RFC 6190, DOI 10.17487/RFC6190, May 2011,
1559	              <https://www.rfc-editor.org/info/rfc6190>.

1561	   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
1562	              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014,
1563	              <https://www.rfc-editor.org/info/rfc7201>.

1565	   [RFC7202]  Perkins, C. and M. Westerlund, "Securing the RTP
1566	              Framework: Why RTP Does Not Mandate a Single Media
1567	              Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
1568	              2014, <https://www.rfc-editor.org/info/rfc7202>.

1570	   [RFC7798]  Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M.
1571	              M. Hannuksela, "RTP Payload Format for High Efficiency
1572	              Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798,
1573	              March 2016, <https://www.rfc-editor.org/info/rfc7798>.

1575	   [VVC]      "ISO/IEC FDIS 23090-3 Information technology --- Coded
1576	              representation of immersive media --- Part 3 - Versatile
1577	              video coding", 2020,
1578	              <https://www.iso.org/standard/73022.html>.

1580	Authors' Addresses

1582	   Shuai Zhao
1583	   Tencent
1584	   2747 Park Blvd
1585	   Palo Alto,  94588
1586	   United States of America

1588	   Email: shuai.zhao@ieee.org

1590	   Stephan Wenger
1591	   Tencent
1592	   2747 Park Blvd
1593	   Palo Alto,  94588
1594	   United States of America

1596	   Email: stewe@stewe.org

1598	   Youngkwon Lim
1599	   Samsung Electronics
1600	   6625 Excellence Way
1601	   Plano,  75013
1602	   United States of America

1604	   Email: yklwhite@gmail.com