< draft-ietf-avtcore-rtp-vvc-14.txt   draft-ietf-avtcore-rtp-vvc-15.txt >
avtcore S. Zhao avtcore S. Zhao
Internet-Draft S. Wenger Internet-Draft S. Wenger
Intended status: Standards Track Tencent Intended status: Standards Track Tencent
Expires: 29 August 2022 Y. Sanchez Expires: 24 October 2022 Y. Sanchez
Fraunhofer HHI Fraunhofer HHI
Y. Wang Y. Wang
Bytedance Inc. Bytedance Inc.
M. M Hannuksela M. M Hannuksela
Nokia Technologies Nokia Technologies
25 February 2022 22 April 2022
RTP Payload Format for Versatile Video Coding (VVC) RTP Payload Format for Versatile Video Coding (VVC)
draft-ietf-avtcore-rtp-vvc-14 draft-ietf-avtcore-rtp-vvc-15
Abstract Abstract
This memo describes an RTP payload format for the video coding This memo describes an RTP payload format for the video coding
standard ITU-T Recommendation H.266 and ISO/IEC International standard ITU-T Recommendation H.266 and ISO/IEC International
Standard 23090-3, both also known as Versatile Video Coding (VVC) and Standard 23090-3, both also known as Versatile Video Coding (VVC) and
developed by the Joint Video Experts Team (JVET). The RTP payload developed by the Joint Video Experts Team (JVET). The RTP payload
format allows for packetization of one or more Network Abstraction format allows for packetization of one or more Network Abstraction
Layer (NAL) units in each RTP packet payload as well as fragmentation Layer (NAL) units in each RTP packet payload as well as fragmentation
of a NAL unit into multiple RTP packets. The payload format has wide of a NAL unit into multiple RTP packets. The payload format has wide
skipping to change at page 1, line 44 skipping to change at page 1, line 44
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 29 August 2022. This Internet-Draft will expire on 24 October 2022.
Copyright Notice Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the Copyright (c) 2022 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 41 skipping to change at page 2, line 41
4.2. Payload Header Usage . . . . . . . . . . . . . . . . . . 22 4.2. Payload Header Usage . . . . . . . . . . . . . . . . . . 22
4.3. Payload Structures . . . . . . . . . . . . . . . . . . . 22 4.3. Payload Structures . . . . . . . . . . . . . . . . . . . 22
4.3.1. Single NAL Unit Packets . . . . . . . . . . . . . . . 23 4.3.1. Single NAL Unit Packets . . . . . . . . . . . . . . . 23
4.3.2. Aggregation Packets (APs) . . . . . . . . . . . . . . 23 4.3.2. Aggregation Packets (APs) . . . . . . . . . . . . . . 23
4.3.3. Fragmentation Units . . . . . . . . . . . . . . . . . 27 4.3.3. Fragmentation Units . . . . . . . . . . . . . . . . . 27
4.4. Decoding Order Number . . . . . . . . . . . . . . . . . . 30 4.4. Decoding Order Number . . . . . . . . . . . . . . . . . . 30
5. Packetization Rules . . . . . . . . . . . . . . . . . . . . . 31 5. Packetization Rules . . . . . . . . . . . . . . . . . . . . . 31
6. De-packetization Process . . . . . . . . . . . . . . . . . . 32 6. De-packetization Process . . . . . . . . . . . . . . . . . . 32
7. Payload Format Parameters . . . . . . . . . . . . . . . . . . 34 7. Payload Format Parameters . . . . . . . . . . . . . . . . . . 34
7.1. Media Type Registration . . . . . . . . . . . . . . . . . 34 7.1. Media Type Registration . . . . . . . . . . . . . . . . . 34
7.2. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 45 7.2. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 46
7.2.1. Mapping of Payload Type Parameters to SDP . . . . . . 45 7.2.1. Mapping of Payload Type Parameters to SDP . . . . . . 46
7.2.2. Usage with SDP Offer/Answer Model . . . . . . . . . . 46 7.2.2. Usage with SDP Offer/Answer Model . . . . . . . . . . 48
7.2.3. Usage in Declarative Session Descriptions . . . . . . 55 7.2.3. Usage in Declarative Session Descriptions . . . . . . 57
7.2.4. Considerations for Parameter Sets . . . . . . . . . . 56 7.2.4. Considerations for Parameter Sets . . . . . . . . . . 59
8. Use with Feedback Messages . . . . . . . . . . . . . . . . . 56 8. Use with Feedback Messages . . . . . . . . . . . . . . . . . 59
8.1. Picture Loss Indication (PLI) . . . . . . . . . . . . . . 57 8.1. Picture Loss Indication (PLI) . . . . . . . . . . . . . . 59
8.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 57 8.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 59
9. Security Considerations . . . . . . . . . . . . . . . . . . . 57 9. Security Considerations . . . . . . . . . . . . . . . . . . . 60
10. Congestion Control . . . . . . . . . . . . . . . . . . . . . 59 10. Congestion Control . . . . . . . . . . . . . . . . . . . . . 61
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 60 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 62
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 60 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 62
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 60 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 62
13.1. Normative References . . . . . . . . . . . . . . . . . . 60 13.1. Normative References . . . . . . . . . . . . . . . . . . 62
13.2. Informative References . . . . . . . . . . . . . . . . . 62 13.2. Informative References . . . . . . . . . . . . . . . . . 64
Appendix A. Change History . . . . . . . . . . . . . . . . . . . 63 Appendix A. Change History . . . . . . . . . . . . . . . . . . . 66
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 66
1. Introduction 1. Introduction
The Versatile Video Coding specification, formally published as both The Versatile Video Coding specification was formally published as
ITU-T Recommendation H.266 [VVC] and ISO/IEC International Standard both ITU-T Recommendation H.266 [VVC] and ISO/IEC International
23090-3 [ISO23090-3]. VVC is reported to provide significant coding Standard 23090-3 [ISO23090-3]. VVC is reported to provide
efficiency gains over High Efficiency Video Coding [HEVC], also known significant coding efficiency gains over High Efficiency Video Coding
as H.265, and other earlier video codecs. [HEVC], also known as H.265, and other earlier video codecs.
This memo specifies an RTP payload format for VVC. It shares its This memo specifies an RTP payload format for VVC. It shares its
basic design with the NAL (Network Abstraction Layer) unit based RTP basic design with the NAL (Network Abstraction Layer) unit based RTP
payload formats of AVC Video Coding [RFC6184], Scalable Video Coding payload formats of AVC Video Coding [RFC6184], Scalable Video Coding
(SVC) [RFC6190], High Efficiency Video Coding (HEVC) [RFC7798] and (SVC) [RFC6190], High Efficiency Video Coding (HEVC) [RFC7798] and
their respective predecessors. With respect to design philosophy, their respective predecessors. With respect to design philosophy,
security, congestion control, and overall implementation complexity, security, congestion control, and overall implementation complexity,
it has similar properties to those earlier payload format it has similar properties to those earlier payload format
specifications. This is a conscious choice, as at least RFC 6184 is specifications. This is a conscious choice, as at least RFC 6184 is
widely deployed and generally known in the relevant implementer widely deployed and generally known in the relevant implementer
skipping to change at page 4, line 23 skipping to change at page 4, line 23
Finally, VVC includes temporal, spatial, and SNR scalability as well Finally, VVC includes temporal, spatial, and SNR scalability as well
as multiview coding support. as multiview coding support.
Coding blocks and transform structure Coding blocks and transform structure
Among major coding-tool differences between HEVC and VVC, one of the Among major coding-tool differences between HEVC and VVC, one of the
important improvements is the more flexible coding tree structure in important improvements is the more flexible coding tree structure in
VVC, i.e., multi-type tree. In addition to quadtree, binary and VVC, i.e., multi-type tree. In addition to quadtree, binary and
ternary trees are also supported, which contributes significant ternary trees are also supported, which contributes significant
improvement in coding efficiency. Moreover, the maximum size of improvement in coding efficiency. Moreover, the maximum size of a
coding tree unit (CTU) is increased from 64x64 to 128x128. To coding tree unit (CTU) is increased from 64x64 to 128x128. To
improve the coding efficiency of chroma signal, luma chroma separated improve the coding efficiency of chroma signal, luma chroma separated
trees at CTU level may be employed for intra-slices. The square trees at CTU level may be employed for intra-slices. The square
transforms in HEVC are extended to non-square transforms for transforms in HEVC are extended to non-square transforms for
rectangular blocks resulting from binary and ternary tree splits. rectangular blocks resulting from binary and ternary tree splits.
Besides, VVC supports multiple transform sets (MTS), including DCT-2, Besides, VVC supports multiple transform sets (MTS), including DCT-2,
DST-7, and DCT-8 as well as the non-separable secondary transform. DST-7, and DCT-8 as well as the non-separable secondary transform.
The transforms used in VVC can have different sizes with support for The transforms used in VVC can have different sizes with support for
larger transform sizes. For DCT-2, the transform sizes range from larger transform sizes. For DCT-2, the transform sizes range from
2x2 to 64x64, and for DST-7 and DCT-8, the transform sizes range from 2x2 to 64x64, and for DST-7 and DCT-8, the transform sizes range from
skipping to change at page 5, line 20 skipping to change at page 5, line 20
loop filter (ALF) may be used. As a Wiener filter, ALF reduces loop filter (ALF) may be used. As a Wiener filter, ALF reduces
distortion of decoded pictures. Besides, VVC introduces a new module distortion of decoded pictures. Besides, VVC introduces a new module
called luma mapping with chroma scaling to fully utilize the dynamic called luma mapping with chroma scaling to fully utilize the dynamic
range of signal so that rate-distortion performance of both Standard range of signal so that rate-distortion performance of both Standard
Dynamic Range (SDR) and High Dynamic Range (HDR) content is improved. Dynamic Range (SDR) and High Dynamic Range (HDR) content is improved.
Motion prediction and coding Motion prediction and coding
Compared to HEVC, VVC introduces several improvements in this area. Compared to HEVC, VVC introduces several improvements in this area.
First, there is the adaptive motion vector resolution (AMVR), which First, there is the adaptive motion vector resolution (AMVR), which
can save bit cost for motion vectors by adaptively signalling motion can save bit cost for motion vectors by adaptively signaling motion
vector resolution. Then the affine motion compensation is included vector resolution. Then the affine motion compensation is included
to capture complicated motion like zooming and rotation. Meanwhile, to capture complicated motion like zooming and rotation. Meanwhile,
prediction refinement with the optical flow with affine mode (PROF) prediction refinement with the optical flow with affine mode (PROF)
is further deployed to mimic affine motion at the pixel level. is further deployed to mimic affine motion at the pixel level.
Thirdly the decoder side motion vector refinement (DMVR) is a method Thirdly the decoder side motion vector refinement (DMVR) is a method
to derive MV vector at decoder side based on block matching so that to derive MV vector at decoder side based on block matching so that
fewer bits may be spent on motion vectors. Bi-directional optical fewer bits may be spent on motion vectors. Bi-directional optical
flow (BDOF) is a similar method to PROF. BDOF adds a sample wise flow (BDOF) is a similar method to PROF. BDOF adds a sample wise
offset at 4x4 sub-block level that is derived with equations based on offset at 4x4 sub-block level that is derived with equations based on
gradients of the prediction samples and a motion difference relative gradients of the prediction samples and a motion difference relative
skipping to change at page 7, line 6 skipping to change at page 7, line 6
The decoding capability information includes parameters that stay The decoding capability information includes parameters that stay
constant for the lifetime of a VVC bitstream, which in IETF terms can constant for the lifetime of a VVC bitstream, which in IETF terms can
translate to a session. Such information includes profile, level, translate to a session. Such information includes profile, level,
and sub-profile information to determine a maximum capability interop and sub-profile information to determine a maximum capability interop
point that is guaranteed to be never exceeded, even if splicing of point that is guaranteed to be never exceeded, even if splicing of
video sequences occurs within a session. It further includes video sequences occurs within a session. It further includes
constraint fields (most of which are flags), which can optionally be constraint fields (most of which are flags), which can optionally be
set to indicate that the video bitstream will be constrained in the set to indicate that the video bitstream will be constrained in the
use of certain features as indicated by the values of those fields. use of certain features as indicated by the values of those fields.
With this, a bitstream can be labelled as not using certain tools, With this, a bitstream can be labeled as not using certain tools,
which allows among other things for resource allocation in a decoder which allows among other things for resource allocation in a decoder
implementation. implementation.
Video parameter set Video parameter set
The video parameter set (VPS) pertains to one or more coded video The video parameter set (VPS) pertains to one or more coded video
sequences (CVSs) of multiple layers covering the same range of access sequences (CVSs) of multiple layers covering the same range of access
units, and includes, among other information, decoding dependency units, and includes, among other information, decoding dependency
expressed as information for reference picture list construction of expressed as information for reference picture list construction of
enhancement layers. The VPS provides a "big picture" of a scalable enhancement layers. The VPS provides a "big picture" of a scalable
skipping to change at page 7, line 29 skipping to change at page 7, line 29
high-level properties of the bitstream that can be used as the basis high-level properties of the bitstream that can be used as the basis
for session negotiation and content selection, etc. One VPS may be for session negotiation and content selection, etc. One VPS may be
referenced by one or more sequence parameter sets. referenced by one or more sequence parameter sets.
Sequence parameter set Sequence parameter set
The sequence parameter set (SPS) contains syntax elements pertaining The sequence parameter set (SPS) contains syntax elements pertaining
to a coded layer video sequence (CLVS), which is a group of pictures to a coded layer video sequence (CLVS), which is a group of pictures
belonging to the same layer, starting with a random access point, and belonging to the same layer, starting with a random access point, and
followed by pictures that may depend on each other, until the next followed by pictures that may depend on each other, until the next
random access point picture. In MPGEG-2, the equivalent of a CVS was random access point picture. In MPEG-2, the equivalent of a CVS was
a group of pictures (GOP), which normally started with an I frame and a group of pictures (GOP), which normally started with an I frame and
was followed by P and B frames. While more complex in its options of was followed by P and B frames. While more complex in its options of
random access points, VVC retains this basic concept. One remarkable random access points, VVC retains this basic concept. One remarkable
difference of VVC is that a CLVS may start with a Gradual Decoding difference of VVC is that a CLVS may start with a Gradual Decoding
Refresh (GDR) picture, without requiring presence of traditional Refresh (GDR) picture, without requiring presence of traditional
random access points in the bitstream, such as instantaneous decoding random access points in the bitstream, such as instantaneous decoding
refresh (IDR) or clean random access (CRA) pictures. In many TV-like refresh (IDR) or clean random access (CRA) pictures. In many TV-like
applications, a CVS contains a few hundred milliseconds to a few applications, a CVS contains a few hundred milliseconds to a few
seconds of video. In video conferencing (without switching MCUs seconds of video. In video conferencing (without switching MCUs
involved), a CVS can be as long in duration as the whole session. involved), a CVS can be as long in duration as the whole session.
skipping to change at page 8, line 45 skipping to change at page 8, line 45
according to ITU-T Rec. T.35, that does not carry a semantics. It is according to ITU-T Rec. T.35, that does not carry a semantics. It is
carried in the profile_tier_level structure and hence (potentially) carried in the profile_tier_level structure and hence (potentially)
present in the DCI, VPS, and SPS. External registration bodies can present in the DCI, VPS, and SPS. External registration bodies can
register a T.35 codepoint with ITU-T registration authorities and register a T.35 codepoint with ITU-T registration authorities and
associate with their registration a description of bitstream associate with their registration a description of bitstream
restrictions beyond the profiles defined by ITU-T and ISO/IEC. This restrictions beyond the profiles defined by ITU-T and ISO/IEC. This
would allow encoder manufacturers to label the bitstreams generated would allow encoder manufacturers to label the bitstreams generated
by their encoder as complying with such sub-profile. It is expected by their encoder as complying with such sub-profile. It is expected
that upstream standardization organizations (such as: DVB and ATSC), that upstream standardization organizations (such as: DVB and ATSC),
as well as walled-garden video services will take advantage of this as well as walled-garden video services will take advantage of this
labelling system. In contrast to "normal" profiles, it is expected labeling system. In contrast to "normal" profiles, it is expected
that sub-profiles may indicate encoder choices traditionally left that sub-profiles may indicate encoder choices traditionally left
open in the (decoder-centric) video coding specs, such as GOP open in the (decoder-centric) video coding specs, such as GOP
structures, minimum/maximum QP values, and the mandatory use of structures, minimum/maximum QP values, and the mandatory use of
certain tools or SEI messages. certain tools or SEI messages.
General constraint fields General constraint fields
The profile_tier_level structure carries a considerable number of The profile_tier_level structure carries a considerable number of
constraint fields (most of which are flags), which an encoder can use constraint fields (most of which are flags), which an encoder can use
to indicate to a decoder that it will not use a certain tool or to indicate to a decoder that it will not use a certain tool or
technology. They were included in reaction to a perceived market technology. They were included in reaction to a perceived market
need for labelling a bitstream as not exercising a certain tool that need for labeling a bitstream as not exercising a certain tool that
has become commercially unviable. has become commercially unviable.
Temporal scalability support Temporal scalability support
VVC includes support of temporal scalability, by inclusion of the VVC includes support of temporal scalability, by inclusion of the
signalling of TemporalId in the NAL unit header, the restriction that signaling of TemporalId in the NAL unit header, the restriction that
pictures of a particular temporal sublayer cannot be used for inter pictures of a particular temporal sublayer cannot be used for inter
prediction reference by pictures of a lower temporal sublayer, the prediction reference by pictures of a lower temporal sublayer, the
sub-bitstream extraction process, and the requirement that each sub- sub-bitstream extraction process, and the requirement that each sub-
bitstream extraction output be a conforming bitstream. Media-Aware bitstream extraction output be a conforming bitstream. Media-Aware
Network Elements (MANEs) can utilize the TemporalId in the NAL unit Network Elements (MANEs) can utilize the TemporalId in the NAL unit
header for stream adaptation purposes based on temporal scalability. header for stream adaptation purposes based on temporal scalability.
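Informative sketch (not part of either draft revision): the temporal-scalability paragraph above says a MANE can read TemporalId from the NAL unit header and adapt the stream. The Python below illustrates this under stated assumptions. It assumes the two-byte VVC NAL unit header layout (forbidden_zero_bit, nuh_reserved_zero_bit, 6-bit nuh_layer_id, 5-bit nal_unit_type, 3-bit nuh_temporal_id_plus1) and that the caller passes a raw NAL unit rather than an aggregation packet or fragmentation unit. The names NalHeader, parse_nal_header, and keep_for_target are hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    layer_id: int
    nal_unit_type: int
    temporal_id: int


def parse_nal_header(nal: bytes) -> NalHeader:
    """Decode the assumed two-byte VVC NAL unit header."""
    if len(nal) < 2:
        raise ValueError("a VVC NAL unit header is two bytes long")
    b0, b1 = nal[0], nal[1]
    if b0 & 0x80:
        raise ValueError("forbidden_zero_bit is set")
    tid_plus1 = b1 & 0x07            # nuh_temporal_id_plus1 (3 bits)
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(
        layer_id=b0 & 0x3F,          # nuh_layer_id (6 bits)
        nal_unit_type=b1 >> 3,       # nal_unit_type (5 bits)
        temporal_id=tid_plus1 - 1,   # TemporalId
    )


def keep_for_target(nal: bytes, max_temporal_id: int) -> bool:
    """True if the NAL unit belongs to a sublayer at or below the target."""
    return parse_nal_header(nal).temporal_id <= max_temporal_id
```

Because pictures of a given temporal sublayer are never referenced by pictures of a lower sublayer, dropping every NAL unit for which keep_for_target returns False should leave the remaining sublayers decodable. A real MANE would also have to handle aggregation packets, fragmentation units, and the RTCP rewriting mentioned elsewhere in the memo.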
Reference picture resampling (RPR) Reference picture resampling (RPR)
In AVC and HEVC, the spatial resolution of pictures cannot change In AVC and HEVC, the spatial resolution of pictures cannot change
skipping to change at page 9, line 50 skipping to change at page 9, line 50
video region or some region of interest is needed. video region or some region of interest is needed.
Spatial, SNR, and multiview scalability Spatial, SNR, and multiview scalability
VVC includes support for spatial, SNR, and multiview scalability. VVC includes support for spatial, SNR, and multiview scalability.
Scalable video coding is widely considered to have technical benefits Scalable video coding is widely considered to have technical benefits
and enrich services for various video applications. Until recently, and enrich services for various video applications. Until recently,
however, the functionality has not been included in the first version however, the functionality has not been included in the first version
of specifications of the video codecs. In VVC, however, all those of specifications of the video codecs. In VVC, however, all those
forms of scalability are supported in the first version of VVC forms of scalability are supported in the first version of VVC
natively through the signalling of the nuh_layer_id in the NAL unit natively through the signaling of the nuh_layer_id in the NAL unit
header, the VPS which associates layers with given nuh_layer_id to header, the VPS which associates layers with given nuh_layer_id to
each other, reference picture selection, reference picture resampling each other, reference picture selection, reference picture resampling
for spatial scalability, and a number of other mechanisms not for spatial scalability, and a number of other mechanisms not
relevant for this memo. relevant for this memo.
Spatial scalability Spatial scalability
With the existence of Reference Picture Resampling (RPR), the With the existence of Reference Picture Resampling (RPR), the
additional burden for scalability support is just a additional burden for scalability support is just a
modification of the high-level syntax (HLS). The inter-layer modification of the high-level syntax (HLS). The inter-layer
skipping to change at page 11, line 21 skipping to change at page 11, line 21
subpictures as a feature, which provides the same functionality as subpictures as a feature, which provides the same functionality as
HEVC motion-constrained tile sets (MCTSs) but designed differently to HEVC motion-constrained tile sets (MCTSs) but designed differently to
have better coding efficiency and to be friendlier for usage in have better coding efficiency and to be friendlier for usage in
application systems. More details of these differences are described application systems. More details of these differences are described
below. below.
Tiles and WPP Tiles and WPP
Same as in HEVC, a picture can be split into tile rows and tile Same as in HEVC, a picture can be split into tile rows and tile
columns in VVC, in-picture prediction across tile boundaries is columns in VVC, in-picture prediction across tile boundaries is
disallowed, etc. However, the syntax for signalling of tile disallowed, etc. However, the syntax for signaling of tile
partitioning has been simplified, by using a unified syntax design partitioning has been simplified, by using a unified syntax design
for both the uniform and the non-uniform mode. In addition, for both the uniform and the non-uniform mode. In addition,
signalling of entry point offsets for tiles in the slice header is signaling of entry point offsets for tiles in the slice header is
optional in VVC while it is mandatory in HEVC. The WPP design in VVC optional in VVC while it is mandatory in HEVC. The WPP design in VVC
has two differences compared to HEVC: i) The CTU row delay is reduced has two differences compared to HEVC: i) The CTU row delay is reduced
from two CTUs to one CTU; ii) signalling of entry point offsets for from two CTUs to one CTU; ii) signaling of entry point offsets for
WPP in the slice header is optional in VVC while it is mandatory in WPP in the slice header is optional in VVC while it is mandatory in
HEVC. HEVC.
Slices Slices
In VVC, the conventional slices based on CTUs (as in HEVC) or In VVC, the conventional slices based on CTUs (as in HEVC) or
macroblocks (as in AVC) have been removed. The main reasoning behind macroblocks (as in AVC) have been removed. The main reasoning behind
this architectural change is as follows. The advances in video this architectural change is as follows. The advances in video
coding since 2003 (the publication year of AVC v1) have been such coding since 2003 (the publication year of AVC v1) have been such
that slice-based error concealment has become practically impossible, that slice-based error concealment has become practically impossible,
skipping to change at page 18, line 47 skipping to change at page 18, line 47
Specification. Specification.
3.1.2. Definitions Specific to This Memo 3.1.2. Definitions Specific to This Memo
Media-Aware Network Element (MANE): A network element, such as a Media-Aware Network Element (MANE): A network element, such as a
middlebox, selective forwarding unit, or application-layer gateway middlebox, selective forwarding unit, or application-layer gateway
that is capable of parsing certain aspects of the RTP payload headers that is capable of parsing certain aspects of the RTP payload headers
or the RTP payload and reacting to their contents. or the RTP payload and reacting to their contents.
Informative note: The concept of a MANE goes beyond normal routers Informative note: The concept of a MANE goes beyond normal routers
or gateways in that a MANE has to be aware of the signalling or gateways in that a MANE has to be aware of the signaling (e.g.,
(e.g., to learn about the payload type mappings of the media to learn about the payload type mappings of the media streams),
streams), and in that it has to be trusted when working with and in that it has to be trusted when working with Secure RTP
Secure RTP (SRTP). The advantage of using MANEs is that they (SRTP). The advantage of using MANEs is that they allow packets
allow packets to be dropped according to the needs of the media to be dropped according to the needs of the media coding. For
coding. For example, if a MANE has to drop packets due to example, if a MANE has to drop packets due to congestion on a
congestion on a certain link, it can identify and remove those certain link, it can identify and remove those packets whose
packets whose elimination produces the least adverse effect on the elimination produces the least adverse effect on the user
user experience. After dropping packets, MANEs must rewrite RTCP experience. After dropping packets, MANEs must rewrite RTCP
packets to match the changes to the RTP stream, as specified in packets to match the changes to the RTP stream, as specified in
Section 7 of [RFC3550]. Section 7 of [RFC3550].
NAL unit decoding order: A NAL unit order that conforms to the NAL unit decoding order: A NAL unit order that conforms to the
constraints on NAL unit order given in Section 7.4.2.4 in [VVC], constraints on NAL unit order given in Section 7.4.2.4 in [VVC],
follow the Order of NAL units in the bitstream. follow the Order of NAL units in the bitstream.
RTP stream (See [RFC7656]): Within the scope of this memo, one RTP RTP stream (See [RFC7656]): Within the scope of this memo, one RTP
stream is utilized to transport a VVC bitstream, which may contain stream is utilized to transport a VVC bitstream, which may contain
one or more layers, and each layer may contain one or more temporal one or more layers, and each layer may contain one or more temporal
skipping to change at page 20, line 20 skipping to change at page 20, line 20
NAL Network Abstraction Layer NAL Network Abstraction Layer
NALU Network Abstraction Layer Unit NALU Network Abstraction Layer Unit
OLS Output Layer Set OLS Output Layer Set
PLI Picture Loss Indication PLI Picture Loss Indication
PPS Picture Parameter Set PPS Picture Parameter Set
RPS Reference Picture Set
RPSI Reference Picture Selection Indication RPSI Reference Picture Selection Indication
SEI Supplemental Enhancement Information SEI Supplemental Enhancement Information
SLI Slice Loss Indication SLI Slice Loss Indication
SPS Sequence Parameter Set SPS Sequence Parameter Set
VCL Video Coding Layer VCL Video Coding Layer
skipping to change at page 31, line 51 skipping to change at page 31, line 51
The following packetization rules apply: The following packetization rules apply:
* If sprop-max-don-diff is greater than 0, the transmission order of * If sprop-max-don-diff is greater than 0, the transmission order of
NAL units carried in the RTP stream MAY be different than the NAL NAL units carried in the RTP stream MAY be different than the NAL
unit decoding order. Otherwise (sprop-max-don-diff is equal to unit decoding order. Otherwise (sprop-max-don-diff is equal to
0), the transmission order of NAL units carried in the RTP stream 0), the transmission order of NAL units carried in the RTP stream
MUST be the same as the NAL unit decoding order. MUST be the same as the NAL unit decoding order.
* A NAL unit of a small size SHOULD be encapsulated in an * A NAL unit of a small size SHOULD be encapsulated in an
aggregation packet together one or more other NAL units in order aggregation packet together with one or more other NAL units in
to avoid the unnecessary packetization overhead for small NAL order to avoid the unnecessary packetization overhead for small
units. For example, non-VCL NAL units such as access unit NAL units. For example, non-VCL NAL units such as access unit
delimiters, parameter sets, or SEI NAL units are typically small delimiters, parameter sets, or SEI NAL units are typically small
and can often be aggregated with VCL NAL units without violating and can often be aggregated with VCL NAL units without violating
MTU size constraints. MTU size constraints.
* Each non-VCL NAL unit SHOULD, when possible from an MTU size match * Each non-VCL NAL unit SHOULD, when possible from an MTU size match
viewpoint, be encapsulated in an aggregation packet together with viewpoint, be encapsulated in an aggregation packet together with
its associated VCL NAL unit, as typically a non-VCL NAL unit would its associated VCL NAL unit, as typically a non-VCL NAL unit would
be meaningless without the associated VCL NAL unit being be meaningless without the associated VCL NAL unit being
available. available.
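The aggregation guidance above can be sketched as a greedy grouping of NAL units under a payload size budget. This is only an illustration, not the draft's algorithm: the function name, the 2-byte aggregation payload header, and the 2-byte per-unit size field are assumptions made for this sketch, and the additional fields needed when sprop-max-don-diff is greater than 0 are ignored. NAL units stay in their original order.

```python
# Assumed overheads for this sketch (not taken from the text above).
AP_HEADER_BYTES = 2      # payload header of an aggregation packet
AU_SIZE_FIELD_BYTES = 2  # size field in front of each aggregated NAL unit


def group_for_aggregation(nalu_sizes, max_payload):
    """Group NAL unit indices so that each group fits in max_payload bytes.

    A group with a single index would be sent as a single NAL unit packet
    (or fragmented if it is larger than max_payload); a group with several
    indices would be sent as one aggregation packet.
    """
    groups = []
    current = []
    used = AP_HEADER_BYTES
    for index, size in enumerate(nalu_sizes):
        cost = AU_SIZE_FIELD_BYTES + size
        # Close the current group when the next NAL unit would not fit.
        if current and used + cost > max_payload:
            groups.append(current)
            current = []
            used = AP_HEADER_BYTES
        current.append(index)
        used += cost
    if current:
        groups.append(current)
    return groups
```

For example, small parameter sets followed by a slice that fits the budget end up in one group, while an oversized slice is left alone in its own group for fragmentation.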
skipping to change at page 32, line 34 skipping to change at page 32, line 34
The de-packetization process is implementation dependent. Therefore, The de-packetization process is implementation dependent. Therefore,
the following description should be seen as an example of a suitable the following description should be seen as an example of a suitable
implementation. Other schemes may be used as well, as long as the implementation. Other schemes may be used as well, as long as the
output for the same input is the same as the process described below. output for the same input is the same as the process described below.
The output is the same when the set of output NAL units and their The output is the same when the set of output NAL units and their
order are both identical. Optimizations relative to the described order are both identical. Optimizations relative to the described
algorithms are possible. algorithms are possible.
All normal RTP mechanisms related to buffer management apply. In All normal RTP mechanisms related to buffer management apply. In
particular, duplicated or outdated RTP packets (as indicated by the particular, duplicated or outdated RTP packets (as indicated by the
RTP sequences number and the RTP timestamp) are removed. To RTP sequence number and the RTP timestamp) are removed. To determine
determine the exact time for decoding, factors such as a possible the exact time for decoding, factors such as a possible intentional
intentional delay to allow for proper inter-stream synchronization delay to allow for proper inter-stream synchronization MUST be
MUST be factored in. factored in.
NAL units with NAL unit type values in the range of 0 to 27, NAL units with NAL unit type values in the range of 0 to 27,
inclusive, may be passed to the decoder. NAL-unit-like structures inclusive, may be passed to the decoder. NAL-unit-like structures
with NAL unit type values in the range of 28 to 31, inclusive, MUST with NAL unit type values in the range of 28 to 31, inclusive, MUST
NOT be passed to the decoder. NOT be passed to the decoder.
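A minimal sketch of the type check described above. It assumes the two-byte VVC NAL unit header layout, in which the NAL unit type occupies the upper five bits of the second byte; the function names are hypothetical.

```python
def nal_unit_type(nalu: bytes) -> int:
    """Return the 5-bit NAL unit type from a two-byte NAL unit header."""
    if len(nalu) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    # Assumed layout: second byte = nal_unit_type (5 bits) followed by
    # nuh_temporal_id_plus1 (3 bits).
    return (nalu[1] >> 3) & 0x1F


def may_pass_to_decoder(nalu: bytes) -> bool:
    """Types 0..27 may go to the decoder; types 28..31 must not."""
    return 0 <= nal_unit_type(nalu) <= 27
```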
The receiver includes a receiver buffer, which is used to compensate The receiver includes a receiver buffer, which is used to compensate
for transmission delay jitter within individual RTP stream, and to for transmission delay jitter within individual RTP stream, and to
reorder NAL units from transmission order to the NAL unit decoding reorder NAL units from transmission order to the NAL unit decoding
order. In this section, the receiver operation is described under order. In this section, the receiver operation is described under
skipping to change at page 34, line 38 skipping to change at page 34, line 38
provided for applications that use SDP. provided for applications that use SDP.
7.1. Media Type Registration 7.1. Media Type Registration
The receiver MUST ignore any parameter unspecified in this memo. The receiver MUST ignore any parameter unspecified in this memo.
Type name: video Type name: video
Subtype name: H266 Subtype name: H266
Required parameters: none Required parameters: N/A
Optional parameters: Optional parameters:
profile-id, tier-flag, sub-profile-id, interop-constraints, and profile-id, tier-flag, sub-profile-id, interop-constraints, and
level-id: level-id:
These parameters indicate the profile, tier, default level, These parameters indicate the profile, tier, default level,
sub-profile, and some constraints of the bitstream carried by sub-profile, and some constraints of the bitstream carried by
the RTP stream, or a specific set of the profile, tier, default the RTP stream, or a specific set of the profile, tier, default
level, sub-profile and some constraints the receiver supports. level, sub-profile and some constraints the receiver supports.
skipping to change at page 35, line 16 skipping to change at page 35, line 16
the bitstream or that the receiver supports, as well as some the bitstream or that the receiver supports, as well as some
additional constraints are indicated collectively by profile- additional constraints are indicated collectively by profile-
id, sub-profile-id, and interop-constraints. id, sub-profile-id, and interop-constraints.
Informative note: There are 128 values of profile-id. The Informative note: There are 128 values of profile-id. The
subset of coding tools identified by the profile-id can be subset of coding tools identified by the profile-id can be
further constrained with up to 255 instances of sub-profile- further constrained with up to 255 instances of sub-profile-
id. In addition, 68 bits included in interop-constraints, id. In addition, 68 bits included in interop-constraints,
which can be extended up to 324 bits provide means to which can be extended up to 324 bits provide means to
further restrict tools from existing profiles. To be able further restrict tools from existing profiles. To be able
to support this fine-granular signalling of coding tool to support this fine-granular signaling of coding tool
subsets with profile-id, sub-profile-id and interop- subsets with profile-id, sub-profile-id and interop-
constraints, it would be safe to require symmetric use of constraints, it would be safe to require symmetric use of
these parameters in SDP offer/answer unless recv-ols-id is these parameters in SDP offer/answer unless recv-ols-id is
included in the SDP answer for choosing one of the layers included in the SDP answer for choosing one of the layers
offered. offered.
The tier is indicated by tier-flag. The default level is The tier is indicated by tier-flag. The default level is
indicated by level-id. The tier and the default level specify indicated by level-id. The tier and the default level specify
the limits on values of syntax elements or arithmetic the limits on values of syntax elements or arithmetic
combinations of values of syntax elements that are followed combinations of values of syntax elements that are followed
when generating the bitstream or that the receiver supports. when generating the bitstream or that the receiver supports.
In SDP offer/answer, when the SDP answer does not include the In SDP offer/answer, when the SDP answer does not include the
recv-ols-id parameter that is less than the sprop-ols-id recv-ols-id parameter that is less than the sprop-ols-id
parameter in the SDP offer, the following applies: parameter in the SDP offer, the following applies:
o The tier-flag, profile-id, sub-profile-id, and interop- o The tier-flag, profile-id, sub-profile-id, and interop-
constraints parameters MUST be used symmetrically, i.e., the constraints parameters MUST be used symmetrically, i.e., the
value of each of these parameters in the offer MUST be the value of each of these parameters in the offer MUST be the
same as that in the answer, either explicitly signalled or same as that in the answer, either explicitly signaled or
implicitly inferred. implicitly inferred.
o The level-id parameter is changeable as long as the highest o The level-id parameter is changeable as long as the highest
level indicated by the answer is either equal to or lower level indicated by the answer is either equal to or lower
than that in the offer. Note that a highest level higher than that in the offer. Note that a highest level higher
than level-id in the offer for receiving can be included as than level-id in the offer for receiving can be included as
max-recv-level-id. max-recv-level-id.
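The two rules above can be checked mechanically. The following is a sketch under stated assumptions: both dictionaries already hold the values after any implicit inference has been applied (the inference defaults are not shown here), the answer does not select a lower output layer set via recv-ols-id, and only level-id is compared (max-recv-level-id is ignored). The function name is hypothetical.

```python
SYMMETRIC_PARAMS = ("tier-flag", "profile-id", "sub-profile-id",
                    "interop-constraints")


def check_answer(offer: dict, answer: dict) -> list:
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    # Rule 1: these parameters must have the same value on both sides.
    for name in SYMMETRIC_PARAMS:
        if offer.get(name) != answer.get(name):
            problems.append(f"{name} differs between offer and answer")
    # Rule 2: the level in the answer must not exceed the level offered.
    if "level-id" in offer and "level-id" in answer:
        if int(answer["level-id"]) > int(offer["level-id"]):
            problems.append("level-id in answer is higher than in offer")
    return problems
```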
In SDP offer/answer, when the SDP answer does include the recv- In SDP offer/answer, when the SDP answer does include the recv-
ols-id parameter that is less than the sprop-ols-id parameter ols-id parameter that is less than the sprop-ols-id parameter
skipping to change at page 37, line 18 skipping to change at page 37, line 18
structures in all DCI NAL units in the bitstream has the same structures in all DCI NAL units in the bitstream has the same
values respectively for those profile_tier_level( ) syntax values respectively for those profile_tier_level( ) syntax
elements. elements.
[VVC] allows for multiple profile_tier_level( ) structures in a [VVC] allows for multiple profile_tier_level( ) structures in a
DCI NAL unit, which may contain different values for the syntax DCI NAL unit, which may contain different values for the syntax
elements used to derive the values of profile-id, tier-flag, elements used to derive the values of profile-id, tier-flag,
level-id, sub-profile-id, or interop-constraints in the level-id, sub-profile-id, or interop-constraints in the
different entries. However, herein defined is only a single different entries. However, herein defined is only a single
profile-id, tier-flag, level-id, sub-profile-id, or interop- profile-id, tier-flag, level-id, sub-profile-id, or interop-
constraints. When signalling these parameters and a DCI NAL constraints. When signaling these parameters and a DCI NAL
unit is present with multiple profile_tier_level( ) structures, unit is present with multiple profile_tier_level( ) structures,
these values SHOULD be the same as the first profile_tier_level these values SHOULD be the same as the first profile_tier_level
structure in the DCI, unless the sender has ensured that the structure in the DCI, unless the sender has ensured that the
receiver can decode the bitstream when a different value is receiver can decode the bitstream when a different value is
chosen. chosen.
tier-flag, level-id: tier-flag, level-id:
The value of tier-flag MUST be in the range of 0 to 1, The value of tier-flag MUST be in the range of 0 to 1,
inclusive. The value of level-id MUST be in the range of 0 to inclusive. The value of level-id MUST be in the range of 0 to
skipping to change at page 43, line 37 skipping to change at page 43, line 37
decoding video at a higher rate than is required by the highest decoding video at a higher rate than is required by the highest
level. level.
Informative note: When the OPTIONAL media type parameters Informative note: When the OPTIONAL media type parameters
are used to signal the properties of a bitstream, and max- are used to signal the properties of a bitstream, and max-
lsr is not present, the values of tier-flag, profile-id, lsr is not present, the values of tier-flag, profile-id,
sub-profile-id interop-constraints, and level-id must always sub-profile-id interop-constraints, and level-id must always
be such that the bitstream complies fully with the specified be such that the bitstream complies fully with the specified
profile, tier, and level. profile, tier, and level.
When max-lsr is signalled, the receiver MUST be able to decode When max-lsr is signaled, the receiver MUST be able to decode
bitstreams that conform to the highest level, with the bitstreams that conform to the highest level, with the
exception that the MaxLumaSr value in Table 136 of [VVC] for exception that the MaxLumaSr value in Table 136 of [VVC] for
the highest level is replaced with the value of max-lsr. the highest level is replaced with the value of max-lsr.
Senders MAY use this knowledge to send pictures of a given size Senders MAY use this knowledge to send pictures of a given size
at a higher picture rate than is indicated in the highest at a higher picture rate than is indicated in the highest
level. level.
When not present, the value of max-lsr is inferred to be equal When not present, the value of max-lsr is inferred to be equal
to the value of MaxLumaSr given in Table 136 of [VVC] for the to the value of MaxLumaSr given in Table 136 of [VVC] for the
highest level. highest level.
skipping to change at page 44, line 27 skipping to change at page 44, line 27
constraint on maximum picture rate for all resolutions. constraint on maximum picture rate for all resolutions.
Informative note: The max-fps parameter is semantically Informative note: The max-fps parameter is semantically
different from max-lsr in that max-fps is used to signal a different from max-lsr in that max-fps is used to signal a
constraint, lowering the maximum picture rate from what is constraint, lowering the maximum picture rate from what is
implied by other parameters. implied by other parameters.
The encoder MUST use a picture rate equal to or less than this The encoder MUST use a picture rate equal to or less than this
value. In cases where the max-fps parameter is absent, the value. In cases where the max-fps parameter is absent, the
encoder is free to choose any picture rate according to the encoder is free to choose any picture rate according to the
highest level and any signalled optional parameters. highest level and any signaled optional parameters.
The value of max-fps MUST be smaller than or equal to the full The value of max-fps MUST be smaller than or equal to the full
picture rate that is implied by the highest level and, when picture rate that is implied by the highest level and, when
present, max-lsr. present, max-lsr.
sprop-max-don-diff: sprop-max-don-diff:
If there is no NAL unit naluA that is followed in transmission If there is no NAL unit naluA that is followed in transmission
order by any NAL unit preceding naluA in decoding order (i.e., order by any NAL unit preceding naluA in decoding order (i.e.,
the transmission order of the NAL units is the same as the the transmission order of the NAL units is the same as the
skipping to change at page 45, line 44 skipping to change at page 45, line 44
parameter is smaller than or equal to this parameter. parameter is smaller than or equal to this parameter.
When not present, the value of depack-buf-cap is inferred to be When not present, the value of depack-buf-cap is inferred to be
equal to 4294967295. The value of depack-buf-cap MUST be an equal to 4294967295. The value of depack-buf-cap MUST be an
integer in the range of 1 to 4294967295, inclusive. integer in the range of 1 to 4294967295, inclusive.
Informative note: depack-buf-cap indicates the maximum Informative note: depack-buf-cap indicates the maximum
possible size of the de-packetization buffer of the receiver possible size of the de-packetization buffer of the receiver
only, without allowing for network jitter. only, without allowing for network jitter.
Encoding considerations:
This type is only defined for transfer via RTP (RFC 3550).
Security considerations:
See Section 9 of RFC XXXX.
Interoperability considerations: N/A
Published specification:
Please refer to RFC XXXX and its Section 13.
Applications that use this media type: N/A
Fragment identifier considerations: N/A
Additional information: N/A
Person & email address to contact for further information:
Stephan Wenger (stewe@stewe.org)
Intended usage: COMMON
Restrictions on usage: N/A
Author: See Authors' Addresses section of RFC XXXX.
Change controller:
IETF Audio/Video Transport Core Maintenance Working Group
delegated from the IESG.
7.2. SDP Parameters 7.2. SDP Parameters
The receiver MUST ignore any parameter unspecified in this memo. The receiver MUST ignore any parameter unspecified in this memo.
7.2.1. Mapping of Payload Type Parameters to SDP 7.2.1. Mapping of Payload Type Parameters to SDP
The media type video/H266 string is mapped to fields in the Session The media type video/H266 string is mapped to fields in the Session
Description Protocol (SDP) [RFC8866] as follows: Description Protocol (SDP) [RFC8866] as follows:
* The media name in the "m=" line of SDP MUST be video. * The media name in the "m=" line of SDP MUST be video.
skipping to change at page 46, line 17 skipping to change at page 46, line 50
* The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the * The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the
media subtype). media subtype).
* The clock rate in the "a=rtpmap" line MUST be 90000. * The clock rate in the "a=rtpmap" line MUST be 90000.
* The OPTIONAL parameters profile-id, tier-flag, sub-profile-id, * The OPTIONAL parameters profile-id, tier-flag, sub-profile-id,
interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id, interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id,
recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max- recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max-
fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf- fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf-
cap, when present, MUST be included in the "a=fmtp" line of SDP. cap, when present, MUST be included in the "a=fmtp" line of SDP.
This parameter is expressed as a media type string, in the form of The fmtp line is expressed as a media type string, in the form of
a semicolon-separated list of parameter=value pairs. a semicolon-separated list of parameter=value pairs.
* The OPTIONAL parameter sprop-vps, sprop-sps, sprop-pps, sprop-sei, * The OPTIONAL parameter sprop-vps, sprop-sps, sprop-pps, sprop-sei,
and sprop-dci, when present, MUST be included in the "a=fmtp" line and sprop-dci, when present, MUST be included in the "a=fmtp" line
of SDP or conveyed using the "fmtp" source attribute as specified of SDP or conveyed using the "fmtp" source attribute as specified
in Section 6.3 of [RFC5576]. For a particular media format (i.e., in Section 6.3 of [RFC5576]. For a particular media format (i.e.,
RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or
sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP
and conveyed using the "fmtp" source attribute. When included in and conveyed using the "fmtp" source attribute. When included in
the "a=fmtp" line of SDP, those parameters are expressed as a the "a=fmtp" line of SDP, those parameters are expressed as a
media type string, in the form of a semicolon-separated list of media type string, in the form of a semicolon-separated list of
parameter=value pairs. When conveyed in the "a=fmtp" line of SDP parameter=value pairs. When conveyed in the "a=fmtp" line of SDP
for a particular payload type, the parameters sprop-vps, sprop- for a particular payload type, the parameters sprop-vps, sprop-
sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each
SSRC with the payload type. When conveyed using the "fmtp" source SSRC with the payload type. When conveyed using the "fmtp" source
attribute, these parameters are only associated with the given attribute, these parameters are only associated with the given
source and payload type as parts of the "fmtp" source attribute. source and payload type as parts of the "fmtp" source attribute.
An example of media representation in SDP is as follows: Informative note: Conveyance of sprop-vps, sprop-sps, and
sprop-pps using the "fmtp" source attribute allows for out-of-
band transport of parameter sets in topologies like Topo-Video-
switch-MCU as specified in [RFC7667].
A general usage of media representation in SDP is as follows:
m=video 49170 RTP/AVP 98 m=video 49170 RTP/AVP 98
a=rtpmap:98 H266/90000 a=rtpmap:98 H266/90000
a=fmtp:98 profile-id=1; a=fmtp:98 profile-id=1;
sprop-vps=<video parameter sets data>; sprop-vps=<video parameter sets data>;
sprop-sps=<sequence parameter set data>; sprop-sps=<sequence parameter set data>;
sprop-pps=<picture parameter set data>; sprop-pps=<picture parameter set data>;
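A receiver might split such an "a=fmtp" line into parameter=value pairs as sketched below. The function name is hypothetical, the input is assumed to be one logical line (wrapped lines already joined), and unknown parameter names are dropped, in line with the rule that unspecified parameters are ignored.

```python
KNOWN_PARAMS = {
    "profile-id", "tier-flag", "sub-profile-id", "interop-constraints",
    "level-id", "sprop-sublayer-id", "sprop-ols-id", "recv-sublayer-id",
    "recv-ols-id", "max-recv-level-id", "max-lsr", "max-fps",
    "sprop-max-don-diff", "sprop-depack-buf-bytes", "depack-buf-cap",
    "sprop-vps", "sprop-sps", "sprop-pps", "sprop-sei", "sprop-dci",
}


def parse_fmtp(line: str):
    """Return (payload_type, params) for an 'a=fmtp:<pt> ...' line."""
    prefix, _, rest = line.strip().partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so values may contain '=' themselves.
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"malformed parameter: {item!r}")
        name = name.strip()
        if name in KNOWN_PARAMS:
            params[name] = value.strip()
    return payload_type, params
```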
A SIP Offer/Answer exchange wherein both parties are expected to both
send and receive could look like the following. Only the media
codec-specific parts of the SDP are shown. Some lines are wrapped
due to text constraints.
Offerer->Answerer:
m=video 49170 RTP/AVP 98
a=rtpmap:98 H266/90000
a=fmtp:98 profile-id=1; level-id=83;
The above represents an offer for symmetric video communication using
[VVC] and its payload specification, at the main profile and level
5.1 (and, as the levels are downgradable, all lower levels).
Informally speaking, this offer tells the receiver of the offer that
the sender is willing to receive up to 4Kp60 resolution at the
maximum bitrates specified in [VVC]. At the same time, if this offer
were accepted "as is", the offerer can expect that the answerer would
be able to receive and properly decode H.266 media up to and
including level 5.1.
Answerer->Offerer:
m=video 49170 RTP/AVP 98
a=rtpmap:98 H266/90000
a=fmtp:98 profile-id=1; level-id=67
With this answer to the offer above, the system receiving the offer
advises the offerer that it is incapable of handling H.266 at level
5.1 but is capable of decoding 1080p60. As H.266 video codecs must
support decoding at all levels below the maximum level they
implement, the resulting user experience would likely be that both
systems send video at 1080p60. However, nothing prevents an encoder
from further downgrading its sending to, for example, 720p30 if it
were short of cycles or bandwidth, or for other reasons.
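The level-id values 83 and 67 in the exchange above can be turned into level numbers with a small helper. The sketch assumes the [VVC] convention that the level indicator equals 16 times the major level number plus 3 times the minor level number, which is consistent with 83 denoting level 5.1; both function names are hypothetical.

```python
def level_from_id(level_id: int):
    """Map a level-id value to (major, minor), e.g. 83 -> (5, 1)."""
    major, remainder = divmod(level_id, 16)
    if remainder % 3 != 0:
        raise ValueError(f"level-id {level_id} does not map to a level")
    return major, remainder // 3


def agreed_level_id(offer_level_id: int, answer_level_id: int) -> int:
    """Return the level both sides can use after a downgrade in the answer."""
    if answer_level_id > offer_level_id:
        raise ValueError("answer must not raise the level above the offer")
    return answer_level_id
```

With the values from the example, the offer maps to level 5.1, the answer maps to level 4.1, and the agreed level-id is 67.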
7.2.2. Usage with SDP Offer/Answer Model 7.2.2. Usage with SDP Offer/Answer Model
This section describes the negotiation of unicast messages using the This section describes the negotiation of unicast messages using the
offer-answer model as described in [RFC3264] and its updates. The offer-answer model as described in [RFC3264] and its updates. The
section is split into subsections, covering a) media format section is split into subsections, covering a) media format
configurations not involving non-temporal scalability; b) scalable configurations not involving non-temporal scalability; b) scalable
media format configurations; c) the description of the use of those media format configurations; c) the description of the use of those
parameters not involving the media configuration itself but rather parameters not involving the media configuration itself but rather
the parameters of the payload format design; and d) multicast. the parameters of the payload format design; and d) multicast.
skipping to change at page 54, line 23 skipping to change at page 56, line 23
acceptable for the sender to receive bitstreams. In order to achieve acceptable for the sender to receive bitstreams. In order to achieve
high interoperability levels, it is often advisable to offer multiple high interoperability levels, it is often advisable to offer multiple
alternative configurations. It is impossible to offer multiple alternative configurations. It is impossible to offer multiple
configurations in a single payload type. Thus, when multiple configurations in a single payload type. Thus, when multiple
configuration offers are made, each offer requires its own RTP configuration offers are made, each offer requires its own RTP
payload type associated with the offer. However, it is possible to payload type associated with the offer. However, it is possible to
offer multiple operation points using one configuration in a single offer multiple operation points using one configuration in a single
payload type by including sprop-vps in the offer and recv-ols-id in payload type by including sprop-vps in the offer and recv-ols-id in
the answer. the answer.
A receiver SHOULD understand all media type parameters, even if it An implementation SHOULD be able to understand all media type
only supports a subset of the payload format's functionality. This parameters (including all optional media type parameters), even if it
ensures that a receiver is capable of understanding when an offer to doesn't support the functionality related to the parameter. This, in
receive media can be downgraded to what is supported by the receiver conjunction with proper application logic in the implementation
of the offer. allows the implementation, after having received an offer, to create
an answer by potentially downgrading one or more of the optional
parameters to the point where the implementation can cope, leading to
higher chances of interoperability beyond the most basic interop
points (for which, as described above, no optional parameters are
necessary).
Informative note: in implementations of previous H.26x payload
formats it was occasionally observed that implementations were
incapable of parsing most (or all) of the optional parameters. As
a result, the offer-answer exchange resulted in a baseline
performance (using the default values for the optional parameters)
with the resulting suboptimal user experience. However, there are
valid reasons to forego the implementation complexity of
implementing the parsing of some or all of the optional
parameters, for example, when there is pre-determined knowledge,
not negotiated by an SDP-based offer/answer process, of the
capabilities of the involved systems (walled gardens, baseline
requirements defined in application standards higher up in the
stack, and similar).
An answerer MAY extend the offer with additional media format An answerer MAY extend the offer with additional media format
configurations. However, to enable their usage, in most cases a configurations. However, to enable their usage, in most cases a
second offer is required from the offerer to provide the bitstream second offer is required from the offerer to provide the bitstream
property parameters that the media sender will use. This also has property parameters that the media sender will use. This also has
the effect that the offerer has to be able to receive this media the effect that the offerer has to be able to receive this media
format configuration, not only to send it. format configuration, not only to send it.
7.2.2.4. Multicast 7.2.2.4. Multicast
skipping to change at page 58, line 8 skipping to change at page 60, line 29
related documents, especially those pertaining to RTP (see the related documents, especially those pertaining to RTP (see the
Security Considerations section in [RFC3550]), and the security of Security Considerations section in [RFC3550]), and the security of
the call-control stack chosen (that may make use of the media type the call-control stack chosen (that may make use of the media type
registration of this memo). Implementers should also consider known registration of this memo). Implementers should also consider known
security vulnerabilities of video coding and decoding implementations security vulnerabilities of video coding and decoding implementations
in general and avoid those. in general and avoid those.
Within this RTP payload format, and with the exception of the user Within this RTP payload format, and with the exception of the user
data SEI message as described below, no security threats other than data SEI message as described below, no security threats other than
those common to RTP payload formats are known. In other words, those common to RTP payload formats are known. In other words,
neither the various media-plane-based mechanisms, nor the signalling neither the various media-plane-based mechanisms, nor the signaling
part of this memo, seems to pose a security risk beyond those common part of this memo, seems to pose a security risk beyond those common
to all RTP-based systems. to all RTP-based systems.
RTP packets using the payload format defined in this specification RTP packets using the payload format defined in this specification
are subject to the security considerations discussed in the RTP are subject to the security considerations discussed in the RTP
specification [RFC3550], and in any applicable RTP profile such as specification [RFC3550], and in any applicable RTP profile such as
RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/
SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP
Does Not Mandate a Single Media Security Solution" [RFC7202] Does Not Mandate a Single Media Security Solution" [RFC7202]
discusses, it is not an RTP payload format's responsibility to discusses, it is not an RTP payload format's responsibility to
skipping to change at page 59, line 23 skipping to change at page 61, line 43
10. Congestion Control 10. Congestion Control
Congestion control for RTP SHALL be used in accordance with RTP Congestion control for RTP SHALL be used in accordance with RTP
[RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551]. [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
If best-effort service is being used, an additional requirement is If best-effort service is being used, an additional requirement is
that users of this payload format MUST monitor packet loss to ensure that users of this payload format MUST monitor packet loss to ensure
that the packet loss rate is within an acceptable range. Packet loss that the packet loss rate is within an acceptable range. Packet loss
is considered acceptable if a TCP flow across the same network path, is considered acceptable if a TCP flow across the same network path,
and experiencing the same network conditions, would achieve an and experiencing the same network conditions, would achieve an
average throughput, measured on a reasonable timescale, that is not average throughput, measured on a reasonable timescale, that is not
less than all RTP streams combined are achieving. This condition can less than all RTP streams combined are achieved. This condition can
be satisfied by implementing congestion-control mechanisms to adapt be satisfied by implementing congestion-control mechanisms to adapt
the transmission rate, the number of layers subscribed for a layered the transmission rate, the number of layers subscribed for a layered
multicast session, or by arranging for a receiver to leave the multicast session, or by arranging for a receiver to leave the
session if the loss rate is unacceptably high. session if the loss rate is unacceptably high.
The bitrate adaptation necessary for obeying the congestion control The bitrate adaptation necessary for obeying the congestion control
principle is easily achievable when real-time encoding is used, for principle is easily achievable when real-time encoding is used, for
example, by adequately tuning the quantization parameter. However, example, by adequately tuning the quantization parameter. However,
when pre-encoded content is being transmitted, bandwidth adaptation when pre-encoded content is being transmitted, bandwidth adaptation
requires the pre-coded bitstream to be tailored for such adaptivity. requires the pre-coded bitstream to be tailored for such adaptivity.
skipping to change at page 63, line 16 skipping to change at page 65, line 38
Framework: Why RTP Does Not Mandate a Single Media Framework: Why RTP Does Not Mandate a Single Media
Security Solution", RFC 7202, DOI 10.17487/RFC7202, April Security Solution", RFC 7202, DOI 10.17487/RFC7202, April
2014, <https://www.rfc-editor.org/info/rfc7202>. 2014, <https://www.rfc-editor.org/info/rfc7202>.
[RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
for Real-Time Transport Protocol (RTP) Sources", RFC 7656, for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
DOI 10.17487/RFC7656, November 2015, DOI 10.17487/RFC7656, November 2015,
<https://www.rfc-editor.org/info/rfc7656>. <https://www.rfc-editor.org/info/rfc7656>.
[RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
DOI 10.17487/RFC7667, November 2015,
<https://www.rfc-editor.org/info/rfc7667>.
[RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M.
M. Hannuksela, "RTP Payload Format for High Efficiency M. Hannuksela, "RTP Payload Format for High Efficiency
Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798,
March 2016, <https://www.rfc-editor.org/info/rfc7798>. March 2016, <https://www.rfc-editor.org/info/rfc7798>.
[RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M.,
and M. Stiemerling, Ed., "Real-Time Streaming Protocol and M. Stiemerling, Ed., "Real-Time Streaming Protocol
Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December
2016, <https://www.rfc-editor.org/info/rfc7826>. 2016, <https://www.rfc-editor.org/info/rfc7826>.
Appendix A. Change History Appendix A. Change History
To RFC Editor: PLEASE REMOVE THIS SECTION BEFORE PUBLICATION
draft-zhao-payload-rtp-vvc-00 ........ initial version draft-zhao-payload-rtp-vvc-00 ........ initial version
draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and
corrections corrections
draft-ietf-payload-rtp-vvc-00 ........ initial WG draft draft-ietf-payload-rtp-vvc-00 ........ initial WG draft
draft-ietf-payload-rtp-vvc-01 ........ VVC specification update draft-ietf-payload-rtp-vvc-01 ........ VVC specification update
draft-ietf-payload-rtp-vvc-02 ........ VVC specification update draft-ietf-payload-rtp-vvc-02 ........ VVC specification update
 End of changes. 36 change blocks. 
70 lines changed or deleted 166 lines changed or added
