| < draft-ietf-avtcore-rtp-vvc-14.txt | draft-ietf-avtcore-rtp-vvc-15.txt > | |||
|---|---|---|---|---|
| avtcore S. Zhao | avtcore S. Zhao | |||
| Internet-Draft S. Wenger | Internet-Draft S. Wenger | |||
| Intended status: Standards Track Tencent | Intended status: Standards Track Tencent | |||
| Expires: 29 August 2022 Y. Sanchez | Expires: 24 October 2022 Y. Sanchez | |||
| Fraunhofer HHI | Fraunhofer HHI | |||
| Y. Wang | Y. Wang | |||
| Bytedance Inc. | Bytedance Inc. | |||
| M. M Hannuksela | M. M Hannuksela | |||
| Nokia Technologies | Nokia Technologies | |||
| 25 February 2022 | 22 April 2022 | |||
| RTP Payload Format for Versatile Video Coding (VVC) | RTP Payload Format for Versatile Video Coding (VVC) | |||
| draft-ietf-avtcore-rtp-vvc-14 | draft-ietf-avtcore-rtp-vvc-15 | |||
| Abstract | Abstract | |||
| This memo describes an RTP payload format for the video coding | This memo describes an RTP payload format for the video coding | |||
| standard ITU-T Recommendation H.266 and ISO/IEC International | standard ITU-T Recommendation H.266 and ISO/IEC International | |||
| Standard 23090-3, both also known as Versatile Video Coding (VVC) and | Standard 23090-3, both also known as Versatile Video Coding (VVC) and | |||
| developed by the Joint Video Experts Team (JVET). The RTP payload | developed by the Joint Video Experts Team (JVET). The RTP payload | |||
| format allows for packetization of one or more Network Abstraction | format allows for packetization of one or more Network Abstraction | |||
| Layer (NAL) units in each RTP packet payload as well as fragmentation | Layer (NAL) units in each RTP packet payload as well as fragmentation | |||
| of a NAL unit into multiple RTP packets. The payload format has wide | of a NAL unit into multiple RTP packets. The payload format has wide | |||
| skipping to change at page 1, line 44 | skipping to change at page 1, line 44 | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on 29 August 2022. | This Internet-Draft will expire on 24 October 2022. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2022 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| skipping to change at page 2, line 41 | skipping to change at page 2, line 41 | |||
| 4.2. Payload Header Usage . . . . . . . . . . . . . . . . . . 22 | 4.2. Payload Header Usage . . . . . . . . . . . . . . . . . . 22 | |||
| 4.3. Payload Structures . . . . . . . . . . . . . . . . . . . 22 | 4.3. Payload Structures . . . . . . . . . . . . . . . . . . . 22 | |||
| 4.3.1. Single NAL Unit Packets . . . . . . . . . . . . . . . 23 | 4.3.1. Single NAL Unit Packets . . . . . . . . . . . . . . . 23 | |||
| 4.3.2. Aggregation Packets (APs) . . . . . . . . . . . . . . 23 | 4.3.2. Aggregation Packets (APs) . . . . . . . . . . . . . . 23 | |||
| 4.3.3. Fragmentation Units . . . . . . . . . . . . . . . . . 27 | 4.3.3. Fragmentation Units . . . . . . . . . . . . . . . . . 27 | |||
| 4.4. Decoding Order Number . . . . . . . . . . . . . . . . . . 30 | 4.4. Decoding Order Number . . . . . . . . . . . . . . . . . . 30 | |||
| 5. Packetization Rules . . . . . . . . . . . . . . . . . . . . . 31 | 5. Packetization Rules . . . . . . . . . . . . . . . . . . . . . 31 | |||
| 6. De-packetization Process . . . . . . . . . . . . . . . . . . 32 | 6. De-packetization Process . . . . . . . . . . . . . . . . . . 32 | |||
| 7. Payload Format Parameters . . . . . . . . . . . . . . . . . . 34 | 7. Payload Format Parameters . . . . . . . . . . . . . . . . . . 34 | |||
| 7.1. Media Type Registration . . . . . . . . . . . . . . . . . 34 | 7.1. Media Type Registration . . . . . . . . . . . . . . . . . 34 | |||
| 7.2. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 45 | 7.2. SDP Parameters . . . . . . . . . . . . . . . . . . . . . 46 | |||
| 7.2.1. Mapping of Payload Type Parameters to SDP . . . . . . 45 | 7.2.1. Mapping of Payload Type Parameters to SDP . . . . . . 46 | |||
| 7.2.2. Usage with SDP Offer/Answer Model . . . . . . . . . . 46 | 7.2.2. Usage with SDP Offer/Answer Model . . . . . . . . . . 48 | |||
| 7.2.3. Usage in Declarative Session Descriptions . . . . . . 55 | 7.2.3. Usage in Declarative Session Descriptions . . . . . . 57 | |||
| 7.2.4. Considerations for Parameter Sets . . . . . . . . . . 56 | 7.2.4. Considerations for Parameter Sets . . . . . . . . . . 59 | |||
| 8. Use with Feedback Messages . . . . . . . . . . . . . . . . . 56 | 8. Use with Feedback Messages . . . . . . . . . . . . . . . . . 59 | |||
| 8.1. Picture Loss Indication (PLI) . . . . . . . . . . . . . . 57 | 8.1. Picture Loss Indication (PLI) . . . . . . . . . . . . . . 59 | |||
| 8.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 57 | 8.2. Full Intra Request (FIR) . . . . . . . . . . . . . . . . 59 | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . 57 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 60 | |||
| 10. Congestion Control . . . . . . . . . . . . . . . . . . . . . 59 | 10. Congestion Control . . . . . . . . . . . . . . . . . . . . . 61 | |||
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 60 | 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 62 | |||
| 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 60 | 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 62 | |||
| 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 60 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . 62 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . 60 | 13.1. Normative References . . . . . . . . . . . . . . . . . . 62 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . 62 | 13.2. Informative References . . . . . . . . . . . . . . . . . 64 | |||
| Appendix A. Change History . . . . . . . . . . . . . . . . . . . 63 | Appendix A. Change History . . . . . . . . . . . . . . . . . . . 66 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 64 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 66 | |||
| 1. Introduction | 1. Introduction | |||
| The Versatile Video Coding specification, formally published as both | The Versatile Video Coding specification was formally published as | |||
| ITU-T Recommendation H.266 [VVC] and ISO/IEC International Standard | both ITU-T Recommendation H.266 [VVC] and ISO/IEC International | |||
| 23090-3 [ISO23090-3]. VVC is reported to provide significant coding | Standard 23090-3 [ISO23090-3]. VVC is reported to provide | |||
| efficiency gains over High Efficiency Video Coding [HEVC], also known | significant coding efficiency gains over High Efficiency Video Coding | |||
| as H.265, and other earlier video codecs. | [HEVC], also known as H.265, and other earlier video codecs. | |||
| This memo specifies an RTP payload format for VVC. It shares its | This memo specifies an RTP payload format for VVC. It shares its | |||
| basic design with the NAL (Network Abstraction Layer) unit based RTP | basic design with the NAL (Network Abstraction Layer) unit based RTP | |||
| payload formats of AVC Video Coding [RFC6184], Scalable Video Coding | payload formats of AVC Video Coding [RFC6184], Scalable Video Coding | |||
| (SVC) [RFC6190], High Efficiency Video Coding (HEVC) [RFC7798] and | (SVC) [RFC6190], High Efficiency Video Coding (HEVC) [RFC7798] and | |||
| their respective predecessors. With respect to design philosophy, | their respective predecessors. With respect to design philosophy, | |||
| security, congestion control, and overall implementation complexity, | security, congestion control, and overall implementation complexity, | |||
| it has similar properties to those earlier payload format | it has similar properties to those earlier payload format | |||
| specifications. This is a conscious choice, as at least RFC 6184 is | specifications. This is a conscious choice, as at least RFC 6184 is | |||
| widely deployed and generally known in the relevant implementer | widely deployed and generally known in the relevant implementer | |||
| skipping to change at page 4, line 23 | skipping to change at page 4, line 23 | |||
| Finally, VVC includes temporal, spatial, and SNR scalability as well | Finally, VVC includes temporal, spatial, and SNR scalability as well | |||
| as multiview coding support. | as multiview coding support. | |||
| Coding blocks and transform structure | Coding blocks and transform structure | |||
| Among major coding-tool differences between HEVC and VVC, one of the | Among major coding-tool differences between HEVC and VVC, one of the | |||
| important improvements is the more flexible coding tree structure in | important improvements is the more flexible coding tree structure in | |||
| VVC, i.e., multi-type tree. In addition to quadtree, binary and | VVC, i.e., multi-type tree. In addition to quadtree, binary and | |||
| ternary trees are also supported, which contributes significant | ternary trees are also supported, which contributes significant | |||
| improvement in coding efficiency. Moreover, the maximum size of | improvement in coding efficiency. Moreover, the maximum size of a | |||
| coding tree unit (CTU) is increased from 64x64 to 128x128. To | coding tree unit (CTU) is increased from 64x64 to 128x128. To | |||
| improve the coding efficiency of chroma signal, luma chroma separated | improve the coding efficiency of chroma signal, luma chroma separated | |||
| trees at CTU level may be employed for intra-slices. The square | trees at CTU level may be employed for intra-slices. The square | |||
| transforms in HEVC are extended to non-square transforms for | transforms in HEVC are extended to non-square transforms for | |||
| rectangular blocks resulting from binary and ternary tree splits. | rectangular blocks resulting from binary and ternary tree splits. | |||
| Besides, VVC supports multiple transform sets (MTS), including DCT-2, | Besides, VVC supports multiple transform sets (MTS), including DCT-2, | |||
| DST-7, and DCT-8 as well as the non-separable secondary transform. | DST-7, and DCT-8 as well as the non-separable secondary transform. | |||
| The transforms used in VVC can have different sizes with support for | The transforms used in VVC can have different sizes with support for | |||
| larger transform sizes. For DCT-2, the transform sizes range from | larger transform sizes. For DCT-2, the transform sizes range from | |||
| 2x2 to 64x64, and for DST-7 and DCT-8, the transform sizes range from | 2x2 to 64x64, and for DST-7 and DCT-8, the transform sizes range from | |||
| skipping to change at page 5, line 20 | skipping to change at page 5, line 20 | |||
| loop filter (ALF) may be used. As a Wiener filter, ALF reduces | loop filter (ALF) may be used. As a Wiener filter, ALF reduces | |||
| distortion of decoded pictures. Besides, VVC introduces a new module | distortion of decoded pictures. Besides, VVC introduces a new module | |||
| called luma mapping with chroma scaling to fully utilize the dynamic | called luma mapping with chroma scaling to fully utilize the dynamic | |||
| range of signal so that rate-distortion performance of both Standard | range of signal so that rate-distortion performance of both Standard | |||
| Dynamic Range (SDR) and High Dynamic Range (HDR) content is improved. | Dynamic Range (SDR) and High Dynamic Range (HDR) content is improved. | |||
| Motion prediction and coding | Motion prediction and coding | |||
| Compared to HEVC, VVC introduces several improvements in this area. | Compared to HEVC, VVC introduces several improvements in this area. | |||
| First, there is the adaptive motion vector resolution (AMVR), which | First, there is the adaptive motion vector resolution (AMVR), which | |||
| can save bit cost for motion vectors by adaptively signalling motion | can save bit cost for motion vectors by adaptively signaling motion | |||
| vector resolution. Then the affine motion compensation is included | vector resolution. Then the affine motion compensation is included | |||
| to capture complicated motion like zooming and rotation. Meanwhile, | to capture complicated motion like zooming and rotation. Meanwhile, | |||
| prediction refinement with the optical flow with affine mode (PROF) | prediction refinement with the optical flow with affine mode (PROF) | |||
| is further deployed to mimic affine motion at the pixel level. | is further deployed to mimic affine motion at the pixel level. | |||
| Thirdly the decoder side motion vector refinement (DMVR) is a method | Thirdly the decoder side motion vector refinement (DMVR) is a method | |||
| to derive MV vector at decoder side based on block matching so that | to derive MV vector at decoder side based on block matching so that | |||
| fewer bits may be spent on motion vectors. Bi-directional optical | fewer bits may be spent on motion vectors. Bi-directional optical | |||
| flow (BDOF) is a similar method to PROF. BDOF adds a sample wise | flow (BDOF) is a similar method to PROF. BDOF adds a sample wise | |||
| offset at 4x4 sub-block level that is derived with equations based on | offset at 4x4 sub-block level that is derived with equations based on | |||
| gradients of the prediction samples and a motion difference relative | gradients of the prediction samples and a motion difference relative | |||
| skipping to change at page 7, line 6 | skipping to change at page 7, line 6 | |||
| The decoding capability information includes parameters that stay | The decoding capability information includes parameters that stay | |||
| constant for the lifetime of a VVC bitstream, which in IETF terms can | constant for the lifetime of a VVC bitstream, which in IETF terms can | |||
| translate to a session. Such information includes profile, level, | translate to a session. Such information includes profile, level, | |||
| and sub-profile information to determine a maximum capability interop | and sub-profile information to determine a maximum capability interop | |||
| point that is guaranteed to be never exceeded, even if splicing of | point that is guaranteed to be never exceeded, even if splicing of | |||
| video sequences occurs within a session. It further includes | video sequences occurs within a session. It further includes | |||
| constraint fields (most of which are flags), which can optionally be | constraint fields (most of which are flags), which can optionally be | |||
| set to indicate that the video bitstream will be constrained in the | set to indicate that the video bitstream will be constrained in the | |||
| use of certain features as indicated by the values of those fields. | use of certain features as indicated by the values of those fields. | |||
| With this, a bitstream can be labelled as not using certain tools, | With this, a bitstream can be labeled as not using certain tools, | |||
| which allows among other things for resource allocation in a decoder | which allows among other things for resource allocation in a decoder | |||
| implementation. | implementation. | |||
| Video parameter set | Video parameter set | |||
| The video parameter set (VPS) pertains to one or more coded video | The video parameter set (VPS) pertains to one or more coded video | |||
| sequences (CVSs) of multiple layers covering the same range of access | sequences (CVSs) of multiple layers covering the same range of access | |||
| units, and includes, among other information, decoding dependency | units, and includes, among other information, decoding dependency | |||
| expressed as information for reference picture list construction of | expressed as information for reference picture list construction of | |||
| enhancement layers. The VPS provides a "big picture" of a scalable | enhancement layers. The VPS provides a "big picture" of a scalable | |||
| skipping to change at page 7, line 29 | skipping to change at page 7, line 29 | |||
| high-level properties of the bitstream that can be used as the basis | high-level properties of the bitstream that can be used as the basis | |||
| for session negotiation and content selection, etc. One VPS may be | for session negotiation and content selection, etc. One VPS may be | |||
| referenced by one or more sequence parameter sets. | referenced by one or more sequence parameter sets. | |||
| Sequence parameter set | Sequence parameter set | |||
| The sequence parameter set (SPS) contains syntax elements pertaining | The sequence parameter set (SPS) contains syntax elements pertaining | |||
| to a coded layer video sequence (CLVS), which is a group of pictures | to a coded layer video sequence (CLVS), which is a group of pictures | |||
| belonging to the same layer, starting with a random access point, and | belonging to the same layer, starting with a random access point, and | |||
| followed by pictures that may depend on each other, until the next | followed by pictures that may depend on each other, until the next | |||
| random access point picture. In MPGEG-2, the equivalent of a CVS was | random access point picture. In MPEG-2, the equivalent of a CVS was | |||
| a group of pictures (GOP), which normally started with an I frame and | a group of pictures (GOP), which normally started with an I frame and | |||
| was followed by P and B frames. While more complex in its options of | was followed by P and B frames. While more complex in its options of | |||
| random access points, VVC retains this basic concept. One remarkable | random access points, VVC retains this basic concept. One remarkable | |||
| difference of VVC is that a CLVS may start with a Gradual Decoding | difference of VVC is that a CLVS may start with a Gradual Decoding | |||
| Refresh (GDR) picture, without requiring presence of traditional | Refresh (GDR) picture, without requiring presence of traditional | |||
| random access points in the bitstream, such as instantaneous decoding | random access points in the bitstream, such as instantaneous decoding | |||
| refresh (IDR) or clean random access (CRA) pictures. In many TV-like | refresh (IDR) or clean random access (CRA) pictures. In many TV-like | |||
| applications, a CVS contains a few hundred milliseconds to a few | applications, a CVS contains a few hundred milliseconds to a few | |||
| seconds of video. In video conferencing (without switching MCUs | seconds of video. In video conferencing (without switching MCUs | |||
| involved), a CVS can be as long in duration as the whole session. | involved), a CVS can be as long in duration as the whole session. | |||
| skipping to change at page 8, line 45 | skipping to change at page 8, line 45 | |||
| according to ITU-T Rec. T.35, that does not carry a semantics. It is | according to ITU-T Rec. T.35, that does not carry a semantics. It is | |||
| carried in the profile_tier_level structure and hence (potentially) | carried in the profile_tier_level structure and hence (potentially) | |||
| present in the DCI, VPS, and SPS. External registration bodies can | present in the DCI, VPS, and SPS. External registration bodies can | |||
| register a T.35 codepoint with ITU-T registration authorities and | register a T.35 codepoint with ITU-T registration authorities and | |||
| associate with their registration a description of bitstream | associate with their registration a description of bitstream | |||
| restrictions beyond the profiles defined by ITU-T and ISO/IEC. This | restrictions beyond the profiles defined by ITU-T and ISO/IEC. This | |||
| would allow encoder manufacturers to label the bitstreams generated | would allow encoder manufacturers to label the bitstreams generated | |||
| by their encoder as complying with such sub-profile. It is expected | by their encoder as complying with such sub-profile. It is expected | |||
| that upstream standardization organizations (such as: DVB and ATSC), | that upstream standardization organizations (such as: DVB and ATSC), | |||
| as well as walled-garden video services will take advantage of this | as well as walled-garden video services will take advantage of this | |||
| labelling system. In contrast to "normal" profiles, it is expected | labeling system. In contrast to "normal" profiles, it is expected | |||
| that sub-profiles may indicate encoder choices traditionally left | that sub-profiles may indicate encoder choices traditionally left | |||
| open in the (decoder-centric) video coding specs, such as GOP | open in the (decoder-centric) video coding specs, such as GOP | |||
| structures, minimum/maximum QP values, and the mandatory use of | structures, minimum/maximum QP values, and the mandatory use of | |||
| certain tools or SEI messages. | certain tools or SEI messages. | |||
| General constraint fields | General constraint fields | |||
| The profile_tier_level structure carries a considerable number of | The profile_tier_level structure carries a considerable number of | |||
| constraint fields (most of which are flags), which an encoder can use | constraint fields (most of which are flags), which an encoder can use | |||
| to indicate to a decoder that it will not use a certain tool or | to indicate to a decoder that it will not use a certain tool or | |||
| technology. They were included in reaction to a perceived market | technology. They were included in reaction to a perceived market | |||
| need for labelling a bitstream as not exercising a certain tool that | need for labeling a bitstream as not exercising a certain tool that | |||
| has become commercially unviable. | has become commercially unviable. | |||
| Temporal scalability support | Temporal scalability support | |||
| VVC includes support of temporal scalability, by inclusion of the | VVC includes support of temporal scalability, by inclusion of the | |||
| signalling of TemporalId in the NAL unit header, the restriction that | signaling of TemporalId in the NAL unit header, the restriction that | |||
| pictures of a particular temporal sublayer cannot be used for inter | pictures of a particular temporal sublayer cannot be used for inter | |||
| prediction reference by pictures of a lower temporal sublayer, the | prediction reference by pictures of a lower temporal sublayer, the | |||
| sub-bitstream extraction process, and the requirement that each sub- | sub-bitstream extraction process, and the requirement that each sub- | |||
| bitstream extraction output be a conforming bitstream. Media-Aware | bitstream extraction output be a conforming bitstream. Media-Aware | |||
| Network Elements (MANEs) can utilize the TemporalId in the NAL unit | Network Elements (MANEs) can utilize the TemporalId in the NAL unit | |||
| header for stream adaptation purposes based on temporal scalability. | header for stream adaptation purposes based on temporal scalability. | |||
| Reference picture resampling (RPR) | Reference picture resampling (RPR) | |||
| In AVC and HEVC, the spatial resolution of pictures cannot change | In AVC and HEVC, the spatial resolution of pictures cannot change | |||
| skipping to change at page 9, line 50 | skipping to change at page 9, line 50 | |||
| video region or some region of interest is needed. | video region or some region of interest is needed. | |||
| Spatial, SNR, and multiview scalability | Spatial, SNR, and multiview scalability | |||
| VVC includes support for spatial, SNR, and multiview scalability. | VVC includes support for spatial, SNR, and multiview scalability. | |||
| Scalable video coding is widely considered to have technical benefits | Scalable video coding is widely considered to have technical benefits | |||
| and enrich services for various video applications. Until recently, | and enrich services for various video applications. Until recently, | |||
| however, the functionality has not been included in the first version | however, the functionality has not been included in the first version | |||
| of specifications of the video codecs. In VVC, however, all those | of specifications of the video codecs. In VVC, however, all those | |||
| forms of scalability are supported in the first version of VVC | forms of scalability are supported in the first version of VVC | |||
| natively through the signalling of the nuh_layer_id in the NAL unit | natively through the signaling of the nuh_layer_id in the NAL unit | |||
| header, the VPS which associates layers with given nuh_layer_id to | header, the VPS which associates layers with given nuh_layer_id to | |||
| each other, reference picture selection, reference picture resampling | each other, reference picture selection, reference picture resampling | |||
| for spatial scalability, and a number of other mechanisms not | for spatial scalability, and a number of other mechanisms not | |||
| relevant for this memo. | relevant for this memo. | |||
| Spatial scalability | Spatial scalability | |||
| With the existence of Reference Picture Resampling (RPR), the | With the existence of Reference Picture Resampling (RPR), the | |||
| additional burden for scalability support is just a | additional burden for scalability support is just a | |||
| modification of the high-level syntax (HLS). The inter-layer | modification of the high-level syntax (HLS). The inter-layer | |||
| skipping to change at page 11, line 21 | skipping to change at page 11, line 21 | |||
| subpictures as a feature, which provides the same functionality as | subpictures as a feature, which provides the same functionality as | |||
| HEVC motion-constrained tile sets (MCTSs) but designed differently to | HEVC motion-constrained tile sets (MCTSs) but designed differently to | |||
| have better coding efficiency and to be friendlier for usage in | have better coding efficiency and to be friendlier for usage in | |||
| application systems. More details of these differences are described | application systems. More details of these differences are described | |||
| below. | below. | |||
| Tiles and WPP | Tiles and WPP | |||
| Same as in HEVC, a picture can be split into tile rows and tile | Same as in HEVC, a picture can be split into tile rows and tile | |||
| columns in VVC, in-picture prediction across tile boundaries is | columns in VVC, in-picture prediction across tile boundaries is | |||
| disallowed, etc. However, the syntax for signalling of tile | disallowed, etc. However, the syntax for signaling of tile | |||
| partitioning has been simplified, by using a unified syntax design | partitioning has been simplified, by using a unified syntax design | |||
| for both the uniform and the non-uniform mode. In addition, | for both the uniform and the non-uniform mode. In addition, | |||
| signalling of entry point offsets for tiles in the slice header is | signaling of entry point offsets for tiles in the slice header is | |||
| optional in VVC while it is mandatory in HEVC. The WPP design in VVC | optional in VVC while it is mandatory in HEVC. The WPP design in VVC | |||
| has two differences compared to HEVC: i) The CTU row delay is reduced | has two differences compared to HEVC: i) The CTU row delay is reduced | |||
| from two CTUs to one CTU; ii) signalling of entry point offsets for | from two CTUs to one CTU; ii) signaling of entry point offsets for | |||
| WPP in the slice header is optional in VVC while it is mandatory in | WPP in the slice header is optional in VVC while it is mandatory in | |||
| HEVC. | HEVC. | |||
| Slices | Slices | |||
| In VVC, the conventional slices based on CTUs (as in HEVC) or | In VVC, the conventional slices based on CTUs (as in HEVC) or | |||
| macroblocks (as in AVC) have been removed. The main reasoning behind | macroblocks (as in AVC) have been removed. The main reasoning behind | |||
| this architectural change is as follows. The advances in video | this architectural change is as follows. The advances in video | |||
| coding since 2003 (the publication year of AVC v1) have been such | coding since 2003 (the publication year of AVC v1) have been such | |||
| that slice-based error concealment has become practically impossible, | that slice-based error concealment has become practically impossible, | |||
| skipping to change at page 18, line 47 | skipping to change at page 18, line 47 | |||
| Specification. | Specification. | |||
| 3.1.2. Definitions Specific to This Memo | 3.1.2. Definitions Specific to This Memo | |||
| Media-Aware Network Element (MANE): A network element, such as a | Media-Aware Network Element (MANE): A network element, such as a | |||
| middlebox, selective forwarding unit, or application-layer gateway | middlebox, selective forwarding unit, or application-layer gateway | |||
| that is capable of parsing certain aspects of the RTP payload headers | that is capable of parsing certain aspects of the RTP payload headers | |||
| or the RTP payload and reacting to their contents. | or the RTP payload and reacting to their contents. | |||
| Informative note: The concept of a MANE goes beyond normal routers | Informative note: The concept of a MANE goes beyond normal routers | |||
| or gateways in that a MANE has to be aware of the signalling | or gateways in that a MANE has to be aware of the signaling (e.g., | |||
| (e.g., to learn about the payload type mappings of the media | to learn about the payload type mappings of the media streams), | |||
| streams), and in that it has to be trusted when working with | and in that it has to be trusted when working with Secure RTP | |||
| Secure RTP (SRTP). The advantage of using MANEs is that they | (SRTP). The advantage of using MANEs is that they allow packets | |||
| allow packets to be dropped according to the needs of the media | to be dropped according to the needs of the media coding. For | |||
| coding. For example, if a MANE has to drop packets due to | example, if a MANE has to drop packets due to congestion on a | |||
| congestion on a certain link, it can identify and remove those | certain link, it can identify and remove those packets whose | |||
| packets whose elimination produces the least adverse effect on the | elimination produces the least adverse effect on the user | |||
| user experience. After dropping packets, MANEs must rewrite RTCP | experience. After dropping packets, MANEs must rewrite RTCP | |||
| packets to match the changes to the RTP stream, as specified in | packets to match the changes to the RTP stream, as specified in | |||
| Section 7 of [RFC3550]. | Section 7 of [RFC3550]. | |||
| NAL unit decoding order: A NAL unit order that conforms to the | NAL unit decoding order: A NAL unit order that conforms to the | |||
| constraints on NAL unit order given in Section 7.4.2.4 in [VVC], | constraints on NAL unit order given in Section 7.4.2.4 in [VVC], | |||
| following the order of NAL units in the bitstream. | following the order of NAL units in the bitstream. | |||
| RTP stream (See [RFC7656]): Within the scope of this memo, one RTP | RTP stream (See [RFC7656]): Within the scope of this memo, one RTP | |||
| stream is utilized to transport a VVC bitstream, which may contain | stream is utilized to transport a VVC bitstream, which may contain | |||
| one or more layers, and each layer may contain one or more temporal | one or more layers, and each layer may contain one or more temporal | |||
| skipping to change at page 20, line 20 ¶ | skipping to change at page 20, line 20 ¶ | |||
| NAL Network Abstraction Layer | NAL Network Abstraction Layer | |||
| NALU Network Abstraction Layer Unit | NALU Network Abstraction Layer Unit | |||
| OLS Output Layer Set | OLS Output Layer Set | |||
| PLI Picture Loss Indication | PLI Picture Loss Indication | |||
| PPS Picture Parameter Set | PPS Picture Parameter Set | |||
| RPS Reference Picture Set | ||||
| RPSI Reference Picture Selection Indication | RPSI Reference Picture Selection Indication | |||
| SEI Supplemental Enhancement Information | SEI Supplemental Enhancement Information | |||
| SLI Slice Loss Indication | SLI Slice Loss Indication | |||
| SPS Sequence Parameter Set | SPS Sequence Parameter Set | |||
| VCL Video Coding Layer | VCL Video Coding Layer | |||
| skipping to change at page 31, line 51 ¶ | skipping to change at page 31, line 51 ¶ | |||
| The following packetization rules apply: | The following packetization rules apply: | |||
| * If sprop-max-don-diff is greater than 0, the transmission order of | * If sprop-max-don-diff is greater than 0, the transmission order of | |||
| NAL units carried in the RTP stream MAY be different than the NAL | NAL units carried in the RTP stream MAY be different than the NAL | |||
| unit decoding order. Otherwise (sprop-max-don-diff is equal to | unit decoding order. Otherwise (sprop-max-don-diff is equal to | |||
| 0), the transmission order of NAL units carried in the RTP stream | 0), the transmission order of NAL units carried in the RTP stream | |||
| MUST be the same as the NAL unit decoding order. | MUST be the same as the NAL unit decoding order. | |||
| * A NAL unit of a small size SHOULD be encapsulated in an | * A NAL unit of a small size SHOULD be encapsulated in an | |||
| aggregation packet together one or more other NAL units in order | aggregation packet together with one or more other NAL units in | |||
| to avoid the unnecessary packetization overhead for small NAL | order to avoid the unnecessary packetization overhead for small | |||
| units. For example, non-VCL NAL units such as access unit | NAL units. For example, non-VCL NAL units such as access unit | |||
| delimiters, parameter sets, or SEI NAL units are typically small | delimiters, parameter sets, or SEI NAL units are typically small | |||
| and can often be aggregated with VCL NAL units without violating | and can often be aggregated with VCL NAL units without violating | |||
| MTU size constraints. | MTU size constraints. | |||
| * Each non-VCL NAL unit SHOULD, when possible from an MTU size match | * Each non-VCL NAL unit SHOULD, when possible from an MTU size match | |||
| viewpoint, be encapsulated in an aggregation packet together with | viewpoint, be encapsulated in an aggregation packet together with | |||
| its associated VCL NAL unit, as typically a non-VCL NAL unit would | its associated VCL NAL unit, as typically a non-VCL NAL unit would | |||
| be meaningless without the associated VCL NAL unit being | be meaningless without the associated VCL NAL unit being | |||
| available. | available. | |||
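The aggregation rules above can be illustrated with a minimal sketch. It greedily groups consecutive NAL units, kept in decoding order, so that each group fits a payload budget. The function name `group_for_aggregation`, the 2-byte payload header, and the 2-byte per-NAL-unit size field are assumptions made for illustration (they correspond to the case without decoding order number fields); they are not taken from the text shown in this diff.

```python
from typing import List

# Assumed overheads for illustration only (no DONL/DOND fields present).
PAYLOAD_HEADER_BYTES = 2   # assumed aggregation packet payload header size
NALU_SIZE_FIELD_BYTES = 2  # assumed length field preceding each aggregated NAL unit


def group_for_aggregation(nalu_sizes: List[int], max_payload: int) -> List[List[int]]:
    """Greedily group consecutive NAL units (by index) into payload-sized groups.

    The input order is preserved, so transmission order stays equal to the
    order in which the NAL units were supplied.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    used = PAYLOAD_HEADER_BYTES
    for index, size in enumerate(nalu_sizes):
        cost = NALU_SIZE_FIELD_BYTES + size
        # Close the current group when the next NAL unit would not fit.
        if current and used + cost > max_payload:
            groups.append(current)
            current = []
            used = PAYLOAD_HEADER_BYTES
        current.append(index)
        used += cost
    if current:
        groups.append(current)
    return groups
```

A group holding a single index would normally be sent as a single NAL unit packet, and a NAL unit that alone exceeds the budget would need fragmentation; the sketch does not model either case.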
| skipping to change at page 32, line 34 ¶ | skipping to change at page 32, line 34 ¶ | |||
| The de-packetization process is implementation dependent. Therefore, | The de-packetization process is implementation dependent. Therefore, | |||
| the following description should be seen as an example of a suitable | the following description should be seen as an example of a suitable | |||
| implementation. Other schemes may be used as well, as long as the | implementation. Other schemes may be used as well, as long as the | |||
| output for the same input is the same as the process described below. | output for the same input is the same as the process described below. | |||
| The output is the same when the set of output NAL units and their | The output is the same when the set of output NAL units and their | |||
| order are both identical. Optimizations relative to the described | order are both identical. Optimizations relative to the described | |||
| algorithms are possible. | algorithms are possible. | |||
| All normal RTP mechanisms related to buffer management apply. In | All normal RTP mechanisms related to buffer management apply. In | |||
| particular, duplicated or outdated RTP packets (as indicated by the | particular, duplicated or outdated RTP packets (as indicated by the | |||
| RTP sequences number and the RTP timestamp) are removed. To | RTP sequence number and the RTP timestamp) are removed. To determine | |||
| determine the exact time for decoding, factors such as a possible | the exact time for decoding, factors such as a possible intentional | |||
| intentional delay to allow for proper inter-stream synchronization | delay to allow for proper inter-stream synchronization MUST be | |||
| MUST be factored in. | factored in. | |||
| NAL units with NAL unit type values in the range of 0 to 27, | NAL units with NAL unit type values in the range of 0 to 27, | |||
| inclusive, may be passed to the decoder. NAL-unit-like structures | inclusive, may be passed to the decoder. NAL-unit-like structures | |||
| with NAL unit type values in the range of 28 to 31, inclusive, MUST | with NAL unit type values in the range of 28 to 31, inclusive, MUST | |||
| NOT be passed to the decoder. | NOT be passed to the decoder. | |||
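The type-range rule above can be sketched as a small filter. The sketch assumes the two-byte VVC NAL unit header layout, in which the 5-bit NAL unit type occupies the upper five bits of the second byte; the helper names `nal_unit_type` and `may_pass_to_decoder` are hypothetical.

```python
def nal_unit_type(nalu: bytes) -> int:
    """Extract the 5-bit NAL unit type from an assumed two-byte header."""
    if len(nalu) < 2:
        raise ValueError("a NAL unit header needs at least two bytes")
    # Assumed layout of the second byte: type (5 bits), temporal ID plus 1 (3 bits).
    return (nalu[1] >> 3) & 0x1F


def may_pass_to_decoder(nalu: bytes) -> bool:
    """Return True for types 0..27; types 28..31 are never handed to the decoder."""
    return 0 <= nal_unit_type(nalu) <= 27
```

For example, a structure whose second header byte is 0xE1 carries type 28 and is rejected, while 0x09 carries type 1 and is accepted.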
| The receiver includes a receiver buffer, which is used to compensate | The receiver includes a receiver buffer, which is used to compensate | |||
| for transmission delay jitter within an individual RTP stream, and to | for transmission delay jitter within an individual RTP stream, and to | |||
| reorder NAL units from transmission order to the NAL unit decoding | reorder NAL units from transmission order to the NAL unit decoding | |||
| order. In this section, the receiver operation is described under | order. In this section, the receiver operation is described under | |||
| skipping to change at page 34, line 38 ¶ | skipping to change at page 34, line 38 ¶ | |||
| provided for applications that use SDP. | provided for applications that use SDP. | |||
| 7.1. Media Type Registration | 7.1. Media Type Registration | |||
| The receiver MUST ignore any parameter unspecified in this memo. | The receiver MUST ignore any parameter unspecified in this memo. | |||
| Type name: video | Type name: video | |||
| Subtype name: H266 | Subtype name: H266 | |||
| Required parameters: none | Required parameters: N/A | |||
| Optional parameters: | Optional parameters: | |||
| profile-id, tier-flag, sub-profile-id, interop-constraints, and | profile-id, tier-flag, sub-profile-id, interop-constraints, and | |||
| level-id: | level-id: | |||
| These parameters indicate the profile, tier, default level, | These parameters indicate the profile, tier, default level, | |||
| sub-profile, and some constraints of the bitstream carried by | sub-profile, and some constraints of the bitstream carried by | |||
| the RTP stream, or a specific set of the profile, tier, default | the RTP stream, or a specific set of the profile, tier, default | |||
| level, sub-profile and some constraints the receiver supports. | level, sub-profile and some constraints the receiver supports. | |||
| skipping to change at page 35, line 16 ¶ | skipping to change at page 35, line 16 ¶ | |||
| the bitstream or that the receiver supports, as well as some | the bitstream or that the receiver supports, as well as some | |||
| additional constraints are indicated collectively by profile- | additional constraints are indicated collectively by profile- | |||
| id, sub-profile-id, and interop-constraints. | id, sub-profile-id, and interop-constraints. | |||
| Informative note: There are 128 values of profile-id. The | Informative note: There are 128 values of profile-id. The | |||
| subset of coding tools identified by the profile-id can be | subset of coding tools identified by the profile-id can be | |||
| further constrained with up to 255 instances of sub-profile- | further constrained with up to 255 instances of sub-profile- | |||
| id. In addition, 68 bits included in interop-constraints, | id. In addition, 68 bits included in interop-constraints, | |||
| which can be extended up to 324 bits, provide means to | which can be extended up to 324 bits, provide means to | |||
| further restrict tools from existing profiles. To be able | further restrict tools from existing profiles. To be able | |||
| to support this fine-granular signalling of coding tool | to support this fine-granular signaling of coding tool | |||
| subsets with profile-id, sub-profile-id and interop- | subsets with profile-id, sub-profile-id and interop- | |||
| constraints, it would be safe to require symmetric use of | constraints, it would be safe to require symmetric use of | |||
| these parameters in SDP offer/answer unless recv-ols-id is | these parameters in SDP offer/answer unless recv-ols-id is | |||
| included in the SDP answer for choosing one of the layers | included in the SDP answer for choosing one of the layers | |||
| offered. | offered. | |||
| The tier is indicated by tier-flag. The default level is | The tier is indicated by tier-flag. The default level is | |||
| indicated by level-id. The tier and the default level specify | indicated by level-id. The tier and the default level specify | |||
| the limits on values of syntax elements or arithmetic | the limits on values of syntax elements or arithmetic | |||
| combinations of values of syntax elements that are followed | combinations of values of syntax elements that are followed | |||
| when generating the bitstream or that the receiver supports. | when generating the bitstream or that the receiver supports. | |||
| In SDP offer/answer, when the SDP answer does not include the | In SDP offer/answer, when the SDP answer does not include the | |||
| recv-ols-id parameter that is less than the sprop-ols-id | recv-ols-id parameter that is less than the sprop-ols-id | |||
| parameter in the SDP offer, the following applies: | parameter in the SDP offer, the following applies: | |||
| o The tier-flag, profile-id, sub-profile-id, and interop- | o The tier-flag, profile-id, sub-profile-id, and interop- | |||
| constraints parameters MUST be used symmetrically, i.e., the | constraints parameters MUST be used symmetrically, i.e., the | |||
| value of each of these parameters in the offer MUST be the | value of each of these parameters in the offer MUST be the | |||
| same as that in the answer, either explicitly signalled or | same as that in the answer, either explicitly signaled or | |||
| implicitly inferred. | implicitly inferred. | |||
| o The level-id parameter is changeable as long as the highest | o The level-id parameter is changeable as long as the highest | |||
| level indicated by the answer is either equal to or lower | level indicated by the answer is either equal to or lower | |||
| than that in the offer. Note that a highest level higher | than that in the offer. Note that a highest level higher | |||
| than level-id in the offer for receiving can be included as | than level-id in the offer for receiving can be included as | |||
| max-recv-level-id. | max-recv-level-id. | |||
| In SDP offer/answer, when the SDP answer does include the recv- | In SDP offer/answer, when the SDP answer does include the recv- | |||
| ols-id parameter that is less than the sprop-ols-id parameter | ols-id parameter that is less than the sprop-ols-id parameter | |||
| skipping to change at page 37, line 18 ¶ | skipping to change at page 37, line 18 ¶ | |||
| structures in all DCI NAL units in the bitstream has the same | structures in all DCI NAL units in the bitstream has the same | |||
| values respectively for those profile_tier_level( ) syntax | values respectively for those profile_tier_level( ) syntax | |||
| elements. | elements. | |||
| [VVC] allows for multiple profile_tier_level( ) structures in a | [VVC] allows for multiple profile_tier_level( ) structures in a | |||
| DCI NAL unit, which may contain different values for the syntax | DCI NAL unit, which may contain different values for the syntax | |||
| elements used to derive the values of profile-id, tier-flag, | elements used to derive the values of profile-id, tier-flag, | |||
| level-id, sub-profile-id, or interop-constraints in the | level-id, sub-profile-id, or interop-constraints in the | |||
| different entries. However, herein defined is only a single | different entries. However, herein defined is only a single | |||
| profile-id, tier-flag, level-id, sub-profile-id, or interop- | profile-id, tier-flag, level-id, sub-profile-id, or interop- | |||
| constraints. When signalling these parameters and a DCI NAL | constraints. When signaling these parameters and a DCI NAL | |||
| unit is present with multiple profile_tier_level( ) structures, | unit is present with multiple profile_tier_level( ) structures, | |||
| these values SHOULD be the same as the first profile_tier_level | these values SHOULD be the same as the first profile_tier_level | |||
| structure in the DCI, unless the sender has ensured that the | structure in the DCI, unless the sender has ensured that the | |||
| receiver can decode the bitstream when a different value is | receiver can decode the bitstream when a different value is | |||
| chosen. | chosen. | |||
| tier-flag, level-id: | tier-flag, level-id: | |||
| The value of tier-flag MUST be in the range of 0 to 1, | The value of tier-flag MUST be in the range of 0 to 1, | |||
| inclusive. The value of level-id MUST be in the range of 0 to | inclusive. The value of level-id MUST be in the range of 0 to | |||
| skipping to change at page 43, line 37 ¶ | skipping to change at page 43, line 37 ¶ | |||
| decoding video at a higher rate than is required by the highest | decoding video at a higher rate than is required by the highest | |||
| level. | level. | |||
| Informative note: When the OPTIONAL media type parameters | Informative note: When the OPTIONAL media type parameters | |||
| are used to signal the properties of a bitstream, and max- | are used to signal the properties of a bitstream, and max- | |||
| lsr is not present, the values of tier-flag, profile-id, | lsr is not present, the values of tier-flag, profile-id, | |||
| sub-profile-id, interop-constraints, and level-id must always | sub-profile-id, interop-constraints, and level-id must always | |||
| be such that the bitstream complies fully with the specified | be such that the bitstream complies fully with the specified | |||
| profile, tier, and level. | profile, tier, and level. | |||
| When max-lsr is signalled, the receiver MUST be able to decode | When max-lsr is signaled, the receiver MUST be able to decode | |||
| bitstreams that conform to the highest level, with the | bitstreams that conform to the highest level, with the | |||
| exception that the MaxLumaSr value in Table 136 of [VVC] for | exception that the MaxLumaSr value in Table 136 of [VVC] for | |||
| the highest level is replaced with the value of max-lsr. | the highest level is replaced with the value of max-lsr. | |||
| Senders MAY use this knowledge to send pictures of a given size | Senders MAY use this knowledge to send pictures of a given size | |||
| at a higher picture rate than is indicated in the highest | at a higher picture rate than is indicated in the highest | |||
| level. | level. | |||
| When not present, the value of max-lsr is inferred to be equal | When not present, the value of max-lsr is inferred to be equal | |||
| to the value of MaxLumaSr given in Table 136 of [VVC] for the | to the value of MaxLumaSr given in Table 136 of [VVC] for the | |||
| highest level. | highest level. | |||
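As a rough illustration of how a sender might use max-lsr, the sketch below derives an upper bound on picture rate from a luma sample rate and a picture size. The function name `max_picture_rate` is hypothetical, the numeric value in the usage note is illustrative rather than taken from Table 136 of [VVC], and other level limits (such as picture size or bit rate) are deliberately ignored.

```python
def max_picture_rate(max_lsr: int, width: int, height: int) -> float:
    """Upper bound on pictures per second for a given luma sample rate.

    max_lsr is in luma samples per second; width and height are the luma
    picture dimensions in samples. Only the sample-rate limit is considered.
    """
    if max_lsr <= 0 or width <= 0 or height <= 0:
        raise ValueError("all arguments must be positive")
    return max_lsr / (width * height)
```

With an illustrative max-lsr of 124416000, a 1920x1080 picture could be sent at up to 60 pictures per second under this simplified bound.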
| skipping to change at page 44, line 27 ¶ | skipping to change at page 44, line 27 ¶ | |||
| constraint on maximum picture rate for all resolutions. | constraint on maximum picture rate for all resolutions. | |||
| Informative note: The max-fps parameter is semantically | Informative note: The max-fps parameter is semantically | |||
| different from max-lsr in that max-fps is used to signal a | different from max-lsr in that max-fps is used to signal a | |||
| constraint, lowering the maximum picture rate from what is | constraint, lowering the maximum picture rate from what is | |||
| implied by other parameters. | implied by other parameters. | |||
| The encoder MUST use a picture rate equal to or less than this | The encoder MUST use a picture rate equal to or less than this | |||
| value. In cases where the max-fps parameter is absent, the | value. In cases where the max-fps parameter is absent, the | |||
| encoder is free to choose any picture rate according to the | encoder is free to choose any picture rate according to the | |||
| highest level and any signalled optional parameters. | highest level and any signaled optional parameters. | |||
| The value of max-fps MUST be smaller than or equal to the full | The value of max-fps MUST be smaller than or equal to the full | |||
| picture rate that is implied by the highest level and, when | picture rate that is implied by the highest level and, when | |||
| present, max-lsr. | present, max-lsr. | |||
| sprop-max-don-diff: | sprop-max-don-diff: | |||
| If there is no NAL unit naluA that is followed in transmission | If there is no NAL unit naluA that is followed in transmission | |||
| order by any NAL unit preceding naluA in decoding order (i.e., | order by any NAL unit preceding naluA in decoding order (i.e., | |||
| the transmission order of the NAL units is the same as the | the transmission order of the NAL units is the same as the | |||
| skipping to change at page 45, line 44 ¶ | skipping to change at page 45, line 44 ¶ | |||
| parameter is smaller than or equal to this parameter. | parameter is smaller than or equal to this parameter. | |||
| When not present, the value of depack-buf-cap is inferred to be | When not present, the value of depack-buf-cap is inferred to be | |||
| equal to 4294967295. The value of depack-buf-cap MUST be an | equal to 4294967295. The value of depack-buf-cap MUST be an | |||
| integer in the range of 1 to 4294967295, inclusive. | integer in the range of 1 to 4294967295, inclusive. | |||
| Informative note: depack-buf-cap indicates the maximum | Informative note: depack-buf-cap indicates the maximum | |||
| possible size of the de-packetization buffer of the receiver | possible size of the de-packetization buffer of the receiver | |||
| only, without allowing for network jitter. | only, without allowing for network jitter. | |||
| Encoding considerations: | ||||
| This type is only defined for transfer via RTP (RFC 3550). | ||||
| Security considerations: | ||||
| See Section 9 of RFC XXXX. | ||||
| Interoperability considerations: N/A | ||||
| Published specification: | ||||
| Please refer to RFC XXXX and its Section 13. | ||||
| Applications that use this media type: N/A | ||||
| Fragment identifier considerations: N/A | ||||
| Additional information: N/A | ||||
| Person & email address to contact for further information: | ||||
| Stephan Wenger (stewe@stewe.org) | ||||
| Intended usage: COMMON | ||||
| Restrictions on usage: N/A | ||||
| Author: See Authors' Addresses section of RFC XXXX. | ||||
| Change controller: | ||||
| IETF Audio/Video Transport Core Maintenance Working Group | ||||
| delegated from the IESG. | ||||
| 7.2. SDP Parameters | 7.2. SDP Parameters | |||
| The receiver MUST ignore any parameter unspecified in this memo. | The receiver MUST ignore any parameter unspecified in this memo. | |||
| 7.2.1. Mapping of Payload Type Parameters to SDP | 7.2.1. Mapping of Payload Type Parameters to SDP | |||
| The media type video/H266 string is mapped to fields in the Session | The media type video/H266 string is mapped to fields in the Session | |||
| Description Protocol (SDP) [RFC8866] as follows: | Description Protocol (SDP) [RFC8866] as follows: | |||
| * The media name in the "m=" line of SDP MUST be video. | * The media name in the "m=" line of SDP MUST be video. | |||
| skipping to change at page 46, line 17 ¶ | skipping to change at page 46, line 50 ¶ | |||
| * The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the | * The encoding name in the "a=rtpmap" line of SDP MUST be H266 (the | |||
| media subtype). | media subtype). | |||
| * The clock rate in the "a=rtpmap" line MUST be 90000. | * The clock rate in the "a=rtpmap" line MUST be 90000. | |||
| * The OPTIONAL parameters profile-id, tier-flag, sub-profile-id, | * The OPTIONAL parameters profile-id, tier-flag, sub-profile-id, | |||
| interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id, | interop-constraints, level-id, sprop-sublayer-id, sprop-ols-id, | |||
| recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max- | recv-sublayer-id, recv-ols-id, max-recv-level-id, max-lsr, max- | |||
| fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf- | fps, sprop-max-don-diff, sprop-depack-buf-bytes and depack-buf- | |||
| cap, when present, MUST be included in the "a=fmtp" line of SDP. | cap, when present, MUST be included in the "a=fmtp" line of SDP. | |||
| This parameter is expressed as a media type string, in the form of | The fmtp line is expressed as a media type string, in the form of | |||
| a semicolon-separated list of parameter=value pairs. | a semicolon-separated list of parameter=value pairs. | |||
| * The OPTIONAL parameters sprop-vps, sprop-sps, sprop-pps, sprop-sei, | * The OPTIONAL parameters sprop-vps, sprop-sps, sprop-pps, sprop-sei, | |||
| and sprop-dci, when present, MUST be included in the "a=fmtp" line | and sprop-dci, when present, MUST be included in the "a=fmtp" line | |||
| of SDP or conveyed using the "fmtp" source attribute as specified | of SDP or conveyed using the "fmtp" source attribute as specified | |||
| in Section 6.3 of [RFC5576]. For a particular media format (i.e., | in Section 6.3 of [RFC5576]. For a particular media format (i.e., | |||
| RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or | RTP payload type), sprop-vps, sprop-sps, sprop-pps, sprop-sei, or | |||
| sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP | sprop-dci MUST NOT be both included in the "a=fmtp" line of SDP | |||
| and conveyed using the "fmtp" source attribute. When included in | and conveyed using the "fmtp" source attribute. When included in | |||
| the "a=fmtp" line of SDP, those parameters are expressed as a | the "a=fmtp" line of SDP, those parameters are expressed as a | |||
| media type string, in the form of a semicolon-separated list of | media type string, in the form of a semicolon-separated list of | |||
| parameter=value pairs. When conveyed in the "a=fmtp" line of SDP | parameter=value pairs. When conveyed in the "a=fmtp" line of SDP | |||
| for a particular payload type, the parameters sprop-vps, sprop- | for a particular payload type, the parameters sprop-vps, sprop- | |||
| sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each | sps, sprop-pps, sprop-sei, and sprop-dci MUST be applied to each | |||
| SSRC with the payload type. When conveyed using the "fmtp" source | SSRC with the payload type. When conveyed using the "fmtp" source | |||
| attribute, these parameters are only associated with the given | attribute, these parameters are only associated with the given | |||
| source and payload type as parts of the "fmtp" source attribute. | source and payload type as parts of the "fmtp" source attribute. | |||
| An example of media representation in SDP is as follows: | Informative note: Conveyance of sprop-vps, sprop-sps, and | |||
| sprop-pps using the "fmtp" source attribute allows for out-of- | ||||
| band transport of parameter sets in topologies like Topo-Video- | ||||
| switch-MCU as specified in [RFC7667]. | ||||
| A general usage of media representation in SDP is as follows: | ||||
| m=video 49170 RTP/AVP 98 | m=video 49170 RTP/AVP 98 | |||
| a=rtpmap:98 H266/90000 | a=rtpmap:98 H266/90000 | |||
| a=fmtp:98 profile-id=1; | a=fmtp:98 profile-id=1; | |||
| sprop-vps=<video parameter sets data>; | sprop-vps=<video parameter sets data>; | |||
| sprop-sps=<sequence parameter set data>; | sprop-sps=<sequence parameter set data>; | |||
| sprop-pps=<picture parameter set data>; | sprop-pps=<picture parameter set data>; | |||
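A receiver-side reading of the "a=fmtp" line as a semicolon-separated list of parameter=value pairs could look like the minimal sketch below. The function name `parse_fmtp` is hypothetical, and the sketch assumes the line has already been unwrapped into a single string.

```python
from typing import Dict, Tuple


def parse_fmtp(line: str) -> Tuple[int, Dict[str, str]]:
    """Split an 'a=fmtp:' line into (payload type, {parameter: value})."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type_text, _, params_text = line[len(prefix):].strip().partition(" ")
    params: Dict[str, str] = {}
    for item in params_text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so values that themselves contain '='
        # (for example padded base64 data) stay intact.
        name, separator, value = item.partition("=")
        if separator:
            params[name.strip()] = value.strip()
    return int(payload_type_text), params
```

Parameters that the application does not recognize can simply be left unused in the returned dictionary, which is one way to ignore parameters unspecified in the memo.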
| A SIP Offer/Answer exchange wherein both parties are expected to both | ||||
| send and receive could look like the following. Only the media | ||||
| codec-specific parts of the SDP are shown. Some lines are wrapped | ||||
| due to text constraints. | ||||
| Offerer->Answerer: | ||||
| m=video 49170 RTP/AVP 98 | ||||
| a=rtpmap:98 H266/90000 | ||||
| a=fmtp:98 profile-id=1; level-id=83; | ||||
| The above represents an offer for symmetric video communication using | ||||
| [VVC] and its payload specification, at the main profile and level | ||||
| 5.1 (and, as the levels are downgradable, all lower levels). | ||||
| Informally speaking, this offer tells the receiver of the offer that | ||||
| the sender is willing to receive up to 4Kp60 resolution at the | ||||
| maximum bitrates specified in [VVC]. At the same time, if this offer | ||||
| were accepted "as is", the offerer can expect that the answerer would | ||||
| be able to receive and properly decode H.266 media up to and | ||||
| including level 5.1. | ||||
| Answerer->Offerer: | ||||
| m=video 49170 RTP/AVP 98 | ||||
| a=rtpmap:98 H266/90000 | ||||
| a=fmtp:98 profile-id=1; level-id=67 | ||||
| With this answer to the offer above, the system receiving the offer | ||||
| advises the offerer that it is incapable of handling H.266 at level | ||||
| 5.1 but is capable of decoding 1080p60. As H.266 video codecs must | ||||
| support decoding at all levels below the maximum level they | ||||
| implement, the resulting user experience would likely be that both | ||||
| systems send video at 1080p60. However, nothing prevents an encoder | ||||
| from further downgrading its sending to, for example, 720p30 if it | ||||
| were short of cycles or bandwidth, or for other reasons. | ||||
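The downgrade in the exchange above can be sketched as follows, assuming the case where the answer carries no recv-ols-id: the symmetric parameters are echoed unchanged and level-id is lowered to what the answerer supports. The function name `build_answer` and the dictionary representation are hypothetical, and the sketch requires an explicit level-id in the offer rather than modeling inferred defaults.

```python
from typing import Dict

# Parameters that must carry the same value in offer and answer.
SYMMETRIC_PARAMETERS = ("profile-id", "tier-flag", "sub-profile-id", "interop-constraints")


def build_answer(offer: Dict[str, str], local_max_level_id: int) -> Dict[str, str]:
    """Echo symmetric parameters and downgrade level-id if necessary."""
    if "level-id" not in offer:
        raise ValueError("this sketch expects an explicit level-id in the offer")
    answer = {name: offer[name] for name in SYMMETRIC_PARAMETERS if name in offer}
    # The answer's level must be equal to or lower than the offered level.
    answer["level-id"] = str(min(int(offer["level-id"]), local_max_level_id))
    return answer
```

Called with an offer of profile-id=1 and level-id=83 and a local maximum of 67, the sketch yields profile-id=1 and level-id=67, matching the answer shown above.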
| 7.2.2. Usage with SDP Offer/Answer Model | 7.2.2. Usage with SDP Offer/Answer Model | |||
| This section describes the negotiation of unicast messages using the | This section describes the negotiation of unicast messages using the | |||
| offer-answer model as described in [RFC3264] and its updates. The | offer-answer model as described in [RFC3264] and its updates. The | |||
| section is split into subsections, covering a) media format | section is split into subsections, covering a) media format | |||
| configurations not involving non-temporal scalability; b) scalable | configurations not involving non-temporal scalability; b) scalable | |||
| media format configurations; c) the description of the use of those | media format configurations; c) the description of the use of those | |||
| parameters not involving the media configuration itself but rather | parameters not involving the media configuration itself but rather | |||
| the parameters of the payload format design; and d) multicast. | the parameters of the payload format design; and d) multicast. | |||
| skipping to change at page 54, line 23 ¶ | skipping to change at page 56, line 23 ¶ | |||
| acceptable for the sender to receive bitstreams. In order to achieve | acceptable for the sender to receive bitstreams. In order to achieve | |||
| high interoperability levels, it is often advisable to offer multiple | high interoperability levels, it is often advisable to offer multiple | |||
| alternative configurations. It is impossible to offer multiple | alternative configurations. It is impossible to offer multiple | |||
| configurations in a single payload type. Thus, when multiple | configurations in a single payload type. Thus, when multiple | |||
| configuration offers are made, each offer requires its own RTP | configuration offers are made, each offer requires its own RTP | |||
| payload type associated with the offer. However, it is possible to | payload type associated with the offer. However, it is possible to | |||
| offer multiple operation points using one configuration in a single | offer multiple operation points using one configuration in a single | |||
| payload type by including sprop-vps in the offer and recv-ols-id in | payload type by including sprop-vps in the offer and recv-ols-id in | |||
| the answer. | the answer. | |||
| A receiver SHOULD understand all media type parameters, even if it | An implementation SHOULD be able to understand all media type | |||
| only supports a subset of the payload format's functionality. This | parameters (including all optional media type parameters), even if it | |||
| ensures that a receiver is capable of understanding when an offer to | doesn't support the functionality related to the parameter. This, in | |||
| receive media can be downgraded to what is supported by the receiver | conjunction with proper application logic in the implementation, | |||
| of the offer. | allows the implementation, after having received an offer, to create | |||
| an answer by potentially downgrading one or more of the optional | ||||
| parameters to the point where the implementation can cope, leading to | ||||
| higher chances of interoperability beyond the most basic interop | ||||
| points (for which, as described above, no optional parameters are | ||||
| necessary). | ||||
| Informative note: in implementations of previous H.26x payload | ||||
| formats it was occasionally observed that implementations were | ||||
| incapable of parsing most (or all) of the optional parameters. As | ||||
| a result, the offer-answer exchange resulted in a baseline | ||||
| performance (using the default values for the optional parameters) | ||||
| with the resulting suboptimal user experience. However, there are | ||||
| valid reasons to forego the implementation complexity of | ||||
| implementing the parsing of some or all of the optional | ||||
| parameters, for example, when there is pre-determined knowledge, | ||||
| not negotiated by an SDP-based offer/answer process, of the | ||||
| capabilities of the involved systems (walled gardens, baseline | ||||
| requirements defined in application standards higher up in the | ||||
| stack, and similar). | ||||
| An answerer MAY extend the offer with additional media format | An answerer MAY extend the offer with additional media format | |||
| configurations. However, to enable their usage, in most cases a | configurations. However, to enable their usage, in most cases a | |||
| second offer is required from the offerer to provide the bitstream | second offer is required from the offerer to provide the bitstream | |||
| property parameters that the media sender will use. This also has | property parameters that the media sender will use. This also has | |||
| the effect that the offerer has to be able to receive this media | the effect that the offerer has to be able to receive this media | |||
| format configuration, not only to send it. | format configuration, not only to send it. | |||
| 7.2.2.4. Multicast | 7.2.2.4. Multicast | |||
| skipping to change at page 58, line 8 ¶ | skipping to change at page 60, line 29 ¶ | |||
| related documents, especially those pertaining to RTP (see the | related documents, especially those pertaining to RTP (see the | |||
| Security Considerations section in [RFC3550]), and the security of | Security Considerations section in [RFC3550]), and the security of | |||
| the call-control stack chosen (that may make use of the media type | the call-control stack chosen (that may make use of the media type | |||
| registration of this memo). Implementers should also consider known | registration of this memo). Implementers should also consider known | |||
| security vulnerabilities of video coding and decoding implementations | security vulnerabilities of video coding and decoding implementations | |||
| in general and avoid those. | in general and avoid those. | |||
| Within this RTP payload format, and with the exception of the user | Within this RTP payload format, and with the exception of the user | |||
| data SEI message as described below, no security threats other than | data SEI message as described below, no security threats other than | |||
| those common to RTP payload formats are known. In other words, | those common to RTP payload formats are known. In other words, | |||
| neither the various media-plane-based mechanisms, nor the signalling | neither the various media-plane-based mechanisms, nor the signaling | |||
| part of this memo, seems to pose a security risk beyond those common | part of this memo, seems to pose a security risk beyond those common | |||
| to all RTP-based systems. | to all RTP-based systems. | |||
| RTP packets using the payload format defined in this specification | RTP packets using the payload format defined in this specification | |||
| are subject to the security considerations discussed in the RTP | are subject to the security considerations discussed in the RTP | |||
| specification [RFC3550], and in any applicable RTP profile such as | specification [RFC3550], and in any applicable RTP profile such as | |||
| RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ | RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or RTP/ | |||
| SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP | SAVPF [RFC5124]. However, as "Securing the RTP Framework: Why RTP | |||
| Does Not Mandate a Single Media Security Solution" [RFC7202] | Does Not Mandate a Single Media Security Solution" [RFC7202] | |||
| discusses, it is not an RTP payload format's responsibility to | discusses, it is not an RTP payload format's responsibility to | |||
| skipping to change at page 59, line 23 ¶ | skipping to change at page 61, line 43 ¶ | |||
| 10. Congestion Control | 10. Congestion Control | |||
| Congestion control for RTP SHALL be used in accordance with RTP | Congestion control for RTP SHALL be used in accordance with RTP | |||
| [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551]. | [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551]. | |||
| If best-effort service is being used, an additional requirement is | If best-effort service is being used, an additional requirement is | |||
| that users of this payload format MUST monitor packet loss to ensure | that users of this payload format MUST monitor packet loss to ensure | |||
| that the packet loss rate is within an acceptable range. Packet loss | that the packet loss rate is within an acceptable range. Packet loss | |||
| is considered acceptable if a TCP flow across the same network path, | is considered acceptable if a TCP flow across the same network path, | |||
| and experiencing the same network conditions, would achieve an | and experiencing the same network conditions, would achieve an | |||
| average throughput, measured on a reasonable timescale, that is not | average throughput, measured on a reasonable timescale, that is not | |||
| less than all RTP streams combined are achieving. This condition can | less than all RTP streams combined are achieved. This condition can | |||
| be satisfied by implementing congestion-control mechanisms to adapt | be satisfied by implementing congestion-control mechanisms to adapt | |||
| the transmission rate, the number of layers subscribed for a layered | the transmission rate, the number of layers subscribed for a layered | |||
| multicast session, or by arranging for a receiver to leave the | multicast session, or by arranging for a receiver to leave the | |||
| session if the loss rate is unacceptably high. | session if the loss rate is unacceptably high. | |||
| The bitrate adaptation necessary for obeying the congestion control | The bitrate adaptation necessary for obeying the congestion control | |||
| principle is easily achievable when real-time encoding is used, for | principle is easily achievable when real-time encoding is used, for | |||
| example, by adequately tuning the quantization parameter. However, | example, by adequately tuning the quantization parameter. However, | |||
| when pre-encoded content is being transmitted, bandwidth adaptation | when pre-encoded content is being transmitted, bandwidth adaptation | |||
| requires the pre-coded bitstream to be tailored for such adaptivity. | requires the pre-coded bitstream to be tailored for such adaptivity. | |||
| skipping to change at page 63, line 16 ¶ | skipping to change at page 65, line 38 ¶ | |||
| Framework: Why RTP Does Not Mandate a Single Media | Framework: Why RTP Does Not Mandate a Single Media | |||
| Security Solution", RFC 7202, DOI 10.17487/RFC7202, April | Security Solution", RFC 7202, DOI 10.17487/RFC7202, April | |||
| 2014, <https://www.rfc-editor.org/info/rfc7202>. | 2014, <https://www.rfc-editor.org/info/rfc7202>. | |||
| [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and | |||
| B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms | |||
| for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | for Real-Time Transport Protocol (RTP) Sources", RFC 7656, | |||
| DOI 10.17487/RFC7656, November 2015, | DOI 10.17487/RFC7656, November 2015, | |||
| <https://www.rfc-editor.org/info/rfc7656>. | <https://www.rfc-editor.org/info/rfc7656>. | |||
| [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, | ||||
| DOI 10.17487/RFC7667, November 2015, | ||||
| <https://www.rfc-editor.org/info/rfc7667>. | ||||
| [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | [RFC7798] Wang, Y.-K., Sanchez, Y., Schierl, T., Wenger, S., and M. | |||
| M. Hannuksela, "RTP Payload Format for High Efficiency | M. Hannuksela, "RTP Payload Format for High Efficiency | |||
| Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | Video Coding (HEVC)", RFC 7798, DOI 10.17487/RFC7798, | |||
| March 2016, <https://www.rfc-editor.org/info/rfc7798>. | March 2016, <https://www.rfc-editor.org/info/rfc7798>. | |||
| [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., | [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., | |||
| and M. Stiemerling, Ed., "Real-Time Streaming Protocol | and M. Stiemerling, Ed., "Real-Time Streaming Protocol | |||
| Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December | Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December | |||
| 2016, <https://www.rfc-editor.org/info/rfc7826>. | 2016, <https://www.rfc-editor.org/info/rfc7826>. | |||
| Appendix A. Change History | Appendix A. Change History | |||
| To RFC Editor: PLEASE REMOVE THIS SECTION BEFORE PUBLICATION | ||||
| draft-zhao-payload-rtp-vvc-00 ........ initial version | draft-zhao-payload-rtp-vvc-00 ........ initial version | |||
| draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and | draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and | |||
| corrections | corrections | |||
| draft-ietf-payload-rtp-vvc-00 ........ initial WG draft | draft-ietf-payload-rtp-vvc-00 ........ initial WG draft | |||
| draft-ietf-payload-rtp-vvc-01 ........ VVC specification update | draft-ietf-payload-rtp-vvc-01 ........ VVC specification update | |||
| draft-ietf-payload-rtp-vvc-02 ........ VVC specification update | draft-ietf-payload-rtp-vvc-02 ........ VVC specification update | |||
| End of changes. 36 change blocks. | ||||
| 70 lines changed or deleted | 166 lines changed or added | |||
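
The acceptability test in Section 10 (Congestion Control) compares the combined rate of all RTP streams with the throughput a TCP flow would achieve on the same path. The draft does not prescribe how to estimate that TCP throughput. The sketch below assumes the simplified Mathis model, throughput ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p), purely for illustration. The function names, parameters, and sample values are hypothetical and do not come from the draft.

```python
import math


def tcp_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Rough TCP throughput estimate in bits per second.

    ASSUMPTION: uses the simplified Mathis model; the draft itself does not
    name any particular TCP throughput equation.
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if loss_rate <= 0:
        # No observed loss: the model places no upper bound on throughput.
        return math.inf
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


def loss_is_acceptable(rtp_stream_bps, mss_bytes, rtt_s, loss_rate) -> bool:
    """True if the estimated TCP throughput is not less than the combined
    bitrate of all RTP streams (the condition described in Section 10)."""
    return tcp_throughput_bps(mss_bytes, rtt_s, loss_rate) >= sum(rtp_stream_bps)


if __name__ == "__main__":
    # Hypothetical path: 1200-byte segments, 100 ms RTT, 1% packet loss.
    streams = [500_000, 500_000]  # two RTP streams, bits per second
    print(loss_is_acceptable(streams, 1200, 0.1, 0.01))
```

When the check fails, a sender would adapt using the means the section lists: lowering the transmission rate, reducing the number of subscribed layers, or having the receiver leave the session.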