idnits 2.17.1 

draft-ietf-payload-rtp-h265-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 153 instances of weird spacing in the document.  Is it really
     formatted ragged-right, rather than justified?

  ** There are 11 instances of too long lines in the document, the longest
     one being 14 characters in excess of 72.

  ** The abstract seems to contain references ([HEVC]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 27 has weird spacing: '...   at  any  ti...'

  == Line 30 has weird spacing: '...   The  list  ...'

  == Line 45 has weird spacing: '...fo)  in  effec...'

  == Line 46 has weird spacing: '...ication  of  t...'

  == Line 47 has weird spacing: '...ly,  as  they ...'

  == (148 more instances...)

  -- The document date (July 1, 2013) is 3945 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '3GP' is mentioned on line 266, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 958

  == Missing Reference: 'RFC5117' is mentioned on line 2074, but not defined

  ** Obsolete undefined reference: RFC 5117 (Obsoleted by RFC 7667)

  == Missing Reference: 'RFC2326' is mentioned on line 2275, but not defined

  ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826)

  == Missing Reference: 'RFC2974' is mentioned on line 2276, but not defined

  == Missing Reference: 'RFC5583' is mentioned on line 2320, but not defined

  == Missing Reference: 'RFC3551' is mentioned on line 2480, but not defined

  == Missing Reference: 'RFC3711' is mentioned on line 2480, but not defined

  == Missing Reference: 'RFC5124' is mentioned on line 2481, but not defined

  == Missing Reference: 'I-D.ietf-avt-srtp-not-mandatory' is mentioned on
     line 2483, but not defined

  == Missing Reference: 'I-D.ietf-avtcore-rtp-security-options' is mentioned
     on line 2490, but not defined

  == Missing Reference: 'RFC 3711' is mentioned on line 2506, but not defined

  == Missing Reference: 'RFC 3551' is mentioned on line 2530, but not defined

  == Unused Reference: 'RFC6051' is defined on line 2611, but no explicit
     reference was found in the text

  == Unused Reference: '3GPPFF' is defined on line 2651, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5109' is defined on line 2667, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)


     Summary: 6 errors (**), 0 flaws (~~), 23 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Y.-K. Wang
2	Internet Draft                                                 Qualcomm
3	Intended status: Standards track                             Y. Sanchez
4	Expires: January 2014                                        T. Schierl
5	                                                         Fraunhofer HHI
6	                                                              S. Wenger
7	                                                                  Vidyo
8	                                                       M. M. Hannuksela
9	                                                                  Nokia
10	                                                           July 1, 2013

12	            RTP Payload Format for High Efficiency Video Coding
13	                    draft-ietf-payload-rtp-h265-00.txt

15	Status of this Memo

17	   This Internet-Draft is submitted to IETF in full conformance with
18	   the provisions of BCP 78 and BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF), its areas, and its working groups.  Note that
22	   other groups may also distribute working documents as Internet-
23	   Drafts.

25	   Internet-Drafts are draft documents valid for a maximum of six
26	   months and may be updated, replaced, or obsoleted by other documents
27	   at  any  time.    It  is  inappropriate  to  use  Internet-Drafts  as
28	   reference material or to cite them other than as "work in progress."

30	   The  list  of  current  Internet-Drafts  can  be  accessed  at
31	   http://www.ietf.org/ietf/1id-abstracts.txt.

33	   The list of Internet-Draft Shadow Directories can be accessed at
34	   http://www.ietf.org/shadow.html.

36	   This Internet-Draft will expire on December 11, 2013.

38	Copyright and License Notice

40	   Copyright (c) 2013 IETF Trust and the persons identified as the
41	   document authors.  All rights reserved.

43	   This document is subject to BCP 78 and the IETF Trust's Legal
44	   Provisions         Relating         to         IETF         Documents
45	   (http://trustee.ietf.org/license-info)  in  effect  on  the  date  of
46	   publication  of  this  document.  Please  review  these  documents
47	   carefully,  as  they  describe  your  rights  and  restrictions  with
48	   respect  to  this  document.  Code  Components  extracted  from  this
49	   document must include Simplified BSD License text as described in
50	   Section 4.e of the Trust Legal Provisions and are provided without
51	   warranty as described in the Simplified BSD License.

53	Abstract

55	   This memo describes an RTP payload format for the video coding
56	   standard  ITU-T  Recommendation  H.265  and  ISO/IEC  International
57	   Standard 23008-2, both also known as High Efficiency Video Coding
58	   (HEVC) [HEVC], developed by the Joint Collaborative Team on Video
59	   Coding (JCT-VC).  The RTP payload format allows for packetization of
60	   one or more Network Abstraction Layer (NAL) units in each RTP packet
61	   payload, as well as fragmentation of a NAL unit into multiple RTP
62	   packets.  Furthermore, it supports transmission of an HEVC stream
63	   over a single as well as multiple RTP flows.  The payload format has
64	   wide applicability in videoconferencing, Internet video streaming,
65	   and high bit-rate entertainment-quality video, among others.

67	Table of Contents

69	   Status of this Memo
70	   Abstract
71	   Table of Contents
72	   1. Introduction
73	      1.1. Overview of the HEVC Codec
74	         1.1.1 Coding-Tool Features
75	         1.1.2 Systems and Transport Interfaces
76	         1.1.3 Parallel Processing Support
77	         1.1.4 NAL Unit Header
78	      1.2. Overview of the Payload Format
79	   2. Conventions
80	   3. Definitions and Abbreviations
81	      3.1 Definitions
82	         3.1.1 Definitions from the HEVC Specification
83	         3.1.2 Definitions Specific to This Memo
84	      3.2 Abbreviations
85	   4. RTP Payload Format
86	      4.1 RTP Header Usage
87	      4.2 Payload Structures
88	      4.3 Transmission Modes
89	      4.4 Decoding Order Number
90	      4.5 Single NAL Unit Packets
91	      4.6 Aggregation Packets (APs)
92	      4.7 Fragmentation Units (FUs)
93	   5. Packetization Rules
94	   6. De-packetization Process
95	   7. Payload Format Parameters
96	      7.1 Media Type Registration
97	      7.2 SDP Parameters
98	         7.2.1 Mapping of Payload Type Parameters to SDP
99	         7.2.2 Usage with SDP Offer/Answer Model
100	         7.2.3 Usage in Declarative Session Descriptions
101	         7.2.4 Dependency Signaling in Multi-Session Transmission
102	   8. Use with Feedback Messages
103	      8.1 Definition of the SPLI Feedback Message
104	      8.2 Use of HEVC with the RPSI Feedback Message
105	      8.3 Use of HEVC with the SPLI Feedback Message
106	   9. Security Considerations
107	   10. Congestion Control
108	   11. IANA Consideration
109	   12. Acknowledgements
110	   13. References
111	      13.1 Normative References
112	      13.2 Informative References
113	   14. Authors' Addresses

115	1. Introduction

117	1.1. Overview of the HEVC Codec

119	   High Efficiency Video Coding [HEVC], formally known as ITU-T
120	   Recommendation H.265 and ISO/IEC International Standard 23008-2, was
121	   ratified by ITU-T in April 2013 and reportedly provides significant
122	   coding efficiency gains over H.264 [H.264].

124	   As both H.264 [H.264] and its RTP payload format [RFC6184] are
125	   widely deployed and generally known in the relevant implementer
126	   communities,  frequently  only  the  differences  between  those  two
127	   specifications are highlighted in non-normative, explanatory parts
128	   of this memo.  Basic familiarity with both specifications is assumed
129	   for those parts.  However, the normative parts of this memo do not
130	   require study of H.264 or its RTP payload format.

132	   H.264  and  HEVC  share  a  similar  hybrid  video  codec  design.
133	   Conceptually, both technologies include a video coding layer (VCL),
134	   which is often used to refer to the coding-tool features, and a
135	   network abstraction layer (NAL), which is often used to refer to the
136	   systems and transport interface aspects of the codecs.

138	1.1.1 Coding-Tool Features

140	   Similarly to earlier hybrid-video-coding-based standards, including
141	   H.264, the following basic video coding design is employed by HEVC.
142	   A prediction signal is first formed either by intra or motion
143	   compensated prediction, and the residual (the difference between the
144	   original and the prediction) is then coded.  The gains in coding
145	   efficiency are achieved by redesigning and improving almost all
146	   parts of the codec over earlier designs.  In addition, HEVC includes
147	   several tools to make the implementation on parallel architectures
148	   easier.  Below is a summary of HEVC coding-tool features.

150	   Quad-tree block and transform structure

152	   One of the major tools that contribute significantly to the coding
153	   efficiency of HEVC is the usage of flexible coding blocks and
154	   transforms, which are defined in a hierarchical quad-tree manner.
155	   Unlike H.264, where the basic coding block is a macroblock of fixed
156	   size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size
157	   of 64x64.  Each CTU can be divided into smaller units in a
158	   hierarchical quad-tree manner and can represent smaller blocks down
159	   to size 4x4.  Similarly, the transforms used in HEVC can have
160	   different sizes, starting from 4x4 and going up to 32x32.  Utilizing
161	   large blocks and transforms contributes to the major gain of HEVC,
162	   especially at high resolutions.
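
   The hierarchical quad-tree division described above can be
   illustrated with a minimal sketch.  The function name, the split
   rule, and the minimum block size of 8 are assumptions made for
   illustration only; a real encoder decides each split by
   rate-distortion optimization, which is not modeled here.

```python
def split_quadtree(x, y, size, min_size, should_split):
    """Return the leaf blocks (x, y, size) of a quad-tree rooted at (x, y)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_quadtree(x + dx, y + dy, half,
                                         min_size, should_split)
        return leaves
    return [(x, y, size)]


# Hypothetical split rule for illustration: refine only the top-left corner
# of a 64x64 CTU, stopping at 8x8 blocks.
leaves = split_quadtree(0, 0, 64, 8, lambda x, y, size: x == 0 and y == 0)
for leaf in leaves:
    print(leaf)
```

   Whatever split rule is used, the leaves always tile the CTU
   exactly, which is what allows large blocks in flat regions and
   small blocks around detail within the same CTU.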

164	   Entropy coding

166	   HEVC uses a single entropy coding engine, which is based on Context
167	   Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two
168	   distinct  entropy  coding  engines.    CABAC  in  HEVC  shares  many
169	   similarities with CABAC of H.264, but contains several improvements.
170	   Those  include  improvements  in  coding  efficiency  and  lowered
171	   implementation complexity, especially for parallel architectures.

173	   In-loop filtering

175	   H.264 includes an in-loop adaptive deblocking filter, where the
176	   blocking artifacts around the transform edges in the reconstructed
177	   picture are smoothed to improve the picture quality and compression
178	   efficiency.  In HEVC, a similar deblocking filter is employed but
179	   with somewhat lower complexity.  In addition, pictures undergo a
180	   subsequent filtering operation called Sample Adaptive Offset (SAO),
181	   which is a new design element in HEVC.  SAO basically adds a pixel-
182	   level offset in an adaptive manner and usually acts as a de-ringing
183	   filter.  It is observed that SAO improves the picture quality,
184	   especially around sharp edges, contributing substantially to visual
185	   quality improvements of HEVC.

187	   Motion prediction and coding

189	   There have been a number of improvements in this area that are
190	   summarized as follows.  The first category is motion merge and
191	   advanced  motion  vector  prediction  (AMVP)  modes.    The  motion
192	   information of a prediction block can be inferred from the spatially
193	   or temporally neighboring blocks.  This is similar to the DIRECT
194	   mode in H.264 but includes new aspects to incorporate the flexible
195	   quad-tree   structure   and   methods   to   improve   the   parallel
196	   implementations.  In addition, the motion vector predictor can be
197	   signaled for improved efficiency.  The second category is high-
198	   precision  interpolation.    The  interpolation  filter  length  is
199	   increased to 8-tap from 6-tap, which improves the coding efficiency
200	   but  also  comes  with  increased  complexity.    In  addition,  the
201	   interpolation filter is defined with higher precision without any
202	   intermediate  rounding  operations  to  further  improve  the  coding
203	   efficiency.

205	   Intra prediction and intra coding

207	   Compared to 8 intra prediction modes in H.264, HEVC supports angular
208	   intra prediction with 33 directions.  This increased flexibility
209	   improves both objective coding efficiency and visual quality as the
210	   edges can be better predicted and ringing artifacts around the edges
211	   can be reduced.  In addition, the reference samples are adaptively
212	   smoothed based on the prediction direction.  To avoid contouring
213	   artifacts, a new interpolative prediction generation is included to
214	   improve the visual quality.  Furthermore, discrete sine transform
215	   (DST) is utilized instead of traditional discrete cosine transform
216	   (DCT) for 4x4 intra transform blocks.

218	   Other coding-tool features

220	   HEVC includes some tools for lossless coding and efficient screen
221	   content coding, such as skipping the transform for certain blocks.
222	   These tools are particularly useful, for example, when streaming
223	   the user interface of a mobile device to a large display.

225	1.1.2 Systems and Transport Interfaces

227	   HEVC inherited the basic systems and transport interface designs of
228	   H.264, such as the NAL-unit-based syntax structure, the hierarchical
229	   syntax and data unit structure (from sequence-level parameter sets,
230	   multi-picture-level or picture-level parameter sets, and slice-level
231	   header parameters down to lower-level parameters), the supplemental
232	   enhancement information (SEI) message mechanism, and the
233	   hypothetical reference decoder (HRD) based video buffering model.
234	   The differences in these aspects compared to H.264 are summarized
235	   below.

237	   Video parameter set

239	   A new type of parameter set, called video parameter set (VPS), was
240	   introduced.  For the first (2013) version of [HEVC], the video
241	   parameter set NAL unit is required to be available prior to its
242	   activation, while the information contained in the video parameter
243	   set is not necessary for operation of the decoding process.  For
244	   future HEVC extensions, such as the 3D or scalable extensions, the
245	   video parameter set is expected to include information necessary for
246	   operation of the decoding process, e.g. decoding dependency or
247	   information for reference picture set construction of enhancement
248	   layers.  The VPS provides a "big picture" of a bitstream, including
249	   what types of operation points are provided, the profile, tier, and
250	   level of the operation points, and some other high-level properties
251	   of  the  bitstream  that  can  be  used  as  the  basis  for  session
252	   negotiation and content selection, etc. (see section 7.1).

254	   Profile, tier and level

256	   The profile, tier and level syntax structure that can be included in
257	   both VPS and sequence parameter set (SPS) includes 12 bytes of data
258	   to describe the entire bitstream (including all temporally scalable
259	   layers,  which  are  referred  to  as  sub-layers  in  the  HEVC
260	   specification), and can optionally include more profile, tier and
261	   level  information  pertaining  to  individual  temporally  scalable
262	   layers.  The profile indicator indicates the "best viewed as"
263	   profile when the bitstream conforms to multiple profiles, similar to
264	   the major brand concept in the ISO base media file format (ISOBMFF)
265	   [ISOBMFF] and file formats derived based on ISOBMFF, such as the
266	   3GPP  file  format  [3GP].    The  profile,  tier  and  level  syntax
267	   structure also includes the indications of whether the bitstream is
268	   free of frame-packed content, whether the bitstream is free of
269	   interlaced source content and free of field pictures, i.e., contains
270	   only frame pictures of progressive source, such that clients/players
271	   with no support of post-processing functionalities for handling of
272	   frame-packed or interlaced source content or field pictures can
273	   reject those bitstreams.
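
   As an illustration of how a client might use the 12 bytes
   described above, the following sketch parses them and checks the
   source and constraint indications.  The bit layout and field names
   follow the general profile, tier and level fields of the HEVC
   specification rather than this memo; the function names are
   hypothetical, and the input is assumed to be the 12 bytes already
   extracted from a VPS or SPS with emulation prevention removed.

```python
def parse_general_ptl(data):
    """Parse the 12-byte general profile, tier and level fields."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    flags = data[5]
    return {
        "profile_space": data[0] >> 6,            # 2 bits
        "tier_flag": (data[0] >> 5) & 1,          # 1 bit
        "profile_idc": data[0] & 0x1F,            # 5 bits
        "profile_compatibility_flags": int.from_bytes(data[1:5], "big"),
        "progressive_source_flag": (flags >> 7) & 1,
        "interlaced_source_flag": (flags >> 6) & 1,
        "non_packed_constraint_flag": (flags >> 5) & 1,
        "frame_only_constraint_flag": (flags >> 4) & 1,
        # The next 44 bits are reserved and skipped here.
        "level_idc": data[11],
    }


def needs_post_processing(ptl):
    """True unless the stream is marked progressive, frame-only, non-packed."""
    plain = (ptl["progressive_source_flag"] == 1
             and ptl["interlaced_source_flag"] == 0
             and ptl["non_packed_constraint_flag"] == 1
             and ptl["frame_only_constraint_flag"] == 1)
    return not plain


# Illustrative input: Main profile, Main tier, level_idc 93.
example = bytes.fromhex("01 60 00 00 00 b0 00 00 00 00 00 5d")
ptl = parse_general_ptl(example)
print(ptl["profile_idc"], ptl["level_idc"], needs_post_processing(ptl))
```

   A player without support for frame-packed or interlaced content
   could reject a stream for which needs_post_processing() returns
   True.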

275	   Bitstream and elementary stream

277	   HEVC includes a definition of an elementary stream, which is new
278	   compared to H.264.  An elementary stream consists of a sequence of
279	   one or more bitstreams.  An elementary stream that consists of two
280	   or more bitstreams has typically been formed by splicing together
281	   two or more bitstreams (or parts thereof).  When an elementary
282	   stream contains more than one bitstream, the last NAL unit of the
283	   last access unit of a bitstream (except the last bitstream in the
284	   elementary stream) must contain an end of bitstream NAL unit and the
285	   first access unit of the subsequent bitstream must be an intra
286	   random access point (IRAP) access unit.  This IRAP access unit may
287	   be a clean random access (CRA), broken link access (BLA), or
288	   instantaneous decoding refresh (IDR) access unit.

290	   Random access support

292	   HEVC includes signaling in NAL unit header, through NAL unit types,
293	   of IRAP pictures beyond IDR pictures.  Three types of IRAP pictures,
294	   namely IDR, CRA and BLA pictures are supported, wherein IDR pictures
295	   are conventionally referred to as closed group-of-pictures (closed-
296	   GOP) random access points, and CRA and BLA pictures are those
297	   conventionally referred to as open-GOP random access points.  BLA
298	   pictures usually originate from splicing of two bitstreams or part
299	   thereof at a CRA picture, e.g. during stream switching.  To enable
300	   better systems usage of IRAP pictures, altogether six different NAL
301	   units are defined to signal the properties of the IRAP pictures,
302	   which can be used to better match the stream access point (SAP)
303	   types as defined in the ISOBMFF [ISOBMFF], which are utilized for
304	   random access support in both 3GP-DASH [3GPDASH] and MPEG DASH
305	   [MPEGDASH].  Pictures following an IRAP picture in decoding order
306	   and preceding the IRAP picture in output order are referred to as
307	   leading pictures associated with the IRAP picture.  There are two
308	   types of leading pictures, namely random access decodable leading
309	   (RADL) pictures and random access skipped leading (RASL) pictures.
310	   RADL  pictures  are  decodable  when  the  decoding  started  at  the
311	   associated IRAP picture, and RASL pictures are not decodable when
312	   the decoding started at the associated IRAP picture and are usually
313	   discarded.  HEVC provides mechanisms to enable the specification of
314	   conformance of bitstreams with RASL pictures being discarded, thus
315	   to provide a standard-compliant way to enable systems components to
316	   discard RASL pictures when needed.
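
   The handling of leading pictures described above can be sketched
   as follows.  The numeric NAL unit type values are taken from the
   HEVC specification, not from this section, and the input is
   simplified to one NAL unit type per picture in decoding order
   (non-VCL NAL units are not modeled).  The function name is
   hypothetical.

```python
BLA_TYPES = {16, 17, 18}          # BLA_W_LP, BLA_W_RADL, BLA_N_LP
IRAP_TYPES = set(range(16, 22))   # the BLA types, IDR_W_RADL, IDR_N_LP, CRA_NUT
RASL_TYPES = {8, 9}               # RASL_N, RASL_R


def drop_undecodable_pictures(picture_types):
    """Filter picture NAL unit types when decoding starts mid-stream.

    Pictures before the first IRAP picture are skipped.  RASL pictures
    are dropped when they follow the IRAP picture at which decoding
    starts, or any BLA picture.
    """
    kept = []
    started = False
    drop_rasl = False
    for nal_type in picture_types:
        if nal_type in IRAP_TYPES:
            drop_rasl = (not started) or nal_type in BLA_TYPES
            started = True
        elif not started:
            continue
        if nal_type in RASL_TYPES and drop_rasl:
            continue
        kept.append(nal_type)
    return kept


# Start at a CRA picture (21): its RASL picture (9) is dropped, its RADL
# picture (7) is kept.  The RASL picture of the second CRA is decodable.
print(drop_undecodable_pictures([21, 9, 7, 1, 21, 9, 1]))
```

   RASL pictures of a later CRA picture are kept in this sketch
   because the pictures they reference have been decoded by then.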

318	   Temporal scalability support

320	   HEVC includes improved support for temporal scalability, by
321	   inclusion of the signaling of TemporalId in the NAL unit header, the
322	   restriction that pictures of a particular temporal sub-layer cannot
323	   be used for inter prediction reference by pictures of a higher
324	   temporal sub-layer, the sub-bitstream extraction process, and the
325	   requirement  that  each  sub-bitstream  extraction  output  be  a
326	   conforming bitstream.  Media-aware network elements (MANEs) can
327	   utilize the TemporalId in the NAL unit header for stream adaptation
328	   purposes based on temporal scalability.
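
   A MANE's use of TemporalId can be illustrated with a minimal
   sketch.  It assumes the two-byte NAL unit header layout of the
   HEVC specification (1 forbidden bit, 6-bit NAL unit type, 6-bit
   layer ID, 3-bit TemporalId plus 1); the function names are
   hypothetical, and a conforming sub-bitstream extraction process
   involves more than this filter.

```python
def parse_nal_header(nal):
    """Decode the two-byte HEVC NAL unit header at the start of `nal`."""
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its header")
    hdr = int.from_bytes(nal[:2], "big")
    return {
        "forbidden_zero_bit": hdr >> 15,
        "nal_unit_type": (hdr >> 9) & 0x3F,
        "layer_id": (hdr >> 3) & 0x3F,
        "temporal_id": (hdr & 0x07) - 1,   # header carries TemporalId + 1
    }


def thin_temporal_layers(nal_units, max_temporal_id):
    """Keep only NAL units whose TemporalId does not exceed the target."""
    return [nal for nal in nal_units
            if parse_nal_header(nal)["temporal_id"] <= max_temporal_id]


# Illustrative headers only (payload bytes omitted).
stream = [
    b"\x40\x01",   # type 32 (VPS), TemporalId 0
    b"\x26\x01",   # type 19 (IDR_W_RADL), TemporalId 0
    b"\x02\x02",   # type 1 (TRAIL_R), TemporalId 1
    b"\x00\x03",   # type 0 (TRAIL_N), TemporalId 2
]
print(len(thin_temporal_layers(stream, 1)))
```

   Because pictures of a lower sub-layer never reference pictures of
   a higher sub-layer, the remaining NAL units stay decodable after
   the higher sub-layers are removed.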

330	   Temporal sub-layer switching support

332	   HEVC specifies, through NAL unit types present in the NAL unit
333	   header,  the  signaling  of  temporal  sub-layer  access  (TSA)  and
334	   stepwise temporal sub-layer access (STSA).  A TSA picture and
335	   pictures following the TSA picture in decoding order do not use
336	   pictures prior to the TSA picture in decoding order with TemporalId
337	   greater  than  or  equal  to  that  of  the  TSA  picture  for  inter
338	   prediction reference.  A TSA picture enables up-switching, at the
339	   TSA picture, to the sub-layer containing the TSA picture or any
340	   higher sub-layer, from the immediately lower sub-layer.  An STSA
341	   picture does not use pictures with the same TemporalId as the STSA
342	   picture for inter prediction reference. Pictures following an STSA
343	   picture in decoding order with the same TemporalId as the STSA
344	   picture do not use pictures prior to the STSA picture in decoding
345	   order with the same TemporalId as the STSA picture for inter
346	   prediction reference.  An STSA picture enables up-switching, at the
347	   STSA picture, to the sub-layer containing the STSA picture, from the
348	   immediately lower sub-layer.
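
   The up-switching rules for TSA and STSA pictures can be summarized
   in a short sketch.  The NAL unit type values come from the HEVC
   specification, not from this section; the function and parameter
   names are hypothetical.

```python
TSA_TYPES = {2, 3}     # TSA_N, TSA_R
STSA_TYPES = {4, 5}    # STSA_N, STSA_R


def can_switch_up(nal_unit_type, picture_tid, current_tid, target_tid):
    """Check whether a receiver decoding up to `current_tid` may start
    decoding up to `target_tid` at a picture with the given type and
    TemporalId."""
    # Both picture types only allow switching from the sub-layer
    # immediately below the one containing the picture.
    if current_tid != picture_tid - 1:
        return False
    if nal_unit_type in TSA_TYPES:
        # TSA: to the picture's own sub-layer or any higher one.
        return target_tid >= picture_tid
    if nal_unit_type in STSA_TYPES:
        # STSA: only to the picture's own sub-layer.
        return target_tid == picture_tid
    return False


print(can_switch_up(3, picture_tid=1, current_tid=0, target_tid=2))
print(can_switch_up(5, picture_tid=1, current_tid=0, target_tid=2))
```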

350	   Sub-layer reference or non-reference pictures

352	   The concept and signaling of reference/non-reference pictures in
353	   HEVC are different from H.264.  In H.264, if a picture may be used
354	   by any other picture for inter prediction reference, it is a
355	   reference picture; otherwise it is a non-reference picture, and this
356	   is signaled by two bits in the NAL unit header.  In HEVC, a picture
357	   is called a reference picture only when it is marked as "used for
358	   reference".  In addition, the concept of sub-layer reference picture
359	   was introduced.  If a picture may be used by another picture
360	   with the same TemporalId for inter prediction reference, it is a
361	   sub-layer  reference  picture;  otherwise  it  is  a  sub-layer  non-
362	   reference picture.  Whether a picture is a sub-layer reference
363	   picture or sub-layer non-reference picture is signaled through NAL
364	   unit type values.
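
   As a sketch of how this signaling can be read, the HEVC
   specification assigns even NAL unit type values in the range 0 to
   14 to sub-layer non-reference pictures and odd values in the range
   1 to 15 to sub-layer reference pictures.  That numbering is taken
   from the HEVC specification, not from this section, and the
   function name is hypothetical.

```python
def is_sublayer_non_reference(nal_unit_type):
    """True for TRAIL_N, TSA_N, STSA_N, RADL_N, RASL_N and RSV_VCL_N*."""
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0


for nal_type in (0, 1, 8, 9, 19):
    print(nal_type, is_sublayer_non_reference(nal_type))
```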

366	   Extensibility

368	   Besides the TemporalId in the NAL unit header, HEVC also includes
369	   the signaling of a six-bit layer ID in the NAL unit header, which
370	   must  be  equal  to  0  for  a  single-layer  bitstream.    Extension
371	   mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice
372	   headers, and so on.  All these extension mechanisms enable future
373	   extensions in a backward compatible manner, such that bitstreams
374	   encoded according to potential future HEVC extensions can be fed to
375	   then-legacy decoders (e.g. HEVC version 1 decoders) and the then-
376	   legacy decoders can decode and output the base layer bitstream.

378	   Bitstream extraction

380	   HEVC includes a bitstream extraction process as an integral part of
381	   the overall decoding process, as well as specification of the use of
382	   the  bitstream  extraction  process  in  description  of  bitstream
383	   conformance tests as part of the hypothetical reference decoder
384	   (HRD) specification.

386	   Reference picture management

388	   The  reference  picture  management  of  HEVC,  including  reference
389	   picture marking and removal from the decoded picture buffer (DPB) as
390	   well as reference picture list construction (RPLC), differs from
391	   that of H.264.  Instead of the sliding window plus adaptive memory
392	   management control operation (MMCO) based reference picture marking
393	   mechanism in H.264, HEVC specifies a reference picture set (RPS)
394	   based reference picture management and marking mechanism, and the
395	   RPLC is consequently based on the RPS mechanism.  A reference
396	   picture set consists of a set of reference pictures associated with
397	   a picture, consisting of all reference pictures that are prior to
398	   the associated picture in decoding order, that may be used for inter
399	   prediction of the associated picture or any picture following the
400	   associated picture in decoding order.  The reference picture set
401	   consists of five lists of reference pictures: RefPicSetStCurrBefore,
402	   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr, and
403	   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter, and
404	   RefPicSetLtCurr contain all reference pictures that may be used in
405	   inter prediction of the current picture and that may be used in
406	   inter prediction of one or more of the pictures following the
407	   current picture in decoding order.  RefPicSetStFoll and
408	   RefPicSetLtFoll consist of all reference pictures that are not used
409	   in inter prediction of the current picture but may be used in inter
410	   prediction of one or more of the pictures following the current
411	   picture in decoding order.  RPS provides an "intra-coded" signaling
412	   of the DPB status, instead of an "inter-coded" signaling, mainly for
413	   improved error resilience.  The RPLC process in HEVC is based on the
414	   RPS, by signaling an index to an RPS subset for each reference
415	   index.  The RPLC process has been simplified compared to that in
416	   H.264, by removal of the reference picture list modification (also
417	   referred to as reference picture list reordering) process.
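
   The "intra-coded" nature of RPS signaling can be illustrated with
   a minimal sketch: the set of pictures to retain in the DPB is
   derived entirely from the RPS of the current picture, without
   depending on marking commands received earlier.  Pictures are
   identified here by hypothetical integer picture order count
   values, and the class and function names are assumptions made for
   illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReferencePictureSet:
    st_curr_before: list = field(default_factory=list)
    st_curr_after: list = field(default_factory=list)
    st_foll: list = field(default_factory=list)
    lt_curr: list = field(default_factory=list)
    lt_foll: list = field(default_factory=list)

    def current(self):
        """Pictures that may be referenced by the current picture."""
        return self.st_curr_before + self.st_curr_after + self.lt_curr

    def following(self):
        """Pictures kept only for pictures later in decoding order."""
        return self.st_foll + self.lt_foll


def apply_rps(dpb, rps):
    """Return (kept, missing): DPB pictures that stay marked as used for
    reference, and RPS entries that are absent from the DPB."""
    wanted = set(rps.current()) | set(rps.following())
    return dpb & wanted, wanted - dpb


rps = ReferencePictureSet(st_curr_before=[4], st_curr_after=[8],
                          st_foll=[0], lt_foll=[16])
kept, missing = apply_rps({0, 2, 4, 8}, rps)
print(sorted(kept), sorted(missing))
```

   An entry reported as missing indicates a picture that was lost or
   never received, which a decoder can detect from the current
   picture alone.  This is the error-resilience benefit noted above.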

419	   Ultra low delay support

421	   HEVC specifies a sub-picture-level HRD operation, for support of the
422	   so-called ultra-low delay.  The mechanism specifies a standard-
423	   compliant way to enable delay reduction below one picture interval.
424	   Sub-picture-level coded picture buffer (CPB) and DPB parameters may
425	   be signaled, and utilization of this information for the derivation
426	   of CPB timing (wherein the CPB removal time corresponds to decoding
427	   time) and DPB output timing (display time) is specified.  Decoders
428	   are allowed to operate the HRD at the conventional access-unit-
429	   level, even when the sub-picture-level HRD parameters are present.

431	   New SEI messages

433	   HEVC inherits many H.264 SEI messages with changes in syntax and/or
434	   semantics making them applicable to HEVC.  The active parameter sets
435	   SEI message includes the IDs of the active video parameter set and
436	   the active sequence parameter set and can be used to activate VPSs
437	   and SPSs.  In addition, the SEI message includes the following
438	   indications: 1) An indication of whether "full random accessibility"
439	   is supported (when supported, all parameter sets needed for decoding
440	   of the remainder of the bitstream when random accessing from the
441	   beginning  of  the  current  coded  video  sequence  by  completely
442	   discarding all access units earlier in decoding order are present in
443	   the remaining bitstream and all coded pictures in the remaining
444	   bitstream can be correctly decoded); 2) An indication of whether
445	   there is no parameter set within the current coded video sequence
446	   that updates another parameter set of the same type preceding in
447	   decoding order.  An update of a parameter set refers to the use of
448	   the same parameter set ID but with some other parameters changed.
449	   If this property is true for all coded video sequences in the
450	   bitstream, then all parameter sets can be sent out-of-band before
451	   session start.  The region refresh information SEI message can be
452	   used together with the recovery point SEI message (present in both
453	   H.264 and HEVC) for improved support of gradual decoding refresh
454	   (GDR).  This supports random access from inter-coded pictures,
455	   wherein complete pictures can be correctly decoded or recovered
456	   after an indicated number of pictures in output/display order.

458	1.1.3 Parallel Processing Support

460	   The reportedly significantly higher computational demand of HEVC
461	   over H.264 (especially with respect to encoders, where a complexity
462	   increase of a factor of ten has often been reported), in conjunction
463	   with  the  ever  increasing  video  resolution  (both  spatially  and
464	   temporally) required by the market, led to the adoption of VCL
465	   coding tools specifically targeted to allow for parallelization on
466	   the sub-picture level.  That is, parallelization occurs, at the
467	   minimum, at the granularity of an integer number of CTUs.  The
468	   targets for this type of high-level parallelization are multicore
469	   CPUs and DSPs as well as multiprocessor systems.  In a system
470	   design, to be useful, these tools require signaling support, which
471	   is provided in Section 7 of this memo.  This section provides a
472	   brief overview of the tools available in [HEVC].

474	   Many of the tools incorporated in HEVC were designed keeping in mind
475	   the potential parallel implementations in multi-core/multi-processor
476	   architectures.  Specifically, for parallelization, four picture
477	   partition strategies are available.

479	   Slices are segments of the bitstream that can be reconstructed
480	   independently from other slices within the same picture (though
481	   there may still be interdependencies through loop filtering
482	   operations).  Slices are the only tool that can be used for
483	   parallelization that is also available, in virtually identical form,
484	   in H.264.  Slice-based parallelization does not require much inter-
485	   processor or inter-core communication (except for inter-processor or
486	   inter-core data sharing for motion compensation when decoding a
487	   predictively coded picture, which is typically much heavier than
488	   inter-processor or inter-core data sharing due to in-picture
489	   prediction), as slices are designed to be independently decodable.
490	   However, for the same reason, slices can require some coding
491	   overhead.  Further, slices (in contrast to some of the other tools
492	   mentioned below) also serve as the key mechanism for bitstream
493	   partitioning to match Maximum Transfer Unit (MTU) size requirements,
494	   due to the in-picture independence of slices and the fact that each
495	   regular slice is encapsulated in its own NAL unit.  In many cases,
496	   the goal of parallelization and the goal of MTU size matching can
497	   place contradicting demands on the slice layout in a picture.  The
498	   realization of this situation led to the development of the more
499	   advanced tools mentioned below.  This payload format does not
500	   contain any specific mechanisms aiding parallelization through
501	   slices.

503	   Dependent slice segments allow for fragmentation of a coded slice
504	   into fragments at CTU boundaries without breaking any in-picture
505	   prediction mechanism.  They are complementary to the fragmentation
506	   mechanism described in this memo in that they need the cooperation
507	   of the encoder.  As a dependent slice segment necessarily contains
508	   an integer number of CTUs, a decoder using multiple cores operating
509	   on CTUs can process a dependent slice segment without communicating
510	   parts of the slice segment's bitstream to other cores.
511	   Fragmentation, as specified in this memo, in contrast, does not
512	   guarantee that a fragment contains an integer number of CTUs.

514	   In wavefront parallel processing (WPP), the picture is partitioned
515	   into rows of CTUs.  Entropy decoding and prediction are allowed to
516	   use data from CTUs in other partitions.  Parallel processing is
517	   possible through parallel decoding of CTU rows, where the start of
518	   the decoding of a row is delayed by two CTUs, so as to ensure that
519	   data related to a CTU above and to the right of the subject CTU is
520	   available before the subject CTU is decoded.  Using this
521	   staggered start (which appears like a wavefront when represented
522	   graphically), parallelization is possible with up to as many
523	   processors/cores as the picture contains CTU rows.
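
   As a rough illustration of the staggered start described above, the
   following sketch computes the earliest time step at which each CTU
   could be decoded.  It assumes, purely for illustration, that every
   CTU takes one time unit and that each CTU row has its own core; the
   function name is hypothetical and not part of [HEVC] or this memo.

```python
def wpp_start_steps(ctu_rows, ctu_cols, delay=2):
    """Earliest decode step per CTU when each row starts `delay` CTUs
    after the row above it (illustrative model, one time unit per CTU)."""
    return [[col + delay * row for col in range(ctu_cols)]
            for row in range(ctu_rows)]


steps = wpp_start_steps(ctu_rows=3, ctu_cols=5)
for row in steps:
    print(row)
# [0, 1, 2, 3, 4]
# [2, 3, 4, 5, 6]
# [4, 5, 6, 7, 8]
```

   In this model, the CTU above and to the right of any CTU always has
   a smaller step value, which is the property the two-CTU delay is
   meant to guarantee.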

525	   Because in-picture prediction between neighboring CTU rows within a
526	   picture is allowed, the required inter-processor/inter-core
527	   communication to enable in-picture prediction can be substantial.
528	   The WPP partitioning does not result in the creation of more NAL
529	   units compared to when it is not applied, thus WPP cannot be used
530	   for MTU size matching, though slices can be used in combination for
531	   that purpose.

533	   Tiles define horizontal and vertical boundaries that partition a
534	   picture into tile columns and rows.  The scan order of CTUs is
535	   changed to be local within a tile (in the order of a CTU raster scan
536	   of a tile), before decoding the top-left CTU of the next tile in the
537	   order of tile raster scan of a picture.  Similar to slices, tiles
538	   break in-picture prediction dependencies (including entropy decoding
539	   dependencies).  However, they do not need to be included into
540	   individual NAL units (same as WPP in this regard), hence tiles
541	   cannot be used for MTU size matching, though slices can be used in
542	   combination for that purpose.  Each tile can be processed by one
543	   processor/core, and the inter-processor/inter-core communication
544	   required for in-picture prediction between processing units decoding
545	   neighboring tiles is limited to conveying the shared slice header in
546	   cases where a slice spans more than one tile, and loop filtering
547	   related sharing of reconstructed samples and metadata.  In this
548	   respect, tiles are less demanding in terms of inter-processor
549	   communication bandwidth compared to WPP due to the in-picture
550	   independence between two neighboring partitions.
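
   The changed CTU scan order can be sketched as follows.  The helper
   below is hypothetical; it takes tile column widths and tile row
   heights in CTUs (assumed inputs, normally derived from the picture
   parameter set) and lists picture raster-scan CTU addresses in the
   order in which they would be decoded with tiles.

```python
def tile_scan_order(col_widths, row_heights):
    """Return picture raster-scan CTU addresses in tile-scan order.

    col_widths / row_heights: tile column widths and tile row heights,
    both expressed in CTUs.
    """
    pic_width = sum(col_widths)
    order = []
    y0 = 0
    for height in row_heights:          # tiles in raster order: row by row
        x0 = 0
        for width in col_widths:        # left to right within a tile row
            for y in range(y0, y0 + height):    # raster scan inside the tile
                for x in range(x0, x0 + width):
                    order.append(y * pic_width + x)
            x0 += width
        y0 += height
    return order


# A 4x3 CTU picture split into 2 tile columns and 2 tile rows.
print(tile_scan_order([2, 2], [2, 1]))
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 10, 11]
```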

552	1.1.4 NAL Unit Header

554	   HEVC maintains the NAL unit concept of H.264 with modifications.
555	   HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
556	   payload of a NAL unit refers to the NAL unit excluding the NAL unit
557	   header.

559	                     +---------------+---------------+
560	                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
561	                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
562	                     |F|   Type    |  LayerId  | TID |
563	                     +-------------+-----------------+

565	              Figure 1 The structure of the HEVC NAL unit header

567	   The semantics of the fields in the NAL unit header are as specified
568	   in [HEVC] and described briefly below for convenience.  In addition
569	   to the name and size of each field, the corresponding syntax element
570	   name in [HEVC] is also provided.

572	   F: 1 bit
573	      forbidden_zero_bit.  MUST be zero.  HEVC declares a value of 1 as
574	      a syntax violation.  Note that the inclusion of this bit in the
575	      NAL unit header is to enable transport of HEVC video over MPEG-2
576	      transport systems (avoidance of start code emulations) [MPEG2S].

578	   Type: 6 bits
579	      nal_unit_type.  This field specifies the NAL unit type as defined
580	      in Table 7-1 of [HEVC].  For a reference of all currently defined
581	      NAL unit types and their semantics, please refer to Section 7.4.1
582	      in [HEVC].

584	   LayerId: 6 bits
585	      nuh_layer_id.  MUST be equal to zero.  It is anticipated that in
586	      future scalable or 3D video coding extensions of this
587	      specification, this syntax element will be used to identify
588	      additional layers that may be present in the coded video
589	      sequence, wherein a layer may be, e.g. a spatial scalable layer,
590	      a quality scalable layer, a texture view, or a depth view.

592	   TID: 3 bits
593	      nuh_temporal_id_plus1.  This field specifies the temporal
594	      identifier of the NAL unit plus 1.  The value of TemporalId is
595	      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
596	      there is at least one bit in the NAL unit header equal to 1, so
597	      as to enable independent considerations of start code emulations
598	      in the NAL unit header and in the NAL unit payload data.
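
   The field layout in Figure 1 can be illustrated with a small
   sketch.  The function names below are hypothetical; the bit
   positions follow the figure (F in the most significant bit,
   followed by Type, LayerId, and TID).

```python
def parse_nal_unit_header(data):
    """Split the two-byte HEVC NAL unit header into (F, Type, LayerId, TID)."""
    if len(data) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    hdr = (data[0] << 8) | data[1]
    f = hdr >> 15                  # forbidden_zero_bit, 1 bit
    nal_type = (hdr >> 9) & 0x3F   # nal_unit_type, 6 bits
    layer_id = (hdr >> 3) & 0x3F   # nuh_layer_id, 6 bits
    tid = hdr & 0x07               # nuh_temporal_id_plus1, 3 bits
    return f, nal_type, layer_id, tid


def pack_nal_unit_header(f, nal_type, layer_id, tid):
    """Inverse of parse_nal_unit_header; returns the two header bytes."""
    hdr = (f << 15) | (nal_type << 9) | (layer_id << 3) | tid
    return bytes([hdr >> 8, hdr & 0xFF])


f, nal_type, layer_id, tid = parse_nal_unit_header(bytes([0x40, 0x01]))
print(f, nal_type, layer_id, tid)   # 0 32 0 1
print("TemporalId =", tid - 1)      # TemporalId = 0
```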

600	1.2. Overview of the Payload Format

602	   This payload format defines the following processes required for
603	   transport of HEVC coded data over RTP [RFC3550]:

605	   o Usage of RTP header with this payload format

607	   o Packetization of HEVC coded NAL units into RTP packets using three
608	     types of payload structures, namely single NAL unit packet,
609	     aggregation packet, and fragmentation unit

611	   o Transmission of HEVC NAL units of the same bitstream within a
612	     single RTP session or multiple RTP sessions

614	   o Media type parameters to be used with the Session Description
615	     Protocol (SDP) [RFC4566]

617	2. Conventions

619	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
620	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
621	   document are to be interpreted as described in BCP 14, RFC 2119
622	   [RFC2119].

624	   This specification uses the notion of setting and clearing a bit
625	   when bit fields are handled.  Setting a bit is the same as assigning
626	   that bit the value of 1 (On).  Clearing a bit is the same as
627	   assigning that bit the value of 0 (Off).

629	3. Definitions and Abbreviations

631	3.1 Definitions

633	   This document uses the terms and definitions of [HEVC].  Section
634	   3.1.1 lists relevant definitions copied from [HEVC] for convenience.
635	   Section 3.1.2 gives definitions specific to this memo.

637	3.1.1 Definitions from the HEVC Specification

639	   access unit: A set of NAL units that are associated with each other
640	   according to a specified classification rule, are consecutive in
641	   decoding order, and contain exactly one coded picture.

643	   BLA access unit: An access unit in which the coded picture is a BLA
644	   picture.

646	   BLA picture: An IRAP picture for which each VCL NAL unit has
647	   nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

649	   coded video sequence: A sequence of access units that consists, in
650	   decoding order, of an IRAP access unit with NoRaslOutputFlag equal
651	   to 1, followed by zero or more access units that are not IRAP access
652	   units with NoRaslOutputFlag equal to 1, including all subsequent
653	   access units up to but not including any subsequent access unit that
654	   is an IRAP access unit with NoRaslOutputFlag equal to 1.

656	      Informative note: An IRAP access unit may be an IDR access unit,
657	      a BLA access unit, or a CRA access unit.  The value of
658	      NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
659	      access unit, and each CRA access unit that is the first access
660	      unit in the bitstream in decoding order, is the first access unit
661	      that follows an end of sequence NAL unit in decoding order, or
662	      has HandleCraAsBlaFlag equal to 1.

664	   CRA access unit: An access unit in which the coded picture is a CRA
665	   picture.

667	   CRA picture: An IRAP picture for which each VCL NAL unit has
668	   nal_unit_type equal to CRA_NUT.

670	   IDR access unit: An access unit in which the coded picture is an IDR
671	   picture.

673	   IDR picture: An IRAP picture for which each VCL NAL unit has
674	   nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

676	   IRAP access unit: An access unit in which the coded picture is an
677	   IRAP picture.

679	   IRAP picture: A coded picture for which each VCL NAL unit has
680	   nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.

682	   layer: A set of VCL NAL units that all have a particular value of
683	   nuh_layer_id and the associated non-VCL NAL units, or one of a set
684	   of syntactical structures having a hierarchical relationship.

686	   operation point: A bitstream created from another bitstream by
687	   operation of the sub-bitstream extraction process with that other
688	   bitstream, a target highest TemporalId, and a target layer
689	   identifier list as inputs.

691	   random access: The act of starting the decoding process for a
692	   bitstream at a point other than the beginning of the stream.

694	   sub-layer: A temporal scalable layer of a temporal scalable
695	   bitstream consisting of VCL NAL units with a particular value of the
696	   TemporalId variable, and the associated non-VCL NAL units.

698	   tile: A rectangular region of coding tree blocks within a particular
699	   tile column and a particular tile row in a picture.

701	   tile column: A rectangular region of coding tree blocks having a
702	   height equal to the height of the picture and a width specified by
703	   syntax elements in the picture parameter set.

705	   tile row: A rectangular region of coding tree blocks having a height
706	   specified by syntax elements in the picture parameter set and a
707	   width equal to the width of the picture.

709	3.1.2 Definitions Specific to This Memo

711	   media aware network element (MANE): A network element, such as a
712	   middlebox or application layer gateway, that is capable of parsing
713	   certain aspects of the RTP payload headers or the RTP payload and
714	   reacting to their contents.

716	      Informative note: The concept of a MANE goes beyond normal
717	      routers or gateways in that a MANE has to be aware of the
718	      signaling (e.g., to learn about the payload type mappings of the
719	      media streams), and in that it has to be trusted when working
720	      with SRTP.  The advantage of using MANEs is that they allow
721	      packets to be dropped according to the needs of the media coding.
722	      For example, if a MANE has to drop packets due to congestion on a
723	      certain link, it can identify and remove those packets whose
724	      elimination produces the least adverse effect on the user
725	      experience.  After dropping packets, MANEs must rewrite RTCP
726	      packets to match the changes to the RTP packet stream as
727	      specified in Section 7 of [RFC3550].

729	   NAL unit decoding order: A NAL unit order that conforms to the
730	   constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

732	   NALU-time: The value that the RTP timestamp would have if the NAL
733	   unit were transported in its own RTP packet.

735	   RTP packet stream: A sequence of RTP packets with increasing
736	   sequence numbers (except for wrap-around), identical PT and
737	   identical SSRC (Synchronization Source), carried in one RTP session.
738	   Within the scope of this memo, one RTP packet stream is utilized to
739	   transport one or more temporal sub-layers.

741	   transmission order: The order of packets in ascending RTP sequence
742	   number order (in modulo arithmetic).  Within an aggregation packet,
743	   the NAL unit transmission order is the same as the order of
744	   appearance of NAL units in the packet.

746	   base session: An RTP session in Multi-Session Transmission mode that
747	   transports a bitstream subset on which the remaining RTP sessions in
748	   the Multi-Session Transmission depend. [Ed. (YK): Check the need of
749	   this definition after the draft is more complete.]

751	3.2 Abbreviations

753	   AP       Aggregation Packet

755	   BLA      Broken Link Access

757	   CRA      Clean Random Access

759	   CTB      Coding Tree Block

761	   CTU      Coding Tree Unit
762	   CVS      Coded Video Sequence

764	   FU       Fragmentation Unit

766	   GDR      Gradual Decoding Refresh

768	   HRD      Hypothetical Reference Decoder

770	   IDR      Instantaneous Decoding Refresh

772	   IRAP     Intra Random Access Point

774	   MANE     Media Aware Network Element

776	   MST      Multi-Session Transmission

778	   MTU      Maximum Transfer Unit

780	   NAL      Network Abstraction Layer

782	   NALU     Network Abstraction Layer Unit

784	   PPS      Picture Parameter Set

786	   RADL     Random Access Decodable Leading (Picture)

788	   RASL     Random Access Skipped Leading (Picture)

790	   RPS      Reference Picture Set

792	   SEI      Supplemental Enhancement Information

794	   SPS      Sequence Parameter Set

796	   SST      Single-Session Transmission

798	   STSA     Step-wise Temporal Sub-layer Access

800	   TSA      Temporal Sub-layer Access

802	   VCL      Video Coding Layer

804	   VPS      Video Parameter Set

806	4. RTP Payload Format

808	4.1 RTP Header Usage

810	   The format of the RTP header is specified in [RFC3550] and reprinted
811	   in Figure 2 for convenience.  This payload format uses the fields of
812	   the header in a manner consistent with that specification.

814	   The RTP payload (and the settings for some RTP header bits) for
815	   aggregation packets and fragmentation units are specified in
816	   Sections 4.7 and 4.8, respectively.

818	    0                   1                   2                   3
819	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
820	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
821	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
822	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
823	   |                           timestamp                           |
824	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
825	   |           synchronization source (SSRC) identifier            |
826	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
827	   |            contributing source (CSRC) identifiers             |
828	   |                             ....                              |
829	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

831	                Figure 2 RTP header according to [RFC3550]

833	   The RTP header information to be set according to this RTP payload
834	   format is set as follows:

836	   Marker bit (M): 1 bit

838	      Set for the last packet of the access unit indicated by the RTP
839	      timestamp, in line with the normal use of the M bit in video
840	      formats, to allow an efficient playout buffer handling.  Decoders
841	      can use this bit as an early indication of the last packet of an
842	      access unit.

844	   Payload type (PT): 7 bits

846	      The assignment of an RTP payload type for this new packet format
847	      is outside the scope of this document and will not be specified
848	      here.  The assignment of a payload type has to be performed
849	      either through the profile used or in a dynamic way.

851	   Sequence number (SN): 16 bits

853	      Set and used in accordance with RFC 3550.

855	   Timestamp: 32 bits

857	      The RTP timestamp is set to the sampling timestamp of the
858	      content. A 90 kHz clock rate MUST be used.

860	      If the NAL unit has no timing properties of its own (e.g.,
861	      parameter set and SEI NAL units), the RTP timestamp is set to the
862	      RTP timestamp of the coded picture of the access unit in which
863	      the NAL unit is included, according to Section 7.4.2.4.4 of
864	      [HEVC].

866	      Receivers SHOULD ignore the picture output timing information in
867	      any picture timing SEI messages or decoding unit information SEI
868	      messages as specified in [HEVC].  Instead, receivers SHOULD use
869	      the RTP timestamp for the display process.  Receivers MUST pass
870	      picture timing SEI messages and decoding unit information SEI
871	      messages to the decoder and MAY use the field/frame related
872	      information for the display process, e.g., when frame doubling or
873	      frame tripling is indicated by the field/frame related
874	      information.
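
   As an informal illustration of the header fields listed above, the
   sketch below unpacks the 12-byte fixed RTP header of Figure 2 and
   converts a sampling time in seconds to a 90 kHz RTP timestamp.  The
   function names, the payload type value 96, and the timestamp offset
   parameter are assumptions made for this example only.

```python
import struct


def parse_rtp_fixed_header(packet):
    """Unpack the 12-byte fixed RTP header into a dict of its fields."""
    if len(packet) < 12:
        raise ValueError("the fixed RTP header is 12 bytes long")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": b0 & 0x0F,
        "marker": b1 >> 7,          # set on the last packet of an access unit
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def rtp_timestamp_90khz(seconds, offset=0):
    """Map a sampling time in seconds to a 32-bit timestamp at 90 kHz."""
    return (offset + round(seconds * 90000)) & 0xFFFFFFFF


# Example packet: V=2, M=1, PT=96 (arbitrary dynamic value), SN=1000.
pkt = struct.pack("!BBHII", 0x80, 0x80 | 96, 1000, rtp_timestamp_90khz(1.0), 0x1234)
hdr = parse_rtp_fixed_header(pkt)
print(hdr["marker"], hdr["payload_type"], hdr["timestamp"])   # 1 96 90000
```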

876	4.2 Payload Header Usage

878	   The TID value indicates (among other things) the relative importance
879	   of an RTP packet, for example because NAL units belonging to higher
880	   temporal sub-layers are not used for the decoding of lower temporal
881	   sub-layers.  A lower value of TID indicates a higher importance.
882	   More important NAL units MAY be better protected against
883	   transmission losses than less important NAL units.
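
   One possible use of the TID value is sketched below: an element
   that needs to reduce the bit rate keeps only NAL units up to a
   chosen TID.  This is a hypothetical helper shown for illustration;
   this memo does not mandate any particular thinning policy.

```python
def keep_up_to_tid(nal_units, max_tid):
    """Keep NAL units whose TID (low 3 bits of the second header byte)
    does not exceed max_tid; higher temporal sub-layers are dropped."""
    return [nal for nal in nal_units if (nal[1] & 0x07) <= max_tid]


units = [bytes([0x02, 0x01]), bytes([0x02, 0x02]), bytes([0x02, 0x03])]
print(len(keep_up_to_tid(units, max_tid=2)))   # 2
```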

885	4.3 Payload Structures

887	   The first two bytes of the payload of an RTP packet are referred to
888	   as the payload header.  The payload header consists of the same
889	   fields (F, Type, LayerId, and TID) as the NAL unit header as shown
890	   in section 1.1.4, irrespective of the type of the payload structure.

892	   Three different types of RTP packet payload structures are
893	   specified.  A receiver can identify the type of an RTP packet
894	   payload through the Type field in the payload header.

896	   The three different payload structures are as follows:

898	   o  Single NAL unit packet: Contains a single NAL unit in the
899	      payload, and the NAL unit header of the NAL unit also serves as
900	      the payload header.  This payload structure is specified in
901	      section 4.6.

903	   o  Aggregation packet (AP): Contains more than one NAL unit within
904	      one access unit.  This payload structure is specified in section
905	      4.7.

907	   o  Fragmentation unit (FU): Contains a subset of a single NAL unit.
908	      This payload structure is specified in section 4.8.
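
   A receiver-side dispatch on the Type field might look like the
   sketch below.  The value 48 for aggregation packets is taken from
   this memo; the value 49 for fragmentation units is an assumption,
   since it is not stated in this part of the document.  The function
   name is hypothetical.

```python
AP_TYPE = 48   # aggregation packet Type value given in this memo
FU_TYPE = 49   # ASSUMPTION: fragmentation unit Type value, not stated here


def payload_structure(payload):
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the payload header")
    nal_type = (payload[0] >> 1) & 0x3F   # 6 bits following the F bit
    if nal_type == AP_TYPE:
        return "aggregation packet"
    if nal_type == FU_TYPE:
        return "fragmentation unit"
    return "single NAL unit packet"


print(payload_structure(bytes([0x60, 0x01])))   # aggregation packet
print(payload_structure(bytes([0x40, 0x01])))   # single NAL unit packet
```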

910	4.4 Transmission Modes

912	   This memo enables transmission of an HEVC bitstream over a single
913	   RTP session or multiple RTP sessions.  The concept and working
914	   principle are inherited from [RFC6190] and follow a similar design.
915	   If only one RTP session is used for transmission of the HEVC
916	   bitstream, the transmission mode is referred to as single-session
917	   transmission (SST); otherwise (more than one RTP session is used for
918	   transmission of the HEVC bitstream), the transmission mode is
919	   referred to as multi-session transmission (MST).

921	   [Ed. (YK): Unify the style of abbreviated words throughout the
922	   document.]

924	   SST SHOULD be used for point-to-point unicast scenarios, while MST
925	   SHOULD be used for point-to-multipoint multicast scenarios where
926	   different receivers require different operation points of the same
927	   HEVC bitstream, to improve bandwidth utilization efficiency.

929	      Informative note: A multicast may degrade to a unicast after all
930	      but one receiver has left (this is a justification of the first
931	      "SHOULD" instead of "MUST"), and there might be scenarios where
932	      MST is desirable but not possible, e.g., when IP multicast is not
933	      deployed in a certain network (this is a justification of the
934	      second "SHOULD" instead of "MUST").

936	   The transmission mode is indicated by the tx-mode media parameter
937	   (see section 7.1).  If tx-mode is equal to "SST", SST MUST be used.
938	   Otherwise (tx-mode is equal to "MST"), MST MUST be used.

940	4.5 Decoding Order Number

942	   For each NAL unit, the variable AbsDon is derived, representing the
943	   decoding order number that is indicative of the NAL unit decoding
944	   order.

946	   Let NAL unit n be the n-th NAL unit in transmission order within an
947	   RTP session.

949	   If tx-mode is equal to "SST" and sprop-depack-buf-nalus is equal
950	   to 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as
951	   equal to n.

953	   Otherwise (tx-mode is equal to "MST" or sprop-depack-buf-nalus is
954	   greater than 0), AbsDon[n] is derived as follows, where DON[n] is
955	   the value of the variable DON for NAL unit n:

957	   o  If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in
958	      transmission order), AbsDon[0] is set equal to DON[0].

960	   o  Otherwise (n is greater than 0), the following applies for
961	      derivation of AbsDon[n]:

963	            If DON[n] == DON[n-1],
964	                AbsDon[n] = AbsDon[n-1]

966	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
967	                AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

969	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
970	                AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

972	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
973	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

975	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
976	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
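
   The derivation above can be transcribed directly into code.  The
   sketch below is one such transcription; the function name and the
   list-based interface are assumptions made for this example.

```python
def derive_abs_don(don_values):
    """Derive AbsDon[n] from the 16-bit DON values of NAL units listed
    in transmission order, following the five cases given above."""
    abs_don = []
    for n, don in enumerate(don_values):
        if n == 0:
            abs_don.append(don)               # AbsDon[0] = DON[0]
            continue
        prev_don, prev_abs = don_values[n - 1], abs_don[-1]
        if don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            cur = prev_abs + don - prev_don
        elif don < prev_don and prev_don - don >= 32768:
            cur = prev_abs + 65536 - prev_don + don      # forward wrap-around
        elif don > prev_don and don - prev_don >= 32768:
            cur = prev_abs - (prev_don + 65536 - don)    # backward wrap-around
        else:  # don < prev_don and prev_don - don < 32768
            cur = prev_abs - (prev_don - don)
        abs_don.append(cur)
    return abs_don


print(derive_abs_don([65534, 65535, 1, 0, 3]))
# [65534, 65535, 65537, 65536, 65539]
```

   In the example, the third value wraps past 65535 and the fourth
   value steps back by one, so its AbsDon is smaller than that of its
   predecessor in transmission order.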

978	   For any two NAL units m and n, the following applies:

980	   o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
981	      follows NAL unit m in NAL unit decoding order.

983	   o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
984	      of the two NAL units can be in either order.

986	   o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
987	      NAL unit m in decoding order.

989	   When two consecutive NAL units in the NAL unit decoding order have
990	   different values of AbsDon, the value of AbsDon for the second NAL
991	   unit in decoding order MUST be greater than the value of AbsDon for
992	   the first NAL unit, and the absolute difference between the two
993	   AbsDon values MAY be greater than or equal to 1.

995	      Informative note: There are multiple reasons to allow for the
996	      absolute difference of the values of AbsDon for two consecutive
997	      NAL units in the NAL unit decoding order to be greater than one.
998	      An increment by one is not required, as at the time of
999	      associating values of AbsDon to NAL units, it may not be known
1000	      whether all NAL units are to be delivered to the receiver.  For
1001	      example, a gateway may not forward coded slice NAL units of
1002	      higher sub-layers or some SEI NAL units when there is congestion
1003	      in the network.  In another example, the first intra picture of a
1004	      pre-encoded clip is transmitted in advance to ensure that it is
1005	      readily available in the receiver, and when transmitting the
1006	      first intra picture, the originator does not exactly know how
1007	      many NAL units will be encoded before the first intra picture of
1008	      the pre-encoded clip follows in decoding order.  Thus, the values
1009	      of AbsDon for the NAL units of the first intra picture of the
1010	      pre-encoded clip have to be estimated when they are transmitted,
1011	      and gaps in values of AbsDon may occur.  Another example is MST
1012	      where the AbsDon values must indicate cross-layer decoding order
1013	      for NAL units conveyed in all the RTP sessions.

1015	4.6 Single NAL Unit Packets

1017	   A single NAL unit packet contains exactly one NAL unit, and consists
1018	   of a payload header (denoted as PayloadHdr), an optional 16-bit DONL
1019	   field (in network byte order), and the NAL unit payload data (the
1020	   NAL unit excluding its NAL unit header) of the contained NAL unit,
1021	   as shown in Figure 3.

1023	    0                   1                   2                   3
1024	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1025	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1026	   |           PayloadHdr          |        DONL (optional)        |
1027	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1028	   |                                                               |
1029	   |                  NAL unit payload data                        |
1030	   |                                                               |
1031	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1032	   |                               :...OPTIONAL RTP padding        |
1033	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1035	              Figure 3 The structure of a single NAL unit packet

1037	   The payload header SHOULD be an exact copy of the NAL unit header of
1038	   the contained NAL unit.  However, the Type (i.e. nal_unit_type)
1039	   field MAY be changed, e.g. when it is desirable to handle a CRA
1040	   picture as a BLA picture [JCTVC-J0107].

1042	   The DONL field, when present, specifies the value of the 16 least
1043	   significant bits of the decoding order number of the contained NAL
1044	   unit.

1046	   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1047	   than 0, the DONL field MUST be present, and the variable DON for the
1048	   contained NAL unit is derived as equal to the value of the DONL
1049	   field.  Otherwise (tx-mode is equal to "SST" and sprop-depack-buf-
1050	   nalus is equal to 0), the DONL field MUST NOT be present.
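
   The layout in Figure 3 and the DONL presence rule can be sketched
   as follows.  The function names are hypothetical, the payload
   header is taken to be an exact copy of the NAL unit header, and the
   two parameters mirror the tx-mode and sprop-depack-buf-nalus media
   parameters.

```python
import struct


def donl_required(tx_mode, sprop_depack_buf_nalus):
    """DONL is present for MST or when sprop-depack-buf-nalus > 0."""
    return tx_mode == "MST" or sprop_depack_buf_nalus > 0


def build_single_nal_packet(nal_unit, tx_mode="SST",
                            sprop_depack_buf_nalus=0, don=0):
    """Return the RTP payload of a single NAL unit packet."""
    payload_hdr, nal_payload = nal_unit[:2], nal_unit[2:]
    if donl_required(tx_mode, sprop_depack_buf_nalus):
        donl = struct.pack("!H", don & 0xFFFF)   # 16 LSBs, network byte order
        return payload_hdr + donl + nal_payload
    return payload_hdr + nal_payload


def parse_single_nal_packet(payload, tx_mode="SST", sprop_depack_buf_nalus=0):
    """Return (nal_unit, don); don is None when DONL is absent."""
    if donl_required(tx_mode, sprop_depack_buf_nalus):
        (don,) = struct.unpack("!H", payload[2:4])
        return payload[:2] + payload[4:], don
    return payload, None


nal = bytes([0x40, 0x01, 0xAA, 0xBB])
pkt = build_single_nal_packet(nal, tx_mode="MST", don=5)
print(pkt.hex())                                   # 40010005aabb
print(parse_single_nal_packet(pkt, tx_mode="MST") == (nal, 5))   # True
```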

1052	4.7 Aggregation Packets (APs)

1054	   Aggregation packets (APs) are introduced to enable the reduction of
1055	   packetization overhead for small NAL units, such as most of the non-
1056	   VCL NAL units, which are often only a few octets in size.

1058	   An AP aggregates NAL units within one access unit.  Each NAL unit to
1059	   be carried in an AP is encapsulated in an aggregation unit.  NAL
1060	   units aggregated in one AP are in NAL unit decoding order.

1062	   An AP consists of a payload header (denoted as PayloadHdr) followed
1063	   by one or more aggregation units, as shown in Figure 4.

1065	    0                   1                   2                   3
1066	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1067	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1068	   |           PayloadHdr          |                               |
1069	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1070	   |                                                               |
1071	   |             one or more aggregation units                     |
1072	   |                                                               |
1073	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1074	   |                               :...OPTIONAL RTP padding        |
1075	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1077	              Figure 4 The structure of an aggregation packet

1079	   The fields in the payload header are set as follows.  The F bit MUST
1080	   be equal to 0 if the F bit of each aggregated NAL unit is equal to
1081	   zero; otherwise, it MUST be equal to 1.  The Type field MUST be
1082	   equal to 48.  The value of LayerId MUST be equal to the lowest value
1083	   of LayerId of all the aggregated NAL units.  The value of TID MUST
1084	   be the lowest value of TID of all the aggregated NAL units.

1086	      Informative note: All VCL NAL units in an AP have the same TID
1087	      value since they belong to the same access unit.  However, an AP
1088	      may contain non-VCL NAL units for which the TID value in the NAL
1089	      unit header may be different from the TID value of the VCL NAL
1090	      units in the same AP.

1092	   An AP MUST carry at least two aggregation units and can carry as
1093	   many aggregation units as necessary; however, the total amount of
1094	   data in an AP MUST fit into an IP packet, and the size SHOULD be
1095	   chosen so that the resulting IP packet is smaller than the MTU size
1096	   so as to avoid IP layer fragmentation.  An AP MUST NOT contain
1097	   Fragmentation Units (FUs) specified in section 4.8.  APs MUST NOT be
1098	   nested; i.e., an AP MUST NOT contain another AP.
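
   The payload header rules for an AP, together with the minimum of
   two aggregation units, can be illustrated with the following
   sketch.  The function name is hypothetical, and each input is
   assumed to be a complete NAL unit starting with its two-byte NAL
   unit header.

```python
AP_TYPE = 48   # Type value of an aggregation packet, as given above


def ap_payload_header(nal_units):
    """Build the two-byte AP payload header from the aggregated NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    hdrs = [(nal[0] << 8) | nal[1] for nal in nal_units]
    f = 1 if any(h >> 15 for h in hdrs) else 0       # 0 only if all F bits are 0
    layer_id = min((h >> 3) & 0x3F for h in hdrs)    # lowest LayerId
    tid = min(h & 0x07 for h in hdrs)                # lowest TID
    hdr = (f << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid
    return bytes([hdr >> 8, hdr & 0xFF])


nals = [bytes([0x40, 0x01, 0xAA]), bytes([0x02, 0x02, 0xBB])]
print(ap_payload_header(nals).hex())   # 6001
```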

1100	   The first aggregation unit in an AP consists of an optional 16-bit
1101	   DONL field (in network byte order) followed by a 16-bit unsigned
1102	   size information (in network byte order) that indicates the size of
1103	   the NAL unit in bytes (excluding these two octets, but including the
1104	   NAL unit header), followed by the NAL unit itself, including its NAL
1105	   unit header, as shown in Figure 5.

1107	    0                   1                   2                   3
1108	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1109	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1110	                   :        DONL (optional)        |   NALU size   |
1111	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1112	   |   NALU size   |                                               |
1113	   +-+-+-+-+-+-+-+-+         NAL unit                              |
1114	   |                                                               |
1115	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1116	   |                               :
1117	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1119	       Figure 5 The structure of the first aggregation unit in an AP

1121	   The DONL field, when present, specifies the value of the 16 least
1122	   significant bits of the decoding order number of the aggregated NAL
1123	   unit.

1125	   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1126	   than 0, the DONL field MUST be present in an aggregation unit that
1127	   is the first aggregation unit in an AP, and the variable DON for the
1128	   aggregated NAL unit is derived as equal to the value of the DONL
1129	   field.  Otherwise (tx-mode is equal to "SST" and sprop-depack-buf-
1130	   nalus is equal to 0), the DONL field MUST NOT be present in an
1131	   aggregation unit that is the first aggregation unit in an AP.

1133	   An aggregation unit that is not the first aggregation unit in an AP
1134	   consists of an optional 8-bit DOND field followed by a 16-bit
1135	   unsigned size information (in network byte order) that indicates the
1136	   size of the NAL unit in bytes (excluding these two octets, but
1137	   including the NAL unit header), followed by the NAL unit itself,
1138	   including its NAL unit header, as shown in Figure 6.

1140	   0                   1                   2                   3
1141	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1142	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1143	                   : DOND(optional)|          NALU size            |
1144	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1145	   |                                                               |
1146	   |                       NAL unit                                |
1147	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1148	   |                               :
1149	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1151	    Figure 6 The structure of an aggregation unit that is not the first
1152	                         aggregation unit in an AP

1154	   When present, the DOND field plus 1 specifies the difference between
1155	   the decoding order number values of the current aggregated NAL unit
1156	   and the preceding aggregated NAL unit in the same AP.

1158	   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1159	   than 0, the DOND field MUST be present in an aggregation unit that
1160	   is not the first aggregation unit in an AP, and the variable DON for
1161	   the aggregated NAL unit is derived as equal to the DON of the
1162	   preceding aggregated NAL unit in the same AP plus the value of the
1163	   DOND field plus 1 modulo 65536.  Otherwise (tx-mode is equal to
1164	   "SST" and sprop-depack-buf-nalus is equal to 0), the DOND field MUST
1165	   NOT be present in an aggregation unit that is not the first
1166	   aggregation unit in an AP.

1168	   Figure 7 presents an example of an AP that contains two aggregation
1169	   units, labeled as 1 and 2 in the figure, without the DONL and DOND
1170	   fields being present.

1172	    0                   1                   2                   3
1173	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1174	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1175	   |                          RTP Header                           |
1176	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1177	   |           PayloadHdr          |         NALU 1 Size           |
1178	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1179	   |          NALU 1 HDR           |                               |
1180	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1181	   |                   . . .                                       |
1182	   |                                                               |
1183	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1184	   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1185	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1186	   | NALU 2 HDR    |                                               |
1187	   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1188	   |                   . . .                                       |
1189	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1190	   |                               :...OPTIONAL RTP padding        |
1191	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1193	   Figure 7 An example of an AP packet containing two aggregation units
1194	                     without the DONL and DOND fields

1196	   Figure 8 presents an example of an AP that contains two aggregation
1197	   units, labeled as 1 and 2 in the figure, with the DONL and DOND
1198	   fields being present.

1200	    0                   1                   2                   3
1201	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1202	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1203	   |                          RTP Header                           |
1204	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1205	   |           PayloadHdr          |        NALU 1 DONL            |
1206	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1207	   |          NALU 1 Size          |            NALU 1 HDR         |
1208	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1209	   |                                                               |
1210	   |                 NALU 1 Data   . . .                           |
1211	   |                                                               |
1212	   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1213	   |               |  NALU 2 DOND  |          NALU 2 Size          |
1214	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1215	   |          NALU 2 HDR           |                               |
1216	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1217	   |                                                               |
1218	   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1219	   |                               :...OPTIONAL RTP padding        |
1220	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1222	    Figure 8 An example of an AP containing two aggregation units with
1223	                         the DONL and DOND fields
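
      Informative Note: The aggregation unit layouts above can be
      sketched in Python as follows.  This is a minimal, non-normative
      example; build_ap, parse_ap, and their parameters are
      hypothetical names.  It assumes that the caller already knows
      whether the DONL and DOND fields are present (from tx-mode and
      sprop-depack-buf-nalus) and that any RTP padding has been removed
      from the payload.

```python
from typing import List, Optional, Tuple


def build_ap(payload_hdr: bytes, nal_units: List[bytes],
             dons: Optional[List[int]] = None) -> bytes:
    """Serialize an AP payload; `dons` holds one DON per NAL unit, or None."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    out = bytearray(payload_hdr)
    for i, nal in enumerate(nal_units):
        if dons is not None:
            if i == 0:
                out += (dons[0] & 0xFFFF).to_bytes(2, "big")   # DONL
            else:
                dond = (dons[i] - dons[i - 1] - 1) % 65536
                if dond > 255:
                    raise ValueError("DON difference does not fit in DOND")
                out.append(dond)                               # DOND
        out += len(nal).to_bytes(2, "big")                     # NALU size
        out += nal
    return bytes(out)


def parse_ap(payload: bytes,
             don_present: bool) -> List[Tuple[Optional[int], bytes]]:
    """Return (DON, NAL unit) pairs from an AP payload."""
    pos = 2                      # skip the 2-byte PayloadHdr
    don: Optional[int] = None
    units = []
    while pos < len(payload):
        if don_present:
            if don is None:      # first aggregation unit: 16-bit DONL
                don = int.from_bytes(payload[pos:pos + 2], "big")
                pos += 2
            else:                # later units: DON = previous + DOND + 1
                don = (don + payload[pos] + 1) % 65536
                pos += 1
        size = int.from_bytes(payload[pos:pos + 2], "big")
        pos += 2
        if pos + size > len(payload):
            raise ValueError("truncated aggregation unit")
        units.append((don, payload[pos:pos + size]))
        pos += size
    return units
```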

1225	4.8 Fragmentation Units (FUs)

1227	   Fragmentation units (FUs) are introduced to enable fragmenting a
1228	   single NAL unit into multiple RTP packets, possibly without
1229	   cooperation or knowledge of the HEVC encoder.  A fragment of a NAL
1230	   unit consists of an integer number of consecutive octets of that NAL
1231	   unit.  Fragments of the same NAL unit MUST be sent in consecutive
1232	   order with ascending RTP sequence numbers (with no other RTP packets
1233	   within the same RTP packet stream being sent between the first and
1234	   last fragment).

1236	   When a NAL unit is fragmented and conveyed within FUs, it is
1237	   referred to as a fragmented NAL unit.  APs MUST NOT be fragmented.
1238	   FUs MUST NOT be nested; i.e., an FU MUST NOT contain a subset of
1239	   another FU.

1241	   The RTP timestamp of an RTP packet carrying an FU is set to the
1242	   NALU-time of the fragmented NAL unit.

1244	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1245	   header of one octet, an optional 16-bit DONL field (in network byte
1246	   order), and an FU payload, as shown in Figure 9.

1248	    0                   1                   2                   3
1249	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1250	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1251	   |          PayloadHdr           |   FU header   | DONL(optional)|
1252	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1253	   | DONL(optional)|                                               |
1254	   |-+-+-+-+-+-+-+-+                                               |
1255	   |                         FU payload                            |
1256	   |                                                               |
1257	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1258	   |                               :...OPTIONAL RTP padding        |
1259	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1261	                      Figure 9 The structure of an FU

1263	   The fields in the payload header are set as follows.  The Type field
1264	   MUST be equal to 49.  The fields F, LayerId, and TID MUST be equal
1265	   to the fields F, LayerId, and TID, respectively, of the fragmented
1266	   NAL unit.

1268	   The FU header consists of an S bit, an E bit, and a 6-bit FuType
1269	   field, as shown in Figure 10.

1271	                             +---------------+
1272	                             |0|1|2|3|4|5|6|7|
1273	                             +-+-+-+-+-+-+-+-+
1274	                             |S|E|  FuType   |
1275	                             +---------------+

1277	                  Figure 10   The structure of FU header

1279	   The semantics of the FU header fields are as follows:
1280	   S: 1 bit
1281	      When set to one, the S bit indicates the start of a fragmented
1282	      NAL unit, i.e., the first byte of the FU payload is also the first
1283	      byte of the payload of the fragmented NAL unit.  When the FU
1284	      payload is not the start of the fragmented NAL unit payload, the
1285	      S bit MUST be set to zero.

1287	   E: 1 bit
1288	      When set to one, the E bit indicates the end of a fragmented NAL
1289	      unit, i.e., the last byte of the payload is also the last byte of
1290	      the fragmented NAL unit.  When the FU payload is not the last
1291	      fragment of a fragmented NAL unit, the E bit MUST be set to zero.

1293	   FuType: 6 bits
1294	      The field FuType MUST be equal to the field Type of the
1295	      fragmented NAL unit.

1297	   The DONL field, when present, specifies the value of the 16 least
1298	   significant bits of the decoding order number of the fragmented NAL
1299	   unit.

1301	   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1302	   than 0, and the S bit is equal to 1, the DONL field MUST be present
1303	   in the FU, and the variable DON for the fragmented NAL unit is
1304	   derived as equal to the value of the DONL field.  Otherwise (tx-mode
1305	   is equal to "SST" and sprop-depack-buf-nalus is equal to 0, or the S
1306	   bit is equal to 0), the DONL field MUST NOT be present in the FU.

1308	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
1309	   the Start bit and End bit MUST NOT both be set to one in the same FU
1310	   header.

1312	   The FU payload consists of fragments of the payload of the
1313	   fragmented NAL unit so that if the FU payloads of consecutive FUs,
1314	   starting with an FU with the S bit equal to 1 and ending with an FU
1315	   with the E bit equal to 1, are sequentially concatenated, the
1316	   payload of the fragmented NAL unit can be reconstructed.  The NAL
1317	   unit header of the fragmented NAL unit is not included as such in
1318	   the FU payload, but rather the information of the NAL unit header of
1319	   the fragmented NAL unit is conveyed in F, LayerId, and TID fields of
1320	   the FU payload headers of the FUs and the FuType field of the FU
1321	   header of the FUs.  An FU payload MAY have any number of octets and
1322	   MAY be empty.

1324	      Informative note: Empty FU payloads are allowed to reduce the
1325	      latency of a certain class of senders in nearly lossless
1326	      environments.  These senders can be characterized in that they
1327	      packetize fragments of a NAL unit before the NAL unit is
1328	      completely generated and, hence, before the NAL unit size is
1329	      known.  If zero-length FU payloads were not allowed, the sender
1330	      would have to generate at least one bit of data of the following
1331	      fragment of the NAL unit before the current FU could be sent.
1332	      Due to the characteristics of HEVC, where sometimes several CTUs
1333	      occupy zero bits, this is undesirable and can add delay.
1334	      However, the (potential) use of zero-length FU payloads should be
1335	      carefully weighed against the increased risk of the loss of at
1336	      least a part of the fragmented NAL unit because of the additional
1337	      packets employed for its transmission.

1339	   If an FU is lost, the receiver SHOULD discard all following
1340	   fragmentation units in transmission order corresponding to the same
1341	   fragmented NAL unit, unless the decoder in the receiver is known to
1342	   be prepared to gracefully handle incomplete NAL units.

1344	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1345	   fragments of a NAL unit to an (incomplete) NAL unit, even if
1346	   fragment n of that NAL unit is not received.  In this case, the
1347	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1348	   syntax violation.
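
      Informative Note: The fragmentation rules of this section can be
      sketched in Python as follows.  This is a non-normative example
      under stated assumptions: fragment_nal, reassemble_fus, and
      max_fu_payload are hypothetical names, the two-byte NAL unit
      header layout (F, Type, LayerId, TID) is assumed, and the
      receiver is assumed to have verified separately (via RTP sequence
      numbers) that no FU of the NAL unit was lost.

```python
from typing import List, Optional


def fragment_nal(nal: bytes, max_fu_payload: int,
                 don: Optional[int] = None) -> List[bytes]:
    """Split one NAL unit into FU packet payloads."""
    hdr = int.from_bytes(nal[:2], "big")
    nal_type = (hdr >> 9) & 0x3F
    # PayloadHdr keeps F, LayerId and TID of the NAL unit; Type becomes 49.
    payload_hdr = ((hdr & 0x81FF) | (49 << 9)).to_bytes(2, "big")
    body = nal[2:]               # the NAL unit header itself is not carried
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    if len(chunks) < 2:
        # S and E must not both be set in the same FU header.
        raise ValueError("NAL unit fits in one packet; do not use FUs")
    fus = []
    for i, chunk in enumerate(chunks):
        s_bit = 1 if i == 0 else 0
        e_bit = 1 if i == len(chunks) - 1 else 0
        fu_header = bytes([(s_bit << 7) | (e_bit << 6) | nal_type])
        donl = b""
        if don is not None and s_bit:        # DONL only in the first FU
            donl = (don & 0xFFFF).to_bytes(2, "big")
        fus.append(payload_hdr + fu_header + donl + chunk)
    return fus


def reassemble_fus(fus: List[bytes], don_present: bool = False) -> bytes:
    """Rebuild the NAL unit from all of its FUs, given in sending order."""
    if not (fus[0][2] & 0x80) or not (fus[-1][2] & 0x40):
        raise ValueError("first or last fragment is missing")
    hdr = int.from_bytes(fus[0][:2], "big")
    fu_type = fus[0][2] & 0x3F
    # Restore the NAL unit header: F, LayerId, TID from PayloadHdr,
    # Type from the FuType field.
    parts = [((hdr & 0x81FF) | (fu_type << 9)).to_bytes(2, "big")]
    for fu in fus:
        start = 3 + (2 if don_present and fu[2] & 0x80 else 0)
        parts.append(fu[start:])
    return b"".join(parts)
```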

1350	5. Packetization Rules

1352	   The following packetization rules apply:

1354	   o  If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1355	      than 0 for an RTP session, the transmission order of NAL units
1356	      carried in the RTP session MAY be different than the NAL unit
1357	      decoding order.  Otherwise (tx-mode is equal to "SST" and sprop-
1358	      depack-buf-nalus is equal to 0 for an RTP session), the
1359	      transmission order of NAL units carried in the RTP session MUST
1360	      be the same as the NAL unit decoding order.

1362	   o  A NAL unit of a small size SHOULD be encapsulated in an
1363	      aggregation packet together with one or more other NAL units in
1364	      order to avoid the unnecessary packetization overhead for small
1365	      NAL units.  For example, non-VCL NAL units such as access unit
1366	      delimiters, parameter sets, or SEI NAL units are typically small
1367	      and can often be aggregated with slice NAL units without
1368	      violating MTU size constraints.

1370	   o  Each non-VCL NAL unit SHOULD be encapsulated in an aggregation
1371	      packet together with its associated VCL NAL unit, as typically a
1372	      non-VCL NAL unit would be meaningless without the associated VCL
1373	      NAL unit being available.  FUs SHOULD NOT be applied in live-
1374	      encoding scenarios such as video telephony, video conferencing,
1375	      live streaming and live broadcast, in which cases dependent slice
1376	      segments SHOULD be used when a slice should be transported in
1377	      multiple RTP packets.  For pre-encoded content where the use of
1378	      dependent slice segments is not possible without transcoding, FUs
1379	      SHOULD be used for transporting one NAL unit in multiple RTP
1380	      packets for MTU size matching.

1382	   o  For carrying exactly one NAL unit in an RTP packet, a single NAL
1383	      unit packet MUST be used.

1385	6. De-packetization Process

1387	   The general concept behind de-packetization is to get the NAL units
1388	   out of the RTP packets in an RTP session and all the dependent RTP
1389	   sessions, if any, and pass them to the decoder in the NAL unit
1390	   decoding order.

1392	   The de-packetization process is implementation dependent.
1393	   Therefore, the following description should be seen as an example of
1394	   a suitable implementation.  Other schemes may be used as well, as
1395	   long as the output for the same input is the same as the process
1396	   described below.  The output is the same when the set of NAL units
1397	   and their order are both identical.  Optimizations relative to the
1398	   described algorithms are possible.

1400	   All normal RTP mechanisms related to buffer management apply.  In
1401	   particular, duplicated or outdated RTP packets (as indicated by the
1402	   RTP sequence number and the RTP timestamp) are removed.  To
1403	   determine the exact time for decoding, factors such as a possible
1404	   intentional delay to allow for proper inter-stream synchronization
1405	   must be factored in.

1407	   NAL units with NAL unit type values in the range of 0 to 47,
1408	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1409	   with NAL unit type values in the range of 48 to 63, inclusive, MUST
1410	   NOT be passed to the decoder.

1412	   The receiver includes a receiver buffer, which is used to compensate
1413	   for transmission delay jitter, to reorder NAL units from
1414	   transmission order to the NAL unit decoding order, and to recover
1415	   the NAL unit decoding order in MST, when applicable.  In this
1416	   section, the receiver operation is described under the assumption
1417	   that there is no transmission delay jitter.  To make a difference
1418	   from a practical receiver buffer that is also used for compensation
1419	   of transmission delay jitter, the receiver buffer is hereafter
1420	   called the de-packetization buffer in this section.  Receivers
1421	   SHOULD also prepare for transmission delay jitter; i.e., either
1422	   reserve separate buffers for transmission delay jitter buffering and
1423	   de-packetization buffering or use a receiver buffer for both
1424	   transmission delay jitter and de-packetization.  Moreover, receivers
1425	   SHOULD take transmission delay jitter into account in the buffering
1426	   operation; e.g., by additional initial buffering before starting
1427	   decoding and playback.

1429	   There are two buffering states in the receiver: initial buffering
1430	   and buffering while playing.  Initial buffering starts when the
1431	   reception is initialized.  After initial buffering, decoding and
1432	   playback are started, and the buffering-while-playing mode is used.

1434	   Regardless of the buffering state, the receiver stores incoming NAL
1435	   units, in reception order, into the de-packetization buffer.  NAL
1436	   units carried in single NAL unit packets, APs, and FUs are stored in
1437	   the de-packetization buffer individually, and the value of AbsDon is
1438	   calculated and stored for each NAL unit.  When MST is in use, NAL
1439	   units of all RTP packet streams are stored in the same de-
1440	   packetization buffer.

1442	   Initial buffering lasts until condition A (the number of NAL units
1443	   in the de-packetization buffer is greater than the value of sprop-
1444	   depack-buf-nalus of the highest RTP session) is true.

1446	   After initial buffering, whenever condition A is true, the following
1447	   operation is repeatedly applied until condition A becomes false:

1449	   o  The NAL unit in the de-packetization buffer with the smallest
1450	      value of AbsDon is removed from the de-packetization buffer and
1451	      passed to the decoder.

1453	   When no more NAL units are flowing into the de-packetization buffer,
1454	   all NAL units remaining in the de-packetization buffer are removed
1455	   from the buffer and passed to the decoder in the order of increasing
1456	   AbsDon values.
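
      Informative Note: The buffering behaviour described above can be
      sketched in Python as follows.  This is one possible, non-
      normative realization; the function name and the use of a binary
      heap are illustrative choices.  It assumes that AbsDon has
      already been computed for each NAL unit and that there is no
      transmission delay jitter, as in the description above.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple


def depacketize(received: Iterable[Tuple[int, bytes]],
                sprop_depack_buf_nalus: int) -> Iterator[bytes]:
    """Yield NAL units in decoding order.

    `received` supplies (AbsDon, NAL unit) pairs in reception order.
    """
    # Heap entries are (AbsDon, arrival index, NAL unit); the arrival
    # index only breaks ties so that NAL units are never compared.
    buf: List[Tuple[int, int, bytes]] = []
    for arrival, (abs_don, nal) in enumerate(received):
        heapq.heappush(buf, (abs_don, arrival, nal))
        # Condition A: more NAL units are buffered than the value of
        # sprop-depack-buf-nalus.  While it holds, pass the NAL unit
        # with the smallest AbsDon to the decoder.
        while len(buf) > sprop_depack_buf_nalus:
            yield heapq.heappop(buf)[2]
    # No more NAL units are arriving: drain in increasing AbsDon order.
    while buf:
        yield heapq.heappop(buf)[2]
```

      With sprop_depack_buf_nalus equal to 0, the sketch passes NAL
      units through in reception order, which matches the case where
      transmission order equals decoding order.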

1458	7. Payload Format Parameters

1460	   This section specifies the parameters that MAY be used to select
1461	   optional features of the payload format and certain features or
1462	   properties of the bitstream.  The parameters are specified here as
1463	   part of the media type registration for the HEVC codec.  A mapping
1464	   of the parameters into the Session Description Protocol (SDP)
1465	   [RFC4566] is also provided for applications that use SDP.
1466	   Equivalent parameters could be defined elsewhere for use with
1467	   control protocols that do not use SDP.

1469	7.1 Media Type Registration

1471	   The media subtype for the HEVC codec is allocated from the IETF
1472	   tree.

1474	   The receiver MUST ignore any unspecified parameter.

1476	   Media Type name:     video

1478	   Media subtype name:  H265

1480	   Required parameters: none

1482	   OPTIONAL parameters:

1484	      In the following definitions of parameters, "the stream" or "the
1485	      NAL unit stream" refers to all NAL units conveyed in the current
1486	      RTP session in SST, and all NAL units conveyed in the current RTP
1487	      session and all NAL units conveyed in other RTP sessions that the
1488	      current RTP session depends on in MST.

1490	      profile-space, profile-id:

1492	         The profile-space parameter indicates the context for
1493	         interpretation of the profile-id parameter value.  The
1494	         profile, which specifies the subset of coding tools that may
1495	         have been used to generate the stream or that the receiver
1496	         supports, as specified in [HEVC], is defined by the
1497	         combination of profile-space and profile-id.  Note that
1498	         profile-space is required to be equal to 0 in [HEVC], but
1499	         other values for it may be specified in the future by ITU-T or
1500	         ISO/IEC.

1502	         If the profile-space and profile-id parameters are used to
1503	         indicate properties of a NAL unit stream, they indicate that,
1504	         to decode the stream, the minimum subset of coding tools a
1505	         decoder has to support is the profile specified by both
1506	         parameters.

1508	         If the profile-space and profile-id parameters are used for
1509	         capability exchange or session setup, they indicate the subset
1510	         of coding tools, which is equal to the profile, that the codec
1511	         supports for both receiving and sending.

1513	         If no profile-space is present, a value of 0 MUST be inferred
1514	         and if no profile-id is present the Main profile MUST be
1515	         inferred.

1517	         The profile-space and profile-id parameters are derived from
1518	         the sequence parameter set or video parameter set NAL units,
1519	         as specified in [HEVC], as follows.

1521	         For SST or for the stream corresponding to the highest RTP
1522	         session of MST when MST is applied, the following applies:

1524	         o  profile-space = general_profile_space
1525	         o  profile-id = general_profile_idc

1527	         For streams not corresponding to the highest RTP session of
1528	         MST when MST is applied, the following applies, with j being
1529	         the value of the sub-layer-id parameter:

1531	         o  profile-space = sub_layer_profile_space[j]
1532	         o  profile-id = sub_layer_profile_idc[j]

1534	      tier-flag, level-id:

1536	         The tier-flag parameter indicates the context for
1537	         interpretation of the level-id value.  The default level,
1538	         which limits values of syntax elements or arithmetic
1539	         combinations of values of syntax elements, as specified in
1540	         [HEVC], is defined by the combination of tier-flag and level-
1541	         id.

1543	         If the tier-flag and level-id parameters are used to indicate
1544	         properties of a NAL unit stream, they indicate that, to decode
1545	         the stream, the lowest level the decoder has to support is the
1546	         default level.

1548	         If the tier-flag and level-id parameters are used for
1549	         capability exchange or session setup, the following applies.
1550	         If max-recv-level-id is not present, the default level defined
1551	         by tier-flag and level-id indicates the highest level the
1552	         codec wishes to support.  Otherwise, tier-flag and max-recv-
1553	         level-id indicate the highest level the codec supports for
1554	         receiving.  For either receiving or sending, all levels that
1555	         are lower than the highest level supported MUST also be
1556	         supported.

1558	         If no tier-flag is present, a value of 0 MUST be inferred and
1559	         if no level-id is present, a value of 30 (i.e., level 1.0) MUST
1560	         be inferred.

1562	         The tier-flag and level-id parameters are derived from the
1563	         sequence parameter set or video parameter set NAL units, as
1564	         specified in [HEVC], as follows.

1566	         For SST or for the stream corresponding to the highest RTP
1567	         session of MST when MST is applied, the following applies:

1569	         o  tier-flag = general_tier_flag
1570	         o  level-id = general_level_idc

1572	         For streams not corresponding to the highest RTP session of
1573	         MST when MST is applied, the following applies, with j being
1574	         the value of the sub-layer-id parameter:

1576	         o  tier-flag = sub_layer_tier_flag[j]
1577	         o  level-id = sub_layer_level_idc[j]

1579	      interop-constraints:

1581	         A base16 [RFC4648] (hexadecimal) representation of the six
1582	         bytes derived from the sequence parameter set or video
1583	         parameter set NAL units as specified in [HEVC] consisting of
1584	         progressive_source_flag, interlaced_source_flag,
1585	         non_packed_constraint_flag, frame_only_constraint_flag, and
1586	         reserved_zero_44bits.  Note that reserved_zero_44bits is
1587	         required to be equal to 0 in [HEVC], but other values for it
1588	         may be specified in the future by ITU-T or ISO/IEC.

1590	         If no interop-constraints are present, the following MUST be
1591	         inferred:

1593	         o  progressive_source_flag = 1
1594	         o  interlaced_source_flag = 0
1595	         o  non_packed_constraint_flag = 1
1596	         o  frame_only_constraint_flag = 1
1597	         o  reserved_zero_44bits = 0

1598	         For SST or for the stream corresponding to the highest RTP
1599	         session of MST when MST is applied, the following applies:

1601	         o  progressive_source_flag = general_progressive_source_flag
1602	         o  interlaced_source_flag = general_interlaced_source_flag
1603	         o  non_packed_constraint_flag =
1604	                          general_non_packed_constraint_flag
1605	         o  frame_only_constraint_flag =
1606	                          general_frame_only_constraint_flag
1607	         o  reserved_zero_44bits = general_reserved_zero_44bits

1609	         For streams not corresponding to the highest RTP session of
1610	         MST when MST is applied, the following applies, with j being
1611	         the value of the sub-layer-id parameter:

1613	         o  progressive_source_flag =
1614	                          sub_layer_progressive_source_flag[j]
1615	         o  interlaced_source_flag =
1616	                          sub_layer_interlaced_source_flag[j]
1617	         o  non_packed_constraint_flag =
1618	                          sub_layer_non_packed_constraint_flag[j]
1619	         o  frame_only_constraint_flag =
1620	                          sub_layer_frame_only_constraint_flag[j]
1621	         o  reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]
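
         Informative Note: As a non-normative illustration, the Python
         sketch below packs the five fields into six bytes and prints
         them in base16.  It assumes that the fields occupy the six
         bytes in the order listed above, with progressive_source_flag
         in the most significant bit; the function names are
         hypothetical.

```python
from typing import Dict


def interop_constraints(progressive_source_flag: int = 1,
                        interlaced_source_flag: int = 0,
                        non_packed_constraint_flag: int = 1,
                        frame_only_constraint_flag: int = 1,
                        reserved_zero_44bits: int = 0) -> str:
    """Return the base16 form of the six bytes (4 flags + 44 bits).

    The defaults are the values inferred when the parameter is absent.
    """
    value = ((progressive_source_flag << 47)
             | (interlaced_source_flag << 46)
             | (non_packed_constraint_flag << 45)
             | (frame_only_constraint_flag << 44)
             | reserved_zero_44bits)
    return value.to_bytes(6, "big").hex().upper()


def parse_interop_constraints(text: str) -> Dict[str, int]:
    """Split a 12-digit base16 string back into the five fields."""
    raw = bytes.fromhex(text)
    if len(raw) != 6:
        raise ValueError("interop-constraints must be six bytes")
    value = int.from_bytes(raw, "big")
    return {
        "progressive_source_flag": (value >> 47) & 1,
        "interlaced_source_flag": (value >> 46) & 1,
        "non_packed_constraint_flag": (value >> 45) & 1,
        "frame_only_constraint_flag": (value >> 44) & 1,
        "reserved_zero_44bits": value & ((1 << 44) - 1),
    }
```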

1623	      profile-compatibility-indicator:

1625	         A base16 [RFC4648] representation of the four bytes
1626	         representing the 32 profile compatibility flags in the
1627	         sequence parameter set or video parameter set NAL units.  A
1628	         decoder conforming to a certain profile may be able to decode
1629	         bitstreams conforming to other profiles.  The profile-
1630	         compatibility-indicator provides exact information about the
1631	         ability of a decoder conforming to a certain profile to decode
1632	         bitstreams conforming to another profile.  More concretely, if
1633	         the profile compatibility flag corresponding to the profile to
1634	         which a decoder conforms is set in a bitstream, then the
1635	         decoder is able to decode that bitstream, irrespective of the
1636	         profile to which the bitstream conforms (provided that the
1637	         decoder supports the highest level of the bitstream).

1639	         For SST or for the stream corresponding to the highest RTP
1640	         session of MST when MST is used with temporal scalability, the
1641	         following applies with j = 0..31:

1643	         o  The 32 flags = general_profile_compatibility_flag[j]

1645	         For streams not corresponding to the highest RTP session (the
1646	         RTP session which no other RTP session depends on) of MST when
1647	         MST is used with temporal scalability the following applies
1648	         with i being the value of the sub-layer-id parameter and j =
1649	         0..31:

1651	         o  The 32 flags = sub_layer_profile_compatibility_flag[i][j]
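
         Informative Note: The following Python sketch illustrates, in
         a non-normative way, how the 32 flags might be mapped to the
         base16 string and how a receiver might test a flag.  It
         assumes that flag j = 0 is carried in the most significant bit
         of the first byte; the function names are hypothetical.

```python
from typing import Sequence


def profile_compatibility_indicator(flags: Sequence[int]) -> str:
    """Pack the 32 profile compatibility flags (index j = 0..31)."""
    if len(flags) != 32:
        raise ValueError("exactly 32 flags are expected")
    value = 0
    for flag in flags:
        value = (value << 1) | (1 if flag else 0)
    return value.to_bytes(4, "big").hex().upper()


def flag_is_set(indicator: str, profile_idc: int) -> bool:
    """True if the flag for the given profile is set in the indicator."""
    if not 0 <= profile_idc <= 31:
        raise ValueError("profile_idc must be in the range 0..31")
    value = int.from_bytes(bytes.fromhex(indicator), "big")
    return bool((value >> (31 - profile_idc)) & 1)
```

         A decoder conforming to the profile with index p would call
         flag_is_set(indicator, p) and, if the result is true and the
         level is supported, could decode the stream.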

1653	      sub-layer-id:

1655	         This parameter MAY be used to indicate the TID of the highest
1656	         sub-layer of the stream.  When not present, the value of sub-
1657	         layer-id is inferred to be equal to
1658	         vps_max_sub_layers_minus1+1 and sps_max_sub_layers_minus1+1 in
1659	         the video parameter set and sequence parameter set as defined
1660	         in [HEVC].

1662	      recv-sub-layer-id:

1664	         This parameter MAY be used to signal a receiver's choice of
1665	         the offered or declared sub-layers in the sprop-vps.  The value
1666	         of recv-sub-layer-id indicates the index of the highest sub-
1667	         layer of the stream that a receiver supports.  When not
1668	         present, the value of recv-sub-layer-id is inferred to be
1669	         equal to sub-layer-id.

1671	      max-recv-level-id:

1673	         This parameter MAY be used, together with tier-flag, to
1674	         indicate the highest level a receiver supports. The highest
1675	         level the receiver supports is equal to the value of max-recv-
1676	         level-id divided by 30 for the Main or High tier (as
1677	         determined by tier-flag equal to 0 or 1, respectively).

1679	         When max-recv-level-id is not present, the value is inferred
1680	         to be equal to level-id.

1682	         max-recv-level-id MUST NOT be present when the highest level
1683	         the receiver supports is not higher than the default level.
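
         Informative Note: As a non-normative illustration of the
         arithmetic above, the Python sketch below converts a level-id
         or max-recv-level-id value to a level number and applies the
         inference rule for an absent max-recv-level-id.  It assumes
         the HEVC convention that the level indicator equals 30 times
         the level number (so that 93 denotes level 3.1); the function
         names are hypothetical.

```python
from typing import Optional


def level_name(level_id: int) -> str:
    """Format a level indicator as 'major.minor', e.g. 93 -> '3.1'."""
    major, remainder = divmod(level_id, 30)
    # One tenth of a level corresponds to 3 units of the indicator.
    return f"{major}.{remainder // 3}"


def highest_receive_level(level_id: int,
                          max_recv_level_id: Optional[int] = None) -> int:
    """Return the indicator of the highest level supported for receiving."""
    if max_recv_level_id is None:
        return level_id          # inferred to be equal to level-id
    if max_recv_level_id <= level_id:
        # The parameter is only present when it exceeds the default level.
        raise ValueError("max-recv-level-id must exceed level-id")
    return max_recv_level_id
```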

1685	      sprop-vps:

1687	         This parameter MAY be used to convey any video parameter set
1688	         NAL unit of the stream.  When present, the parameter MAY be
1689	         used to indicate codec capability and sub-stream
1690	         characteristics (i.e., properties of representations of sub-
1691	         layers as defined in [HEVC]) as well as for out-of-band
1692	         transmission of video parameter sets.  The value of the
1693	         parameter is a comma-separated (',') list of base64 [RFC4648]
1694	         representations of the video parameter set NAL units as
1695	         specified in Section 7.3.2.1 of [HEVC].

1697	      sprop-sps:

1699	         This parameter MAY be used to convey sequence parameter set
1700	         NAL units of the stream for out-of-band transmission of
1701	         sequence parameter sets.  The value of the parameter is a
1702	         comma-separated (',') list of base64 [RFC4648] representations
1703	         of the sequence parameter set NAL units as specified in
1704	         Section 7.3.2.2 of [HEVC].

1706	      sprop-pps:

1708	         This parameter MAY be used to convey picture parameter set NAL
1709	         units of the stream for out-of-band transmission of picture
1710	         parameter sets.  The value of the parameter is a comma-
1711	         separated (',') list of base64 [RFC4648] representations of
1712	         the picture parameter set NAL units as specified in Section
1713	         7.3.2.3 of [HEVC].
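
The three sprop parameters above share one value format, a comma-
separated list of base64 strings.  A minimal sketch of decoding such a
value, assuming Python's standard base64 module, might look as follows.
The function name is hypothetical, and the bytes in the usage note are
placeholders rather than real parameter sets.

```python
import base64


def decode_sprop(value):
    """Split a sprop-vps/sprop-sps/sprop-pps value into raw NAL units."""
    nal_units = []
    for item in value.split(","):
        item = item.strip()
        if item:
            # validate=True rejects characters outside the base64 alphabet.
            nal_units.append(base64.b64decode(item, validate=True))
    return nal_units
```

With this sketch, decode_sprop("QAE=,QgE=") yields two byte strings, one
per list entry, in the order in which they were listed.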

1715	      max-ls, max-lps, max-cpb, max-dpb, max-br:

1717	         These parameters MAY be used to signal the capabilities of a
1718	         receiver implementation. These parameters MUST NOT be used for
1719	         any other purpose.  The highest level (specified by tier-flag
1720	         and max-recv-level-id) MUST be one that the receiver is fully
1721	         capable of supporting.  max-ls, max-lps, max-cpb, max-dpb, and
1722	         max-br MAY be used to indicate capabilities of the receiver
1723	         that extend the required capabilities of the signaled highest
1724	         level, as specified below.

1726	         When more than one parameter from the set (max-ls, max-lps,
1727	         max-cpb, max-dpb, max-br) is present, the receiver MUST
1728	         support all signaled capabilities simultaneously.  For
1729	         example, if both max-ls and max-br are present, the signaled
1730	         highest level with the extension of both the frame rate and
1731	         bitrate is supported.  That is, the receiver is able to decode
1732	         NAL unit streams in which the luma sample rate is up to max-ls
1733	         (inclusive), the bitrate is up to max-br (inclusive), the
1734	         coded picture buffer size is derived as specified in the
1735	         semantics of the max-br parameter below, and the other
1736	         properties comply with the highest level specified by tier-
1737	         flag and max-recv-level-id.

1739	            Informative note: When the OPTIONAL media type parameters
1740	            are used to signal the properties of a NAL unit stream,
1741	            max-ls, max-lps, max-cpb, max-dpb, and max-br are not
1742	            present, and the value of profile-space, profile-id, tier-
1743	            flag and level-id must always be such that the NAL unit
1744	            stream complies fully with the specified profile and level.

1746	      max-ls:
1747	         The value of max-ls is an integer indicating the maximum
1748	         processing rate in units of luma samples per second. The max-
1749	         ls parameter signals that the receiver is capable of decoding
1750	         video at a higher rate than is required by the signaled
1751	         highest level.

1753	         When max-ls is signaled, the receiver MUST be able to decode
1754	         NAL unit streams that conform to the signaled highest level,
1755	         with the exception that the MaxLumaSR value in Table A-2 of
1756	         [HEVC] for the signaled highest level is replaced with the
1757	         value of max-ls. The value of max-ls MUST be greater than or
1758	         equal to the value of MaxLumaSR given in Table A-2 of [HEVC]
1759	         for the highest level. Senders MAY use this knowledge to send
1760	         pictures of a given size at a higher picture rate than is
1761	         indicated in the signaled highest level.

1763	      max-lps:
1764	         The value of max-lps is an integer indicating the maximum
1765	         picture size in units of luma samples.  The max-lps parameter
1766	         signals that the receiver is capable of decoding larger
1767	         picture sizes than are required by the signaled highest level.
1768	         When max-lps is signaled, the receiver MUST be able to decode
1769	         NAL unit streams that conform to the signaled highest level,
1770	         with the exception that the MaxLumaPS value in Table A-1 of
1771	         [HEVC] for the signaled highest level is replaced with the
1772	         value of max-lps. The value of max-lps MUST be greater than or
1773	         equal to the value of MaxLumaPS given in Table A-1 of [HEVC]
1774	         for the highest level. Senders MAY use this knowledge to send
1775	         larger pictures at a proportionally lower frame rate than is
1776	         indicated in the signaled highest level.

1778	      max-cpb:
1779	         The value of max-cpb is an integer indicating the maximum
1780	         coded picture buffer size in units of CpbBrVclFactor bits for
1781	         the VCL HRD parameters and in units of CpbBrNalFactor bits for
1782	         the NAL HRD parameters, where CpbBrVclFactor and
1783	         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The max-
1784	         cpb parameter signals that the receiver has more memory than
1785	         the minimum amount of coded picture buffer memory required by
1786	         the signaled highest level. When max-cpb is signaled, the
1787	         receiver MUST be able to decode NAL unit streams that conform
1788	         to the signaled highest level, with the exception that the
1789	         MaxCPB value in Table A-1 of [HEVC] for the signaled highest
1790	         level is replaced with the value of max-cpb. The value of max-
1791	         cpb MUST be greater than or equal to the value of MaxCPB given
1792	         in Table A-1 of [HEVC] for the highest level. Senders MAY use
1793	         this knowledge to construct coded video streams with greater
1794	         variation of bitrate than can be achieved with the MaxCPB
1795	         value in Table A-1 of [HEVC].

1797	            Informative note: The coded picture buffer is used in the
1798	            hypothetical reference decoder (Annex C of HEVC). The use
1799	            of the hypothetical reference decoder is recommended in
1800	            HEVC encoders to verify that the produced bitstream
1801	            conforms to the standard and to control the output bitrate.
1802	            Thus, the coded picture buffer is conceptually independent
1803	            of any other potential buffers in the receiver, including
1804	            de-packetization and de-jitter buffers. The coded picture
1805	            buffer need not be implemented in decoders as specified in
1806	            Annex C of HEVC, but rather standard-compliant decoders can
1807	            have any buffering arrangements provided that they can
1808	            decode standard-compliant bitstreams. Thus, in practice,
1809	            the input buffer for a video decoder can be integrated with
1810	            de-packetization and de-jitter buffers of the receiver.

1812	      max-dpb:
1813	         The value of max-dpb is an integer indicating the maximum
1814	         decoded picture buffer size in units of decoded pictures at the
1815	         MaxLumaPS for the highest level, i.e. number of decoded
1816	         pictures at the maximum picture size defined by the highest
1817	         level. The value of max-dpb MUST be smaller than or equal to
1818	         16. The max-dpb parameter signals that the receiver has more
1819	         memory than the minimum amount of decoded picture buffer
1820	         memory required by default, which is MaxDpbPicBuf as defined
1821	         in [HEVC] (equal to 6). When max-dpb is signaled, the receiver
1822	         MUST be able to decode NAL unit streams that conform to the
1823	         signaled highest level, with the exception that the
1824	         MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with
1825	         the value of max-dpb. Consequently, a receiver that signals
1826	         max-dpb MUST be capable of storing the following number of
1827	         decoded frames (MaxDpbSize) in its decoded picture buffer:

1829	           if ( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
1830	              MaxDpbSize = Min( 4 * max-dpb, 16 )
1831	           else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
1832	              MaxDpbSize = Min( 2 * max-dpb, 16 )
1833	           else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
1834	              MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
1835	           else
1836	              MaxDpbSize = max-dpb

1838	         Wherein MaxLumaPS is given in Table A-1 of [HEVC] for the highest
1839	         level and PicSizeInSamplesY is the current size of each
1840	         decoded picture in units of luma samples as defined in [HEVC].
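
The MaxDpbSize derivation above can be sketched in Python as follows.
This is an illustrative transcription, not normative text: the function
and argument names are hypothetical, and the "/" in the pseudocode is
assumed to denote integer division with truncation.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb):
    """Number of decoded frames a receiver signaling max-dpb must store.

    pic_size_in_samples_y: current picture size in luma samples.
    max_luma_ps: MaxLumaPS of the signaled highest level.
    max_dpb: value of the max-dpb parameter.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: integer division, as is usual for such derivations.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

Smaller pictures leave room for more frames in the same memory, which is
why the result grows as the picture size shrinks, up to the cap of 16.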

1842	         The value of max-dpb MUST be greater than or equal to the
1843	         value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC]. Senders
1844	         MAY use this knowledge to construct coded video streams with
1845	         improved compression.

1847	            Informative note: This parameter was added primarily to
1848	            complement a similar codepoint in the ITU-T Recommendation
1849	            H.245, so as to facilitate signaling gateway designs. The
1850	            decoded picture buffer stores reconstructed samples. There
1851	            is no relationship between the size of the decoded picture
1852	            buffer and the buffers used in RTP, especially de-
1853	            packetization and de-jitter buffers.

1855	      max-br:
1856	         The value of max-br is an integer indicating the maximum video
1857	         bitrate in units of CpbBrVclFactor bits per second for the VCL
1858	         HRD parameters and in units of CpbBrNalFactor bits per second
1859	         for the NAL HRD parameters, where CpbBrVclFactor and
1860	         CpbBrNalFactor are defined in Section A.4 of [HEVC].

1862	         The max-br parameter signals that the video decoder of the
1863	         receiver is capable of decoding video at a higher bitrate than
1864	         is required by the signaled highest level.

1866	         When max-br is signaled, the video decoder of the receiver MUST
1867	         be able to decode NAL unit streams that conform to the
1868	         signaled highest level, with the following exceptions in the
1869	         limits specified by the highest level:

1871	         o  The value of max-br replaces the MaxBR value in Table A-2
1872	            of [HEVC] for the highest level.

1874	         o  When the max-cpb parameter is not present, the result of
1875	            the following formula replaces the value of MaxCPB in
1876	            Table A-1 of [HEVC]:

1878	              (MaxCPB of the signaled highest level) * max-br /
1879	              (MaxBR of the signaled highest level).

1881	         For example, if a receiver signals capability for Main profile
1882	         Level 2 with max-br equal to 2000, this indicates a maximum
1883	         video bitrate of 2000 kbits/sec for VCL HRD parameters, a
1884	         maximum video bitrate of 2200 kbits/sec for NAL HRD
1885	         parameters, and a CPB size of 2000000 bits (2000000 / 1500000
1886	         * 1500000).
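
The arithmetic of this example can be reproduced with a short Python
sketch.  The function name is hypothetical.  The factor values 1000 and
1100 and the Level 2 values of 1500 for MaxBR and MaxCPB are assumptions
read off the example itself, and integer division is also an assumption.

```python
def max_br_limits(max_br, level_max_br, level_max_cpb,
                  vcl_factor=1000, nal_factor=1100):
    """Return (VCL bitrate, NAL bitrate, CPB size in bits) for max-br.

    vcl_factor and nal_factor stand in for CpbBrVclFactor and
    CpbBrNalFactor; the defaults are assumed from the worked example.
    """
    vcl_bitrate = max_br * vcl_factor
    nal_bitrate = max_br * nal_factor
    # MaxCPB is scaled by max-br / MaxBR; this applies only when max-cpb
    # is not signaled.  Multiply first so no precision is lost.
    scaled_max_cpb = level_max_cpb * max_br // level_max_br
    cpb_bits = scaled_max_cpb * vcl_factor
    return vcl_bitrate, nal_bitrate, cpb_bits
```

Calling max_br_limits(2000, 1500, 1500) gives 2000000 bits/sec for VCL,
2200000 bits/sec for NAL, and a CPB size of 2000000 bits, matching the
figures in the example.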

1888	         The value of max-br MUST be greater than or equal to the
1889	         value of MaxBR given in Table A-2 of [HEVC] for the signaled
1890	         highest level.

1892	         Senders MAY use this knowledge to send higher bitrate video as
1893	         allowed in the level definition of Annex A of HEVC to achieve
1894	         improved video quality.

1896	            Informative note: This parameter was added primarily to
1897	            complement a similar codepoint in the ITU-T Recommendation
1898	            H.245, so as to facilitate signaling gateway designs.  The
1899	            assumption that the network is capable of handling such
1900	            bitrates at any given time cannot be made from the value of
1901	            this parameter.  In particular, no conclusion can be drawn
1902	            that the signaled bitrate is possible under congestion
1903	            control constraints.

1905	      tx-mode:

1907	         This parameter indicates whether the transmission mode is SST
1908	         or MST.

1910	         The value of tx-mode MUST be equal to either "MST" or "SST".
1911	         When not present, the value of tx-mode is inferred to be equal
1912	         to "SST".

1914	         If the value is equal to "MST", MST MUST be in use.  Otherwise
1915	         (the value is equal to "SST"), SST MUST be in use.

1917	         The value of tx-mode MUST be equal to "MST" for all RTP
1918	         sessions in an MST.

1920	      sprop-depack-buf-nalus:

1922	         This parameter specifies the maximum number of NAL units that
1923	         precede a NAL unit in the de-packetization buffer in reception
1924	         order and follow the NAL unit in decoding order.

1926	         The value of sprop-depack-buf-nalus MUST be an integer in the
1927	         range of 0 to 32767, inclusive.

1929	         When not present, the value of sprop-depack-buf-nalus is
1930	         inferred to be equal to 0.

1932	         When the RTP session depends on one or more other RTP sessions
1933	         (in this case tx-mode MUST be equal to "MST"), this parameter
1934	         MUST be present and the value of sprop-depack-buf-nalus MUST
1935	         be greater than 0.
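
One way a sender might derive a value for sprop-depack-buf-nalus is
sketched below in Python.  The function name and the input representation
(a list of decoding-order positions, given in reception order) are
assumptions made for illustration and are not defined by this memo.

```python
def depack_buf_nalus(decoding_order_in_reception_order):
    """Largest count of earlier-received NAL units that decode later.

    Each list element is the decoding-order position of one NAL unit;
    the list itself is ordered by reception.
    """
    worst = 0
    for i, current in enumerate(decoding_order_in_reception_order):
        # Count units that precede `current` in reception order but
        # follow it in decoding order.
        count = sum(1 for earlier in decoding_order_in_reception_order[:i]
                    if earlier > current)
        worst = max(worst, count)
    return worst
```

A stream received in decoding order yields 0, which is the inferred
default.  The quadratic loop keeps the sketch short; a production
implementation would likely use a more efficient structure.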

1937	      sprop-depack-buf-bytes:

1939	         This parameter signals the required size of the de-
1940	         packetization buffer in units of bytes.  The value of the
1941	         parameter MUST be greater than or equal to the maximum buffer
1942	         occupancy (in units of bytes) of the de-packetization buffer
1943	         as specified in section 6.

1945	         The value of sprop-depack-buf-bytes MUST be an integer in the
1946	         range of 0 to 4294967295, inclusive.

1948	         When the RTP session depends on one or more other RTP sessions
1949	         (in this case tx-mode MUST be equal to "MST") or sprop-depack-
1950	         buf-nalus is present and is greater than 0, this parameter
1951	         MUST be present and the value of sprop-depack-buf-bytes MUST
1952	         be greater than 0.

1954	            Informative note: sprop-depack-buf-bytes indicates the
1955	            required size of the de-packetization buffer only.  When
1956	            network jitter can occur, an appropriately sized jitter
1957	            buffer has to be available as well.

1959	      depack-buf-cap:

1961	         This parameter signals the capabilities of a receiver
1962	         implementation and indicates the amount of de-packetization
1963	         buffer space in units of bytes that the receiver has available
1964	         for reconstructing the NAL unit decoding order.  A receiver is
1965	         able to handle any stream for which the value of the sprop-
1966	         depack-buf-bytes parameter is smaller than or equal to this
1967	         parameter.

1969	         When not present, the value of depack-buf-cap is inferred to
1970	         be equal to 0.  The value of depack-buf-cap MUST be an integer
1971	         in the range of 0 to 4294967295, inclusive.

1973	            Informative note: depack-buf-cap indicates the maximum
1974	            possible size of the de-packetization buffer of the
1975	            receiver only.  When network jitter can occur, an
1976	            appropriately sized jitter buffer has to be available as
1977	            well.

1979	      segmentation-id:

1981	         This parameter MAY be used to signal the segmentation tools
1982	         present in the stream and that can be used for
1983	         parallelization.  The value of segmentation-id MUST be an
1984	         integer in the range of 0 to 3, inclusive.  When not present,
1985	         the value of segmentation-id is inferred to be equal to 0.

1987	         When segmentation-id is equal to 0, no information about the
1988	         segmentation tools is provided.  When segmentation-id is equal
1989	         to 1, it indicates that slices are present in the stream.
1990	         When segmentation-id is equal to 2, it indicates that tiles
1991	         are present in the stream.  When segmentation-id is equal to
1992	         3, it indicates that WPP is used in the stream.

1994	      spatial-segmentation-idc:

1996	         A base16 [RFC4648] representation of the syntax element
1997	         min_spatial_segmentation_idc as specified in [HEVC].  This
1998	         parameter MAY be used to describe parallelization capabilities
1999	         of the stream.

2001	      Encoding considerations:

2003	         This type is only defined for transfer via RTP (RFC 3550).

2005	      Security considerations:

2007	         See Section 9 of RFC XXXX.

2009	      Public specification:

2011	         Please refer to Section 13 of RFC XXXX.

2013	      Additional information: None

2015	      File extensions: none

2017	      Macintosh file type code: none

2019	      Object identifier or OID: none

2021	      Person & email address to contact for further information:

2023	      Intended usage: COMMON

2025	      Author: See Section 14 of RFC XXXX.

2027	      Change controller:

2029	         IETF Audio/Video Transport Payloads working group delegated
2030	         from the IESG.

2032	7.2 SDP Parameters

2034	   The receiver MUST ignore any parameter unspecified in this memo.

2036	7.2.1 Mapping of Payload Type Parameters to SDP

2038	   The media type video/H265 string is mapped to fields in the Session
2039	   Description Protocol (SDP) [RFC4566] as follows:

2041	   o  The media name in the "m=" line of SDP MUST be video.

2043	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
2044	      media subtype).

2046	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2048	   o  The OPTIONAL parameters "profile-space", "profile-id", "tier-
2049	      flag", "level-id", "interop-constraints", "profile-compatibility-
2050	      indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level-
2051	      id", "max-ls", "max-lps", "max-cpb", "max-dpb", "max-br", "tx-
2052	      mode", "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
2053	      "depack-buf-cap", "segmentation-id", and "spatial-segmentation-
2054	      idc", when present, MUST be included in the "a=fmtp" line of SDP.
2055	      These parameters are expressed as a media type string, in the
2056	      form of a semicolon-separated list of parameter=value pairs.

2058	   o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
2059	      pps", when present, MUST be included in the "a=fmtp" line of SDP
2060	      or conveyed using the "fmtp" source attribute as specified in
2061	      section 6.3 of [RFC5576].  For a particular media format (i.e.,
2062	      RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST
2063	      NOT be both included in the "a=fmtp" line of SDP and conveyed
2064	      using the "fmtp" source attribute.  When included in the "a=fmtp"
2065	      line of SDP, these parameters are expressed as a media type
2066	      string, in the form of a semicolon-separated list of
2067	      parameter=value pairs.  When conveyed using the "fmtp" source
2068	      attribute, these parameters are only associated with the given
2069	      source and payload type as parts of the "fmtp" source attribute.

2071	          Informative note: Conveyance of "sprop-vps", "sprop-sps", and
2072	          "sprop-pps" using the "fmtp" source attribute allows for out-
2073	          of-band transport of parameter sets in topologies like Topo-
2074	          Video-switch-MCU as specified in [RFC5117].

2076	   An example of media representation in SDP is as follows:

2078	         m=video 49170 RTP/AVP 98
2079	         a=rtpmap:98 H265/90000
2080	         a=fmtp:98 profile-id=ST;
2081	                   sprop-vps=<video parameter sets data>
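
A receiver-side sketch of reading such an "a=fmtp" value is given below
in Python.  It assumes the attribute value has already been joined onto
one line and stripped of the "a=fmtp:" prefix.  The function and constant
names are hypothetical.  Parameters not specified in this memo are
dropped, in line with the rule at the start of Section 7.2.

```python
KNOWN_PARAMS = {
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "max-ls", "max-lps", "max-cpb", "max-dpb", "max-br", "tx-mode",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-cap",
    "segmentation-id", "spatial-segmentation-idc",
    "sprop-vps", "sprop-sps", "sprop-pps",
}


def parse_fmtp(attribute_value):
    """Parse e.g. '98 profile-id=1; tx-mode=SST' into (98, {...})."""
    payload_type, _, params = attribute_value.strip().partition(" ")
    result = {}
    for pair in params.split(";"):
        # Split on the first '=' only, since base64 values may end in '='.
        name, sep, value = pair.strip().partition("=")
        if sep and name in KNOWN_PARAMS:
            result[name] = value.strip()
    return int(payload_type), result
```

The parameter values used when exercising this sketch are illustrative
only and do not represent a recommended configuration.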

2083	7.2.2 Usage with SDP Offer/Answer Model

2085	   When HEVC is offered over RTP using SDP in an Offer/Answer model
2086	   [RFC3264] for negotiation for unicast usage, the following
2087	   limitations and rules apply:

2089	   o  The parameters identifying a media format configuration for HEVC
2090	      are profile-space, profile-id, tier-flag, level-id, interop-
2091	      constraints, tx-mode, and sprop-depack-buf-nalus.  These media
2092	      configuration parameters, except for level-id, MUST be used
2093	      symmetrically when the answerer does not include recv-sub-layer-
2094	      id in the answer; i.e., the answerer MUST either maintain all
2095	      configuration parameters or remove the media format (payload
2096	      type) completely, if one or more of the parameter values are not
2097	      supported.  The value of level-id is changeable.

2099	          Informative note: The requirement for symmetric use does not
2100	          apply for level-id, and does not apply for the other stream
2101	          properties and capability parameters.

2103	   To simplify handling and matching of these configurations, the same
2104	   RTP payload type number used in the offer SHOULD also be used in the
2105	   answer, as specified in [RFC3264].  The same RTP payload type number
2106	   used in the offer MUST also be used in the answer when the answer
2107	   includes recv-sub-layer-id.  When the answer does not include recv-
2108	   sub-layer-id, the answer MUST NOT contain a payload type number used
2109	   in the offer unless the configuration is exactly the same as in the
2110	   offer or the configuration in the answer only differs from that in
2111	   the offer with a different value of level-id.  The answer MAY
2112	   contain the recv-sub-layer-id parameter if an HEVC stream contains
2113	   multiple operation points (using temporal scalability and sub-
2114	   layers) and sprop-vps is included in the offer where sub-layers are
2115	   present in the video parameter set.  If the sprop-vps is provided in
2116	   an offer, an answerer MAY select a particular operation point in the
2117	   received and/or in the sent stream.  When recv-sub-layer-id is
2118	   present in the answer, the media configuration parameters MUST NOT
2119	   be present in the answer.  Rather, the media configuration that the
2120	   answerer will use for receiving and/or sending is the one used for
2121	   the selected operation point as indicated in the offer.

2123	          Informative note: When an offerer receives an answer that
2124	          does not include recv-sub-layer-id, it has to compare payload
2125	          types not declared in the offer based on the media type
2126	          (i.e., video/H265) and the above media configuration
2127	          parameters with any payload types it has already declared.
2128	          This will enable it to determine whether the configuration in
2129	          question is new or if it is equivalent to configuration
2130	          already offered, since a different payload type number may be
2131	          used in the answer.  The ability to perform operation point
2132	          selection enables a receiver to utilize the temporal scalable
2133	          nature of an HEVC stream.

2135	   o  The parameters sprop-depack-buf-nalus and sprop-depack-buf-bytes
2136	      describe the properties of the RTP packet stream that the offerer
2137	      or the answerer is sending for the media format configuration.
2138	      This differs from the normal usage of the Offer/Answer
2139	      parameters: normally such parameters declare the properties of
2140	      the stream that the offerer or the answerer is able to receive.
2141	      When dealing with HEVC, the offerer assumes that the answerer
2142	      will be able to receive media encoded using the configuration
2143	      being offered.

2145	            Informative note: The above parameters apply for any
2146	            stream sent by a declaring entity with the same
2147	            configuration; i.e., they are dependent on their source.
2148	            Rather than being bound to the payload type, the values may
2149	            have to be applied to another payload type when being sent,
2150	            as they apply for the configuration.

2152	   o  The capability parameters max-ls, max-lps, max-cpb, max-dpb, and
2153	      max-br MAY be used to declare further capabilities of the offerer
2154	      or answerer for receiving.  These parameters MUST NOT be present
2155	      when the direction attribute is "sendonly".  When present, the
2156	      parameters describe the limitations of what the offerer or
2157	      answerer accepts for receiving streams.

2159	   o  An offerer has to include the size of the de-packetization
2160	      buffer, sprop-depack-buf-bytes, and sprop-depack-buf-nalus, in
2161	      the offer for an interleaved HEVC stream or for the MST
2162	      transmission mode.  To enable the offerer and answerer to inform
2163	      each other about their capabilities for de-packetization
2164	      buffering in receiving streams, both parties are RECOMMENDED to
2165	      include depack-buf-cap.  For interleaved streams or in MST, it is
2166	      also RECOMMENDED to consider offering multiple payload types with
2167	      different buffering requirements when the capabilities of the
2168	      receiver are unknown.

2170	   For streams being delivered over multicast, the following rules
2171	   apply:

2173	   o  The media format configuration is identified by profile-space,
2174	      profile-id, tier-flag, level-id, interop-constraints, tx-mode and
2175	      sprop-depack-buf-nalus.  These media format configuration
2176	      parameters, including level-id, MUST be used symmetrically; that
2177	      is, the answerer MUST either maintain all configuration
2178	      parameters or remove the media format (payload type) completely.
2179	      Note that this implies that the level-id for Offer/Answer in
2180	      multicast is not changeable.

2182	   To simplify the handling and matching of these configurations, the
2183	   same RTP payload type number used in the offer SHOULD also be used
2184	   in the answer, as specified in [RFC3264].  An answer MUST NOT
2185	   contain a payload type number used in the offer unless the
2186	   configuration is the same as in the offer.

2188	   o  The rules for other parameters are the same as above for unicast
2189	      as long as the above rules are obeyed.
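
The payload type reuse rules for unicast and multicast can be summarized
in a small Python sketch.  It covers only answers that do not include
recv-sub-layer-id.  The names are hypothetical, and the comparison is
done on the literal parameter values, so inferred defaults for absent
parameters are not modeled.

```python
CONFIG_PARAMS = (
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "tx-mode", "sprop-depack-buf-nalus",
)


def may_reuse_payload_type(offer, answer, multicast=False):
    """Check whether an answer may keep the payload type of the offer.

    offer and answer are dicts mapping parameter names to string values.
    """
    for name in CONFIG_PARAMS:
        if name == "level-id" and not multicast:
            # For unicast, level-id is the one changeable parameter.
            continue
        if offer.get(name) != answer.get(name):
            return False
    return True
```

Under this sketch, an answer that changes only level-id passes the check
for unicast but fails it for multicast.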

2191	   Table 1 lists the interpretation of all the parameters that MUST be
2192	   used for the various combinations of offer, answer, and direction
2193	   attributes.  Note that the two columns wherein the recv-sub-layer-id
2194	   parameter is used only apply to answers, whereas the other columns
2195	   apply to both offers and answers.

2197	   Table 1.  Interpretation of parameters for various combinations of
2198	   offers, answers, direction attributes, with and without recv-sub-
2199	   layer-id.  Columns that do not indicate offer or answer apply to
2200	   both.

2202	                                          sendonly --+
2203	             answer: recvonly,recv-sub-layer-id --+  |
2204	              recvonly w/o recv-sub-layer-id --+  |  |
2205	      answer: sendrecv, recv-sub-layer-id --+  |  |  |
2206	        sendrecv w/o recv-sub-layer-id --+  |  |  |  |
2207	                                         |  |  |  |  |
2208	      profile-space                      C  X  C  X  P
2209	      profile-id                         C  X  C  X  P
2210	      tier-flag                          C  X  C  X  P
2211	      level-id                           C  X  C  X  P
2212	      interop-constraints                C  X  C  X  P
2213	      profile-compatibility-indicator    C  X  C  X  P
2214	      max-recv-level-id                  R  R  R  R  -
2215	      tx-mode                            C  X  C  X  P
2216	      sprop-depack-buf-nalus             P  P  -  -  P
2217	      sprop-depack-buf-bytes             P  P  -  -  P
2218	      depack-buf-cap                     R  R  R  R  -
2219	      segmentation-id                    P  P  P  P  P
2220	      spatial-segmentation-idc           P  P  P  P  P
2221	      max-br                             R  R  R  R  -
2222	      max-cpb                            R  R  R  R  -
2223	      max-dpb                            R  R  R  R  -
2224	      max-ls                             R  R  R  R  -
2225	      max-lps                            R  R  R  R  -
2226	      sprop-vps, sprop-sps, sprop-pps    P  P  -  -  P
2227	      recv-sub-layer-id                  X  O  X  O  -

2229	     Legend:

2231	      C: configuration for sending and receiving streams
2232	      P: properties of the stream to be sent
2233	      R: receiver capabilities
2234	      O: operation point selection
2235	      X: MUST NOT be present
2236	      -: not usable, when present SHOULD be ignored

2238	   Parameters used for declaring receiver capabilities are in general
2239	   downgradable; i.e., they express the upper limit for a sender's
2240	   possible behavior.  Thus, a sender MAY select to set its encoder
2241	   using only lower/lesser or equal values of these parameters.

2243	   Parameters declaring a configuration point are not changeable, with
2244	   the exception of the level-id parameter for unicast usage.  This
2245	   expresses values a receiver expects to be used and MUST be used
2246	   verbatim on the sender side.  If level-id is changed, an answerer
2247	   MUST NOT include the recv-sub-layer-id parameter.

2249	   When a sender's capabilities are declared, and non-changeable
2250	   parameters are used in this declaration, these parameters express a
2251	   configuration that is acceptable for the sender to receive streams.

2253	   In order to achieve high interoperability levels, it is often
2254	   advisable to offer multiple alternative configurations.  It is
2255	   impossible to offer multiple configurations in a single payload
2256	   type.  Thus, when multiple configuration offers are made, each offer
2257	   requires its own RTP payload type associated with the offer.

2259	   A receiver SHOULD understand all media type parameters, even if it
2260	   only supports a subset of the payload format's functionality.  This
2261	   ensures that a receiver is capable of understanding when an offer to
2262	   receive media can be downgraded to what is supported by the receiver
2263	   of the offer.

2265	   An answerer MAY extend the offer with additional media format
2266	   configurations.  However, to enable their usage, in most cases a
2267	   second offer is required from the offerer to provide the stream
2268	   property parameters that the media sender will use.  This also has
2269	   the effect that the offerer has to be able to receive this media
2270	   format configuration, not only to send it.

2272	7.2.3 Usage in Declarative Session Descriptions

2274	   When HEVC over RTP is offered with SDP in a declarative style, as in
2275	   Real Time Streaming Protocol (RTSP) [RFC2326] or Session
2276	   Announcement Protocol (SAP) [RFC2974], the following considerations
2277	   are necessary.

2279	   o  All parameters capable of indicating both stream properties and
2280	      receiver capabilities are used to indicate only stream
2281	      properties.  For example, in this case, the parameters profile-id,
2282	      tier-flag, and level-id declare the values used by the stream,
2283	      not the capabilities for receiving streams.  Consequently, the
2284	      following interpretation of the parameters MUST be used:

2286	   Declaring actual configuration or stream properties:

2288	     - profile-space
2289	     - profile-id
2290	     - tier-flag
2291	     - level-id
2292	     - interop-constraints
2293	     - tx-mode
2294	     - sprop-parameter-sets
2295	     - sprop-depack-buf-nalus
2296	     - sprop-depack-buf-bytes
2297	     - segmentation-id
2298	     - spatial-segmentation-idc

2300	   Not usable (when present, they SHOULD be ignored):

2302	     - max-lps
2303	     - max-ls
2304	     - max-cpb
2305	     - max-dpb
2306	     - max-br
2307	     - max-recv-level-id
2308	     - depack-buf-cap
2309	     - sub-layer-id

2311	   o  A receiver of the SDP is required to support all parameters and
2312	      values of the parameters provided; otherwise, the receiver MUST
2313	      reject (RTSP) or not participate in (SAP) the session.  It falls
2314	      on the creator of the session to use values that are expected to
2315	      be supported by the receiving application.
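
	   As an illustration only, the interpretation rules above can be
	   sketched in Python.  The parameter names come from the two lists
	   above; the function names, the dictionary representation, and the
	   sample values are hypothetical and not defined by this memo.

```python
# Parameters that declare stream properties in declarative SDP.
STREAM_PROPERTY_PARAMS = {
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "tx-mode", "sprop-parameter-sets",
    "sprop-depack-buf-nalus", "sprop-depack-buf-bytes",
    "segmentation-id", "spatial-segmentation-idc",
}

# Parameters that are not usable in declarative SDP and SHOULD be ignored.
IGNORED_PARAMS = {
    "max-lps", "max-ls", "max-cpb", "max-dpb", "max-br",
    "max-recv-level-id", "depack-buf-cap", "sub-layer-id",
}


def parse_fmtp(fmtp_value):
    """Split a 'name=value;name=value' string into a dict (assumed syntax)."""
    params = {}
    for item in fmtp_value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        params[name.strip().lower()] = value.strip()
    return params


def interpret_declarative(fmtp_value):
    """Return (stream_properties, ignored, unknown) parameter dicts."""
    stream, ignored, unknown = {}, {}, {}
    for name, value in parse_fmtp(fmtp_value).items():
        if name in STREAM_PROPERTY_PARAMS:
            stream[name] = value
        elif name in IGNORED_PARAMS:
            ignored[name] = value
        else:
            # A receiver that does not support a parameter cannot
            # take part in the session; the caller decides how to react.
            unknown[name] = value
    return stream, ignored, unknown
```

	   A receiver could call interpret_declarative() on the format
	   parameters and decline the session when the third returned
	   dictionary is not empty.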

2317	7.2.4 Dependency Signaling in Multi-Session Transmission

2319	   If MST is used, the rules on signaling media decoding dependency in
2320	   SDP as defined in [RFC5583] apply.  The rules on "hierarchical or
2321	   layered encoding" with multicast in Section 5.7 of [RFC4566] do not
2322	   apply, i.e., the notation for Connection Data "c=" SHALL NOT be used
2323	   with more than one address.  The order of session dependency is
2324	   given from the RTP session containing the lowest temporal sub-layer
2325	   to the RTP session containing the highest temporal sub-layer.

2327	8. Use with Feedback Messages

2329	   As specified in Section 6.1 of RFC 4585 [RFC4585], Payload-Specific
2330	   Feedback messages are identified by the RTCP packet type value PSFB
2331	   (206).  AVPF [RFC4585] defines three payload-specific feedback
2332	   messages and one application layer feedback message, and CCM
2333	   [RFC5104] specifies four payload-specific feedback messages.

2335	   In addition, this memo defines one payload-specific feedback
2336	   message.

2338	   These feedback messages are identified by means of the feedback
2339	   message type (FMT) parameter as follows:

2341	   Assigned in [RFC4585]:

2343	      1:     Picture Loss Indication (PLI)
2344	      2:     Slice Loss Indication (SLI)
2345	      3:     Reference Picture Selection Indication (RPSI)
2346	      15:    Application layer FB message
2347	      31:    reserved for future expansion of the number space

2349	   Assigned in [RFC5104]:

2351	      4:     Full Intra Request (FIR) Command
2352	      5:     Temporal-Spatial Trade-off Request (TSTR)
2353	      6:     Temporal-Spatial Trade-off Notification (TSTN)
2354	      7:     Video Back Channel Message (VBCM)

2356	   Assigned in this memo:

2358	      8:     Specific Picture Loss Indication (SPLI)

2360	   Unassigned:

2362	      0:      unassigned
2363	      9-14:   unassigned
2364	      16-30:  unassigned

2366	   The following subsections define the Feedback Control Information
2367	   (FCI) format for the new payload-specific feedback message and how
2368	   to use HEVC with the RPSI and SPLI messages, both for the purpose of
2369	   feedback-based reference picture selection for improved error
2370	   resilience in real-time conversational video applications such as
2371	   video telephony and video conferencing.

2373	   Feedback-based reference picture selection has been shown to be a
2374	   powerful tool to stop temporal error propagation for improved error
2375	   resilience [Girod99][Wang05].  In one approach, the decoder side
2376	   tracks errors in the decoded pictures and informs the encoder side
2377	   that a particular picture that was decoded relatively earlier is
2378	   correct and still present in the decoded picture buffer, and it
2379	   requests the encoder to use that correct picture for reference
2380	   when encoding the next picture, so as to stop further temporal error
2381	   propagation.  For this approach, the decoder side should use the
2382	   RPSI feedback message.  In another approach, the decoder side only
2383	   reports to the encoder side which pictures have been entirely or
2384	   partially lost.  The encoder tracks errors in the decoded pictures
2385	   at the decoder side based on the feedback messages; if it infers
2386	   that an earlier decoded picture is correct at the decoder side and
2387	   is still in the decoded picture buffer of the decoder, it
2388	   encodes the next picture using that correct picture for reference.
2389	   The SPLI message defined below is for use with the second approach
2390	   described above.

2392	   Encoders can encode some long-term reference pictures as specified
2393	   in H.264 or HEVC for purposes described in the previous paragraph
2394	   without the need for a huge decoded picture buffer.  As shown in
2395	   [Wang05], with a flexible reference picture management scheme as in
2396	   H.264 and HEVC, even a decoded picture buffer size of two would work
2397	   for both of the approaches described in the previous paragraph.

2399	8.1 Definition of the SPLI Feedback Message

2401	   The SPLI feedback message is identified by PT=PSFB and FMT=8.  There
2402	   MUST be exactly one SPLI contained in the FCI field.

2404	      Informative note: The SPLI message defined in this memo also
2405	      applies to other codecs, and may later be moved to another
2406	      extension of RFC 4585.

2408	   The FCI format of the SPLI message is exactly the same as that of
2409	   the RPSI message, with the name of the field "Native RPSI bit string
2410	   defined per codec" being replaced with "Native SPLI bit string
2411	   defined per codec", as shown in Figure 11.

2413	   0                   1                   2                   3
2414	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2415	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2416	   |      PB       |0| Payload Type|    Native SPLI bit string     |
2417	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2418	   |   defined per codec          ...                | Padding (0) |
2419	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2421	                  Figure 11   The FCI format of the SPLI

2423	   PB: 8 bits

2425	      The number of unused bits required to pad the length of the SPLI
2426	      message to a multiple of 32 bits.

2428	   0: 1 bit

2430	      MUST be set to zero upon transmission and ignored upon reception.

2432	   Payload Type: 7 bits

2434	      Indicates the RTP payload type in the context of which the native
2435	      SPLI bit string MUST be interpreted.

2437	   Native SPLI bit string: variable length

2439	      Indicates the SPLI information as natively defined by the video
2440	      codec.

2442	   Padding: #PB bits

2444	      A number of bits set to zero to fill up the contents of the SPLI
2445	      message to the next 32-bit boundary.  The number of padding bits
2446	      MUST be indicated by the PB field.

2448	   The same timing rules as for the RPSI message, as defined in
2449	   [RFC4585], apply for the SPLI message.
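
	   The field layout above can be sketched in Python as follows.  This
	   is a minimal illustration, not a normative encoder: the function
	   names are hypothetical, and the native bit string is passed as a
	   string of '0' and '1' characters purely for readability.

```python
def build_spli_fci(payload_type, native_bits):
    """Pack PB, the zero bit, Payload Type, native bits, and padding."""
    if not 0 <= payload_type <= 127:
        raise ValueError("Payload Type is a 7-bit field")
    if set(native_bits) - {"0", "1"}:
        raise ValueError("native_bits must contain only '0' and '1'")
    # PB (8 bits) + zero bit (1) + Payload Type (7) precede the bit string.
    unpadded_len = 16 + len(native_bits)
    pb = (-unpadded_len) % 32          # bits needed to reach a 32-bit boundary
    bits = (format(pb, "08b") + "0" + format(payload_type, "07b")
            + native_bits + "0" * pb)
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def parse_spli_fci(fci):
    """Return (payload_type, native_bits) from an FCI built as above."""
    pb = fci[0]
    payload_type = fci[1] & 0x7F       # the leading bit is ignored on reception
    bits = "".join(format(b, "08b") for b in fci[2:])
    return payload_type, bits[:len(bits) - pb]
```

	   For a 40-bit native bit string, the 16 header bits plus 40 payload
	   bits give 56 bits, so PB is 8 and the FCI occupies two 32-bit words.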

2451	8.2 Use of HEVC with the RPSI Feedback Message

2453	   The field "Native RPSI bit string defined per codec" is a base16
2454	   [RFC4648] representation of the 8 bits consisting of 2 most
2455	   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
2456	   in [HEVC], followed by the 32 bits representing the value of the
2457	   PicOrderCntVal (in network byte order), as defined in [HEVC], for
2458	   the picture that is requested to be used for reference when encoding
2459	   the next picture.

2461	   Use of the RPSI feedback message as positive acknowledgement is
2462	   deprecated.  In other words, the RPSI feedback message MUST only be
2463	   used as a reference picture selection request, such that it can also
2464	   be used in multicast.

2466	8.3 Use of HEVC with the SPLI Feedback Message

2468	   The field "Native SPLI bit string defined per codec" is a base16
2469	   [RFC4648] representation of the 8 bits consisting of 2 most
2470	   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
2471	   in [HEVC], followed by the 32 bits representing the value of the
2472	   PicOrderCntVal, as defined in [HEVC], for the picture that is
2473	   indicated as entirely or partially lost.
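
	   The 40-bit value described in Sections 8.2 and 8.3 can be sketched
	   in Python as follows.  This sketch rests on two assumptions that the
	   text does not settle: PicOrderCntVal is treated as a signed 32-bit
	   integer in network byte order, and "base16" is read as the
	   uppercase hexadecimal text form of [RFC4648].  The function name is
	   hypothetical, and both the raw bytes and the base16 text are
	   returned so that either reading can be used.

```python
import base64
import struct


def hevc_native_bit_string(nuh_layer_id, pic_order_cnt_val):
    """Return (raw_bytes, base16_text) for nuh_layer_id and PicOrderCntVal."""
    if not 0 <= nuh_layer_id <= 63:
        raise ValueError("nuh_layer_id is a 6-bit value")
    # One byte: 2 most significant bits equal to 0, then 6 bits of
    # nuh_layer_id.  Four bytes: PicOrderCntVal, big-endian (assumed signed).
    raw = struct.pack(">Bi", nuh_layer_id, pic_order_cnt_val)
    return raw, base64.b16encode(raw).decode("ascii")
```

	   The same helper serves both messages, since the RPSI and SPLI native
	   bit strings differ only in what the identified picture means.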

2475	9. Security Considerations

2477	   RTP packets using the payload format defined in this specification
2478	   are subject to the security considerations discussed in the RTP
2479	   specification [RFC3550], and in any applicable RTP profile such as
2480	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711] or
2481	   RTP/SAVPF [RFC5124].  However, as "Securing the RTP Protocol
2482	   Framework: Why RTP Does Not Mandate a Single Media Security
2483	   Solution" [I-D.ietf-avt-srtp-not-mandatory] discusses, it is not an
2484	   RTP payload format's responsibility to discuss or mandate what
2485	   solutions are used to meet the basic security goals like
2486	   confidentiality, integrity, and source authenticity for RTP in
2487	   general.  This responsibility lies with anyone using RTP in an
2488	   application.  They can find guidance on available security
2489	   mechanisms and important considerations as discussed in "Options for
2490	   Securing RTP Sessions" [I-D.ietf-avtcore-rtp-security-options].

2492	   The rest of this section discusses the security impacting properties
2493	   of the payload format itself.

2495	   Because the data compression used with this payload format is
2496	   applied end-to-end, any encryption needs to be performed after
2497	   compression.  A potential denial-of-service threat exists for data
2498	   encodings using compression techniques that have non-uniform
2499	   receiver-end computational load.  The attacker can inject
2500	   pathological datagrams into the stream that are complex to decode
2501	   and that cause the receiver to be overloaded.  H.265 is particularly
2502	   vulnerable to such attacks, as it is extremely simple to generate
2503	   datagrams containing NAL units that affect the decoding process of
2504	   many future NAL units.  Therefore, the usage of data origin
2505	   authentication and data integrity protection of at least the RTP
2506	   packet is RECOMMENDED, for example, with SRTP [RFC3711].

2508	   Note that the appropriate mechanism to ensure confidentiality and
2509	   integrity of RTP packets and their payloads is very dependent on the
2510	   application and on the transport and signaling protocols employed.
2511	   Thus, although SRTP is given as an example above, other possible
2512	   choices exist.

2514	   Decoders MUST exercise caution with respect to the handling of user
2515	   data SEI messages, particularly if they contain active elements, and
2516	   MUST restrict their domain of applicability to the presentation
2517	   containing the stream.

2519	   End-to-end security with authentication, integrity, or
2520	   confidentiality protection will prevent a MANE from performing
2521	   media-aware operations other than discarding complete packets.  In
2522	   the case of confidentiality protection, it will even be prevented
2523	   from discarding packets in a media-aware way.  To be allowed to
2524	   perform such operations, a MANE is required to be a trusted entity
2525	   that is included in the security context establishment.

2527	10. Congestion Control

2529	   Congestion control for RTP SHALL be used in accordance with RTP
2530	   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
2531	   If best-effort service is being used, an additional requirement is
2532	   that users of this payload format MUST monitor packet loss to ensure
2533	   that the packet loss rate is within an acceptable range.  Packet
2534	   loss is considered acceptable if a TCP flow across the same network
2535	   path, and experiencing the same network conditions, would achieve an
2536	   average throughput, measured on a reasonable timescale, that is not
2537	   less than the RTP flow is achieving.  This condition can be
2538	   satisfied by implementing congestion control mechanisms to adapt the
2539	   transmission rate, the number of layers subscribed for a layered
2540	   multicast session, or by arranging for a receiver to leave the
2541	   session if the loss rate is unacceptably high.

2543	   The bitrate adaptation necessary for obeying the congestion control
2544	   principle is easily achievable when real-time encoding is used, for
2545	   example by adequately tuning the quantization parameter.

2547	   However, when pre-encoded content is being transmitted, bandwidth
2548	   adaptation requires the pre-coded bitstream to be tailored for such
2549	   adaptivity.  The key mechanism available in HEVC is temporal
2550	   scalability.  A media sender can remove NAL units belonging to
2551	   higher temporal sub-layers (i.e., those NAL units with a high value
2552	   of TID) until the sending bitrate drops to an acceptable range.
2553	   HEVC contains mechanisms that allow the lightweight identification
2554	   of switching points in temporal enhancement layers, as discussed in
2555	   Section 1.1.2 of this memo.  An HEVC media sender can send packets
2556	   belonging to NAL units of temporal enhancement layers starting from
2557	   these switching points to probe for available bandwidth and to
2558	   utilize bandwidth that has been shown to be available.
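
	   The sub-layer removal described above can be sketched in Python.
	   This is a minimal illustration under stated assumptions: the
	   per-sub-layer bitrates are known to the sender, TID values are
	   contiguous starting at 0, and NAL units are represented as
	   dictionaries with a "tid" key.  None of these names or structures
	   is defined by this memo.

```python
def select_max_tid(bitrate_per_tid, target_bps):
    """Drop the highest sub-layers until the total fits the target.

    bitrate_per_tid maps each TID to the bits per second it contributes.
    Returns (highest TID kept, resulting bitrate).  TID 0 is always kept,
    so the result may still exceed the target.
    """
    max_tid = max(bitrate_per_tid)
    total = sum(bitrate_per_tid.values())
    while max_tid > 0 and total > target_bps:
        total -= bitrate_per_tid.get(max_tid, 0)
        max_tid -= 1
    return max_tid, total


def filter_nal_units(nal_units, max_tid):
    """Keep only NAL units whose TID does not exceed max_tid."""
    return [nal for nal in nal_units if nal["tid"] <= max_tid]
```

	   Raising the kept TID again is only safe at the switching points
	   mentioned above, which this sketch does not model.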

2560	   The above mechanisms generally work within a defined profile and
2561	   level and, therefore, no renegotiation of the channel is required.
2562	   Only when non-downgradable parameters (such as profile) are
2563	   required to be changed does it become necessary to terminate and
2564	   restart the media stream.  This may be accomplished by using a
2565	   different RTP payload type.

2567	   MANEs MAY remove certain unusable packets from the packet stream
2568	   when that stream was damaged due to previous packet losses.  This
2569	   can help reduce the network load in certain special cases.  For
2570	   example, MANEs can remove those FUs where the leading FUs belonging
2571	   to the same NAL unit have been lost or those dependent slice
2572	   segments when the leading slice segments belonging to the same slice
2573	   have been lost, because the trailing FUs or dependent slice segments
2574	   are meaningless to most decoders.  MANEs can also remove higher
2575	   temporal scalable layers if the outbound transmission (from the
2576	   MANE's viewpoint) experiences congestion.
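
	   The FU case above can be sketched in Python.  The sketch assumes
	   that the FUs of one NAL unit carry consecutive RTP sequence numbers,
	   that packets are presented in sequence order, and that sequence
	   number wraparound can be ignored.  The packet representation
	   (dictionaries with "seq", "is_fu", and "start" keys) and the
	   function name are hypothetical.

```python
def drop_orphan_fus(packets):
    """Remove FUs whose leading FUs of the same NAL unit were lost."""
    kept = []
    chain_ok = False      # True while the current FU run is unbroken
    prev_seq = None
    for pkt in packets:
        contiguous = prev_seq is not None and pkt["seq"] == prev_seq + 1
        if not pkt["is_fu"]:
            chain_ok = False
            kept.append(pkt)
        elif pkt["start"]:
            # First FU of a NAL unit: a new run begins here.
            chain_ok = True
            kept.append(pkt)
        elif chain_ok and contiguous:
            kept.append(pkt)
        else:
            # A leading FU is missing, so this trailing FU is dropped.
            chain_ok = False
        prev_seq = pkt["seq"]
    return kept
```

	   Dependent slice segments could be handled in a similar way, but
	   that requires inspecting slice segment headers and is omitted here.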

2578	11. IANA Considerations

2580	   A new media type, as specified in Section 7.1 of this memo, should
2581	   be registered with IANA.

2583	12. Acknowledgements

2585	   Muhammed Coban and Marta Karczewicz are thanked for discussions on
2586	   the specification of the use with feedback messages and other
2587	   aspects in this memo.  Roni Even, Rickard Sjoberg, Sachin Deshpande,
2588	   and Woo Johnman made valuable reviewing comments that led to
2589	   improvements.

2591	   This document was prepared using 2-Word-v2.0.template.dot.

2593	13. References

2595	13.1 Normative References

2597	   [HEVC]    JCT-VC, "High Efficiency Video Coding (HEVC) text
2598	             specification draft 10 (for FDIS & Last Call)", JCTVC-
2599	             L1003v34, March 2013.

2601	   [H.264]   ITU-T Recommendation H.264, "Advanced video coding for
2602	             generic audiovisual services", January 2012.

2604	   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
2605	             Payload Format for H.264 Video", RFC 6184, May 2011.

2607	   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
2608	             Eleftheriadis, "RTP Payload Format for Scalable Video
2609	             Coding", RFC 6190, May 2011.

2611	   [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
2612	             Flows", RFC 6051, November 2010.

2614	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
2615	             Requirement Levels", BCP 14, RFC 2119, March 1997.

2617	   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
2618	             with Session Description Protocol (SDP)", RFC 3264, June
2619	             2002.

2621	   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
2622	             Encodings", RFC 4648, October 2006.

2624	   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson,
2625	             V., "RTP: A Transport Protocol for Real-Time
2626	             Applications", STD 64, RFC 3550, July 2003.

2628	   [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session
2629	             Description Protocol", RFC 4566, July 2006.

2631	   [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific
2632	             Media Attributes in the Session Description Protocol", RFC
2633	             5576, June 2009.

2635	   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey,
2636	             J., "Extended RTP Profile for Real-time Transport Control
2637	             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
2638	             2006.

2640	   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman, B.,
2641	             "Codec Control Messages in the RTP Audio-Visual Profile
2642	             with Feedback (AVPF)", RFC 5104, February 2008.

2644	13.2 Informative References

2646	   [Ed. (YK): Details for some of the following references are to be
2647	             added.]

2649	   [3GPDASH] 3GPP TS 26.247.

2651	   [3GPPFF]  3GPP TS 26.244.

2653	   [Girod99] Girod, B. and Faerber, F., "Feedback-based error control
2654	             for mobile video transmission", Proceedings IEEE, Vol. 87,
2655	             No. 10, pp. 1707-1723, October 1999.

2657	   [ISOBMFF] ISO/IEC 14496-12.

2659	   [JCTVC-J0107] Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian,
2660	             K., "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107,
2661	             10th JCT-VC meeting, July 2012, Stockholm, Sweden.

2663	   [MPEG2S]  ISO/IEC 13818-2.

2665	   [MPEGDASH] ISO/IEC 23009-1.

2667	   [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
2668	             Correction", RFC 5109, December 2007.

2670	   [Wang05]  Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video
2671	             coding using flexible reference frames", Visual
2672	             Communications and Image Processing 2005 (VCIP 2005), July
2673	             2005, Beijing, China.

2675	14. Authors' Addresses

2677	   Ye-Kui Wang
2678	   Qualcomm Incorporated
2679	   5775 Morehouse Drive
2680	   San Diego, CA 92121
2681	   USA
2682	   Phone: +1-858-651-8345
2683	   EMail: yekuiw@qti.qualcomm.com

2685	   Yago Sanchez
2686	   Fraunhofer HHI
2687	   Einsteinufer 37
2688	   D-10587 Berlin
2689	   Germany
2690	   Phone: +49-30-31002-227
2691	   Email: yago.sanchez@hhi.fraunhofer.de

2693	   Thomas Schierl
2694	   Fraunhofer HHI
2695	   Einsteinufer 37
2696	   D-10587 Berlin
2697	   Germany
2698	   Phone: +49-30-31002-227
2699	   Email: ts@thomas-schierl.de
2700	   Stephan Wenger
2701	   Vidyo, Inc., 433 Hackensack Ave., 7th floor
2702	   Hackensack, N.J. 07601
2703	   USA
2704	   Phone: +1-415-713-5473
2705	   EMail: stewe@stewe.org

2707	   Miska M. Hannuksela
2708	   Nokia Corporation
2709	   P.O. Box 1000
2710	   33721 Tampere
2711	   Finland
2712	   Phone: +358-7180-08000
2713	   EMail: miska.hannuksela@nokia.com