idnits 2.17.1 

draft-ietf-payload-rtp-h265-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 173 instances of weird spacing in the document.  Is it really
     formatted ragged-right, rather than justified?

  ** There are 3 instances of too long lines in the document, the longest one
     being 14 characters in excess of 72.

  ** The abstract seems to contain references ([HEVC]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 27 has weird spacing: '...   at  any  ti...'

  == Line 30 has weird spacing: '...   The  list  ...'

  == Line 45 has weird spacing: '...fo)  in  effec...'

  == Line 46 has weird spacing: '...ication  of  t...'

  == Line 47 has weird spacing: '...ly,  as  they ...'

  == (168 more instances...)

  -- The document date (February 12, 2014) is 3724 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '3GP' is mentioned on line 269, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 1035

  == Missing Reference: 'RFC5234' is mentioned on line 2356, but not defined

  == Missing Reference: 'RFC5117' is mentioned on line 2538, but not defined

  ** Obsolete undefined reference: RFC 5117 (Obsoleted by RFC 7667)

  == Missing Reference: 'RFC2326' is mentioned on line 2852, but not defined

  ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826)

  == Missing Reference: 'RFC2974' is mentioned on line 2853, but not defined

  == Missing Reference: 'RFC3551' is mentioned on line 2994, but not defined

  == Missing Reference: 'RFC3711' is mentioned on line 2994, but not defined

  == Missing Reference: 'RFC5124' is mentioned on line 2995, but not defined

  == Missing Reference: 'RFC 3711' is mentioned on line 3020, but not defined

  == Missing Reference: 'RFC 3551' is mentioned on line 3044, but not defined

  == Unused Reference: '3GPPFF' is defined on line 3169, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5109' is defined on line 3218, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-01

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-05


     Summary: 6 errors (**), 0 flaws (~~), 22 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Y.-K. Wang
2	Internet Draft                                                 Qualcomm
3	Intended status: Standards track                             Y. Sanchez
4	Expires: August 2014                                         T. Schierl
5	                                                         Fraunhofer HHI
6	                                                              S. Wenger
7	                                                                  Vidyo
8	                                                       M. M. Hannuksela
9	                                                                  Nokia
10	                                                      February 12, 2014

12	            RTP Payload Format for High Efficiency Video Coding
13	                    draft-ietf-payload-rtp-h265-02.txt

15	Status of this Memo

17	   This Internet-Draft is submitted to IETF in full conformance with
18	   the provisions of BCP 78 and BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF), its areas, and its working groups.  Note that
22	   other groups may also distribute working documents as Internet-
23	   Drafts.

25	   Internet-Drafts are draft documents valid for a maximum of six
26	   months and may be updated, replaced, or obsoleted by other documents
27	   at any time.  It is inappropriate to use Internet-Drafts as
28	   reference material or to cite them other than as "work in progress."

30	   The list of current Internet-Drafts can be accessed at
31	   http://www.ietf.org/ietf/1id-abstracts.txt.

33	   The list of Internet-Draft Shadow Directories can be accessed at
34	   http://www.ietf.org/shadow.html.

36	   This Internet-Draft will expire on August 12, 2014.

38	Copyright and License Notice

40	   Copyright (c) 2014 IETF Trust and the persons identified as the
41	   document authors.  All rights reserved.

43	   This document is subject to BCP 78 and the IETF Trust's Legal
44	   Provisions Relating to IETF Documents
45	   (http://trustee.ietf.org/license-info) in effect on the date of
46	   publication of this document.  Please review these documents
47	   carefully, as they describe your rights and restrictions with
48	   respect to this document.  Code Components extracted from this
49	   document must include Simplified BSD License text as described in
50	   Section 4.e of the Trust Legal Provisions and are provided without
51	   warranty as described in the Simplified BSD License.

53	Abstract

55	   This memo describes an RTP payload format for the video coding
56	   standard ITU-T Recommendation H.265 and ISO/IEC International
57	   Standard 23008-2, both also known as High Efficiency Video Coding
58	   (HEVC) and developed by the Joint Collaborative Team on Video
59	   Coding (JCT-VC).  The RTP payload format allows for packetization of
60	   one or more Network Abstraction Layer (NAL) units in each RTP packet
61	   payload, as well as fragmentation of a NAL unit into multiple RTP
62	   packets.  Furthermore, it supports transmission of an HEVC stream
63	   over a single as well as multiple RTP flows.  The payload format has
64	   wide applicability in videoconferencing, Internet video streaming,
65	   and high bit-rate entertainment-quality video, among others.

67	Table of Contents

69	   Status of this Memo
70	   Abstract
71	   Table of Contents
72	   1. Introduction
73	      1.1. Overview of the HEVC Codec
74	         1.1.1 Coding-Tool Features
75	         1.1.2 Systems and Transport Interfaces
76	         1.1.3 Parallel Processing Support
77	         1.1.4 NAL Unit Header
78	      1.2. Overview of the Payload Format
79	   2. Conventions
80	   3. Definitions and Abbreviations
81	      3.1 Definitions
82	         3.1.1 Definitions from the HEVC Specification
83	         3.1.2 Definitions Specific to This Memo
84	      3.2 Abbreviations
85	   4. RTP Payload Format
86	      4.1 RTP Header Usage
87	      4.2 Payload Header Usage
88	      4.3 Payload Structures
89	      4.4 Transmission Modes
90	      4.5 Decoding Order Number
91	      4.6 Single NAL Unit Packets
92	      4.7 Aggregation Packets (APs)
93	      4.8 Fragmentation Units (FUs)
94	      4.9 PACI packets
95	         4.9.1 Reasons for the PACI rules (informative)
96	      4.10 Payload Header Extensions
97	   5. Packetization Rules
98	   6. De-packetization Process
99	   7. Payload Format Parameters
100	      7.1 Media Type Registration
101	      7.2 SDP Parameters
102	         7.2.1 Mapping of Payload Type Parameters to SDP
103	         7.2.2 Usage with SDP Offer/Answer Model
104	         7.2.3 Usage in Declarative Session Descriptions
105	         7.2.4 Parameter Sets Considerations
106	         7.2.5 Dependency Signaling in Multi-Session Transmission
107	   8. Use with Feedback Messages
108	      8.1 Use of HEVC with the RPSI Feedback Message
109	   9. Security Considerations
110	   10. Congestion Control
111	   11. IANA Consideration
112	   12. Acknowledgements
113	   13. References
114	      13.1 Normative References
115	      13.2 Informative References
116	   14. Authors' Addresses

118	1. Introduction

120	1.1. Overview of the HEVC Codec

122	   High Efficiency Video Coding [HEVC], formally known as ITU-T
123	   Recommendation H.265 and ISO/IEC International Standard 23008-2, was
124	   ratified by ITU-T in April 2013 and reportedly provides significant
125	   coding efficiency gains over H.264 [H.264].

127	   As both H.264 [H.264] and its RTP payload format [RFC6184] are
128	   widely deployed and generally known in the relevant implementer
129	   communities, frequently only the differences between those two
130	   specifications are highlighted in non-normative, explanatory parts
131	   of this memo.  Basic familiarity with both specifications is assumed
132	   for those parts.  However, the normative parts of this memo do not
133	   require study of H.264 or its RTP payload format.

135	   H.264 and HEVC share a similar hybrid video codec design.
136	   Conceptually, both technologies include a video coding layer (VCL),
137	   which is often used to refer to the coding-tool features, and a
138	   network abstraction layer (NAL), which is often used to refer to the
139	   systems and transport interface aspects of the codecs.

141	1.1.1 Coding-Tool Features

143	   Similarly to earlier hybrid-video-coding-based standards, including
144	   H.264, the following basic video coding design is employed by HEVC.
145	   A prediction signal is first formed either by intra or motion
146	   compensated prediction, and the residual (the difference between the
147	   original and the prediction) is then coded.  The gains in coding
148	   efficiency are achieved by redesigning and improving almost all
149	   parts of the codec over earlier designs.  In addition, HEVC includes
150	   several tools to make the implementation on parallel architectures
151	   easier.  Below is a summary of HEVC coding-tool features.

153	   Quad-tree block and transform structure

155	   One of the major tools that contribute significantly to the coding
156	   efficiency of HEVC is the usage of flexible coding blocks and
157	   transforms, which are defined in a hierarchical quad-tree manner.
158	   Unlike H.264, where the basic coding block is a macroblock of fixed
159	   size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size
160	   of 64x64.  Each CTU can be divided into smaller units in a
161	   hierarchical quad-tree manner and can represent smaller blocks down
162	   to size 4x4.  Similarly, the transforms used in HEVC can have
163	   different sizes, starting from 4x4 and going up to 32x32.  Utilizing
164	   large blocks and transforms contributes to the major gain of HEVC,
165	   especially at high resolutions.
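
   As an illustration only, the hierarchical splitting described above
   can be sketched in Python as follows.  The function name
   split_quadtree, the minimum block size of 8, and the split decision
   rule are assumptions made for this example; in a real encoder the
   split decision results from the encoder's mode decision and is
   signaled in the bitstream.

```python
def split_quadtree(x, y, size, min_size, should_split):
    """Return the leaf blocks (x, y, size) of a quad-tree rooted at one CTU."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order: top-left, top-right,
        # bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_quadtree(x + dx, y + dy, half, min_size, should_split)
                )
        return leaves
    return [(x, y, size)]


def split_top_left(x, y, size):
    """Hypothetical decision rule: keep splitting only the top-left block."""
    return x == 0 and y == 0


# One 64x64 CTU, split down to a (hypothetical) minimum size of 8x8.
leaves = split_quadtree(0, 0, 64, 8, split_top_left)
```

   With this hypothetical rule, the 64x64 CTU yields ten leaf blocks
   (three 32x32, three 16x16 and four 8x8) that together cover the CTU
   exactly once.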

167	   Entropy coding

169	   HEVC uses a single entropy coding engine, which is based on Context
170	   Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two
171	   distinct entropy coding engines.  CABAC in HEVC shares many
172	   similarities with CABAC of H.264, but contains several improvements.
173	   Those include improvements in coding efficiency and lowered
174	   implementation complexity, especially for parallel architectures.

176	   In-loop filtering

178	   H.264 includes an in-loop adaptive deblocking filter, where the
179	   blocking artifacts around the transform edges in the reconstructed
180	   picture are smoothed to improve the picture quality and compression
181	   efficiency.  In HEVC, a similar deblocking filter is employed but
182	   with somewhat lower complexity.  In addition, pictures undergo a
183	   subsequent filtering operation called Sample Adaptive Offset (SAO),
184	   which is a new design element in HEVC.  SAO basically adds a pixel-
185	   level offset in an adaptive manner and usually acts as a de-ringing
186	   filter.  SAO has been observed to improve picture quality,
187	   especially around sharp edges, contributing substantially to the
188	   visual quality improvements of HEVC.

190	   Motion prediction and coding

192	   There have been a number of improvements in this area that are
193	   summarized as follows.  The first category is motion merge and
194	   advanced motion vector prediction (AMVP) modes.  The motion
195	   information of a prediction block can be inferred from the spatially
196	   or temporally neighboring blocks.  This is similar to the DIRECT
197	   mode in H.264 but includes new aspects to incorporate the flexible
198	   quad-tree structure and methods to improve the parallel
199	   implementations.  In addition, the motion vector predictor can be
200	   signaled for improved efficiency.  The second category is high-
201	   precision interpolation.  The interpolation filter length is
202	   increased to 8-tap from 6-tap, which improves the coding efficiency
203	   but also comes with increased complexity.  In addition, the
204	   interpolation filter is defined with higher precision without any
205	   intermediate rounding operations to further improve the coding
206	   efficiency.

208	   Intra prediction and intra coding

210	   Compared to 8 intra prediction modes in H.264, HEVC supports angular
211	   intra prediction with 33 directions.  This increased flexibility
212	   improves both objective coding efficiency and visual quality as the
213	   edges can be better predicted and ringing artifacts around the edges
214	   can be reduced.  In addition, the reference samples are adaptively
215	   smoothed based on the prediction direction.  To avoid contouring
216	   artifacts, a new interpolative prediction generation is included to
217	   improve the visual quality.  Furthermore, discrete sine transform
218	   (DST) is utilized instead of traditional discrete cosine transform
219	   (DCT) for 4x4 intra transform blocks.

221	   Other coding-tool features

223	   HEVC includes some tools for lossless coding and efficient screen
224	   content coding, such as skipping the transform for certain blocks.
225	   These tools are particularly useful, for example, when streaming the
226	   user interface of a mobile device to a large display.

228	1.1.2 Systems and Transport Interfaces

230	   HEVC inherited the basic systems and transport interface designs,
231	   such as the NAL-unit-based syntax structure, the hierarchical syntax
232	   and data unit structure from sequence-level parameter sets, multi-
233	   picture-level or picture-level parameter sets, slice-level header
234	   parameters, lower-level parameters, the supplemental enhancement
235	   information (SEI) message mechanism, the hypothetical reference
236	   decoder (HRD) based video buffering model, and so on.  The
237	   following summarizes the differences in these aspects compared to
238	   H.264.

240	   Video parameter set

242	   A new type of parameter set, called video parameter set (VPS), was
243	   introduced.  For the first (2013) version of [HEVC], the video
244	   parameter set NAL unit is required to be available prior to its
245	   activation, while the information contained in the video parameter
246	   set is not necessary for operation of the decoding process.  For
247	   future HEVC extensions, such as the 3D or scalable extensions, the
248	   video parameter set is expected to include information necessary for
249	   operation of the decoding process, e.g. decoding dependency or
250	   information for reference picture set construction of enhancement
251	   layers.  The VPS provides a "big picture" of a bitstream, including
252	   what types of operation points are provided, the profile, tier, and
253	   level of the operation points, and some other high-level properties
254	   of the bitstream that can be used as the basis for session
255	   negotiation and content selection, etc. (see section 7.1).

257	   Profile, tier and level

259	   The profile, tier and level syntax structure that can be included in
260	   both VPS and sequence parameter set (SPS) includes 12 bytes of data
261	   to describe the entire bitstream (including all temporally scalable
262	   layers, which are referred to as sub-layers in the HEVC
263	   specification), and can optionally include more profile, tier and
264	   level information pertaining to individual temporally scalable
265	   layers.  The profile indicator indicates the "best viewed as"
266	   profile when the bitstream conforms to multiple profiles, similar to
267	   the major brand concept in the ISO base media file format (ISOBMFF)
268	   [ISOBMFF] and file formats derived based on ISOBMFF, such as the
269	   3GPP file format [3GPPFF].  The profile, tier and level syntax
270	   structure also includes the indications of whether the bitstream is
271	   free of frame-packed content, whether the bitstream is free of
272	   interlaced source content and free of field pictures, i.e. contains
273	   only frame pictures of progressive source, such that clients/players
274	   with no support of post-processing functionalities for handling of
275	   frame-packed or interlaced source content or field pictures can
276	   reject those bitstreams.
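
   A minimal sketch of how a receiver might read the 12 general bytes
   is given below.  It assumes the field layout of the first (2013)
   version of [HEVC] and assumes that emulation prevention bytes have
   already been removed.  The dictionary keys follow the HEVC syntax
   element names with the "general_" prefix dropped; the helper
   needs_special_handling is hypothetical and only illustrates the
   reject decision described above.

```python
def parse_general_ptl(data: bytes) -> dict:
    """Parse the 12 general profile, tier and level bytes (HEVC version 1)."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    first = data[0]   # profile_space (2 bits), tier_flag (1), profile_idc (5)
    flags = data[5]   # four 1-bit flags, then the start of 44 reserved bits
    return {
        "profile_space": first >> 6,
        "tier_flag": (first >> 5) & 0x1,
        "profile_idc": first & 0x1F,
        "profile_compatibility_flags": int.from_bytes(data[1:5], "big"),
        "progressive_source_flag": (flags >> 7) & 0x1,
        "interlaced_source_flag": (flags >> 6) & 0x1,
        "non_packed_constraint_flag": (flags >> 5) & 0x1,
        "frame_only_constraint_flag": (flags >> 4) & 0x1,
        "level_idc": data[11],
    }


def needs_special_handling(ptl: dict) -> bool:
    """Hypothetical check: True unless the bitstream is indicated to be free
    of frame-packed content, interlaced source content and field pictures."""
    progressive_frames_only = (
        ptl["progressive_source_flag"] == 1
        and ptl["interlaced_source_flag"] == 0
        and ptl["frame_only_constraint_flag"] == 1
    )
    return not (progressive_frames_only and ptl["non_packed_constraint_flag"] == 1)


# Example input (made up for illustration): Main profile, main tier,
# level_idc 93, progressive source, frame pictures only, but no indication
# that the content is free of frame packing.
example = bytes([0x01, 0x60, 0x00, 0x00, 0x00, 0x90, 0, 0, 0, 0, 0, 93])
ptl = parse_general_ptl(example)
```

   A player without frame-packing support could reject this example
   bitstream, since non_packed_constraint_flag is 0.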

278	   Bitstream and elementary stream

280	   HEVC includes a definition of an elementary stream, which is new
281	   compared to H.264.  An elementary stream consists of a sequence of
282	   one or more bitstreams.  An elementary stream that consists of two
283	   or more bitstreams has typically been formed by splicing together
284	   two or more bitstreams (or parts thereof).  When an elementary
285	   stream contains more than one bitstream, the last NAL unit of the
286	   last access unit of a bitstream (except the last bitstream in the
287	   elementary stream) must contain an end of bitstream NAL unit and the
288	   first access unit of the subsequent bitstream must be an intra
289	   random access point (IRAP) access unit.  This IRAP access unit may
290	   be a clean random access (CRA), broken link access (BLA), or
291	   instantaneous decoding refresh (IDR) access unit.

293	   Random access support

295	   HEVC includes signaling in NAL unit header, through NAL unit types,
296	   of IRAP pictures beyond IDR pictures.  Three types of IRAP pictures,
297	   namely IDR, CRA and BLA pictures, are supported, wherein IDR pictures
298	   are conventionally referred to as closed group-of-pictures (closed-
299	   GOP) random access points, and CRA and BLA pictures are those
300	   conventionally referred to as open-GOP random access points.  BLA
301	   pictures usually originate from splicing of two bitstreams or parts
302	   thereof at a CRA picture, e.g. during stream switching.  To enable
303	   better systems usage of IRAP pictures, altogether six different NAL
304	   unit types are defined to signal the properties of IRAP pictures,
305	   which can be used to better match the stream access point (SAP)
306	   types as defined in the ISOBMFF [ISOBMFF], which are utilized for
307	   random access support in both 3GP-DASH [3GPDASH] and MPEG DASH
308	   [MPEGDASH].  Pictures following an IRAP picture in decoding order
309	   and preceding the IRAP picture in output order are referred to as
310	   leading pictures associated with the IRAP picture.  There are two
311	   types of leading pictures, namely random access decodable leading
312	   (RADL) pictures and random access skipped leading (RASL) pictures.
313	   RADL pictures are decodable when decoding starts at the
314	   associated IRAP picture, and RASL pictures are not decodable when
315	   decoding starts at the associated IRAP picture and are usually
316	   discarded.  HEVC provides mechanisms to enable the specification of
317	   conformance of bitstreams with RASL pictures being discarded, thus
318	   providing a standard-compliant way to enable systems components to
319	   discard RASL pictures when needed.
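
   The handling of leading pictures at random access can be sketched
   as follows.  The NAL unit type values are assumed to be those of
   the NAL unit type table of the first version of [HEVC]; the
   function name start_at_first_irap is hypothetical, and the input is
   simplified to a list of the nal_unit_type values of the VCL NAL
   units of a single-layer bitstream in decoding order.

```python
# NAL unit type values, assumed from the HEVC version 1 type table.
RADL_N, RADL_R, RASL_N, RASL_R = 6, 7, 8, 9
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP, CRA_NUT = 19, 20, 21

IRAP_TYPES = {BLA_W_LP, BLA_W_RADL, BLA_N_LP, IDR_W_RADL, IDR_N_LP, CRA_NUT}
RASL_TYPES = {RASL_N, RASL_R}


def start_at_first_irap(nal_types):
    """Drop everything before the first IRAP picture, and drop the RASL
    pictures associated with that IRAP picture (simplified sketch)."""
    kept = []
    started = False
    skipping_rasl = False
    for nal_type in nal_types:
        if nal_type in IRAP_TYPES:
            # Only the RASL pictures of the IRAP picture at which decoding
            # starts are undecodable; later ones have their references.
            skipping_rasl = not started
            started = True
        if not started:
            continue
        if skipping_rasl and nal_type in RASL_TYPES:
            continue
        kept.append(nal_type)
    return kept


# TRAIL_R, CRA, RASL_R, RADL_R, TRAIL_R, CRA, RASL_R, TRAIL_R
tuned_in = start_at_first_irap([1, 21, 9, 7, 1, 21, 9, 1])
```

   In this example the picture before the first CRA picture and the
   RASL picture associated with it are dropped, while the RADL picture
   and the RASL picture of the second CRA picture are kept.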

321	   Temporal scalability support

323	   HEVC includes improved support of temporal scalability, by
324	   inclusion of the signaling of TemporalId in the NAL unit header, the
325	   restriction that pictures of a particular temporal sub-layer cannot
326	   be used for inter prediction reference by pictures of a lower
327	   temporal sub-layer, the sub-bitstream extraction process, and the
328	   requirement that each sub-bitstream extraction output be a
329	   conforming bitstream.  Media-aware network elements (MANEs) can
330	   utilize the TemporalId in the NAL unit header for stream adaptation
331	   purposes based on temporal scalability.
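
   A sketch of such TemporalId-based adaptation is shown below.  It
   assumes the two-byte NAL unit header layout of [HEVC] (a 1-bit
   forbidden_zero_bit, a 6-bit nal_unit_type, a 6-bit nuh_layer_id and
   a 3-bit nuh_temporal_id_plus1).  The names NalHeader,
   parse_nal_header and keep_sub_layers are hypothetical, and the
   input is simplified to a list of complete NAL units rather than RTP
   packets.

```python
from typing import Iterable, List, NamedTuple


class NalHeader(NamedTuple):
    nal_unit_type: int
    layer_id: int
    temporal_id: int


def parse_nal_header(nal_unit: bytes) -> NalHeader:
    """Decode the two-byte HEVC NAL unit header."""
    if len(nal_unit) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    value = (nal_unit[0] << 8) | nal_unit[1]
    if value >> 15:
        raise ValueError("forbidden_zero_bit is set")
    temporal_id_plus1 = value & 0x7
    if temporal_id_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalHeader(
        nal_unit_type=(value >> 9) & 0x3F,
        layer_id=(value >> 3) & 0x3F,
        temporal_id=temporal_id_plus1 - 1,
    )


def keep_sub_layers(nal_units: Iterable[bytes], max_temporal_id: int) -> List[bytes]:
    """Keep only NAL units whose TemporalId does not exceed max_temporal_id."""
    return [
        nal for nal in nal_units
        if parse_nal_header(nal).temporal_id <= max_temporal_id
    ]


# Header-only example NAL units with TemporalId 0, 0, 2 and 1.
stream = [b"\x40\x01", b"\x02\x01", b"\x00\x03", b"\x02\x02"]
thinned = keep_sub_layers(stream, max_temporal_id=1)
```

   Because pictures of a lower sub-layer never reference pictures of
   a higher sub-layer, the NAL units that remain after this filtering
   are still decodable.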

333	   Temporal sub-layer switching support

335	   HEVC specifies, through NAL unit types present in the NAL unit
336	   header, the signaling of temporal sub-layer access (TSA) and
337	   stepwise temporal sub-layer access (STSA).  A TSA picture and
338	   pictures following the TSA picture in decoding order do not use
339	   pictures prior to the TSA picture in decoding order with TemporalId
340	   greater than or equal to that of the TSA picture for inter
341	   prediction reference.  A TSA picture enables up-switching, at the
342	   TSA picture, to the sub-layer containing the TSA picture or any
343	   higher sub-layer, from the immediately lower sub-layer.  An STSA
344	   picture does not use pictures with the same TemporalId as the STSA
345	   picture for inter prediction reference.  Pictures following an STSA
346	   picture in decoding order with the same TemporalId as the STSA
347	   picture do not use pictures prior to the STSA picture in decoding
348	   order with the same TemporalId as the STSA picture for inter
349	   prediction reference.  An STSA picture enables up-switching, at the
350	   STSA picture, to the sub-layer containing the STSA picture, from the
351	   immediately lower sub-layer.

353	   Sub-layer reference or non-reference pictures

355	   The concept and signaling of reference/non-reference pictures in
356	   HEVC are different from H.264.  In H.264, if a picture may be used
357	   by any other picture for inter prediction reference, it is a
358	   reference picture; otherwise it is a non-reference picture, and this
359	   is signaled by two bits in the NAL unit header.  In HEVC, a picture
360	   is called a reference picture only when it is marked as "used for
361	   reference".  In addition, the concept of sub-layer reference picture
362	   was introduced.  If a picture may be used by any other picture
363	   with the same TemporalId for inter prediction reference, it is a
364	   sub-layer reference picture; otherwise it is a sub-layer non-
365	   reference picture.  Whether a picture is a sub-layer reference
366	   picture or sub-layer non-reference picture is signaled through NAL
367	   unit type values.
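
   As a sketch, and assuming the NAL unit type table of the first
   version of [HEVC], the distinction can be read directly from the
   nal_unit_type value: among the VCL types 0 to 15, the even values
   (TRAIL_N, TSA_N, STSA_N, RADL_N, RASL_N and the reserved types 10,
   12 and 14) denote sub-layer non-reference pictures.  The function
   name below is hypothetical.

```python
def is_sub_layer_non_reference(nal_unit_type: int) -> bool:
    """True for the even VCL NAL unit types 0..14 (HEVC version 1 table)."""
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0


TRAIL_N, TRAIL_R, RASL_N, IDR_W_RADL = 0, 1, 8, 19
examples = {t: is_sub_layer_non_reference(t)
            for t in (TRAIL_N, TRAIL_R, RASL_N, IDR_W_RADL)}
```

   A sub-layer non-reference picture may still be referenced by
   pictures of higher sub-layers, so this test on its own only
   identifies pictures that are droppable within the highest sub-layer
   being forwarded.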

369	   Extensibility

371	   Besides the TemporalId in the NAL unit header, HEVC also includes
372	   the signaling of a six-bit layer ID in the NAL unit header, which
373	   must be equal to 0 for a single-layer bitstream.  Extension
374	   mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice
375	   headers, and so on.  All these extension mechanisms enable future
376	   extensions in a backward compatible manner, such that bitstreams
377	   encoded according to potential future HEVC extensions can be fed to
378	   then-legacy decoders (e.g. HEVC version 1 decoders) and the then-
379	   legacy decoders can decode and output the base layer bitstream.

381	   Bitstream extraction

383	   HEVC includes a bitstream extraction process as an integral part of
384	   the overall decoding process, as well as specification of the use of
385	   the bitstream extraction process in description of bitstream
386	   conformance tests as part of the hypothetical reference decoder
387	   (HRD) specification.

389	   Reference picture management

391	   The reference picture management of HEVC, including reference
392	   picture marking and removal from the decoded picture buffer (DPB) as
393	   well as reference picture list construction (RPLC), differs from
394	   that of H.264.  Instead of the sliding window plus adaptive memory
395	   management control operation (MMCO) based reference picture marking
396	   mechanism in H.264, HEVC specifies a reference picture set (RPS)
397	   based reference picture management and marking mechanism, and the
398	   RPLC is consequently based on the RPS mechanism.  A reference
399	   picture set consists of a set of reference pictures associated with
400	   a picture, consisting of all reference pictures that are prior to
401	   the associated picture in decoding order, that may be used for inter
402	   prediction of the associated picture or any picture following the
403	   associated picture in decoding order.  The reference picture set
404	   consists of five lists of reference pictures: RefPicSetStCurrBefore,
405	   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
406	   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter and
407	   RefPicSetLtCurr contain all reference pictures that may be used in
408	   inter prediction of the current picture and that may be used in
409	   inter prediction of one or more of the pictures following the
410	   current picture in decoding order.  RefPicSetStFoll and
411	   RefPicSetLtFoll consist of all reference pictures that are not used
412	   in inter prediction of the current picture but may be used in inter
413	   prediction of one or more of the pictures following the current
414	   picture in decoding order.  RPS provides an "intra-coded" signaling
415	   of the DPB status, instead of an "inter-coded" signaling, mainly for
416	   improved error resilience.  The RPLC process in HEVC is based on the
417	   RPS, by signaling an index to an RPS subset for each reference
418	   index.  The RPLC process has been simplified compared to that in
419	   H.264, by removal of the reference picture list modification (also
420	   referred to as reference picture list reordering) process.
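
   The "intra-coded" nature of the RPS can be illustrated with the
   following sketch, in which the reference pictures to keep in the
   DPB are derived from the RPS of the current picture alone.
   Pictures are identified by picture order count values for
   simplicity, and the function name apply_rps is hypothetical; the
   actual marking process in [HEVC] distinguishes short-term and
   long-term pictures and is more detailed.

```python
def apply_rps(dpb_pocs, rps):
    """Return (kept, missing) given the DPB contents and the five RPS lists.

    kept:    pictures in the DPB that stay marked as used for reference
    missing: pictures the RPS refers to that are absent from the DPB,
             for example because they were lost in transmission
    """
    referenced = set()
    for pocs in rps.values():
        referenced.update(pocs)
    dpb = set(dpb_pocs)
    return dpb & referenced, referenced - dpb


rps = {
    "RefPicSetStCurrBefore": [4],
    "RefPicSetStCurrAfter": [8],
    "RefPicSetStFoll": [6],
    "RefPicSetLtCurr": [0],
    "RefPicSetLtFoll": [],
}
kept, missing = apply_rps([0, 2, 4, 8], rps)
```

   Here picture 2 is no longer kept as a reference, and the absence of
   picture 6 is detected from the current RPS without any knowledge of
   earlier marking commands, which is the error resilience benefit
   mentioned above.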

422	   Ultra low delay support

424	   HEVC specifies a sub-picture-level HRD operation, for support of the
425	   so-called ultra-low delay.  The mechanism specifies a standard-
426	   compliant way to enable delay reduction below one picture interval.
427	   Sub-picture-level coded picture buffer (CPB) and DPB parameters may
428	   be signaled, and utilization of this information for the derivation
429	   of CPB timing (wherein the CPB removal time corresponds to decoding
430	   time) and DPB output timing (display time) is specified.  Decoders
431	   are allowed to operate the HRD at the conventional access-unit-
432	   level, even when the sub-picture-level HRD parameters are present.

434	   New SEI messages

436	   HEVC inherits many H.264 SEI messages with changes in syntax and/or
437	   semantics making them applicable to HEVC.  Additionally, there are a
438	   few new SEI messages reviewed briefly in the following paragraphs.

440	   The structure of pictures SEI message provides information on the
441	   NAL unit types, picture order count values, and prediction
442	   dependencies of a sequence of pictures.  The SEI message can be
443	   used, for example, to determine what impact a lost picture has on
444	   other pictures.

446	   The decoded picture hash SEI message provides a checksum derived
447	   from the sample values of a decoded picture.  It can be used for
448	   detecting whether a picture was correctly received and decoded.

450	   The active parameter sets SEI message includes the IDs of the active
451	   video parameter set and the active sequence parameter set and can be
452	   used to activate VPSs and SPSs.  In addition, the SEI message
453	   includes the following indications: 1) An indication of whether
454	   "full random accessibility" is supported (when supported, all
455	   parameter sets needed for decoding of the remainder of the bitstream
456	   when random accessing from the beginning of the current coded video
457	   sequence by completely discarding all access units earlier in
458	   decoding order are present in the remaining bitstream and all coded
459	   pictures in the remaining bitstream can be correctly decoded); 2) An
460	   indication of whether there is no parameter set within the current
461	   coded video sequence that updates another parameter set of the same
462	   type preceding in decoding order.  An update of a parameter set
463	   refers to the use of the same parameter set ID but with some other
464	   parameters changed.  If this property is true for all coded video
465	   sequences in the bitstream, then all parameter sets can be sent out-
466	   of-band before session start.
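
   The notion of a parameter set update in indication 2) can be
   sketched as follows.  The representation of a parameter set as a
   (type, ID, payload) tuple and the function name
   has_parameter_set_update are assumptions made for this example.

```python
def has_parameter_set_update(param_sets) -> bool:
    """Return True if a parameter set ID is reused with different content.

    param_sets is an iterable of (ps_type, ps_id, payload) tuples in
    decoding order, for example ("SPS", 0, b"...").
    """
    seen = {}
    for ps_type, ps_id, payload in param_sets:
        key = (ps_type, ps_id)
        if key in seen and seen[key] != payload:
            return True
        seen[key] = payload
    return False


# A repeated, identical SPS is not an update; a changed one is.
repeated = [("SPS", 0, b"\x01"), ("PPS", 0, b"\x02"), ("SPS", 0, b"\x01")]
updated = [("SPS", 0, b"\x01"), ("PPS", 0, b"\x02"), ("SPS", 0, b"\x03")]
can_send_out_of_band = not has_parameter_set_update(repeated)
```

   When no coded video sequence contains such an update, each
   parameter set ID maps to a single payload, which is what allows all
   parameter sets to be conveyed before the session starts.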

468	   The decoding unit information SEI message provides coded picture
469	   buffer removal delay information for a decoding unit.  The message
470	   can be used in very-low-delay buffering operations.

472	   The region refresh information SEI message can be used together with
473	   the recovery point SEI message (present in both H.264 and HEVC) for
474	   improved support of gradual decoding refresh (GDR).  This supports
475	   random access from inter-coded pictures, wherein complete pictures
476	   can be correctly decoded or recovered after an indicated number of
477	   pictures in output/display order.

479	1.1.3 Parallel Processing Support

481	   The reportedly much higher computational demand of HEVC encoding
482	   compared to H.264, in conjunction with the ever increasing video
483	   resolution (both spatially and temporally) required by the market,
484	   led to the adoption of VCL coding tools specifically targeted to
485	   allow for parallelization on the sub-picture level.  That is,
486	   parallelization occurs, at the minimum, at the granularity of an
487	   integer number of CTUs.  The targets for this type of high-level
488	   parallelization are multicore CPUs and DSPs as well as
489	   multiprocessor systems.  In a system design, to be useful, these
490	   tools require signaling support, which is provided in Section 7 of
491	   this memo.  This section provides a brief overview of the tools
492	   available in [HEVC].

494	   Many of the tools incorporated in HEVC were designed with potential
495	   parallel implementations on multi-core/multi-processor architectures
496	   in mind.  Specifically, for parallelization, four picture
497	   partitioning strategies are available.

499	   Slices are segments of the bitstream that can be reconstructed
500	   independently from other slices within the same picture (though
501	   there may still be interdependencies through loop filtering
502	   operations).  Slices are the only tool that can be used for
503	   parallelization that is also available, in virtually identical form,
504	   in H.264.  Slice-based parallelization does not require much inter-
505	   processor or inter-core communication (except for inter-processor or
506	   inter-core data sharing for motion compensation when decoding a
507	   predictively coded picture, which is typically much heavier than
508	   inter-processor or inter-core data sharing due to in-picture
509	   prediction), as slices are designed to be independently decodable.
510	   However, for the same reason, slices can require some coding
511	   overhead.  Further, slices (in contrast to some of the other tools
512	   mentioned below) also serve as the key mechanism for bitstream
513	   partitioning to match Maximum Transfer Unit (MTU) size requirements,
514	   due to the in-picture independence of slices and the fact that each
515	   regular slice is encapsulated in its own NAL unit.  In many cases,
516	   the goal of parallelization and the goal of MTU size matching can
517	   place contradicting demands on the slice layout in a picture.  The
518	   realization of this situation led to the development of the more
519	   advanced tools mentioned below.  This payload format does not
520	   contain any specific mechanisms aiding parallelization through
521	   slices.

523	   Dependent slice segments allow for fragmentation of a coded slice
524	   into fragments at CTU boundaries without breaking any in-picture
525	   prediction mechanism.  They are complementary to the fragmentation
526	   mechanism described in this memo in that they need the cooperation
527	   of the encoder.  As a dependent slice segment necessarily contains
528	   an integer number of CTUs, a decoder using multiple cores operating
529	   on CTUs can process a dependent slice segment without communicating
530	   parts of the slice segment's bitstream to other cores.
531	   Fragmentation, as specified in this memo, in contrast, does not
532	   guarantee that a fragment contains an integer number of CTUs.

534	   In wavefront parallel processing (WPP), the picture is partitioned
535	   into rows of CTUs.  Entropy decoding and prediction are allowed to
536	   use data from CTUs in other partitions.  Parallel processing is
537	   possible through parallel decoding of CTU rows, where the start of
538	   the decoding of a row is delayed by two CTUs, so as to ensure that
539	   data related to a CTU above and to the right of the subject CTU is
540	   available before the subject CTU is decoded.  Using this
541	   staggered start (which appears like a wavefront when represented
542	   graphically), parallelization is possible with up to as many
543	   processors/cores as the picture contains CTU rows.
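
Informative note: The staggered start described above can be illustrated with a minimal sketch.  It assumes one core per CTU row and that every CTU takes exactly one time step to decode; neither assumption comes from [HEVC], and the function name wavefront_schedule is hypothetical.

```python
def wavefront_schedule(ctu_rows: int, ctu_cols: int) -> list[list[int]]:
    """Earliest time step at which each CTU can be decoded under WPP.

    Assumes one core per CTU row and one time step per CTU (illustrative
    assumptions only).  Row r starts two CTUs after row r-1, so the CTU
    in row r and column c is decoded at step 2*r + c.
    """
    if ctu_rows < 1 or ctu_cols < 1:
        raise ValueError("picture must contain at least one CTU")
    return [[2 * r + c for c in range(ctu_cols)] for r in range(ctu_rows)]


schedule = wavefront_schedule(3, 4)
for row in schedule:
    print(row)
```

Under these assumptions the CTU above and to the right of any CTU is always decoded one step earlier, and a picture with R rows and C columns takes 2*(R-1)+C steps instead of R*C.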

545	   Because in-picture prediction between neighboring CTU rows within a
546	   picture is allowed, the required inter-processor/inter-core
547	   communication to enable in-picture prediction can be substantial.
548	   The WPP partitioning does not result in the creation of more NAL
549	   units compared to when it is not applied, thus WPP cannot be used
550	   for MTU size matching, though slices can be used in combination for
551	   that purpose.

553	   Tiles define horizontal and vertical boundaries that partition a
554	   picture into tile columns and rows.  The scan order of CTUs is
555	   changed to be local within a tile (in the order of a CTU raster scan
556	   of a tile), before decoding the top-left CTU of the next tile in the
557	   order of tile raster scan of a picture.  Similar to slices, tiles
558	   break in-picture prediction dependencies (including entropy decoding
559	   dependencies).  However, they do not need to be included into
560	   individual NAL units (same as WPP in this regard), hence tiles
561	   cannot be used for MTU size matching, though slices can be used in
562	   combination for that purpose.  Each tile can be processed by one
563	   processor/core, and the inter-processor/inter-core communication
564	   required for in-picture prediction between processing units decoding
565	   neighboring tiles is limited to conveying the shared slice header in
566	   cases where a slice spans more than one tile, and loop filtering
567	   related sharing of reconstructed samples and metadata.  Thus,
568	   tiles are less demanding in terms of inter-processor communication
569	   bandwidth compared to WPP due to the in-picture independence between
570	   two neighboring partitions.
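
Informative note: The tile scan order described above (CTU raster scan inside each tile, tiles visited in raster order over the picture) can be sketched as follows.  The function name and the representation of tile column widths and tile row heights as lists of CTU counts are assumptions made for illustration; [HEVC] signals these sizes through syntax elements in the picture parameter set.

```python
def tile_scan_order(col_widths: list[int], row_heights: list[int]) -> list[int]:
    """Return picture-raster CTU addresses in the order tiles visit them.

    col_widths  -- width of each tile column, in CTUs (hypothetical input)
    row_heights -- height of each tile row, in CTUs (hypothetical input)
    """
    pic_width = sum(col_widths)
    order = []
    y0 = 0
    for height in row_heights:            # tile rows, top to bottom
        x0 = 0
        for width in col_widths:          # tile columns, left to right
            for y in range(y0, y0 + height):      # raster scan inside tile
                for x in range(x0, x0 + width):
                    order.append(y * pic_width + x)
            x0 += width
        y0 += height
    return order


print(tile_scan_order([2, 1], [2, 1]))
```

For a picture of 3x3 CTUs split into tile columns of widths 2 and 1 and tile rows of heights 2 and 1, the sketch visits the four CTUs of the top-left tile first, then the two CTUs of the top-right tile, and so on.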

572	1.1.4 NAL Unit Header

574	   HEVC maintains the NAL unit concept of H.264 with modifications.
575	   HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
576	   payload of a NAL unit refers to the NAL unit excluding the NAL unit
577	   header.

579	                     +---------------+---------------+
580	                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
581	                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
582	                     |F|   Type    |  LayerId  | TID |
583	                     +-------------+-----------------+

585	            Figure 1 The structure of the HEVC NAL unit header

587	   The semantics of the fields in the NAL unit header are as specified
588	   in [HEVC] and described briefly below for convenience.  In addition
589	   to the name and size of each field, the corresponding syntax element
590	   name in [HEVC] is also provided.

592	   F: 1 bit
593	      forbidden_zero_bit.  MUST be zero.  HEVC declares a value of 1 as
594	      a syntax violation.  Note that the inclusion of this bit in the
595	      NAL unit header is to enable transport of HEVC video over MPEG-2
596	      transport systems (avoidance of start code emulations) [MPEG2S].

598	   Type: 6 bits
599	      nal_unit_type.  This field specifies the NAL unit type as defined
600	      in Table 7-1 of [HEVC].  If the most significant bit of this
601	      field of a NAL unit is equal to 0 (i.e. the value of this field
602	      is less than 32), the NAL unit is a VCL NAL unit.  Otherwise, the
603	      NAL unit is a non-VCL NAL unit.  For a reference of all currently
604	      defined NAL unit types and their semantics, please refer to
605	      Section 7.4.1 in [HEVC].

607	   LayerId: 6 bits
608	      nuh_layer_id.  MUST be equal to zero.  It is anticipated that in
609	      future scalable or 3D video coding extensions of this
610	      specification, this syntax element will be used to identify
611	      additional layers that may be present in the coded video
612	      sequence, wherein a layer may be, e.g. a spatial scalable layer,
613	      a quality scalable layer, a texture view, or a depth view.

615	   TID: 3 bits
616	      nuh_temporal_id_plus1.  This field specifies the temporal
617	      identifier of the NAL unit plus 1.  The value of TemporalId is
618	      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
619	      there is at least one bit in the NAL unit header equal to 1, so
620	      as to enable independent considerations of start code emulations
621	      in the NAL unit header and in the NAL unit payload data.
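
Informative note: As a minimal sketch of the field layout in Figure 1, the two header bytes can be unpacked as shown below.  The class and function names are hypothetical and not part of [HEVC] or this memo.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    f: int          # forbidden_zero_bit, 1 bit
    nal_type: int   # nal_unit_type, 6 bits
    layer_id: int   # nuh_layer_id, 6 bits
    tid: int        # nuh_temporal_id_plus1, 3 bits


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Unpack F, Type, LayerId, and TID from the first two bytes."""
    if len(data) < 2:
        raise ValueError("the NAL unit header is two bytes long")
    word = (data[0] << 8) | data[1]
    return NalUnitHeader(
        f=(word >> 15) & 0x01,
        nal_type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x07,
    )


def is_vcl(header: NalUnitHeader) -> bool:
    """A Type value below 32 (most significant bit 0) marks a VCL NAL unit."""
    return header.nal_type < 32


def temporal_id(header: NalUnitHeader) -> int:
    """TemporalId is TID minus 1; a TID of 0 is illegal."""
    if header.tid == 0:
        raise ValueError("TID value 0 is illegal")
    return header.tid - 1


hdr = parse_nal_unit_header(bytes([0x40, 0x01]))
print(hdr, is_vcl(hdr), temporal_id(hdr))
```

The example bytes 0x40 0x01 decode to Type 32 with LayerId 0 and TID 1, i.e. a non-VCL NAL unit with TemporalId 0.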

623	1.2. Overview of the Payload Format

625	   This payload format defines the following processes required for
626	   transport of HEVC coded data over RTP [RFC3550]:

628	   o Usage of RTP header with this payload format

630	   o Packetization of HEVC coded NAL units into RTP packets using three
631	     types of payload structures, namely single NAL unit packet,
632	     aggregation packet, and fragmentation unit

634	   o Transmission of HEVC NAL units of the same bitstream within a
635	     single RTP stream (note that "RTP stream" is used as a synonym for
636	     "RTP flow" in this memo) or multiple RTP streams

638	   o Media type parameters to be used with the Session Description
639	     Protocol (SDP) [RFC4566]

641	   o A payload header extension mechanism and data structures for
642	     enhanced support of temporal scalability based on that extension
643	     mechanism.

645	2. Conventions

647	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
648	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
649	   document are to be interpreted as described in BCP 14, RFC 2119
650	   [RFC2119].

652	   In this document, these key words will appear with that
653	   interpretation only when in ALL CAPS.  Lower case uses of these
654	   words are not to be interpreted as carrying the RFC 2119
655	   significance.

657	   This specification uses the notion of setting and clearing a bit
658	   when bit fields are handled.  Setting a bit is the same as assigning
659	   that bit the value of 1 (On).  Clearing a bit is the same as
660	   assigning that bit the value of 0 (Off).

662	3. Definitions and Abbreviations

664	3.1 Definitions

666	   This document uses the terms and definitions of [HEVC].  Section
667	   3.1.1 lists relevant definitions copied from [HEVC] for convenience.
668	   Section 3.1.2 gives definitions specific to this memo.

670	3.1.1 Definitions from the HEVC Specification

672	   access unit: A set of NAL units that are associated with each other
673	   according to a specified classification rule, are consecutive in
674	   decoding order, and contain exactly one coded picture.

676	   BLA access unit: An access unit in which the coded picture is a BLA
677	   picture.

679	   BLA picture: An IRAP picture for which each VCL NAL unit has
680	   nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

682	   coded video sequence: A sequence of access units that consists, in
683	   decoding order, of an IRAP access unit with NoRaslOutputFlag equal
684	   to 1, followed by zero or more access units that are not IRAP access
685	   units with NoRaslOutputFlag equal to 1, including all subsequent
686	   access units up to but not including any subsequent access unit that
687	   is an IRAP access unit with NoRaslOutputFlag equal to 1.

689	      Informative note: An IRAP access unit may be an IDR access unit,
690	      a BLA access unit, or a CRA access unit.  The value of
691	      NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
692	      access unit, and each CRA access unit that is the first access
693	      unit in the bitstream in decoding order, is the first access unit
694	      that follows an end of sequence NAL unit in decoding order, or
695	      has HandleCraAsBlaFlag equal to 1.

697	   CRA access unit: An access unit in which the coded picture is a CRA
698	   picture.

700	   CRA picture: An IRAP picture for which each VCL NAL unit has
701	   nal_unit_type equal to CRA_NUT.

703	   IDR access unit: An access unit in which the coded picture is an IDR
704	   picture.

706	   IDR picture: An IRAP picture for which each VCL NAL unit has
707	   nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

709	   IRAP access unit: An access unit in which the coded picture is an
710	   IRAP picture.

712	   IRAP picture: A coded picture for which each VCL NAL unit has
713	   nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.

715	   layer: A set of VCL NAL units that all have a particular value of
716	   nuh_layer_id and the associated non-VCL NAL units, or one of a set
717	   of syntactical structures having a hierarchical relationship.

719	   operation point: bitstream created from another bitstream by
720	   operation of the sub-bitstream extraction process with the another
721	   bitstream, a target highest TemporalId, and a target layer
722	   identifier list as inputs.

724	   random access: The act of starting the decoding process for a
725	   bitstream at a point other than the beginning of the stream.

727	   sub-layer: A temporal scalable layer of a temporal scalable
728	   bitstream consisting of VCL NAL units with a particular value of the
729	   TemporalId variable, and the associated non-VCL NAL units.

731	   tile: A rectangular region of coding tree blocks within a particular
732	   tile column and a particular tile row in a picture.

734	   tile column: A rectangular region of coding tree blocks having a
735	   height equal to the height of the picture and a width specified by
736	   syntax elements in the picture parameter set.

738	   tile row: A rectangular region of coding tree blocks having a height
739	   specified by syntax elements in the picture parameter set and a
740	   width equal to the width of the picture.

742	3.1.2 Definitions Specific to This Memo

744	   dependent RTP stream: An RTP stream in an MST on which another RTP
745	   stream depends.

747	   highest RTP stream: The RTP stream in an MST on which no other RTP
748	   stream depends.

750	   media aware network element (MANE): A network element, such as a
751	   middlebox or application layer gateway, that is capable of parsing
752	   certain aspects of the RTP payload headers or the RTP payload and
753	   reacting to their contents.

755	      Informative note: The concept of a MANE goes beyond normal
756	      routers or gateways in that a MANE has to be aware of the
757	      signaling (e.g. to learn about the payload type mappings of the
758	      media streams), and in that it has to be trusted when working
759	      with SRTP.  The advantage of using MANEs is that they allow
760	      packets to be dropped according to the needs of the media coding.
761	      For example, if a MANE has to drop packets due to congestion on a
762	      certain link, it can identify and remove those packets whose
763	      elimination produces the least adverse effect on the user
764	      experience.  After dropping packets, MANEs must rewrite RTCP
765	      packets to match the changes to the RTP stream as specified in
766	      Section 7 of [RFC3550].

768	   multi-stream transmission (MST): Transmission of an HEVC bitstream
769	   using more than one RTP stream.

771	   NAL unit decoding order: A NAL unit order that conforms to the
772	   constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

774	   NALU-time: The value that the RTP timestamp would have if the NAL
775	   unit were transported in its own RTP packet.

777	   RTP stream: A sequence of RTP packets with increasing sequence
778	   numbers (except for wrap-around), identical PT and identical SSRC
779	   (Synchronization Source), carried in one RTP session.  Within the
780	   scope of this memo, one RTP stream is utilized to transport one or
781	   more temporal sub-layers.

783	   single-stream transmission (SST): Transmission of an HEVC bitstream
784	   using only one RTP stream.

786	   transmission order: The order of packets in ascending RTP sequence
787	   number order (in modulo arithmetic).  Within an aggregation packet,
788	   the NAL unit transmission order is the same as the order of
789	   appearance of NAL units in the packet.

791	3.2 Abbreviations

793	   AP       Aggregation Packet

795	   BLA      Broken Link Access

797	   CRA      Clean Random Access

799	   CTB      Coding Tree Block

801	   CTU      Coding Tree Unit

803	   CVS      Coded Video Sequence

805	   FU       Fragmentation Unit
806	   GDR      Gradual Decoding Refresh

808	   HRD      Hypothetical Reference Decoder

810	   IDR      Instantaneous Decoding Refresh

812	   IRAP     Intra Random Access Point

814	   MANE     Media Aware Network Element

816	   MST      Multi-Stream Transmission

818	   MTU      Maximum Transfer Unit

820	   NAL      Network Abstraction Layer

822	   NALU     Network Abstraction Layer Unit

824	   PACI     PAyload Content Information

826	   PHES     Payload Header Extension Structure

828	   PPS      Picture Parameter Set

830	   RADL     Random Access Decodable Leading (Picture)

832	   RASL     Random Access Skipped Leading (Picture)

834	   RPS      Reference Picture Set

836	   SEI      Supplemental Enhancement Information

838	   SPS      Sequence Parameter Set

840	   SST      Single-Stream Transmission

842	   STSA     Step-wise Temporal Sub-layer Access

844	   TSA      Temporal Sub-layer Access

846	   VCL      Video Coding Layer

848	   VPS      Video Parameter Set

850	4. RTP Payload Format

852	4.1 RTP Header Usage

854	   The format of the RTP header is specified in [RFC3550] and reprinted
855	   in Figure 2 for convenience.  This payload format uses the fields of
856	   the header in a manner consistent with that specification.

858	   The RTP payload (and the settings for some RTP header bits) for
859	   aggregation packets and fragmentation units are specified in
860	   Sections 4.7 and 4.8, respectively.

862	    0                   1                   2                   3
863	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
864	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
865	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
866	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
867	   |                           timestamp                           |
868	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
869	   |           synchronization source (SSRC) identifier            |
870	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
871	   |            contributing source (CSRC) identifiers             |
872	   |                             ....                              |
873	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

875	                Figure 2 RTP header according to [RFC3550]

877	   The RTP header information to be set according to this RTP payload
878	   format is set as follows:

880	   Marker bit (M): 1 bit

882	      Set for the last packet of the access unit indicated by the RTP
883	      timestamp, in line with the normal use of the M bit in video
884	      formats, to allow an efficient playout buffer handling.  Decoders
885	      can use this bit as an early indication of the last packet of an
886	      access unit.

888	         Informative note: The content of a NAL unit does not tell
889	         whether or not the NAL unit is the last NAL unit, in decoding
890	         order, of an access unit.  An RTP sender implementation may
891	         obtain this information from the video encoder.  If, however,
892	         the implementation cannot obtain this information directly
893	         from the encoder, e.g. when the stream was pre-encoded, and
894	         also there is no timestamp allocated for each NAL unit, then
895	         the sender implementation can inspect subsequent NAL units in
896	         decoding order to determine whether or not the NAL unit is the
897	         last NAL unit of an access unit as follows.  A NAL unit naluX
898	         is the last NAL unit of an access unit if it is the last NAL
899	         unit of the stream or the next VCL NAL unit naluY in decoding
900	         order has the high-order bit of the first byte after its NAL
901	         unit header equal to 1, and all NAL units between naluX and
902	         naluY, when present, have nal_unit_type in the range of 32 to
903	         35, inclusive, equal to 39, or in the ranges of 41 to 44,
904	         inclusive, or 48 to 55, inclusive.
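
Informative note: The inspection rule in the note above can be sketched as follows.  The sketch assumes that each NAL unit is available as a byte string starting with its two-byte NAL unit header and that the list is in decoding order; the function names are hypothetical.  When no VCL NAL unit follows, the rule as quoted gives no answer, and the sketch returns False in that case (an assumption).

```python
# nal_unit_type values that may sit between naluX and the next VCL NAL unit
# naluY: 32 to 35, 39, 41 to 44, and 48 to 55, as listed in the note above.
ALLOWED_BETWEEN = (
    set(range(32, 36)) | {39} | set(range(41, 45)) | set(range(48, 56))
)


def nal_unit_type(nalu: bytes) -> int:
    """Extract the 6-bit Type field from the first header byte."""
    return (nalu[0] >> 1) & 0x3F


def is_last_nal_unit_of_access_unit(nalus: list[bytes], x: int) -> bool:
    """Apply the rule to nalus[x], with nalus given in decoding order."""
    if x == len(nalus) - 1:
        return True                      # last NAL unit of the stream
    for nalu in nalus[x + 1:]:
        t = nal_unit_type(nalu)
        if t < 32:                       # next VCL NAL unit (naluY)
            # High-order bit of the first byte after the 2-byte header.
            return len(nalu) > 2 and bool(nalu[2] & 0x80)
        if t not in ALLOWED_BETWEEN:
            return False
    return False                         # no VCL NAL unit follows (assumed)


stream = [
    bytes([0x26, 0x01, 0x80]),  # Type 19, first slice segment of a picture
    bytes([0x26, 0x01, 0x00]),  # Type 19, further slice segment
    bytes([0x42, 0x01, 0x00]),  # Type 33, non-VCL
    bytes([0x02, 0x01, 0x80]),  # Type 1, first slice segment of next picture
]
print([is_last_nal_unit_of_access_unit(stream, i) for i in (0, 1, 3)])
```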

906	   Payload type (PT): 7 bits

908	      The assignment of an RTP payload type for this new packet format
909	      is outside the scope of this document and will not be specified
910	      here.  The assignment of a payload type has to be performed
911	      either through the profile used or in a dynamic way.

913	   Sequence number (SN): 16 bits

915	      Set and used in accordance with RFC 3550.

917	   Timestamp: 32 bits

919	      The RTP timestamp is set to the sampling timestamp of the
920	      content.  A 90 kHz clock rate MUST be used.
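
      Informative note: A minimal sketch of mapping a sampling instant to a 32-bit RTP timestamp at the 90 kHz clock rate follows.  The function name is hypothetical; the initial offset parameter reflects the timestamp offset chosen by the sender as described in [RFC3550].

```python
from fractions import Fraction

CLOCK_RATE_HZ = 90_000   # clock rate required by this payload format


def rtp_timestamp(sampling_time_s: Fraction, initial_offset: int = 0) -> int:
    """Map a sampling instant in seconds to a 32-bit RTP timestamp.

    initial_offset is the sender-chosen starting value (hypothetical
    parameter name); the result wraps modulo 2**32.
    """
    ticks = round(sampling_time_s * CLOCK_RATE_HZ)
    return (initial_offset + ticks) % 2**32


# Pictures sampled at 30 Hz are 3000 ticks apart.
print([rtp_timestamp(Fraction(n, 30)) for n in range(3)])
```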

922	      If the NAL unit has no timing properties of its own (e.g.
923	      parameter set and SEI NAL units), the RTP timestamp is set to the
924	      RTP timestamp of the coded picture of the access unit in which
925	      the NAL unit is included, according to Section 7.4.2.4.4 of
926	      [HEVC].

928	      Receivers SHOULD ignore the picture output timing information in
929	      any picture timing SEI messages or decoding unit information SEI
930	      messages as specified in [HEVC].  Instead, receivers SHOULD use
931	      the RTP timestamp for the display process.  Receivers MUST pass
932	      picture timing SEI messages and decoding unit information SEI
933	      messages to the decoder and MAY use the field/frame related
934	      information for the display process, e.g. when frame doubling or
935	      frame tripling is indicated by the field/frame related
936	      information.

938	4.2 Payload Header Usage

940	   The TID value indicates (among other things) the relative importance
941	   of an RTP packet, for example because NAL units belonging to higher
942	   temporal sub-layers are not used for the decoding of lower temporal
943	   sub-layers.  A lower value of TID indicates a higher importance.
944	   More important NAL units MAY be better protected against
945	   transmission losses than less important NAL units.
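
Informative note: As an illustration of using TID as an importance indicator, the sketch below keeps only the RTP payloads whose TemporalId does not exceed a chosen target.  It assumes each payload begins with a payload header that carries the TID field in the three least significant bits of its second byte, which holds for headers with the layout of Section 1.1.4; the function names are hypothetical, and a real MANE would need additional logic.

```python
def temporal_id(payload: bytes) -> int:
    """TemporalId = TID - 1, with TID in the low 3 bits of the second byte."""
    if len(payload) < 2:
        raise ValueError("payload header is two bytes long")
    tid = payload[1] & 0x07
    if tid == 0:
        raise ValueError("TID value 0 is illegal")
    return tid - 1


def thin_to_sub_layer(payloads: list[bytes], highest_tid: int) -> list[bytes]:
    """Keep payloads with TemporalId <= highest_tid, preserving order."""
    return [p for p in payloads if temporal_id(p) <= highest_tid]


packets = [
    bytes([0x26, 0x01]),  # TID 1, TemporalId 0
    bytes([0x02, 0x03]),  # TID 3, TemporalId 2
    bytes([0x02, 0x02]),  # TID 2, TemporalId 1
]
print(len(thin_to_sub_layer(packets, 1)))
```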

947	4.3 Payload Structures

949	   The first two bytes of the payload of an RTP packet are referred to
950	   as the payload header.  In most cases, the payload header consists
951	   of the same fields (F, Type, LayerId, and TID) as the NAL unit
952	   header as shown in section 1.1.4, irrespective of the type of the
953	   payload structure.  The single exception is an RTP packet carrying a
954	   Payload Content Information (PACI) NAL-unit-like structure.

956	   Four different types of RTP packet payload structures are specified.
957	   A receiver can identify the type of an RTP packet payload through
958	   the Type field in the payload header.

960	   The four different payload structures are as follows:

962	   o  Single NAL unit packet: Contains a single NAL unit in the
963	      payload, and the NAL unit header of the NAL unit also serves as
964	      the payload header.  This payload structure is specified in
965	      section 4.6.

967	   o  Aggregation packet (AP): Contains more than one NAL unit within
968	      one access unit.  This payload structure is specified in
969	      section 4.7.

971	   o  Fragmentation unit (FU): Contains a subset of a single NAL unit.
972	      This payload structure is specified in section 4.8.

974	   o  PACI carrying RTP packet: Contains a payload header (that differs
975	      from other payload headers for efficiency), a Payload Header
976	      Extension Structure (PHES), and a PACI payload.  This payload
977	      structure is specified in section 4.9.
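
Informative note: A receiver-side dispatch on the Type field might look like the sketch below.  The Type values 48 (AP), 49 (FU), and 50 (PACI) are not stated in this section; they are assumed here and should be checked against Sections 4.7 to 4.9 of this memo.  The function name is hypothetical.

```python
# Assumed Type values; verify against Sections 4.7, 4.8, and 4.9.
AP_TYPE = 48
FU_TYPE = 49
PACI_TYPE = 50


def payload_structure(payload: bytes) -> str:
    """Name the payload structure indicated by the payload header Type."""
    if len(payload) < 2:
        raise ValueError("payload header is two bytes long")
    type_field = (payload[0] >> 1) & 0x3F
    if type_field < AP_TYPE:
        return "single NAL unit packet"
    if type_field == AP_TYPE:
        return "aggregation packet"
    if type_field == FU_TYPE:
        return "fragmentation unit"
    if type_field == PACI_TYPE:
        return "PACI carrying RTP packet"
    return "unknown"   # Type values above 50 are not handled in this sketch


print(payload_structure(bytes([0x62, 0x01])))
```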

979	4.4 Transmission Modes

981	   This memo enables transmission of an HEVC bitstream over a single
982	   RTP stream or multiple RTP streams.  The concept and working
983	   principle are inherited from the design of single-session and
984	   multiple-session transmission in [RFC6190].  If
985	   only one RTP stream is used for transmission of the HEVC bitstream,
986	   the transmission mode is referred to as single-stream transmission
987	   (SST); otherwise (more than one RTP stream is used for transmission
988	   of the HEVC bitstream), the transmission mode is referred to as
989	   multi-stream transmission (MST).

991	   Dependency of one RTP stream on another RTP stream is indicated as
992	   specified in [RFC5583].  In MST, the RTP stream on which no other
993	   RTP stream depends is referred to as the highest RTP stream.  When
994	   an RTP stream A depends on another RTP stream B, the RTP stream B is
995	   referred to as a dependent RTP stream of the RTP stream A.

997	      Informative note: An MST may involve one or more RTP sessions.
998	      For example, each RTP stream in an MST may be in its own RTP
999	      session.  For another example, a set of multiple RTP streams in
1000	      an MST may belong to the same RTP session, e.g. as indicated by
1001	      the mechanism specified in [I-D.ietf-avtcore-rtp-multi-stream] or
1002	      [I-D.ietf-mmusic-sdp-bundle-negotiation].

1004	   SST SHOULD be used for point-to-point unicast scenarios, while MST
1005	   SHOULD be used for point-to-multipoint multicast scenarios where
1006	   different receivers require different operation points of the same
1007	   HEVC bitstream, to improve bandwidth utilization efficiency.

1009	      Informative note: A multicast may degrade to a unicast after all
1010	      but one receiver has left (this is a justification of the first
1011	      "SHOULD" instead of "MUST"), and there might be scenarios where
1012	      MST is desirable but not possible, e.g. when IP multicast is not
1013	      deployed in a certain network (this is a justification of the
1014	      second "SHOULD" instead of "MUST").

1016	   Receivers MUST support both SST and MST.

1018	4.5 Decoding Order Number

1020	   For each NAL unit, the variable AbsDon is derived, representing the
1021	   decoding order number that is indicative of the NAL unit decoding
1022	   order.

1024	   Let NAL unit n be the n-th NAL unit in transmission order within an
1025	   RTP stream.

1027	   If sprop-depack-buf-nalus is equal to 0, AbsDon[n], the value of
1028	   AbsDon for NAL unit n, is derived as equal to n.

1030	   Otherwise (sprop-depack-buf-nalus is greater than 0), AbsDon[n] is
1031	   derived as follows, where DON[n] is the value of the variable DON
1032	   for NAL unit n:

1034	   o  If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in
1035	      transmission order), AbsDon[0] is set equal to DON[0].

1037	   o  Otherwise (n is greater than 0), the following applies for
1038	      derivation of AbsDon[n]:

1040	            If DON[n] == DON[n-1],
1041	                AbsDon[n] = AbsDon[n-1]

1043	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1044	                AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1046	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1047	                AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1049	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1050	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

1052	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1053	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
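
Informative note: The derivation above can be written out as the following sketch for the case where sprop-depack-buf-nalus is greater than 0.  It takes the 16-bit DON values in transmission order and applies the five cases literally; the function name is hypothetical.

```python
def derive_abs_don(dons: list[int]) -> list[int]:
    """Return AbsDon[n] for each DON[n], given in transmission order."""
    abs_dons: list[int] = []
    for n, don in enumerate(dons):
        if not 0 <= don <= 0xFFFF:
            raise ValueError("DON is a 16-bit value")
        if n == 0:
            abs_dons.append(don)          # AbsDon[0] = DON[0]
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[n - 1]
        if don == prev_don:
            abs_don = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            abs_don = prev_abs + don - prev_don
        elif don < prev_don and prev_don - don >= 32768:
            # DON wrapped around while moving forward.
            abs_don = prev_abs + 65536 - prev_don + don
        elif don > prev_don and don - prev_don >= 32768:
            # DON wrapped around while moving backward.
            abs_don = prev_abs - (prev_don + 65536 - don)
        else:  # don < prev_don and prev_don - don < 32768
            abs_don = prev_abs - (prev_don - don)
        abs_dons.append(abs_don)
    return abs_dons


print(derive_abs_don([65534, 65535, 0, 2, 1]))
```

In the example, the step from DON 65535 to DON 0 is treated as a forward wrap-around, so AbsDon continues to 65536, and the final step from 2 back to 1 lowers AbsDon by one.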

1055	   For any two NAL units m and n, the following applies:

1057	   o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
1058	      follows NAL unit m in NAL unit decoding order.

1060	   o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
1061	      of the two NAL units can be in either order.

1063	   o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
1064	      NAL unit m in decoding order.

1066	   When two consecutive NAL units in the NAL unit decoding order have
1067	   different values of AbsDon, the value of AbsDon for the second NAL
1068	   unit in decoding order MUST be greater than the value of AbsDon for
1069	   the first NAL unit, and the absolute difference between the two
1070	   AbsDon values MAY be greater than or equal to 1.

1072	      Informative note: There are multiple reasons to allow for the
1073	      absolute difference of the values of AbsDon for two consecutive
1074	      NAL units in the NAL unit decoding order to be greater than one.
1075	      An increment by one is not required, as at the time of
1076	      associating values of AbsDon to NAL units, it may not be known
1077	      whether all NAL units are to be delivered to the receiver.  For
1078	      example, a gateway may not forward VCL NAL units of higher sub-
1079	      layers or some SEI NAL units when there is congestion in the
1080	      network.  In another example, the first intra picture of a pre-
1081	      encoded clip is transmitted in advance to ensure that it is
1082	      readily available in the receiver, and when transmitting the
1083	      first intra picture, the originator does not exactly know how
1084	      many NAL units will be encoded before the first intra picture of
1085	      the pre-encoded clip follows in decoding order.  Thus, the values
1086	      of AbsDon for the NAL units of the first intra picture of the
1087	      pre-encoded clip have to be estimated when they are transmitted,
1088	      and gaps in values of AbsDon may occur.  Another example is MST
1089	      where the AbsDon values must indicate cross-layer decoding order
1090	      for NAL units conveyed in all the RTP streams.

1092	4.6 Single NAL Unit Packets

1094	   A single NAL unit packet contains exactly one NAL unit, and consists
1095	   of a payload header (denoted as PayloadHdr), an optional 16-bit DONL
1096	   field (in network byte order), and the NAL unit payload data (the
1097	   NAL unit excluding its NAL unit header) of the contained NAL unit,
1098	   as shown in Figure 3.

1100	    0                   1                   2                   3
1101	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1102	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1103	   |           PayloadHdr          |        DONL (optional)        |
1104	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1105	   |                                                               |
1106	   |                  NAL unit payload data                        |
1107	   |                                                               |
1108	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1109	   |                               :...OPTIONAL RTP padding        |
1110	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1112	            Figure 3 The structure of a single NAL unit packet

1114	   The payload header SHOULD be an exact copy of the NAL unit header of
1115	   the contained NAL unit.  However, the Type (i.e. nal_unit_type)
1116	   field MAY be changed, e.g. when it is desirable to handle a CRA
1117	   picture as a BLA picture [JCTVC-J0107].

1119	   The DONL field, when present, specifies the value of the 16 least
1120	   significant bits of the decoding order number of the contained NAL
1121	   unit.

1123	   If sprop-depack-buf-nalus is greater than 0, the DONL field MUST be
1124	   present, and the variable DON for the contained NAL unit is derived
1125	   as equal to the value of the DONL field.  Otherwise (sprop-depack-
1126	   buf-nalus is equal to 0), the DONL field MUST NOT be present.
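
   Informative example: the following Python sketch illustrates how a
   receiver might split a single NAL unit packet into its parts.  The
   names parse_payload_hdr and parse_single_nal_packet are
   hypothetical and not defined by this memo.  The sketch assumes
   that the input is the RTP payload with any RTP padding already
   removed, and that the caller knows the sprop-depack-buf-nalus
   value that applies to the RTP stream.

```python
from typing import NamedTuple, Optional, Tuple


class PayloadHdr(NamedTuple):
    f: int          # forbidden_zero_bit (1 bit)
    nal_type: int   # Type (6 bits)
    layer_id: int   # LayerId (6 bits)
    tid: int        # TID (3 bits)


def parse_payload_hdr(data: bytes) -> PayloadHdr:
    """Split the 16-bit payload header into F, Type, LayerId, and TID."""
    if len(data) < 2:
        raise ValueError("payload header needs two octets")
    hdr = int.from_bytes(data[:2], "big")  # network byte order
    return PayloadHdr(hdr >> 15, (hdr >> 9) & 0x3F, (hdr >> 3) & 0x3F, hdr & 0x07)


def parse_single_nal_packet(
    payload: bytes, depack_buf_nalus: int
) -> Tuple[PayloadHdr, Optional[int], bytes]:
    """Return (payload header, DONL or None, reconstructed NAL unit)."""
    hdr = parse_payload_hdr(payload)
    if hdr.nal_type >= 48:
        raise ValueError("Type 48..63 is not a single NAL unit packet")
    offset, donl = 2, None
    if depack_buf_nalus > 0:
        # DONL is present only when sprop-depack-buf-nalus > 0.
        if len(payload) < 4:
            raise ValueError("DONL field is missing")
        donl = int.from_bytes(payload[2:4], "big")
        offset = 4
    # Assumption of this sketch: the payload header is reused unchanged
    # as the NAL unit header (the memo allows the sender to alter Type).
    nal_unit = payload[:2] + payload[offset:]
    return hdr, donl, nal_unit
```

   When the DONL field is present, the returned value is the DON of
   the contained NAL unit.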

1128	4.7 Aggregation Packets (APs)

1130	   Aggregation packets (APs) are introduced to enable the reduction of
1131	   packetization overhead for small NAL units, such as most of the non-
1132	   VCL NAL units, which are often only a few octets in size.

1134	   An AP aggregates NAL units within one access unit.  Each NAL unit to
1135	   be carried in an AP is encapsulated in an aggregation unit.  NAL
1136	   units aggregated in one AP are in NAL unit decoding order.

1138	   An AP consists of a payload header (denoted as PayloadHdr) followed
1139	   by two or more aggregation units, as shown in Figure 4.

1141	    0                   1                   2                   3
1142	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1143	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1144	   |           PayloadHdr          |                               |
1145	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1146	   |                                                               |
1147	   |             two or more aggregation units                     |
1148	   |                                                               |
1149	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1150	   |                               :...OPTIONAL RTP padding        |
1151	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1153	              Figure 4 The structure of an aggregation packet

1155	   The fields in the payload header are set as follows.  The F bit MUST
1156	   be equal to 0 if the F bit of each aggregated NAL unit is equal to
1157	   zero; otherwise, it MUST be equal to 1.  The Type field MUST be
1158	   equal to 48.  The value of LayerId MUST be equal to the lowest value
1159	   of LayerId of all the aggregated NAL units.  The value of TID MUST
1160	   be the lowest value of TID of all the aggregated NAL units.

1162	      Informative Note: All VCL NAL units in an AP have the same TID
1163	      value since they belong to the same access unit.  However, an AP
1164	      may contain non-VCL NAL units for which the TID value in the NAL
1165	      unit header may be different than the TID value of the VCL NAL
1166	      units in the same AP.
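
   Informative example: the rules above for the AP payload header can
   be sketched in Python as follows.  The function name
   build_ap_payload_hdr is hypothetical, and the sketch assumes that
   each input is a complete NAL unit that starts with its two-octet
   NAL unit header.

```python
from typing import Sequence


def build_ap_payload_hdr(nal_units: Sequence[bytes]) -> bytes:
    """Derive the two-octet PayloadHdr of an AP from its NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f, layer_id, tid = 0, 63, 7
    for nal in nal_units:
        if len(nal) < 2:
            raise ValueError("NAL unit shorter than its header")
        hdr = (nal[0] << 8) | nal[1]
        f |= hdr >> 15                                # 1 if any F bit is 1
        layer_id = min(layer_id, (hdr >> 3) & 0x3F)   # lowest LayerId
        tid = min(tid, hdr & 0x07)                    # lowest TID
    value = (f << 15) | (48 << 9) | (layer_id << 3) | tid  # Type = 48
    return value.to_bytes(2, "big")
```

   Taking the minimum TID over all aggregated NAL units covers the
   case mentioned in the note above, where a non-VCL NAL unit carries
   a TID that differs from that of the VCL NAL units.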

1168	   An AP MUST carry at least two aggregation units and can carry as
1169	   many aggregation units as necessary; however, the total amount of
1170	   data in an AP MUST fit into an IP packet, and the size SHOULD be
1171	   chosen so that the resulting IP packet is smaller than the MTU
1172	   size, so as to avoid IP layer fragmentation.  An AP MUST NOT contain
1173	   Fragmentation Units (FUs) specified in section 4.8.  APs MUST NOT be
1174	   nested; i.e. an AP MUST NOT contain another AP.

1176	   The first aggregation unit in an AP consists of an optional 16-bit
1177	   DONL field (in network byte order) followed by a 16-bit unsigned
1178	   size information (in network byte order) that indicates the size of
1179	   the NAL unit in bytes (excluding these two octets, but including the
1180	   NAL unit header), followed by the NAL unit itself, including its NAL
1181	   unit header, as shown in Figure 5.

1183	    0                   1                   2                   3
1184	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1185	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1186	                   :        DONL (optional)        |   NALU size   |
1187	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1188	   |   NALU size   |                                               |
1189	   +-+-+-+-+-+-+-+-+         NAL unit                              |
1190	   |                                                               |
1191	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1192	   |                               :
1193	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1195	       Figure 5 The structure of the first aggregation unit in an AP

1197	   The DONL field, when present, specifies the value of the 16 least
1198	   significant bits of the decoding order number of the aggregated NAL
1199	   unit.

1201	   If sprop-depack-buf-nalus is greater than 0, the DONL field MUST be
1202	   present in an aggregation unit that is the first aggregation unit in
1203	   an AP, and the variable DON for the aggregated NAL unit is derived
1204	   as equal to the value of the DONL field.  Otherwise (sprop-depack-
1205	   buf-nalus is equal to 0), the DONL field MUST NOT be present in an
1206	   aggregation unit that is the first aggregation unit in an AP.

1208	   An aggregation unit that is not the first aggregation unit in an AP
1209	   consists of an optional 8-bit DOND field followed by a 16-bit
1210	   unsigned size information (in network byte order) that indicates the
1211	   size of the NAL unit in bytes (excluding these two octets, but
1212	   including the NAL unit header), followed by the NAL unit itself,
1213	   including its NAL unit header, as shown in Figure 6.

1215	    0                   1                   2                   3
1216	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1217	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1218	                   : DOND(optional)|          NALU size            |
1219	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1220	   |                                                               |
1221	   |                       NAL unit                                |
1222	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1223	   |                               :
1224	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1226	    Figure 6 The structure of an aggregation unit that is not the first
1227	                         aggregation unit in an AP

1229	   When present, the DOND field plus 1 specifies the difference between
1230	   the decoding order number values of the current aggregated NAL unit
1231	   and the preceding aggregated NAL unit in the same AP.

1233	   If sprop-depack-buf-nalus is greater than 0, the DOND field MUST be
1234	   present in an aggregation unit that is not the first aggregation
1235	   unit in an AP, and the variable DON for the aggregated NAL unit is
1236	   derived as equal to the DON of the preceding aggregated NAL unit in
1237	   the same AP plus the value of the DOND field plus 1 modulo 65536.
1238	   Otherwise (sprop-depack-buf-nalus is equal to 0), the DOND field
1239	   MUST NOT be present in an aggregation unit that is not the first
1240	   aggregation unit in an AP.

1242	   Figure 7 presents an example of an AP that contains two aggregation
1243	   units, labeled as 1 and 2 in the figure, without the DONL and DOND
1244	   fields being present.

1246	    0                   1                   2                   3
1247	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1248	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1249	   |                          RTP Header                           |
1250	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1251	   |           PayloadHdr          |         NALU 1 Size           |
1252	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1253	   |          NALU 1 HDR           |                               |
1254	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1255	   |                   . . .                                       |
1256	   |                                                               |
1257	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1258	   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1259	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1260	   | NALU 2 HDR    |                                               |
1261	   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1262	   |                   . . .                                       |
1263	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1264	   |                               :...OPTIONAL RTP padding        |
1265	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1267	    Figure 7 An example of an AP containing two aggregation units
1268	                     without the DONL and DOND fields

1270	   Figure 8 presents an example of an AP that contains two aggregation
1271	   units, labeled as 1 and 2 in the figure, with the DONL and DOND
1272	   fields being present.

1274	    0                   1                   2                   3
1275	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1276	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1277	   |                          RTP Header                           |
1278	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1279	   |           PayloadHdr          |        NALU 1 DONL            |
1280	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1281	   |          NALU 1 Size          |            NALU 1 HDR         |
1282	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1283	   |                                                               |
1284	   |                 NALU 1 Data   . . .                           |
1285	   |                                                               |
1286	   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1287	   |               |  NALU 2 DOND  |          NALU 2 Size          |
1288	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1289	   |          NALU 2 HDR           |                               |
1290	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1291	   |                                                               |
1292	   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1293	   |                               :...OPTIONAL RTP padding        |
1294	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1296	    Figure 8 An example of an AP containing two aggregation units with
1297	                         the DONL and DOND fields
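
   Informative example: the following Python sketch walks the
   aggregation units of an AP and derives the DON of each aggregated
   NAL unit from the DONL and DOND fields.  The function name
   parse_ap is hypothetical.  The sketch assumes that the input is
   the RTP payload with any RTP padding already removed.

```python
from typing import List, Optional, Tuple


def parse_ap(payload: bytes, depack_buf_nalus: int) -> List[Tuple[Optional[int], bytes]]:
    """Return [(DON or None, NAL unit), ...] in the order carried in the AP."""
    use_don = depack_buf_nalus > 0
    pos = 2                       # skip the two-octet PayloadHdr
    don: Optional[int] = None
    units: List[Tuple[Optional[int], bytes]] = []

    def take(n: int) -> bytes:
        nonlocal pos
        chunk = payload[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated aggregation unit")
        pos += n
        return chunk

    while pos < len(payload):
        if use_don and not units:
            # First aggregation unit: 16-bit DONL gives the DON directly.
            don = int.from_bytes(take(2), "big")
        elif use_don:
            # Later units: DON = previous DON + DOND + 1, modulo 65536.
            don = (don + take(1)[0] + 1) % 65536
        size = int.from_bytes(take(2), "big")   # NALU size, header included
        units.append((don, take(size)))
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```

   Each returned NAL unit includes its own NAL unit header, so it can
   be handed on without further reconstruction.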

1299	4.8 Fragmentation Units (FUs)

1301	   Fragmentation units (FUs) are introduced to enable fragmenting a
1302	   single NAL unit into multiple RTP packets, possibly without
1303	   cooperation or knowledge of the HEVC encoder.  A fragment of a NAL
1304	   unit consists of an integer number of consecutive octets of that NAL
1305	   unit.  Fragments of the same NAL unit MUST be sent in consecutive
1306	   order with ascending RTP sequence numbers (with no other RTP packets
1307	   within the same RTP stream being sent between the first and last
1308	   fragment).

1310	   When a NAL unit is fragmented and conveyed within FUs, it is
1311	   referred to as a fragmented NAL unit.  APs MUST NOT be fragmented.
1312	   FUs MUST NOT be nested; i.e. an FU MUST NOT contain a subset of
1313	   another FU.

1315	   The RTP timestamp of an RTP packet carrying an FU is set to the
1316	   NALU-time of the fragmented NAL unit.

1318	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1319	   header of one octet, an optional 16-bit DONL field (in network byte
1320	   order), and an FU payload, as shown in Figure 9.

1322	    0                   1                   2                   3
1323	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1324	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1325	   |          PayloadHdr           |   FU header   | DONL(optional)|
1326	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1327	   | DONL(optional)|                                               |
1328	   +-+-+-+-+-+-+-+-+                                               |
1329	   |                         FU payload                            |
1330	   |                                                               |
1331	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1332	   |                               :...OPTIONAL RTP padding        |
1333	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1335	                      Figure 9 The structure of an FU

1337	   The fields in the payload header are set as follows.  The Type field
1338	   MUST be equal to 49.  The fields F, LayerId, and TID MUST be equal
1339	   to the fields F, LayerId, and TID, respectively, of the fragmented
1340	   NAL unit.

1342	   The FU header consists of an S bit, an E bit, and a 6-bit FuType
1343	   field, as shown in Figure 10.

1345	                             +---------------+
1346	                             |0|1|2|3|4|5|6|7|
1347	                             +-+-+-+-+-+-+-+-+
1348	                             |S|E|  FuType   |
1349	                             +---------------+

1351	                  Figure 10   The structure of the FU header

1353	   The semantics of the FU header fields are as follows:
1354	   S: 1 bit
1355	      When set to one, the S bit indicates the start of a fragmented
1356	      NAL unit, i.e. the first byte of the FU payload is also the first
1357	      byte of the payload of the fragmented NAL unit.  When the FU
1358	      payload is not the start of the fragmented NAL unit payload, the
1359	      S bit MUST be set to zero.

1361	   E: 1 bit
1362	      When set to one, the E bit indicates the end of a fragmented NAL
1363	      unit, i.e. the last byte of the payload is also the last byte of
1364	      the fragmented NAL unit.  When the FU payload is not the last
1365	      fragment of a fragmented NAL unit, the E bit MUST be set to zero.

1367	   FuType: 6 bits
1368	      The field FuType MUST be equal to the field Type of the
1369	      fragmented NAL unit.

1371	   The DONL field, when present, specifies the value of the 16 least
1372	   significant bits of the decoding order number of the fragmented NAL
1373	   unit.

1375	   If sprop-depack-buf-nalus is greater than 0, and the S bit is equal
1376	   to 1, the DONL field MUST be present in the FU, and the variable DON
1377	   for the fragmented NAL unit is derived as equal to the value of the
1378	   DONL field.  Otherwise (sprop-depack-buf-nalus is equal to 0, or the
1379	   S bit is equal to 0), the DONL field MUST NOT be present in the FU.

1381	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
1382	   the Start bit and End bit MUST NOT both be set to one in the same FU
1383	   header.

1385	   The FU payload consists of fragments of the payload of the
1386	   fragmented NAL unit so that if the FU payloads of consecutive FUs,
1387	   starting with an FU with the S bit equal to 1 and ending with an FU
1388	   with the E bit equal to 1, are sequentially concatenated, the
1389	   payload of the fragmented NAL unit can be reconstructed.  The NAL
1390	   unit header of the fragmented NAL unit is not included as such in
1391	   the FU payload, but rather the information of the NAL unit header of
1392	   the fragmented NAL unit is conveyed in F, LayerId, and TID fields of
1393	   the FU payload headers of the FUs and the FuType field of the FU
1394	   header of the FUs.  An FU payload MAY have any number of octets and
1395	   MAY be empty.

1397	      Informative note: Empty FU payloads are allowed to reduce the
1398	      latency of a certain class of senders in nearly lossless
1399	      environments.  These senders can be characterized in that they
1400	      packetize fragments of a NAL unit before the NAL unit is
1401	      completely generated and, hence, before the NAL unit size is
1402	      known.  If zero-length FU payloads were not allowed, the sender
1403	      would have to generate at least one bit of data of the following
1404	      fragment of the NAL unit before the current FU could be sent.
1405	      Due to the characteristics of HEVC, where sometimes several CTUs
1406	      occupy zero bits, this is undesirable and can add delay.
1407	      However, the (potential) use of zero-length FU payloads should be
1408	      carefully weighed against the increased risk of the loss of at
1409	      least a part of the fragmented NAL unit because of the additional
1410	      packets employed for its transmission.
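
   Informative example: a sender-side fragmentation step might look
   like the following Python sketch.  The function name
   fragment_nal_unit and the parameter max_fu_payload (the largest
   number of NAL unit payload octets placed in one FU) are
   hypothetical.  The sketch assumes that the complete NAL unit is
   available, so it never emits an empty FU payload, and it leaves
   the budgeting for the two extra DONL octets in the first FU to the
   caller.

```python
from typing import List, Optional


def fragment_nal_unit(
    nal: bytes, max_fu_payload: int, don: Optional[int] = None
) -> List[bytes]:
    """Split one NAL unit into FU packet payloads (PayloadHdr onwards)."""
    if max_fu_payload < 1:
        raise ValueError("max_fu_payload must be positive")
    hdr = (nal[0] << 8) | nal[1]
    nal_type = (hdr >> 9) & 0x3F
    body = nal[2:]                      # the NAL unit header is not carried as such
    if len(body) <= max_fu_payload:
        # S and E must not both be set; use a single NAL unit packet.
        raise ValueError("NAL unit fits in one packet; do not use an FU")
    # Keep F, LayerId, and TID; replace Type with 49.
    payload_hdr = ((hdr & ~(0x3F << 9)) | (49 << 9)).to_bytes(2, "big")
    chunks = [body[i:i + max_fu_payload] for i in range(0, len(body), max_fu_payload)]
    packets = []
    for index, chunk in enumerate(chunks):
        s = 1 if index == 0 else 0
        e = 1 if index == len(chunks) - 1 else 0
        fu_header = bytes([(s << 7) | (e << 6) | nal_type])
        # DONL only in the first FU, and only when DON is in use
        # (don is None models sprop-depack-buf-nalus equal to 0).
        donl = (don & 0xFFFF).to_bytes(2, "big") if (s and don is not None) else b""
        packets.append(payload_hdr + fu_header + donl + chunk)
    return packets
```

   The returned payloads are meant to be sent in consecutive RTP
   packets with ascending sequence numbers, as required above.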

1412	   If an FU is lost, the receiver SHOULD discard all following
1413	   fragmentation units in transmission order corresponding to the same
1414	   fragmented NAL unit, unless the decoder in the receiver is known to
1415	   be prepared to gracefully handle incomplete NAL units.

1417	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1418	   fragments of a NAL unit to an (incomplete) NAL unit, even if
1419	   fragment n of that NAL unit is not received.  In this case, the
1420	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1421	   syntax violation.
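
   Informative example: the following Python sketch reassembles a NAL
   unit from FUs.  The function name reassemble_fus is hypothetical.
   The sketch assumes that the caller passes the FU packet payloads of
   one fragmented NAL unit, in RTP sequence number order, starting
   with the FU that has the S bit set and stopping at the first loss.
   If the last fragment is missing, the sketch returns the incomplete
   NAL unit with forbidden_zero_bit set to one; a receiver whose
   decoder cannot handle incomplete NAL units would discard the
   fragments instead.

```python
from typing import Optional, Sequence, Tuple


def reassemble_fus(
    fus: Sequence[bytes], depack_buf_nalus: int
) -> Tuple[Optional[int], bytes]:
    """Return (DON or None, NAL unit) rebuilt from consecutive FUs."""
    if not fus or len(fus[0]) < 3 or not fus[0][2] & 0x80:
        raise ValueError("first FU must have the S bit set")
    hdr = (fus[0][0] << 8) | fus[0][1]
    fu_type = fus[0][2] & 0x3F
    don: Optional[int] = None
    body = bytearray()
    complete = False
    for fu in fus:
        s = fu[2] >> 7
        e = (fu[2] >> 6) & 1
        offset = 3                       # PayloadHdr (2) + FU header (1)
        if s and depack_buf_nalus > 0:
            don = int.from_bytes(fu[3:5], "big")   # DONL, first FU only
            offset = 5
        body += fu[offset:]
        if e:
            complete = True
            break
    # Rebuild the NAL unit header: F, LayerId, TID from PayloadHdr,
    # Type from FuType.
    nal_hdr = (hdr & ~(0x3F << 9)) | (fu_type << 9)
    if not complete:
        nal_hdr |= 0x8000                # forbidden_zero_bit = 1
    return don, nal_hdr.to_bytes(2, "big") + bytes(body)
```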

1423	4.9 PACI packets

1425	   This section specifies the PACI packet structure, based on a payload
1426	   header extension mechanism that is generic and extensible to carry
1427	   payload header extensions.

1429	   The structure of an RTP packet carrying a Payload Header Extension
1430	   Structure (PHES) and a PACI payload is as follows:

1432	       0                   1                   2                   3
1433	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1434	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1435	      |                          RTP Header                           |
1436	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1437	      |F| PACI=50   |  LayerId  | TID |A|    Type   | PHSsize |F0..2|X|
1438	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1439	      |        Payload Header Extension Structure (PHES)              |
1440	      |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
1441	      |                                                               |
1442	      |                  PACI payload: NAL unit                       |
1443	      |                   . . .                                       |
1444	      |                                                               |
1445	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1446	      |                               :...OPTIONAL RTP padding        |
1447	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

1449	                    Figure 11   The structure of a PACI

1451	   The semantics of the fields are as follows:

1453	   F: 1 bit
1454	      Forbidden_zero-bit.  MUST be zero.

1456	   PACI: 6 bits
1457	      Indicates a PACI, and MUST be 50.

1459	   LayerId: 6 bits
1460	      Copy of the LayerId field of the PACI payload NAL unit or NAL-
1461	      unit-like structure.

1463	   TID: 3 bits
1464	      Copy of the TID field of the PACI payload NAL unit or NAL-unit-
1465	      like structure.

1467	   A: 1 bit
1468	      Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
1469	      structure.

1471	   Type: 6 bits
1472	      Copy of the Type field of the PACI payload NAL unit or NAL-unit-
1473	      like structure.

1475	   PHSsize: 5 bits
1476	      Indicates the total length of the PHES.  The value is limited to
1477	      be less than or equal to 32 octets, to simplify encoder design
1478	      for MTU size matching.

1480	   F0..2: 3 bits
1481	      Each of the three bits indicates, when set, the presence of an
1482	      optional field (or set of fields) in the PHES.

1484	   X: 1 bit
1485	      The X bit, when set, indicates the presence of another octet
1486	      consisting of seven flags and another X bit, each of the seven
1487	      flags indicating the presence of more PHES fields (for future
1488	      extensions).

1490	   PHES: variable number of octets
1491	      A variable number of octets as indicated by the value of PHSsize.

1493	   PACI Payload
1494	      The NAL unit or NAL-unit-like structure (such as an FU or AP) to
1495	      be carried, not including the first two octets.

1497	         Informative note: The first two octets of the NAL unit or NAL-
1498	         unit-like structure carried in the PACI payload are not
1499	         included in the PACI payload.  Rather, the respective values
1500	         are copied into locations of the PayloadHdr of the RTP packet.
1501	         This design offers two advantages: first, the overall
1502	         structure of the payload header is preserved, i.e. there is no
1503	         special case of payload header structure that needs to be
1504	         implemented for PACI.  Second, no additional overhead is
1505	         introduced.

1507	      A PACI payload MAY be a single NAL unit, an FU, or an AP.  PACIs
1508	      MUST NOT be fragmented or aggregated.  The following subsection
1509	      documents the reasons for these design choices.
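
   Informative example: the following Python sketch parses the 32-bit
   PACI header of Figure 11 and rebuilds the first two octets of the
   carried NAL unit or NAL-unit-like structure from the copied
   fields.  The function name parse_paci is hypothetical.  The sketch
   assumes that RTP padding has been removed and that all extension
   data, including any octet announced by the X bit, lies within the
   PHSsize octets that follow the header; the memo does not state the
   latter explicitly.

```python
from typing import Dict, Tuple


def parse_paci(payload: bytes) -> Tuple[Dict[str, int], bytes, bytes]:
    """Return (header fields, PHES octets, reconstructed PACI payload)."""
    if len(payload) < 4:
        raise ValueError("PACI header needs four octets")
    word = int.from_bytes(payload[:4], "big")
    fields = {
        "F": word >> 31,
        "PACI": (word >> 25) & 0x3F,
        "LayerId": (word >> 19) & 0x3F,
        "TID": (word >> 16) & 0x07,
        "A": (word >> 15) & 0x01,
        "Type": (word >> 9) & 0x3F,
        "PHSsize": (word >> 4) & 0x1F,
        "F0..2": (word >> 1) & 0x07,
        "X": word & 0x01,
    }
    if fields["PACI"] != 50:
        raise ValueError("not a PACI packet")
    end = 4 + fields["PHSsize"]
    phes = payload[4:end]
    if len(phes) != fields["PHSsize"]:
        raise ValueError("truncated PHES")
    # Rebuild the two omitted octets: F from A, Type from the copied
    # Type field, LayerId and TID as carried in the PACI header.
    inner_hdr = (
        (fields["A"] << 15) | (fields["Type"] << 9)
        | (fields["LayerId"] << 3) | fields["TID"]
    )
    return fields, phes, inner_hdr.to_bytes(2, "big") + payload[end:]
```

   The reconstructed payload can then be handled like the payload of
   a single NAL unit packet, an AP, or an FU, depending on its Type.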

1511	4.9.1 Reasons for the PACI rules (informative)

1513	   A PACI cannot be fragmented.  If a PACI could be fragmented, and a
1514	   fragment other than the first fragment would get lost, access to the
1515	   information in the PACI would not be possible.  Therefore, a PACI
1516	   must not be fragmented.  In other words, an FU must not carry
1517	   (fragments of) a PACI.

1519	   A PACI cannot be aggregated.  Aggregation of PACIs is inadvisable
1520	   from a compression viewpoint, as, in many cases, several NAL units
1521	   to be aggregated would share identical PACI fields and values,
1522	   which would be carried redundantly for no reason.  Most, if not
1523	   all, of the practical effects of PACI aggregation can be achieved by
1524	   aggregating NAL units and bundling them with a PACI (see below).
1525	   Therefore, a PACI must not be aggregated.  In other words, an AP
1526	   must not contain a PACI.

1528	   The payload of a PACI can be a fragment.  Both middleboxes and
1529	   sending systems with inflexible (often hardware-based) encoders
1530	   occasionally find themselves in situations where a PACI and its
1531	   headers, combined, are larger than the MTU size.  In such a
1532	   scenario, the middlebox or sender can fragment the NAL unit and
1533	   encapsulate the fragment in a PACI.  Doing so preserves the payload
1534	   header extension information for all fragments, allowing downstream
1535	   middleboxes and the receiver to take advantage of that information.
1536	   Therefore, a sender may place a fragment into a PACI, and a receiver
1537	   must be able to handle such a PACI.

1539	   The payload of a PACI can be an aggregation NAL unit.  HEVC
1540	   bitstreams can contain unevenly sized and/or small (when compared to
1541	   the MTU size) NAL units.  In order to efficiently packetize such
1542	   small NAL units, APs were introduced.  The benefits of APs are
1543	   independent of the need for a payload header extension.
1544	   Therefore, a sender may place an AP into a PACI, and a receiver must
1545	   be able to handle such a PACI.

1547	4.10 Payload Header Extensions

1549	   This section describes the single payload header extension defined
1550	   in this specification.  If, in the future, additional payload header
1551	   extensions become necessary, they could be specified in this section
1552	   of an updated version of this document, or in their own documents.

1554	   When bit 0 of the field F0..2 is set to 1 in a PACI, this indicates
1555	   the presence of the temporal scalability information fields
1556	   TL0PICIDX, IrapPicID, S, and E as follows:

1558	       0                   1                   2                   3
1559	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1560	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1561	      |                          RTP Header                           |
1562	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1563	      |F| PACI=50   |  LayerId  | TID |A|    Type   | PHSsize |F0..2|X|
1564	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1565	      |   TL0PICIDX   |   IrapPicID   |S|E|  reserved |               |
1566	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1567	      |                           ....                                |
1568	      |               PACI payload: NAL unit                          |
1569	      |                                                               |
1570	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1571	      |                               :...OPTIONAL RTP padding        |
1572	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1574	      Figure 12   The structure of a PACI with a PHES containing some
1575	                     temporal scalability information

1577	   TL0PICIDX (8 bits)
1578	      When present, the TL0PICIDX field MUST be set equal to
1579	      temporal_sub_layer_zero_idx as specified in Section D.3.32 of
1580	      [H.265] for the access unit containing the NAL unit in the PACI.

1582	   IrapPicID (8 bits)
1583	      When present, the IrapPicID field MUST be set equal to
1584	      irap_pic_id as specified in Section D.3.32 of [H.265] for the
1585	      access unit containing the NAL unit in the PACI.

1587	   S (1 bit)
1588	      The S bit MUST be set to 1 if any of the following conditions is
1589	      true and MUST be set to 0 otherwise:

1591	      . The NAL unit in the payload of the PACI is the first VCL NAL
1592	        unit, in decoding order, of a picture.
1593	      . The NAL unit in the payload of the PACI is an AP and the NAL
1594	        unit in the first contained aggregation unit is the first VCL
1595	        NAL unit, in decoding order, of a picture.
1596	      . The NAL unit in the payload of the PACI is an FU with its S bit
1597	        equal to 1 and the FU payload containing a fragment of the
1598	        first VCL NAL unit, in decoding order, of a picture.

1600	   E (1 bit)
1601	      The E bit MUST be set to 1 if any of the following conditions is
1602	      true and MUST be set to 0 otherwise:

1604	      . The NAL unit in the payload of the PACI is the last VCL NAL
1605	        unit, in decoding order, of a picture.
1606	      . The NAL unit in the payload of the PACI is an AP and the NAL
1607	        unit in the last contained aggregation unit is the last VCL NAL
1608	        unit, in decoding order, of a picture.
1609	      . The NAL unit in the payload of the PACI is an FU with its E bit
1610	        equal to 1 and the FU payload containing a fragment of the last
1611	        VCL NAL unit, in decoding order, of a picture.

1613	   The values of bits 1 and 2 of the field F0..2 MUST be set to 0, the
1614	   value of the X bit MUST be set to 0, and the value of PHSsize MUST
1615	   be set to 3.  Receivers SHALL allow other values of the fields
1616	   F0..2, X, and PHSsize, and SHALL ignore any additional fields, when
1617	   present, other than those specified above in the PHES.
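
   Informative example: the following Python sketch extracts the
   temporal scalability information from a PHES.  The function name
   parse_tsci is hypothetical.  The sketch assumes that "bit 0" of
   F0..2 is the first transmitted (most significant) of the three
   bits, which the memo does not state explicitly, and it ignores any
   PHES octets beyond the first three, as receivers are required to
   do.

```python
from typing import Dict, Optional


def parse_tsci(phes: bytes, f0_2: int) -> Optional[Dict[str, int]]:
    """Return the temporal scalability fields, or None if absent."""
    # Assumption: bit 0 of F0..2 is the most significant of the 3 bits.
    if not f0_2 & 0b100:
        return None
    if len(phes) < 3:
        raise ValueError("PHES too short for TL0PICIDX, IrapPicID, S, E")
    return {
        "TL0PICIDX": phes[0],
        "IrapPicID": phes[1],
        "S": phes[2] >> 7,
        "E": (phes[2] >> 6) & 0x01,
        # The low six bits of the third octet are reserved and ignored.
    }
```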

1619	5. Packetization Rules

1621	   The following packetization rules apply:

1623	   o  If sprop-depack-buf-nalus is greater than 0 for an RTP stream,
1624	      the transmission order of NAL units carried in the RTP stream MAY
1625	      be different than the NAL unit decoding order.  Otherwise (sprop-
1626	      depack-buf-nalus is equal to 0 for an RTP stream), the
1627	      transmission order of NAL units carried in the RTP stream MUST be
1628	      the same as the NAL unit decoding order.

1630	   o  A NAL unit of a small size SHOULD be encapsulated in an
1631	      aggregation packet together with one or more other NAL units in
1632	      order to avoid the unnecessary packetization overhead for small
1633	      NAL units.  For example, non-VCL NAL units such as access unit
1634	      delimiters, parameter sets, or SEI NAL units are typically small
1635	      and can often be aggregated with VCL NAL units without violating
1636	      MTU size constraints.

1638	   o  Each non-VCL NAL unit SHOULD be encapsulated in an aggregation
1639	      packet together with its associated VCL NAL unit, as typically a
1640	      non-VCL NAL unit would be meaningless without the associated VCL
1641	      NAL unit being available.

1643	   o  For carrying exactly one NAL unit in an RTP packet, a single NAL
1644	      unit packet MUST be used.
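
   Informative example: one possible way to apply these rules to the
   NAL units of a single access unit is the greedy grouping sketched
   below in Python.  The function name plan_packets and the parameter
   max_payload (the RTP payload budget in octets) are hypothetical.
   The sketch assumes that sprop-depack-buf-nalus is equal to 0, so
   no DONL or DOND fields are counted and NAL units stay in decoding
   order.  It does not try to keep a non-VCL NAL unit together with
   its associated VCL NAL unit when the latter needs fragmentation.

```python
from typing import List, Sequence, Tuple

Plan = Tuple[str, List[bytes]]


def plan_packets(access_unit: Sequence[bytes], max_payload: int) -> List[Plan]:
    """Group NAL units of one access unit into 'single', 'ap', or 'fu' plans."""
    plans: List[Plan] = []
    group: List[bytes] = []
    group_size = 2                      # PayloadHdr of a would-be AP

    def flush() -> None:
        if len(group) == 1:
            plans.append(("single", list(group)))   # exactly one NAL unit
        elif group:
            plans.append(("ap", list(group)))       # two or more NAL units
        group.clear()

    for nal in access_unit:
        if len(nal) > max_payload:
            flush()
            group_size = 2
            plans.append(("fu", [nal]))             # too large: fragment it
            continue
        # Each aggregation unit costs a 16-bit size field plus the NAL unit.
        if group and group_size + 2 + len(nal) > max_payload:
            flush()
            group_size = 2
        group.append(nal)
        group_size += 2 + len(nal)
    flush()
    return plans
```

   A NAL unit that ends up alone in a group is planned as a single
   NAL unit packet, in line with the last rule above.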

1646	6. De-packetization Process

1648	   The general concept behind de-packetization is to get the NAL units
1649	   out of the RTP packets in an RTP stream and all the dependent RTP
1650	   streams, if any, and pass them to the decoder in the NAL unit
1651	   decoding order.

1653	   The de-packetization process is implementation dependent.
1654	   Therefore, the following description should be seen as an example of
1655	   a suitable implementation.  Other schemes may be used as well, as
1656	   long as the output for the same input is the same as that of the
1657	   process described below.  The output is the same when the set of NAL units
1658	   and their order are both identical.  Optimizations relative to the
1659	   described algorithms are possible.

1661	   All normal RTP mechanisms related to buffer management apply.  In
1662	   particular, duplicated or outdated RTP packets (as indicated by the
1663	   RTP sequence number and the RTP timestamp) are removed.  To
1664	   determine the exact time for decoding, factors such as a possible
1665	   intentional delay to allow for proper inter-stream synchronization
1666	   must be factored in.

1668	   NAL units with NAL unit type values in the range of 0 to 47,
1669	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1670	   with NAL unit type values in the range of 48 to 63, inclusive, MUST
1671	   NOT be passed to the decoder.

1673	   The receiver includes a receiver buffer, which is used to compensate
1674	   for transmission delay jitter, to reorder NAL units from
1675	   transmission order to the NAL unit decoding order, and to recover
1676	   the NAL unit decoding order in MST, when applicable.  In this
1677	   section, the receiver operation is described under the assumption
1678	   that there is no transmission delay jitter.  To distinguish it
1679	   from a practical receiver buffer that is also used for compensation
1680	   of transmission delay jitter, the receiver buffer is hereafter
1681	   called the de-packetization buffer in this section.  Receivers
1682	   SHOULD also prepare for transmission delay jitter; i.e. either
1683	   reserve separate buffers for transmission delay jitter buffering and
1684	   de-packetization buffering or use a receiver buffer for both
1685	   transmission delay jitter and de-packetization.  Moreover, receivers
1686	   SHOULD take transmission delay jitter into account in the buffering
1687	   operation; e.g. by additional initial buffering before starting
1688	   decoding and playback.

1690	   There are two buffering states in the receiver: initial buffering
1691	   and buffering while playing.  Initial buffering starts when the
1692	   reception is initialized.  After initial buffering, decoding and
1693	   playback are started, and the buffering-while-playing mode is used.

1695	   Regardless of the buffering state, the receiver stores incoming NAL
1696	   units, in reception order, into the de-packetization buffer.  NAL
1697	   units carried in RTP packets are stored in the de-packetization
1698	   buffer individually, and the value of AbsDon is calculated and
1699	   stored for each NAL unit.  When MST is in use, NAL units of all RTP
1700	   streams are stored in the same de-packetization buffer.

1702	   Initial buffering lasts until condition A (the number of NAL units
1703	   in the de-packetization buffer is greater than the value of sprop-
1704	   depack-buf-nalus of the highest RTP stream) is true.

1706	   After initial buffering, whenever condition A is true, the following
1707	   operation is repeatedly applied until condition A becomes false:

1709	   o  The NAL unit in the de-packetization buffer with the smallest
1710	      value of AbsDon is removed from the de-packetization buffer and
1711	      passed to the decoder.

1713	   When no more NAL units are flowing into the de-packetization buffer,
1714	   all NAL units remaining in the de-packetization buffer are removed
1715	   from the buffer and passed to the decoder in the order of increasing
1716	   AbsDon values.
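
	      Informative note: The buffering operation described in this
	      section can be sketched in Python as follows.  This is a
	      non-normative illustration only.  The class and method names
	      are hypothetical, AbsDon is assumed to have been calculated
	      already for each NAL unit, and a binary heap is merely one
	      possible way to find the smallest AbsDon value.

```python
import heapq


class DepacketizationBuffer:
    """Non-normative sketch; all names here are hypothetical."""

    def __init__(self, sprop_depack_buf_nalus):
        self.limit = sprop_depack_buf_nalus
        self._heap = []        # entries: (AbsDon, arrival index, NAL unit)
        self._arrivals = 0     # tie-breaker, so NAL units are never compared
        self.initial_buffering = True

    def _condition_a(self):
        # Condition A: more NAL units buffered than sprop-depack-buf-nalus.
        return len(self._heap) > self.limit

    def push(self, abs_don, nal_unit):
        """Store one NAL unit; return the NAL units passed to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal_unit))
        self._arrivals += 1
        if self.initial_buffering:
            if not self._condition_a():
                return []
            self.initial_buffering = False  # switch to buffering while playing
        released = []
        while self._condition_a():
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self):
        """No more NAL units are arriving: drain in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

	      For example, with sprop-depack-buf-nalus equal to 2 and NAL
	      units arriving with AbsDon values 2, 0, 1, 4, 3, the sketch
	      passes the NAL units with AbsDon 0, 1, and 2 to the decoder
	      as the third, fourth, and fifth NAL units arrive, and
	      flush() then releases those with AbsDon 3 and 4.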

1718	7. Payload Format Parameters

1720	   This section specifies the parameters that MAY be used to select
1721	   optional features of the payload format and certain features or
1722	   properties of the bitstream.  The parameters are specified here as
1723	   part of the media type registration for the HEVC codec.  A mapping
1724	   of the parameters into the Session Description Protocol (SDP)
1725	   [RFC4566] is also provided for applications that use SDP.
1726	   Equivalent parameters could be defined elsewhere for use with
1727	   control protocols that do not use SDP.

1729	7.1 Media Type Registration

1731	   The media subtype for the HEVC codec is allocated from the IETF
1732	   tree.

1734	   The receiver MUST ignore any unspecified parameter.

1736	   Media Type name:     video

1738	   Media subtype name:  H265

1740	   Required parameters: none

1742	   OPTIONAL parameters:

1744	      In the following definitions of parameters, "the stream" or "the
1745	      NAL unit stream" refers to all NAL units conveyed in the current
1746	      RTP stream in SST, and all NAL units conveyed in the current RTP
1747	      stream and all NAL units conveyed in other RTP streams that the
1748	      current RTP stream depends on in MST.

1750	      profile-space, profile-id:

1752	         The profile-space parameter indicates the context for
1753	         interpretation of the profile-id parameter value.  The
1754	         profile, which specifies the subset of coding tools that may
1755	         have been used to generate the stream or that the receiver
1756	         supports, as specified in [HEVC], is defined by the
1757	         combination of profile-space and profile-id.  Note that
1758	         profile-space is required to be equal to 0 in [HEVC], but
1759	         other values for it may be specified in the future by ITU-T or
1760	         ISO/IEC.

1762	         If the profile-space and profile-id parameters are used to
1763	         indicate properties of a NAL unit stream, they indicate that,
1764	         to decode the stream, the minimum subset of coding tools a
1765	         decoder has to support is the profile specified by both
1766	         parameters.

1768	         If the profile-space and profile-id parameters are used for
1769	         capability exchange or session setup, they indicate the subset
1770	         of coding tools, which is equal to the profile, that the codec
1771	         supports for both receiving and sending.

1773	         If no profile-space is present, a value of 0 MUST be inferred
1774	         and if no profile-id is present the Main profile (i.e. a value
1775	         of 1) MUST be inferred.

1777	         When used to indicate properties of a NAL unit stream, the
1778	         profile-space and profile-id parameters are derived from the
1779	         sequence parameter set or video parameter set NAL units, as
1780	         specified in [HEVC], as follows.

1782	            If the RTP stream is not a dependent RTP stream, the
1783	            following applies:

1785	            o profile-space = general_profile_space
1786	            o profile-id = general_profile_idc

1788	            Otherwise (the RTP stream is a dependent RTP stream), the
1789	            following applies, with j being the value of the sub-layer-
1790	            id parameter:

1792	            o profile-space = sub_layer_profile_space[j]
1793	            o profile-id = sub_layer_profile_idc[j]

1795	      tier-flag, level-id:

1797	         The tier-flag parameter indicates the context for
1798	         interpretation of the level-id value.  The default level,
1799	         which sets limits on values of syntax elements or on arithmetic
1800	         combinations of values of syntax elements, as specified in
1801	         [HEVC], is defined by the combination of tier-flag and level-
1802	         id.

1804	         If the tier-flag and level-id parameters are used to indicate
1805	         properties of a NAL unit stream, they indicate that, to decode
1806	         the stream, the lowest level the decoder has to support is the
1807	         default level.

1809	         If the tier-flag and level-id parameters are used for
1810	         capability exchange or session setup, the following applies.
1811	         If max-recv-level-id is not present, the default level defined
1812	         by tier-flag and level-id indicates the highest level the
1813	         codec wishes to support.  Otherwise, tier-flag and max-recv-
1814	         level-id indicate the highest level the codec supports for
1815	         receiving.  For either receiving or sending, all levels that
1816	         are lower than the highest level supported MUST also be
1817	         supported.

1819	         If no tier-flag is present, a value of 0 MUST be inferred and
1820	         if no level-id is present, a value of 93 (i.e. level 3.1) MUST
1821	         be inferred.

1823	         When used to indicate properties of a NAL unit stream, the
1824	         tier-flag and level-id parameters are derived from the
1825	         sequence parameter set or video parameter set NAL units, as
1826	         specified in [HEVC], as follows.

1828	            If the RTP stream is not a dependent RTP stream, the
1829	            following applies:

1831	            o tier-flag = general_tier_flag
1832	            o level-id = general_level_idc

1834	            Otherwise (the RTP stream is a dependent RTP stream), the
1835	            following applies, with j being the value of the sub-layer-
1836	            id parameter:

1838	            o tier-flag = sub_layer_tier_flag[j]
1839	            o level-id = sub_layer_level_idc[j]

1841	      interop-constraints:

1843	         A base16 [RFC4648] (hexadecimal) representation of the six
1844	         bytes derived from the sequence parameter set or video
1845	         parameter set NAL units as specified in [HEVC] consisting of
1846	         progressive_source_flag, interlaced_source_flag,
1847	         non_packed_constraint_flag, frame_only_constraint_flag, and
1848	         reserved_zero_44bits.  Note that reserved_zero_44bits is
1849	         required to be equal to 0 in [HEVC], but other values for it
1850	         may be specified in the future by ITU-T or ISO/IEC.

1852	         If no interop-constraints are present, the following MUST be
1853	         inferred:

1855	            o progressive_source_flag = 1
1856	            o interlaced_source_flag = 0
1857	            o non_packed_constraint_flag = 1
1858	            o frame_only_constraint_flag = 1
1859	            o reserved_zero_44bits = 0
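
	            Informative note: The following non-normative Python
	            sketch shows how the six bytes could be packed and
	            written in base16.  The function name is hypothetical,
	            and the bit order (the four flags in the order listed
	            above as the most significant bits, followed by the 44
	            reserved bits) is an assumption of this sketch.

```python
def interop_constraints_hex(progressive_source_flag=1,
                            interlaced_source_flag=0,
                            non_packed_constraint_flag=1,
                            frame_only_constraint_flag=1,
                            reserved_zero_44bits=0):
    """Pack 4 flags + 44 reserved bits into 6 bytes, returned as base16.

    Assumption: flags are the most significant bits, in the listed order.
    The default arguments are the values inferred when the parameter
    is absent.
    """
    if not 0 <= reserved_zero_44bits < (1 << 44):
        raise ValueError("reserved_zero_44bits does not fit in 44 bits")
    flags = ((progressive_source_flag << 3) |
             (interlaced_source_flag << 2) |
             (non_packed_constraint_flag << 1) |
             frame_only_constraint_flag)
    value = (flags << 44) | reserved_zero_44bits
    return value.to_bytes(6, "big").hex().upper()
```

	            Under that bit-order assumption, the inferred default
	            values correspond to the string B00000000000.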

1861	         When used to indicate properties of a NAL unit stream, the
1862	         following applies.

1864	            If the RTP stream is not a dependent RTP stream, the
1865	            following applies:

1867	            o progressive_source_flag = general_progressive_source_flag
1868	            o interlaced_source_flag = general_interlaced_source_flag
1869	            o non_packed_constraint_flag =
1870	                              general_non_packed_constraint_flag
1871	            o frame_only_constraint_flag =
1872	                              general_frame_only_constraint_flag
1873	            o reserved_zero_44bits = general_reserved_zero_44bits

1875	            Otherwise (the RTP stream is a dependent RTP stream), the
1876	            following applies, with j being the value of the sub-layer-
1877	            id parameter:

1879	            o progressive_source_flag =
1880	                              sub_layer_progressive_source_flag[j]
1881	            o interlaced_source_flag =
1882	                              sub_layer_interlaced_source_flag[j]
1883	            o non_packed_constraint_flag =
1884	                              sub_layer_non_packed_constraint_flag[j]
1885	            o frame_only_constraint_flag =
1886	                              sub_layer_frame_only_constraint_flag[j]
1887	            o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]

1889	      profile-compatibility-indicator:

1891	         A base16 [RFC4648] representation of the four bytes
1892	         representing the 32 profile compatibility flags in the
1893	         sequence parameter set or video parameter set NAL units.  A
1894	         decoder conforming to a certain profile may be able to decode
1895	         bitstreams conforming to other profiles.  The profile-
1896	         compatibility-indicator provides exact information of the
1897	         ability of a decoder conforming to a certain profile to decode
1898	         bitstreams conforming to another profile.  More concretely, if
1899	         the profile compatibility flag corresponding to the profile a
1900	         decoder conforms to is set, then the decoder is able to decode
1901	         any bitstream with the flag set, irrespective of the profile
1902	         the bitstream conforms to (provided that the decoder supports
1903	         the highest level of the bitstream).

1905	         When used to indicate properties of a NAL unit stream, the
1906	         following applies.

1908	            If the RTP stream is not a dependent RTP stream, the
1909	            following applies with j = 0..31:

1911	            o The 32 flags = general_profile_compatibility_flag[j]

1913	            Otherwise (the RTP stream is a dependent RTP stream), the
1914	            following applies with i being the value of the sub-layer-
1915	            id parameter and j = 0..31:

1917	            o The 32 flags = sub_layer_profile_compatibility_flag[i][j]

1919	      sub-layer-id:

1921	         This parameter MAY be used to indicate the highest allowed
1922	         value of TID in the stream.  When not present, the value of
1923	         sub-layer-id is inferred to be equal to 6.

1925	      recv-sub-layer-id:

1927	         This parameter MAY be used to signal a receiver's choice of
1928	         the offered or declared sub-layers in the sprop-vps.  The
1929	         value of recv-sub-layer-id indicates the TID of the highest
1930	         sub-layer of the stream that a receiver supports.  When not
1931	         present, the value of recv-sub-layer-id is inferred to be
1932	         equal to sub-layer-id.

1934	      max-recv-level-id:

1936	         This parameter MAY be used, together with tier-flag, to
1937	         indicate the highest level a receiver supports.  The highest
1938	         level the receiver supports is equal to the value of max-recv-
1939	         level-id divided by 30 for the Main or High tier (as
1940	         determined by tier-flag equal to 0 or 1, respectively).

1942	         When max-recv-level-id is not present, the value is inferred
1943	         to be equal to level-id.

1945	         max-recv-level-id MUST NOT be present when the highest level
1946	         the receiver supports is not higher than the default level.
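
	            Informative note: The relationship between level-id,
	            max-recv-level-id, and the level number can be sketched
	            in Python as follows.  This is non-normative; the
	            function names and the dictionary of parameter strings
	            are hypothetical, and the sketch assumes that level
	            numbers have a single decimal digit (as in 93 for
	            level 3.1).

```python
DEFAULT_LEVEL_ID = 93  # inferred when level-id is absent (level 3.1)


def level_number(level_id):
    """Return the level as text: the level-id value divided by 30."""
    major, remainder = divmod(level_id, 30)
    if remainder % 3 != 0:
        # Assumption of this sketch: one decimal digit per level number.
        raise ValueError("level-id is not a multiple of 3")
    return f"{major}.{remainder // 3}"


def highest_receive_level_id(params):
    """params maps parameter names to string values (hypothetical input)."""
    level_id = int(params.get("level-id", DEFAULT_LEVEL_ID))
    max_recv = params.get("max-recv-level-id")
    if max_recv is None:
        return level_id  # inferred to be equal to level-id
    max_recv = int(max_recv)
    if max_recv <= level_id:
        raise ValueError("max-recv-level-id present but not higher "
                         "than the default level")
    return max_recv
```

	            For example, an absent level-id yields level 3.1, and
	            max-recv-level-id equal to 120 together with level-id
	            equal to 93 yields level 4.0 for receiving.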

1948	      sprop-vps:

1950	         This parameter MAY be used to convey any video parameter set
1951	         NAL unit of the stream.  When present, the parameter MAY be
1952	         used to indicate codec capability and sub-stream
1953	         characteristics (i.e. properties of sub-layer representations
1954	         as defined in [HEVC]) as well as for out-of-band transmission
1955	         of video parameter sets.  The value of the parameter is a
1956	         comma-separated (',') list of base64 [RFC4648] representations
1957	         of the video parameter set NAL units as specified in Section
1958	         7.3.2.1 of [HEVC].

1960	      sprop-sps:

1962	         This parameter MAY be used to convey sequence parameter set
1963	         NAL units of the stream for out-of-band transmission of
1964	         sequence parameter sets.  The value of the parameter is a
1965	         comma-separated (',') list of base64 [RFC4648] representations
1966	         of the sequence parameter set NAL units as specified in
1967	         Section 7.3.2.2 of [HEVC].

1969	      sprop-pps:

1971	         This parameter MAY be used to convey picture parameter set NAL
1972	         units of the stream for out-of-band transmission of picture
1973	         parameter sets.  The value of the parameter is a comma-
1974	         separated (',') list of base64 [RFC4648] representations of
1975	         the picture parameter set NAL units as specified in Section
1976	         7.3.2.3 of [HEVC].
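
	            Informative note: A receiver could recover the
	            parameter set NAL units from a sprop-vps, sprop-sps, or
	            sprop-pps value roughly as in the following
	            non-normative Python sketch.  The function name is
	            hypothetical, and the sketch assumes the standard
	            base64 alphabet of [RFC4648] with padding.

```python
import base64


def decode_sprop_parameter_sets(value):
    """Split a comma-separated sprop-* value and base64-decode each item.

    Returns a list of bytes objects, one per parameter set NAL unit.
    """
    return [base64.b64decode(item, validate=True)
            for item in value.split(",") if item]
```

	            Each returned element is one complete NAL unit; this
	            sketch does not parse or validate the NAL unit itself.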

1978	      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

1980	         These parameters MAY be used to signal the capabilities of a
1981	         receiver implementation.  These parameters MUST NOT be used
1982	         for any other purpose.  The highest level (specified by tier-
1983	         flag and max-recv-level-id) MUST be one that the receiver is
1984	         fully capable of supporting.  max-lsr, max-lps, max-cpb, max-
1985	         dpb, max-br, max-tr, and max-tc MAY be used to indicate
1986	         capabilities of the receiver that extend the required
1987	         capabilities of the highest level, as specified below.

1989	         When more than one parameter from the set (max-lsr, max-lps,
1990	         max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
1991	         receiver MUST support all signaled capabilities
1992	         simultaneously.  For example, if both max-lsr and max-br are
1993	         present, the highest level with the extension of both the
1994	         picture rate and bitrate is supported.  That is, the receiver
1995	         is able to decode NAL unit streams in which the luma sample
1996	         rate is up to max-lsr (inclusive), the bitrate is up to max-br
1997	         (inclusive), the coded picture buffer size is derived as
1998	         specified in the semantics of the max-br parameter below, and
1999	         the other properties comply with the highest level specified
2000	         by tier-flag and max-recv-level-id.

2002	            Informative note: When the OPTIONAL media type parameters
2003	            are used to signal the properties of a NAL unit stream, and
2004	            max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and
2005	            max-tc are not present, the values of profile-space,
2006	            profile-id, tier-flag, and level-id must always be such
2007	            that the NAL unit stream complies fully with the specified
2008	            profile and level.

2010	      max-lsr:
2011	         The value of max-lsr is an integer indicating the maximum
2012	         processing rate in units of luma samples per second.  The max-
2013	         lsr parameter signals that the receiver is capable of decoding
2014	         video at a higher rate than is required by the highest level.

2016	         When max-lsr is signaled, the receiver MUST be able to decode
2017	         NAL unit streams that conform to the highest level, with the
2018	         exception that the MaxLumaSR value in Table A-2 of [HEVC] for
2019	         the highest level is replaced with the value of max-lsr.  The
2020	         value of max-lsr MUST be greater than or equal to the value of
2021	         MaxLumaSR given in Table A-2 of [HEVC] for the highest level.
2022	         Senders MAY use this knowledge to send pictures of a given
2023	         size at a higher picture rate than is indicated in the highest
2024	         level.

2026	         When not present, the value of max-lsr is inferred to be equal
2027	         to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
2028	         highest level.

2030	      max-lps:
2031	         The value of max-lps is an integer indicating the maximum
2032	         picture size in units of luma samples.  The max-lps parameter
2033	         signals that the receiver is capable of decoding larger
2034	         picture sizes than are required by the highest level.  When
2035	         max-lps is signaled, the receiver MUST be able to decode NAL
2036	         unit streams that conform to the highest level, with the
2037	         exception that the MaxLumaPS value in Table A-1 of [HEVC] for
2038	         the highest level is replaced with the value of max-lps.  The
2039	         value of max-lps MUST be greater than or equal to the value of
2040	         MaxLumaPS given in Table A-1 of [HEVC] for the highest level.
2041	         Senders MAY use this knowledge to send larger pictures at a
2042	         proportionally lower picture rate than is indicated in the
2043	         highest level.

2045	         When not present, the value of max-lps is inferred to be equal
2046	         to the value of MaxLumaPS given in Table A-1 of [HEVC] for the
2047	         highest level.

2049	      max-cpb:
2050	         The value of max-cpb is an integer indicating the maximum
2051	         coded picture buffer size in units of CpbBrVclFactor bits for
2052	         the VCL HRD parameters and in units of CpbBrNalFactor bits for
2053	         the NAL HRD parameters, where CpbBrVclFactor and
2054	         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The max-
2055	         cpb parameter signals that the receiver has more memory than
2056	         the minimum amount of coded picture buffer memory required by
2057	         the highest level.  When max-cpb is signaled, the receiver
2058	         MUST be able to decode NAL unit streams that conform to the
2059	         highest level, with the exception that the MaxCPB value in
2060	         Table A-1 of [HEVC] for the highest level is replaced with the
2061	         value of max-cpb.  The value of max-cpb MUST be greater than
2062	         or equal to the value of MaxCPB given in Table A-1 of [HEVC]
2063	         for the highest level.  Senders MAY use this knowledge to
2064	         construct coded video streams with greater variation of
2065	         bitrate than can be achieved with the MaxCPB value in Table A-
2066	         1 of [HEVC].

2068	         When not present, the value of max-cpb is inferred to be equal
2069	         to the value of MaxCPB given in Table A-1 of [HEVC] for the
2070	         highest level.

2072	            Informative note: The coded picture buffer is used in the
2073	            hypothetical reference decoder (Annex C of HEVC).  The use
2074	            of the hypothetical reference decoder is recommended in
2075	            HEVC encoders to verify that the produced bitstream
2076	            conforms to the standard and to control the output bitrate.
2077	            Thus, the coded picture buffer is conceptually independent
2078	            of any other potential buffers in the receiver, including
2079	            de-packetization and de-jitter buffers.  The coded picture
2080	            buffer need not be implemented in decoders as specified in
2081	            Annex C of HEVC, but rather standard-compliant decoders can
2082	            have any buffering arrangements provided that they can
2083	            decode standard-compliant bitstreams.  Thus, in practice,
2084	            the input buffer for a video decoder can be integrated with
2085	            de-packetization and de-jitter buffers of the receiver.

2087	      max-dpb:
2088	         The value of max-dpb is an integer indicating the maximum
2089	         decoded picture buffer size in units of decoded pictures at the
2090	         MaxLumaPS for the highest level, i.e. number of decoded
2091	         pictures at the maximum picture size defined by the highest
2092	         level.  The value of max-dpb MUST be smaller than or equal to
2093	         16.  The max-dpb parameter signals that the receiver has more
2094	         memory than the minimum amount of decoded picture buffer
2095	         memory required by default, which is MaxDpbPicBuf as defined
2096	         in [HEVC] (equal to 6).  When max-dpb is signaled, the
2097	         receiver MUST be able to decode NAL unit streams that conform
2098	         to the highest level, with the exception that the
2099	         MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with
2100	         the value of max-dpb.  Consequently, a receiver that signals
2101	         max-dpb MUST be capable of storing the following number of
2102	         decoded pictures (MaxDpbSize) in its decoded picture buffer:

2104	           if ( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
2105	              MaxDpbSize = Min( 4 * max-dpb, 16 )
2106	           else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
2107	              MaxDpbSize = Min( 2 * max-dpb, 16 )
2108	           else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
2109	              MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
2110	           else
2111	              MaxDpbSize = max-dpb
2112	         where MaxLumaPS is given in Table A-1 of [HEVC] for the highest
2113	         level and PicSizeInSamplesY is the current size of each
2114	         decoded picture in units of luma samples as defined in [HEVC].
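
	            Informative note: The derivation above can be written
	            as the following non-normative Python sketch.  The
	            function and argument names are hypothetical, and the
	            division is assumed to be an integer division with
	            truncation.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures the receiver must be able to store.

    max_luma_ps is MaxLumaPS of the highest level (taken from [HEVC]).
    max_dpb defaults to 6, the value inferred when max-dpb is absent.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: "/" in the pseudocode truncates toward zero.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb
```

	            For instance, with max-dpb equal to 6 and an
	            illustrative MaxLumaPS of 983040, a picture of 921600
	            luma samples gives MaxDpbSize equal to 6, whereas a
	            picture of 230400 luma samples gives 16.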

2116	         The value of max-dpb MUST be greater than or equal to the
2117	         value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].  Senders
2118	         MAY use this knowledge to construct coded video streams with
2119	         improved compression.

2121	         When not present, the value of max-dpb is inferred to be equal
2122	         to the value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].

2124	            Informative note: This parameter was added primarily to
2125	            complement a similar codepoint in the ITU-T Recommendation
2126	            H.245, so as to facilitate signaling gateway designs.  The
2127	            decoded picture buffer stores reconstructed samples.  There
2128	            is no relationship between the size of the decoded picture
2129	            buffer and the buffers used in RTP, especially de-
2130	            packetization and de-jitter buffers.

2132	      max-br:
2133	         The value of max-br is an integer indicating the maximum video
2134	         bitrate in units of CpbBrVclFactor bits per second for the VCL
2135	         HRD parameters and in units of CpbBrNalFactor bits per second
2136	         for the NAL HRD parameters, where CpbBrVclFactor and
2137	         CpbBrNalFactor are defined in Section A.4 of [HEVC].

2139	         The max-br parameter signals that the video decoder of the
2140	         receiver is capable of decoding video at a higher bitrate than
2141	         is required by the highest level.

2143	         When max-br is signaled, the video codec of the receiver MUST
2144	         be able to decode NAL unit streams that conform to the highest
2145	         level, with the following exceptions in the limits specified
2146	         by the highest level:

2148	          o The value of max-br replaces the MaxBR value in Table A-2
2149	            of [HEVC] for the highest level.
2150	          o When the max-cpb parameter is not present, the result of
2151	            the following formula replaces the value of MaxCPB in Table
2152	            A-1 of [HEVC]:

2154	               (MaxCPB of the highest level) * max-br / (MaxBR of the
2155	               highest level)

2157	         For example, if a receiver signals capability for Main profile
2158	         Level 2 with max-br equal to 2000, this indicates a maximum
2159	         video bitrate of 2000 kbits/sec for VCL HRD parameters, a
2160	         maximum video bitrate of 2200 kbits/sec for NAL HRD
2161	         parameters, and a CPB size of 2000000 bits (2000000 / 1500000
2162	         * 1500000).
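
	            Informative note: The arithmetic of this example can
	            be reproduced with the following non-normative Python
	            sketch.  The function name is hypothetical.  The
	            Level 2 values of 1500 for both MaxBR and MaxCPB and
	            the factors 1000 (VCL) and 1100 (NAL) are the ones
	            implied by the figures in the example; rounding down
	            in the division is an assumption of the sketch.

```python
def cpb_size_units(level_max_cpb, level_max_br, max_br, max_cpb=None):
    """CPB size in CpbBrVclFactor/CpbBrNalFactor units for a receiver.

    level_max_cpb and level_max_br are MaxCPB and MaxBR of the highest
    level; max_br and max_cpb are the signaled parameter values.
    """
    if max_br < level_max_br:
        raise ValueError("max-br is below MaxBR of the highest level")
    if max_cpb is not None:
        return max_cpb  # a signaled max-cpb replaces MaxCPB directly
    # Assumption: integer division, rounding down.
    return level_max_cpb * max_br // level_max_br


CPB_BR_VCL_FACTOR = 1000  # as implied by the example above
CPB_BR_NAL_FACTOR = 1100  # as implied by the example above

units = cpb_size_units(level_max_cpb=1500, level_max_br=1500, max_br=2000)
vcl_cpb_bits = units * CPB_BR_VCL_FACTOR   # CPB size for VCL HRD
vcl_bitrate = 2000 * CPB_BR_VCL_FACTOR     # bits per second, VCL HRD
nal_bitrate = 2000 * CPB_BR_NAL_FACTOR     # bits per second, NAL HRD
```

	            With these inputs the sketch gives a VCL CPB size of
	            2000000 bits and bitrates of 2000000 and 2200000 bits
	            per second, matching the example.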

2164	         The value of max-br MUST be greater than or equal to the value
2165	         of MaxBR given in Table A-2 of [HEVC] for the highest level.

2167	         Senders MAY use this knowledge to send higher bitrate video as
2168	         allowed in the level definition of Annex A of HEVC to achieve
2169	         improved video quality.

2171	         When not present, the value of max-br is inferred to be equal
2172	         to the value of MaxBR given in Table A-2 of [HEVC] for the
2173	         highest level.

2175	            Informative note: This parameter was added primarily to
2176	            complement a similar codepoint in the ITU-T Recommendation
2177	            H.245, so as to facilitate signaling gateway designs.  The
2178	            assumption that the network is capable of handling such
2179	            bitrates at any given time cannot be made from the value of
2180	            this parameter.  In particular, no conclusion can be drawn
2181	            that the signaled bitrate is possible under congestion
2182	            control constraints.

2184	      max-tr:
2185	         The value of max-tr is an integer indicating the maximum
2186	         number of tile rows.  The max-tr parameter signals that the
2187	         receiver is capable of decoding video with a larger number of
2188	         tile rows than the value allowed by the highest level.

2190	         When max-tr is signaled, the receiver MUST be able to decode
2191	         NAL unit streams that conform to the highest level, with the
2192	         exception that the MaxTileRows value in Table A-1 of [HEVC]
2193	         for the highest level is replaced with the value of max-tr.

2195	         The value of max-tr MUST be greater than or equal to the value
2196	         of MaxTileRows given in Table A-1 of [HEVC] for the highest
2197	         level.  Senders MAY use this knowledge to send pictures
2198	         utilizing a larger number of tile rows than the value allowed
2199	         by the highest level.

2201	         When not present, the value of max-tr is inferred to be equal
2202	         to the value of MaxTileRows given in Table A-1 of [HEVC] for
2203	         the highest level.

2205	      max-tc:
2206	         The value of max-tc is an integer indicating the maximum
2207	         number of tile columns.  The max-tc parameter signals that the
2208	         receiver is capable of decoding video with a larger number of
2209	         tile columns than the value allowed by the highest level.

2211	         When max-tc is signaled, the receiver MUST be able to decode
2212	         NAL unit streams that conform to the highest level, with the
2213	         exception that the MaxTileCols value in Table A-1 of [HEVC]
2214	         for the highest level is replaced with the value of max-tc.

2216	         The value of max-tc MUST be greater than or equal to the value
2217	         of MaxTileCols given in Table A-1 of [HEVC] for the highest
2218	         level.  Senders MAY use this knowledge to send pictures
2219	         utilizing a larger number of tile columns than the value
2220	         allowed by the highest level.

2222	         When not present, the value of max-tc is inferred to be equal
2223	         to the value of MaxTileCols given in Table A-1 of [HEVC] for
2224	         the highest level.

2226	      max-fps:

2228	         The value of max-fps is an integer indicating the maximum
2229	         picture rate in units of pictures per 100 seconds that can be
2230	         efficiently received.  The max-fps parameter MAY be
2231	         used to signal that the receiver has a constraint in that it
2232	         is not capable of decoding video efficiently at the full
2233	         picture rate that is implied by the highest level and, when
2234	         present, one or more of the parameters max-lsr, max-lps, and
2235	         max-br.

2237	         The value of max-fps is not necessarily the picture rate at
2238	         which the maximum picture size can be sent; rather, it is a
2239	         constraint on the maximum picture rate for all resolutions.

2241	            Informative note: The max-fps parameter is semantically
2242	            different from max-lsr, max-lps, max-cpb, max-dpb, max-br,
2243	            max-tr, and max-tc in that max-fps is used to signal a
2244	            constraint, lowering the maximum picture rate from what is
2245	            implied by other parameters.

2247	         The encoder MUST use a picture rate equal to or less than this
2248	         value.  In cases where the max-fps parameter is absent, the
2249	         encoder is free to choose any picture rate according to the
2250	         highest level and any signaled optional parameters.

2252	      sprop-depack-buf-nalus:

2254	         This parameter specifies the maximum number of NAL units that
2255	         precede a NAL unit in the de-packetization buffer in reception
2256	         order and follow the NAL unit in decoding order.

2258	         The value of sprop-depack-buf-nalus MUST be an integer in the
2259	         range of 0 to 32767, inclusive.

2261	         When not present, the value of sprop-depack-buf-nalus is
2262	         inferred to be equal to 0.

2264	         When the RTP stream depends on one or more other RTP streams
2265	         (in this case MST is in use), this parameter MUST be present
2266	         and the value MUST be greater than 0.

2268	            Informative note: When the RTP stream does not depend on
2269	            other RTP streams, either MST or SST may be in use.

2271	      sprop-depack-buf-bytes:

2273	         This parameter signals the required size of the de-
2274	         packetization buffer in units of bytes.  The value of the
2275	         parameter MUST be greater than or equal to the maximum buffer
2276	         occupancy (in units of bytes) of the de-packetization buffer
2277	         as specified in Section 6.

2279	         The value of sprop-depack-buf-bytes MUST be an integer in the
2280	         range of 0 to 4294967295, inclusive.

2282	         When the RTP stream depends on one or more other RTP streams
2283	         (in this case MST is in use) or sprop-depack-buf-nalus is
2284	         present and is greater than 0, this parameter MUST be present
2285	         and the value MUST be greater than 0.

2287	            Informative note: The value of sprop-depack-buf-bytes
2288	            indicates the required size of the de-packetization buffer
2289	            only.  When network jitter can occur, an appropriately
2290	            sized jitter buffer has to be available as well.

2292	      depack-buf-cap:

2294	         This parameter signals the capabilities of a receiver
2295	         implementation and indicates the amount of de-packetization
2296	         buffer space in units of bytes that the receiver has available
2297	         for reconstructing the NAL unit decoding order.  A receiver is
2298	         able to handle any stream for which the value of the sprop-
2299	         depack-buf-bytes parameter is smaller than or equal to this
2300	         parameter.

2302	         When not present, the value of depack-buf-cap is inferred to
2303	         be equal to 4294967295.  The value of depack-buf-cap MUST be
2304	         an integer in the range of 1 to 4294967295, inclusive.

2306	            Informative note: depack-buf-cap indicates the maximum
2307	            possible size of the de-packetization buffer of the
2308	            receiver only.  When network jitter can occur, an
2309	            appropriately sized jitter buffer has to be available as
2310	            well.
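
         The comparison between sprop-depack-buf-bytes and depack-buf-
         cap described above can be sketched in Python as follows.  The
         function name receiver_can_buffer is hypothetical and not
         defined by this memo; only the value ranges and the inferred
         default are taken from the text.

```python
DEPACK_BUF_MAX = 4294967295  # upper bound and inferred default of depack-buf-cap


def receiver_can_buffer(sprop_depack_buf_bytes, depack_buf_cap=None):
    """Return True if the receiver can handle the signaled buffer size.

    depack_buf_cap=None models an absent depack-buf-cap parameter, whose
    value is then inferred to be 4294967295.
    """
    if depack_buf_cap is None:
        depack_buf_cap = DEPACK_BUF_MAX
    if not 1 <= depack_buf_cap <= DEPACK_BUF_MAX:
        raise ValueError("depack-buf-cap must be in the range 1..4294967295")
    if not 0 <= sprop_depack_buf_bytes <= DEPACK_BUF_MAX:
        raise ValueError("sprop-depack-buf-bytes must be in the range 0..4294967295")
    # The stream is acceptable when its required size fits the capability.
    return sprop_depack_buf_bytes <= depack_buf_cap
```

         This check covers the de-packetization buffer only; any jitter
         buffer has to be sized separately.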

2312	      sprop-segmentation-id:

2314	         This parameter MAY be used to signal the segmentation tools
2315	         that are present in the stream and can be used for
2316	         parallelization.  The value of sprop-segmentation-id MUST be
2317	         an integer in the range of 0 to 3, inclusive.  When not
2318	         present, the value of sprop-segmentation-id is inferred to be
2319	         equal to 0.

2321	         When sprop-segmentation-id is equal to 0, no information about
2322	         the segmentation tools is provided.  When sprop-segmentation-
2323	         id is equal to 1, it indicates that slices are present in the
2324	         stream.  When sprop-segmentation-id is equal to 2, it
2325	         indicates that tiles are present in the stream.  When sprop-
2326	         segmentation-id is equal to 3, it indicates that WPP is used
2327	         in the stream.

2329	      sprop-spatial-segmentation-idc:

2331	         A base16 [RFC4648] representation of the syntax element
2332	         min_spatial_segmentation_idc as specified in [HEVC].  This
2333	         parameter MAY be used to describe parallelization capabilities
2334	         of the stream.

2336	      dec-parallel-cap:

2338	         This parameter MAY be used to indicate the decoder's
2339	         additional decoding capabilities given the presence of tools
2340	         enabling parallel decoding, such as slices, tiles, and WPP, in
2341	         the video stream.  The decoding capability of the decoder may
2342	         vary with the setting of the parallel decoding tools present
2343	         in the stream, e.g. the size of the tiles that are present in
2344	         a stream.  Therefore, multiple capability points may be
2345	         provided, each indicating the minimum required decoding
2346	         capability that is associated with a parallelism requirement,
2347	         which is a requirement on the video stream that enables
2348	         parallel decoding.

2350	         Each capability point is defined as a combination of 1) a
2351	         parallelism requirement, 2) a profile (determined by profile-
2352	         space and profile-id), 3) a highest level, and 4) a maximum
2353	         processing rate, a maximum picture size, and a maximum video
2354	         bitrate that may be equal to or greater than that determined
2355	         by the highest level.  The parameter's syntax in ABNF
2356	         [RFC5234] is as follows:

2358	            dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
2359	                               cap-point) "}"

2361	            cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
2362	                         cap-parameter)

2364	            spatial-seg-idc = 1*4DIGIT ; 1-4095

2366	            cap-parameter = tier-flag / level-id / max-lsr
2367	                            / max-lps / max-br

2369	         The set of capability points expressed by the dec-parallel-cap
2370	         parameter is enclosed in a pair of curly braces ("{}").  Each
2371	         set of two consecutive capability points is separated by a
2372	         comma (',').  Within each capability point, each set of two
2373	         consecutive parameters, and when present, their values, is
2374	         separated by a semicolon (';').

2376	         The profile of all capability points is determined by profile-
2377	         space and profile-id that are outside the dec-parallel-cap
2378	         parameter.

2380	         Each capability point starts with an indication of the
2381	         parallelism requirement, which consists of a parallel tool
2382	         type, which may be equal to 'w' or 't', and a decimal value of
2383	         the spatial-seg-idc parameter.  When the type is 'w', the
2384	         capability point is valid only for H.265 bitstreams with WPP
2385	         in use, i.e. entropy_coding_sync_enabled_flag equal to 1.
2386	         When the type is 't', the capability point is valid only for
2387	         H.265 bitstreams with WPP not in use (i.e.
2388	         entropy_coding_sync_enabled_flag equal to 0).  In either case,
2389	         the capability point is valid only for H.265 bitstreams with
2390	         min_spatial_segmentation_idc equal to or greater than spatial-
2391	         seg-idc.

2393	         The value of spatial-seg-idc MUST be greater than 0.
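
         The validity conditions above can be sketched as a small
         Python check.  The function and argument names are
         illustrative assumptions; the two bitstream values are assumed
         to have been extracted from the parameter sets by other means.

```python
def cap_point_applies(tool, spatial_seg_idc,
                      entropy_coding_sync_enabled_flag,
                      min_spatial_segmentation_idc):
    """Return True if a bitstream meets a capability point's
    parallelism requirement."""
    if tool not in ("w", "t"):
        raise ValueError("parallel tool type must be 'w' or 't'")
    if spatial_seg_idc <= 0:
        raise ValueError("spatial-seg-idc MUST be greater than 0")
    # Type 'w' requires WPP in use; type 't' requires WPP not in use.
    required_flag = 1 if tool == "w" else 0
    if entropy_coding_sync_enabled_flag != required_flag:
        return False
    # In both cases the bitstream value must reach the signaled threshold.
    return min_spatial_segmentation_idc >= spatial_seg_idc
```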

2395	         After the parallelism requirement indication, each capability
2396	         point continues with one or more pairs of parameter and value
2397	         in any order for any of the following parameters:

2399	            o tier-flag
2400	            o level-id
2401	            o max-lsr
2402	            o max-lps
2403	            o max-br

2405	         At most one occurrence of each of the above five parameters is
2406	         allowed within each capability point.
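
         A minimal parsing sketch for the syntax and constraints
         described above, written in Python.  It assumes that each cap-
         parameter is written as name=value with a decimal integer
         value, and it tolerates whitespace around separators; both are
         assumptions of this sketch, as is the function name
         parse_dec_parallel_cap.

```python
import re

ALLOWED_CAP_PARAMETERS = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}
_DIGITS = re.compile(r"[0-9]+")


def parse_dec_parallel_cap(value):
    """Parse the value of dec-parallel-cap, e.g. "{t:8;level-id=120}".

    Returns a list of (tool, spatial_seg_idc, params) tuples.
    """
    value = value.strip()
    if len(value) < 2 or value[0] != "{" or value[-1] != "}":
        raise ValueError("capability points must be enclosed in '{' and '}'")
    points = []
    # Capability points are separated by commas.
    for raw_point in value[1:-1].split(","):
        # Within a point, fields are separated by semicolons.
        fields = [field.strip() for field in raw_point.split(";")]
        tool, sep, idc_text = fields[0].partition(":")
        if sep != ":" or tool not in ("w", "t"):
            raise ValueError("capability point must start with 'w:' or 't:'")
        if not _DIGITS.fullmatch(idc_text) or len(idc_text) > 4:
            raise ValueError("spatial-seg-idc must be 1 to 4 digits")
        spatial_seg_idc = int(idc_text)
        if not 1 <= spatial_seg_idc <= 4095:
            raise ValueError("spatial-seg-idc must be in the range 1-4095")
        if len(fields) < 2:
            raise ValueError("at least one cap-parameter is required")
        params = {}
        for field in fields[1:]:
            name, sep, text = field.partition("=")
            if sep != "=" or name not in ALLOWED_CAP_PARAMETERS:
                raise ValueError("unknown or malformed cap-parameter: %r" % field)
            if name in params:
                raise ValueError("duplicate cap-parameter: %s" % name)
            if not _DIGITS.fullmatch(text):
                raise ValueError("value of %s must be a decimal integer" % name)
            params[name] = int(text)
        points.append((tool, spatial_seg_idc, params))
    return points
```

         Parameters that are absent from a capability point are not
         filled in by this sketch; a caller would apply the inferred
         values from the parameters outside dec-parallel-cap.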

2408	         The values of dec-parallel-cap.tier-flag and dec-parallel-
2409	         cap.level-id for a capability point indicate the highest level
2410	         of the capability point.  The values of dec-parallel-cap.max-
2411	         lsr, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for
2412	         a capability point indicate the maximum processing rate in
2413	         units of luma samples per second, the maximum picture size in
2414	         units of luma samples, and the maximum video bitrate (in units
2415	         of CpbBrVclFactor bits per second for the VCL HRD parameters
2416	         and in units of CpbBrNalFactor bits per second for the NAL HRD
2417	         parameters where CpbBrVclFactor and CpbBrNalFactor are defined
2418	         in Section A.4 of [HEVC]).

2420	         When not present, the value of dec-parallel-cap.tier-flag is
2421	         inferred to be equal to the value of tier-flag outside the
2422	         dec-parallel-cap parameter.  When not present, the value of
2423	         dec-parallel-cap.level-id is inferred to be equal to the value
2424	         of max-recv-level-id outside the dec-parallel-cap parameter.
2425	         When not present, the value of dec-parallel-cap.max-lsr, dec-
2426	         parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred
2427	         to be equal to the value of max-lsr, max-lps, or max-br,
2428	         respectively, outside the dec-parallel-cap parameter.

2430	         The general decoding capability, expressed by the set of
2431	         parameters outside of dec-parallel-cap, is defined as the
2432	         capability point that is determined by the following
2433	         combination of parameters: 1) the parallelism requirement
2434	         corresponding to the value of sprop-segmentation-id equal to 0
2435	         for a stream, 2) the profile determined by profile-space and
2436	         profile-id, 3) the highest level determined by tier-flag and
2437	         max-recv-level-id, and 4) the maximum processing rate, the
2438	         maximum picture size, and the maximum video bitrate determined
2439	         by the highest level.  The general decoding capability MUST
2440	         NOT be included as one of the set of capability points in the
2441	         dec-parallel-cap parameter.

2443	         For example, the following parameters express the general
2444	         decoding capability of 720p30 (Level 3.1) plus an additional
2445	         decoding capability of 1080p30 (Level 4) given that the
2446	         spatially largest tile or slice used in the bitstream is equal
2447	         to or less than 1/3 of the picture size:

2449	            a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

2451	         For another example, the following parameters express an
2452	         additional decoding capability of 1080p30, using dec-parallel-
2453	         cap.max-lsr and dec-parallel-cap.max-lps, given that WPP is
2454	         used in the stream:

2456	            a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2457	                        max-lsr=62668800;max-lps=2088960}

2459	            Informative note: When min_spatial_segmentation_idc is
2460	            present in a stream and WPP is not used, [HEVC] specifies
2461	            that there is no slice or no tile in the stream containing
2462	            more than 4 * PicSizeInSamplesY /
2463	            ( min_spatial_segmentation_idc + 4 ) luma samples.
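
            The bound in the note above can be evaluated with a short
            Python sketch.  The function name is hypothetical, and the
            result is kept as an exact fraction rather than rounded.

```python
from fractions import Fraction


def max_segment_luma_samples(pic_size_in_samples_y, min_spatial_segmentation_idc):
    """Upper bound on the luma samples of any slice or tile:
    4 * PicSizeInSamplesY / (min_spatial_segmentation_idc + 4)."""
    if pic_size_in_samples_y <= 0 or min_spatial_segmentation_idc < 0:
        raise ValueError("arguments out of range")
    return Fraction(4 * pic_size_in_samples_y, min_spatial_segmentation_idc + 4)
```

            For a picture of 1920x1088 luma samples (2088960) and a
            value of 8, the bound is 696320 luma samples, i.e. one
            third of the picture.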

2465	      Encoding considerations:

2467	         This type is only defined for transfer via RTP (RFC 3550).

2469	      Security considerations:

2471	         See Section 9 of RFC XXXX.

2473	      Public specification:

2475	         Please refer to Section 13 of RFC XXXX.

2477	      Additional information: None

2479	      File extensions: none

2481	      Macintosh file type code: none

2483	      Object identifier or OID: none
2484	      Person & email address to contact for further information:

2486	      Intended usage: COMMON

2488	      Author: See Section 14 of RFC XXXX.

2490	      Change controller:

2492	         IETF Audio/Video Transport Payloads working group delegated
2493	         from the IESG.

2495	7.2 SDP Parameters

2497	   The receiver MUST ignore any parameter unspecified in this memo.

2499	7.2.1 Mapping of Payload Type Parameters to SDP

2501	   The media type video/H265 string is mapped to fields in the Session
2502	   Description Protocol (SDP) [RFC4566] as follows:

2504	   o  The media name in the "m=" line of SDP MUST be video.

2506	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
2507	      media subtype).

2509	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2511	   o  The OPTIONAL parameters "profile-space", "profile-id", "tier-
2512	      flag", "level-id", "interop-constraints", "profile-compatibility-
2513	      indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level-
2514	      id", "max-lsr", "max-lps", "max-cpb", "max-dpb", "max-br", "max-
2515	      tr", "max-tc", "max-fps", "sprop-depack-buf-nalus", "sprop-
2516	      depack-buf-bytes", "depack-buf-cap", "sprop-segmentation-id",
2517	      "sprop-spatial-segmentation-idc", and "dec-parallel-cap", when
2518	      present, MUST be included in the "a=fmtp" line of SDP.  These
2519	      parameters are expressed as a media type string, in the form of
2520	      a semicolon-separated list of parameter=value pairs.

2522	   o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
2523	      pps", when present, MUST be included in the "a=fmtp" line of SDP
2524	      or conveyed using the "fmtp" source attribute as specified in
2525	      section 6.3 of [RFC5576].  For a particular media format (i.e.
2526	      RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST
2527	      NOT be both included in the "a=fmtp" line of SDP and conveyed
2528	      using the "fmtp" source attribute.  When included in the "a=fmtp"
2529	      line of SDP, these parameters are expressed as a media type
2530	      string, in the form of a semicolon separated list of
2531	      parameter=value pairs.  When conveyed using the "fmtp" source
2532	      attribute, these parameters are only associated with the given
2533	      source and payload type as parts of the "fmtp" source attribute.

2535	          Informative note: Conveyance of "sprop-vps", "sprop-sps", and
2536	          "sprop-pps" using the "fmtp" source attribute allows for out-
2537	          of-band transport of parameter sets in topologies like Topo-
2538	          Video-switch-MCU as specified in [RFC5117].

2540	   An example of media representation in SDP is as follows:

2542	         m=video 49170 RTP/AVP 98
2543	         a=rtpmap:98 H265/90000
2544	         a=fmtp:98 profile-id=1;
2545	                   sprop-vps=<video parameter sets data>
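
   A receiver-side sketch of splitting such an "a=fmtp" line into
   parameter=value pairs is given below in Python.  Because the value
   of dec-parallel-cap itself contains semicolons inside curly braces,
   the sketch splits only on semicolons outside braces.  The function
   names are hypothetical, and the handling of folded lines and
   surrounding whitespace is an assumption of this sketch.

```python
def split_fmtp_params(text):
    """Split on ';' but not inside '{...}' (as used by dec-parallel-cap)."""
    items, current, depth = [], [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return [item.strip() for item in items if item.strip()]


def parse_fmtp(line):
    """Return (payload_type, {name: value}) for an "a=fmtp:" line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    for item in split_fmtp_params(param_text):
        # Split at the first '=' only, so values may contain '='.
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected parameter=value, got %r" % item)
        params[name.strip()] = value.strip()
    return int(payload_type), params
```

   Values are returned as strings; per Section 7.2, a receiver would
   ignore any parameter name that is unspecified in this memo.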

2547	7.2.2 Usage with SDP Offer/Answer Model

2549	   When HEVC is offered over RTP using SDP in an Offer/Answer model
2550	   [RFC3264] for negotiation for unicast usage, the following
2551	   limitations and rules apply:

2553	   o  The parameters identifying a media format configuration for HEVC
2554	      are profile-space, profile-id, tier-flag, level-id, interop-
2555	      constraints, and profile-compatibility-indicator.  These media
2556	      configuration parameters, except for level-id, MUST be used
2557	      symmetrically when the answerer does not include recv-sub-layer-
2558	      id in the answer for the media format (payload type).  In other
2559	      words, the answerer MUST 1) maintain all configuration parameters
2560	      for the media format (payload type), 2) include recv-sub-layer-id
2561	      in the answer for the media format (payload type), or 3) remove
2562	      the media format (payload type) completely (when one or more of
2563	      the parameter values are not supported).  The value of level-id
2564	      is changeable.

2566	          Informative note: The requirement for symmetric use does not
2567	          apply for level-id, and does not apply for the other stream
2568	          properties and capability parameters.

2570	   o  To simplify handling and matching of these configurations, the
2571	      same RTP payload type number used in the offer SHOULD also be
2572	      used in the answer, as specified in [RFC3264].  The same RTP
2573	      payload type number used in the offer MUST also be used in the
2574	      answer when the answer includes recv-sub-layer-id.  When the
2575	      answer does not include recv-sub-layer-id, the answer MUST NOT
2576	      contain a payload type number used in the offer unless the
2577	      configuration is exactly the same as in the offer or the
2578	      configuration in the answer only differs from that in the offer
2579	      with a different value of level-id.  The answer MAY contain the
2580	      recv-sub-layer-id parameter if an HEVC stream contains multiple
2581	      operation points (using temporal scalability and sub-layers) and
2582	      sprop-vps is included in the offer where sub-layers are present
2583	      in the video parameter set.  If the sprop-vps is provided in an
2584	      offer, an answerer MAY select a particular operation point in the
2585	      received and/or in the sent stream.  When recv-sub-layer-id is
2586	      present in the answer, the media configuration parameters MUST
2587	      NOT be present in the answer.  Rather, the media configuration
2588	      that the answerer will use for receiving and/or sending is the
2589	      one used for the selected operation point as indicated in the
2590	      offer.

2592	          Informative note: When an offerer receives an answer that
2593	          does not include recv-sub-layer-id, it has to compare payload
2594	          types not declared in the offer based on the media type (i.e.
2595	          video/H265) and the above media configuration parameters with
2596	          any payload types it has already declared.  This will enable
2597	          it to determine whether the configuration in question is new
2598	          or if it is equivalent to configuration already offered,
2599	          since a different payload type number may be used in the
2600	          answer.  The ability to perform operation point selection
2601	          enables a receiver to utilize the temporal scalable nature of
2602	          an HEVC stream.

2604	   o  The parameters sprop-depack-buf-nalus and sprop-depack-buf-bytes
2605	      describe the properties of the RTP stream that the offerer or the
2606	      answerer is sending for the media format configuration.  This
2607	      differs from the normal usage of the Offer/Answer parameters:
2608	      normally such parameters declare the properties of the stream
2609	      that the offerer or the answerer is able to receive.  When
2610	      dealing with HEVC, the offerer assumes that the answerer will be
2611	      able to receive media encoded using the configuration being
2612	      offered.

2614	          Informative note:  The above parameters apply for any stream
2615	          sent by a declaring entity with the same configuration; i.e.
2616	          they are dependent on their source.  Rather than being bound
2617	          to the payload type, the values may have to be applied to
2618	          another payload type when being sent, as they apply for the
2619	          configuration.

2621	   o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb,
2622	      max-br, max-tr, and max-tc MAY be used to declare further
2623	      capabilities of the offerer or answerer for receiving.  These
2624	      parameters MUST NOT be present when the direction attribute is
2625	      "sendonly".

2627	   o  The capability parameter max-fps MAY be used to declare lower
2628	      capabilities of the offerer or answerer for receiving.  This
2629	      parameter MUST NOT be present when the direction attribute is
2630	      "sendonly".

2632	   o  The capability parameter dec-parallel-cap MAY be used to declare
2633	      additional decoding capabilities of the offerer or answerer for
2634	      receiving.  Upon receiving such a declaration of a receiver, a
2635	      sender MAY send a stream to the receiver utilizing those
2636	      capabilities under the assumption that the stream fulfills the
2637	      parallelism requirement.  A stream that is sent based on choosing
2638	      a capability point with parallel tool type 'w' from dec-parallel-
2639	      cap MUST have entropy_coding_sync_enabled_flag equal to 1 and
2640	      min_spatial_segmentation_idc equal to or larger than dec-
2641	      parallel-cap.spatial-seg-idc of the capability point.  A stream
2642	      that is sent based on choosing a capability point with parallel
2643	      tool type 't' from dec-parallel-cap MUST have
2644	      entropy_coding_sync_enabled_flag equal to 0 and
2645	      min_spatial_segmentation_idc equal to or larger than dec-
2646	      parallel-cap.spatial-seg-idc of the capability point.

2648	   o  An offerer has to include the size of the de-packetization
2649	      buffer, sprop-depack-buf-bytes, and sprop-depack-buf-nalus, in
2650	      the offer for an interleaved HEVC stream or for the MST
2651	      transmission mode.  To enable the offerer and answerer to inform
2652	      each other about their capabilities for de-packetization
2653	      buffering in receiving streams, both parties are RECOMMENDED to
2654	      include depack-buf-cap.  For interleaved streams or in MST, it is
2655	      also RECOMMENDED to consider offering multiple payload types with
2656	      different buffering requirements when the capabilities of the
2657	      receiver are unknown.

2659	   o  The sprop-vps, sprop-sps, or sprop-pps, when present (included in
2660	      the "a=fmtp" line of SDP or conveyed using the "fmtp" source
2661	      attribute as specified in section 6.3 of [RFC5576]), are used for
2662	      out-of-band transport of the parameter sets (VPS, SPS, or PPS
2663	      respectively).  However, when out-of-band transport of parameter
2664	      sets is used, parameter sets MAY still be additionally
2665	      transported in-band unless explicitly disallowed by an
2666	      application.

2668	   o  The answerer MAY use either out-of-band or in-band transport of
2669	      parameter sets for the stream it is sending, regardless of
2670	      whether out-of-band parameter sets transport has been used in the
2671	      offerer-to-answerer direction.  Parameter sets included in an
2672	      answer are independent of those parameter sets included in the
2673	      offer, as they are used for decoding two different video streams,
2674	      one from the answerer to the offerer and the other in the
2675	      opposite direction.

2677	   o  The following rules apply to transport of parameter sets in the
2678	      offerer-to-answerer direction.

2680	       o An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
2681	          If none of these parameters is present in the offer, then
2682	          only in-band transport of parameter sets is used.

2684	       o If the level to use in the offerer-to-answerer direction is
2685	          equal to the default level in the offer, the answerer MUST be
2686	          prepared to use the parameter sets included in sprop-vps,
2687	          sprop-sps, and sprop-pps (either included in the "a=fmtp"
2688	          line of SDP or conveyed using the "fmtp" source attribute)
2689	          for decoding the incoming NAL unit stream.  Otherwise, the
2690	          answerer MUST ignore sprop-vps, sprop-sps, and sprop-pps
2691	          (either included in the "a=fmtp" line of SDP or conveyed
2692	          using the "fmtp" source attribute) and the offerer MUST
2693	          transmit parameter sets in-band.

2695	       o In MST, the answerer MUST be prepared to use the parameter
2696	          sets included in sprop-vps, sprop-sps, and sprop-pps of all
2697	          RTP streams that a particular RTP stream depends on, when
2698	          present (either included in the "a=fmtp" line of SDP or
2699	          conveyed using the "fmtp" source attribute), for decoding the
2700	          incoming NAL unit stream.

2702	   o  The following rules apply to transport of parameter sets in the
2703	      answerer-to-offerer direction.

2705	       o An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
2706	          If none of these parameters is present in the answer, then
2707	          only in-band transport of parameter sets is used.

2709	       o If the level to use in the answerer-to-offerer direction is
2710	          equal to the default level in the answer, the offerer MUST be
2711	          prepared to use the parameter sets included in sprop-vps,
2712	          sprop-sps, and sprop-pps (either included in the "a=fmtp"
2713	          line of SDP or conveyed using the "fmtp" source attribute)
2714	          for decoding the incoming NAL unit stream.  Otherwise, the
2715	          offerer MUST ignore sprop-vps, sprop-sps, and sprop-pps
2716	          (either included in the "a=fmtp" line of SDP or conveyed
2717	          using the "fmtp" source attribute) and the answerer MUST
2718	          transmit parameter sets in-band.

2720	       o In MST, the offerer MUST be prepared to use the parameter
2721	          sets included in sprop-vps, sprop-sps, and sprop-pps of all
2722	          RTP streams that a particular RTP stream depends on, when
2723	          present (either included in the "a=fmtp" line of SDP or
2724	          conveyed using the "fmtp" source attribute), for decoding the
2725	          incoming NAL unit stream.

2727	   o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
2728	      the "fmtp" source attribute as specified in section 6.3 of
2729	      [RFC5576], the receiver of the parameters MUST store the
2730	      parameter sets included in sprop-vps, sprop-sps, and/or sprop-pps
2731	      and associate them with the source given as part of the "fmtp"
2732	      source attribute.  Parameter sets associated with one source MUST
2733	      only be used to decode NAL units conveyed in RTP packets from the
2734	      same source.  When this mechanism is in use, SSRC collision
2735	      detection and resolution MUST be performed as specified in
2736	      [RFC5576].
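
   The payload type reuse rule for unicast answers that do not include
   recv-sub-layer-id (see the second bullet above) can be sketched in
   Python as follows.  The sketch compares the raw parameter strings
   and does not apply inferred defaults for absent parameters; that
   simplification, like the function name, is an assumption of this
   example.

```python
# Media format configuration parameters that must match; level-id is
# deliberately excluded because it is changeable in unicast usage.
SYMMETRIC_CONFIG_PARAMS = (
    "profile-space",
    "profile-id",
    "tier-flag",
    "interop-constraints",
    "profile-compatibility-indicator",
)


def answer_may_reuse_payload_type(offer_params, answer_params):
    """Return True if the answer may use the offer's payload type number.

    Both arguments are dicts mapping fmtp parameter names to values.
    """
    if "recv-sub-layer-id" in answer_params:
        raise ValueError("this check covers only answers without recv-sub-layer-id")
    return all(
        offer_params.get(name) == answer_params.get(name)
        for name in SYMMETRIC_CONFIG_PARAMS
    )
```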

2738	   For streams being delivered over multicast, the following rules
2739	   apply:

2741	   o  The media format configuration is identified by profile-space,
2742	      profile-id, tier-flag, level-id, interop-constraints, and
2743	      profile-compatibility-indicator.  These media format
2744	      configuration parameters, including level-id, MUST be used
2745	      symmetrically; that is, the answerer MUST either maintain all
2746	      configuration parameters or remove the media format (payload
2747	      type) completely.  Note that this implies that the level-id for
2748	      Offer/Answer in multicast is not changeable.

2750	   o  To simplify the handling and matching of these configurations,
2751	      the same RTP payload type number used in the offer SHOULD also be
2752	      used in the answer, as specified in [RFC3264].  An answer MUST
2753	      NOT contain a payload type number used in the offer unless the
2754	      configuration is the same as in the offer.

2756	   o  Parameter sets received MUST be associated with the originating
2757	      source and MUST only be used in decoding the incoming NAL unit
2758	      stream from the same source.

2760	   o  The rules for other parameters are the same as above for unicast
2761	      as long as the above rules are obeyed.

2763	   Table 1 lists the interpretation of all the parameters that MUST be
2764	   used for the various combinations of offer, answer, and direction
2765	   attributes.  Note that the two columns wherein the recv-sub-layer-id
2766	   parameter is used only apply to answers, whereas the other columns
2767	   apply to both offers and answers.

2769	   Table 1.  Interpretation of parameters for various combinations of
2770	   offers, answers, direction attributes, with and without recv-sub-
2771	   layer-id.  Columns that do not indicate offer or answer apply to
2772	   both.

2774	                                          sendonly --+
2775	            answer: recvonly, recv-sub-layer-id --+  |
2776	              recvonly w/o recv-sub-layer-id --+  |  |
2777	      answer: sendrecv, recv-sub-layer-id --+  |  |  |
2778	        sendrecv w/o recv-sub-layer-id --+  |  |  |  |
2779	                                         |  |  |  |  |
2780	      profile-space                      C  X  C  X  P
2781	      profile-id                         C  X  C  X  P
2782	      tier-flag                          C  X  C  X  P
2783	      level-id                           C  X  C  X  P
2784	      interop-constraints                C  X  C  X  P
2785	      profile-compatibility-indicator    C  X  C  X  P
2786	      max-recv-level-id                  R  R  R  R  -
2787	      sprop-depack-buf-nalus             P  P  -  -  P
2788	      sprop-depack-buf-bytes             P  P  -  -  P
2789	      depack-buf-cap                     R  R  R  R  -
2790	      sprop-segmentation-id              P  P  P  P  P
2791	      sprop-spatial-segmentation-idc     P  P  P  P  P
2792	      max-br                             R  R  R  R  -
2793	      max-cpb                            R  R  R  R  -
2794	      max-dpb                            R  R  R  R  -
2795	      max-lsr                            R  R  R  R  -
2796	      max-lps                            R  R  R  R  -
2797	      max-tr                             R  R  R  R  -
2798	      max-tc                             R  R  R  R  -
2799	      max-fps                            R  R  R  R  -
2800	      sprop-vps                          P  P  -  -  P
2801	      sprop-sps                          P  P  -  -  P
2802	      sprop-pps                          P  P  -  -  P
2803	      sub-layer-id                       P  P  -  -  P
2804	      recv-sub-layer-id                  X  O  X  O  -
2805	      dec-parallel-cap                   R  R  R  R  -

2807	     Legend:

2809	      C: configuration for sending and receiving streams
2810	      P: properties of the stream to be sent
2811	      R: receiver capabilities
2812	      O: operation point selection
2813	      X: MUST NOT be present
2814	      -: not usable, when present SHOULD be ignored

2816	   Parameters used for declaring receiver capabilities are in general
2817	   downgradable; i.e. they express the upper limit for a sender's
2818	   possible behavior.  Thus, a sender MAY choose to configure its
2819	   encoder using values lower than or equal to those declared.

2821	   Parameters declaring a configuration point are not changeable, with
2822	   the exception of the level-id parameter for unicast usage.  This
2823	   expresses values a receiver expects to be used and MUST be used
2824	   verbatim on the sender side.  If level-id is changed, an answerer
2825	   MUST NOT include the recv-sub-layer-id parameter.

2827	   When a sender's capabilities are declared, and non-changeable
2828	   parameters are used in this declaration, these parameters express a
2829	   configuration that is acceptable for the sender to receive streams.
2830	   In order to achieve high interoperability levels, it is often
2831	   advisable to offer multiple alternative configurations.  It is
2832	   impossible to offer multiple configurations in a single payload
2833	   type.  Thus, when multiple configuration offers are made, each offer
2834	   requires its own RTP payload type associated with the offer.

2836	   A receiver SHOULD understand all media type parameters, even if it
2837	   only supports a subset of the payload format's functionality.  This
2838	   ensures that a receiver is capable of understanding when an offer to
2839	   receive media can be downgraded to what is supported by the receiver
2840	   of the offer.

2842	   An answerer MAY extend the offer with additional media format
2843	   configurations.  However, to enable their usage, in most cases a
2844	   second offer is required from the offerer to provide the stream
2845	   property parameters that the media sender will use.  This also has
2846	   the effect that the offerer has to be able to receive this media
2847	   format configuration, not only to send it.

2849	7.2.3 Usage in Declarative Session Descriptions

2851	   When HEVC over RTP is offered with SDP in a declarative style, as in
2852	   Real Time Streaming Protocol (RTSP) [RFC2326] or Session
2853	   Announcement Protocol (SAP) [RFC2974], the following considerations
2854	   are necessary.

2856	   o  All parameters capable of indicating both stream properties and
2857	      receiver capabilities are used to indicate only stream
2858	      properties.  For example, in this case, the parameter level-id
2859	      declares the level used by the stream, not the capability for
2860	      receiving streams.  As a result, the following interpretation
2861	      of the parameters MUST be used:

2863	   Declaring actual configuration or stream properties:

2865	     - profile-space
2866	     - profile-id
2867	     - tier-flag
2868	     - level-id
2869	     - interop-constraints
2870	     - sprop-vps
2871	     - sprop-sps
2872	     - sprop-pps
2873	     - sprop-depack-buf-nalus
2874	     - sprop-depack-buf-bytes
2875	     - sprop-segmentation-id
2876	     - sprop-spatial-segmentation-idc

2878	   Not usable (when present, they SHOULD be ignored):

2880	     - max-lps
2881	     - max-lsr
2882	     - max-cpb
2883	     - max-dpb
2884	     - max-br
2885	     - max-tr
2886	     - max-tc
2887	     - max-fps
2888	     - max-recv-level-id
2889	     - depack-buf-cap
2890	     - sub-layer-id
2891	     - dec-parallel-cap

2893	   o  A receiver of the SDP is required to support all parameters and
2894	      values of the parameters provided; otherwise, the receiver MUST
2895	      reject (RTSP) or not participate in (SAP) the session.  It falls
2896	      on the creator of the session to use values that are expected to
2897	      be supported by the receiving application.

2899	7.2.4 Parameter Sets Considerations

2909	7.2.5 Dependency Signaling in Multi-Session Transmission

2911	   If MST is used, the rules on signaling media decoding dependency in
2912	   SDP as defined in [RFC5583] apply.  The rules on "hierarchical or
2913	   layered encoding" with multicast in Section 5.7 of [RFC4566] do not
2914	   apply, i.e. the notation for Connection Data "c=" SHALL NOT be used
2915	   with more than one address.  The order of session dependency is
2916	   given from the RTP stream containing the lowest temporal sub-layer
2917	   to the RTP stream containing the highest temporal sub-layer.

2919	8. Use with Feedback Messages

2921	   As specified in Section 6.1 of RFC 4585 [RFC4585], Payload-Specific
2922	   Feedback messages are identified by the RTCP packet type value PSFB
2923	   (206).  AVPF [RFC4585] defines three payload-specific feedback
2924	   messages and one application layer feedback message, and CCM
2925	   [RFC5104] specifies four payload-specific feedback messages.

2927	   These feedback messages are identified by means of the feedback
2928	   message type (FMT) parameter as follows:

2930	   Assigned in [RFC4585]:

2932	      1:     Picture Loss Indication (PLI)
2933	      2:     Slice Loss Indication (SLI)
2934	      3:     Reference Picture Selection Indication (RPSI)
2935	      15:    Application layer FB message
2936	      31:    reserved for future expansion of the number space

2938	   Assigned in [RFC5104]:

2940	      4:     Full Intra Request (FIR) Command
2941	      5:     Temporal-Spatial Trade-off Request (TSTR)
2942	      6:     Temporal-Spatial Trade-off Notification (TSTN)
2943	      7:     Video Back Channel Message (VBCM)

2945	   Unassigned:

2947	      0:      unassigned
2948	      8-14:   unassigned
2949	      16-30:  unassigned
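
   The FMT assignments listed above can be looked up from the first two
   octets of an RTCP feedback packet, as in the following minimal
   sketch.  It assumes the common feedback header layout of [RFC4585]
   (2-bit version, 1-bit padding, 5-bit FMT, then the 8-bit packet
   type).  The function and table names are hypothetical.

```python
PSFB_PACKET_TYPE = 206  # RTCP packet type for payload-specific feedback

# FMT values as listed in this section.
PSFB_FMT_NAMES = {
    1: "PLI",
    2: "SLI",
    3: "RPSI",
    4: "FIR",
    5: "TSTR",
    6: "TSTN",
    7: "VBCM",
    15: "Application layer FB",
    31: "reserved",
}


def psfb_fmt_name(rtcp_packet):
    """Return the feedback message name, or None if not a PSFB packet."""
    if len(rtcp_packet) < 2:
        raise ValueError("RTCP packet too short")
    version = rtcp_packet[0] >> 6        # top 2 bits
    fmt = rtcp_packet[0] & 0x1F          # low 5 bits
    packet_type = rtcp_packet[1]
    if version != 2 or packet_type != PSFB_PACKET_TYPE:
        return None
    return PSFB_FMT_NAMES.get(fmt, "unassigned")
```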

2951	   The following subsection defines how to use HEVC with the RPSI
2952	   message for the purpose of feedback-based reference picture
2953	   selection for improved error resilience in real-time conversational
2954	   video applications such as video telephony and video conferencing.

2956	   Feedback-based reference picture selection has been shown to be a
2957	   powerful tool to stop temporal error propagation for improved error
2958	   resilience [Girod99][Wang05].  In one approach, the decoder side
2959	   tracks errors in the decoded pictures and informs the encoder side
2960	   that a particular picture that was decoded relatively earlier is
2961	   correct and still present in the decoded picture buffer, and
2962	   requests the encoder to use that correct picture for reference
2963	   when encoding the next picture, so as to stop further temporal error
2964	   propagation.  For this approach, the decoder side should use the
2965	   RPSI feedback message.

2967	   Encoders can encode some long-term reference pictures as specified
2968	   in H.264 or HEVC for purposes described in the previous paragraph
2969	   without the need for a large decoded picture buffer.  As shown in
2970	   [Wang05], with a flexible reference picture management scheme as in
2971	   H.264 and HEVC, even a decoded picture buffer size of two would work
2972	   for the approach described in the previous paragraph.

2974	8.1 Use of HEVC with the RPSI Feedback Message

2976	   The field "Native RPSI bit string defined per codec" is a base16
2977	   [RFC4648] representation of the 8 bits consisting of 2 most
2978	   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
2979	   in [HEVC], followed by the 32 bits representing the value of the
2980	   PicOrderCntVal (in network byte order), as defined in [HEVC], for
2981	   the picture that is requested to be used for reference when encoding
2982	   the next picture.
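
   The layout described above (2 zero bits, 6 bits of nuh_layer_id,
   then 32 bits of PicOrderCntVal in network byte order) can be sketched
   as follows.  This is an illustration only.  The function name is
   hypothetical, and treating PicOrderCntVal as a signed two's-complement
   32-bit value is an assumption of the example.

```python
import base64
import struct


def hevc_rpsi_native_bits(nuh_layer_id, pic_order_cnt_val):
    """Pack the 40 bits described above into 5 octets."""
    if not 0 <= nuh_layer_id <= 63:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    if not -(2 ** 31) <= pic_order_cnt_val <= 2 ** 31 - 1:
        raise ValueError("PicOrderCntVal must fit in 32 bits")
    # "!" selects network byte order with no padding.
    # "B" holds the two zero bits plus the 6-bit nuh_layer_id.
    # "i" holds PicOrderCntVal (assumed signed, see lead-in).
    return struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)


def to_base16(octets):
    """Base16 text form of the packed octets, per RFC 4648."""
    return base64.b16encode(octets).decode("ascii")
```

   For example, nuh_layer_id 0 and PicOrderCntVal 256 pack to the octets
   00 00 00 01 00.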

2984	   The use of the RPSI feedback message as positive acknowledgement
2985	   with HEVC is deprecated.  In other words, the RPSI feedback message
2986	   MUST only be used as a reference picture selection request, such
2987	   that it can also be used in multicast.

2989	9. Security Considerations

2991	   RTP packets using the payload format defined in this specification
2992	   are subject to the security considerations discussed in the RTP
2993	   specification [RFC3550], and in any applicable RTP profile such as
2994	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or
2995	   RTP/SAVPF [RFC5124].  However, as "Securing the RTP Protocol
2996	   Framework: Why RTP Does Not Mandate a Single Media Security
2997	   Solution" [I-D.ietf-avt-srtp-not-mandatory] discusses, it is not an
2998	   RTP payload format's responsibility to discuss or mandate what
2999	   solutions are used to meet the basic security goals like
3000	   confidentiality, integrity, and source authenticity for RTP in
3001	   general.  This responsibility lies with anyone using RTP in an
3002	   application.  They can find guidance on available security
3003	   mechanisms and important considerations in "Options for Securing
3004	   RTP Sessions" [I-D.ietf-avtcore-rtp-security-options].

3006	   The rest of this section discusses the security impacting properties
3007	   of the payload format itself.

3009	   Because the data compression used with this payload format is
3010	   applied end-to-end, any encryption needs to be performed after
3011	   compression.  A potential denial-of-service threat exists for data
3012	   encodings using compression techniques that have non-uniform
3013	   receiver-end computational load.  The attacker can inject
3014	   pathological datagrams into the stream that are complex to decode
3015	   and that cause the receiver to be overloaded.  H.265 is particularly
3016	   vulnerable to such attacks, as it is extremely simple to generate
3017	   datagrams containing NAL units that affect the decoding process of
3018	   many future NAL units.  Therefore, the usage of data origin
3019	   authentication and data integrity protection of at least the RTP
3020	   packet is RECOMMENDED, for example, with SRTP [RFC3711].

3022	   Note that the appropriate mechanism to ensure confidentiality and
3023	   integrity of RTP packets and their payloads is very dependent on the
3024	   application and on the transport and signaling protocols employed.
3025	   Thus, although SRTP is given as an example above, other possible
3026	   choices exist.

3028	   Decoders MUST exercise caution with respect to the handling of user
3029	   data SEI messages, particularly if they contain active elements, and
3030	   MUST restrict their domain of applicability to the presentation
3031	   containing the stream.

3033	   End-to-end security with authentication, integrity, or
3034	   confidentiality protection will prevent a MANE from performing
3035	   media-aware operations other than discarding complete packets.  In
3036	   the case of confidentiality protection, it will even be prevented
3037	   from discarding packets in a media-aware way.  To be allowed to
3038	   perform such operations, a MANE is required to be a trusted entity
3039	   that is included in the security context establishment.

3041	10. Congestion Control

3043	   Congestion control for RTP SHALL be used in accordance with RTP
3044	   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
3045	   If best-effort service is being used, an additional requirement is
3046	   that users of this payload format MUST monitor packet loss to ensure
3047	   that the packet loss rate is within an acceptable range.  Packet
3048	   loss is considered acceptable if a TCP flow across the same network
3049	   path, and experiencing the same network conditions, would achieve an
3050	   average throughput, measured on a reasonable timescale, that is not
3051	   less than what the RTP flow is achieving.  This condition can be
3052	   satisfied by implementing congestion control mechanisms to adapt the
3053	   transmission rate or the number of layers subscribed for a layered
3054	   multicast session, or by arranging for a receiver to leave the
3055	   session if the loss rate is unacceptably high.
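
   One way to approximate the comparison with a TCP flow is sketched
   below.  This memo does not prescribe any throughput model.  The
   example assumes the widely used simplified steady-state estimate
   (MSS / RTT) * sqrt(3 / (2 * p)), where p is the packet loss rate.
   All function and parameter names are hypothetical.

```python
import math


def tcp_throughput_estimate_bps(mss_bytes, rtt_seconds, loss_rate):
    """Simplified steady-state TCP throughput estimate in bits/second.

    Assumption: throughput = (MSS / RTT) * sqrt(3 / (2 * p)).
    With no observed loss the estimate is unbounded.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    if loss_rate <= 0:
        return math.inf
    return (8 * mss_bytes / rtt_seconds) * math.sqrt(1.5 / loss_rate)


def loss_is_acceptable(rtp_bps, mss_bytes, rtt_seconds, loss_rate):
    """True if the estimated TCP throughput is not less than the RTP rate."""
    tcp_bps = tcp_throughput_estimate_bps(mss_bytes, rtt_seconds, loss_rate)
    return tcp_bps >= rtp_bps
```

   A sender would evaluate this over a reasonable timescale, using loss
   and round-trip measurements taken from RTCP reports, and reduce its
   rate when the check fails.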

3057	   The bitrate adaptation necessary for obeying the congestion control
3058	   principle is easily achievable when real-time encoding is used, for
3059	   example by adequately tuning the quantization parameter.

3061	   However, when pre-encoded content is being transmitted, bandwidth
3062	   adaptation requires the pre-encoded bitstream to be tailored for
3063	   such adaptivity.  The key mechanism available in HEVC is temporal
3064	   scalability.  A media sender can remove NAL units belonging to
3065	   higher temporal sub-layers (i.e., those NAL units with a high value
3066	   of TID) until the sending bitrate drops to an acceptable range.
3067	   HEVC contains mechanisms that allow the lightweight identification
3068	   of switching points in temporal enhancement layers, as discussed in
3069	   Section 1.1.2 of this memo.  An HEVC media sender can send packets
3070	   belonging to NAL units of temporal enhancement layers starting from
3071	   these switching points to probe for available bandwidth and to
3072	   utilize bandwidth that has been shown to be available.
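
   The removal of higher temporal sub-layers described above can be
   sketched as follows.  The example reads TID from the two-octet HEVC
   NAL unit header (the low 3 bits of the second octet carry
   nuh_temporal_id_plus1, and TID is that value minus 1).  The function
   names, the measurement interval, and the policy of never removing
   TID 0 are assumptions of this sketch.

```python
def temporal_id(nal_unit):
    """Return TID from the 2-octet HEVC NAL unit header."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than its header")
    return (nal_unit[1] & 0x07) - 1


def thin_to_bitrate(nal_units, interval_seconds, max_bps):
    """Drop the highest temporal sub-layers until the bitrate fits.

    nal_units: NAL units (bytes) covering interval_seconds of video.
    Returns the NAL units to send, in their original order.
    """
    kept = list(nal_units)
    while kept:
        bps = 8 * sum(len(n) for n in kept) / interval_seconds
        highest = max(temporal_id(n) for n in kept)
        # Stop when the rate fits or only the base sub-layer is left.
        if bps <= max_bps or highest <= 0:
            break
        kept = [n for n in kept if temporal_id(n) < highest]
    return kept
```

   Because a whole sub-layer is removed at each step, the result remains
   a valid temporal subset of the original bitstream.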

3074	   These mechanisms generally work within a defined profile and level
3075	   and, therefore, no renegotiation of the channel is required.  Only
3076	   when non-downgradable parameters (such as profile) are required to
3077	   be changed does it become necessary to terminate and restart the
3078	   media stream.  This may be accomplished by using a different RTP
3079	   payload type.

3081	   MANEs MAY remove certain unusable packets from the packet stream
3082	   when that stream was damaged due to previous packet losses.  This
3083	   can help reduce the network load in certain special cases.  For
3084	   example, MANEs can remove those FUs where the leading FUs belonging
3085	   to the same NAL unit have been lost or those dependent slice
3086	   segments when the leading slice segments belonging to the same slice
3087	   have been lost, because the trailing FUs or dependent slice segments
3088	   are meaningless to most decoders.  MANEs can also remove higher
3089	   temporal scalable layers if the outbound transmission (from the
3090	   MANE's viewpoint) experiences congestion.
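
   The FU case above can be sketched as a small filter.  This is an
   illustration under stated assumptions: the input contains only the
   FUs that actually arrived, in arrival order and without reordering,
   and each entry carries the RTP sequence number together with the
   start and end flags of the FU header.  The function name and the
   tuple layout are hypothetical.

```python
def forwardable_fus(fragments):
    """Return the sequence numbers of FUs still worth forwarding.

    fragments: iterable of (seq, is_start, is_end) for received FUs.
    An FU is forwarded only if every earlier FU of the same NAL unit
    was received, that is, if it directly continues a forwarded run.
    """
    forward = []
    expected = None  # next seq of an intact NAL unit, or None
    for seq, is_start, is_end in fragments:
        if is_start:
            intact = True
        else:
            intact = expected is not None and seq == expected
        if intact:
            forward.append(seq)
            # RTP sequence numbers are 16 bits and wrap around.
            expected = None if is_end else (seq + 1) & 0xFFFF
        else:
            expected = None  # drop this FU and the rest of its NAL unit
    return forward
```

   In the sketch, a NAL unit whose first FU was lost is dropped entirely,
   while FUs received before a loss in the middle of a NAL unit are
   still forwarded.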

3092	11. IANA Considerations

3094	   A new media type, as specified in Section 7.1 of this memo, should
3095	   be registered with IANA.

3097	12. Acknowledgements

3099	   Muhammed Coban and Marta Karczewicz are thanked for discussions on
3100	   the specification of the use with feedback messages and other
3101	   aspects in this memo.  Jonathan Lennox and Jill Boyce are thanked
3102	   for their contributions to the PACI design included in this memo.
3103	   Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and
3104	   Tom Kristensen are thanked for their contributions to parallel
3105	   processing related signalling.  Bernard Aboba, Roni Even, Rickard
3106	   Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, and Ross
3107	   Finlayson made valuable reviewing comments that led to improvements.

3109	   This document was prepared using 2-Word-v2.0.template.dot.

3111	13. References

3113	13.1 Normative References

3115	   [HEVC]    ITU-T Recommendation H.265, "High efficiency video
3116	             coding", April 2013.

3118	   [H.264]   ITU-T Recommendation H.264, "Advanced video coding for
3119	             generic audiovisual services", April 2013.

3121	   [RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding
3122	             Dependency in the Session Description Protocol (SDP)", RFC
3123	             5583, July 2009.

3125	   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
3126	             Payload Format for H.264 Video", RFC 6184, May 2011.

3128	   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
3129	             Eleftheriadis, "RTP Payload Format for Scalable Video
3130	             Coding", RFC 6190, May 2011.

3132	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
3133	             Requirement Levels", BCP 14, RFC 2119, March 1997.

3135	   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
3136	             with Session Description Protocol (SDP)", RFC 3264, June
3137	             2002.

3139	   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
3140	             Encodings", RFC 4648, October 2006.

3142	   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson,
3143	             V., "RTP: A Transport Protocol for Real-Time
3144	             Applications", STD 64, RFC 3550, July 2003.

3146	   [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session
3147	             Description Protocol", RFC 4566, July 2006.

3149	   [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific
3150	             Media Attributes in the Session Description Protocol", RFC
3151	             5576, June 2009.

3153	   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey,
3154	             J., "Extended RTP Profile for Real-time Transport Control
3155	             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
3156	             2006.

3158	   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman, B.,
3159	             "Codec Control Messages in the RTP Audio-Visual Profile
3160	             with Feedback (AVPF)", RFC 5104, February 2008.

3162	13.2 Informative References

3164	   [3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched
3165	             Streaming Service (PSS); Progressive Download and Dynamic
3166	             Adaptive Streaming over HTTP (3GP-DASH)", v12.1.0,
3167	             December 2013.

3169	   [3GPPFF]  3GPP TS 26.244, "Transparent end-to-end packet switched
3170	             streaming service (PSS); 3GPP file format (3GP)", v12.20,
3171	             December 2013.

3173	   [Girod99] Girod, B. and Faerber, F., "Feedback-based error control
3174	             for mobile video transmission", Proceedings IEEE, Vol. 87,
3175	             No. 10, pp. 1707-1723, October 1999.

3177	   [I-D.ietf-avt-srtp-not-mandatory]
3178	             Perkins, C. and M. Westerlund, "Securing the RTP
3179	             Protocol Framework: Why RTP Does Not Mandate a Single
3180	             Media Security Solution", draft-ietf-avt-srtp-not-
3181	             mandatory-16 (work in progress), January 2014.

3183	   [I-D.ietf-avtcore-rtp-security-options]
3184	             Westerlund, M. and C. Perkins, "Options for Securing RTP
3185	             Sessions", draft-ietf-avtcore-rtp-security-options-10
3186	             (work in progress), January 2014.

3188	   [I-D.ietf-avtcore-rtp-multi-stream]
3189	             Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
3190	             "Sending Multiple Media Streams in a Single RTP Session",
3191	             draft-ietf-avtcore-rtp-multi-stream-01 (work in progress),
3192	             July 2013.

3194	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
3195	             Holmberg, C., Alvestrand, H., and C. Jennings,
3196	             "Multiplexing Negotiation Using Session Description
3197	             Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
3198	             bundle-negotiation-05 (work in progress), October 2013.

3200	   [ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology -
3201	             Coding of audio-visual objects - Part 12: ISO base media
3202	             file format" | "Information technology - JPEG 2000 image
3203	             coding system - Part 12: ISO base media file format",
3204	             2012.

3206	   [JCTVC-J0107] Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian,
3207	             K., "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107,
3208	             10th JCT-VC meeting, July 2012, Stockholm, Sweden.

3210	   [MPEG2S]  ISO/IEC 13818-1, "Information technology - Generic coding
3211	             of moving pictures and associated audio information:
3212	             Systems", 2013.

3214	   [MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic
3215	             adaptive streaming over HTTP (DASH) - Part 1: Media
3216	             presentation description and segment formats", 2012.

3218	   [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
3219	             Correction", RFC 5109, December 2007.

3221	   [Wang05]  Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video
3222	             coding using flexible reference frames", Visual
3223	             Communications and Image Processing 2005 (VCIP 2005), July
3224	             2005, Beijing, China.

3226	14. Authors' Addresses

3228	   Ye-Kui Wang
3229	   Qualcomm Incorporated
3230	   5775 Morehouse Drive
3231	   San Diego, CA 92121
3232	   USA
3233	   Phone: +1-858-651-8345
3234	   EMail: yekuiw@qti.qualcomm.com

3236	   Yago Sanchez
3237	   Fraunhofer HHI
3238	   Einsteinufer 37
3239	   D-10587 Berlin
3240	   Germany
3241	   Phone: +49-30-31002-227
3242	   EMail: yago.sanchez@hhi.fraunhofer.de

3244	   Thomas Schierl
3245	   Fraunhofer HHI
3246	   Einsteinufer 37
3247	   D-10587 Berlin
3248	   Germany
3249	   Phone: +49-30-31002-227
3250	   EMail: ts@thomas-schierl.de

3252	   Stephan Wenger
3253	   Vidyo, Inc.
3254	   433 Hackensack Ave., 7th floor
3255	   Hackensack, N.J. 07601
3256	   USA
3257	   Phone: +1-415-713-5473
3258	   EMail: stewe@stewe.org

3260	   Miska M. Hannuksela
3261	   Nokia Corporation
3262	   P.O. Box 1000
3263	   33721 Tampere
3264	   Finland
3265	   Phone: +358-7180-08000
3266	   EMail: miska.hannuksela@nokia.com