idnits 2.17.1 

draft-ietf-payload-rtp-h265-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 165 instances of weird spacing in the document.  Is it really
     formatted ragged-right, rather than justified?

  ** There are 3 instances of too long lines in the document, the longest one
     being 14 characters in excess of 72.

  ** The abstract seems to contain references ([HEVC]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 27 has weird spacing: '...   at  any  ti...'

  == Line 30 has weird spacing: '...   The  list  ...'

  == Line 45 has weird spacing: '...fo)  in  effec...'

  == Line 46 has weird spacing: '...ication  of  t...'

  == Line 47 has weird spacing: '...ly,  as  they ...'

  == (160 more instances...)

  -- The document date (September 6, 2013) is 3885 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '3GP' is mentioned on line 266, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 996

  == Missing Reference: 'RFC5234' is mentioned on line 2133, but not defined

  == Missing Reference: 'RFC5117' is mentioned on line 2318, but not defined

  ** Obsolete undefined reference: RFC 5117 (Obsoleted by RFC 7667)

  == Missing Reference: 'RFC2326' is mentioned on line 2543, but not defined

  ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826)

  == Missing Reference: 'RFC2974' is mentioned on line 2544, but not defined

  == Missing Reference: 'RFC5583' is mentioned on line 2594, but not defined

  == Missing Reference: 'RFC3551' is mentioned on line 2754, but not defined

  == Missing Reference: 'RFC3711' is mentioned on line 2754, but not defined

  == Missing Reference: 'RFC5124' is mentioned on line 2755, but not defined

  == Missing Reference: 'I-D.ietf-avt-srtp-not-mandatory' is mentioned on
     line 2757, but not defined

  == Missing Reference: 'I-D.ietf-avtcore-rtp-security-options' is mentioned
     on line 2764, but not defined

  == Missing Reference: 'RFC 3711' is mentioned on line 2780, but not defined

  == Missing Reference: 'RFC 3551' is mentioned on line 2804, but not defined

  == Unused Reference: 'RFC6051' is defined on line 2887, but no explicit
     reference was found in the text

  == Unused Reference: '3GPPFF' is defined on line 2927, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5109' is defined on line 2943, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)


     Summary: 6 errors (**), 0 flaws (~~), 24 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Y.-K. Wang
2	Internet Draft                                                 Qualcomm
3	Intended status: Standards track                             Y. Sanchez
4	Expires: March 2014                                          T. Schierl
5	                                                         Fraunhofer HHI
6	                                                              S. Wenger
7	                                                                  Vidyo
8	                                                       M. M. Hannuksela
9	                                                                  Nokia
10	                                                      September 6, 2013

12	            RTP Payload Format for High Efficiency Video Coding
13	                    draft-ietf-payload-rtp-h265-01.txt

15	Status of this Memo

17	   This Internet-Draft is submitted to IETF in full conformance with
18	   the provisions of BCP 78 and BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF), its areas, and its working groups.  Note that
22	   other groups may also distribute working documents as Internet-
23	   Drafts.

25	   Internet-Drafts are draft documents valid for a maximum of six
26	   months and may be updated, replaced, or obsoleted by other documents
27	   at any time.  It is inappropriate to use Internet-Drafts as
28	   reference material or to cite them other than as "work in progress."

30	   The list of current Internet-Drafts can be accessed at
31	   http://www.ietf.org/ietf/1id-abstracts.txt.

33	   The list of Internet-Draft Shadow Directories can be accessed at
34	   http://www.ietf.org/shadow.html.

36	   This Internet-Draft will expire on December 11, 2013.

38	Copyright and License Notice

40	   Copyright (c) 2013 IETF Trust and the persons identified as the
41	   document authors.  All rights reserved.

43	   This document is subject to BCP 78 and the IETF Trust's Legal
44	   Provisions Relating to IETF Documents
45	   (http://trustee.ietf.org/license-info) in effect on the date of
46	   publication of this document.  Please review these documents
47	   carefully, as they describe your rights and restrictions with
48	   respect to this document.  Code Components extracted from this
49	   document must include Simplified BSD License text as described in
50	   Section 4.e of the Trust Legal Provisions and are provided without
51	   warranty as described in the Simplified BSD License.

53	Abstract

55	   This memo describes an RTP payload format for the video coding
56	   standard ITU-T Recommendation H.265 and ISO/IEC International
57	   Standard 23008-2, both also known as High Efficiency Video Coding
58	   (HEVC), developed by the Joint Collaborative Team on Video
59	   Coding (JCT-VC).  The RTP payload format allows for packetization of
60	   one or more Network Abstraction Layer (NAL) units in each RTP packet
61	   payload, as well as fragmentation of a NAL unit into multiple RTP
62	   packets.  Furthermore, it supports transmission of an HEVC stream
63	   over a single as well as multiple RTP flows.  The payload format has
64	   wide applicability in videoconferencing, Internet video streaming,
65	   and high bit-rate entertainment-quality video, among others.

67	Table of Contents

69	   Status of this Memo...............................................1
70	   Abstract..........................................................3
71	   Table of Contents.................................................3
72	   1. Introduction.....................................................5
73	      1.1. Overview of the HEVC Codec..................................5
74	         1.1.1 Coding-Tool Features..................................5
75	         1.1.2 Systems and Transport Interfaces......................7
76	         1.1.3 Parallel Processing Support..........................13
77	         1.1.4 NAL Unit Header......................................15
78	      1.2. Overview of the Payload Format.............................17
79	   2. Conventions.....................................................17
80	   3. Definitions and Abbreviations...................................17
81	      3.1 Definitions...............................................17
82	         3.1.1 Definitions from the HEVC Specification..............18
83	         3.1.2 Definitions Specific to This Memo....................19
84	      3.2 Abbreviations.............................................20
85	   4. RTP Payload Format..............................................22
86	      4.1 RTP Header Usage..........................................22
87	      4.2 Payload Structures........................................23
88	      4.3 Transmission Modes........................................24
89	      4.4 Decoding Order Number.....................................25
90	      4.5 Single NAL Unit Packets...................................27
91	      4.6 Aggregation Packets (APs).................................27
92	      4.7 Fragmentation Units (FUs).................................32
93	   5. Packetization Rules.............................................36
94	   6. De-packetization Process........................................37
95	   7. Payload Format Parameters.......................................38
96	      7.1 Media Type Registration...................................39
97	      7.2 SDP Parameters............................................52
98	         7.2.1 Mapping of Payload Type Parameters to SDP............53
99	         7.2.2 Usage with SDP Offer/Answer Model....................54
100	         7.2.3 Usage in Declarative Session Descriptions............58
101	         7.2.4 Dependency Signaling in Multi-Session Transmission...60
102	   8. Use with Feedback Messages......................................60
103	      8.1 Definition of the SPLI Feedback Message...................62
104	      8.2 Use of HEVC with the RPSI Feedback Message................63
105	      8.3 Use of HEVC with the SPLI Feedback Message................63
106	   9. Security Considerations.........................................63
107	   10. Congestion Control.............................................65
108	   11. IANA Consideration.............................................66
109	   12. Acknowledgements...............................................66
110	   13. References.....................................................66
111	      13.1 Normative References.....................................66
112	      13.2 Informative References...................................67
113	   14. Authors' Addresses.............................................68

115	1. Introduction

117	1.1. Overview of the HEVC Codec

119	   High Efficiency Video Coding [HEVC], formally known as ITU-T
120	   Recommendation H.265 and ISO/IEC International Standard 23008-2, was
121	   ratified by ITU-T in April 2013 and reportedly provides significant
122	   coding efficiency gains over H.264 [H.264].

124	   As both H.264 [H.264] and its RTP payload format [RFC6184] are
125	   widely deployed and generally known in the relevant implementer
126	   communities, frequently only the differences between those two
127	   specifications are highlighted in non-normative, explanatory parts
128	   of this memo.  Basic familiarity with both specifications is assumed
129	   for those parts.  However, the normative parts of this memo do not
130	   require study of H.264 or its RTP payload format.

132	   H.264 and HEVC share a similar hybrid video codec design.
133	   Conceptually, both technologies include a video coding layer (VCL),
134	   which is often used to refer to the coding-tool features, and a
135	   network abstraction layer (NAL), which is often used to refer to the
136	   systems and transport interface aspects of the codecs.

138	1.1.1 Coding-Tool Features

140	   Similarly to earlier hybrid-video-coding-based standards, including
141	   H.264, the following basic video coding design is employed by HEVC.
142	   A prediction signal is first formed either by intra or motion
143	   compensated prediction, and the residual (the difference between the
144	   original and the prediction) is then coded.  The gains in coding
145	   efficiency are achieved by redesigning and improving almost all
146	   parts of the codec over earlier designs.  In addition, HEVC includes
147	   several tools to make the implementation on parallel architectures
148	   easier.  Below is a summary of HEVC coding-tool features.

150	   Quad-tree block and transform structure

152	   One of the major tools that contribute significantly to the coding
153	   efficiency of HEVC is the usage of flexible coding blocks and
154	   transforms, which are defined in a hierarchical quad-tree manner.
155	   Unlike H.264, where the basic coding block is a macroblock of fixed
156	   size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size
157	   of 64x64.  Each CTU can be divided into smaller units in a
158	   hierarchical quad-tree manner and can represent smaller blocks down
159	   to size 4x4.  Similarly, the transforms used in HEVC can have
160	   different sizes, starting from 4x4 and going up to 32x32.  Utilizing
161	   large blocks and transforms contributes a major part of the coding
162	   gain of HEVC, especially at high resolutions.
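   The recursive quad-tree partitioning of a CTU can be sketched as
   follows.  This is a minimal illustration, not HEVC syntax: the
   function quadtree_leaves and its should_split callback are
   hypothetical stand-ins for the split decisions that an encoder makes
   (e.g., by rate-distortion optimization) and that a decoder reads
   from split flags in the bitstream.

```python
from typing import Callable, List, Tuple

# A block is (x, y, size), with x, y the top-left luma sample position.
Block = Tuple[int, int, int]


def quadtree_leaves(x: int, y: int, size: int, min_size: int,
                    should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square block into four equal quadrants.

    Recursion stops when should_split() returns False or the block has
    reached min_size.  Leaves are returned in z-scan order.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        for dy in (0, half):          # top row of quadrants first,
            for dx in (0, half):      # left quadrant before right
                leaves.extend(quadtree_leaves(x + dx, y + dy, half,
                                              min_size, should_split))
        return leaves
    return [(x, y, size)]
```

   For example, a rule that splits a 64x64 CTU down to 16x16 everywhere
   except in the top-left 32x32 quadrant yields one 32x32 leaf followed
   by twelve 16x16 leaves, covering the CTU exactly once.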

164	   Entropy coding

166	   HEVC uses a single entropy coding engine, which is based on Context
167	   Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two
168	   distinct entropy coding engines.  CABAC in HEVC shares many
169	   similarities with CABAC of H.264, but contains several improvements.
170	   Those include improvements in coding efficiency and lowered
171	   implementation complexity, especially for parallel architectures.

173	   In-loop filtering

175	   H.264 includes an in-loop adaptive deblocking filter, where the
176	   blocking artifacts around the transform edges in the reconstructed
177	   picture are smoothed to improve the picture quality and compression
178	   efficiency.  In HEVC, a similar deblocking filter is employed but
179	   with somewhat lower complexity.  In addition, pictures undergo a
180	   subsequent filtering operation called Sample Adaptive Offset (SAO),
181	   which is a new design element in HEVC.  SAO basically adds a pixel-
182	   level offset in an adaptive manner and usually acts as a de-ringing
183	   filter.  It is observed that SAO improves the picture quality,
184	   especially around sharp edges, contributing substantially to visual
185	   quality improvements of HEVC.

187	   Motion prediction and coding

189	   There have been a number of improvements in this area that are
190	   summarized as follows.  The first category is motion merge and
191	   advanced motion vector prediction (AMVP) modes.  The motion
192	   information of a prediction block can be inferred from the spatially
193	   or temporally neighboring blocks.  This is similar to the DIRECT
194	   mode in H.264 but includes new aspects to incorporate the flexible
195	   quad-tree structure and methods to improve the parallel
196	   implementations.  In addition, the motion vector predictor can be
197	   signaled for improved efficiency.  The second category is high-
198	   precision interpolation.  The interpolation filter length is
199	   increased to 8-tap from 6-tap, which improves the coding efficiency
200	   but also comes with increased complexity.  In addition, the
201	   interpolation filter is defined with higher precision without any
202	   intermediate rounding operations to further improve the coding
203	   efficiency.
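   To make the 8-tap interpolation concrete, the sketch below applies
   the HEVC luma half-sample filter taps (-1, 4, -11, 40, 40, -11, 4,
   -1) horizontally to one row of 8-bit samples.  It assumes
   uni-directional prediction with default weighting, so the filter
   output is kept unrounded and rounded only once at the end; picture
   edges are padded by sample replication.  The function name is
   hypothetical, and only the horizontal half-sample position is shown.

```python
from typing import List

# HEVC luma half-sample interpolation filter taps; they sum to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def half_pel_row(row: List[int]) -> List[int]:
    """Return the half-sample values between row[i] and row[i + 1].

    The taps cover samples i-3 .. i+4; positions outside the row use the
    nearest edge sample.  Input and output are 8-bit samples.
    """
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            j = min(max(i - 3 + k, 0), n - 1)  # edge replication
            acc += tap * row[j]
        # acc is the unrounded intermediate value (scaled by 64);
        # round once, scale back and clip to the 8-bit range.
        out.append(min(max((acc + 32) >> 6, 0), 255))
    return out
```

   For a row of four samples of value 0 followed by four of value 255,
   the half-sample position between the two groups evaluates to 128.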

205	   Intra prediction and intra coding

207	   Compared to 8 intra prediction modes in H.264, HEVC supports angular
208	   intra prediction with 33 directions.  This increased flexibility
209	   improves both objective coding efficiency and visual quality as the
210	   edges can be better predicted and ringing artifacts around the edges
211	   can be reduced.  In addition, the reference samples are adaptively
212	   smoothed based on the prediction direction.  To avoid contouring
213	   artifacts, a new interpolative prediction generation is included to
214	   improve the visual quality.  Furthermore, discrete sine transform
215	   (DST) is utilized instead of traditional discrete cosine transform
216	   (DCT) for 4x4 intra transform blocks.

218	   Other coding-tool features

220	   HEVC includes some tools for lossless coding and efficient screen
221	   content coding, such as skipping the transform for certain blocks.
222	   These tools are particularly useful, for example, when streaming the
223	   user interface of a mobile device to a large display.

225	1.1.2 Systems and Transport Interfaces

227	   HEVC inherited the basic systems and transport interface designs,
228	   such as the NAL-unit-based syntax structure, the hierarchical syntax
229	   and data unit structure from sequence-level parameter sets, multi-
230	   picture-level or picture-level parameter sets, slice-level header
231	   parameters, lower-level parameters, the supplemental enhancement
232	   information (SEI) message mechanism, the hypothetical reference
233	   decoder (HRD) based video buffering model, and so on.  In the
234	   following, a list of differences in these aspects compared to H.264
235	   is summarized.

237	   Video parameter set

239	   A new type of parameter set, called video parameter set (VPS), was
240	   introduced.  For the first (2013) version of [HEVC], the video
241	   parameter set NAL unit is required to be available prior to its
242	   activation, while the information contained in the video parameter
243	   set is not necessary for operation of the decoding process.  For
244	   future HEVC extensions, such as the 3D or scalable extensions, the
245	   video parameter set is expected to include information necessary for
246	   operation of the decoding process, e.g., decoding dependency or
247	   information for reference picture set construction of enhancement
248	   layers.  The VPS provides a "big picture" of a bitstream, including
249	   what types of operation points are provided, the profile, tier, and
250	   level of the operation points, and some other high-level properties
251	   of the bitstream that can be used as the basis for session
252	   negotiation and content selection, etc. (see section 7.1).

254	   Profile, tier and level

256	   The profile, tier and level syntax structure that can be included in
257	   both VPS and sequence parameter set (SPS) includes 12 bytes of data
258	   to describe the entire bitstream (including all temporally scalable
259	   layers, which are referred to as sub-layers in the HEVC
260	   specification), and can optionally include more profile, tier and
261	   level information pertaining to individual temporally scalable
262	   layers.  The profile indicator indicates the "best viewed as"
263	   profile when the bitstream conforms to multiple profiles, similar to
264	   the major brand concept in the ISO base media file format (ISOBMFF)
265	   [ISOBMFF] and file formats derived based on ISOBMFF, such as the
266	   3GPP file format [3GP].  The profile, tier and level syntax
267	   structure also includes the indications of whether the bitstream is
268	   free of frame-packed content, whether the bitstream is free of
269	   interlaced source content and free of field pictures, i.e., contains
270	   only frame pictures of progressive source, such that clients/players
271	   with no support of post-processing functionalities for handling of
272	   frame-packed or interlaced source content or field pictures can
273	   reject those bitstreams.
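   The 12 bytes of general profile, tier and level data can be parsed
   as in the following sketch.  The field layout reflects the version 1
   syntax (2-bit profile space, tier flag, 5-bit profile indicator, 32
   compatibility flags, four source/constraint flags, 44 reserved bits,
   and an 8-bit level indicator); the class and function names are
   hypothetical, and emulation prevention bytes are assumed to have
   been removed already.

```python
from dataclasses import dataclass


@dataclass
class GeneralPTL:
    profile_space: int
    tier_flag: int
    profile_idc: int
    compatibility_flags: int  # 32 bits, flag j stored at bit (31 - j)
    progressive_source_flag: int
    interlaced_source_flag: int
    non_packed_constraint_flag: int
    frame_only_constraint_flag: int
    level_idc: int  # 30 times the level number, e.g., 93 for level 3.1


def parse_general_ptl(data: bytes) -> GeneralPTL:
    """Parse the 12-byte general profile, tier and level fields."""
    if len(data) < 12:
        raise ValueError("need 12 bytes of general profile/tier/level data")
    b0, b5 = data[0], data[5]
    return GeneralPTL(
        profile_space=b0 >> 6,
        tier_flag=(b0 >> 5) & 1,
        profile_idc=b0 & 0x1F,
        compatibility_flags=int.from_bytes(data[1:5], "big"),
        progressive_source_flag=(b5 >> 7) & 1,
        interlaced_source_flag=(b5 >> 6) & 1,
        non_packed_constraint_flag=(b5 >> 5) & 1,
        frame_only_constraint_flag=(b5 >> 4) & 1,
        # bytes 6..10 and the low nibble of byte 5 are the reserved bits
        level_idc=data[11],
    )


def progressive_frames_only(ptl: GeneralPTL) -> bool:
    """True if the flags promise progressive frame pictures only and no
    frame-packed content."""
    return (ptl.progressive_source_flag == 1
            and ptl.interlaced_source_flag == 0
            and ptl.non_packed_constraint_flag == 1
            and ptl.frame_only_constraint_flag == 1)
```

   A receiver without post-processing for interlaced or frame-packed
   content could, for example, reject a bitstream for which
   progressive_frames_only() returns False.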

275	   Bitstream and elementary stream

277	   HEVC includes a definition of an elementary stream, which is new
278	   compared to H.264.  An elementary stream consists of a sequence of
279	   one or more bitstreams.  An elementary stream that consists of two
280	   or more bitstreams has typically been formed by splicing together
281	   two or more bitstreams (or parts thereof).  When an elementary
282	   stream contains more than one bitstream, the last NAL unit of the
283	   last access unit of a bitstream (except the last bitstream in the
284	   elementary stream) must be an end of bitstream NAL unit and the
285	   first access unit of the subsequent bitstream must be an intra
286	   random access point (IRAP) access unit.  This IRAP access unit may
287	   be a clean random access (CRA), broken link access (BLA), or
288	   instantaneous decoding refresh (IDR) access unit.
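   The splicing constraint above can be checked with a few lines of
   code.  In this illustrative sketch, each access unit is represented
   by the list of its nal_unit_type values in decoding order; the
   values 37 (end of bitstream) and 16 to 23 (IRAP) are taken from the
   version 1 NAL unit type table, and the function names are
   hypothetical.

```python
from typing import List, Sequence

EOB_NUT = 37                 # end of bitstream NAL unit type
IRAP_TYPES = range(16, 24)   # BLA_W_LP (16) .. RSV_IRAP_VCL23 (23)


def first_vcl_type(access_unit: Sequence[int]) -> int:
    """Return the nal_unit_type of the first VCL NAL unit (types 0..31)."""
    for nut in access_unit:
        if nut < 32:
            return nut
    raise ValueError("access unit has no VCL NAL unit")


def check_bitstream_boundaries(access_units: List[Sequence[int]]) -> List[int]:
    """Return indices of access units that violate the splicing rule.

    After an access unit whose last NAL unit is an end of bitstream NAL
    unit, the next access unit must be an IRAP access unit.
    """
    violations = []
    for i in range(len(access_units) - 1):
        if access_units[i] and access_units[i][-1] == EOB_NUT:
            if first_vcl_type(access_units[i + 1]) not in IRAP_TYPES:
                violations.append(i + 1)
    return violations
```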

290	   Random access support

292	   HEVC includes signaling in the NAL unit header, through NAL unit
293	   types, of IRAP pictures beyond IDR.  Three types of IRAP pictures,
294	   namely IDR, CRA and BLA pictures, are supported, wherein IDR pictures
295	   are conventionally referred to as closed group-of-pictures (closed-
296	   GOP) random access points, and CRA and BLA pictures are those
297	   conventionally referred to as open-GOP random access points.  BLA
298	   pictures usually originate from splicing of two bitstreams or part
299	   thereof at a CRA picture, e.g., during stream switching.  To enable
300	   better systems usage of IRAP pictures, altogether six different NAL
301	   unit types are defined to signal the properties of the IRAP pictures,
302	   which can be used to better match the stream access point (SAP)
303	   types as defined in the ISOBMFF [ISOBMFF], which are utilized for
304	   random access support in both 3GP-DASH [3GPDASH] and MPEG DASH
305	   [MPEGDASH].  Pictures following an IRAP picture in decoding order
306	   and preceding the IRAP picture in output order are referred to as
307	   leading pictures associated with the IRAP picture.  There are two
308	   types of leading pictures, namely random access decodable leading
309	   (RADL) pictures and random access skipped leading (RASL) pictures.
310	   RADL pictures are decodable when the decoding started at the
311	   associated IRAP picture, and RASL pictures are not decodable when
312	   the decoding started at the associated IRAP picture and are usually
313	   discarded.  HEVC provides mechanisms to enable the specification of
314	   conformance of bitstreams with RASL pictures being discarded, thus
315	   to provide a standard-compliant way to enable systems components to
316	   discard RASL pictures when needed.
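   The following sketch shows how a receiver that starts decoding at an
   IRAP picture might discard RASL pictures that cannot be decoded.  It
   works on nal_unit_type values only, uses the version 1 type values
   (RASL_N = 8, RASL_R = 9, BLA types 16 to 18, IRAP types 16 to 23),
   and simplifies the normative rules: RASL pictures are dropped when
   associated with the IRAP picture at which decoding started or with a
   BLA picture, and kept when associated with a later CRA picture.  The
   function names are hypothetical.

```python
from typing import List

# nal_unit_type values from the HEVC version 1 NAL unit type table.
RADL_N, RADL_R, RASL_N, RASL_R = 6, 7, 8, 9
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP, CRA_NUT = 19, 20, 21


def is_irap(nut: int) -> bool:
    """IRAP pictures use types 16..23 (22 and 23 are reserved)."""
    return BLA_W_LP <= nut <= 23


def is_bla(nut: int) -> bool:
    return BLA_W_LP <= nut <= BLA_N_LP


def drop_rasl_after_random_access(pictures: List[int]) -> List[int]:
    """Drop undecodable RASL pictures from a list of nal_unit_type values.

    pictures is in decoding order and must start with the IRAP picture
    at which decoding begins.
    """
    if not pictures or not is_irap(pictures[0]):
        raise ValueError("decoding must start at an IRAP picture")
    kept = [pictures[0]]
    # RASL pictures of the starting IRAP reference pictures never received.
    drop_rasl = True
    for nut in pictures[1:]:
        if is_irap(nut):
            # RASL pictures of a BLA picture are never decodable; those of
            # a later CRA picture are, because decoding began earlier.
            drop_rasl = is_bla(nut)
        elif drop_rasl and nut in (RASL_N, RASL_R):
            continue
        kept.append(nut)
    return kept
```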

318	   Temporal scalability support

320	   HEVC includes improved support for temporal scalability, by
321	   inclusion of the signaling of TemporalId in the NAL unit header, the
322	   restriction that pictures of a particular temporal sub-layer cannot
323	   be used for inter prediction reference by pictures of a lower
324	   temporal sub-layer, the sub-bitstream extraction process, and the
325	   requirement that each sub-bitstream extraction output be a
326	   conforming bitstream.  Media-aware network elements (MANEs) can
327	   utilize the TemporalId in the NAL unit header for stream adaptation
328	   purposes based on temporal scalability.
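   A MANE performing temporal stream adaptation might thin a sequence
   of NAL units as sketched below.  The sketch reads TemporalId from the
   three least significant bits of the second NAL unit header byte
   (nuh_temporal_id_plus1 minus 1) and ignores the layer ID, which is
   sufficient for single-layer version 1 bitstreams; the function names
   are hypothetical.

```python
from typing import Iterable, List


def temporal_id(nal: bytes) -> int:
    """TemporalId is nuh_temporal_id_plus1 (low 3 bits of byte 2) minus 1."""
    return (nal[1] & 0x07) - 1


def thin_to_temporal_layer(nal_units: Iterable[bytes], max_tid: int) -> List[bytes]:
    """Keep NAL units whose TemporalId does not exceed max_tid.

    Pictures never reference pictures of a higher sub-layer, so the
    remaining NAL units still form a decodable sub-bitstream.
    """
    return [nal for nal in nal_units if temporal_id(nal) <= max_tid]
```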

330	   Temporal sub-layer switching support

332	   HEVC specifies, through NAL unit types present in the NAL unit
333	   header, the signaling of temporal sub-layer access (TSA) and
334	   stepwise temporal sub-layer access (STSA).  A TSA picture and
335	   pictures following the TSA picture in decoding order do not use
336	   pictures prior to the TSA picture in decoding order with TemporalId
337	   greater than or equal to that of the TSA picture for inter
338	   prediction reference.  A TSA picture enables up-switching, at the
339	   TSA picture, to the sub-layer containing the TSA picture or any
340	   higher sub-layer, from the immediately lower sub-layer.  An STSA
341	   picture does not use pictures with the same TemporalId as the STSA
342	   picture for inter prediction reference.  Pictures following an STSA
343	   picture in decoding order with the same TemporalId as the STSA
344	   picture do not use pictures prior to the STSA picture in decoding
345	   order with the same TemporalId as the STSA picture for inter
346	   prediction reference.  An STSA picture enables up-switching, at the
347	   STSA picture, to the sub-layer containing the STSA picture, from the
348	   immediately lower sub-layer.
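   The up-switching rules for TSA and STSA pictures can be summarized
   as a small decision function.  This sketch follows the description
   in the text: only TSA (types 2 and 3) and STSA (types 4 and 5)
   pictures are treated as switching points, and the switch is from the
   sub-layer immediately below the picture's TemporalId.  The function
   and parameter names are hypothetical.

```python
TSA_N, TSA_R, STSA_N, STSA_R = 2, 3, 4, 5


def can_switch_up(nut: int, tid: int, current_max_tid: int,
                  target_max_tid: int) -> bool:
    """Whether forwarding can move from sub-layers 0..current_max_tid to
    0..target_max_tid at a picture with type nut and TemporalId tid."""
    # Up-switching happens from the immediately lower sub-layer.
    if target_max_tid <= current_max_tid or tid != current_max_tid + 1:
        return False
    if nut in (TSA_N, TSA_R):
        # TSA: the picture's own sub-layer or any higher one.
        return target_max_tid >= tid
    if nut in (STSA_N, STSA_R):
        # STSA: only the sub-layer containing the picture.
        return target_max_tid == tid
    return False
```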

350	   Sub-layer reference or non-reference pictures

352	   The concept and signaling of reference/non-reference pictures in
353	   HEVC are different from H.264.  In H.264, if a picture may be used
354	   by any other picture for inter prediction reference, it is a
355	   reference picture; otherwise it is a non-reference picture, and this
356	   is signaled by two bits in the NAL unit header.  In HEVC, a picture
357	   is called a reference picture only when it is marked as "used for
358	   reference".  In addition, the concept of sub-layer reference picture
359	   was introduced.  If a picture may be used by any other picture
360	   with the same TemporalId for inter prediction reference, it is a
361	   sub-layer reference picture; otherwise it is a sub-layer non-
362	   reference picture.  Whether a picture is a sub-layer reference
363	   picture or sub-layer non-reference picture is signaled through NAL
364	   unit type values.
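   Because sub-layer non-reference pictures are identified by NAL unit
   type alone, a MANE can classify them without parsing slice data.  In
   the version 1 table they are the even VCL types 0 to 14 (TRAIL_N,
   TSA_N, STSA_N, RADL_N, RASL_N, and three reserved types).  The sketch
   below, with hypothetical function names, also encodes a consequence:
   such a picture can be discarded without affecting other pictures
   when it belongs to the highest sub-layer being forwarded, since
   pictures of lower sub-layers never reference it.

```python
def is_sub_layer_non_reference(nal_unit_type: int) -> bool:
    """True for TRAIL_N, TSA_N, STSA_N, RADL_N, RASL_N and the reserved
    RSV_VCL_N10/N12/N14 types, i.e., the even types 0..14."""
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0


def can_discard(nal_unit_type: int, tid: int, highest_tid: int) -> bool:
    """A sub-layer non-reference picture in the highest forwarded
    sub-layer is referenced by no forwarded picture."""
    return is_sub_layer_non_reference(nal_unit_type) and tid == highest_tid
```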

366	   Extensibility

368	   Besides the TemporalId in the NAL unit header, HEVC also includes
369	   the signaling of a six-bit layer ID in the NAL unit header, which
370	   must be equal to 0 for a single-layer bitstream.  Extension
371	   mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice
372	   headers, and so on.  All these extension mechanisms enable future
373	   extensions in a backward compatible manner, such that bitstreams
374	   encoded according to potential future HEVC extensions can be fed to
375	   then-legacy decoders (e.g., HEVC version 1 decoders) and the then-
376	   legacy decoders can decode and output the base layer bitstream.
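   The six-bit layer ID and the TemporalId share the two-byte NAL unit
   header with a forbidden bit and the six-bit NAL unit type.  The
   following sketch parses that header; the names are hypothetical.  A
   version 1 decoder discards NAL units with nuh_layer_id greater than
   0, which is how the base layer remains decodable by then-legacy
   decoders.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int  # 6 bits
    nuh_layer_id: int   # 6 bits; 0 in single-layer bitstreams
    temporal_id: int    # nuh_temporal_id_plus1 - 1


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Parse the 16-bit header: F(1) | Type(6) | LayerId(6) | TID(3)."""
    if len(data) < 2:
        raise ValueError("the HEVC NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    tid_plus1 = b1 & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return NalUnitHeader(
        forbidden_zero_bit=b0 >> 7,
        nal_unit_type=(b0 >> 1) & 0x3F,
        # layer ID: last bit of byte 1 followed by top 5 bits of byte 2
        nuh_layer_id=((b0 & 0x01) << 5) | (b1 >> 3),
        temporal_id=tid_plus1 - 1,
    )
```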

378	   Bitstream extraction

380	   HEVC includes a bitstream extraction process as an integral part of
381	   the overall decoding process, as well as specification of the use of
382	   the bitstream extraction process in description of bitstream
383	   conformance tests as part of the hypothetical reference decoder
384	   (HRD) specification.

386	   Reference picture management

388	   The reference picture management of HEVC, including reference
389	   picture marking and removal from the decoded picture buffer (DPB) as
390	   well as reference picture list construction (RPLC), differs from
391	   that of H.264.  Instead of the sliding window plus adaptive memory
392	   management control operation (MMCO) based reference picture marking
393	   mechanism in H.264, HEVC specifies a reference picture set (RPS)
394	   based reference picture management and marking mechanism, and the
395	   RPLC is consequently based on the RPS mechanism.  A reference
396	   picture set is a set of reference pictures associated with
397	   a picture, consisting of all reference pictures that are prior to
398	   the associated picture in decoding order, that may be used for inter
399	   prediction of the associated picture or any picture following the
400	   associated picture in decoding order.  The reference picture set
401	   consists of five lists of reference pictures: RefPicSetStCurrBefore,
402	   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
403	   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter and
404	   RefPicSetLtCurr contain all reference pictures that may be used in
405	   inter prediction of the current picture and that may be used in
406	   inter prediction of one or more of the pictures following the
407	   current picture in decoding order.  RefPicSetStFoll and
408	   RefPicSetLtFoll consist of all reference pictures that are not used
409	   in inter prediction of the current picture but may be used in inter
410	   prediction of one or more of the pictures following the current
411	   picture in decoding order.  RPS provides an "intra-coded" signaling
412	   of the DPB status, instead of an "inter-coded" signaling, mainly for
413	   improved error resilience.  The RPLC process in HEVC is based on the
414	   RPS, by signaling an index to an RPS subset for each reference
415	   index.  The RPLC process has been simplified compared to that in
416	   H.264, by removal of the reference picture list modification (also
417	   referred to as reference picture list reordering) process.
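   The assignment of reference pictures to the five RPS lists can be
   sketched as follows.  Each hypothetical RpsEntry carries the picture
   order count (POC) of a reference picture, whether it is a long-term
   entry, and whether it may be used by the current picture; short-term
   entries used by the current picture are split by whether they
   precede or follow it in output order.  The sketch keeps entries in
   input order, whereas [HEVC] derives the order from the signaled RPS.

```python
from dataclasses import dataclass
from typing import Dict, List

RPS_LISTS = ("RefPicSetStCurrBefore", "RefPicSetStCurrAfter",
             "RefPicSetStFoll", "RefPicSetLtCurr", "RefPicSetLtFoll")


@dataclass(frozen=True)
class RpsEntry:
    poc: int            # picture order count of the reference picture
    long_term: bool     # True for a long-term entry
    used_by_curr: bool  # may be used for inter prediction of the current picture


def split_rps(current_poc: int, entries: List[RpsEntry]) -> Dict[str, List[int]]:
    """Assign each RPS entry to one of the five lists, returning POCs."""
    lists: Dict[str, List[int]] = {name: [] for name in RPS_LISTS}
    for e in entries:
        if e.long_term:
            name = "RefPicSetLtCurr" if e.used_by_curr else "RefPicSetLtFoll"
        elif e.poc == current_poc:
            raise ValueError("a short-term entry cannot have the current POC")
        elif not e.used_by_curr:
            name = "RefPicSetStFoll"
        elif e.poc < current_poc:
            name = "RefPicSetStCurrBefore"
        else:
            name = "RefPicSetStCurrAfter"
        lists[name].append(e.poc)
    return lists
```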

419	   Ultra low delay support

421	   HEVC specifies a sub-picture-level HRD operation, for support of the
422	   so-called ultra-low delay.  The mechanism specifies a standard-
423	   compliant way to enable delay reduction below one picture interval.
424	   Sub-picture-level coded picture buffer (CPB) and DPB parameters may
425	   be signaled, and utilization of this information for the derivation
426	   of CPB timing (wherein the CPB removal time corresponds to decoding
427	   time) and DPB output timing (display time) is specified.  Decoders
428	   are allowed to operate the HRD at the conventional access-unit-
429	   level, even when the sub-picture-level HRD parameters are present.

431	   New SEI messages

433	   HEVC inherits many H.264 SEI messages with changes in syntax and/or
434	   semantics making them applicable to HEVC.  Additionally, there are a
435	   few new SEI messages reviewed briefly in the following paragraphs.

437	   The structure of pictures SEI message provides information on the
438	   NAL unit types, picture order count values, and prediction
439	   dependencies of a sequence of pictures.  The SEI message can be used,
440	   for example, to determine what impact a lost picture has on other
441	   pictures.
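   As an example of such loss analysis, the sketch below propagates a
   loss through prediction dependencies.  It assumes the dependencies
   have already been derived (for instance, from a structure of
   pictures SEI message) into a hypothetical mapping from each
   picture's POC to the set of POCs it references; the traversal itself
   is a plain breadth-first search.

```python
from collections import deque
from typing import Dict, Iterable, Set


def pictures_affected_by_loss(lost: Iterable[int],
                              refs: Dict[int, Set[int]]) -> Set[int]:
    """Return all pictures that directly or indirectly reference a lost one.

    refs maps each picture's POC to the POCs it uses for inter prediction.
    """
    # Invert the dependency graph: POC -> POCs that reference it.
    referenced_by: Dict[int, Set[int]] = {}
    for pic, deps in refs.items():
        for dep in deps:
            referenced_by.setdefault(dep, set()).add(pic)
    affected: Set[int] = set()
    queue = deque(lost)
    while queue:  # breadth-first traversal
        poc = queue.popleft()
        for child in referenced_by.get(poc, set()):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


if __name__ == "__main__":
    # Hierarchical prediction example: POC -> referenced POCs.
    example = {0: set(), 8: {0}, 4: {0, 8}, 2: {0, 4}, 6: {4, 8},
               1: {0, 2}, 3: {2, 4}, 5: {4, 6}, 7: {6, 8}}
    print(sorted(pictures_affected_by_loss([4], example)))  # [1, 2, 3, 5, 6, 7]
```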

443	   The decoded picture hash SEI message provides a checksum derived
444	   from the sample values of a decoded picture.  It can be used for
445	   detecting whether a picture was correctly received and decoded.
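   For 8-bit content, a receiver could check a decoded picture against
   the MD5 variant of this hash roughly as sketched below, computing one
   digest per color component over its samples in raster-scan order.
   This assumes the 8-bit sample arrangement; higher bit depths use a
   different byte arrangement, and the CRC and checksum variants of the
   SEI message are not shown.  The function names are hypothetical.

```python
import hashlib
from typing import List, Sequence

Plane = Sequence[Sequence[int]]  # rows of 8-bit sample values


def plane_md5(plane: Plane) -> bytes:
    """MD5 digest of one 8-bit color component in raster-scan order."""
    return hashlib.md5(bytes(s for row in plane for s in row)).digest()


def picture_matches_hash(planes: List[Plane], signaled: List[bytes]) -> bool:
    """Compare each decoded component (e.g., Y, Cb, Cr) with its digest."""
    if len(planes) != len(signaled):
        return False
    return all(plane_md5(p) == h for p, h in zip(planes, signaled))
```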

447	   The active parameter sets SEI message includes the IDs of the active
448	   video parameter set and the active sequence parameter set and can be
449	   used to activate VPSs and SPSs.  In addition, the SEI message
450	   includes the following indications: 1) An indication of whether
451	   "full random accessibility" is supported (when supported, all
452	   parameter sets needed for decoding of the remainder of the bitstream
453	   when random accessing from the beginning of the current coded video
454	   sequence by completely discarding all access units earlier in
455	   decoding order are present in the remaining bitstream and all coded
456	   pictures in the remaining bitstream can be correctly decoded); 2) An
457	   indication of whether there is no parameter set within the current
458	   coded video sequence that updates another parameter set of the same
459	   type preceding in decoding order.  An update of a parameter set
460	   refers to the use of the same parameter set ID but with some other
461	   parameters changed.  If this property is true for all coded video
462	   sequences in the bitstream, then all parameter sets can be sent out-
463	   of-band before session start.

465	   The decoding unit information SEI message provides coded picture
466	   buffer removal delay information for a decoding unit.  The message
467	   can be used in very-low-delay buffering operations.

469	   The region refresh information SEI message can be used together with
470	   the recovery point SEI message (present in both H.264 and HEVC) for
471	   improved support of gradual decoding refresh (GDR).  This supports
472	   random access from inter-coded pictures, wherein complete pictures
473	   can be correctly decoded or recovered after an indicated number of
474	   pictures in output/display order.

476	1.1.3 Parallel Processing Support

478	   The reportedly much higher computational demand of HEVC encoding
479	   compared to H.264, in conjunction with the ever-increasing video
480	   resolution (both spatial and temporal) required by the market, led
481	   to the adoption of VCL coding tools specifically targeted to allow
482	   for parallelization on the sub-picture level.  That is,
483	   parallelization occurs, at the minimum, at the granularity of an
484	   integer number of CTUs.  The targets for this type of high-level
485	   parallelization are multicore CPUs and DSPs as well as
486	   multiprocessor systems.  To be useful in a system design, these
487	   tools require signaling support, which is provided in Section 7 of
488	   this memo.  This section provides a brief overview of the tools
489	   available in [HEVC].

491	   Many of the tools incorporated in HEVC were designed with potential
492	   parallel implementations on multi-core/multi-processor
493	   architectures in mind.  Specifically, for parallelization, four
494	   picture partition strategies are available.

496	   Slices are segments of the bitstream that can be reconstructed
497	   independently from other slices within the same picture (though
498	   there may still be interdependencies through loop filtering
499	   operations).  Slices are the only tool that can be used for
500	   parallelization that is also available, in virtually identical form,
501	   in H.264.  Slice-based parallelization does not require much inter-
502	   processor or inter-core communication (except for inter-processor or
503	   inter-core data sharing for motion compensation when decoding a
504	   predictively coded picture, which is typically much heavier than
505	   inter-processor or inter-core data sharing due to in-picture
506	   prediction), as slices are designed to be independently decodable.
507	   However, for the same reason, slices can require some coding
508	   overhead.  Further, slices (in contrast to some of the other tools
509	   mentioned below) also serve as the key mechanism for bitstream
510	   partitioning to match Maximum Transmission Unit (MTU) size
511	   requirements, due to the in-picture independence of slices and the
512	   fact that each regular slice is encapsulated in its own NAL unit.  In
513	   many cases, the goal of parallelization and the goal of MTU size
514	   matching can place contradicting demands on the slice layout in a
515	   picture.  The realization of this situation led to the development
516	   of the more advanced tools mentioned below.  This payload format does
517	   not contain any specific mechanisms aiding parallelization through
518	   slices.

520	   Dependent slice segments allow for fragmentation of a coded slice
521	   into fragments at CTU boundaries without breaking any in-picture
522	   prediction mechanism.  They are complementary to the fragmentation
523	   mechanism described in this memo in that they need the cooperation
524	   of the encoder.  As a dependent slice segment necessarily contains
525	   an integer number of CTUs, a decoder using multiple cores operating
526	   on CTUs can process a dependent slice segment without communicating
527	   parts of the slice segment's bitstream to other cores.
528	   Fragmentation, as specified in this memo, in contrast, does not
529	   guarantee that a fragment contains an integer number of CTUs.

531	   In wavefront parallel processing (WPP), the picture is partitioned
532	   into rows of CTUs.  Entropy decoding and prediction are allowed to
533	   use data from CTUs in other partitions.  Parallel processing is
534	   possible through parallel decoding of CTU rows, where the start of
535	   the decoding of a row is delayed by two CTUs, so as to ensure that
536	   data related to a CTU above and to the right of the subject CTU is
537	   available before the subject CTU is decoded.  Using this staggered
538	   start (which appears like a wavefront when represented
539	   graphically), parallelization is possible with up to as many
540	   processors/cores as the picture contains CTU rows.
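   As a rough illustration of the two-CTU delay, the following Python
   sketch computes the earliest time step at which each CTU can be
   decoded.  It assumes, purely for simplicity, that every CTU takes
   one time step and that one core is assigned per CTU row; the
   function names are hypothetical and not part of [HEVC] or this memo.

```python
"""Wavefront (WPP) schedule sketch under an idealized timing model."""


def wpp_schedule(rows: int, cols: int, lag: int = 2) -> dict[tuple[int, int], int]:
    """Earliest time step (0-based) at which CTU (row, col) can be decoded.

    Assumes one core per CTU row and one time step per CTU.  Row r starts
    'lag' CTUs after row r - 1, so the CTU above and to the right
    (row r - 1, col + 1) is always finished before CTU (r, col) starts.
    """
    return {(r, c): c + lag * r for r in range(rows) for c in range(cols)}


def wpp_total_steps(rows: int, cols: int, lag: int = 2) -> int:
    """Number of time steps needed to decode the whole picture."""
    # The last CTU of the last row starts at (cols - 1) + lag * (rows - 1).
    return cols + lag * (rows - 1)


if __name__ == "__main__":
    print(wpp_total_steps(3, 4))  # 8 steps, versus 3 * 4 = 12 sequentially
```

   Under this model, a picture of 3 CTU rows by 4 CTU columns needs 8
   time steps on 3 cores instead of 12 on one core.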

542	   Because in-picture prediction between neighboring CTU rows within a
543	   picture is allowed, the required inter-processor/inter-core
544	   communication to enable in-picture prediction can be substantial.
545	   The WPP partitioning does not result in the creation of more NAL
546	   units compared to when it is not applied; thus, WPP cannot be used
547	   for MTU size matching, though slices can be used in combination for
548	   that purpose.

550	   Tiles define horizontal and vertical boundaries that partition a
551	   picture into tile columns and rows.  The scan order of CTUs is
552	   changed to be local within a tile (in the order of a CTU raster scan
553	   of a tile), before decoding the top-left CTU of the next tile in the
554	   order of tile raster scan of a picture.  Similar to slices, tiles
555	   break in-picture prediction dependencies (including entropy decoding
556	   dependencies).  However, they do not need to be included in
557	   individual NAL units (same as WPP in this regard); hence, tiles
558	   cannot be used for MTU size matching, though slices can be used in
559	   combination for that purpose.  Each tile can be processed by one
560	   processor/core, and the inter-processor/inter-core communication
561	   required for in-picture prediction between processing units decoding
562	   neighboring tiles is limited to conveying the shared slice header in
563	   cases where a slice spans more than one tile, and loop filtering
564	   related sharing of reconstructed samples and metadata.  Therefore,
565	   tiles are less demanding in terms of inter-processor communication
566	   bandwidth compared to WPP due to the in-picture independence between
567	   two neighboring partitions.
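   The tile scan order described above can be sketched in Python as
   follows.  The inputs are assumptions made for illustration: tile
   column widths and tile row heights are given directly in CTUs,
   standing in for the tile syntax elements that [HEVC] carries in the
   picture parameter set.  The hypothetical function returns the
   picture-raster addresses of the CTUs in tile-scan (decoding) order.

```python
from itertools import accumulate


def tile_scan_order(col_widths: list[int], row_heights: list[int]) -> list[int]:
    """Picture-raster CTU addresses listed in tile-scan order.

    col_widths / row_heights: tile column widths and tile row heights in
    CTUs (illustrative inputs, not the actual PPS syntax).
    """
    pic_width = sum(col_widths)
    col_starts = [0] + list(accumulate(col_widths))[:-1]
    row_starts = [0] + list(accumulate(row_heights))[:-1]
    order = []
    # Tiles are visited in tile raster scan order of the picture ...
    for ty, th in zip(row_starts, row_heights):
        for tx, tw in zip(col_starts, col_widths):
            # ... and CTUs in raster scan order within each tile.
            for y in range(ty, ty + th):
                for x in range(tx, tx + tw):
                    order.append(y * pic_width + x)
    return order
```

   For example, a picture 4 CTUs wide and 2 CTUs high that is split
   into two tile columns of width 2 is decoded in the CTU order 0, 1,
   4, 5, 2, 3, 6, 7.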

569	1.1.4 NAL Unit Header

571	   HEVC maintains the NAL unit concept of H.264 with modifications.
572	   HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
573	   payload of a NAL unit refers to the NAL unit excluding the NAL unit
574	   header.

576	                     +---------------+---------------+
577	                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
578	                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
579	                     |F|   Type    |  LayerId  | TID |
580	                     +-------------+-----------------+

582	              Figure 1 The structure of the HEVC NAL unit header

584	   The semantics of the fields in the NAL unit header are as specified
585	   in [HEVC] and described briefly below for convenience.  In addition
586	   to the name and size of each field, the corresponding syntax element
587	   name in [HEVC] is also provided.

589	   F: 1 bit
590	      forbidden_zero_bit.  MUST be zero.  HEVC declares a value of 1 as
591	      a syntax violation.  Note that the inclusion of this bit in the
592	      NAL unit header is to enable transport of HEVC video over MPEG-2
593	      transport systems (avoidance of start code emulations) [MPEG2S].

595	   Type: 6 bits
596	      nal_unit_type.  This field specifies the NAL unit type as defined
597	      in Table 7-1 of [HEVC].  If the most significant bit of this
598	      field of a NAL unit is equal to 0 (i.e., the value of this field
599	      is less than 32), the NAL unit is a VCL NAL unit.  Otherwise, the
600	      NAL unit is a non-VCL NAL unit.  For a list of all currently
601	      defined NAL unit types and their semantics, please refer to
602	      Section 7.4.1 in [HEVC].

604	   LayerId: 6 bits
605	      nuh_layer_id.  MUST be equal to zero.  It is anticipated that in
606	      future scalable or 3D video coding extensions of this
607	      specification, this syntax element will be used to identify
608	      additional layers that may be present in the coded video
609	      sequence, wherein a layer may be, e.g., a spatial scalable layer,
610	      a quality scalable layer, a texture view, or a depth view.

612	   TID: 3 bits
613	      nuh_temporal_id_plus1.  This field specifies the temporal
614	      identifier of the NAL unit plus 1.  The value of TemporalId is
615	      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
616	      there is at least one bit in the NAL unit header equal to 1, so as
617	      to enable independent considerations of start code emulations in
618	      the NAL unit header and in the NAL unit payload data.
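   The following Python sketch unpacks the two-byte NAL unit header of
   Figure 1 into its four fields.  The class and function names are
   illustrative only.  The validity check reflects the constraints
   stated above: F equal to 0, LayerId equal to 0 in this version of
   the specification, and TID not equal to 0.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit)
    nal_type: int  # nal_unit_type (6 bits)
    layer_id: int  # nuh_layer_id (6 bits)
    tid: int       # nuh_temporal_id_plus1 (3 bits)

    @property
    def temporal_id(self) -> int:
        # TemporalId is TID minus 1.
        return self.tid - 1

    @property
    def is_vcl(self) -> bool:
        # Types 0..31 (most significant bit of Type equal to 0) are VCL.
        return self.nal_type < 32

    @property
    def is_valid(self) -> bool:
        # F MUST be 0, LayerId MUST be 0 here, and TID equal to 0 is illegal.
        return self.f == 0 and self.layer_id == 0 and self.tid != 0


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Split the first two bytes of a NAL unit into the header fields."""
    if len(data) < 2:
        raise ValueError("the HEVC NAL unit header is two bytes long")
    word = int.from_bytes(data[:2], "big")
    return NalUnitHeader(
        f=word >> 15,
        nal_type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x07,
    )
```

   For example, the bytes 0x40 0x01 yield Type 32 (a video parameter
   set), LayerId 0, and TID 1 (TemporalId 0).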

620	1.2 Overview of the Payload Format

622	   This payload format defines the following processes required for
623	   transport of HEVC coded data over RTP [RFC3550]:

625	   o Usage of RTP header with this payload format

627	   o Packetization of HEVC coded NAL units into RTP packets using three
628	     types of payload structures, namely single NAL unit packet,
629	     aggregation packet, and fragmentation unit

631	   o Transmission of HEVC NAL units of the same bitstream within a
632	     single RTP session or multiple RTP sessions

634	   o Media type parameters to be used with the Session Description
635	     Protocol (SDP) [RFC4566]

637	2. Conventions

639	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
640	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
641	   document are to be interpreted as described in BCP 14, RFC 2119
642	   [RFC2119].

644	   This specification uses the notion of setting and clearing a bit
645	   when bit fields are handled.  Setting a bit is the same as assigning
646	   that bit the value of 1 (On).  Clearing a bit is the same as
647	   assigning that bit the value of 0 (Off).

649	3. Definitions and Abbreviations

651	3.1 Definitions

653	   This document uses the terms and definitions of [HEVC].  Section
654	   3.1.1 lists relevant definitions copied from [HEVC] for convenience.
655	   Section 3.1.2 gives definitions specific to this memo.

657	3.1.1 Definitions from the HEVC Specification

659	   access unit: A set of NAL units that are associated with each other
660	   according to a specified classification rule, are consecutive in
661	   decoding order, and contain exactly one coded picture.

663	   BLA access unit: An access unit in which the coded picture is a BLA
664	   picture.

666	   BLA picture: An IRAP picture for which each VCL NAL unit has
667	   nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

669	   coded video sequence: A sequence of access units that consists, in
670	   decoding order, of an IRAP access unit with NoRaslOutputFlag equal
671	   to 1, followed by zero or more access units that are not IRAP access
672	   units with NoRaslOutputFlag equal to 1, including all subsequent
673	   access units up to but not including any subsequent access unit that
674	   is an IRAP access unit with NoRaslOutputFlag equal to 1.

676	      Informative note: An IRAP access unit may be an IDR access unit,
677	      a BLA access unit, or a CRA access unit.  The value of
678	      NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
679	      access unit, and each CRA access unit that is the first access
680	      unit in the bitstream in decoding order, is the first access unit
681	      that follows an end of sequence NAL unit in decoding order, or
682	      has HandleCraAsBlaFlag equal to 1.

684	   CRA access unit: An access unit in which the coded picture is a CRA
685	   picture.

687	   CRA picture: An IRAP picture for which each VCL NAL unit has
688	   nal_unit_type equal to CRA_NUT.

690	   IDR access unit: An access unit in which the coded picture is an IDR
691	   picture.

693	   IDR picture: An IRAP picture for which each VCL NAL unit has
694	   nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

696	   IRAP access unit: An access unit in which the coded picture is an
697	   IRAP picture.

699	   IRAP picture: A coded picture for which each VCL NAL unit has
700	   nal_unit_type in the range of BLA_W_LP to RSV_IRAP_VCL23, inclusive.

702	   layer: A set of VCL NAL units that all have a particular value of
703	   nuh_layer_id and the associated non-VCL NAL units, or one of a set
704	   of syntactical structures having a hierarchical relationship.

706	   operation point: A bitstream created from another bitstream by
707	   operation of the sub-bitstream extraction process with the other
708	   bitstream, a target highest TemporalId, and a target layer
709	   identifier list as inputs.

711	   random access: The act of starting the decoding process for a
712	   bitstream at a point other than the beginning of the stream.

714	   sub-layer: A temporal scalable layer of a temporal scalable
715	   bitstream consisting of VCL NAL units with a particular value of the
716	   TemporalId variable, and the associated non-VCL NAL units.

718	   tile: A rectangular region of coding tree blocks within a particular
719	   tile column and a particular tile row in a picture.

721	   tile column: A rectangular region of coding tree blocks having a
722	   height equal to the height of the picture and a width specified by
723	   syntax elements in the picture parameter set.

725	   tile row: A rectangular region of coding tree blocks having a height
726	   specified by syntax elements in the picture parameter set and a
727	   width equal to the width of the picture.

729	3.1.2 Definitions Specific to This Memo

731	   media aware network element (MANE): A network element, such as a
732	   middlebox or application layer gateway, that is capable of parsing
733	   certain aspects of the RTP payload headers or the RTP payload and
734	   reacting to their contents.

736	      Informative note: The concept of a MANE goes beyond normal
737	      routers or gateways in that a MANE has to be aware of the
738	      signaling (e.g., to learn about the payload type mappings of the
739	      media streams), and in that it has to be trusted when working
740	      with SRTP.  The advantage of using MANEs is that they allow
741	      packets to be dropped according to the needs of the media coding.
742	      For example, if a MANE has to drop packets due to congestion on a
743	      certain link, it can identify and remove those packets whose
744	      elimination produces the least adverse effect on the user
745	      experience.  After dropping packets, MANEs must rewrite RTCP
746	      packets to match the changes to the RTP packet stream as
747	      specified in Section 7 of [RFC3550].

749	   NAL unit decoding order: A NAL unit order that conforms to the
750	   constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

752	   NALU-time: The value that the RTP timestamp would have if the NAL
753	   unit were transported in its own RTP packet.

755	   RTP packet stream: A sequence of RTP packets with increasing
756	   sequence numbers (except for wrap-around), identical PT and
757	   identical SSRC (Synchronization Source), carried in one RTP session.
758	   Within the scope of this memo, one RTP packet stream is utilized to
759	   transport one or more temporal sub-layers.

761	   transmission order: The order of packets in ascending RTP sequence
762	   number order (in modulo arithmetic).  Within an aggregation packet,
763	   the NAL unit transmission order is the same as the order of
764	   appearance of NAL units in the packet.

766	   base session: An RTP session in Multi-Session Transmission mode that
767	   transports a bitstream subset on which the other RTP sessions in the
768	   Multi-Session Transmission depend. [Ed. (YK): Check the need of
769	   this definition after the draft is more complete.]

771	3.2 Abbreviations

773	   AP       Aggregation Packet

775	   BLA      Broken Link Access

777	   CRA      Clean Random Access

779	   CTB      Coding Tree Block

781	   CTU      Coding Tree Unit

783	   CVS      Coded Video Sequence

785	   FU       Fragmentation Unit

787	   GDR      Gradual Decoding Refresh

789	   HRD      Hypothetical Reference Decoder

791	   IDR      Instantaneous Decoding Refresh

793	   IRAP     Intra Random Access Point

795	   MANE     Media Aware Network Element

797	   MST      Multi-Session Transmission

799	   MTU      Maximum Transmission Unit

801	   NAL      Network Abstraction Layer

802	   NALU     Network Abstraction Layer Unit

804	   PPS      Picture Parameter Set

806	   RADL     Random Access Decodable Leading (Picture)

808	   RASL     Random Access Skipped Leading (Picture)

810	   RPS      Reference Picture Set

812	   SEI      Supplemental Enhancement Information

814	   SPS      Sequence Parameter Set

816	   SST      Single-Session Transmission

818	   STSA     Step-wise Temporal Sub-layer Access

820	   TSA      Temporal Sub-layer Access

822	   VCL      Video Coding Layer

824	   VPS      Video Parameter Set

826	4. RTP Payload Format

828	4.1 RTP Header Usage

830	   The format of the RTP header is specified in [RFC3550] and reprinted
831	   in Figure 2 for convenience.  This payload format uses the fields of
832	   the header in a manner consistent with that specification.

834	   The RTP payload (and the settings for some RTP header bits) for
835	   aggregation packets and fragmentation units are specified in
836	   Sections 4.7 and 4.8, respectively.

838	    0                   1                   2                   3
839	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
840	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
841	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
842	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
843	   |                           timestamp                           |
844	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
845	   |           synchronization source (SSRC) identifier            |
846	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
847	   |            contributing source (CSRC) identifiers             |
848	   |                             ....                              |
849	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

851	                Figure 2 RTP header according to [RFC3550]

853	   The RTP header fields are set as follows when this RTP payload
854	   format is used:

856	   Marker bit (M): 1 bit

858	      Set for the last packet of the access unit indicated by the RTP
859	      timestamp, in line with the normal use of the M bit in video
860	      formats, to allow efficient playout buffer handling.  Decoders
861	      can use this bit as an early indication of the last packet of an
862	      access unit.

864	         Informative note: The content of a NAL unit does not tell
865	         whether or not the NAL unit is the last NAL unit, in decoding
866	         order, of an access unit.  An RTP sender implementation may
867	         obtain this information from the video encoder.  If, however,
868	         the implementation cannot obtain this information directly
869	         from the encoder, e.g., when the stream was pre-encoded, and
870	         also there is no timestamp allocated for each NAL unit, then
871	         the sender implementation can inspect subsequent NAL units in
872	         decoding order to determine whether or not the NAL unit is the
873	         last NAL unit of an access unit as follows.  A NAL unit naluX
874	         is the last NAL unit of an access unit if it is the last NAL
875	         unit of the stream or the next VCL NAL unit naluY in decoding
876	         order has the high-order bit of the first byte after its NAL
877	         unit header equal to 1, and all NAL units between naluX and
878	         naluY, when present, have nal_unit_type in the range of 32 to
879	         35, inclusive, equal to 39, or in the ranges of 41 to 44,
880	         inclusive, or 48 to 55, inclusive.
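   The inspection rule in the note above can be sketched in Python as
   follows.  The sketch assumes that the sender holds the NAL units of
   the stream, in decoding order, as a list of byte strings, each
   starting with its two-byte NAL unit header; the function names are
   hypothetical.  In [HEVC], the high-order bit of the first byte after
   the header of a VCL NAL unit is first_slice_segment_in_pic_flag.

```python
# nal_unit_type values allowed between naluX and the next VCL NAL unit
# naluY for naluX to still be the last NAL unit of its access unit.
ALLOWED_BETWEEN = (
    set(range(32, 36)) | {39} | set(range(41, 45)) | set(range(48, 56))
)


def nal_unit_type(nalu: bytes) -> int:
    """Type field of the two-byte NAL unit header."""
    return (nalu[0] >> 1) & 0x3F


def is_last_nalu_of_access_unit(nalus: list[bytes], x: int) -> bool:
    """Apply the inspection rule to nalus[x] (nalus is in decoding order)."""
    for y in range(x + 1, len(nalus)):
        t = nal_unit_type(nalus[y])
        if t < 32:
            # naluY found: test the high-order bit of the first byte after
            # its NAL unit header (first_slice_segment_in_pic_flag).
            return bool(nalus[y][2] & 0x80)
        if t not in ALLOWED_BETWEEN:
            return False
    # No VCL NAL unit follows: only the very last NAL unit qualifies.
    return x == len(nalus) - 1
```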

882	   Payload type (PT): 7 bits

884	      The assignment of an RTP payload type for this new packet format
885	      is outside the scope of this document and will not be specified
886	      here.  The assignment of a payload type has to be performed
887	      either through the profile used or in a dynamic way.

889	   Sequence number (SN): 16 bits

891	      Set and used in accordance with RFC 3550.

893	   Timestamp: 32 bits

895	      The RTP timestamp is set to the sampling timestamp of the
896	      content.  A 90 kHz clock rate MUST be used.
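   A minimal Python sketch of deriving the RTP timestamp with the
   mandated 90 kHz clock follows.  It assumes a constant frame rate, so
   that the sampling instant of a picture is its frame index divided by
   the frame rate, and it models the random initial timestamp offset of
   [RFC3550] as a caller-supplied base value.  Both are illustrative
   assumptions, and the function name is hypothetical.

```python
from fractions import Fraction

RTP_CLOCK_RATE = 90_000  # Hz, mandated by this payload format


def rtp_timestamp(frame_index: int, frame_rate: Fraction, base: int = 0) -> int:
    """32-bit RTP timestamp of picture 'frame_index' at a constant frame rate.

    Exact rational arithmetic avoids drift for rates such as 30000/1001.
    """
    ticks = round(Fraction(frame_index) * RTP_CLOCK_RATE / frame_rate)
    return (base + ticks) % 2**32  # the timestamp field wraps modulo 2^32
```

   For 29.97 Hz video (30000/1001 frames per second), consecutive
   pictures are 3003 ticks apart.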

898	      If the NAL unit has no timing properties of its own (e.g.,
899	      parameter set and SEI NAL units), the RTP timestamp is set to the
900	      RTP timestamp of the coded picture of the access unit in which
901	      the NAL unit is included, according to Section 7.4.2.4.4 of
902	      [HEVC].

904	      Receivers SHOULD ignore the picture output timing information in
905	      any picture timing SEI messages or decoding unit information SEI
906	      messages as specified in [HEVC].  Instead, receivers SHOULD use
907	      the RTP timestamp for the display process.  Receivers MUST pass
908	      picture timing SEI messages and decoding unit information SEI
909	      messages to the decoder and MAY use the field/frame related
910	      information for the display process, e.g., when frame doubling
911	      or frame tripling is indicated by the field/frame related
912	      information.

914	4.2 Payload Header Usage

916	   The TID value indicates (among other things) the relative importance
917	   of an RTP packet, for example, because NAL units belonging to higher
918	   temporal sub-layers are not used for the decoding of lower temporal
919	   sub-layers.  A lower value of TID indicates a higher importance.

921	   More important NAL units MAY be better protected against
922	   transmission losses than less important NAL units.
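   As an illustration of how a MANE might use TID, the following Python
   sketch forwards or drops RTP packets by comparing the TID in the
   payload header against a target highest TemporalId.  The target
   value and the function names are assumptions.  A MANE that drops
   packets would still need to rewrite RTCP as noted in Section 3.1.2.

```python
def payload_header_tid(payload: bytes) -> int:
    """TID field (3 least significant bits of the second payload byte)."""
    return payload[1] & 0x07


def forward_packet(payload: bytes, target_highest_temporal_id: int) -> bool:
    """Decide whether a MANE forwards an RTP packet, given its RTP payload.

    TemporalId equals TID minus 1.  For an aggregation packet, TID in the
    payload header is the lowest TID of the aggregated NAL units, so the
    test keeps an AP whenever any of its NAL units may be needed.
    """
    return payload_header_tid(payload) - 1 <= target_highest_temporal_id
```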

924	4.3 Payload Structures

926	   The first two bytes of the payload of an RTP packet are referred to
927	   as the payload header.  The payload header consists of the same
928	   fields (F, Type, LayerId, and TID) as the NAL unit header as shown
929	   in section 1.1.4, irrespective of the type of the payload structure.

931	   Three different types of RTP packet payload structures are
932	   specified.  A receiver can identify the type of an RTP packet
933	   payload through the Type field in the payload header.

935	   The three different payload structures are as follows:

937	   o  Single NAL unit packet: Contains a single NAL unit in the
938	      payload, and the NAL unit header of the NAL unit also serves as
939	      the payload header.  This payload structure is specified in
940	      section 4.6.

942	   o  Aggregation packet (AP): Contains more than one NAL unit within
943	      one access unit.  This payload structure is specified in
944	      section 4.7.

946	   o  Fragmentation unit (FU): Contains a subset of a single NAL unit.
947	      This payload structure is specified in section 4.8.
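   A receiver's dispatch on the Type field of the payload header can be
   sketched in Python as follows.  The value 48 for APs is specified in
   Section 4.7.  The value 49 for FUs is an assumption here (it is the
   value used by the published version of this payload format) and
   should be checked against Section 4.8.  The function name is
   hypothetical.

```python
AP_TYPE = 48  # Type value of an aggregation packet (Section 4.7)
FU_TYPE = 49  # assumed Type value of a fragmentation unit (see Section 4.8)


def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload shorter than the two-byte payload header")
    payload_type = (payload[0] >> 1) & 0x3F
    if payload_type == AP_TYPE:
        return "aggregation packet"
    if payload_type == FU_TYPE:
        return "fragmentation unit"
    # Any other Type value is the header of the single contained NAL unit.
    return "single NAL unit packet"
```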

949	4.4 Transmission Modes

951	   This memo enables transmission of an HEVC bitstream over a single
952	   RTP session or multiple RTP sessions.  The concept and working
953	   principles are inherited from [RFC6190] and follow a similar design.
954	   If only one RTP session is used for transmission of the HEVC
955	   bitstream, the transmission mode is referred to as single-session
956	   transmission (SST); otherwise (more than one RTP session is used for
957	   transmission of the HEVC bitstream), the transmission mode is
958	   referred to as multi-session transmission (MST).

960	   [Ed. (YK): Unify the style of abbreviated words throughout the
961	   document.]
962	   SST SHOULD be used for point-to-point unicast scenarios, while MST
963	   SHOULD be used for point-to-multipoint multicast scenarios where
964	   different receivers require different operation points of the same
965	   HEVC bitstream, to improve bandwidth utilization efficiency.

967	      Informative note: A multicast may degrade to a unicast after all
968	      but one receiver has left (this is a justification of the first
969	      "SHOULD" instead of "MUST"), and there might be scenarios where
970	      MST is desirable but not possible, e.g., when IP multicast is not
971	      deployed in certain networks (this is a justification of the
972	      second "SHOULD" instead of "MUST").

974	   The transmission mode is indicated by the tx-mode media parameter
975	   (see section 7.1).  If tx-mode is equal to "SST", SST MUST be used.
976	   Otherwise (tx-mode is equal to "MST"), MST MUST be used.

978	4.5 Decoding Order Number

980	   For each NAL unit, the variable AbsDon is derived, representing the
981	   decoding order number that is indicative of the NAL unit decoding
982	   order.

984	   Let NAL unit n be the n-th NAL unit in transmission order within an
985	   RTP session.

987	   If tx-mode is equal to "SST" and sprop-depack-buf-nalus is equal
988	   to 0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as
989	   equal to n.

991	   Otherwise (tx-mode is equal to "MST" or sprop-depack-buf-nalus is
992	   greater than 0), AbsDon[n] is derived as follows, where DON[n] is
993	   the value of the variable DON for NAL unit n:

995	   o  If n is equal to 0 (i.e., NAL unit n is the very first NAL unit
996	      in transmission order), AbsDon[0] is set equal to DON[0].

998	   o  Otherwise (n is greater than 0), the following applies for
999	      derivation of AbsDon[n]:

1001	            If DON[n] == DON[n-1],
1002	                AbsDon[n] = AbsDon[n-1]

1004	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1005	                AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1007	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1008	                AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1010	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1011	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

1013	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1014	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
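   The derivation above translates directly into the following Python
   sketch.  It takes the DON values of the NAL units of one RTP session
   in transmission order and returns the corresponding AbsDon values.
   It covers only the case where DON is present (tx-mode equal to "MST"
   or sprop-depack-buf-nalus greater than 0); in the other case
   AbsDon[n] is simply n.  The function name is hypothetical.

```python
HALF_RANGE = 32768  # 2^15: half the 16-bit DON number space
FULL_RANGE = 65536  # 2^16: size of the 16-bit DON number space


def derive_abs_don(dons: list[int]) -> list[int]:
    """AbsDon[n] for NAL units n = 0, 1, ... given DON[n] in transmission order."""
    abs_don: list[int] = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_don.append(don)
            continue
        prev_don, prev_abs = dons[n - 1], abs_don[n - 1]
        if don == prev_don:
            abs_don.append(prev_abs)
        elif don > prev_don and don - prev_don < HALF_RANGE:
            abs_don.append(prev_abs + don - prev_don)
        elif don < prev_don and prev_don - don >= HALF_RANGE:
            # DON wrapped forward past 65535.
            abs_don.append(prev_abs + FULL_RANGE - prev_don + don)
        elif don > prev_don and don - prev_don >= HALF_RANGE:
            # DON wrapped backward past 0.
            abs_don.append(prev_abs - (prev_don + FULL_RANGE - don))
        else:  # don < prev_don and prev_don - don < HALF_RANGE
            abs_don.append(prev_abs - (prev_don - don))
    return abs_don
```

   Because AbsDon keeps increasing across the 16-bit wrap-around of DON
   (for example, DON values 65535, 0 map to AbsDon values 65535, 65536),
   plain integer comparison of AbsDon values gives the decoding order
   relations stated below.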

1016	   For any two NAL units m and n, the following applies:

1018	   o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
1019	      follows NAL unit m in NAL unit decoding order.

1021	   o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
1022	      of the two NAL units can be in either order.

1024	   o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
1025	      NAL unit m in decoding order.

1027	   When two consecutive NAL units in the NAL unit decoding order have
1028	   different values of AbsDon, the value of AbsDon for the second NAL
1029	   unit in decoding order MUST be greater than the value of AbsDon for
1030	   the first NAL unit, and the absolute difference between the two
1031	   AbsDon values MAY be greater than or equal to 1.

1033	      Informative note: There are multiple reasons to allow for the
1034	      absolute difference of the values of AbsDon for two consecutive
1035	      NAL units in the NAL unit decoding order to be greater than one.
1036	      An increment by one is not required, as at the time of
1037	      associating values of AbsDon to NAL units, it may not be known
1038	      whether all NAL units are to be delivered to the receiver.  For
1039	      example, a gateway may not forward VCL NAL units of higher sub-
1040	      layers or some SEI NAL units when there is congestion in the
1041	      network.  In another example, the first intra picture of a pre-
1042	      encoded clip is transmitted in advance to ensure that it is
1043	      readily available in the receiver, and when transmitting the
1044	      first intra picture, the originator does not exactly know how
1045	      many NAL units will be encoded before the first intra picture of
1046	      the pre-encoded clip follows in decoding order.  Thus, the values
1047	      of AbsDon for the NAL units of the first intra picture of the
1048	      pre-encoded clip have to be estimated when they are transmitted,
1049	      and gaps in values of AbsDon may occur.  Another example is MST
1050	      where the AbsDon values must indicate cross-layer decoding order
1051	      for NAL units conveyed in all the RTP sessions.

1053	4.6 Single NAL Unit Packets

1055	   A single NAL unit packet contains exactly one NAL unit, and consists
1056	   of a payload header (denoted as PayloadHdr), an optional 16-bit DONL
1057	   field (in network byte order), and the NAL unit payload data (the
1058	   NAL unit excluding its NAL unit header) of the contained NAL unit,
1059	   as shown in Figure 3.

1061	    0                   1                   2                   3
1062	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1063	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1064	   |           PayloadHdr          |        DONL (optional)        |
1065	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1066	   |                                                               |
1067	   |                  NAL unit payload data                        |
1068	   |                                                               |
1069	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1070	   |                               :...OPTIONAL RTP padding        |
1071	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1073	              Figure 3 The structure of a single NAL unit packet

1075	   The payload header SHOULD be an exact copy of the NAL unit header of
1076	   the contained NAL unit.  However, the Type (i.e., nal_unit_type)
1077	   field MAY be changed, e.g., when it is desirable to handle a CRA
1078	   picture as a BLA picture [JCTVC-J0107].

1080	   The DONL field, when present, specifies the value of the 16 least
1081	   significant bits of the decoding order number of the contained NAL
1082	   unit.

1084	   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1085	   than 0, the DONL field MUST be present, and the variable DON for the
1086	   contained NAL unit is derived as equal to the value of the DONL
1087	   field.  Otherwise (tx-mode is equal to "SST" and sprop-depack-buf-
1088	   nalus is equal to 0), the DONL field MUST NOT be present.
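   A minimal Python sketch of building and parsing a single NAL unit
   packet payload follows.  The payload header is copied unchanged from
   the NAL unit header (the common case of the SHOULD above).  The
   caller is assumed to know from tx-mode and sprop-depack-buf-nalus
   whether DONL is present.  The function names are illustrative.

```python
import struct
from typing import Optional, Tuple


def build_single_nal_unit_packet(nalu: bytes, don: Optional[int] = None) -> bytes:
    """RTP payload for one NAL unit (nalu includes its two-byte header).

    Pass 'don' when DONL MUST be present (tx-mode "MST" or
    sprop-depack-buf-nalus > 0); otherwise leave it as None.
    """
    payload_hdr, payload_data = nalu[:2], nalu[2:]
    if don is None:
        return payload_hdr + payload_data
    # DONL carries the 16 least significant bits of DON, network byte order.
    return payload_hdr + struct.pack("!H", don & 0xFFFF) + payload_data


def parse_single_nal_unit_packet(
    payload: bytes, donl_present: bool
) -> Tuple[bytes, Optional[int]]:
    """Return (NAL unit, DON or None) from a single NAL unit packet payload."""
    if not donl_present:
        return payload, None
    (don,) = struct.unpack("!H", payload[2:4])
    return payload[:2] + payload[4:], don
```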

1090	4.7 Aggregation Packets (APs)

1092	   Aggregation packets (APs) are introduced to enable the reduction of
1093	   packetization overhead for small NAL units, such as most of the non-
1094	   VCL NAL units, which are often only a few octets in size.

1096	   An AP aggregates NAL units within one access unit.  Each NAL unit to
1097	   be carried in an AP is encapsulated in an aggregation unit.  NAL
1098	   units aggregated in one AP are in NAL unit decoding order.

1100	   An AP consists of a payload header (denoted as PayloadHdr) followed
1101	   by two or more aggregation units, as shown in Figure 4.

1103	    0                   1                   2                   3
1104	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1105	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1106	   |           PayloadHdr          |                               |
1107	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1108	   |                                                               |
1109	   |             two or more aggregation units                     |
1110	   |                                                               |
1111	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1112	   |                               :...OPTIONAL RTP padding        |
1113	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1115	              Figure 4 The structure of an aggregation packet

1117	   The fields in the payload header are set as follows.  The F bit MUST
1118	   be equal to 0 if the F bit of each aggregated NAL unit is equal to
1119	   zero; otherwise, it MUST be equal to 1.  The Type field MUST be
1120	   equal to 48.  The value of LayerId MUST be equal to the lowest value
1121	   of LayerId of all the aggregated NAL units.  The value of TID MUST
1122	   be the lowest value of TID of all the aggregated NAL units.

1124	      Informative Note: All VCL NAL units in an AP have the same TID
1125	      value since they belong to the same access unit.  However, an AP
1126	      may contain non-VCL NAL units for which the TID value in the NAL
1127	      unit header may differ from the TID value of the VCL NAL
1128	      units in the same AP.
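   A minimal sketch of the AP payload header derivation above, assuming
   the two-octet header layout F (1 bit), Type (6 bits), LayerId
   (6 bits), TID (3 bits).  The helper names are hypothetical.

```python
def unpack_hdr(hdr: bytes):
    """Split a 2-octet NAL unit / payload header into (F, Type, LayerId, TID)."""
    v = int.from_bytes(hdr[:2], "big")
    return v >> 15, (v >> 9) & 0x3F, (v >> 3) & 0x3F, v & 0x07


def pack_hdr(f: int, typ: int, layer_id: int, tid: int) -> bytes:
    """Inverse of unpack_hdr."""
    v = (f << 15) | (typ << 9) | (layer_id << 3) | tid
    return v.to_bytes(2, "big")


def ap_payload_hdr(nal_units) -> bytes:
    """Derive the AP PayloadHdr from the headers of the aggregated NAL units."""
    fields = [unpack_hdr(n) for n in nal_units]
    f = 1 if any(x[0] for x in fields) else 0   # F is 1 if any aggregated F is 1
    layer_id = min(x[2] for x in fields)        # lowest LayerId
    tid = min(x[3] for x in fields)             # lowest TID
    return pack_hdr(f, 48, layer_id, tid)       # Type 48 marks an AP
```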

1130	   An AP MUST carry at least two aggregation units and can carry as
1131	   many aggregation units as necessary; however, the total amount of
1132	   data in an AP MUST fit into an IP packet, and the size SHOULD be
1133	   chosen so that the resulting IP packet is smaller than the MTU
1134	   size so as to avoid IP layer fragmentation.  An AP MUST NOT contain
1135	   Fragmentation Units (FUs) specified in Section 4.8.  APs MUST NOT be
1136	   nested; i.e., an AP MUST NOT contain another AP.

1138	   The first aggregation unit in an AP consists of an optional 16-bit
1139	   DONL field (in network byte order) followed by a 16-bit unsigned
1140	   size information (in network byte order) that indicates the size of
1141	   the NAL unit in bytes (excluding these two octets, but including the
1142	   NAL unit header), followed by the NAL unit itself, including its NAL
1143	   unit header, as shown in Figure 5.

1145	    0                   1                   2                   3
1146	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1147	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1148	                   :        DONL (optional)        |   NALU size   |
1149	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1150	   |   NALU size   |                                               |
1151	   +-+-+-+-+-+-+-+-+         NAL unit                              |
1152	   |                                                               |
1153	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1154	   |                               :
1155	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1157	       Figure 5 The structure of the first aggregation unit in an AP

1159	   The DONL field, when present, specifies the value of the 16 least
1160	   significant bits of the decoding order number of the aggregated NAL
1161	   unit.

1163	   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1164	   than 0, the DONL field MUST be present in an aggregation unit that
1165	   is the first aggregation unit in an AP, and the variable DON for the
1166	   aggregated NAL unit is derived as equal to the value of the DONL
1167	   field.  Otherwise (tx-mode is equal to "SST" and sprop-depack-buf-
1168	   nalus is equal to 0), the DONL field MUST NOT be present in an
1169	   aggregation unit that is the first aggregation unit in an AP.

1171	   An aggregation unit that is not the first aggregation unit in an AP
1172	   consists of an optional 8-bit DOND field followed by a 16-bit
1173	   unsigned size information (in network byte order) that indicates the
1174	   size of the NAL unit in bytes (excluding these two octets, but
1175	   including the NAL unit header), followed by the NAL unit itself,
1176	   including its NAL unit header, as shown in Figure 6.

1178	    0                   1                   2                   3
1179	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1180	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1181	                   : DOND(optional)|          NALU size            |
1182	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1183	   |                                                               |
1184	   |                       NAL unit                                |
1185	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1186	   |                               :
1187	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1189	    Figure 6 The structure of an aggregation unit that is not the first
1190	                         aggregation unit in an AP

1192	   When present, the DOND field plus 1 specifies the difference between
1193	   the decoding order number values of the current aggregated NAL unit
1194	   and the preceding aggregated NAL unit in the same AP.

1196	   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1197	   than 0, the DOND field MUST be present in an aggregation unit that
1198	   is not the first aggregation unit in an AP, and the variable DON for
1199	   the aggregated NAL unit is derived as equal to the DON of the
1200	   preceding aggregated NAL unit in the same AP plus the value of the
1201	   DOND field plus 1 modulo 65536.  Otherwise (tx-mode is equal to
1202	   "SST" and sprop-depack-buf-nalus is equal to 0), the DOND field MUST
1203	   NOT be present in an aggregation unit that is not the first
1204	   aggregation unit in an AP.
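   The DONL/DOND rules above can be sketched as an AP parser.  This is
   an illustrative Python sketch, not a normative algorithm; it assumes
   that RTP padding has already been removed from the payload and that
   the caller has already evaluated whether DONL/DOND are present
   (the hypothetical with_don argument).

```python
import struct


def parse_ap(payload: bytes, with_don: bool):
    """Split an AP payload into a list of (don, nal_unit) tuples.

    with_don is True when tx-mode is "MST" or sprop-depack-buf-nalus > 0;
    otherwise don is None for every aggregation unit.
    """
    pos = 2  # skip the 2-octet PayloadHdr (Type == 48)
    units = []
    don = None
    while pos < len(payload):
        if with_don:
            if not units:  # first aggregation unit carries a 16-bit DONL
                (don,) = struct.unpack_from("!H", payload, pos)
                pos += 2
            else:          # later units carry an 8-bit DOND
                dond = payload[pos]
                pos += 1
                don = (don + dond + 1) % 65536
        (size,) = struct.unpack_from("!H", payload, pos)  # NALU size
        pos += 2
        nal = payload[pos:pos + size]
        if len(nal) != size:
            raise ValueError("truncated aggregation unit")
        pos += size
        units.append((don, nal))
    if len(units) < 2:
        raise ValueError("an AP must carry at least two aggregation units")
    return units
```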

1206	   Figure 7 presents an example of an AP that contains two aggregation
1207	   units, labeled as 1 and 2 in the figure, without the DONL and DOND
1208	   fields being present.

1210	    0                   1                   2                   3
1211	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1212	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1213	   |                          RTP Header                           |
1214	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1215	   |           PayloadHdr          |         NALU 1 Size           |
1216	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1217	   |          NALU 1 HDR           |                               |
1218	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1219	   |                   . . .                                       |
1220	   |                                                               |
1221	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1222	   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1223	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1224	   | NALU 2 HDR    |                                               |
1225	   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1226	   |                   . . .                                       |
1227	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1228	   |                               :...OPTIONAL RTP padding        |
1229	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1231	   Figure 7 An example of an AP packet containing two aggregation units
1232	                     without the DONL and DOND fields

1234	   Figure 8 presents an example of an AP that contains two aggregation
1235	   units, labeled as 1 and 2 in the figure, with the DONL and DOND
1236	   fields being present.

1238	    0                   1                   2                   3
1239	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1240	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1241	   |                          RTP Header                           |
1242	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1243	   |           PayloadHdr          |        NALU 1 DONL            |
1244	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1245	   |          NALU 1 Size          |            NALU 1 HDR         |
1246	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1247	   |                                                               |
1248	   |                 NALU 1 Data   . . .                           |
1249	   |                                                               |
1250	   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1251	   |               |  NALU 2 DOND  |          NALU 2 Size          |
1252	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1253	   |          NALU 2 HDR           |                               |
1254	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1255	   |                                                               |
1256	   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1257	   |                               :...OPTIONAL RTP padding        |
1258	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1260	    Figure 8 An example of an AP containing two aggregation units with
1261	                         the DONL and DOND fields
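   Conversely, an AP laid out as in Figures 7 and 8 could be serialized
   as in the following sketch.  The hypothetical build_ap takes the
   payload header as an argument and, when DONs are supplied, writes
   DONL for the first unit and derives each DOND as the DON difference
   minus 1, modulo 65536, rejecting gaps that do not fit in 8 bits.

```python
import struct


def build_ap(payload_hdr: bytes, nal_units, dons=None) -> bytes:
    """Serialize an AP; dons, if given, holds each unit's DON (DONL/DOND present)."""
    if len(nal_units) < 2:
        raise ValueError("an AP must carry at least two aggregation units")
    out = bytearray(payload_hdr)
    for i, nal in enumerate(nal_units):
        if dons is not None:
            if i == 0:
                out += struct.pack("!H", dons[0])           # DONL
            else:
                dond = (dons[i] - dons[i - 1] - 1) % 65536  # DOND
                if dond > 255:
                    raise ValueError("DON gap too large for the 8-bit DOND field")
                out.append(dond)
        out += struct.pack("!H", len(nal))  # NALU size, network byte order
        out += nal                          # NAL unit including its header
    return bytes(out)
```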

1263	4.8 Fragmentation Units (FUs)

1265	   Fragmentation units (FUs) are introduced to enable fragmenting a
1266	   single NAL unit into multiple RTP packets, possibly without
1267	   cooperation or knowledge of the HEVC encoder.  A fragment of a NAL
1268	   unit consists of an integer number of consecutive octets of that NAL
1269	   unit.  Fragments of the same NAL unit MUST be sent in consecutive
1270	   order with ascending RTP sequence numbers (with no other RTP packets
1271	   within the same RTP packet stream being sent between the first and
1272	   last fragment).

1274	   When a NAL unit is fragmented and conveyed within FUs, it is
1275	   referred to as a fragmented NAL unit.  APs MUST NOT be fragmented.
1276	   FUs MUST NOT be nested; i.e., an FU MUST NOT contain a subset of
1277	   another FU.

1279	   The RTP timestamp of an RTP packet carrying an FU is set to the
1280	   NALU-time of the fragmented NAL unit.

1282	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1283	   header of one octet, an optional 16-bit DONL field (in network byte
1284	   order), and an FU payload, as shown in Figure 9.

1286	    0                   1                   2                   3
1287	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1288	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1289	   |          PayloadHdr           |   FU header   | DONL(optional)|
1290	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1291	   | DONL(optional)|                                               |
1292	   |-+-+-+-+-+-+-+-+                                               |
1293	   |                         FU payload                            |
1294	   |                                                               |
1295	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1296	   |                               :...OPTIONAL RTP padding        |
1297	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1299	                      Figure 9 The structure of an FU

1301	   The fields in the payload header are set as follows.  The Type field
1302	   MUST be equal to 49.  The fields F, LayerId, and TID MUST be equal
1303	   to the fields F, LayerId, and TID, respectively, of the fragmented
1304	   NAL unit.

1306	   The FU header consists of an S bit, an E bit, and a 6-bit FuType
1307	   field, as shown in Figure 10.

1309	                             +---------------+
1310	                             |0|1|2|3|4|5|6|7|
1311	                             +-+-+-+-+-+-+-+-+
1312	                             |S|E|  FuType   |
1313	                             +---------------+

1315	                  Figure 10 The structure of the FU header

1317	   The semantics of the FU header fields are as follows:
1318	   S: 1 bit
1319	      When set to one, the S bit indicates the start of a fragmented
1320	      NAL unit, i.e., the first byte of the FU payload is also the first
1321	      byte of the payload of the fragmented NAL unit.  When the FU
1322	      payload is not the start of the fragmented NAL unit payload, the
1323	      S bit MUST be set to zero.

1325	   E: 1 bit
1326	      When set to one, the E bit indicates the end of a fragmented NAL
1327	      unit, i.e., the last byte of the FU payload is also the last byte
1328	      of the fragmented NAL unit.  When the FU payload is not the last
1329	      fragment of a fragmented NAL unit, the E bit MUST be set to zero.

1331	   FuType: 6 bits
1332	      The field FuType MUST be equal to the field Type of the
1333	      fragmented NAL unit.

1335	   The DONL field, when present, specifies the value of the 16 least
1336	   significant bits of the decoding order number of the fragmented NAL
1337	   unit.

1339	   If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1340	   than 0, and the S bit is equal to 1, the DONL field MUST be present
1341	   in the FU, and the variable DON for the fragmented NAL unit is
1342	   derived as equal to the value of the DONL field.  Otherwise (tx-mode
1343	   is equal to "SST" and sprop-depack-buf-nalus is equal to 0, or the S
1344	   bit is equal to 0), the DONL field MUST NOT be present in the FU.

1346	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.,
1347	   the S bit and the E bit MUST NOT both be set to one in the same FU
1348	   header.
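   A sketch of a sender splitting one NAL unit into FUs under these
   rules, assuming a fixed maximum FU payload size chosen by the caller.
   The NAL unit header is dropped from the fragments, and its F,
   LayerId, and TID values are copied into each PayloadHdr with Type set
   to 49.  The name fragment_nal_unit and its parameters are
   hypothetical; this sketch never produces empty FU payloads, although
   they are permitted.

```python
import struct

FU_TYPE = 49


def fragment_nal_unit(nal: bytes, max_fu_payload: int, don=None):
    """Split one NAL unit into FUs: PayloadHdr | FU header | [DONL] | FU payload.

    DONL is written only in the first FU (S bit set) and only when don is given.
    """
    hdr = int.from_bytes(nal[:2], "big")
    nal_type = (hdr >> 9) & 0x3F
    # Keep F (bit 15), LayerId and TID (low 9 bits); replace Type with 49.
    payload_hdr = ((hdr & 0x81FF) | (FU_TYPE << 9)).to_bytes(2, "big")
    body = nal[2:]  # the NAL unit header itself is not carried in FU payloads
    chunks = [body[i:i + max_fu_payload] for i in range(0, len(body), max_fu_payload)]
    if len(chunks) < 2:
        # S and E would both be 1, which is not allowed.
        raise ValueError("NAL unit fits in one FU; use a single NAL unit packet")
    fus = []
    for i, chunk in enumerate(chunks):
        s = 1 if i == 0 else 0
        e = 1 if i == len(chunks) - 1 else 0
        fu_header = bytes([(s << 7) | (e << 6) | nal_type])  # FuType = NAL Type
        donl = struct.pack("!H", don) if (s and don is not None) else b""
        fus.append(payload_hdr + fu_header + donl + chunk)
    return fus
```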

1350	   The FU payload consists of fragments of the payload of the
1351	   fragmented NAL unit so that if the FU payloads of consecutive FUs,
1352	   starting with an FU with the S bit equal to 1 and ending with an FU
1353	   with the E bit equal to 1, are sequentially concatenated, the
1354	   payload of the fragmented NAL unit can be reconstructed.  The NAL
1355	   unit header of the fragmented NAL unit is not included as such in
1356	   the FU payload, but rather the information of the NAL unit header of
1357	   the fragmented NAL unit is conveyed in the F, LayerId, and TID
1358	   fields of the payload headers of the FUs and in the FuType field
1359	   of the FU headers of the FUs.  An FU payload MAY have any number of
1360	   octets and MAY be empty.
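   The reverse operation, reconstructing the fragmented NAL unit, might
   look like the following sketch.  It assumes the FUs are complete and
   already ordered by RTP sequence number, and that the caller knows
   whether the first FU carries DONL (the hypothetical with_donl flag).

```python
def reassemble_fus(fus, with_donl: bool = False) -> bytes:
    """Rebuild a NAL unit from consecutive FUs, the first with S=1 and the last with E=1."""
    if not (fus[0][2] & 0x80) or not (fus[-1][2] & 0x40):
        raise ValueError("first FU must have S=1 and last FU must have E=1")
    first_hdr = int.from_bytes(fus[0][:2], "big")
    fu_type = fus[0][2] & 0x3F
    # NAL unit header: F, LayerId and TID from PayloadHdr; Type from FuType.
    nal_hdr = ((first_hdr & 0x81FF) | (fu_type << 9)).to_bytes(2, "big")
    parts = []
    for i, fu in enumerate(fus):
        start = 3  # skip PayloadHdr (2 octets) and FU header (1 octet)
        if i == 0 and with_donl:
            start += 2  # DONL is carried only in the FU with S=1
        parts.append(fu[start:])
    return nal_hdr + b"".join(parts)
```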

1362	      Informative note: Empty FU payloads are allowed to reduce the
1363	      latency of a certain class of senders in nearly lossless
1364	      environments.  These senders can be characterized in that they
1365	      packetize fragments of a NAL unit before the NAL unit is
1366	      completely generated and, hence, before the NAL unit size is
1367	      known.  If zero-length FU payloads were not allowed, the sender
1368	      would have to generate at least one bit of data of the following
1369	      fragment of the NAL unit before the current FU could be sent.
1370	      Due to the characteristics of HEVC, where sometimes several CTUs
1371	      occupy zero bits, this is undesirable and can add delay.
1372	      However, the (potential) use of zero-length FU payloads should be
1373	      carefully weighed against the increased risk of the loss of at
1374	      least a part of the fragmented NAL unit because of the additional
1375	      packets employed for its transmission.

1377	   If an FU is lost, the receiver SHOULD discard all following
1378	   fragmentation units in transmission order corresponding to the same
1379	   fragmented NAL unit, unless the decoder in the receiver is known to
1380	   be prepared to gracefully handle incomplete NAL units.

1382	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1383	   fragments of a NAL unit to an (incomplete) NAL unit, even if
1384	   fragment n of that NAL unit is not received.  In this case, the
1385	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1386	   syntax violation.
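   Setting forbidden_zero_bit on such an incomplete NAL unit only
   touches the most significant bit of its first octet.  A hypothetical
   helper, assuming the partial NAL unit already starts with its
   reconstructed two-octet header:

```python
def mark_incomplete(nal: bytes) -> bytes:
    """Set forbidden_zero_bit (the MSB of the first header octet) to 1."""
    if not nal:
        raise ValueError("empty NAL unit")
    return bytes([nal[0] | 0x80]) + nal[1:]
```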

1388	5. Packetization Rules

1390	   The following packetization rules apply:

1392	   o  If tx-mode is equal to "MST" or sprop-depack-buf-nalus is greater
1393	      than 0 for an RTP session, the transmission order of NAL units
1394	      carried in the RTP session MAY differ from the NAL unit decoding
1395	      order.  Otherwise (tx-mode is equal to "SST" and
1396	      sprop-depack-buf-nalus is equal to 0 for an RTP session), the
1397	      transmission order of NAL units carried in the RTP session MUST
1398	      be the same as the NAL unit decoding order.

1400	   o  A NAL unit of a small size SHOULD be encapsulated in an
1401	      aggregation packet together with one or more other NAL units in
1402	      order to avoid the unnecessary packetization overhead for small
1403	      NAL units.  For example, non-VCL NAL units such as access unit
1404	      delimiters, parameter sets, or SEI NAL units are typically small
1405	      and can often be aggregated with VCL NAL units without violating
1406	      MTU size constraints.

1408	   o  Each non-VCL NAL unit SHOULD be encapsulated in an aggregation
1409	      packet together with its associated VCL NAL unit, as typically a
1410	      non-VCL NAL unit would be meaningless without the associated VCL
1411	      NAL unit being available.

1413	   o  For carrying exactly one NAL unit in an RTP packet, a single NAL
1414	      unit packet MUST be used.
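   One possible, non-normative way to apply these rules to the NAL
   units of one access unit is a greedy planner, sketched below under
   these assumptions: sizes are in octets and include the two-octet NAL
   unit header, max_payload is an RTP payload budget the caller derives
   from the path MTU, DONL/DOND fields are absent, and NAL units are
   packed in decoding order.  The function plan_packets and its output
   format are hypothetical.

```python
def plan_packets(access_unit_sizes, max_payload: int):
    """Greedy packetization plan for one access unit.

    Returns a list of ("single", [i]), ("ap", [i, j, ...]) or ("fu", [i]).
    """
    plan, group, group_len = [], [], 2  # group_len starts with the AP PayloadHdr

    def flush():
        nonlocal group, group_len
        if len(group) == 1:
            plan.append(("single", group))  # exactly one NAL unit in the packet
        elif group:
            plan.append(("ap", group))
        group, group_len = [], 2

    for i, size in enumerate(access_unit_sizes):
        if size > max_payload:              # cannot fit even alone: fragment it
            flush()
            plan.append(("fu", [i]))
        elif group_len + 2 + size <= max_payload:
            group.append(i)                 # 2 extra octets for the NALU size field
            group_len += 2 + size
        else:
            flush()
            group.append(i)
            group_len += 2 + size
    flush()
    return plan
```

   A unit left alone in a group is sent as a single NAL unit packet,
   which needs no size field, so it always fits when its size does not
   exceed max_payload.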

1416	6. De-packetization Process

1418	   The general concept behind de-packetization is to get the NAL units
1419	   out of the RTP packets in an RTP session and all the dependent RTP
1420	   sessions, if any, and pass them to the decoder in the NAL unit
1421	   decoding order.

1423	   The de-packetization process is implementation dependent.
1424	   Therefore, the following description should be seen as an example of
1425	   a suitable implementation.  Other schemes may be used as well as
1426	   long as the output for the same input is the same as the process
1427	   described below.  The output is the same when the set of NAL units
1428	   and their order are both identical.  Optimizations relative to the
1429	   described algorithms are possible.

1431	   All normal RTP mechanisms related to buffer management apply.  In
1432	   particular, duplicated or outdated RTP packets (as indicated by the
1433	   RTP sequence number and the RTP timestamp) are removed.  To
1434	   determine the exact time for decoding, factors such as a possible
1435	   intentional delay to allow for proper inter-stream synchronization
1436	   must be factored in.

1438	   NAL units with NAL unit type values in the range of 0 to 47,
1439	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1440	   with NAL unit type values in the range of 48 to 63, inclusive, MUST
1441	   NOT be passed to the decoder.
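   A receiver might apply this rule by reading the Type field of the
   NAL unit header, which occupies bits 1 to 6 of the first octet.  A
   sketch with a hypothetical function name:

```python
def passable_to_decoder(nal: bytes) -> bool:
    """True for nal_unit_type 0..47; 48..63 are payload structures (e.g., AP = 48, FU = 49)."""
    nal_type = (nal[0] >> 1) & 0x3F
    return nal_type <= 47
```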

1443	   The receiver includes a receiver buffer, which is used to compensate
1444	   for transmission delay jitter, to reorder NAL units from
1445	   transmission order to the NAL unit decoding order, and to recover
1446	   the NAL unit decoding order in MST, when applicable.  In this
1447	   section, the receiver operation is described under the assumption
1448	   that there is no transmission delay jitter.  To distinguish it
1449	   from a practical receiver buffer that is also used for compensation
1450	   of transmission delay jitter, the receiver buffer is hereafter
1451	   called the de-packetization buffer in this section.  Receivers
1452	   SHOULD also prepare for transmission delay jitter; i.e., either
1453	   reserve separate buffers for transmission delay jitter buffering and
1454	   de-packetization buffering or use a receiver buffer for both
1455	   transmission delay jitter and de-packetization.  Moreover, receivers
1456	   SHOULD take transmission delay jitter into account in the buffering
1457	   operation; e.g., by additional initial buffering before starting
1458	   decoding and playback.

1460	   There are two buffering states in the receiver: initial buffering
1461	   and buffering while playing.  Initial buffering starts when the
1462	   reception is initialized.  After initial buffering, decoding and
1463	   playback are started, and the buffering-while-playing mode is used.

1465	   Regardless of the buffering state, the receiver stores incoming NAL
1466	   units, in reception order, into the de-packetization buffer.  NAL
1467	   units carried in single NAL unit packets, APs, and FUs are stored in
1468	   the de-packetization buffer individually, and the value of AbsDon is
1469	   calculated and stored for each NAL unit.  When MST is in use, NAL
1470	   units of all RTP packet streams are stored in the same
1471	   de-packetization buffer.

1473	   Initial buffering lasts until condition A (the number of NAL units
1474	   in the de-packetization buffer is greater than the value of sprop-
1475	   depack-buf-nalus of the highest RTP session) is true.

1477	   After initial buffering, whenever condition A is true, the following
1478	   operation is repeatedly applied until condition A becomes false:

1480	   o  The NAL unit in the de-packetization buffer with the smallest
1481	      value of AbsDon is removed from the de-packetization buffer and
1482	      passed to the decoder.

1484	   When no more NAL units are flowing into the de-packetization buffer,
1485	   all NAL units remaining in the de-packetization buffer are removed
1486	   from the buffer and passed to the decoder in the order of increasing
1487	   AbsDon values.
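   Under the jitter-free model above, both buffering states reduce to
   one rule: while condition A holds, release the NAL unit with the
   smallest AbsDon.  The following Python sketch assumes AbsDon has
   already been computed for each NAL unit, and uses a heap keyed on
   AbsDon with an insertion counter as a tie-breaker.  The class name
   DepacketizationBuffer is hypothetical.

```python
import heapq
from itertools import count


class DepacketizationBuffer:
    """Reorders NAL units by AbsDon, assuming no transmission delay jitter."""

    def __init__(self, sprop_depack_buf_nalus: int):
        self.threshold = sprop_depack_buf_nalus
        self.heap = []
        self.seq = count()  # tie-breaker so entries never compare NAL payloads

    def push(self, abs_don: int, nal: bytes):
        """Store one NAL unit; return the NAL units released to the decoder."""
        heapq.heappush(self.heap, (abs_don, next(self.seq), nal))
        out = []
        # Condition A: more NAL units buffered than sprop-depack-buf-nalus.
        while len(self.heap) > self.threshold:
            out.append(heapq.heappop(self.heap)[2])
        return out

    def flush(self):
        """No more input: release everything in increasing AbsDon order."""
        return [heapq.heappop(self.heap)[2] for _ in range(len(self.heap))]
```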

1489	7. Payload Format Parameters

1491	   This section specifies the parameters that MAY be used to select
1492	   optional features of the payload format and certain features or
1493	   properties of the bitstream.  The parameters are specified here as
1494	   part of the media type registration for the HEVC codec.  A mapping
1495	   of the parameters into the Session Description Protocol (SDP)
1496	   [RFC4566] is also provided for applications that use SDP.
1497	   Equivalent parameters could be defined elsewhere for use with
1498	   control protocols that do not use SDP.

1500	7.1 Media Type Registration

1502	   The media subtype for the HEVC codec is allocated from the IETF
1503	   tree.

1505	   The receiver MUST ignore any unspecified parameter.

1507	   Media Type name:     video

1509	   Media subtype name:  H265
1510	   Required parameters: none

1512	   OPTIONAL parameters:

1514	      In the following definitions of parameters, "the stream" or "the
1515	      NAL unit stream" refers to all NAL units conveyed in the current
1516	      RTP session in SST, and all NAL units conveyed in the current RTP
1517	      session and all NAL units conveyed in other RTP sessions that the
1518	      current RTP session depends on in MST.

1520	      profile-space, profile-id:

1522	         The profile-space parameter indicates the context for
1523	         interpretation of the profile-id parameter value.  The
1524	         profile, which specifies the subset of coding tools that may
1525	         have been used to generate the stream or that the receiver
1526	         supports, as specified in [HEVC], is defined by the
1527	         combination of profile-space and profile-id.  Note that
1528	         profile-space is required to be equal to 0 in [HEVC], but
1529	         other values for it may be specified in the future by ITU-T or
1530	         ISO/IEC.

1532	         If the profile-space and profile-id parameters are used to
1533	         indicate properties of a NAL unit stream, it indicates that,
1534	         to decode the stream, the minimum subset of coding tools a
1535	         decoder has to support is the profile specified by both
1536	         parameters.

1538	         If the profile-space and profile-id parameters are used for
1539	         capability exchange or session setup, it indicates the subset
1540	         of coding tools, which is equal to the profile, that the codec
1541	         supports for both receiving and sending.

1543	         If no profile-space is present, a value of 0 MUST be
1544	         inferred, and if no profile-id is present, the Main profile
1545	         (i.e., a value of 1) MUST be inferred.

1547	         The profile-space and profile-id parameters are derived from
1548	         the sequence parameter set or video parameter set NAL units,
1549	         as specified in [HEVC], as follows.

1551	         For SST or for the stream corresponding to the highest RTP
1552	         session of MST when MST is applied, the following applies:

1554	            o profile-space = general_profile_space
1555	            o profile-id = general_profile_idc

1557	         For streams not corresponding to the highest RTP session of
1558	         MST when MST is applied, the following applies, with j being
1559	         the value of the sub-layer-id parameter:

1561	            o profile-space = sub_layer_profile_space[j]
1562	            o profile-id = sub_layer_profile_idc[j]

1564	      tier-flag, level-id:

1566	         The tier-flag parameter indicates the context for
1567	         interpretation of the level-id value.  The default level,
1568	         which imposes limits on values of syntax elements or on
1569	         arithmetic combinations of values of syntax elements, as
1570	         specified in [HEVC], is defined by the combination of
1571	         tier-flag and level-id.

1573	         If the tier-flag and level-id parameters are used to indicate
1574	         properties of a NAL unit stream, it indicates that, to decode
1575	         the stream, the lowest level the decoder has to support is the
1576	         default level.

1578	         If the tier-flag and level-id parameters are used for
1579	         capability exchange or session setup, the following applies.
1580	         If max-recv-level-id is not present, the default level defined
1581	         by tier-flag and level-id indicates the highest level the
1582	         codec wishes to support.  Otherwise, tier-flag and max-recv-
1583	         level-id indicate the highest level the codec supports for
1584	         receiving.  For either receiving or sending, all levels that
1585	         are lower than the highest level supported MUST also be
1586	         supported.

1588	         If no tier-flag is present, a value of 0 MUST be inferred,
1589	         and if no level-id is present, a value of 93 (i.e., level 3.1)
1590	         MUST be inferred.
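         Because general_level_idc in [HEVC] is 30 times the level
         number, level-id (and max-recv-level-id) values convert to
         level numbers by dividing by 30; for example, 93 / 30 = 3.1.
         The hypothetical helpers below illustrate the conversion in
         both directions, assuming level numbers with at most one
         decimal digit.

```python
def level_from_id(level_id: int) -> str:
    """Convert a level-id value (30 x level) to a level string, e.g. 93 -> "3.1"."""
    major, rem = divmod(level_id, 30)
    if rem % 3:
        raise ValueError("not a valid level-id: %d" % level_id)
    minor = rem // 3
    return str(major) if minor == 0 else "%d.%d" % (major, minor)


def level_to_id(level: str) -> int:
    """Convert a level string to a level-id value, e.g. "3.1" -> 93."""
    major, _, minor = level.partition(".")
    return 30 * int(major) + 3 * int(minor or "0")
```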

1592	         The tier-flag and level-id parameters are derived from the
1593	         sequence parameter set or video parameter set NAL units, as
1594	         specified in [HEVC], as follows.

1596	         For SST or for the stream corresponding to the highest RTP
1597	         session of MST when MST is applied, the following applies:

1599	            o tier-flag = general_tier_flag
1600	            o level-id = general_level_idc

1602	         For streams not corresponding to the highest RTP session of
1603	         MST when MST is applied, the following applies, with j being
1604	         the value of the sub-layer-id parameter:

1606	            o tier-flag = sub_layer_tier_flag[j]
1607	            o level-id = sub_layer_level_idc[j]

1609	      interop-constraints:

1611	         A base16 [RFC4648] (hexadecimal) representation of the six
1612	         bytes derived from the sequence parameter set or video
1613	         parameter set NAL units, as specified in [HEVC], consisting of
1614	         progressive_source_flag, interlaced_source_flag,
1615	         non_packed_constraint_flag, frame_only_constraint_flag, and
1616	         reserved_zero_44bits.  Note that reserved_zero_44bits is
1617	         required to be equal to 0 in [HEVC], but other values for it
1618	         may be specified in the future by ITU-T or ISO/IEC.

1620	         If no interop-constraints are present, the following MUST be
1621	         inferred:

1623	            o progressive_source_flag = 1
1624	            o interlaced_source_flag = 0
1625	            o non_packed_constraint_flag = 1
1626	            o frame_only_constraint_flag = 1
1627	            o reserved_zero_44bits = 0
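         As a worked example, assume the four flags occupy the most
         significant bits in the order listed above, followed by the 44
         reserved bits (4 + 44 = 48 bits, i.e., six bytes).  The
         inferred defaults then pack as binary 1011 followed by 44 zero
         bits, which is the base16 string B00000000000.  A sketch with a
         hypothetical function name:

```python
def interop_constraints(progressive=1, interlaced=0, non_packed=1,
                        frame_only=1, reserved_zero_44bits=0) -> str:
    """Pack four flags plus 44 reserved bits (48 bits) as 12 base16 digits."""
    if reserved_zero_44bits >> 44:
        raise ValueError("reserved field exceeds 44 bits")
    value = ((progressive << 47) | (interlaced << 46) |
             (non_packed << 45) | (frame_only << 44) | reserved_zero_44bits)
    return format(value, "012X")  # RFC 4648 base16 uses upper-case digits
```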

1629	         For SST or for the stream corresponding to the highest RTP
1630	         session of MST when MST is applied, the following applies:

1632	            o progressive_source_flag = general_progressive_source_flag
1633	            o interlaced_source_flag = general_interlaced_source_flag
1634	            o non_packed_constraint_flag =
1635	                              general_non_packed_constraint_flag
1636	            o frame_only_constraint_flag =
1637	                              general_frame_only_constraint_flag
1638	            o reserved_zero_44bits = general_reserved_zero_44bits

1640	         For streams not corresponding to the highest RTP session of
1641	         MST when MST is applied, the following applies, with j being
1642	         the value of the sub-layer-id parameter:

1644	            o progressive_source_flag =
1645	                              sub_layer_progressive_source_flag[j]
1646	            o interlaced_source_flag =
1647	                              sub_layer_interlaced_source_flag[j]
1648	            o non_packed_constraint_flag =
1649	                              sub_layer_non_packed_constraint_flag[j]
1650	            o frame_only_constraint_flag =
1651	                              sub_layer_frame_only_constraint_flag[j]
1652	            o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]

1654	      profile-compatibility-indicator:

1656	         A base16 [RFC4648] representation of the four bytes
1657	         representing the 32 profile compatibility flags in the
1658	         sequence parameter set or video parameter set NAL units.  A
1659	         decoder conforming to a certain profile may be able to decode
1660	         bitstreams conforming to other profiles.  The
1661	         profile-compatibility-indicator provides exact information on
1662	         the ability of a decoder conforming to a certain profile to
1663	         decode bitstreams conforming to another profile.  More
1664	         concretely, if the profile compatibility flag corresponding
1665	         to the profile that a decoder conforms to is set for a
1666	         bitstream, the decoder is able to decode that bitstream,
1667	         irrespective of the profile the bitstream conforms to
1668	         (provided that it supports the highest level of the bitstream).

1670	         For SST or for the stream corresponding to the highest RTP
1671	         session of MST when MST is used with temporal scalability, the
1672	         following applies with j = 0..31:

1674	            o The 32 flags = general_profile_compatibility_flag[j]

1676	         When MST is in use, for streams not corresponding to the
1677	         highest RTP session, the following applies with i being the
1678	         value of the sub-layer-id parameter and j = 0..31:

1680	            o The 32 flags = sub_layer_profile_compatibility_flag[i][j]
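         A sketch of building and testing the indicator, assuming
         general_profile_compatibility_flag[0] maps to the most
         significant bit of the first byte.  For instance, a stream with
         flags 1 (Main) and 2 (Main 10) set yields 60000000.  The
         function names are hypothetical.

```python
def profile_compatibility_indicator(flagged_profiles) -> str:
    """Set profile_compatibility_flag[j] for each j given; flag[0] is the MSB."""
    value = 0
    for j in flagged_profiles:
        if not 0 <= j <= 31:
            raise ValueError("profile index out of range: %d" % j)
        value |= 1 << (31 - j)
    return format(value, "08X")


def compatible(indicator: str, decoder_profile: int) -> bool:
    """True if the flag for the decoder's profile is set in the indicator."""
    return bool((int(indicator, 16) >> (31 - decoder_profile)) & 1)
```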

1682	      sub-layer-id:

1684	         This parameter MAY be used to indicate the highest allowed
1685	         value of TID in the stream.  When not present, the value of
1686	         sub-layer-id is inferred to be equal to 6.

1688	      recv-sub-layer-id:

1690	         This parameter MAY be used to signal a receiver's choice among
1691	         the offered or declared sub-layers in the sprop-vps.  The value
1692	         of recv-sub-layer-id indicates the index of the highest sub-
1693	         layer of the stream that a receiver supports.  When not
1694	         present, the value of recv-sub-layer-id is inferred to be
1695	         equal to sub-layer-id.

1697	      max-recv-level-id:

1699	         This parameter MAY be used, together with tier-flag, to
1700	         indicate the highest level a receiver supports.  The highest
1701	         level the receiver supports is equal to the value of max-recv-
1702	         level-id divided by 30 for the Main or High tier (as
1703	         determined by tier-flag equal to 0 or 1, respectively).

1705	         When max-recv-level-id is not present, the value is inferred
1706	         to be equal to level-id.

1708	         max-recv-level-id MUST NOT be present when the highest level
1709	         the receiver supports is not higher than the default level.
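         The mapping from max-recv-level-id to a level number can be
         sketched as follows.  The helper names level_from_id() and
         highest_level() are hypothetical.  The check that a level-id is
         a multiple of 3 is an assumption based on the level values used
         in [HEVC] (e.g., 93 for Level 3.1 and 120 for Level 4).

```python
def level_from_id(level_id: int) -> str:
    """Map a level-id (30 times the level number) to a level string."""
    major, rem = divmod(level_id, 30)
    if level_id <= 0 or rem % 3 != 0:
        # Assumed: levels in [HEVC] have ids that are multiples of 3.
        raise ValueError(f"unexpected level-id {level_id}")
    minor = rem // 3
    return f"{major}.{minor}" if minor else str(major)


def highest_level(tier_flag: int, max_recv_level_id: int) -> str:
    """Describe the highest level given tier-flag and max-recv-level-id."""
    tier = {0: "Main", 1: "High"}[tier_flag]
    return f"{tier} tier, Level {level_from_id(max_recv_level_id)}"


print(highest_level(0, 93))   # Main tier, Level 3.1
print(highest_level(1, 120))  # High tier, Level 4
```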

1711	      sprop-vps:

1713	         This parameter MAY be used to convey any video parameter set
1714	         NAL unit of the stream.  When present, the parameter MAY be
1715	         used to indicate codec capability and sub-stream
1716	         characteristics (i.e., properties of sub-layer representations
1717	         as defined in [HEVC]) as well as for out-of-band transmission
1718	         of video parameter sets.  The value of the parameter is a
1719	         comma-separated (',') list of base64 [RFC4648] representations
1720	         of the video parameter set NAL units as specified in Section
1721	         7.3.2.1 of [HEVC].

1723	      sprop-sps:

1725	         This parameter MAY be used to convey sequence parameter set
1726	         NAL units of the stream for out-of-band transmission of
1727	         sequence parameter sets.  The value of the parameter is a
1728	         comma-separated (',') list of base64 [RFC4648] representations
1729	         of the sequence parameter set NAL units as specified in
1730	         Section 7.3.2.2 of [HEVC].

1732	      sprop-pps:

1734	         This parameter MAY be used to convey picture parameter set NAL
1735	         units of the stream for out-of-band transmission of picture
1736	         parameter sets.  The value of the parameter is a comma-
1737	         separated (',') list of base64 [RFC4648] representations of
1738	         the picture parameter set NAL units as specified in Section
1739	         7.3.2.3 of [HEVC].
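         As a non-normative sketch, the value of sprop-vps, sprop-sps, or
         sprop-pps can be produced from raw parameter set NAL units with
         the standard base64 alphabet of [RFC4648] (with padding), and it
         can be parsed back the same way.  The helper names are
         hypothetical.  The byte strings in the usage lines are truncated
         placeholders, not valid parameter sets.

```python
import base64
from typing import Iterable, List


def encode_sprop(nal_units: Iterable[bytes]) -> str:
    """Comma-separated base64 (RFC 4648, padded) of parameter set NALs."""
    return ",".join(base64.b64encode(nalu).decode("ascii")
                    for nalu in nal_units)


def decode_sprop(value: str) -> List[bytes]:
    """Inverse of encode_sprop(); rejects characters outside base64."""
    return [base64.b64decode(item, validate=True)
            for item in value.split(",") if item]


# Truncated placeholders that start with a VPS NAL unit header
# (0x40 0x01); real parameter sets are considerably longer.
value = encode_sprop([b"\x40\x01\x0c", b"\x40\x01"])
print(value)                # QAEM,QAE=
print(decode_sprop(value))  # [b'@\x01\x0c', b'@\x01']
```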

1741	      max-ls, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

1743	         These parameters MAY be used to signal the capabilities of a
1744	         receiver implementation.  These parameters MUST NOT be used
1745	         for any other purpose.  The highest level (specified by tier-
1746	         flag and max-recv-level-id) MUST be a level that the receiver
1747	         fully supports.  max-ls, max-lps, max-cpb, max-dpb, max-br,
1748	         max-tr, and max-tc MAY be used to indicate capabilities of
1749	         the receiver that extend the required capabilities of the
1750	         highest level, as specified below.

1752	         When more than one parameter from the set (max-ls, max-lps,
1753	         max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
1754	         receiver MUST support all signaled capabilities
1755	         simultaneously.  For example, if both max-ls and max-br are
1756	         present, the highest level extended in both luma sample rate
1757	         and bitrate is supported.  That is, the receiver
1758	         is able to decode NAL unit streams in which the luma sample
1759	         rate is up to max-ls (inclusive), the bitrate is up to max-br
1760	         (inclusive), the coded picture buffer size is derived as
1761	         specified in the semantics of the max-br parameter below, and
1762	         the other properties comply with the highest level specified
1763	         by tier-flag and max-recv-level-id.

1765	            Informative note: When the OPTIONAL media type parameters
1766	            are used to signal the properties of a NAL unit stream, and
1767	            max-ls, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-
1768	            tc are not present, the values of profile-space, profile-
1769	            id, tier-flag, and level-id must always be such that the
1770	            NAL unit stream complies fully with the specified profile
1771	            and level.
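         The combined effect of max-ls, max-lps, max-cpb, max-br,
         max-tr, and max-tc can be sketched in Python as a set of level
         limits in which each signaled value replaces the corresponding
         entry of Tables A-1 and A-2 of [HEVC].  The table below holds
         Main-tier values for three levels only.  These values are copied
         for illustration and should be checked against [HEVC].  The
         class and function names are hypothetical, max-dpb is handled
         separately, and the rounding of the scaled MaxCPB is an
         assumption.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Limits:
    max_luma_sr: int    # MaxLumaSR, Table A-2 (luma samples per second)
    max_luma_ps: int    # MaxLumaPS, Table A-1 (luma samples)
    max_cpb: int        # MaxCPB, Table A-1 (units of CpbBr*Factor bits)
    max_br: int         # MaxBR, Table A-2 (units of CpbBr*Factor bit/s)
    max_tile_rows: int  # MaxTileRows, Table A-1
    max_tile_cols: int  # MaxTileCols, Table A-1


# Main-tier values keyed by level-id; illustrative, verify against [HEVC].
MAIN_TIER: Dict[int, Limits] = {
    60: Limits(3686400, 122880, 1500, 1500, 1, 1),        # Level 2
    93: Limits(33177600, 983040, 10000, 10000, 3, 3),     # Level 3.1
    120: Limits(66846720, 2228224, 12000, 12000, 5, 5),   # Level 4
}


def effective_limits(level_id: int, max_ls: Optional[int] = None,
                     max_lps: Optional[int] = None,
                     max_cpb: Optional[int] = None,
                     max_br: Optional[int] = None,
                     max_tr: Optional[int] = None,
                     max_tc: Optional[int] = None) -> Limits:
    """Limits a receiver must support when all signaled max-* hold at once."""
    base = MAIN_TIER[level_id]
    signaled = {"max_luma_sr": max_ls, "max_luma_ps": max_lps,
                "max_cpb": max_cpb, "max_br": max_br,
                "max_tile_rows": max_tr, "max_tile_cols": max_tc}
    overrides = {k: v for k, v in signaled.items() if v is not None}
    for field_name, value in overrides.items():
        # Each max-* value MUST NOT be lower than the level's own limit.
        if value < getattr(base, field_name):
            raise ValueError(f"{field_name} is below the level limit")
    if max_br is not None and max_cpb is None:
        # max-br without max-cpb also scales MaxCPB (see max-br below);
        # floor division is a conservative choice made for this sketch.
        overrides["max_cpb"] = base.max_cpb * max_br // base.max_br
    return replace(base, **overrides)


print(effective_limits(60, max_ls=7372800, max_br=2000))
```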

1773	      max-ls:
1774	         The value of max-ls is an integer indicating the maximum
1775	         processing rate in units of luma samples per second.  The max-
1776	         ls parameter signals that the receiver is capable of decoding
1777	         video at a higher rate than is required by the highest level.

1779	         When max-ls is signaled, the receiver MUST be able to decode
1780	         NAL unit streams that conform to the highest level, with the
1781	         exception that the MaxLumaSR value in Table A-2 of [HEVC] for
1782	         the highest level is replaced with the value of max-ls.  The
1783	         value of max-ls MUST be greater than or equal to the value of
1784	         MaxLumaSR given in Table A-2 of [HEVC] for the highest level.
1785	         Senders MAY use this knowledge to send pictures of a given
1786	         size at a higher picture rate than is indicated in the highest
1787	         level.

1789	         When not present, the value of max-ls is inferred to be equal
1790	         to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
1791	         highest level.

1793	      max-lps:
1794	         The value of max-lps is an integer indicating the maximum
1795	         picture size in units of luma samples.  The max-lps parameter
1796	         signals that the receiver is capable of decoding larger
1797	         picture sizes than are required by the highest level.  When
1798	         max-lps is signaled, the receiver MUST be able to decode NAL
1799	         unit streams that conform to the highest level, with the
1800	         exception that the MaxLumaPS value in Table A-1 of [HEVC] for
1801	         the highest level is replaced with the value of max-lps.  The
1802	         value of max-lps MUST be greater than or equal to the value of
1803	         MaxLumaPS given in Table A-1 of [HEVC] for the highest level.
1804	         Senders MAY use this knowledge to send larger pictures at a
1805	         proportionally lower picture rate than is indicated in the
1806	         highest level.

1808	         When not present, the value of max-lps is inferred to be equal
1809	         to the value of MaxLumaPS given in Table A-1 of [HEVC] for the
1810	         highest level.

1812	      max-cpb:
1813	         The value of max-cpb is an integer indicating the maximum
1814	         coded picture buffer size in units of CpbBrVclFactor bits for
1815	         the VCL HRD parameters and in units of CpbBrNalFactor bits for
1816	         the NAL HRD parameters, where CpbBrVclFactor and
1817	         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The max-
1818	         cpb parameter signals that the receiver has more memory than
1819	         the minimum amount of coded picture buffer memory required by
1820	         the highest level.  When max-cpb is signaled, the receiver
1821	         MUST be able to decode NAL unit streams that conform to the
1822	         highest level, with the exception that the MaxCPB value in
1823	         Table A-1 of [HEVC] for the highest level is replaced with the
1824	         value of max-cpb.  The value of max-cpb MUST be greater than
1825	         or equal to the value of MaxCPB given in Table A-1 of [HEVC]
1826	         for the highest level.  Senders MAY use this knowledge to
1827	         construct coded video streams with greater variation of
1828	         bitrate than can be achieved with the MaxCPB value in Table
1829	         A-1 of [HEVC].

1831	         When not present, the value of max-cpb is inferred to be equal
1832	         to the value of MaxCPB given in Table A-1 of [HEVC] for the
1833	         highest level.

1835	            Informative note: The coded picture buffer is used in the
1836	            hypothetical reference decoder (Annex C of HEVC).  The use
1837	            of the hypothetical reference decoder is recommended in
1838	            HEVC encoders to verify that the produced bitstream
1839	            conforms to the standard and to control the output bitrate.
1840	            Thus, the coded picture buffer is conceptually independent
1841	            of any other potential buffers in the receiver, including
1842	            de-packetization and de-jitter buffers.  The coded picture
1843	            buffer need not be implemented in decoders as specified in
1844	            Annex C of HEVC, but rather standard-compliant decoders can
1845	            have any buffering arrangements provided that they can
1846	            decode standard-compliant bitstreams.  Thus, in practice,
1847	            the input buffer for a video decoder can be integrated with
1848	            de-packetization and de-jitter buffers of the receiver.

1850	      max-dpb:
1851	         The value of max-dpb is an integer indicating the maximum
1852	         decoded picture buffer size in units of decoded pictures at
1853	         the MaxLumaPS for the highest level, i.e., the number of
1854	         decoded pictures at the maximum picture size of the highest
1855	         level.  The value of max-dpb MUST be smaller than or equal to
1856	         16.  The max-dpb parameter signals that the receiver has more
1857	         memory than the minimum amount of decoded picture buffer
1858	         memory required by default, which is MaxDpbPicBuf as defined
1859	         in [HEVC] (equal to 6).  When max-dpb is signaled, the
1860	         receiver MUST be able to decode NAL unit streams that conform
1861	         to the highest level, with the exception that the
1862	         MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with
1863	         the value of max-dpb.  Consequently, a receiver that signals
1864	         max-dpb MUST be capable of storing the following number of
1865	         decoded pictures (MaxDpbSize) in its decoded picture buffer:

1867	           if ( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
1868	              MaxDpbSize = Min( 4 * max-dpb, 16 )
1869	           else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
1870	              MaxDpbSize = Min( 2 * max-dpb, 16 )
1871	           else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
1872	              MaxDpbSize = Min( ( 4 * max-dpb ) / 3, 16 )
1873	           else
1874	              MaxDpbSize = max-dpb

1876	         where MaxLumaPS is given in Table A-1 of [HEVC] for the highest
1877	         level and PicSizeInSamplesY is the current size of each
1878	         decoded picture in units of luma samples as defined in [HEVC].
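         A direct Python transcription of the MaxDpbSize derivation is
         shown below as a non-normative sketch.  The function name is
         hypothetical.  "/" is implemented as integer division with
         truncation, as in [HEVC].  The Level 4 value of MaxLumaPS
         (2228224) is quoted for illustration only.

```python
def max_dpb_size(pic_size_in_samples_y: int, max_luma_ps: int,
                 max_dpb: int = 6) -> int:
    """MaxDpbSize for a receiver that signals max-dpb (default 6)."""
    if not 6 <= max_dpb <= 16:
        raise ValueError("max-dpb must be in the range 6..16")
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # "/" in [HEVC] is integer division with truncation toward zero.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb


LEVEL4_MAX_LUMA_PS = 2228224  # Table A-1 value, for illustration
print(max_dpb_size(1920 * 1080, LEVEL4_MAX_LUMA_PS))     # 6
print(max_dpb_size(1280 * 720, LEVEL4_MAX_LUMA_PS))      # 12
print(max_dpb_size(1440 * 1080, LEVEL4_MAX_LUMA_PS, 8))  # 10
```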

1880	         The value of max-dpb MUST be greater than or equal to the
1881	         value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].  Senders
1882	         MAY use this knowledge to construct coded video streams with
1883	         improved compression.

1885	         When not present, the value of max-dpb is inferred to be equal
1886	         to the value of MaxDpbPicBuf (i.e., 6) as defined in [HEVC].

1888	            Informative note: This parameter was added primarily to
1889	            complement a similar codepoint in the ITU-T Recommendation
1890	            H.245, so as to facilitate signaling gateway designs.  The
1891	            decoded picture buffer stores reconstructed samples.  There
1892	            is no relationship between the size of the decoded picture
1893	            buffer and the buffers used in RTP, especially de-
1894	            packetization and de-jitter buffers.

1896	      max-br:
1897	         The value of max-br is an integer indicating the maximum video
1898	         bitrate in units of CpbBrVclFactor bits per second for the VCL
1899	         HRD parameters and in units of CpbBrNalFactor bits per second
1900	         for the NAL HRD parameters, where CpbBrVclFactor and
1901	         CpbBrNalFactor are defined in Section A.4 of [HEVC].

1903	         The max-br parameter signals that the video decoder of the
1904	         receiver is capable of decoding video at a higher bitrate than
1905	         is required by the highest level.

1907	         When max-br is signaled, the video codec of the receiver MUST
1908	         be able to decode NAL unit streams that conform to the highest
1909	         level, with the following exceptions in the limits specified
1910	         by the highest level:

1912	          o The value of max-br replaces the MaxBR value in Table A-2
1913	            of [HEVC] for the highest level.
1914	          o When the max-cpb parameter is not present, the result of
1915	            the following formula replaces the value of MaxCPB in Table
1916	            A-1 of [HEVC]:

1918	               (MaxCPB of the highest level) * max-br / (MaxBR of the
1919	               highest level)

1921	         For example, if a receiver signals capability for Main profile
1922	         Level 2 with max-br equal to 2000, this indicates a maximum
1923	         video bitrate of 2000 kbits/sec for VCL HRD parameters, a
1924	         maximum video bitrate of 2200 kbits/sec for NAL HRD
1925	         parameters, and a CPB size of 2000000 bits (2000000 /
1926	         1500000 * 1500000).
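         The arithmetic of the example above can be reproduced with the
         following Python sketch.  It assumes the Main profile values
         CpbBrVclFactor = 1000 and CpbBrNalFactor = 1100 from Section A.4
         of [HEVC], and it assumes that max-cpb is absent.  The function
         name is hypothetical.

```python
from typing import Tuple

CPB_BR_VCL_FACTOR = 1000  # CpbBrVclFactor, Main profile (assumed)
CPB_BR_NAL_FACTOR = 1100  # CpbBrNalFactor, Main profile (assumed)


def max_br_in_bits(max_br: int, level_max_br: int,
                   level_max_cpb: int) -> Tuple[int, int, int]:
    """Return (VCL bit/s, NAL bit/s, VCL CPB size in bits) for max-br,
    assuming max-cpb is not present."""
    if max_br < level_max_br:
        raise ValueError("max-br must not be below MaxBR of the level")
    # (MaxCPB of the highest level) * max-br / (MaxBR of the highest level)
    cpb_units = level_max_cpb * max_br // level_max_br
    return (max_br * CPB_BR_VCL_FACTOR,
            max_br * CPB_BR_NAL_FACTOR,
            cpb_units * CPB_BR_VCL_FACTOR)


# Main profile, Level 2: MaxBR = 1500 and MaxCPB = 1500.
print(max_br_in_bits(2000, 1500, 1500))  # (2000000, 2200000, 2000000)
```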

1928	         The value of max-br MUST be greater than or equal to the value
1929	         of MaxBR given in Table A-2 of [HEVC] for the highest level.

1931	         Senders MAY use this knowledge to send higher bitrate video as
1932	         allowed in the level definition of Annex A of HEVC to achieve
1933	         improved video quality.

1935	         When not present, the value of max-br is inferred to be equal
1936	         to the value of MaxBR given in Table A-2 of [HEVC] for the
1937	         highest level.

1939	            Informative note: This parameter was added primarily to
1940	            complement a similar codepoint in the ITU-T Recommendation
1941	            H.245, so as to facilitate signaling gateway designs.  The
1942	            assumption that the network is capable of handling such
1943	            bitrates at any given time cannot be made from the value of
1944	            this parameter.  In particular, no conclusion can be drawn
1945	            that the signaled bitrate is possible under congestion
1946	            control constraints.

1948	      max-tr:
1949	         The value of max-tr is an integer indicating the maximum
1950	         number of tile rows.  The max-tr parameter signals that the
1951	         receiver is capable of decoding video with a larger number of
1952	         tile rows than the value allowed by the highest level.

1954	         When max-tr is signaled, the receiver MUST be able to decode
1955	         NAL unit streams that conform to the highest level, with the
1956	         exception that the MaxTileRows value in Table A-1 of [HEVC]
1957	         for the highest level is replaced with the value of max-tr.

1959	         The value of max-tr MUST be greater than or equal to the value
1960	         of MaxTileRows given in Table A-1 of [HEVC] for the highest
1961	         level.  Senders MAY use this knowledge to send pictures
1962	         utilizing a larger number of tile rows than the value allowed
1963	         by the highest level.

1965	         When not present, the value of max-tr is inferred to be equal
1966	         to the value of MaxTileRows given in Table A-1 of [HEVC] for
1967	         the highest level.

1969	      max-tc:
1970	         The value of max-tc is an integer indicating the maximum
1971	         number of tile columns.  The max-tc parameter signals that the
1972	         receiver is capable of decoding video with a larger number of
1973	         tile columns than the value allowed by the highest level.

1975	         When max-tc is signaled, the receiver MUST be able to decode
1976	         NAL unit streams that conform to the highest level, with the
1977	         exception that the MaxTileCols value in Table A-1 of [HEVC]
1978	         for the highest level is replaced with the value of max-tc.

1980	         The value of max-tc MUST be greater than or equal to the value
1981	         of MaxTileCols given in Table A-1 of [HEVC] for the highest
1982	         level.  Senders MAY use this knowledge to send pictures
1983	         utilizing a larger number of tile columns than the value
1984	         allowed by the highest level.

1986	         When not present, the value of max-tc is inferred to be equal
1987	         to the value of MaxTileCols given in Table A-1 of [HEVC] for
1988	         the highest level.

1990	      max-fps:

1992	         The value of max-fps is an integer indicating the maximum
1993	         picture rate in units of pictures per 100 seconds that can
1994	         be efficiently received.  The max-fps parameter MAY be
1995	         used to signal that the receiver has a constraint in that it
1996	         is not capable of decoding video efficiently at the full
1997	         picture rate that is implied by the highest level and, when
1998	         present, one or more of the parameters max-ls, max-lps, and
1999	         max-br.

2001	         The value of max-fps is not necessarily the picture rate at
2002	         which the maximum picture size can be sent; it constitutes a
2003	         constraint on the maximum picture rate for all resolutions.

2005	            Informative note: The max-fps parameter is semantically
2006	            different from max-ls, max-lps, max-cpb, max-dpb, max-br,
2007	            max-tr, and max-tc in that max-fps is used to signal a
2008	            constraint, lowering the maximum picture rate from what is
2009	            implied by other parameters.

2011	         The encoder MUST use a picture rate equal to or less than this
2012	         value.  In cases where the max-fps parameter is absent, the
2013	         encoder is free to choose any picture rate according to the
2014	         highest level and any signaled optional parameters.
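         The max-fps constraint can be checked with exact rational
         arithmetic, as in the Python sketch below.  It assumes that
         max-fps is expressed in pictures per 100 seconds, so a value of
         3000 means 30 pictures per second.  The function name is
         hypothetical, and level limits are not checked.  Exact fractions
         matter here: a rate of 30000/1001 pictures per second is
         slightly above 29.97, so it would not satisfy max-fps equal to
         2997.

```python
from fractions import Fraction
from typing import Optional


def picture_rate_allowed(rate: Fraction, max_fps: Optional[int]) -> bool:
    """True if an encoder picture rate (pictures/s) respects max-fps.

    Assumes max-fps is in pictures per 100 seconds (3000 == 30 fps).
    """
    if max_fps is None:
        return True  # absent: limited only by level and max-* parameters
    return rate <= Fraction(max_fps, 100)


print(picture_rate_allowed(Fraction(30000, 1001), 3000))  # True
print(picture_rate_allowed(Fraction(60), 3000))           # False
```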

2016	      tx-mode:

2018	         This parameter indicates whether the transmission mode is SST
2019	         or MST.

2021	         The value of tx-mode MUST be equal to either "MST" or "SST".
2022	         When not present, the value of tx-mode is inferred to be equal
2023	         to "SST".

2025	         If the value is equal to "MST", MST MUST be in use.  Otherwise
2026	         (the value is equal to "SST"), SST MUST be in use.

2028	         The value of tx-mode MUST be equal to "MST" for all RTP
2029	         sessions in an MST.

2031	      sprop-depack-buf-nalus:

2033	         This parameter specifies the maximum number of NAL units that
2034	         precede a NAL unit in the de-packetization buffer in reception
2035	         order and follow the NAL unit in decoding order.

2037	         The value of sprop-depack-buf-nalus MUST be an integer in the
2038	         range of 0 to 32767, inclusive.

2040	         When not present, the value of sprop-depack-buf-nalus is
2041	         inferred to be equal to 0.

2043	         When the RTP session depends on one or more other RTP sessions
2044	         (in this case tx-mode MUST be equal to "MST"), this parameter
2045	         MUST be present and the value of sprop-depack-buf-nalus MUST
2046	         be greater than 0.
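         A sender that knows the decoding order of the NAL units it
         transmits could compute sprop-depack-buf-nalus as in the
         following Python sketch.  It assumes that each NAL unit is
         represented by its position in decoding order, listed in
         reception order.  Because NAL units leave the de-packetization
         buffer in decoding order, any NAL unit received earlier that
         follows a given NAL unit in decoding order is still buffered
         when that NAL unit arrives, so all such NAL units are counted.
         The function name is hypothetical.

```python
from typing import Sequence


def depack_buf_nalus(decoding_pos_in_reception_order: Sequence[int]) -> int:
    """Largest number of NAL units that precede a NAL unit in
    reception order but follow it in decoding order."""
    seq = decoding_pos_in_reception_order
    worst = 0
    for i, pos in enumerate(seq):
        # Count earlier-received NAL units that are decoded after this one.
        later_in_decoding = sum(1 for prev in seq[:i] if prev > pos)
        worst = max(worst, later_in_decoding)
    return worst


print(depack_buf_nalus([0, 1, 2, 3]))  # 0: received in decoding order
print(depack_buf_nalus([0, 2, 3, 1]))  # 2
```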

2048	      sprop-depack-buf-bytes:

2050	         This parameter signals the required size of the de-
2051	         packetization buffer in units of bytes.  The value of the
2052	         parameter MUST be greater than or equal to the maximum buffer
2053	         occupancy (in units of bytes) of the de-packetization buffer
2054	         as specified in Section 6.

2056	         The value of sprop-depack-buf-bytes MUST be an integer in the
2057	         range of 0 to 4294967295, inclusive.

2059	         When the RTP session depends on one or more other RTP sessions
2060	         (in this case tx-mode MUST be equal to "MST") or sprop-depack-
2061	         buf-nalus is present and is greater than 0, this parameter
2062	         MUST be present and the value of sprop-depack-buf-bytes MUST
2063	         be greater than 0.

2065	            Informative note: The value of sprop-depack-buf-bytes
2066	            indicates the required size of the de-packetization buffer
2067	            only.  When network jitter can occur, an appropriately
2068	            sized jitter buffer has to be available as well.

2070	      depack-buf-cap:

2072	         This parameter signals the capabilities of a receiver
2073	         implementation and indicates the amount of de-packetization
2074	         buffer space in units of bytes that the receiver has available
2075	         for reconstructing the NAL unit decoding order.  A receiver is
2076	         able to handle any stream for which the value of the sprop-
2077	         depack-buf-bytes parameter is smaller than or equal to this
2078	         parameter.

2080	         When not present, the value of depack-buf-cap is inferred to
2081	         be equal to 0.  The value of depack-buf-cap MUST be an integer
2082	         in the range of 0 to 4294967295, inclusive.

2084	            Informative note: depack-buf-cap indicates the maximum
2085	            possible size of the de-packetization buffer of the
2086	            receiver only.  When network jitter can occur, an
2087	            appropriately sized jitter buffer has to be available as
2088	            well.
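         The presence and range rules for tx-mode, sprop-depack-buf-nalus,
         sprop-depack-buf-bytes, and depack-buf-cap can be combined into
         a single check, sketched below in Python.  The function name and
         argument layout are hypothetical.  Treating an absent
         sprop-depack-buf-bytes as 0 is an assumption of this sketch.

```python
from typing import Optional

UINT32_MAX = 4294967295


def receiver_can_depacketize(tx_mode: str, depends_on_other_sessions: bool,
                             sprop_nalus: Optional[int],
                             sprop_bytes: Optional[int],
                             depack_buf_cap: Optional[int]) -> bool:
    """Validate sprop-depack-buf-* and compare with depack-buf-cap.

    None means "parameter not present".  Raises ValueError when the
    sender-side rules are violated.
    """
    if tx_mode not in ("SST", "MST"):
        raise ValueError('tx-mode must be "SST" or "MST"')
    nalus = 0 if sprop_nalus is None else sprop_nalus
    if not 0 <= nalus <= 32767:
        raise ValueError("sprop-depack-buf-nalus out of range")
    if depends_on_other_sessions:
        if tx_mode != "MST":
            raise ValueError('a dependent session requires tx-mode "MST"')
        if nalus == 0:
            raise ValueError("sprop-depack-buf-nalus must be present and > 0")
    if (depends_on_other_sessions or nalus > 0) and not sprop_bytes:
        raise ValueError("sprop-depack-buf-bytes must be present and > 0")
    # Assumption: an absent sprop-depack-buf-bytes is treated as 0.
    required = 0 if sprop_bytes is None else sprop_bytes
    cap = 0 if depack_buf_cap is None else depack_buf_cap
    if not (0 <= required <= UINT32_MAX and 0 <= cap <= UINT32_MAX):
        raise ValueError("byte values must be in 0..4294967295")
    return required <= cap


print(receiver_can_depacketize("MST", True, 4, 10000, 20000))  # True
```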

2090	      sprop-segmentation-id:

2092	         This parameter MAY be used to signal the segmentation tools
2093	         that are present in the stream and can be used for
2094	         parallelization.  The value of sprop-segmentation-id MUST be
2095	         an integer in the range of 0 to 3, inclusive.  When not
2096	         present, the value of sprop-segmentation-id is inferred to be
2097	         equal to 0.

2099	         When sprop-segmentation-id is equal to 0, no information about
2100	         the segmentation tools is provided.  When sprop-segmentation-
2101	         id is equal to 1, it indicates that slices are present in the
2102	         stream.  When sprop-segmentation-id is equal to 2, it
2103	         indicates that tiles are present in the stream.  When sprop-
2104	         segmentation-id is equal to 3, it indicates that WPP is used
2105	         in the stream.

2107	      sprop-spatial-segmentation-idc:

2109	         A base16 [RFC4648] representation of the syntax element
2110	         min_spatial_segmentation_idc as specified in [HEVC].  This
2111	         parameter MAY be used to describe parallelization capabilities
2112	         of the stream.

2114	      dec-parallel-cap:

2116	         This parameter MAY be used to indicate the decoder's
2117	         additional decoding capabilities given the presence of tools
2118	         enabling parallel decoding, such as slices, tiles, and WPP, in
2119	         the video stream.  The decoding capability of the decoder may
2120	         vary with the setting of the parallel decoding tools present
2121	         in the stream, e.g., the size of the tiles that are present in
2122	         a stream.  Therefore, multiple capability points may be
2123	         provided, each indicating the minimum required decoding
2124	         capability that is associated with a parallelism requirement,
2125	         which is a requirement on the video stream that enables
2126	         parallel decoding.

2128	         Each capability point is defined as a combination of 1) a
2129	         parallelism requirement, 2) a profile (determined by profile-
2130	         space and profile-id), 3) a highest level, and 4) a maximum
2131	         processing rate, a maximum picture size, and a maximum video
2132	         bitrate that may be equal to or greater than that determined
2133	         by the highest level.  The parameter's syntax in ABNF
2134	         [RFC5234] is as follows:

2136	            dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
2137	                               cap-point) "}"

2139	            cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
2140	                         cap-parameter)

2142	            spatial-seg-idc = 1*4DIGIT ; 1-4095

2144	            cap-parameter = tier-flag / level-id / max-ls
2145	                            / max-lps / max-br

2147	         The set of capability points expressed by the dec-parallel-cap
2148	         parameter is enclosed in a pair of curly braces ("{}").  Each
2149	         set of two consecutive capability points is separated by a
2150	         comma (',').  Within each capability point, each set of two
2151	         consecutive parameters, and when present, their values, is
2152	         separated by a semicolon (';').

2154	         The profile of all capability points is determined by profile-
2155	         space and profile-id that are outside the dec-parallel-cap
2156	         parameter.

2158	         Each capability point starts with an indication of the
2159	         parallelism requirement, which consists of a parallel tool
2160	         type, which may be equal to 'w' or 't', and a decimal value of
2161	         the spatial-seg-idc parameter.  When the type is 'w', the
2162	         capability point is valid only for H.265 bitstreams with WPP
2163	         in use, i.e., entropy_coding_sync_enabled_flag equal to 1.
2164	         When the type is 't', the capability point is valid only for
2165	         H.265 bitstreams with WPP not in use (i.e.,
2167	         entropy_coding_sync_enabled_flag equal to 0).  The capability
2168	         point is valid only for H.265 bitstreams with
2169	         min_spatial_segmentation_idc equal to or greater than
2170	         spatial-seg-idc.

2172	         The value of spatial-seg-idc MUST be greater than 0.
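         The conditions under which a capability point applies to a
         bitstream can be expressed as a small predicate.  The Python
         sketch below is non-normative.  Its name and arguments are
         hypothetical, and it assumes that the two syntax element values
         have already been extracted from the bitstream.

```python
def cap_point_applies(tool_type: str, spatial_seg_idc: int,
                      entropy_coding_sync_enabled_flag: int,
                      min_spatial_segmentation_idc: int) -> bool:
    """Does a dec-parallel-cap capability point cover a bitstream?"""
    if tool_type not in ("w", "t"):
        raise ValueError("parallel tool type must be 'w' or 't'")
    if not 1 <= spatial_seg_idc <= 4095:
        raise ValueError("spatial-seg-idc must be in 1..4095")
    wpp_in_use = entropy_coding_sync_enabled_flag == 1
    # 'w' points cover only WPP streams; 't' points only non-WPP streams.
    if (tool_type == "w") != wpp_in_use:
        return False
    return min_spatial_segmentation_idc >= spatial_seg_idc


print(cap_point_applies("t", 8, 0, 8))  # True
print(cap_point_applies("t", 8, 1, 8))  # False: WPP is in use
```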

2174	         After the parallelism requirement indication, each capability
2175	         point continues with one or more pairs of parameter and value
2176	         in any order for any of the following parameters:

2178	            o tier-flag
2179	            o level-id
2180	            o max-ls
2181	            o max-lps
2182	            o max-br

2184	         At most one occurrence of each of the above five parameters is
2185	         allowed within each capability point.
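         A minimal Python parser for the value of dec-parallel-cap,
         starting at the opening curly brace, is sketched below.  It
         assumes that the value contains no whitespace and that every
         parameter value is a decimal integer.  The function name and
         the returned structure are hypothetical.

```python
import re
from typing import Dict, List, Tuple

_ALLOWED = ("tier-flag", "level-id", "max-ls", "max-lps", "max-br")
_POINT = re.compile(r"([wt]):([0-9]{1,4})((?:;[a-z-]+=[0-9]+)+)")


def parse_dec_parallel_cap(value: str) -> List[Tuple[str, int, Dict[str, int]]]:
    """Parse e.g. "{t:8;level-id=120}" into [("t", 8, {"level-id": 120})]."""
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in {}")
    points = []
    for item in value[1:-1].split(","):
        m = _POINT.fullmatch(item)
        if m is None:
            raise ValueError(f"malformed capability point: {item!r}")
        tool, idc, rest = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc must be in 1..4095")
        params: Dict[str, int] = {}
        for pair in rest[1:].split(";"):
            name, _, val = pair.partition("=")
            # At most one occurrence of each of the five parameters.
            if name not in _ALLOWED or name in params:
                raise ValueError(f"unexpected or repeated {name!r}")
            params[name] = int(val)
        points.append((tool, idc, params))
    return points


print(parse_dec_parallel_cap("{t:8;level-id=120}"))
```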

2187	         The values of dec-parallel-cap.tier-flag and dec-parallel-
2188	         cap.level-id for a capability point indicate the highest level
2189	         of the capability point.  The values of dec-parallel-cap.max-
2190	         ls, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for
2191	         a capability point indicate the maximum processing rate in
2192	         units of luma samples per second, the maximum picture size in
2193	         units of luma samples, and the maximum video bitrate (in units
2194	         of CpbBrVclFactor bits per second for the VCL HRD parameters
2195	         and in units of CpbBrNalFactor bits per second for the NAL HRD
2196	         parameters, where CpbBrVclFactor and CpbBrNalFactor are
2197	         defined in Section A.4 of [HEVC]), respectively.

2199	         When not present, the value of dec-parallel-cap.tier-flag is
2200	         inferred to be equal to the value of tier-flag outside the
2201	         dec-parallel-cap parameter.  When not present, the value of
2202	         dec-parallel-cap.level-id is inferred to be equal to the value
2203	         of max-recv-level-id outside the dec-parallel-cap parameter.
2204	         When not present, the value of dec-parallel-cap.max-ls, dec-
2205	         parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred
2206	         to be equal to the value of max-ls, max-lps, or max-br,
2207	         respectively, outside the dec-parallel-cap parameter.
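         The inference rules above can be sketched as follows, with
         parameters represented as Python dictionaries keyed by parameter
         name.  This representation and the function name are
         hypothetical.  None in the result means that the parameter is
         not signaled outside dec-parallel-cap either, so the default
         applicable to that parameter has to be used.

```python
from typing import Dict, Mapping, Optional

CAP_PARAMS = ("tier-flag", "level-id", "max-ls", "max-lps", "max-br")


def resolve_cap_point(point: Mapping[str, int],
                      outside: Mapping[str, int]) -> Dict[str, Optional[int]]:
    """Apply the inference rules for absent dec-parallel-cap.* values."""
    defaults: Dict[str, Optional[int]] = {
        name: outside.get(name) for name in CAP_PARAMS}
    # dec-parallel-cap.level-id defaults to max-recv-level-id, which in
    # turn defaults to level-id when it is absent.
    defaults["level-id"] = outside.get("max-recv-level-id",
                                       outside.get("level-id"))
    return {name: point.get(name, defaults[name]) for name in CAP_PARAMS}


print(resolve_cap_point({"level-id": 120}, {"tier-flag": 0, "level-id": 93}))
```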

2209	         The general decoding capability, expressed by the set of
2210	         parameters outside of dec-parallel-cap, is defined as the
2211	         capability point that is determined by the following
2212	         combination of parameters: 1) the parallelism requirement
2213	         corresponding to the value of sprop-segmentation-id equal to 0
2214	         for a stream, 2) the profile determined by profile-space and
2215	         profile-id, 3) the highest level determined by tier-flag and
2216	         max-recv-level-id, and 4) the maximum processing rate, the
2217	         maximum picture size, and the maximum video bitrate determined
2218	         by the highest level.  The general decoding capability MUST
2219	         NOT be included as one of the set of capability points in the
2220	         dec-parallel-cap parameter.

2222	         For example, the following parameters express the general
2223	         decoding capability of 720p30 (Level 3.1) plus an additional
2224	         decoding capability of 1080p30 (Level 4) given that the
2225	         spatially largest tile or slice used in the bitstream is equal
2226	         to or less than 1/3 of the picture size:

2228	            a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

2230	         For another example, the following parameters express an
2231	         additional decoding capability of 1080p30, using dec-parallel-
2232	         cap.max-ls and dec-parallel-cap.max-lps, given that WPP is
2233	         used in the stream:

2235	            a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2236	                        max-ls=62668800;max-lps=2088960}

2238	            Informative note: When min_spatial_segmentation_idc is
2239	            present in a stream and WPP is not used, [HEVC] specifies
2240	            that no slice or tile in the stream contains more than
2241	            4 * PicSizeInSamplesY / ( min_spatial_segmentation_idc + 4 )
2242	            luma samples.
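         The bound stated in the note can be computed as shown below.
         This Python sketch (function name hypothetical) reproduces the
         earlier example.  With min_spatial_segmentation_idc equal to 8,
         no slice or tile exceeds one third of the picture size.

```python
def max_segment_luma_samples(pic_size_in_samples_y: int,
                             min_spatial_segmentation_idc: int) -> int:
    """Upper bound on luma samples in any slice or tile (WPP not used)."""
    if not 1 <= min_spatial_segmentation_idc <= 4095:
        raise ValueError("expected min_spatial_segmentation_idc in 1..4095")
    # "/" in [HEVC] is integer division with truncation toward zero.
    return 4 * pic_size_in_samples_y // (min_spatial_segmentation_idc + 4)


print(max_segment_luma_samples(1920 * 1080, 8))  # 691200, i.e. one third
```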

2244	      Encoding considerations:

2246	         This type is only defined for transfer via RTP (RFC 3550).

2248	      Security considerations:

2250	         See Section 9 of RFC XXXX.

2252	      Public specification:

2254	         Please refer to Section 13 of RFC XXXX.

2256	      Additional information: None

2258	      File extensions: none

2260	      Macintosh file type code: none

2262	      Object identifier or OID: none

2264	      Person & email address to contact for further information:

2266	      Intended usage: COMMON

2268	      Author: See Section 14 of RFC XXXX.

2270	      Change controller:

2272	         IETF Audio/Video Transport Payloads working group delegated
2273	         from the IESG.

2275	7.2 SDP Parameters

2277	   The receiver MUST ignore any parameter unspecified in this memo.

2279	7.2.1 Mapping of Payload Type Parameters to SDP

2281	   The media type video/H265 string is mapped to fields in the Session
2282	   Description Protocol (SDP) [RFC4566] as follows:

2284	   o  The media name in the "m=" line of SDP MUST be video.

2286	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
2287	      media subtype).

2289	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2291	   o  The OPTIONAL parameters "profile-space", "profile-id", "tier-
2292	      flag", "level-id", "interop-constraints", "profile-compatibility-
2293	      indicator", "sub-layer-id", "recv-sub-layer-id", "max-recv-level-
2294	      id", "max-ls", "max-lps", "max-cpb", "max-dpb", "max-br", "max-
2295	      tr", "max-tc", "max-fps", "tx-mode", "sprop-depack-buf-nalus",
2296	      "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-segmentation-
2297	      id", "sprop-spatial-segmentation-idc", and "dec-parallel-cap",
2298	      when present, MUST be included in the "a=fmtp" line of SDP.  These
2299	      parameters are expressed as a media type string, in the form of a
2300	      semicolon separated list of parameter=value pairs.

2302	   o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
2303	      pps", when present, MUST be included in the "a=fmtp" line of SDP
2304	      or conveyed using the "fmtp" source attribute as specified in
2305	      Section 6.3 of [RFC5576].  For a particular media format (i.e.,
2306	      RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST
2307	      NOT be both included in the "a=fmtp" line of SDP and conveyed
2308	      using the "fmtp" source attribute.  When included in the "a=fmtp"
2309	      line of SDP, these parameters are expressed as a media type
2310	      string, in the form of a semicolon-separated list of
2311	      parameter=value pairs.  When conveyed using the "fmtp" source
2312	      attribute, these parameters are only associated with the given
2313	      source and payload type as parts of the "fmtp" source attribute.

2315	          Informative note: Conveyance of "sprop-vps", "sprop-sps", and
2316	          "sprop-pps" using the "fmtp" source attribute allows for out-
2317	          of-band transport of parameter sets in topologies like Topo-
2318	          Video-switch-MCU as specified in [RFC5117].

2320	   An example of media representation in SDP is as follows:

2322	         m=video 49170 RTP/AVP 98
2323	         a=rtpmap:98 H265/90000
2324	         a=fmtp:98 profile-id=1;
2325	                   sprop-vps=<video parameter sets data>
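   The following Python sketch shows one way a receiver might split an
   "a=fmtp" line into parameter=value pairs. It ignores any parameter
   not specified in this memo, as Section 7.2 requires. The name
   parse_fmtp and the KNOWN_PARAMS set are illustrative only. The set is
   transcribed from the parameter lists in Section 7.2.1. The example
   above is shown folded across two lines; this sketch assumes the line
   has already been unfolded.

```python
# Parameters specified for video/H265 in Section 7.2.1 (transcribed).
KNOWN_PARAMS = frozenset({
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "profile-compatibility-indicator",
    "sub-layer-id", "recv-sub-layer-id", "max-recv-level-id",
    "max-ls", "max-lps", "max-cpb", "max-dpb", "max-br", "max-tr",
    "max-tc", "max-fps", "tx-mode", "sprop-depack-buf-nalus",
    "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-segmentation-id",
    "sprop-spatial-segmentation-idc", "dec-parallel-cap",
    "sprop-vps", "sprop-sps", "sprop-pps",
})


def parse_fmtp(line: str) -> tuple[int, dict[str, str]]:
    """Split one unfolded a=fmtp line into (payload type, parameters)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    pt_text, _, param_text = line[len(prefix):].partition(" ")
    params: dict[str, str] = {}
    for item in param_text.split(";"):
        name, sep, value = item.strip().partition("=")
        # Base64 parameter sets may end in "=", so only the first "="
        # separates name from value.  Unknown names are ignored.
        if sep and name in KNOWN_PARAMS:
            params[name] = value
    return int(pt_text), params
```

   For example, parse_fmtp("a=fmtp:98 profile-id=1; x-foo=7") yields
   (98, {"profile-id": "1"}).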

2327	7.2.2 Usage with SDP Offer/Answer Model

2329	   When HEVC is offered over RTP using SDP in an Offer/Answer model
2330	   [RFC3264] for negotiation for unicast usage, the following
2331	   limitations and rules apply:

2333	   o  The parameters identifying a media format configuration for HEVC
2334	      are profile-space, profile-id, tier-flag, level-id, interop-
2335	      constraints, tx-mode, and sprop-depack-buf-nalus.  These media
2336	      configuration parameters, except for level-id, MUST be used
2337	      symmetrically when the answerer does not include recv-sub-layer-
2338	      id in the answer; i.e., the answerer MUST either maintain all
2339	      configuration parameters or remove the media format (payload
2340	      type) completely, if one or more of the parameter values are not
2341	      supported.  The value of level-id is changeable.

2343	          Informative note: The requirement for symmetric use does not
2344	          apply for level-id, and does not apply for the other stream
2345	          properties and capability parameters.

2347	   To simplify handling and matching of these configurations, the same
2348	   RTP payload type number used in the offer SHOULD also be used in the
2349	   answer, as specified in [RFC3264].  The same RTP payload type number
2350	   used in the offer MUST also be used in the answer when the answer
2351	   includes recv-sub-layer-id.  When the answer does not include recv-
2352	   sub-layer-id, the answer MUST NOT contain a payload type number used
2353	   in the offer unless the configuration is exactly the same as in the
2354	   offer or the configuration in the answer only differs from that in
2355	   the offer with a different value of level-id.  The answer MAY
2356	   contain the recv-sub-layer-id parameter if an HEVC stream contains
2357	   multiple operation points (using temporal scalability and sub-
2358	   layers) and sprop-vps is included in the offer where sub-layers are
2359	   present in the video parameter set.  If the sprop-vps is provided in
2360	   an offer, an answerer MAY select a particular operation point in the
2361	   received and/or in the sent stream.  When recv-sub-layer-id is
2362	   present in the answer, the media configuration parameters MUST NOT
2363	   be present in the answer.  Rather, the media configuration that the
2364	   answerer will use for receiving and/or sending is the one used for
2365	   the selected operation point as indicated in the offer.

2367	          Informative note: When an offerer receives an answer that
2368	          does not include recv-sub-layer-id, it has to compare payload
2369	          types not declared in the offer based on the media type
2370	          (i.e., video/H265) and the above media configuration
2371	          parameters with any payload types it has already declared.
2372	          This will enable it to determine whether the configuration in
2373	          question is new or if it is equivalent to a configuration
2374	          already offered, since a different payload type number may be
2375	          used in the answer.  The ability to perform operation point
2376	          selection enables a receiver to utilize the temporally scalable
2377	          nature of an HEVC stream.
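   The following Python sketch illustrates the payload type rules above.
   It decides whether an answer may reuse an offered payload type number.
   Configurations are modeled as dictionaries of parameter strings, which
   is an assumption. An absent parameter is compared as absent, so a
   parameter omitted on one side and set to its default on the other
   counts as a difference here. The multicast flag reflects the multicast
   rules later in this section, under which level-id is not changeable.
   The function name is hypothetical.

```python
# Media configuration parameters listed in Section 7.2.2.
CONFIG_PARAMS = (
    "profile-space", "profile-id", "tier-flag", "level-id",
    "interop-constraints", "tx-mode", "sprop-depack-buf-nalus",
)


def may_reuse_offered_pt(offer: dict[str, str], answer: dict[str, str],
                         multicast: bool = False) -> bool:
    """Return True if the answer may use the offered payload type number."""
    if not multicast and "recv-sub-layer-id" in answer:
        # Unicast answers with recv-sub-layer-id MUST reuse the offered
        # payload type (and carry no configuration parameters).
        return True
    for name in CONFIG_PARAMS:
        if name == "level-id" and not multicast:
            continue  # unicast: only level-id may differ
        if offer.get(name) != answer.get(name):
            return False
    return True
```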

2379	   o  The parameters sprop-depack-buf-nalus and sprop-depack-buf-bytes
2380	      describe the properties of the RTP packet stream that the offerer
2381	      or the answerer is sending for the media format configuration.
2382	      This differs from the normal usage of the Offer/Answer
2383	      parameters: normally such parameters declare the properties of
2384	      the stream that the offerer or the answerer is able to receive.
2385	      When dealing with HEVC, the offerer assumes that the answerer
2386	      will be able to receive media encoded using the configuration
2387	      being offered.

2389	            Informative note:  The above parameters apply for any
2390	            stream sent by a declaring entity with the same
2391	            configuration; i.e., they are dependent on their source.
2392	            Rather than being bound to the payload type, the values may
2393	            have to be applied to another payload type when being sent,
2394	            as they apply for the configuration.

2396	   o  The capability parameters max-ls, max-lps, max-cpb, max-dpb, max-
2397	      br, max-tr, and max-tc MAY be used to declare further
2398	      capabilities of the offerer or answerer for receiving.  These
2399	      parameters MUST NOT be present when the direction attribute is
2400	      "sendonly".

2402	   o  The capability parameter max-fps MAY be used to declare lower
2403	      capabilities of the offerer or answerer for receiving.  This
2404	      parameter MUST NOT be present when the direction attribute is
2405	      "sendonly".

2407	   o  The capability parameter dec-parallel-cap MAY be used to declare
2408	      additional decoding capabilities of the offerer or answerer for
2409	      receiving.  Upon receiving such a declaration of a receiver, a
2410	      sender MAY send a stream to the receiver utilizing those
2411	      capabilities under the assumption that the stream fulfills the
2412	      parallelism requirement.  A stream that is sent based on choosing
2413	      a capability point with parallel tool type 'w' from dec-parallel-
2414	      cap MUST have entropy_coding_sync_enabled_flag equal to 1.  A
2415	      stream that is sent based on choosing a capability point with
2416	      parallel tool type 't' from dec-parallel-cap MUST have
2417	      entropy_coding_sync_enabled_flag equal to 0 and
2418	      min_spatial_segmentation_idc equal to or larger than dec-
2419	      parallel-cap.spatial-seg-idc of the capability point.
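   The constraints above on a stream sent for a chosen dec-parallel-cap
   capability point can be written as a small check. This Python sketch
   assumes the capability point has already been parsed into a parallel
   tool type ('w' or 't') and a spatial-seg-idc value. The CapPoint class
   and the function name are hypothetical. The two flag values would come
   from the parameter sets of the stream being sent.

```python
from dataclasses import dataclass


@dataclass
class CapPoint:
    """One dec-parallel-cap capability point (hypothetical representation)."""
    tool_type: str        # parallel tool type: 'w' or 't'
    spatial_seg_idc: int  # dec-parallel-cap.spatial-seg-idc


def stream_matches_cap_point(cap: CapPoint,
                             entropy_coding_sync_enabled_flag: int,
                             min_spatial_segmentation_idc: int) -> bool:
    """Check the stream constraints tied to the chosen capability point."""
    if cap.tool_type == "w":
        return entropy_coding_sync_enabled_flag == 1
    if cap.tool_type == "t":
        return (entropy_coding_sync_enabled_flag == 0
                and min_spatial_segmentation_idc >= cap.spatial_seg_idc)
    raise ValueError(f"unknown parallel tool type {cap.tool_type!r}")
```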

2421	   o  An offerer has to include the size of the de-packetization
2422	      buffer, sprop-depack-buf-bytes, and sprop-depack-buf-nalus, in
2423	      the offer for an interleaved HEVC stream or for the MST
2424	      transmission mode.  To enable the offerer and answerer to inform
2425	      each other about their capabilities for de-packetization
2426	      buffering in receiving streams, both parties are RECOMMENDED to
2427	      include depack-buf-cap.  For interleaved streams or in MST, it is
2428	      also RECOMMENDED to consider offering multiple payload types with
2429	      different buffering requirements when the capabilities of the
2430	      receiver are unknown.

2432	   For streams being delivered over multicast, the following rules
2433	   apply:

2435	   o  The media format configuration is identified by profile-space,
2436	      profile-id, tier-flag, level-id, interop-constraints, tx-mode, and
2437	      sprop-depack-buf-nalus.  These media format configuration
2438	      parameters, including level-id, MUST be used symmetrically; that
2439	      is, the answerer MUST either maintain all configuration
2440	      parameters or remove the media format (payload type) completely.
2441	      Note that this implies that the level-id for Offer/Answer in
2442	      multicast is not changeable.

2444	   To simplify the handling and matching of these configurations, the
2445	   same RTP payload type number used in the offer SHOULD also be used
2446	   in the answer, as specified in [RFC3264].  An answer MUST NOT
2447	   contain a payload type number used in the offer unless the
2448	   configuration is the same as in the offer.

2450	   o  The rules for other parameters are the same as above for unicast
2451	      as long as the above rules are obeyed.

2453	   Table 1 lists the interpretation of all the parameters that MUST be
2454	   used for the various combinations of offer, answer, and direction
2455	   attributes.  Note that the two columns wherein the recv-sub-layer-id
2456	   parameter is used only apply to answers, whereas the other columns
2457	   apply to both offers and answers.

2459	   Table 1.  Interpretation of parameters for various combinations of
2460	   offers, answers, direction attributes, with and without recv-sub-
2461	   layer-id.  Columns that do not indicate offer or answer apply to
2462	   both.

2464	                                          sendonly --+
2465	            answer: recvonly, recv-sub-layer-id --+  |
2466	              recvonly w/o recv-sub-layer-id --+  |  |
2467	      answer: sendrecv, recv-sub-layer-id --+  |  |  |
2468	        sendrecv w/o recv-sub-layer-id --+  |  |  |  |
2469	                                         |  |  |  |  |
2470	      profile-space                      C  X  C  X  P
2471	      profile-id                         C  X  C  X  P
2472	      tier-flag                          C  X  C  X  P
2473	      level-id                           C  X  C  X  P
2474	      interop-constraints                C  X  C  X  P
2475	      profile-compatibility-indicator    C  X  C  X  P
2476	      max-recv-level-id                  R  R  R  R  -
2477	      tx-mode                            C  X  C  X  P
2478	      sprop-depack-buf-nalus             P  P  -  -  P
2479	      sprop-depack-buf-bytes             P  P  -  -  P
2480	      depack-buf-cap                     R  R  R  R  -
2481	      sprop-segmentation-id              P  P  P  P  P
2482	      sprop-spatial-segmentation-idc     P  P  P  P  P
2483	      max-br                             R  R  R  R  -
2484	      max-cpb                            R  R  R  R  -
2485	      max-dpb                            R  R  R  R  -
2486	      max-ls                             R  R  R  R  -
2487	      max-lps                            R  R  R  R  -
2488	      max-tr                             R  R  R  R  -
2489	      max-tc                             R  R  R  R  -
2490	      max-fps                            R  R  R  R  -
2491	      sprop-vps                          P  P  -  -  P
2492	      sprop-sps                          P  P  -  -  P
2493	      sprop-pps                          P  P  -  -  P
2494	      sub-layer-id                       P  P  -  -  P
2495	      recv-sub-layer-id                  X  O  X  O  -
2496	      dec-parallel-cap                   R  R  R  R  -

2498	     Legend:

2500	      C: configuration for sending and receiving streams
2501	      P: properties of the stream to be sent
2502	      R: receiver capabilities
2503	      O: operation point selection
2504	      X: MUST NOT be present
2505	      -: not usable, when present SHOULD be ignored
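   Table 1 can be carried as data by an implementation. The Python sketch
   below stores each row as a five-character string, with one character
   per column from left to right. It reports which parameters present in
   a description are forbidden (X) or ignorable (-) for a given column.
   The column labels and the function name are illustrative assumptions.
   Parameters not listed in Table 1 are skipped.

```python
# Column order, left to right in Table 1.
COLUMNS = (
    "sendrecv",             # sendrecv w/o recv-sub-layer-id
    "answer-sendrecv-rsl",  # answer: sendrecv, recv-sub-layer-id
    "recvonly",             # recvonly w/o recv-sub-layer-id
    "answer-recvonly-rsl",  # answer: recvonly, recv-sub-layer-id
    "sendonly",
)

TABLE1 = {
    "profile-space": "CXCXP",
    "profile-id": "CXCXP",
    "tier-flag": "CXCXP",
    "level-id": "CXCXP",
    "interop-constraints": "CXCXP",
    "profile-compatibility-indicator": "CXCXP",
    "max-recv-level-id": "RRRR-",
    "tx-mode": "CXCXP",
    "sprop-depack-buf-nalus": "PP--P",
    "sprop-depack-buf-bytes": "PP--P",
    "depack-buf-cap": "RRRR-",
    "sprop-segmentation-id": "PPPPP",
    "sprop-spatial-segmentation-idc": "PPPPP",
    "max-br": "RRRR-",
    "max-cpb": "RRRR-",
    "max-dpb": "RRRR-",
    "max-ls": "RRRR-",
    "max-lps": "RRRR-",
    "max-tr": "RRRR-",
    "max-tc": "RRRR-",
    "max-fps": "RRRR-",
    "sprop-vps": "PP--P",
    "sprop-sps": "PP--P",
    "sprop-pps": "PP--P",
    "sub-layer-id": "PP--P",
    "recv-sub-layer-id": "XOXO-",
    "dec-parallel-cap": "RRRR-",
}


def check_params(params: dict[str, str],
                 column: str) -> tuple[list[str], list[str]]:
    """Return (forbidden, ignorable) parameter names for a Table 1 column."""
    col = COLUMNS.index(column)
    forbidden = [n for n in params if n in TABLE1 and TABLE1[n][col] == "X"]
    ignorable = [n for n in params if n in TABLE1 and TABLE1[n][col] == "-"]
    return forbidden, ignorable
```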

2507	   Parameters used for declaring receiver capabilities are in general
2508	   downgradable; i.e., they express the upper limit for a sender's
2509	   possible behavior.  Thus, a sender MAY choose to configure its
2510	   encoder using only values less than or equal to these parameters.
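   Below is a minimal Python sketch of this downgrading rule. It assumes
   that the sender's intended settings and the receiver's declared limits
   are both available as integers keyed by parameter name. This
   representation is hypothetical; the units are whatever each parameter
   defines.

```python
def clamp_to_receiver_caps(encoder: dict[str, int],
                           receiver_caps: dict[str, int]) -> dict[str, int]:
    """Lower each encoder setting to the receiver's declared upper limit.

    Settings without a matching receiver declaration are left as-is.
    Values are never raised, since the sender may only use lesser or
    equal values.
    """
    return {name: min(value, receiver_caps.get(name, value))
            for name, value in encoder.items()}
```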

2512	   Parameters declaring a configuration point are not changeable, with
2513	   the exception of the level-id parameter for unicast usage.  These
2514	   parameters express values that a receiver expects to be used, and
2515	   they MUST be used verbatim on the sender side.  If level-id is
2516	   changed, an answerer MUST NOT include recv-sub-layer-id.

2518	   When a sender's capabilities are declared, and non-changeable
2519	   parameters are used in this declaration, these parameters express a
2520	   configuration that is acceptable for the sender to receive streams.
2521	   In order to achieve high interoperability levels, it is often
2522	   advisable to offer multiple alternative configurations.  It is
2523	   impossible to offer multiple configurations in a single payload
2524	   type.  Thus, when multiple configuration offers are made, each offer
2525	   requires its own RTP payload type associated with the offer.

2527	   A receiver SHOULD understand all media type parameters, even if it
2528	   only supports a subset of the payload format's functionality.  This
2529	   ensures that a receiver is capable of understanding when an offer to
2530	   receive media can be downgraded to what is supported by the receiver
2531	   of the offer.

2533	   An answerer MAY extend the offer with additional media format
2534	   configurations.  However, to enable their usage, in most cases a
2535	   second offer is required from the offerer to provide the stream
2536	   property parameters that the media sender will use.  This also has
2537	   the effect that the offerer has to be able to receive this media
2538	   format configuration, not only to send it.

2540	7.2.3 Usage in Declarative Session Descriptions

2542	   When HEVC over RTP is offered with SDP in a declarative style, as in
2543	   Real Time Streaming Protocol (RTSP) [RFC2326] or Session
2544	   Announcement Protocol (SAP) [RFC2974], the following considerations
2545	   are necessary.

2547	   o  All parameters capable of indicating both stream properties and
2548	      receiver capabilities are used to indicate only stream
2549	      properties.  For example, in this case, the parameters profile-
2550	      space, profile-id, tier-flag, and level-id declare the values
2551	      used by the stream, not the capabilities for receiving streams.
2552	      Consequently, the following interpretation MUST be used:

2554	   Declaring actual configuration or stream properties:

2556	     - profile-space
2557	     - profile-id
2558	     - tier-flag
2559	     - level-id
2560	     - interop-constraints
2561	     - tx-mode
2562	     - sprop-vps
2563	     - sprop-sps
2564	     - sprop-pps
2565	     - sprop-depack-buf-nalus
2566	     - sprop-depack-buf-bytes
2567	     - sprop-segmentation-id
2568	     - sprop-spatial-segmentation-idc

2570	   Not usable (when present, they SHOULD be ignored):

2572	     - max-lps
2573	     - max-ls
2574	     - max-cpb
2575	     - max-dpb
2576	     - max-br
2577	     - max-tr
2578	     - max-tc
2579	     - max-fps
2580	     - max-recv-level-id
2581	     - depack-buf-cap
2582	     - sub-layer-id
2583	     - dec-parallel-cap

2585	   o  A receiver of the SDP is required to support all parameters and
2586	      values of the parameters provided; otherwise, the receiver MUST
2587	      reject (RTSP) or not participate in (SAP) the session.  It falls
2588	      on the creator of the session to use values that are expected to
2589	      be supported by the receiving application.

2591	7.2.4 Dependency Signaling in Multi-Session Transmission

2593	   If MST is used, the rules on signaling media decoding dependency in
2594	   SDP as defined in [RFC5583] apply.  The rules on "hierarchical or
2595	   layered encoding" with multicast in Section 5.7 of [RFC4566] do not
2596	   apply; i.e., the notation for Connection Data "c=" SHALL NOT be used
2597	   with more than one address.  The order of session dependency is
2598	   given from the RTP session containing the lowest temporal sub-layer
2599	   to the RTP session containing the highest temporal sub-layer.
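   The ordering rule above can be sketched in Python. The sketch assumes
   that each RTP session in MST is summarized by a hypothetical session
   identifier and the highest TemporalId it carries. It also assumes the
   sessions carry disjoint ranges of sub-layers. Decoding a temporal
   sub-layer requires all lower sub-layers, so each session is listed
   with every lower session it depends on. In SDP, these relationships
   are expressed with the DDP grouping and the "depend" attribute of
   [RFC5583].

```python
def dependency_order(highest_tid: dict[str, int]) -> list[tuple[str, list[str]]]:
    """List sessions from the lowest to the highest temporal sub-layer.

    Each entry pairs a session identifier with the lower sessions that
    must also be received to decode it.
    """
    ordered = sorted(highest_tid, key=highest_tid.__getitem__)
    return [(sid, ordered[:i]) for i, sid in enumerate(ordered)]
```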

2601	8. Use with Feedback Messages

2603	   As specified in Section 6.1 of RFC 4585 [RFC4585], payload-specific
2604	   feedback messages are identified by the RTCP packet type value PSFB
2605	   (206).  AVPF [RFC4585] defines three payload-specific feedback
2606	   messages and one application layer feedback message, and CCM
2607	   [RFC5104] specifies four payload-specific feedback messages.

2609	   In addition, this memo defines one payload-specific feedback
2610	   message.

2612	   These feedback messages are identified by means of the feedback
2613	   message type (FMT) parameter as follows:

2615	   Assigned in [RFC4585]:

2617	      1:     Picture Loss Indication (PLI)
2618	      2:     Slice Lost Indication (SLI)
2619	      3:     Reference Picture Selection Indication (RPSI)
2620	      15:    Application layer FB message
2621	      31:    reserved for future expansion of the number space

2623	   Assigned in [RFC5104]:

2625	      4:     Full Intra Request (FIR) Command
2626	      5:     Temporal-Spatial Trade-off Request (TSTR)
2627	      6:     Temporal-Spatial Trade-off Notification (TSTN)
2628	      7:     Video Back Channel Message (VBCM)

2630	   Assigned in this memo:

2632	      8:     Specific Picture Loss Indication (SPLI)

2634	   Unassigned:

2636	      0:      unassigned
2637	      9-14:   unassigned
2638	      16-30:  unassigned

2640	   The following subsections define the Feedback Control Information
2641	   (FCI) format for the new payload-specific feedback message and how
2642	   to use HEVC with the RPSI and SPLI messages, both for the purpose of
2643	   feedback-based reference picture selection for improved error
2644	   resilience in real-time conversational video applications such as
2645	   video telephony and video conferencing.

2647	   Feedback-based reference picture selection has been shown to be a
2648	   powerful tool to stop temporal error propagation for improved error
2649	   resilience [Girod99][Wang05].  In one approach, the decoder side
2650	   tracks errors in the decoded pictures and informs the encoder side
2651	   that a particular picture that has been decoded relatively
2652	   earlier is correct and still present in the decoded picture buffer,
2653	   and requests the encoder to use that correct picture for reference
2654	   when encoding the next picture, so as to stop further temporal error
2655	   propagation.  For this approach, the decoder side should use the
2656	   RPSI feedback message.  In another approach, the decoder side only
2657	   reports to the encoder side which pictures have been entirely or
2658	   partially lost, and the encoder tracks errors in the decoded
2659	   pictures at the decoder side based on the feedback messages, and if
2660	   it infers that an earlier decoded picture is correct at the decoder
2661	   side and is still in the decoded picture buffer of the decoder, it
2662	   encodes the next picture using that correct picture for reference.
2663	   The SPLI message defined below is for use with the second approach
2664	   described above.
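   The encoder-side tracking used with SPLI can be sketched in Python as
   follows. For simplicity, the sketch assumes that each picture
   references at most one earlier picture and that decoding order equals
   picture order count order. Real HEVC pictures may use several
   reference pictures. The data structures and the function name are
   hypothetical.

```python
from typing import Optional


def pick_reference(ref_of: dict[int, Optional[int]],
                   lost: set[int],
                   dpb: set[int]) -> Optional[int]:
    """Return the newest picture in `dpb` that is intact at the decoder.

    ref_of maps each sent PicOrderCntVal to the POC it referenced (None
    for intra pictures); lost holds POCs reported via SPLI; dpb holds the
    POCs the encoder believes are in the decoder's picture buffer.
    """
    corrupted = set(lost)
    for poc in sorted(ref_of):  # assumes references precede their users
        ref = ref_of[poc]
        if ref is not None and ref in corrupted:
            corrupted.add(poc)  # error propagated through prediction
    intact = [poc for poc in dpb if poc not in corrupted]
    return max(intact, default=None)
```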

2666	   Encoders can encode some long-term reference pictures as specified
2667	   in H.264 or HEVC for the purposes described in the previous
2668	   paragraph without the need for a large decoded picture buffer.  As
2669	   shown in [Wang05], with a flexible reference picture management
2670	   scheme as in H.264 and HEVC, even a decoded picture buffer size of
2671	   two would work for both approaches described above.

2673	8.1 Definition of the SPLI Feedback Message

2675	   The SPLI feedback message is identified by PT=PSFB and FMT=8.  There
2676	   MUST be exactly one SPLI contained in the FCI field.

2678	      Informative note: The SPLI message defined in this memo also
2679	      applies to other codecs, and may later be moved to another
2680	      extension of RFC 4585.

2682	   The FCI format of the SPLI message is exactly the same as that of
2683	   the RPSI message, with the name of the field "Native RPSI bit string
2684	   defined per codec" being replaced with "Native SPLI bit string
2685	   defined per codec", as shown in Figure 11.

2687	    0                   1                   2                   3
2688	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2689	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2690	   |      PB       |0| Payload Type|    Native SPLI bit string     |
2691	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2692	   |   defined per codec          ...                | Padding (0) |
2693	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2695	                  Figure 11   The FCI format of the SPLI

2697	   PB: 8 bits

2699	      The number of unused bits required to pad the length of the SPLI
2700	      message to a multiple of 32 bits.

2702	   0: 1 bit

2704	      MUST be set to zero upon transmission and ignored upon reception.

2706	   Payload Type: 7 bits

2708	      Indicates the RTP payload type in the context of which the native
2709	      SPLI bit string MUST be interpreted.

2711	   Native SPLI bit string: variable length

2713	      Indicates the SPLI information as natively defined by the video
2714	      codec.

2716	   Padding: #PB bits

2718	      A number of bits set to zero to fill up the contents of the SPLI
2719	      message to the next 32-bit boundary.  The number of padding bits
2720	      MUST be indicated by the PB field.

2722	   The same timing rules as for the RPSI message, as defined in
2723	   [RFC4585], apply for the SPLI message.
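   Below is a minimal Python sketch that packs the SPLI FCI shown in
   Figure 11. It assumes the native bit string is supplied as whole
   octets, so PB is always a multiple of 8. It produces only the FCI; the
   surrounding RTCP feedback header defined in [RFC4585] (PT=PSFB, FMT=8,
   and the SSRC fields) is not included. The function name is
   hypothetical.

```python
import struct


def build_spli_fci(payload_type: int, native: bytes) -> bytes:
    """Pack PB, the zero bit, Payload Type, the native SPLI bit string,
    and zero padding up to the next 32-bit boundary."""
    if not 0 <= payload_type <= 0x7F:
        raise ValueError("Payload Type must fit in 7 bits")
    used_bits = 16 + 8 * len(native)   # PB + 0 + Payload Type + string
    pb = (32 - used_bits % 32) % 32    # padding bits, as carried in PB
    # The zero bit is the most significant bit of the second octet, which
    # is already 0 because payload_type < 128.
    header = struct.pack("!BB", pb, payload_type)
    return header + native + bytes(pb // 8)
```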

2725	8.2 Use of HEVC with the RPSI Feedback Message

2727	   The field "Native RPSI bit string defined per codec" is a base16
2728	   [RFC4648] representation of the 8 bits consisting of the 2 most
2729	   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
2730	   in [HEVC], followed by the 32 bits representing the value of the
2731	   PicOrderCntVal (in network byte order), as defined in [HEVC], for
2732	   the picture that is requested to be used for reference when encoding
2733	   the next picture.
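   The following Python sketch builds this field. It reads "base16
   representation" literally, as the [RFC4648] base16 text of the 40 bits
   (10 ASCII characters). That reading, the function name, and the
   two's complement encoding of negative PicOrderCntVal values are
   assumptions. Section 8.3 gives the SPLI field the same layout.

```python
import base64
import struct


def hevc_native_bit_string(nuh_layer_id: int, pic_order_cnt_val: int) -> bytes:
    """Base16 text of [00 + 6-bit nuh_layer_id][32-bit PicOrderCntVal]."""
    if not 0 <= nuh_layer_id <= 63:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    # One octet whose two most significant bits are 0, then PicOrderCntVal
    # as a signed 32-bit integer in network byte order.
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw)
```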

2735	   Use of the RPSI feedback message as positive acknowledgement is
2736	   deprecated.  In other words, the RPSI feedback message MUST only be
2737	   used as a reference picture selection request, such that it can also
2738	   be used in multicast.

2740	8.3 Use of HEVC with the SPLI Feedback Message

2742	   The field "Native SPLI bit string defined per codec" is a base16
2743	   [RFC4648] representation of the 8 bits consisting of the 2 most
2744	   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
2745	   in [HEVC], followed by the 32 bits representing the value of the
2746	   PicOrderCntVal, as defined in [HEVC], for the picture that is
2747	   indicated as entirely or partially lost.

2749	9. Security Considerations

2751	   RTP packets using the payload format defined in this specification
2752	   are subject to the security considerations discussed in the RTP
2753	   specification [RFC3550], and in any applicable RTP profile such as
2754	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711], or
2755	   RTP/SAVPF [RFC5124].  However, as "Securing the RTP Protocol
2756	   Framework: Why RTP Does Not Mandate a Single Media Security
2757	   Solution" [I-D.ietf-avt-srtp-not-mandatory] discusses, it is not an
2758	   RTP payload format's responsibility to discuss or mandate what
2759	   solutions are used to meet the basic security goals like
2760	   confidentiality, integrity, and source authenticity for RTP in
2761	   general.  This responsibility lies with anyone using RTP in an
2762	   application.  They can find guidance on available security
2763	   mechanisms and important considerations in "Options for
2764	   Securing RTP Sessions" [I-D.ietf-avtcore-rtp-security-options].

2766	   The rest of this section discusses the security impacting properties
2767	   of the payload format itself.

2769	   Because the data compression used with this payload format is
2770	   applied end-to-end, any encryption needs to be performed after
2771	   compression.  A potential denial-of-service threat exists for data
2772	   encodings using compression techniques that have non-uniform
2773	   receiver-end computational load.  An attacker can inject
2774	   pathological datagrams into the stream that are complex to decode
2775	   and that cause the receiver to be overloaded.  H.265 is particularly
2776	   vulnerable to such attacks, as it is extremely simple to generate
2777	   datagrams containing NAL units that affect the decoding process of
2778	   many future NAL units.  Therefore, the usage of data origin
2779	   authentication and data integrity protection of at least the RTP
2780	   packet is RECOMMENDED, for example, with SRTP [RFC3711].

2782	   Note that the appropriate mechanism to ensure confidentiality and
2783	   integrity of RTP packets and their payloads is very dependent on the
2784	   application and on the transport and signaling protocols employed.
2785	   Thus, although SRTP is given as an example above, other possible
2786	   choices exist.

2788	   Decoders MUST exercise caution with respect to the handling of user
2789	   data SEI messages, particularly if they contain active elements, and
2790	   MUST restrict their domain of applicability to the presentation
2791	   containing the stream.

2793	   End-to-end security with authentication, integrity, or
2794	   confidentiality protection will prevent a MANE from performing
2795	   media-aware operations other than discarding complete packets.  In
2796	   the case of confidentiality protection, it will even be prevented
2797	   from discarding packets in a media-aware way.  To be allowed to
2798	   perform such operations, a MANE is required to be a trusted entity
2799	   that is included in the security context establishment.

2801	10. Congestion Control

2803	   Congestion control for RTP SHALL be used in accordance with RTP
2804	   [RFC3550] and with any applicable RTP profile, e.g., AVP [RFC3551].
2805	   If best-effort service is being used, an additional requirement is
2806	   that users of this payload format MUST monitor packet loss to ensure
2807	   that the packet loss rate is within an acceptable range.  Packet
2808	   loss is considered acceptable if a TCP flow across the same network
2809	   path, and experiencing the same network conditions, would achieve an
2810	   average throughput, measured on a reasonable timescale, that is not
2811	   less than the RTP flow is achieving.  This condition can be
2812	   satisfied by implementing congestion control mechanisms to adapt the
2813	   transmission rate, the number of layers subscribed for a layered
2814	   multicast session, or by arranging for a receiver to leave the
2815	   session if the loss rate is unacceptably high.
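   This memo does not prescribe how to estimate the throughput of a
   comparable TCP flow. One widely used option is the TCP throughput
   equation of TCP-Friendly Rate Control (RFC 5348, Section 3.1). The
   Python sketch below evaluates that equation with the simplifications
   t_RTO = 4*R and b = 1 recommended there. It illustrates the
   acceptability test only; it is not a congestion control algorithm.

```python
import math


def tcp_friendly_rate(s: float, rtt: float, p: float, b: float = 1.0) -> float:
    """Throughput in bytes/s of a conforming TCP flow (RFC 5348, 3.1).

    s: packet size in bytes; rtt: round-trip time in seconds;
    p: loss event rate (0 < p <= 1); b: packets acknowledged per ACK.
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    t_rto = 4 * rtt  # simplification recommended by RFC 5348
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom
```

   Suppose the RTP flow's rate stays above this estimate over a
   reasonable timescale. The sender would then reduce its rate, for
   example by one of the means listed above.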

2817	   The bitrate adaptation necessary for obeying the congestion control
2818	   principle is easily achievable when real-time encoding is used, for
2819	   example by adequately tuning the quantization parameter.

2821	   However, when pre-encoded content is being transmitted, bandwidth
2822	   adaptation requires the pre-coded bitstream to be tailored for such
2823	   adaptivity.  The key mechanism available in HEVC is temporal
2824	   scalability.  A media sender can remove NAL units belonging to
2825	   higher temporal sub-layers (i.e., those NAL units with a high value
2826	   of TID) until the sending bitrate drops to an acceptable range.
2827	   HEVC contains mechanisms that allow the lightweight identification
2828	   of switching points in temporal enhancement layers, as discussed in
2829	   Section 1.1.2 of this memo.  An HEVC media sender can send packets
2830	   belonging to NAL units of temporal enhancement layers starting from
2831	   these switching points to probe for available bandwidth and to
2832	   utilize bandwidth that has been shown to be available.
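   The thinning step can be sketched in Python as follows. TemporalId is
   read from the two-octet HEVC NAL unit header as nuh_temporal_id_plus1
   minus 1. The per-window byte budget and the function names are
   assumptions. The sketch also ignores the switching points that a
   sender would respect when adding layers back.

```python
def temporal_id(nal_unit: bytes) -> int:
    """TemporalId from the two-octet NAL unit header.

    Header bits: forbidden_zero_bit(1) nal_unit_type(6) nuh_layer_id(6)
    nuh_temporal_id_plus1(3); the last field is the low 3 bits of octet 2.
    """
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than its header")
    tid_plus1 = nal_unit[1] & 0x07
    if tid_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    return tid_plus1 - 1


def pick_max_tid(nal_units: list[bytes], budget: int) -> int:
    """Highest TemporalId whose cumulative size fits in `budget` octets.

    The base sub-layer (TemporalId 0) is always kept, even if over budget.
    """
    best = 0
    for candidate in range(1, 7):  # TemporalId ranges from 0 to 6
        size = sum(len(n) for n in nal_units if temporal_id(n) <= candidate)
        if size <= budget:
            best = candidate
    return best


def thin(nal_units: list[bytes], max_tid: int) -> list[bytes]:
    """Drop NAL units of temporal sub-layers above max_tid."""
    return [n for n in nal_units if temporal_id(n) <= max_tid]
```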

2834	   The above mechanisms generally work within a defined profile and
2835	   level; therefore, no renegotiation of the channel is required.
2836	   Only when non-downgradable parameters (such as profile) need to be
2837	   changed does it become necessary to terminate and restart the
2838	   media stream.  This may be accomplished by using a different RTP
2839	   payload type.

2841	   MANEs MAY remove certain unusable packets from the packet stream
2842	   when that stream was damaged due to previous packet losses.  This
2843	   can help reduce the network load in certain special cases.  For
2844	   example, MANEs can remove those FUs where the leading FUs belonging
2845	   to the same NAL unit have been lost or those dependent slice
2846	   segments when the leading slice segments belonging to the same slice
2847	   have been lost, because the trailing FUs or dependent slice segments
2848	   are meaningless to most decoders.  MANEs can also remove higher
2849	   temporal sub-layers if the outbound transmission (from the
2850	   MANE's viewpoint) experiences congestion.
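   The first of these MANE operations can be sketched in Python. The
   sketch assumes that the fragments of one NAL unit are sent with
   consecutive RTP sequence numbers. It also assumes the MANE has already
   parsed the S and E bits of each FU header. Only FU packets of a single
   RTP stream, in sequence-number order, are considered. The FuPacket
   class and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FuPacket:
    """Hypothetical parsed view of an RTP packet carrying one FU."""
    seq: int     # RTP sequence number (16 bits)
    start: bool  # S bit of the FU header
    end: bool    # E bit of the FU header


def forwardable_fus(packets: list[FuPacket]) -> list[FuPacket]:
    """Keep only FUs whose preceding fragments of the NAL unit all arrived.

    A gap in sequence numbers inside a fragmented NAL unit means an
    earlier fragment was lost, so the rest of that NAL unit is dropped.
    """
    kept: list[FuPacket] = []
    expected = None  # next sequence number continuing an intact NAL unit
    for pkt in packets:
        if pkt.start or (expected is not None and pkt.seq == expected):
            kept.append(pkt)
            expected = None if pkt.end else (pkt.seq + 1) & 0xFFFF
        else:
            expected = None  # leading fragment lost: drop until next start
    return kept
```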

2852	11. IANA Considerations

2854	   A new media type, as specified in Section 7.1 of this memo, should
2855	   be registered with IANA.

2857	12. Acknowledgements

2859	   Muhammed Coban and Marta Karczewicz are thanked for discussions on
2860	   the specification of the use with feedback messages and other
2861	   aspects in this memo.  Rickard Sjoberg, Arild Fuldseth, Bo Burman,
2862	   Magnus Westerlund, and Tom Kristensen are thanked for their
2863	   contributions to parallel processing related signalling.  Roni Even,
2864	   Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, and Ross
2865	   Finlayson made valuable reviewing comments that led to improvements.

2867	   This document was prepared using 2-Word-v2.0.template.dot.

2869	13. References

2871	13.1 Normative References

2873	   [HEVC]    JCT-VC, "High Efficiency Video Coding (HEVC) text
2874	             specification draft 10 (for FDIS & Last Call)", JCTVC-
2875	             L1003v34, March 2013.

2877	   [H.264]   ITU-T Recommendation H.264, "Advanced video coding for
2878	             generic audiovisual services", January 2012.

2880	   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
2881	             Payload Format for H.264 Video", RFC 6184, May 2011.

2883	   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
2884	             Eleftheriadis, "RTP Payload Format for Scalable Video
2885	             Coding", RFC 6190, May 2011.

2887	   [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
2888	             Flows", RFC 6051, November 2010.

2890	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
2891	             Requirement Levels", BCP 14, RFC 2119, March 1997.

2893	   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
2894	             with Session Description Protocol (SDP)", RFC 3264, June
2895	             2002.

2897	   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
2898	             Encodings", RFC 4648, October 2006.

2900	   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
2901	             Jacobson, "RTP: A Transport Protocol for Real-Time
2902	             Applications", STD 64, RFC 3550, July 2003.

2904	   [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
2905	             Description Protocol", RFC 4566, July 2006.

2907	   [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific
2908	             Media Attributes in the Session Description Protocol", RFC
2909	             5576, June 2009.

2911	   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J.
2912	             Rey, "Extended RTP Profile for Real-time Transport Control
2913	             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
2914	             2006.

2916	   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
2917	             "Codec Control Messages in the RTP Audio-Visual Profile
2918	             with Feedback (AVPF)", RFC 5104, February 2008.

2920	13.2 Informative References

2922	   [Ed. (YK): Details for some of the following references are to be
2923	             added.]

2925	   [3GPDASH] 3GPP TS 26.247.

2927	   [3GPPFF]  3GPP TS 26.244.

2929	   [Girod99] Girod, B. and N. Faerber, "Feedback-based error control
2930	             for mobile video transmission", Proceedings IEEE, Vol. 87,
2931	             No. 10, pp. 1707-1723, October 1999.

2933	   [ISOBMFF] ISO/IEC 14496-12.

2935	   [JCTVC-J0107] Wang, Y.-K., Chen, Y., Joshi, R., and K.
2936	             Ramasubramonian, "AHG9: On RAP pictures", JCTVC-J0107,
2937	             10th JCT-VC meeting, July 2012, Stockholm, Sweden.

2939	   [MPEG2S]  ISO/IEC 13818-2.

2941	   [MPEGDASH] ISO/IEC 23009-1.

2943	   [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
2944	             Correction", RFC 5109, December 2007.

2946	   [Wang05]  Wang, Y.-K., Zhu, C., and H. Li, "Error resilient video
2947	             coding using flexible reference frames", Visual
2948	             Communications and Image Processing 2005 (VCIP 2005), July
2949	             2005, Beijing, China.

2951	14. Authors' Addresses

2953	   Ye-Kui Wang
2954	   Qualcomm Incorporated
2955	   5775 Morehouse Drive
2956	   San Diego, CA 92121
2957	   USA
2958	   Phone: +1-858-651-8345
2959	   EMail: yekuiw@qti.qualcomm.com

2961	   Yago Sanchez
2962	   Fraunhofer HHI
2963	   Einsteinufer 37
2964	   D-10587 Berlin
2965	   Germany
2966	   Phone: +49-30-31002-227
2967	   EMail: yago.sanchez@hhi.fraunhofer.de

2969	   Thomas Schierl
2970	   Fraunhofer HHI
2971	   Einsteinufer 37
2972	   D-10587 Berlin
2973	   Germany
2974	   Phone: +49-30-31002-227
2975	   EMail: ts@thomas-schierl.de

2977	   Stephan Wenger
2978	   Vidyo, Inc.
2979	   433 Hackensack Ave., 7th floor
2980	   Hackensack, N.J. 07601
2981	   USA
2982	   Phone: +1-415-713-5473
2983	   EMail: stewe@stewe.org

2985	   Miska M. Hannuksela
2986	   Nokia Corporation
2987	   P.O. Box 1000
2988	   33721 Tampere
2989	   Finland
2990	   Phone: +358-7180-08000
2991	   EMail: miska.hannuksela@nokia.com