idnits 2.17.1 

draft-ietf-payload-rtp-h265-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 33 instances of too long lines in the document, the longest
     one being 14 characters in excess of 72.

  ** The abstract seems to contain references ([HEVC]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1637 has weird spacing: '...n  must  under...'

  == Line 3410 has weird spacing: '...  value   of  ...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The FU payload consists of fragments of the payload of the
     fragmented NAL unit so that if the FU payloads of consecutive FUs,
     starting with an FU with the S bit equal to 1 and ending with an FU with
     the E bit equal to 1, are sequentially concatenated, the payload of the
     fragmented NAL unit can be reconstructed.  The NAL unit header of the
     fragmented NAL unit is not included as such in the FU payload, but rather
     the information of the NAL unit header of the fragmented NAL unit is
     conveyed in F, LayerId, and TID fields of the FU payload headers of the
     FUs and the FuType field of the FU header of the FUs.  An FU payload MUST
     not be empty.

  -- The document date (August 5, 2014) is 3545 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '3GP' is mentioned on line 274, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 1079

  == Missing Reference: 'RFC5234' is mentioned on line 2675, but not defined

  == Missing Reference: 'RFC5117' is mentioned on line 2898, but not defined

  ** Obsolete undefined reference: RFC 5117 (Obsoleted by RFC 7667)

  == Missing Reference: 'RFC2326' is mentioned on line 3267, but not defined

  ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826)

  == Missing Reference: 'RFC2974' is mentioned on line 3268, but not defined

  == Missing Reference: 'RFC3551' is mentioned on line 3501, but not defined

  == Missing Reference: 'RFC3711' is mentioned on line 3501, but not defined

  == Missing Reference: 'RFC5124' is mentioned on line 3502, but not defined

  == Missing Reference: 'RFC 3711' is mentioned on line 3527, but not defined

  == Missing Reference: 'RFC 3551' is mentioned on line 3551, but not defined

  == Unused Reference: '3GPPFF' is defined on line 3677, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5109' is defined on line 3740, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-01

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-05

  == Outdated reference: A later version (-08) exists of
     draft-ietf-avtext-rtp-grouping-taxonomy-01


     Summary: 5 errors (**), 0 flaws (~~), 20 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Y.-K. Wang
2	Internet Draft                                                 Qualcomm
3	Intended status: Standards track                             Y. Sanchez
4	Expires: February 2015                                       T. Schierl
5	                                                         Fraunhofer HHI
6	                                                              S. Wenger
7	                                                                  Vidyo
8	                                                       M. M. Hannuksela
9	                                                                  Nokia
10	                                                         August 5, 2014

12	            RTP Payload Format for High Efficiency Video Coding
13	                    draft-ietf-payload-rtp-h265-05.txt

15	Abstract

17	   This memo describes an RTP payload format for the video coding
18	   standard ITU-T Recommendation H.265 and ISO/IEC International
19	   Standard 23008-2, both also known as High Efficiency Video Coding
20	   (HEVC) [HEVC] and developed by the Joint Collaborative Team on Video
21	   Coding (JCT-VC).  The RTP payload format allows for packetization of
22	   one or more Network Abstraction Layer (NAL) units in each RTP packet
23	   payload, as well as fragmentation of a NAL unit into multiple RTP
24	   packets.  Furthermore, it supports transmission of an HEVC bitstream
25	   over a single as well as multiple RTP streams.  The payload format
26	   has wide applicability in videoconferencing, Internet video
27	   streaming, and high bit-rate entertainment-quality video, among
28	   others.

30	Status of this Memo

32	   This Internet-Draft is submitted to IETF in full conformance with
33	   the provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF), its areas, and its working groups.  Note that
37	   other groups may also distribute working documents as Internet-
38	   Drafts.

40	   Internet-Drafts are draft documents valid for a maximum of six
41	   months and may be updated, replaced, or obsoleted by other documents
42	   at any time.  It is inappropriate to use Internet-Drafts as
43	   reference material or to cite them other than as "work in progress."

45	   The list of current Internet-Drafts can be accessed at
46	   http://www.ietf.org/ietf/1id-abstracts.txt.

48	   The list of Internet-Draft Shadow Directories can be accessed at
49	   http://www.ietf.org/shadow.html.

51	   This Internet-Draft will expire on February 5, 2015.

53	Copyright and License Notice

55	   Copyright (c) 2014 IETF Trust and the persons identified as the
56	   document authors.  All rights reserved.

58	   This document is subject to BCP 78 and the IETF Trust's Legal
59	   Provisions Relating to IETF Documents
60	   (http://trustee.ietf.org/license-info) in effect on the date of
61	   publication of this document.  Please review these documents
62	   carefully, as they describe your rights and restrictions with
63	   respect to this document.  Code Components extracted from this
64	   document must include Simplified BSD License text as described in
65	   Section 4.e of the Trust Legal Provisions and are provided without
66	   warranty as described in the Simplified BSD License.

68	Table of Contents

70	   Abstract
71	   Status of this Memo
72	   Table of Contents
73	   1 Introduction
74	      1.1 Overview of the HEVC Codec
75	         1.1.1 Coding-Tool Features
76	         1.1.2 Systems and Transport Interfaces
77	         1.1.3 Parallel Processing Support
78	         1.1.4 NAL Unit Header
79	      1.2 Overview of the Payload Format
80	   2 Conventions
81	   3 Definitions and Abbreviations
82	      3.1 Definitions
83	         3.1.1 Definitions from the HEVC Specification
84	         3.1.2 Definitions Specific to This Memo
85	      3.2 Abbreviations
86	   4 RTP Payload Format
87	      4.1 RTP Header Usage
88	      4.2 Payload Header Usage
89	      4.3 Payload Structures
90	      4.4 Transmission Modes
91	      4.5 Decoding Order Number
92	      4.6 Single NAL Unit Packets
93	      4.7 Aggregation Packets (APs)
94	      4.8 Fragmentation Units (FUs)
95	      4.9 PACI packets
96	         4.9.1 Reasons for the PACI rules (informative)
97	         4.9.2 PACI extensions (Informative)
98	      4.10 Temporal Scalability Control Information
99	   5 Packetization Rules
100	   6 De-packetization Process
101	   7 Payload Format Parameters
102	      7.1 Media Type Registration
103	      7.2 SDP Parameters
104	         7.2.1 Mapping of Payload Type Parameters to SDP
105	         7.2.2 Usage with SDP Offer/Answer Model
106	         7.2.3 Usage in Declarative Session Descriptions
107	         7.2.4 Parameter Sets Considerations
108	         7.2.5 Dependency Signaling in Multi-Stream Mode
109	   8 Use with Feedback Messages
110	      8.1 Picture Loss Indication (PLI)
111	      8.2 Slice Loss Indication
112	      8.3 Use of HEVC with the RPSI Feedback Message
113	      8.4 Full Intra Request (FIR)
114	   9 Security Considerations
115	   10 Congestion Control
116	   11 IANA Consideration
117	   12 Acknowledgements
118	   13 References
119	      13.1 Normative References
120	      13.2 Informative References
121	   14 Authors' Addresses

123	1 Introduction

125	1.1 Overview of the HEVC Codec

127	   High Efficiency Video Coding [HEVC], formally known as ITU-T
128	   Recommendation H.265 and ISO/IEC International Standard 23008-2, was
129	   ratified by ITU-T in April 2013 and reportedly provides significant
130	   coding efficiency gains over H.264 [H.264].

132	   As both H.264 [H.264] and its RTP payload format [RFC6184] are
133	   widely deployed and generally known in the relevant implementer
134	   communities, frequently only the differences between those two
135	   specifications are highlighted in non-normative, explanatory parts
136	   of this memo.  Basic familiarity with both specifications is assumed
137	   for those parts.  However, the normative parts of this memo do not
138	   require study of H.264 or its RTP payload format.

140	   H.264 and HEVC share a similar hybrid video codec design.
141	   Conceptually, both technologies include a video coding layer (VCL),
142	   which is often used to refer to the coding-tool features, and a
143	   network abstraction layer (NAL), which is often used to refer to the
144	   systems and transport interface aspects of the codecs.

146	1.1.1 Coding-Tool Features

148	   Similarly to earlier hybrid-video-coding-based standards, including
149	   H.264, the following basic video coding design is employed by HEVC.
150	   A prediction signal is first formed either by intra or motion
151	   compensated prediction, and the residual (the difference between the
152	   original and the prediction) is then coded.  The gains in coding
153	   efficiency are achieved by redesigning and improving almost all
154	   parts of the codec over earlier designs.  In addition, HEVC includes
155	   several tools to make the implementation on parallel architectures
156	   easier.  Below is a summary of HEVC coding-tool features.

158	   Quad-tree block and transform structure

160	   One of the major tools that contribute significantly to the coding
161	   efficiency of HEVC is the usage of flexible coding blocks and
162	   transforms, which are defined in a hierarchical quad-tree manner.
163	   Unlike H.264, where the basic coding block is a macroblock of fixed
164	   size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size
165	   of 64x64.  Each CTU can be divided into smaller units in a
166	   hierarchical quad-tree manner and can represent smaller blocks down
167	   to size 4x4.  Similarly, the transforms used in HEVC can have
168	   different sizes, starting from 4x4 and going up to 32x32.  Utilizing
169	   large blocks and transforms contributes to the major gain of HEVC,
170	   especially at high resolutions.
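
   The hierarchical division described above can be illustrated with a
   short sketch.  It is not taken from [HEVC]: the function name
   split_ctu, the (x, y, size) block representation, and the
   should_split callback (standing in for an encoder's decision) are
   all assumptions made for illustration.  The 64x64 starting size and
   the 4x4 lower bound are the figures quoted in the paragraph above.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in samples


def split_ctu(x: int, y: int, size: int,
              should_split: Callable[[int, int, int], bool],
              min_size: int = 4) -> List[Block]:
    """Recursively divide a square block into four equal quadrants.

    `should_split` is a hypothetical stand-in for the encoder's decision.
    Recursion stops at `min_size` or when the callback declines to split.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            leaves.extend(
                split_ctu(x + dx, y + dy, half, should_split, min_size))
    return leaves


# Keep refining only the top-left corner of a 64x64 CTU.
leaves = split_ctu(0, 0, 64, lambda x, y, size: x == 0 and y == 0)
print(len(leaves), "leaf blocks")
```

   Whatever decisions the callback makes, the leaf blocks tile the
   CTU exactly, which is the property that lets large blocks cover
   smooth regions while small blocks follow detail.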

172	   Entropy coding

174	   HEVC uses a single entropy coding engine, which is based on Context
175	   Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two
176	   distinct entropy coding engines.  CABAC in HEVC shares many
177	   similarities with CABAC of H.264, but contains several improvements.
178	   Those include improvements in coding efficiency and lowered
179	   implementation complexity, especially for parallel architectures.

181	   In-loop filtering

183	   H.264 includes an in-loop adaptive deblocking filter, where the
184	   blocking artifacts around the transform edges in the reconstructed
185	   picture are smoothed to improve the picture quality and compression
186	   efficiency.  In HEVC, a similar deblocking filter is employed but
187	   with somewhat lower complexity.  In addition, pictures undergo a
188	   subsequent filtering operation called Sample Adaptive Offset (SAO),
189	   which is a new design element in HEVC.  SAO basically adds a pixel-
190	   level offset in an adaptive manner and usually acts as a de-ringing
191	   filter.  It is observed that SAO improves the picture quality,
192	   especially around sharp edges, contributing substantially to visual
193	   quality improvements of HEVC.

195	   Motion prediction and coding

197	   There have been a number of improvements in this area that are
198	   summarized as follows.  The first category is motion merge and
199	   advanced motion vector prediction (AMVP) modes.  The motion
200	   information of a prediction block can be inferred from the spatially
201	   or temporally neighboring blocks.  This is similar to the DIRECT
202	   mode in H.264 but includes new aspects to incorporate the flexible
203	   quad-tree structure and methods to improve the parallel
204	   implementations.  In addition, the motion vector predictor can be
205	   signaled for improved efficiency.  The second category is high-
206	   precision interpolation.  The interpolation filter length is
207	   increased to 8-tap from 6-tap, which improves the coding efficiency
208	   but also comes with increased complexity.  In addition, the
209	   interpolation filter is defined with higher precision without any
210	   intermediate rounding operations to further improve the coding
211	   efficiency.

213	   Intra prediction and intra coding

215	   Compared to 8 intra prediction modes in H.264, HEVC supports angular
216	   intra prediction with 33 directions.  This increased flexibility
217	   improves both objective coding efficiency and visual quality as the
218	   edges can be better predicted and ringing artifacts around the edges
219	   can be reduced.  In addition, the reference samples are adaptively
220	   smoothed based on the prediction direction.  To avoid contouring
221	   artifacts, a new interpolative prediction generation is included to
222	   improve the visual quality.  Furthermore, discrete sine transform
223	   (DST) is utilized instead of traditional discrete cosine transform
224	   (DCT) for 4x4 intra transform blocks.

226	   Other coding-tool features

228	   HEVC includes some tools for lossless coding and efficient screen
229	   content coding, such as skipping the transform for certain blocks.
230	   These tools are particularly useful, for example, when streaming the
231	   user-interface of a mobile device to a large display.

233	1.1.2 Systems and Transport Interfaces

235	   HEVC inherited the basic systems and transport interface designs
236	   from H.264, such as the NAL-unit-based syntax structure, the
237	   hierarchical syntax and data unit structure (from sequence-level
238	   parameter sets, multi-picture-level or picture-level parameter
239	   sets, and slice-level header parameters down to lower-level
240	   parameters), the supplemental enhancement information (SEI) message
241	   mechanism, the hypothetical reference decoder (HRD) based video
242	   buffering model, and so on.  The differences in these aspects
243	   compared to H.264 are summarized below.

245	   Video parameter set

247	   A new type of parameter set, called video parameter set (VPS), was
248	   introduced.  For the first (2013) version of [HEVC], the video
249	   parameter set NAL unit is required to be available prior to its
250	   activation, while the information contained in the video parameter
251	   set is not necessary for operation of the decoding process.  For
252	   future HEVC extensions, such as the 3D or scalable extensions, the
253	   video parameter set is expected to include information necessary for
254	   operation of the decoding process, e.g. decoding dependency or
255	   information for reference picture set construction of enhancement
256	   layers.  The VPS provides a "big picture" of a bitstream, including
257	   what types of operation points are provided, the profile, tier, and
258	   level of the operation points, and some other high-level properties
259	   of the bitstream that can be used as the basis for session
260	   negotiation and content selection, etc. (see section 7.1).

262	   Profile, tier and level

264	   The profile, tier and level syntax structure that can be included in
265	   both VPS and sequence parameter set (SPS) includes 12 bytes of data
266	   to describe the entire bitstream (including all temporally scalable
267	   layers, which are referred to as sub-layers in the HEVC
268	   specification), and can optionally include more profile, tier and
269	   level information pertaining to individual temporally scalable
270	   layers.  The profile indicator indicates the "best viewed as"
271	   profile when the bitstream conforms to multiple profiles, similar to
272	   the major brand concept in the ISO base media file format (ISOBMFF)
273	   [ISOBMFF] and file formats derived based on ISOBMFF, such as the
274	   3GPP file format [3GP].  The profile, tier and level syntax
275	   structure also includes the indications of whether the bitstream is
276	   free of frame-packed content, whether the bitstream is free of
277	   interlaced source content and free of field pictures, i.e. contains
278	   only frame pictures of progressive source, such that clients/players
279	   with no support of post-processing functionalities for handling of
280	   frame-packed or interlaced source content or field pictures can
281	   reject those bitstreams.
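
   As an illustration of the 12 bytes mentioned above, the following
   sketch unpacks the general part of the structure.  The field names
   and bit positions reflect one reading of the version 1 [HEVC]
   syntax (2-bit profile space, 1-bit tier flag, 5-bit profile
   indicator, 32 compatibility flags, four source/constraint flags, 44
   reserved bits, 8-bit level indicator) and should be checked against
   the specification.  The class GeneralPTL and the helper
   needs_special_handling are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GeneralPTL:
    profile_space: int
    tier_flag: int
    profile_idc: int
    profile_compatibility_flags: int
    progressive_source_flag: int
    interlaced_source_flag: int
    non_packed_constraint_flag: int
    frame_only_constraint_flag: int
    level_idc: int


def parse_general_ptl(data: bytes) -> GeneralPTL:
    """Unpack the first 12 bytes of a profile, tier and level structure."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    b0, b5 = data[0], data[5]
    return GeneralPTL(
        profile_space=b0 >> 6,
        tier_flag=(b0 >> 5) & 1,
        profile_idc=b0 & 0x1F,
        profile_compatibility_flags=int.from_bytes(data[1:5], "big"),
        progressive_source_flag=(b5 >> 7) & 1,
        interlaced_source_flag=(b5 >> 6) & 1,
        non_packed_constraint_flag=(b5 >> 5) & 1,
        frame_only_constraint_flag=(b5 >> 4) & 1,
        level_idc=data[11],  # bytes 6..10 carry reserved bits only
    )


def needs_special_handling(ptl: GeneralPTL) -> bool:
    """True if frame-packed, interlaced or field content cannot be ruled out."""
    return not (ptl.non_packed_constraint_flag
                and ptl.frame_only_constraint_flag
                and ptl.progressive_source_flag
                and not ptl.interlaced_source_flag)


# Hypothetical example bytes: profile_idc 1, main tier, level_idc 93.
example = bytes.fromhex("016000000090000000000" + "05d")
ptl = parse_general_ptl(example)
print(ptl.profile_idc, ptl.level_idc, needs_special_handling(ptl))
```

   A player without post-processing support could reject a stream for
   which needs_special_handling returns True, as the text suggests.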

283	   Bitstream and elementary stream

285	   HEVC includes a definition of an elementary stream, which is new
286	   compared to H.264.  An elementary stream consists of a sequence of
287	   one or more bitstreams.  An elementary stream that consists of two
288	   or more bitstreams has typically been formed by splicing together
289	   two or more bitstreams (or parts thereof).  When an elementary
290	   stream contains more than one bitstream, the last NAL unit of the
291	   last access unit of a bitstream (except the last bitstream in the
292	   elementary stream) must contain an end of bitstream NAL unit and the
293	   first access unit of the subsequent bitstream must be an intra
294	   random access point (IRAP) access unit.  This IRAP access unit may
295	   be a clean random access (CRA), broken link access (BLA), or
296	   instantaneous decoding refresh (IDR) access unit.
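
   The two splicing constraints above can be expressed as a small
   check.  This is a sketch under simplifying assumptions: each
   bitstream is modeled as a non-empty list of access units, and each
   access unit as a non-empty list of NAL unit type values.  The
   numeric values (37 for end of bitstream, 16 to 23 for IRAP types)
   are taken from the [HEVC] NAL unit type table as understood here;
   the function names are hypothetical.

```python
from typing import List, Sequence

EOB_NUT = 37                 # end of bitstream NAL unit type
IRAP_TYPES = range(16, 24)   # BLA, IDR, CRA and reserved IRAP types

AccessUnit = Sequence[int]   # NAL unit types in decoding order


def is_irap_access_unit(au: AccessUnit) -> bool:
    """An access unit is treated as IRAP if it carries an IRAP NAL unit."""
    return any(t in IRAP_TYPES for t in au)


def check_splice_rules(bitstreams: List[List[AccessUnit]]) -> bool:
    """Check the constraints on an elementary stream made of bitstreams.

    Every bitstream except the last must end with an end of bitstream
    NAL unit, and every bitstream after the first must start with an
    IRAP access unit.
    """
    for i, bs in enumerate(bitstreams):
        is_last = i == len(bitstreams) - 1
        if not is_last and bs[-1][-1] != EOB_NUT:
            return False
        if i > 0 and not is_irap_access_unit(bs[0]):
            return False
    return True


first = [[32, 33, 34, 19], [1], [1, EOB_NUT]]   # ends with end of bitstream
second = [[21], [1]]                            # starts with a CRA picture
print(check_splice_rules([first, second]))
```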

298	   Random access support

300	   HEVC includes signaling in NAL unit header, through NAL unit types,
301	   of IRAP pictures beyond IDR pictures.  Three types of IRAP pictures,
302	   namely IDR, CRA and BLA pictures, are supported, wherein IDR pictures
303	   are conventionally referred to as closed group-of-pictures (closed-
304	   GOP) random access points, and CRA and BLA pictures are those
305	   conventionally referred to as open-GOP random access points.  BLA
306	   pictures usually originate from splicing of two bitstreams or parts
307	   thereof at a CRA picture, e.g. during stream switching.  To enable
308	   better systems usage of IRAP pictures, altogether six different NAL
309	   units are defined to signal the properties of the IRAP pictures,
310	   which can be used to better match the stream access point (SAP)
311	   types as defined in the ISOBMFF [ISOBMFF], which are utilized for
312	   random access support in both 3GP-DASH [3GPDASH] and MPEG DASH
313	   [MPEGDASH].  Pictures following an IRAP picture in decoding order
314	   and preceding the IRAP picture in output order are referred to as
315	   leading pictures associated with the IRAP picture.  There are two
316	   types of leading pictures, namely random access decodable leading
317	   (RADL) pictures and random access skipped leading (RASL) pictures.
318	   RADL pictures are decodable when decoding starts at the
319	   associated IRAP picture, and RASL pictures are not decodable when
320	   decoding starts at the associated IRAP picture and are usually
321	   discarded.  HEVC provides mechanisms to enable the specification of
322	   conformance of bitstreams with RASL pictures being discarded, thus
323	   to provide a standard-compliant way to enable systems components to
324	   discard RASL pictures when needed.
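
   A minimal sketch of how a systems component might discard RASL
   pictures when starting at an IRAP picture follows.  It assumes one
   NAL unit type value per picture, listed in decoding order, and it
   only drops RASL pictures associated with the IRAP picture at which
   decoding starts.  The numeric type values follow the [HEVC] NAL
   unit type table as understood here; the function name
   start_at_first_irap is hypothetical.

```python
from typing import Iterable, List

RADL_N, RADL_R, RASL_N, RASL_R = 6, 7, 8, 9
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP, CRA_NUT = 19, 20, 21


def is_irap(nal_type: int) -> bool:
    return 16 <= nal_type <= 23


def is_rasl(nal_type: int) -> bool:
    return nal_type in (RASL_N, RASL_R)


def start_at_first_irap(picture_types: Iterable[int]) -> List[int]:
    """Return the pictures to decode when tuning in mid-stream.

    Pictures before the first IRAP picture are skipped, and RASL
    pictures that follow that first IRAP picture are dropped until the
    next IRAP picture appears.
    """
    kept: List[int] = []
    irap_seen = 0
    for t in picture_types:
        if is_irap(t):
            irap_seen += 1
        if irap_seen == 0:
            continue              # nothing decodable before the first IRAP
        if irap_seen == 1 and is_rasl(t):
            continue              # references pictures we never received
        kept.append(t)
    return kept


stream = [1, 1, CRA_NUT, RASL_R, RADL_R, 1, CRA_NUT, RASL_R, 1]
print(start_at_first_irap(stream))
```

   RADL pictures pass through unchanged, and RASL pictures after the
   second CRA picture are kept because their references are available
   once decoding is under way.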

326	   Temporal scalability support

328	   HEVC includes an improved support of temporal scalability, by
329	   inclusion of the signaling of TemporalId in the NAL unit header, the
330	   restriction that pictures of a particular temporal sub-layer cannot
331	   be used for inter prediction reference by pictures of a lower
332	   temporal sub-layer, the sub-bitstream extraction process, and the
333	   requirement that each sub-bitstream extraction output be a
334	   conforming bitstream.  Media-aware network elements (MANEs) can
335	   utilize the TemporalId in the NAL unit header for stream adaptation
336	   purposes based on temporal scalability.
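
   The stream adaptation a MANE could perform on this basis can be
   sketched as follows.  The sketch assumes the two-byte NAL unit
   header layout of [HEVC] (1-bit F, 6-bit Type, 6-bit LayerId, 3-bit
   TID, where TemporalId equals TID minus 1) and that each NAL unit is
   available as a byte string starting at its header.  The names
   NalHeader, parse_nal_header and thin_to_temporal_id are
   hypothetical.

```python
from typing import Iterable, List, NamedTuple


class NalHeader(NamedTuple):
    nal_unit_type: int
    layer_id: int
    temporal_id: int


def parse_nal_header(nal: bytes) -> NalHeader:
    """Decode the two-byte HEVC NAL unit header."""
    if len(nal) < 2:
        raise ValueError("NAL unit shorter than its two-byte header")
    hdr = int.from_bytes(nal[:2], "big")
    tid_plus1 = hdr & 0x07
    if hdr >> 15 or tid_plus1 == 0:
        raise ValueError("malformed NAL unit header")
    return NalHeader((hdr >> 9) & 0x3F, (hdr >> 3) & 0x3F, tid_plus1 - 1)


def thin_to_temporal_id(nals: Iterable[bytes], max_tid: int) -> List[bytes]:
    """Forward only NAL units whose TemporalId does not exceed max_tid."""
    return [n for n in nals if parse_nal_header(n).temporal_id <= max_tid]


nals = [b"\x40\x01", b"\x26\x01\xaa", b"\x02\x02", b"\x02\x03"]
print([parse_nal_header(n) for n in nals])
print(len(thin_to_temporal_id(nals, 1)))
```

   Because pictures of a lower sub-layer never reference pictures of a
   higher one, dropping everything above max_tid leaves a stream that
   remains decodable.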

338	   Temporal sub-layer switching support

340	   HEVC specifies, through NAL unit types present in the NAL unit
341	   header, the signaling of temporal sub-layer access (TSA) and
342	   stepwise temporal sub-layer access (STSA).  A TSA picture and
343	   pictures following the TSA picture in decoding order do not use
344	   pictures prior to the TSA picture in decoding order with TemporalId
345	   greater than or equal to that of the TSA picture for inter
346	   prediction reference.  A TSA picture enables up-switching, at the
347	   TSA picture, to the sub-layer containing the TSA picture or any
348	   higher sub-layer, from the immediately lower sub-layer.  An STSA
349	   picture does not use pictures with the same TemporalId as the STSA
350	   picture for inter prediction reference.  Pictures following an STSA
351	   picture in decoding order with the same TemporalId as the STSA
352	   picture do not use pictures prior to the STSA picture in decoding
353	   order with the same TemporalId as the STSA picture for inter
354	   prediction reference.  An STSA picture enables up-switching, at the
355	   STSA picture, to the sub-layer containing the STSA picture, from the
356	   immediately lower sub-layer.
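
   The up-switching rules above can be condensed into a single
   predicate.  This is an illustrative sketch: the function name
   can_switch_up and its parameters are assumptions, and the NAL unit
   type values (2 and 3 for TSA, 4 and 5 for STSA) follow the [HEVC]
   NAL unit type table as understood here.

```python
TSA_TYPES = (2, 3)    # TSA_N, TSA_R
STSA_TYPES = (4, 5)   # STSA_N, STSA_R


def can_switch_up(nal_type: int, pic_tid: int,
                  current_tid: int, target_tid: int) -> bool:
    """Decide whether up-switching is possible at this picture.

    pic_tid      TemporalId of the picture being examined
    current_tid  highest TemporalId decoded so far
    target_tid   highest TemporalId wanted after the switch
    """
    if current_tid != pic_tid - 1:
        return False              # only from the immediately lower sub-layer
    if nal_type in TSA_TYPES:
        return target_tid >= pic_tid   # this sub-layer or any higher one
    if nal_type in STSA_TYPES:
        return target_tid == pic_tid   # this sub-layer only
    return False


print(can_switch_up(3, pic_tid=1, current_tid=0, target_tid=2))
print(can_switch_up(5, pic_tid=1, current_tid=0, target_tid=2))
```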

358	   Sub-layer reference or non-reference pictures

360	   The concept and signaling of reference/non-reference pictures in
361	   HEVC are different from H.264.  In H.264, if a picture may be used
362	   by any other picture for inter prediction reference, it is a
363	   reference picture; otherwise it is a non-reference picture, and this
364	   is signaled by two bits in the NAL unit header.  In HEVC, a picture
365	   is called a reference picture only when it is marked as "used for
366	   reference".  In addition, the concept of sub-layer reference picture
367	   was introduced.  If a picture may be used by another picture
368	   with the same TemporalId for inter prediction reference, it is a
369	   sub-layer reference picture; otherwise it is a sub-layer non-
370	   reference picture.  Whether a picture is a sub-layer reference
371	   picture or sub-layer non-reference picture is signaled through NAL
372	   unit type values.
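
   As a sketch of how the NAL unit type conveys this property: in the
   [HEVC] NAL unit type table as understood here, VCL types 0 to 15
   alternate between sub-layer non-reference (even values, "_N") and
   sub-layer reference (odd values, "_R").  The helper names below are
   hypothetical, and the dropping policy is only one possible use of
   the signal.

```python
def is_sublayer_non_reference(nal_type: int) -> bool:
    """True for the "_N" VCL types (even values in the range 0..15)."""
    if not 0 <= nal_type <= 15:
        raise ValueError("type does not carry sub-layer reference signaling")
    return nal_type % 2 == 0


def droppable_without_side_effects(nal_type: int, temporal_id: int,
                                   highest_forwarded_tid: int) -> bool:
    """A picture can be dropped in isolation if nothing forwarded uses it.

    A sub-layer non-reference picture is not referenced by pictures of
    the same TemporalId, so in the highest forwarded sub-layer no
    remaining picture depends on it.
    """
    return (temporal_id == highest_forwarded_tid
            and is_sublayer_non_reference(nal_type))


print(is_sublayer_non_reference(0), is_sublayer_non_reference(1))
print(droppable_without_side_effects(0, temporal_id=2,
                                     highest_forwarded_tid=2))
```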

374	   Extensibility

376	   Besides the TemporalId in the NAL unit header, HEVC also includes
377	   the signaling of a six-bit layer ID in the NAL unit header, which
378	   must be equal to 0 for a single-layer bitstream.  Extension
379	   mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice
380	   headers, and so on.  All these extension mechanisms enable future
381	   extensions in a backward compatible manner, such that bitstreams
382	   encoded according to potential future HEVC extensions can be fed to
383	   then-legacy decoders (e.g. HEVC version 1 decoders) and the then-
384	   legacy decoders can decode and output the base layer bitstream.

386	   Bitstream extraction

388	   HEVC includes a bitstream extraction process as an integral part of
389	   the overall decoding process, as well as specification of the use of
390	   the bitstream extraction process in description of bitstream
391	   conformance tests as part of the hypothetical reference decoder
392	   (HRD) specification.

394	   Reference picture management

396	   The reference picture management of HEVC, including reference
397	   picture marking and removal from the decoded picture buffer (DPB) as
398	   well as reference picture list construction (RPLC), differs from
399	   that of H.264.  Instead of the sliding window plus adaptive memory
400	   management control operation (MMCO) based reference picture marking
401	   mechanism in H.264, HEVC specifies a reference picture set (RPS)
402	   based reference picture management and marking mechanism, and the
403	   RPLC is consequently based on the RPS mechanism.  A reference
404	   picture set consists of a set of reference pictures associated with
405	   a picture, consisting of all reference pictures that are prior to
406	   the associated picture in decoding order, that may be used for inter
407	   prediction of the associated picture or any picture following the
408	   associated picture in decoding order.  The reference picture set
409	   consists of five lists of reference pictures: RefPicSetStCurrBefore,
410	   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
411	   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter and
412	   RefPicSetLtCurr contain all reference pictures that may be used in
413	   inter prediction of the current picture and that may be used in
414	   inter prediction of one or more of the pictures following the
415	   current picture in decoding order.  RefPicSetStFoll and
416	   RefPicSetLtFoll consist of all reference pictures that are not used
417	   in inter prediction of the current picture but may be used in inter
418	   prediction of one or more of the pictures following the current
419	   picture in decoding order.  RPS provides an "intra-coded" signaling
420	   of the DPB status, instead of an "inter-coded" signaling, mainly for
421	   improved error resilience.  The RPLC process in HEVC is based on the
422	   RPS, by signaling an index to an RPS subset for each reference
423	   index; this process is simpler than the RPLC process in H.264.
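
   The "intra-coded" nature of RPS signaling can be illustrated with a
   small model.  The sketch assumes that pictures are identified by
   picture order count values and that the decoded picture buffer is a
   plain set of such values; the class and function names are
   hypothetical, while the five list names mirror those in the text.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ReferencePictureSet:
    st_curr_before: List[int] = field(default_factory=list)
    st_curr_after: List[int] = field(default_factory=list)
    st_foll: List[int] = field(default_factory=list)
    lt_curr: List[int] = field(default_factory=list)
    lt_foll: List[int] = field(default_factory=list)

    def current(self) -> Set[int]:
        """Pictures that may be referenced by the current picture."""
        return set(self.st_curr_before + self.st_curr_after + self.lt_curr)

    def all_pictures(self) -> Set[int]:
        """Every picture that must stay marked as used for reference."""
        return self.current() | set(self.st_foll + self.lt_foll)


def apply_rps(dpb: Set[int], rps: ReferencePictureSet) -> Set[int]:
    """Keep only the reference pictures named by the RPS.

    The full set is restated for each picture, so the result does not
    depend on marking commands from earlier pictures.
    """
    return dpb & rps.all_pictures()


def missing_references(dpb: Set[int], rps: ReferencePictureSet) -> Set[int]:
    """Pictures named by the RPS but absent from the buffer, e.g. lost."""
    return rps.all_pictures() - dpb


dpb = {0, 4, 8, 16}
rps = ReferencePictureSet(st_curr_before=[8], st_curr_after=[16], lt_foll=[0])
print(sorted(apply_rps(dpb, rps)), sorted(missing_references(dpb, rps)))
```

   Comparing the signaled set with the buffer contents is what lets a
   decoder notice a lost reference picture directly, which relates to
   the error resilience benefit mentioned above.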

425	   Ultra low delay support

427	   HEVC specifies a sub-picture-level HRD operation, for support of the
428	   so-called ultra-low delay.  The mechanism specifies a standard-
429	   compliant way to enable delay reduction below one picture interval.
430	   Sub-picture-level coded picture buffer (CPB) and DPB parameters may
431	   be signaled, and utilization of this information for the derivation
432	   of CPB timing (wherein the CPB removal time corresponds to decoding
433	   time) and DPB output timing (display time) is specified.  Decoders
434	   are allowed to operate the HRD at the conventional access-unit-
435	   level, even when the sub-picture-level HRD parameters are present.

437	   New SEI messages

439	   HEVC inherits many H.264 SEI messages with changes in syntax and/or
440	   semantics making them applicable to HEVC.  Additionally, there are a
441	   few new SEI messages reviewed briefly in the following paragraphs.

443	   The display orientation SEI message informs the decoder of a
444	   transformation that is recommended to be applied to the cropped
445	   decoded picture prior to display, such that the pictures can be
446	   properly displayed, e.g. in an upside-up manner.

448	   The structure of pictures SEI message provides information on the
449	   NAL unit types, picture order count values, and prediction
450	   dependencies of a sequence of pictures.  The SEI message can be
451	   used, for example, to determine what impact a lost picture has on
452	   other pictures.

454	   The decoded picture hash SEI message provides a checksum derived
455	   from the sample values of a decoded picture.  It can be used for
456	   detecting whether a picture was correctly received and decoded.
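
   The idea behind such a check can be sketched with a generic hash.
   This is a simplification, not the SEI message's exact procedure:
   it assumes 8-bit samples, one byte string per color plane in raster
   order, and MD5 as the hash, and it ignores how [HEVC] arranges
   sample data for other bit depths.  The function names are
   hypothetical.

```python
import hashlib
from typing import List


def picture_digests(planes: List[bytes]) -> List[bytes]:
    """Compute one MD5 digest per color plane of a decoded picture."""
    return [hashlib.md5(plane).digest() for plane in planes]


def picture_matches(planes: List[bytes], expected: List[bytes]) -> bool:
    """Compare locally computed digests with those carried in the stream."""
    return picture_digests(planes) == expected


# Hypothetical 4x4 luma plane and two 2x2 chroma planes.
luma, cb, cr = bytes(range(16)), bytes([128] * 4), bytes([128] * 4)
sent = picture_digests([luma, cb, cr])

damaged = bytes([255]) + luma[1:]
print(picture_matches([luma, cb, cr], sent))
print(picture_matches([damaged, cb, cr], sent))
```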

458	   The active parameter sets SEI message includes the IDs of the active
459	   video parameter set and the active sequence parameter set and can be
460	   used to activate VPSs and SPSs.  In addition, the SEI message
461	   includes the following indications: 1) An indication of whether
462	   "full random accessibility" is supported (when supported, all
463	   parameter sets needed for decoding of the remainder of the bitstream
464	   when random accessing from the beginning of the current coded video
465	   sequence by completely discarding all access units earlier in
466	   decoding order are present in the remaining bitstream and all coded
467	   pictures in the remaining bitstream can be correctly decoded); 2) An
468	   indication of whether there is no parameter set within the current
469	   coded video sequence that updates another parameter set of the same
470	   type preceding in decoding order.  An update of a parameter set
471	   refers to the use of the same parameter set ID but with some other
472	   parameters changed.  If this property is true for all coded video
473	   sequences in the bitstream, then all parameter sets can be sent out-
474	   of-band before session start.

476	   The decoding unit information SEI message provides coded picture
477	   buffer removal delay information for a decoding unit.  The message
478	   can be used in very-low-delay buffering operations.

480	   The region refresh information SEI message can be used together with
481	   the recovery point SEI message (present in both H.264 and HEVC) for
482	   improved support of gradual decoding refresh (GDR).  This supports
483	   random access from inter-coded pictures, wherein complete pictures
484	   can be correctly decoded or recovered after an indicated number of
485	   pictures in output/display order.

487	1.1.3 Parallel Processing Support

489	   The reportedly significantly higher encoding computational demand of
490	   HEVC over H.264, in conjunction with the ever increasing video
491	   resolution (both spatially and temporally) required by the market,
492	   led to the adoption of VCL coding tools specifically targeted to
493	   allow for parallelization on the sub-picture level.  That is,
494	   parallelization occurs, at the minimum, at the granularity of an
495	   integer number of CTUs.  The targets for this type of high-level
496	   parallelization are multicore CPUs and DSPs as well as
497	   multiprocessor systems.  In a system design, to be useful, these
498	   tools require signaling support, which is provided in Section 7 of
499	   this memo.  This section provides a brief overview of the tools
500	   available in [HEVC].

502	   Many of the tools incorporated in HEVC were designed keeping in mind
503	   the potential parallel implementations in multi-core/multi-processor
504	   architectures.  Specifically, for parallelization, four picture
505	   partition strategies are available.

507	   Slices are segments of the bitstream that can be reconstructed
508	   independently from other slices within the same picture (though
509	   there may still be interdependencies through loop filtering
510	   operations).  Slices are the only tool that can be used for
511	   parallelization that is also available, in virtually identical form,
512	   in H.264.  Slice-based parallelization does not require much inter-
513	   processor or inter-core communication (except for inter-processor or
514	   inter-core data sharing for motion compensation when decoding a
515	   predictively coded picture, which is typically much heavier than
516	   inter-processor or inter-core data sharing due to in-picture
517	   prediction), as slices are designed to be independently decodable.
518	   However, for the same reason, slices can require some coding
519	   overhead.  Further, slices (in contrast to some of the other tools
520	   mentioned below) also serve as the key mechanism for bitstream
521	   partitioning to match Maximum Transfer Unit (MTU) size requirements,
522	   due to the in-picture independence of slices and the fact that each
523	   regular slice is encapsulated in its own NAL unit.  In many cases,
524	   the goal of parallelization and the goal of MTU size matching can
525	   place contradicting demands on the slice layout in a picture.  The
526	   realization of this situation led to the development of the more
527	   advanced tools mentioned below.

529	   Dependent slice segments allow for fragmentation of a coded slice
530	   into fragments at CTU boundaries without breaking any in-picture
531	   prediction mechanism.  They are complementary to the fragmentation
532	   mechanism described in this memo in that they need the cooperation
533	   of the encoder.  As a dependent slice segment necessarily contains
534	   an integer number of CTUs, a decoder using multiple cores operating
535	   on CTUs can process a dependent slice segment without communicating
536	   parts of the slice segment's bitstream to other cores.
537	   Fragmentation, as specified in this memo, in contrast, does not
538	   guarantee that a fragment contains an integer number of CTUs.

540	   In wavefront parallel processing (WPP), the picture is partitioned
541	   into rows of CTUs.  Entropy decoding and prediction are allowed to
542	   use data from CTUs in other partitions.  Parallel processing is
543	   possible through parallel decoding of CTU rows, where the start of
544	   the decoding of a row is delayed by two CTUs, so as to ensure that
545	   data related to a CTU above and to the right of the subject CTU is
546	   available before the subject CTU is decoded.  Using this
547	   staggered start (which appears like a wavefront when represented
548	   graphically), parallelization is possible with up to as many
549	   processors/cores as the picture contains CTU rows.
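
   The staggered start can be illustrated with a small idealized model.
   The sketch below is not part of [HEVC] or of this memo.  It assumes
   one core per CTU row and that every CTU takes exactly one time step;
   the helper name wpp_schedule is hypothetical.  It returns, for each
   CTU, the earliest step at which that CTU could be decoded.

```python
from typing import List


def wpp_schedule(ctu_rows: int, ctu_cols: int) -> List[List[int]]:
    """Earliest decode step per CTU in an idealized WPP model.

    Assumption (not from the memo): one core per CTU row and one time
    step per CTU. Each row starts two CTUs after the row above it, so
    the CTU above and to the right is always finished first.
    """
    if ctu_rows <= 0 or ctu_cols <= 0:
        raise ValueError("picture must contain at least one CTU")
    return [[2 * row + col for col in range(ctu_cols)]
            for row in range(ctu_rows)]


def total_steps(ctu_rows: int, ctu_cols: int) -> int:
    """Number of time steps until the last CTU is decoded."""
    return wpp_schedule(ctu_rows, ctu_cols)[-1][-1] + 1
```

   In this model a picture of 3 x 4 CTUs finishes in 8 steps instead of
   the 12 steps a single core would need.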

551	   Because in-picture prediction between neighboring CTU rows within a
552	   picture is allowed, the required inter-processor/inter-core
553	   communication to enable in-picture prediction can be substantial.
554	   The WPP partitioning does not result in the creation of more NAL
555	   units compared to when it is not applied, thus WPP cannot be used
556	   for MTU size matching, though slices can be used in combination for
557	   that purpose.

559	   Tiles define horizontal and vertical boundaries that partition a
560	   picture into tile columns and rows.  The scan order of CTUs is
561	   changed to be local within a tile (in the order of a CTU raster scan
562	   of a tile), before decoding the top-left CTU of the next tile in the
563	   order of tile raster scan of a picture.  Similar to slices, tiles
564	   break in-picture prediction dependencies (including entropy decoding
565	   dependencies).  However, they do not need to be included into
566	   individual NAL units (same as WPP in this regard), hence tiles
567	   cannot be used for MTU size matching, though slices can be used in
568	   combination for that purpose.  Each tile can be processed by one
569	   processor/core, and the inter-processor/inter-core communication
570	   required for in-picture prediction between processing units decoding
571	   neighboring tiles is limited to conveying the shared slice header in
572	   cases where a slice spans more than one tile, and loop filtering
573	   related sharing of reconstructed samples and metadata.  As a result,
574	   tiles are less demanding in terms of inter-processor communication
575	   bandwidth compared to WPP due to the in-picture independence between
576	   two neighboring partitions.
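
   The change in CTU scan order caused by tiles can be sketched as
   follows.  This is an illustrative example only: the function name
   tile_scan_order and the representation of the tile grid as lists of
   column widths and row heights (in CTUs) are assumptions made for
   this sketch, not syntax taken from [HEVC].

```python
from typing import List, Sequence


def tile_scan_order(col_widths: Sequence[int],
                    row_heights: Sequence[int]) -> List[int]:
    """Return picture-raster CTU addresses in tile decoding order.

    Tiles are visited in raster order over the picture; inside each
    tile, CTUs are visited in raster order over that tile.
    """
    pic_width = sum(col_widths)
    order: List[int] = []
    y0 = 0
    for height in row_heights:          # tile rows, top to bottom
        x0 = 0
        for width in col_widths:        # tile columns, left to right
            for y in range(y0, y0 + height):
                for x in range(x0, x0 + width):
                    order.append(y * pic_width + x)
            x0 += width
        y0 += height
    return order
```

   For a 3 x 3 CTU picture with tile column widths (2, 1) and tile row
   heights (1, 2), the resulting order is 0, 1, 2, 3, 4, 6, 7, 5, 8.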

578	1.1.4 NAL Unit Header

580	   HEVC maintains the NAL unit concept of H.264 with modifications.
581	   HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
582	   payload of a NAL unit refers to the NAL unit excluding the NAL unit
583	   header.

585	                     +---------------+---------------+
586	                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
587	                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
588	                     |F|   Type    |  LayerId  | TID |
589	                     +-------------+-----------------+

591	              Figure 1 The structure of HEVC NAL unit header

593	   The semantics of the fields in the NAL unit header are as specified
594	   in [HEVC] and described briefly below for convenience.  In addition
595	   to the name and size of each field, the corresponding syntax element
596	   name in [HEVC] is also provided.

598	   F: 1 bit
599	      forbidden_zero_bit.  Required to be zero in [HEVC].  HEVC
600	      declares a value of 1 as a syntax violation.  Note that the
601	      inclusion of this bit in the NAL unit header is to enable
602	      transport of HEVC video over MPEG-2 transport systems (avoidance
603	      of start code emulations) [MPEG2S].

605	   Type: 6 bits
606	      nal_unit_type.  This field specifies the NAL unit type as defined
607	      in Table 7-1 of [HEVC].  If the most significant bit of this
608	      field of a NAL unit is equal to 0 (i.e. the value of this field
609	      is less than 32), the NAL unit is a VCL NAL unit.  Otherwise, the
610	      NAL unit is a non-VCL NAL unit.  For a reference of all currently
611	      defined NAL unit types and their semantics, please refer to
612	      Section 7.4.1 in [HEVC].

614	   LayerId: 6 bits
615	      nuh_layer_id.  Required to be equal to zero in [HEVC].  It is
616	      anticipated that in future scalable or 3D video coding extensions
617	      of this specification, this syntax element will be used to
618	      identify additional layers that may be present in the coded video
619	      sequence, wherein a layer may be, e.g. a spatial scalable layer,
620	      a quality scalable layer, a texture view, or a depth view.

622	   TID: 3 bits
623	      nuh_temporal_id_plus1.  This field specifies the temporal
624	      identifier of the NAL unit plus 1.  The value of TemporalId is
625	      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
626	      there is at least one bit in the NAL unit header equal to 1, so as
627	      to enable independent considerations of start code emulations in
628	      the NAL unit header and in the NAL unit payload data.
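
   The field layout of Figure 1 can be unpacked as sketched below.
   The class and function names are hypothetical and chosen for this
   example only; the bit positions and the VCL / non-VCL distinction
   follow the field descriptions above.

```python
from typing import NamedTuple


class NalUnitHeader(NamedTuple):
    f: int          # forbidden_zero_bit, 1 bit
    nal_type: int   # nal_unit_type, 6 bits
    layer_id: int   # nuh_layer_id, 6 bits
    tid: int        # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is equal to TID minus 1.
        return self.tid - 1

    @property
    def is_vcl(self) -> bool:
        # Most significant Type bit equal to 0 means Type < 32.
        return self.nal_type < 32


def parse_nal_unit_header(data: bytes) -> NalUnitHeader:
    """Parse the two-byte HEVC NAL unit header at the start of data."""
    if len(data) < 2:
        raise ValueError("NAL unit header needs two bytes")
    word = (data[0] << 8) | data[1]
    header = NalUnitHeader(
        f=(word >> 15) & 0x01,
        nal_type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x07,
    )
    if header.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return header
```

   For example, the bytes 0x40 0x01 parse to Type 32 (a non-VCL NAL
   unit), LayerId 0, and TID 1, i.e. TemporalId 0.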

630	1.2 Overview of the Payload Format

632	   This payload format defines the following processes required for
633	   transport of HEVC coded data over RTP [RFC3550]:

635	   o Usage of RTP header with this payload format

637	   o Packetization of HEVC coded NAL units into RTP packets using three
638	     types of payload structures, namely single NAL unit packet,
639	     aggregation packet, and fragmentation unit

641	   o Transmission of HEVC NAL units of the same bitstream within a
642	     single RTP stream or multiple RTP streams within one or more RTP
643	     sessions, where within an RTP stream transmission of NAL units may
644	     be either non-interleaved (i.e. the transmission order of NAL
645	     units is the same as their decoding order) or interleaved (i.e.
646	     the transmission order of NAL units is different from their
647	     decoding order)

649	   o Media type parameters to be used with the Session Description
650	     Protocol (SDP) [RFC4566]

652	   o A payload header extension mechanism and data structures for
653	     enhanced support of temporal scalability based on that extension
654	     mechanism.

656	2 Conventions

658	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
659	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
660	   document are to be interpreted as described in BCP 14, RFC 2119
661	   [RFC2119].

663	   In this document, these key words will appear with that
664	   interpretation only when in ALL CAPS.  Lower case uses of these
665	   words are not to be interpreted as carrying the RFC 2119
666	   significance.

668	   This specification uses the notion of setting and clearing a bit
669	   when bit fields are handled.  Setting a bit is the same as assigning
670	   that bit the value of 1 (On).  Clearing a bit is the same as
671	   assigning that bit the value of 0 (Off).

673	3 Definitions and Abbreviations

675	3.1 Definitions

677	   This document uses the terms and definitions of [HEVC].  Section
678	   3.1.1 lists relevant definitions copied from [HEVC] for convenience.
679	   Section 3.1.2 provides definitions specific to this memo.

681	3.1.1 Definitions from the HEVC Specification

683	   access unit: A set of NAL units that are associated with each other
684	   according to a specified classification rule, are consecutive in
685	   decoding order, and contain exactly one coded picture.

687	   BLA access unit: An access unit in which the coded picture is a BLA
688	   picture.

690	   BLA picture: An IRAP picture for which each VCL NAL unit has
691	   nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

693	   coded video sequence: A sequence of access units that consists, in
694	   decoding order, of an IRAP access unit with NoRaslOutputFlag equal
695	   to 1, followed by zero or more access units that are not IRAP access
696	   units with NoRaslOutputFlag equal to 1, including all subsequent
697	   access units up to but not including any subsequent access unit that
698	   is an IRAP access unit with NoRaslOutputFlag equal to 1.

700	      Informative note: An IRAP access unit may be an IDR access unit,
701	      a BLA access unit, or a CRA access unit.  The value of
702	      NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
703	      access unit, and each CRA access unit that is the first access
704	      unit in the bitstream in decoding order, is the first access unit
705	      that follows an end of sequence NAL unit in decoding order, or
706	      has HandleCraAsBlaFlag equal to 1.

708	   CRA access unit: An access unit in which the coded picture is a CRA
709	   picture.

711	   CRA picture: An IRAP picture for which each VCL NAL unit has
712	   nal_unit_type equal to CRA_NUT.

714	   IDR access unit: An access unit in which the coded picture is an IDR
715	   picture.

717	   IDR picture: An IRAP picture for which each VCL NAL unit has
718	   nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

720	   IRAP access unit: An access unit in which the coded picture is an
721	   IRAP picture.

723	   IRAP picture: A coded picture for which each VCL NAL unit has
724	   nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23),
725	   inclusive.

727	   layer: A set of VCL NAL units that all have a particular value of
728	   nuh_layer_id and the associated non-VCL NAL units, or one of a set
729	   of syntactical structures having a hierarchical relationship.

731	   operation point: bitstream created from another bitstream by
732	   operation of the sub-bitstream extraction process with that other
733	   bitstream, a target highest TemporalId, and a target layer
734	   identifier list as inputs.

736	   random access: The act of starting the decoding process for a
737	   bitstream at a point other than the beginning of the bitstream.

739	   sub-layer: A temporal scalable layer of a temporal scalable
740	   bitstream consisting of VCL NAL units with a particular value of the
741	   TemporalId variable, and the associated non-VCL NAL units.

743	   sub-layer representation: A subset of the bitstream consisting of
744	   NAL units of a particular sub-layer and the lower sub-layers.

746	   tile: A rectangular region of coding tree blocks within a particular
747	   tile column and a particular tile row in a picture.

749	   tile column: A rectangular region of coding tree blocks having a
750	   height equal to the height of the picture and a width specified by
751	   syntax elements in the picture parameter set.

753	   tile row: A rectangular region of coding tree blocks having a height
754	   specified by syntax elements in the picture parameter set and a
755	   width equal to the width of the picture.

757	3.1.2 Definitions Specific to This Memo

759	   dependee RTP stream: An RTP stream on which another RTP stream
760	   depends.  All RTP streams in an MSM except for the highest RTP
761	   stream are dependee RTP streams.

763	   highest RTP stream: The RTP stream on which no other RTP stream
764	   depends.  The RTP stream in an SSM is the highest RTP stream.

766	   media aware network element (MANE): A network element, such as a
767	   middlebox, selective forwarding unit, or application layer gateway
768	   that is capable of parsing certain aspects of the RTP payload
769	   headers or the RTP payload and reacting to their contents.

771	      Informative note: The concept of a MANE goes beyond normal
772	      routers or gateways in that a MANE has to be aware of the
773	      signaling (e.g. to learn about the payload type mappings of the
774	      media streams), and in that it has to be trusted when working
775	      with SRTP.  The advantage of using MANEs is that they allow
776	      packets to be dropped according to the needs of the media coding.
777	      For example, if a MANE has to drop packets due to congestion on a
778	      certain link, it can identify and remove those packets whose
779	      elimination produces the least adverse effect on the user
780	      experience.  After dropping packets, MANEs must rewrite RTCP
781	      packets to match the changes to the RTP stream as specified in
782	      Section 7 of [RFC3550].

784	   multi-stream mode (MSM): Transmission of an HEVC bitstream using more
785	   than one RTP stream.

787	   NAL unit decoding order: A NAL unit order that conforms to the
788	   constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

790	   NAL-unit-like structure: A data structure that is similar to NAL
791	   units in the sense that it also has a NAL unit header and a payload,
792	   with a difference that the payload does not follow the start code
793	   emulation prevention mechanism required for the NAL unit syntax as
794	   specified in Section 7.3.1.1 of [HEVC].  Examples of NAL-unit-like
795	   structures defined in this memo are packet payloads of AP, PACI, and
796	   FU packets.

798	   NALU-time: The value that the RTP timestamp would have if the NAL
799	   unit were transported in its own RTP packet.

801	   RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy].  Within the
802	   scope of this memo, one RTP stream is utilized to transport one or
803	   more temporal sub-layers.

805	   single-stream mode (SSM): Transmission of an HEVC bitstream using
806	   only one RTP stream.

808	   transmission order: The order of packets in ascending RTP sequence
809	   number order (in modulo arithmetic).  Within an aggregation packet,
810	   the NAL unit transmission order is the same as the order of
811	   appearance of NAL units in the packet.

813	3.2 Abbreviations

815	   AP       Aggregation Packet

817	   BLA      Broken Link Access

819	   CRA      Clean Random Access

821	   CTB      Coding Tree Block

823	   CTU      Coding Tree Unit

825	   CVS      Coded Video Sequence

827	   DPH      Decoded Picture Hash

829	   FU       Fragmentation Unit

831	   GDR      Gradual Decoding Refresh

833	   HRD      Hypothetical Reference Decoder

835	   IDR      Instantaneous Decoding Refresh

837	   IRAP     Intra Random Access Point

839	   MANE     Media Aware Network Element

841	   MSM      Multi-Stream Mode

843	   MTU      Maximum Transfer Unit

845	   NAL      Network Abstraction Layer

847	   NALU     Network Abstraction Layer Unit

849	   PACI     PAyload Content Information

851	   PHES     Payload Header Extension Structure

853	   PPS      Picture Parameter Set

855	   RADL     Random Access Decodable Leading (Picture)
856	   RASL     Random Access Skipped Leading (Picture)

858	   RPS      Reference Picture Set

860	   SEI      Supplemental Enhancement Information

862	   SPS      Sequence Parameter Set

864	   SSM      Single-Stream Mode

866	   STSA     Step-wise Temporal Sub-layer Access

868	   TSA      Temporal Sub-layer Access

870	   TCSI     Temporal Scalability Control Information

872	   VCL      Video Coding Layer

874	   VPS      Video Parameter Set

876	4 RTP Payload Format

878	4.1 RTP Header Usage

880	   The format of the RTP header is specified in [RFC3550] and reprinted
881	   in Figure 2 for convenience.  This payload format uses the fields of
882	   the header in a manner consistent with that specification.

884	   The RTP payload (and the settings for some RTP header bits) for
885	   aggregation packets and fragmentation units are specified in
886	   Sections 4.7 and 4.8, respectively.

888	    0                   1                   2                   3
889	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
890	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
891	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
892	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
893	   |                           timestamp                           |
894	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
895	   |           synchronization source (SSRC) identifier            |
896	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
897	   |            contributing source (CSRC) identifiers             |
898	   |                             ....                              |
899	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

901	                Figure 2 RTP header according to [RFC3550]

903	   The RTP header information to be set according to this RTP payload
904	   format is set as follows:

906	   Marker bit (M): 1 bit

908	      Set for the last packet, carried in the current RTP stream, of
909	      the access unit, in line with the normal use of the M bit in
910	      video formats, to allow efficient playout buffer handling.
911	      When MSM is in use, if an access unit appears in multiple RTP
912	      streams, the marker bit is set on each RTP stream's last packet
913	      of the access unit.

915	         Informative note: The content of a NAL unit does not tell
916	         whether or not the NAL unit is the last NAL unit, in decoding
917	         order, of an access unit.  An RTP sender implementation may
918	         obtain this information from the video encoder.  If, however,
919	         the implementation cannot obtain this information directly
920	         from the encoder, e.g. when the bitstream was pre-encoded, and
921	         also there is no timestamp allocated for each NAL unit, then
922	         the sender implementation can inspect subsequent NAL units in
923	         decoding order to determine whether or not the NAL unit is the
924	         last NAL unit of an access unit as follows.  A NAL unit naluX
925	         is the last NAL unit of an access unit if it is the last NAL
926	         unit of the bitstream or the next VCL NAL unit naluY in
927	         decoding order has the high-order bit of the first byte after
928	         its NAL unit header equal to 1, and all NAL units between
929	         naluX and naluY, when present, have nal_unit_type in the range
930	         of 32 to 35, inclusive, equal to 39, or in the ranges of 41 to
931	         44, inclusive, or 48 to 55, inclusive.
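
         The inspection rule in this note could be implemented along
         the following lines.  This is a sketch only: the function
         names are hypothetical, each NAL unit is assumed to be given
         as a bytes object starting with its two-byte NAL unit header,
         and the function returns False whenever the rule above cannot
         confirm that naluX ends an access unit.

```python
from typing import Sequence

# nal_unit_type values allowed between naluX and the next VCL NAL unit
# naluY: 32..35, 39, 41..44, and 48..55, all inclusive.
ALLOWED_BETWEEN = frozenset(
    list(range(32, 36)) + [39] + list(range(41, 45)) + list(range(48, 56))
)


def nal_unit_type(nal: bytes) -> int:
    """Extract the 6-bit Type field from the NAL unit header."""
    return (nal[0] >> 1) & 0x3F


def is_last_nal_of_access_unit(nal_units: Sequence[bytes], x: int) -> bool:
    """Apply the rule to nal_units[x], with units in decoding order."""
    if x == len(nal_units) - 1:
        return True                      # last NAL unit of the bitstream
    for nal in nal_units[x + 1:]:
        nal_type = nal_unit_type(nal)
        if nal_type < 32:                # next VCL NAL unit (naluY)
            # High-order bit of the first byte after the header.
            return len(nal) > 2 and bool(nal[2] & 0x80)
        if nal_type not in ALLOWED_BETWEEN:
            return False
    return False                         # no following VCL NAL unit found
```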

933	   Payload type (PT): 7 bits

935	      The assignment of an RTP payload type for this new packet format
936	      is outside the scope of this document and will not be specified
937	      here.  The assignment of a payload type has to be performed
938	      either through the profile used or in a dynamic way.

940	         Informative note: It is not required to use different payload
941	         type values for different RTP streams in MSM.

943	   Sequence number (SN): 16 bits

945	      Set and used in accordance with RFC 3550.

947	   Timestamp: 32 bits

949	      The RTP timestamp is set to the sampling timestamp of the
950	      content.  A 90 kHz clock rate MUST be used.

952	      If the NAL unit has no timing properties of its own (e.g.
953	      parameter set and SEI NAL units), the RTP timestamp MUST be set
954	      to the RTP timestamp of the coded picture of the access unit in
955	      which the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is
956	      included.

958	      Receivers MUST use the RTP timestamp for the display process,
959	      even when the bitstream contains picture timing SEI messages or
960	      decoding unit information SEI messages as specified in [HEVC].
961	      However, this does not mean that picture timing SEI messages in
962	      the bitstream should be discarded, as picture timing SEI messages
963	      may contain frame-field information that is important in
964	      appropriately rendering interlaced video.
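
      As an illustration of the 90 kHz clock, the sketch below converts
      a sampling instant into an RTP timestamp.  The function name and
      its parameters are hypothetical.  The initial offset and the
      wrap-around at 32 bits follow the general timestamp rules of
      [RFC3550], not anything specific to this payload format.

```python
from fractions import Fraction

RTP_CLOCK_RATE_HZ = 90_000  # clock rate required by this payload format


def rtp_timestamp(sampling_time_s: Fraction, initial_offset: int = 0) -> int:
    """Map a sampling instant (in seconds) to a 32-bit RTP timestamp.

    initial_offset stands for the randomly chosen initial timestamp
    value of the RTP stream (an assumption of this sketch).
    """
    ticks = round(sampling_time_s * RTP_CLOCK_RATE_HZ)
    return (initial_offset + ticks) & 0xFFFFFFFF
```

      For example, the fourth picture (index 3) of a 30000/1001 Hz
      sequence is sampled at 3003/30000 seconds, which corresponds to
      9009 clock ticks.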

966	   Synchronization source (SSRC): 32 bits

968	      Used to identify the source of the RTP packets.  In SSM, by
969	      definition a single SSRC is used for all parts of a single
970	      bitstream.  In MSM, each SSRC is used for an RTP stream
971	      containing a subset of the sub-layers for a single (temporally
972	      scalable) bitstream.  A receiver is required to correctly
973	      associate the set of SSRCs that are parts of the same
974	      bitstream.

976	         Informative note: The term "bitstream" in this document is
977	         equivalent to the term "encoded stream" in [I-D.ietf-avtext-
978	         rtp-grouping-taxonomy].

980	4.2 Payload Header Usage

982	   The TID value indicates (among other things) the relative importance
983	   of an RTP packet, for example because NAL units belonging to higher
984	   temporal sub-layers are not used for the decoding of lower temporal
985	   sub-layers.  A lower value of TID indicates a higher importance.
986	   More important NAL units MAY be better protected against
987	   transmission losses than less important NAL units.

989	4.3 Payload Structures

991	   The first two bytes of the payload of an RTP packet are referred to
992	   as the payload header.  The payload header consists of the same
993	   fields (F, Type, LayerId, and TID) as the NAL unit header as shown
994	   in section 1.1.4, irrespective of the type of the payload structure.

996	   Four different types of RTP packet payload structures are specified.
997	   A receiver can identify the type of an RTP packet payload through
998	   the Type field in the payload header.

1000	   The four different payload structures are as follows:

1002	   o  Single NAL unit packet: Contains a single NAL unit in the
1003	      payload, and the NAL unit header of the NAL unit also serves as
1004	      the payload header.  This payload structure is specified in
1005	      section 4.6.

1007	   o  Aggregation packet (AP): Contains more than one NAL unit within
1008	      one access unit.  This payload structure is specified in
1009	      section 4.7.

1011	   o  Fragmentation unit (FU): Contains a subset of a single NAL unit.
1012	      This payload structure is specified in section 4.8.

1014	   o  PACI carrying RTP packet: Contains a payload header (that differs
1015	      from other payload headers for efficiency), a Payload Header
1016	      Extension Structure (PHES), and a PACI payload.  This payload
1017	      structure is specified in section 4.9.
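
   A receiver-side dispatch on the Type field might look like the
   sketch below.  The function name is hypothetical.  The Type values
   48 (AP), 49 (FU), and 50 (PACI) are assumed here from Sections 4.7,
   4.8, and 4.9 and should be checked against those sections.  As a
   simplification, every other Type value is treated as a single NAL
   unit packet.

```python
# Assumed Type values of the non-single-NAL-unit payload structures;
# verify against Sections 4.7, 4.8, and 4.9 of the memo.
PAYLOAD_STRUCTURES = {48: "AP", 49: "FU", 50: "PACI"}


def payload_structure(rtp_payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(rtp_payload) < 2:
        raise ValueError("payload header needs two bytes")
    nal_type = (rtp_payload[0] >> 1) & 0x3F
    return PAYLOAD_STRUCTURES.get(nal_type, "single NAL unit")
```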

1019	4.4 Transmission Modes

1021	   This memo enables transmission of an HEVC bitstream over a single
1022	   RTP stream or multiple RTP streams.  The concept and working
1023	   principle are inherited from the design of what was called single and
1024	   multiple session transmission in [RFC6190] and follows a similar
1025	   design.  If only one RTP stream is used for transmission of the HEVC
1026	   bitstream, the transmission mode is referred to as single-stream
1027	   mode (SSM); otherwise (more than one RTP stream is used for
1028	   transmission of the HEVC bitstream), the transmission mode is
1029	   referred to as multi-stream mode (MSM).

1031	   Dependency of one RTP stream on another RTP stream is typically
1032	   indicated as specified in [RFC5583].  When an RTP stream A depends
1033	   on another RTP stream B, the RTP stream B is referred to as a
1034	   dependee RTP stream of the RTP stream A.

1036	      Informative note: An MSM may involve one or more RTP sessions.
1037	      For example, each RTP stream in an MSM may be in its own RTP
1038	      session.  For another example, a set of multiple RTP streams in
1039	      an MSM may belong to the same RTP session, e.g. as indicated by
1040	      the mechanism specified in [I-D.ietf-avtcore-rtp-multi-stream] or
1041	      [I-D.ietf-mmusic-sdp-bundle-negotiation].

1043	   SSM SHOULD be used for point-to-point unicast scenarios, while MSM
1044	   SHOULD be used for point-to-multipoint multicast scenarios where
1045	   different receivers require different operation points of the same
1046	   HEVC bitstream, to improve bandwidth utilization efficiency.

1048	      Informative note: A multicast may degrade to a unicast after all
1049	      but one receiver has left (this is a justification of the first
1050	      "SHOULD" instead of "MUST"), and there might be scenarios where
1051	      MSM is desirable but not possible, e.g. when IP multicast is not
1052	      deployed in a certain network (this is a justification of the
1053	      second "SHOULD" instead of "MUST").

1055	   The transmission mode is indicated by the tx-mode media parameter
1056	   (see section 7.1).  If tx-mode is equal to "SSM", SSM MUST be used.
1057	   Otherwise (tx-mode is equal to "MSM"), MSM MUST be used.

1059	   Receivers MUST support both SSM and MSM.

1061	4.5 Decoding Order Number

1063	   For each NAL unit, the variable AbsDon is derived, representing the
1064	   decoding order number that is indicative of the NAL unit decoding
1065	   order.

1067	   Let NAL unit n be the n-th NAL unit in transmission order within an
1068	   RTP stream.

1070	   If tx-mode is equal to "SSM" and sprop-max-don-diff is equal to 0,
1071	   AbsDon[n], the value of AbsDon for NAL unit n, is derived as equal
1072	   to n.

1074	   Otherwise (tx-mode is equal to "MSM" or sprop-max-don-diff is
1075	   greater than 0), AbsDon[n] is derived as follows, where DON[n] is
1076	   the value of the variable DON for NAL unit n:

1078	   o  If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in
1079	      transmission order), AbsDon[0] is set equal to DON[0].

1081	   o  Otherwise (n is greater than 0), the following applies for
1082	      derivation of AbsDon[n]:

1084	            If DON[n] == DON[n-1],
1085	                AbsDon[n] = AbsDon[n-1]

1087	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1088	                AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1090	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1091	                AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1093	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1094	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

1096	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1097	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
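
   The five cases above can be sketched in code as follows.  The
   function name is hypothetical, and the input is assumed to be the
   16-bit DON values of the NAL units of one RTP stream in
   transmission order.  The sketch covers only the case in which
   tx-mode is equal to "MSM" or sprop-max-don-diff is greater than 0;
   in the other case AbsDon[n] is simply n.

```python
from typing import Iterable, List


def derive_abs_don(dons: Iterable[int]) -> List[int]:
    """Derive AbsDon[n] from DON[n] for NAL units in transmission order."""
    abs_dons: List[int] = []
    prev_don = 0
    prev_abs = 0
    for n, don in enumerate(dons):
        if not 0 <= don < 65536:
            raise ValueError("DON must fit in 16 bits")
        if n == 0:
            cur = don
        elif don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            cur = prev_abs + don - prev_don               # forward step
        elif don < prev_don and prev_don - don >= 32768:
            cur = prev_abs + 65536 - prev_don + don       # forward, wrapped
        elif don > prev_don and don - prev_don >= 32768:
            cur = prev_abs - (prev_don + 65536 - don)     # backward, wrapped
        else:  # don < prev_don and prev_don - don < 32768
            cur = prev_abs - (prev_don - don)             # backward step
        abs_dons.append(cur)
        prev_don, prev_abs = don, cur
    return abs_dons
```

   For example, the DON sequence 65534, 65535, 0, 1 yields the AbsDon
   values 65534, 65535, 65536, 65537, so the wrap-around of the 16-bit
   DON does not disturb the derived decoding order.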

1099	   For any two NAL units m and n, the following applies:

1101	   o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
1102	      follows NAL unit m in NAL unit decoding order.

1104	   o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
1105	      of the two NAL units can be in either order.

1107	   o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
1108	      NAL unit m in decoding order.

1110	   When two consecutive NAL units in the NAL unit decoding order have
1111	   different values of AbsDon, the value of AbsDon for the second NAL
1112	   unit in decoding order MUST be greater than the value of AbsDon for
1113	   the first NAL unit, and the absolute difference between the two
1114	   AbsDon values MAY be greater than or equal to 1.

1116	      Informative note: There are multiple reasons to allow for the
1117	      absolute difference of the values of AbsDon for two consecutive
1118	      NAL units in the NAL unit decoding order to be greater than one.
1119	      An increment by one is not required, as at the time of
1120	      associating values of AbsDon to NAL units, it may not be known
1121	      whether all NAL units are to be delivered to the receiver.  For
1122	      example, a gateway may not forward VCL NAL units of higher sub-
1123	      layers or some SEI NAL units when there is congestion in the
1124	      network.  In another example, the first intra-coded picture of a
1125	      pre-encoded clip is transmitted in advance to ensure that it is
1126	      readily available in the receiver, and when transmitting the
1127	      first intra-coded picture, the originator does not exactly know
1128	      how many NAL units will be encoded before the first intra-coded
1129	      picture of the pre-encoded clip follows in decoding order.  Thus,
1130	      the values of AbsDon for the NAL units of the first intra-coded
1131	      picture of the pre-encoded clip have to be estimated when they
1132	      are transmitted, and gaps in values of AbsDon may occur.  Another
1133	      example is MSM where the AbsDon values must indicate cross-layer
1134	      decoding order for NAL units conveyed in all the RTP streams.

1136	4.6 Single NAL Unit Packets

1138	   A single NAL unit packet contains exactly one NAL unit, and consists
1139	   of a payload header (denoted as PayloadHdr), a conditional 16-bit
1140	   DONL field (in network byte order), and the NAL unit payload data
1141	   (the NAL unit excluding its NAL unit header) of the contained NAL
1142	   unit, as shown in Figure 3.

1144	    0                   1                   2                   3
1145	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1146	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1147	   |           PayloadHdr          |      DONL (conditional)       |
1148	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1149	   |                                                               |
1150	   |                  NAL unit payload data                        |
1151	   |                                                               |
1152	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1153	   |                               :...OPTIONAL RTP padding        |
1154	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1156	              Figure 3 The structure of a single NAL unit packet

1158	   The payload header SHOULD be an exact copy of the NAL unit header of
1159	   the contained NAL unit.  However, the Type (i.e. nal_unit_type)
1160	   field MAY be changed, e.g. when it is desirable to treat a CRA
1161	   picture as a BLA picture [JCTVC-J0107].

1163	   The DONL field, when present, specifies the value of the 16 least
1164	   significant bits of the decoding order number of the contained NAL
1165	   unit.  If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
1166	   than 0, the DONL field MUST be present, and the variable DON for the
1167	   contained NAL unit is derived as equal to the value of the DONL
1168	   field.  Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff
1169	   is equal to 0), the DONL field MUST NOT be present.
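
   A receiver-side parse of this packet type might look like the
   sketch below.  It is illustrative only: the names SingleNalPacket,
   donl_expected, and parse_single_nal_packet are hypothetical, the
   16-bit header layout (F: 1 bit, Type: 6 bits, LayerId: 6 bits,
   TID: 3 bits) is taken to be the HEVC NAL unit header layout, RTP
   padding is assumed to have been removed already, and the payload
   header is assumed to be an exact copy of the NAL unit header.

```python
import struct
from typing import NamedTuple, Optional


class SingleNalPacket(NamedTuple):
    f: int                # F bit from PayloadHdr
    nal_type: int         # Type (nal_unit_type)
    layer_id: int
    tid: int
    don: Optional[int]    # DON taken from DONL, or None when absent
    nal_unit: bytes       # rebuilt NAL unit: header plus payload data


def donl_expected(tx_mode: str, sprop_max_don_diff: int) -> bool:
    """DONL is present iff tx-mode is "MSM" or sprop-max-don-diff > 0."""
    return tx_mode == "MSM" or sprop_max_don_diff > 0


def parse_single_nal_packet(payload: bytes,
                            donl_present: bool) -> SingleNalPacket:
    """Parse an RTP payload as a single NAL unit packet."""
    offset = 2
    if len(payload) < offset + (2 if donl_present else 0):
        raise ValueError("payload too short")
    (hdr,) = struct.unpack_from("!H", payload, 0)
    don = None
    if donl_present:
        # DONL is a 16-bit field in network byte order.
        (don,) = struct.unpack_from("!H", payload, offset)
        offset += 2
    # Assumption: PayloadHdr equals the NAL unit header, so the NAL unit
    # is rebuilt by prepending it to the payload data.
    nal_unit = payload[:2] + payload[offset:]
    return SingleNalPacket(hdr >> 15, (hdr >> 9) & 0x3F,
                           (hdr >> 3) & 0x3F, hdr & 0x07, don, nal_unit)
```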

1171	4.7 Aggregation Packets (APs)

1173	   Aggregation packets (APs) are introduced to enable the reduction of
1174	   packetization overhead for small NAL units, such as most of the non-
1175	   VCL NAL units, which are often only a few octets in size.

1177	   An AP aggregates NAL units within one access unit.  Each NAL unit to
1178	   be carried in an AP is encapsulated in an aggregation unit.  NAL
1179	   units aggregated in one AP are in NAL unit decoding order.

1181	   An AP consists of a payload header (denoted as PayloadHdr) followed
1182	   by two or more aggregation units, as shown in Figure 4.

1184	    0                   1                   2                   3
1185	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1186	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1187	   |    PayloadHdr (Type=48)       |                               |
1188	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1189	   |                                                               |
1190	   |             two or more aggregation units                     |
1191	   |                                                               |
1192	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1193	   |                               :...OPTIONAL RTP padding        |
1194	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1196	              Figure 4 The structure of an aggregation packet

1198	   The fields in the payload header are set as follows.  The F bit MUST
1199	   be equal to 0 if the F bit of each aggregated NAL unit is equal to
1200	   zero; otherwise, it MUST be equal to 1.  The Type field MUST be
1201	   equal to 48.  The value of LayerId MUST be equal to the lowest value
1202	   of LayerId of all the aggregated NAL units.  The value of TID MUST
1203	   be the lowest value of TID of all the aggregated NAL units.

1205	      Informative Note: All VCL NAL units in an AP have the same TID
1206	      value since they belong to the same access unit.  However, an AP
1207	      may contain non-VCL NAL units for which the TID value in the NAL
1208	      unit header may be different than the TID value of the VCL NAL
1209	      units in the same AP.
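
   The payload header rules for an AP can be sketched as follows.  This
   is a non-normative illustration; the function name
   ap_payload_header is hypothetical, and each input is assumed to be
   a complete NAL unit beginning with its two-octet NAL unit header
   (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits).

```python
from typing import Sequence


def ap_payload_header(nal_units: Sequence[bytes]) -> bytes:
    """Build the 2-octet PayloadHdr of an AP from its aggregated NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f, layer_id, tid = 0, 0x3F, 0x07
    for nal in nal_units:
        hdr = int.from_bytes(nal[:2], "big")
        f |= hdr >> 15                                # 1 if any F bit is 1
        layer_id = min(layer_id, (hdr >> 3) & 0x3F)   # lowest LayerId
        tid = min(tid, hdr & 0x07)                    # lowest TID
    ap_hdr = (f << 15) | (48 << 9) | (layer_id << 3) | tid
    return ap_hdr.to_bytes(2, "big")
```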

1211	   An AP MUST carry at least two aggregation units and can carry as
1212	   many aggregation units as necessary; however, the total amount of
1213	   data in an AP MUST fit into an IP packet, and the size SHOULD be
1214	   chosen so that the resulting IP packet is smaller than the MTU
1215	   size, so as to avoid IP layer fragmentation.  An AP MUST NOT contain
1216	   Fragmentation Units (FUs) specified in section 4.8.  APs MUST NOT be
1217	   nested; i.e. an AP MUST NOT contain another AP.

1219	   The first aggregation unit in an AP consists of a conditional 16-bit
1220	   DONL field (in network byte order) followed by a 16-bit unsigned
1221	   size information (in network byte order) that indicates the size of
1222	   the NAL unit in bytes (excluding these two octets, but including the
1223	   NAL unit header), followed by the NAL unit itself, including its NAL
1224	   unit header, as shown in Figure 5.

1226	    0                   1                   2                   3
1227	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1228	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1229	                   :       DONL (conditional)      |   NALU size   |
1230	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1231	   |   NALU size   |                                               |
1232	   +-+-+-+-+-+-+-+-+         NAL unit                              |
1233	   |                                                               |
1234	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1235	   |                               :
1236	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1238	       Figure 5 The structure of the first aggregation unit in an AP

1240	   The DONL field, when present, specifies the value of the 16 least
1241	   significant bits of the decoding order number of the aggregated NAL
1242	   unit.

1244	   If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than
1245	   0, the DONL field MUST be present in an aggregation unit that is the
1246	   first aggregation unit in an AP, and the variable DON for the
1247	   aggregated NAL unit is derived as equal to the value of the DONL
1248	   field.  Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff
1249	   is equal to 0), the DONL field MUST NOT be present in an aggregation
1250	   unit that is the first aggregation unit in an AP.

1252	   An aggregation unit that is not the first aggregation unit in an AP
1253	   consists of a conditional 8-bit DOND field followed by a 16-bit
1254	   unsigned size information (in network byte order) that indicates the
1255	   size of the NAL unit in bytes (excluding these two octets, but
1256	   including the NAL unit header), followed by the NAL unit itself,
1257	   including its NAL unit header, as shown in Figure 6.

1259	    0                   1                   2                   3
1260	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1261	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1262	                   : DOND (cond)   |          NALU size            |
1263	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1264	   |                                                               |
1265	   |                       NAL unit                                |
1266	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1267	   |                               :
1268	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1270	    Figure 6 The structure of an aggregation unit that is not the first
1271	                         aggregation unit in an AP

1273	   When present, the DOND field plus 1 specifies the difference between
1274	   the decoding order number values of the current aggregated NAL unit
1275	   and the preceding aggregated NAL unit in the same AP.

1277	   If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than
1278	   0, the DOND field MUST be present in an aggregation unit that is not
1279	   the first aggregation unit in an AP, and the variable DON for the
1280	   aggregated NAL unit is derived as equal to the DON of the preceding
1281	   aggregated NAL unit in the same AP plus the value of the DOND field
1282	   plus 1 modulo 65536.  Otherwise (tx-mode is equal to "SSM" and
1283	   sprop-max-don-diff is equal to 0), the DOND field MUST NOT be
1284	   present in an aggregation unit that is not the first aggregation
1285	   unit in an AP, and in this case the transmission order and decoding
1286	   order of NAL units carried in the AP are the same as the order the
1287	   NAL units appear in the AP.

1289	   Figure 7 presents an example of an AP that contains two aggregation
1290	   units, labeled as 1 and 2 in the figure, without the DONL and DOND
1291	   fields being present.

1293	    0                   1                   2                   3
1294	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1295	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1296	   |                          RTP Header                           |
1297	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1298	   |   PayloadHdr (Type=48)        |         NALU 1 Size           |
1299	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1300	   |          NALU 1 HDR           |                               |
1301	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1302	   |                   . . .                                       |
1303	   |                                                               |
1304	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1305	   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1306	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1307	   | NALU 2 HDR    |                                               |
1308	   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1309	   |                   . . .                                       |
1310	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1311	   |                               :...OPTIONAL RTP padding        |
1312	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1314	   Figure 7 An example of an AP packet containing two aggregation units
1315	                     without the DONL and DOND fields

1317	   Figure 8 presents an example of an AP that contains two aggregation
1318	   units, labeled as 1 and 2 in the figure, with the DONL and DOND
1319	   fields being present.

1321	    0                   1                   2                   3
1322	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1323	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1324	   |                          RTP Header                           |
1325	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1326	   |   PayloadHdr (Type=48)        |        NALU 1 DONL            |
1327	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1328	   |          NALU 1 Size          |            NALU 1 HDR         |
1329	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1330	   |                                                               |
1331	   |                 NALU 1 Data   . . .                           |
1332	   |                                                               |
1333	   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1334	   |               |  NALU 2 DOND  |          NALU 2 Size          |
1335	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1336	   |          NALU 2 HDR           |                               |
1337	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1338	   |                                                               |
1339	   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1340	   |                               :...OPTIONAL RTP padding        |
1341	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1343	    Figure 8 An example of an AP containing two aggregation units with
1344	                         the DONL and DOND fields
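
   Walking the aggregation units of an AP, including the DONL/DOND
   derivation of DON, can be sketched as below.  The sketch is
   illustrative only: parse_ap is a hypothetical name, the input is
   assumed to be the RTP payload with any RTP padding already removed,
   and don_present stands for the condition "tx-mode is equal to MSM or
   sprop-max-don-diff is greater than 0".

```python
import struct
from typing import List, Optional, Tuple


def parse_ap(payload: bytes,
             don_present: bool) -> List[Tuple[Optional[int], bytes]]:
    """Return (DON, NAL unit) pairs in the order they appear in the AP."""
    hdr = int.from_bytes(payload[:2], "big")
    if len(payload) < 2 or (hdr >> 9) & 0x3F != 48:
        raise ValueError("not an aggregation packet")
    offset = 2
    don: Optional[int] = None
    units: List[Tuple[Optional[int], bytes]] = []
    while offset < len(payload):
        if don_present:
            if not units:
                # First aggregation unit: 16-bit DONL.
                (don,) = struct.unpack_from("!H", payload, offset)
                offset += 2
            else:
                # Later units: 8-bit DOND; DON = previous DON + DOND + 1.
                don = (don + payload[offset] + 1) % 65536
                offset += 1
        (size,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        nal = payload[offset:offset + size]
        if len(nal) != size:
            raise ValueError("truncated aggregation unit")
        units.append((don, nal))
        offset += size
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```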

1346	4.8 Fragmentation Units (FUs)

1348	   Fragmentation units (FUs) are introduced to enable fragmenting a single
1349	   NAL unit into multiple RTP packets, possibly without cooperation or
1350	   knowledge of the HEVC encoder.  A fragment of a NAL unit consists of
1351	   an integer number of consecutive octets of that NAL unit.  Fragments
1352	   of the same NAL unit MUST be sent in consecutive order with ascending
1353	   RTP sequence numbers (with no other RTP packets within the same RTP
1354	   stream being sent between the first and last fragment).

1356	   When a NAL unit is fragmented and conveyed within FUs, it is
1357	   referred to as a fragmented NAL unit.  APs MUST NOT be fragmented.
1358	   FUs MUST NOT be nested; i.e. an FU MUST NOT contain a subset of
1359	   another FU.

1361	   The RTP timestamp of an RTP packet carrying an FU is set to the
1362	   NALU-time of the fragmented NAL unit.

1364	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1365	   header of one octet, a conditional 16-bit DONL field (in network
1366	   byte order), and an FU payload, as shown in Figure 9.

1368	    0                   1                   2                   3
1369	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1370	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1371	   |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
1372	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1373	   | DONL (cond)   |                                               |
1374	   |-+-+-+-+-+-+-+-+                                               |
1375	   |                         FU payload                            |
1376	   |                                                               |
1377	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1378	   |                               :...OPTIONAL RTP padding        |
1379	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1381	                      Figure 9 The structure of an FU

1383	   The fields in the payload header are set as follows.  The Type field
1384	   MUST be equal to 49.  The fields F, LayerId, and TID MUST be equal
1385	   to the fields F, LayerId, and TID, respectively, of the fragmented
1386	   NAL unit.

1388	   The FU header consists of an S bit, an E bit, and a 6-bit FuType
1389	   field, as shown in Figure 10.

1391	                            +---------------+
1392	                            |0|1|2|3|4|5|6|7|
1393	                            +-+-+-+-+-+-+-+-+
1394	                            |S|E|  FuType   |
1395	                            +---------------+

1397	                  Figure 10   The structure of FU header

1399	   The semantics of the FU header fields are as follows:
1400	   S: 1 bit
1401	      When set to one, the S bit indicates the start of a fragmented
1402	      NAL unit, i.e. the first byte of the FU payload is also the first
1403	      byte of the payload of the fragmented NAL unit.  When the FU
1404	      payload is not the start of the fragmented NAL unit payload, the
1405	      S bit MUST be set to zero.

1407	   E: 1 bit
1408	      When set to one, the E bit indicates the end of a fragmented NAL
1409	      unit, i.e. the last byte of the payload is also the last byte of
1410	      the fragmented NAL unit.  When the FU payload is not the last
1411	      fragment of a fragmented NAL unit, the E bit MUST be set to zero.

1413	   FuType: 6 bits
1414	      The field FuType MUST be equal to the field Type of the
1415	      fragmented NAL unit.

1417	   The DONL field, when present, specifies the value of the 16 least
1418	   significant bits of the decoding order number of the fragmented NAL
1419	   unit.

1421	   If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than
1422	   0, and the S bit is equal to 1, the DONL field MUST be present in
1423	   the FU, and the variable DON for the fragmented NAL unit is derived
1424	   as equal to the value of the DONL field.  Otherwise (tx-mode is
1425	   equal to "SSM" and sprop-max-don-diff is equal to 0, or the S bit is
1426	   equal to 0), the DONL field MUST NOT be present in the FU.

1428	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
1429	   the Start bit and End bit MUST NOT both be set to one in the same FU
1430	   header.

1432	   The FU payload consists of fragments of the payload of the
1433	   fragmented NAL unit so that if the FU payloads of consecutive FUs,
1434	   starting with an FU with the S bit equal to 1 and ending with an FU
1435	   with the E bit equal to 1, are sequentially concatenated, the
1436	   payload of the fragmented NAL unit can be reconstructed.  The NAL
1437	   unit header of the fragmented NAL unit is not included as such in
1438	   the FU payload, but rather the information of the NAL unit header of
1439	   the fragmented NAL unit is conveyed in F, LayerId, and TID fields of
1440	   the FU payload headers of the FUs and the FuType field of the FU
1441	   header of the FUs.  An FU payload MUST NOT be empty.
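
   The sender side of the FU rules can be sketched as follows.  This is
   a non-normative illustration: fragment_nal_unit and its parameters
   are hypothetical names, max_fu_payload bounds only the FU payload
   (the extra two octets of DONL in the first FU are not accounted for
   in the size budget), and don is passed only when the DONL field is
   required to be present.

```python
from typing import List, Optional


def fragment_nal_unit(nal_unit: bytes, max_fu_payload: int,
                      don: Optional[int] = None) -> List[bytes]:
    """Split one NAL unit into FU RTP payloads (at least two FUs)."""
    hdr = int.from_bytes(nal_unit[:2], "big")
    nal_type = (hdr >> 9) & 0x3F
    body = nal_unit[2:]  # the NAL unit header itself is not in FU payloads
    if max_fu_payload < 1 or len(body) <= max_fu_payload:
        # S and E must not both be set, so one FU cannot carry a whole NAL.
        raise ValueError("NAL unit does not need (or cannot use) FUs")
    # PayloadHdr: F, LayerId, TID copied; Type replaced by 49.
    payload_hdr = ((hdr & 0x81FF) | (49 << 9)).to_bytes(2, "big")
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    fus = []
    for index, chunk in enumerate(chunks):
        s = 1 if index == 0 else 0
        e = 1 if index == len(chunks) - 1 else 0
        fu_header = bytes([(s << 7) | (e << 6) | nal_type])
        # DONL appears only in the FU with S equal to 1.
        donl = don.to_bytes(2, "big") if (s and don is not None) else b""
        fus.append(payload_hdr + fu_header + donl + chunk)
    return fus
```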

1443	   If an FU is lost, the receiver SHOULD discard all following
1444	   fragmentation units in transmission order corresponding to the same
1445	   fragmented NAL unit, unless the decoder in the receiver is known to
1446	   be prepared to gracefully handle incomplete NAL units.

1448	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1449	   fragments of a NAL unit to an (incomplete) NAL unit, even if
1450	   fragment n of that NAL unit is not received.  In this case, the
1451	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1452	   syntax violation.
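
   Reassembly, including the optional handling of a missing final
   fragment described above, might be sketched as follows.  The
   function name reassemble_fus is hypothetical; the sketch assumes
   the caller passes FU payloads of one fragmented NAL unit in RTP
   sequence number order, starting with the S=1 fragment and with no
   gaps, and that the decoder tolerates incomplete NAL units.

```python
from typing import List


def reassemble_fus(fus: List[bytes], donl_present: bool) -> bytes:
    """Rebuild a NAL unit from consecutive FU payloads.

    If the last FU given does not have E set, the partial NAL unit is
    returned with forbidden_zero_bit set to one.
    """
    if not fus or not fus[0][2] & 0x80:
        raise ValueError("first FU must have the S bit set")
    payload_hdr = int.from_bytes(fus[0][:2], "big")
    fu_type = fus[0][2] & 0x3F
    # NAL unit header: F, LayerId, TID from PayloadHdr; Type from FuType.
    nal_hdr = (payload_hdr & 0x81FF) | (fu_type << 9)
    if not fus[-1][2] & 0x40:
        nal_hdr |= 0x8000  # incomplete NAL unit: signal a syntax violation
    nal = bytearray(nal_hdr.to_bytes(2, "big"))
    for index, fu in enumerate(fus):
        # Only the first FU (S=1) may carry the 2-octet DONL field.
        skip = 3 + (2 if donl_present and index == 0 else 0)
        nal += fu[skip:]
    return bytes(nal)
```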

1454	4.9 PACI packets

1456	   This section specifies the PACI packet structure.  The basic payload
1457	   header specified in this memo is intentionally limited to the 16
1458	   bits of the NAL unit header so as to keep the packetization overhead
1459	   to a minimum.  However, cases have been identified where it is
1460	   advisable to include control information in an easily accessible
1461	   position in the packet header, despite the additional overhead.  One
1462	   such control information is the Temporal Scalability Control
1463	   Information as specified in section 4.10 below.  PACI packets carry
1464	   this and future, similar structures.

1466	   The PACI packet structure is based on a payload header extension
1467	   mechanism that is generic and extensible to carry payload header
1468	   extensions.  In this section, the focus lies on the use within this
1469	   specification.  Section 4.9.2 below provides guidance for the
1470	   specification designers in how to employ the extension mechanism in
1471	   future specifications.

1473	   A PACI packet consists of a payload header (denoted as PayloadHdr),
1474	   for which the structure follows what is described in section 4.3
1475	   above.  The payload header is followed by the fields A, cType,
1476	   PHSsize, F[0..2] and Y.

1478	   Figure 11 shows a PACI packet in compliance with this memo; that is,
1479	   without any extensions.

1481	       0                   1                   2                   3
1482	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1483	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1484	      |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1485	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1486	      |        Payload Header Extension Structure (PHES)              |
1487	      |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
1488	      |                                                               |
1489	      |                  PACI payload: NAL unit                       |
1490	      |                   . . .                                       |
1491	      |                                                               |
1492	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1493	      |                               :...OPTIONAL RTP padding        |
1494	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

1496	                    Figure 11   The structure of a PACI

1498	   The fields in the payload header are set as follows.  The F bit MUST
1499	   be equal to 0.  The Type field MUST be equal to 50.  The value of
1500	   LayerId MUST be a copy of the LayerId field of the PACI payload NAL
1501	   unit or NAL-unit-like structure.  The value of TID MUST be a copy of
1502	   the TID field of the PACI payload NAL unit or NAL-unit-like
1503	   structure.

1505	   The semantics of other fields are as follows:

1507	   A: 1 bit
1508	      Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
1509	      structure.

1511	   cType: 6 bits
1512	      Copy of the Type field of the PACI payload NAL unit or NAL-unit-
1513	      like structure.

1515	   PHSsize: 5 bits
1516	      Indicates the length of the PHES field.  The value is limited to
1517	      be less than or equal to 32 octets, to simplify encoder design
1518	      for MTU size matching.

1520	   F0
1521	      This field equal to 1 specifies the presence of a temporal
1522	      scalability support extension in the PHES.

1524	   F1, F2
1525	      MUST be 0, available for future extensions, see section 4.9.2.

1527	   Y: 1 bit
1528	      MUST be 0, available for future extensions, see section 4.9.2.

1530	   PHES: variable number of octets
1531	      A variable number of octets as indicated by the value of PHSsize.

1533	   PACI Payload
1534	      The single NAL unit packet or NAL-unit-like structure (such as:
1535	      FU or AP) to be carried, not including the first two octets.

1537	         Informative note: The first two octets of the NAL unit or NAL-
1538	         unit-like structure carried in the PACI payload are not
1539	         included in the PACI payload. Rather, the respective values
1540	         are copied in locations of the PayloadHdr of the RTP packet.
1541	         This design offers two advantages: first, the overall
1542	         structure of the payload header is preserved, i.e. there is no
1543	         special case of payload header structure that needs to be
1544	         implemented for PACI.  Second, no additional overhead is
1545	         introduced.

1547	      A PACI payload MAY be a single NAL unit, an FU, or an AP.  PACIs
1548	      MUST NOT be fragmented or aggregated.  The following subsection
1549	      documents the reasons for these design choices.
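
   The field layout above can be exercised with a small parser such as
   the one sketched below.  It is illustrative only: Paci and
   parse_paci are hypothetical names, RTP padding is assumed to have
   been removed, and the PHES is simply taken to be the PHSsize octets
   that follow the first four octets.  The two header octets of the
   carried NAL unit or NAL-unit-like structure are rebuilt from A,
   cType, and the LayerId and TID of the PayloadHdr.

```python
from typing import NamedTuple


class Paci(NamedTuple):
    f0: int
    f1: int
    f2: int
    y: int
    phes: bytes    # Payload Header Extension Structure, PHSsize octets
    inner: bytes   # carried structure with its 2 header octets restored


def parse_paci(payload: bytes) -> Paci:
    """Parse an RTP payload as a PACI packet."""
    if len(payload) < 4:
        raise ValueError("payload too short")
    payload_hdr = int.from_bytes(payload[0:2], "big")
    if (payload_hdr >> 9) & 0x3F != 50:
        raise ValueError("not a PACI packet")
    ext = int.from_bytes(payload[2:4], "big")
    a = ext >> 15                  # copy of the F bit
    ctype = (ext >> 9) & 0x3F      # copy of the Type field
    phs_size = (ext >> 4) & 0x1F   # length of the PHES in octets
    phes = payload[4:4 + phs_size]
    if len(phes) != phs_size:
        raise ValueError("truncated PHES")
    # LayerId and TID (low 9 bits) are copies held in PayloadHdr.
    inner_hdr = (a << 15) | (ctype << 9) | (payload_hdr & 0x01FF)
    inner = inner_hdr.to_bytes(2, "big") + payload[4 + phs_size:]
    return Paci((ext >> 3) & 1, (ext >> 2) & 1, (ext >> 1) & 1,
                ext & 1, phes, inner)
```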

1551	4.9.1 Reasons for the PACI rules (informative)

1553	   A PACI cannot be fragmented.  If a PACI could be fragmented, and a
1554	   fragment other than the first fragment would get lost, access to the
1555	   information in the PACI would not be possible.  Therefore, a PACI
1556	   must not be fragmented.  In other words, an FU must not carry
1557	   (fragments of) a PACI.

1559	   A PACI cannot be aggregated.  Aggregation of PACIs is inadvisable
1560	   from a compression viewpoint because, in many cases, several NAL
1561	   units to be aggregated would share identical PACI fields and values,
1562	   which would then be carried redundantly.  Most, if not all, of the
1563	   practical effects of PACI aggregation can be achieved by
1564	   aggregating NAL units and bundling them with a PACI (see below).
1565	   Therefore, a PACI must not be aggregated.  In other words, an AP
1566	   must not contain a PACI.

1568	   The payload of a PACI can be a fragment.  Both middleboxes and
1569	   sending systems with inflexible (often hardware-based) encoders
1570	   occasionally find themselves in situations where a PACI and its
1571	   headers, combined, are larger than the MTU size.  In such a
1572	   scenario, the middlebox or sender can fragment the NAL unit and
1573	   encapsulate the fragment in a PACI.  Doing so preserves the payload
1574	   header extension information for all fragments, allowing downstream
1575	   middleboxes and the receiver to take advantage of that information.
1576	   Therefore, a sender may place a fragment into a PACI, and a receiver
1577	   must be able to handle such a PACI.

1579	   The payload of a PACI can be an aggregation packet.  HEVC
1580	   bitstreams can contain unevenly sized and/or small (when compared to
1581	   the MTU size) NAL units.  In order to efficiently packetize such
1582	   small NAL units, APs were introduced.  The benefits of APs are
1583	   independent from the need for a payload header extension.
1584	   Therefore, a sender may place an AP into a PACI, and a receiver must
1585	   be able to handle such a PACI.

1587	4.9.2 PACI extensions (Informative)

1589	   This subsection includes recommendations for future specification
1590	   designers on how to extend the PACI syntax to accommodate future
1591	   extensions.  Obviously, designers are free to specify whatever appears
1592	   to be appropriate to them at the time of their design.  However, a lot
1593	   of thought has been invested into the extension mechanism described
1594	   below, and we suggest that deviations from it warrant a good
1595	   explanation.

1597	   This memo defines only a single payload header extension (Temporal
1598	   Scalability Control Information, described below in section 4.10),
1599	   and, therefore, only the F0 bit carries semantics.  F1 and F2 are
1600	   already named (and not just marked as reserved, as a typical video
1601	   spec designer would do).  They are intended to signal two additional
1602	   extensions.  The Y bit allows further F and Y bits to be added
1603	   recursively, extending the mechanism beyond 3 possible payload header
1604	   extensions.  It is suggested to define a new packet type (using a
1605	   different value for Type) when assigning the F1, F2, or Y bits
1606	   different semantics than what is suggested below.

1608	   When a Y bit is set, an 8 bit flag-extension is inserted after the Y
1609	   bit.  A flag-extension consists of 7 flags F[n..n+6], and another Y
1610	   bit.

1612	   The basic PACI header already includes F0, F1, and F2.  Therefore,
1613	   the Fx bits in the first flag-extension are numbered F3, F4, ...,
1614	   F9, the F bits in the second flag-extension are numbered F10, F11,
1615	   ..., F16, and so forth.  As a result, at least 3 Fx bits are always
1616	   in the PACI, but the number of Fx bits (and associated types of
1617	   extensions), can be increased by setting the next Y bit and adding
1618	   an octet of flag-extensions, carrying 7 flags and another Y bit.
1619	   The size of this list of flags is subject to the limits specified in
1620	   section 4.9 (32 octets for all flag-extensions and the PHES
1621	   information combined).

1623	   Each of the F bits can indicate either the presence of information in
1624	   the Payload Header Extension Structure (PHES), described below, or a
1625	   given F bit can indicate a certain condition, without including
1626	   additional information in the PHES.

1628	   When a spec developer devises a new syntax that takes advantage of the
1629	   PACI extension mechanism, he/she must follow the constraints listed
1630	   below; otherwise the extension mechanism may break.

1632	     1) The fields added for a particular Fx bit MUST be fixed in length
1633	        and not depend on what other Fx bits are set (no parsing
1634	        dependency).
1635	     2) The Fx bits must be assigned in order.
1636	     3) An implementation that supports the n-th Fn bit for any value of
1637	        n must understand the syntax (though not necessarily the
1638	        semantics) of the fields Fk (with k < n), so as to be able to
1639	        either use those bits when present, or at least be able to skip
1640	        over them.
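
   The Y-bit chaining described in this subsection could be parsed
   along the lines of the sketch below.  It is purely illustrative:
   parse_flag_chain is a hypothetical name, no flag-extension is
   defined by this memo, and the bit order inside a flag-extension
   octet (F[n] in the most significant bit, Y in the least significant
   bit) is an assumption of the sketch.

```python
from typing import List, Tuple


def parse_flag_chain(buf: bytes, offset: int) -> Tuple[List[int], int]:
    """Collect F0, F1, ... following Y bits.

    buf[offset] is the octet whose low four bits hold F0, F1, F2 and Y.
    Returns the flag list (index i is Fi) and the offset of the first
    octet after the last flag-extension.
    """
    octet = buf[offset]
    flags = [(octet >> 3) & 1, (octet >> 2) & 1, (octet >> 1) & 1]
    y = octet & 1
    offset += 1
    while y:
        # Each flag-extension octet: 7 flags followed by another Y bit.
        octet = buf[offset]
        flags.extend((octet >> shift) & 1 for shift in range(7, 0, -1))
        y = octet & 1
        offset += 1
    return flags, offset
```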

1642	4.10 Temporal Scalability Control Information

1644	   This section describes the single payload header extension defined
1645	   in this specification, known as Temporal Scalability Control
1646	   Information (TSCI).  If, in the future, additional payload header
1647	   extensions become necessary, they could be specified in this section
1648	   of an updated version of this document, or in their own documents.

1650	   When F0 is set to 1 in a PACI, this specifies that the PHES field
1651	   includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as follows:

1653	       0                   1                   2                   3
1654	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1655	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1656	      |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1657	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1658	      |   TL0PICIDX   |   IrapPicID   |S|E|    RES    |               |
1659	      |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1660	      |                           ....                                |
1661	      |               PACI payload: NAL unit                          |
1662	      |                                                               |
1663	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1664	      |                               :...OPTIONAL RTP padding        |
1665	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1667	     Figure 12   The structure of a PACI with a PHES containing a TSCI

1669	   TL0PICIDX (8 bits)
1670	      When present, the TL0PICIDX field MUST be set equal to
1671	      temporal_sub_layer_zero_idx as specified in Section D.3.22 of
1672	      [H.265] for the access unit containing the NAL unit in the PACI.

1674	   IrapPicID (8 bits)
1675	      When present, the IrapPicID field MUST be set equal to
1676	      irap_pic_id as specified in Section D.3.22 of [H.265] for the
1677	      access unit containing the NAL unit in the PACI.

1679	   S (1 bit)
1680	      The S bit MUST be set to 1 if any of the following conditions is
1681	      true and MUST be set to 0 otherwise:

1683	      . The NAL unit in the payload of the PACI is the first VCL NAL
1684	        unit, in decoding order, of a picture.
1685	      . The NAL unit in the payload of the PACI is an AP and the NAL
1686	        unit in the first contained aggregation unit is the first VCL
1687	        NAL unit, in decoding order, of a picture.
1688	      . The NAL unit in the payload of the PACI is an FU with its S bit
1689	        equal to 1 and the FU payload containing a fragment of the
1690	        first VCL NAL unit, in decoding order, of a picture.

1692	   E (1 bit)
1693	      The E bit MUST be set to 1 if any of the following conditions is
1694	      true and MUST be set to 0 otherwise:

1696	      . The NAL unit in the payload of the PACI is the last VCL NAL
1697	        unit, in decoding order, of a picture.
1698	      . The NAL unit in the payload of the PACI is an AP and the NAL
1699	        unit in the last contained aggregation unit is the last VCL NAL
1700	        unit, in decoding order, of a picture.
1701	      . The NAL unit in the payload of the PACI is an FU with its E bit
1702	        equal to 1 and the FU payload containing a fragment of the last
1703	        VCL NAL unit, in decoding order, of a picture.

1705	   RES (6 bits)
1706	      MUST be equal to 0.  Reserved for future extensions.

1708	   The value of PHSsize MUST be set to 3.  Receivers MUST allow other
1709	   values of the fields F0, F1, F2, Y, and PHSsize, and MUST ignore any
1710	   fields in the PHES, when present, beyond those specified above.
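
The TSCI field layout above can be illustrated with a minimal parsing
sketch.  It assumes the caller has already located the PHES bytes
inside the PACI; the names Tsci and parse_tsci are hypothetical and
are not defined by this document.  Bytes beyond the three TSCI bytes
are ignored, in line with the receiver rule above.

```python
from typing import NamedTuple


class Tsci(NamedTuple):
    """Fields of a TSCI as drawn in Figure 12 (hypothetical container)."""
    tl0picidx: int    # 8 bits
    irap_pic_id: int  # 8 bits
    s: bool           # 1 bit
    e: bool           # 1 bit
    res: int          # 6 bits, expected to be 0


def parse_tsci(phes: bytes) -> Tsci:
    """Read the first three PHES bytes as a TSCI; extra bytes are ignored."""
    if len(phes) < 3:
        raise ValueError("a TSCI needs at least 3 bytes")
    flags = phes[2]
    return Tsci(
        tl0picidx=phes[0],
        irap_pic_id=phes[1],
        s=bool(flags & 0x80),   # most significant bit of the third byte
        e=bool(flags & 0x40),   # next bit
        res=flags & 0x3F,       # remaining six reserved bits
    )
```

A receiver built on this sketch would read the S and E flags from the
result and would not reject a TSCI only because res is non-zero.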

1712	5 Packetization Rules

1714	   The following packetization rules apply:

1716	   o  If tx-mode is equal to "MSM" or sprop-max-don-diff is greater than
1717	      0 for an RTP stream, the transmission order of NAL units carried in
1718	      the RTP stream MAY be different than the NAL unit decoding order.
1719	      Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff is equal
1720	      to 0 for an RTP stream), the transmission order of NAL units carried
1721	      in the RTP stream MUST be the same as the NAL unit decoding order.

1723	   o  A NAL unit of a small size SHOULD be encapsulated in an
1724	      aggregation packet together with one or more other NAL units in
1725	      order to avoid the unnecessary packetization overhead for small
1726	      NAL units.  For example, non-VCL NAL units such as access unit
1727	      delimiters, parameter sets, or SEI NAL units are typically small
1728	      and can often be aggregated with VCL NAL units without violating
1729	      MTU size constraints.

1731	   o  Each non-VCL NAL unit SHOULD, when possible from an MTU size
1732	      match viewpoint, be encapsulated in an aggregation packet
1733	      together with its associated VCL NAL unit, as typically a non-VCL
1734	      NAL unit would be meaningless without the associated VCL NAL unit
1735	      being available.

1737	   o  For carrying exactly one NAL unit in an RTP packet, a single NAL
1738	      unit packet MUST be used.
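
One possible way to apply the rules above is a greedy planner, shown
here as a sketch only.  It assumes that all NAL units passed in belong
to one access unit, that no DONL/DOND fields are present, that an
aggregation packet costs a 2-byte payload header plus a 2-byte size
field per aggregated NAL unit, and that NAL units larger than the
available payload are left to fragmentation units.  The function name
plan_packets and the "single"/"ap"/"fu" labels are hypothetical.

```python
from typing import List, Tuple

AP_PAYLOAD_HDR = 2   # assumed size of the aggregation packet payload header
AP_SIZE_FIELD = 2    # assumed size of the per-unit NALU size field


def plan_packets(nal_sizes: List[int],
                 max_payload: int) -> List[Tuple[str, List[int]]]:
    """Return (packet kind, NAL unit indices) pairs in transmission order."""
    plan: List[Tuple[str, List[int]]] = []
    group: List[int] = []
    group_bytes = AP_PAYLOAD_HDR

    def flush() -> None:
        nonlocal group, group_bytes
        if len(group) == 1:
            # Exactly one NAL unit: a single NAL unit packet is used.
            plan.append(("single", group))
        elif group:
            plan.append(("ap", group))
        group, group_bytes = [], AP_PAYLOAD_HDR

    for index, size in enumerate(nal_sizes):
        if size > max_payload:
            flush()
            plan.append(("fu", [index]))  # too large: fragment it
            continue
        cost = AP_SIZE_FIELD + size
        if group and group_bytes + cost > max_payload:
            flush()
        group.append(index)
        group_bytes += cost
    flush()
    return plan
```

With a payload budget of 1000 bytes, NAL units of 30, 40, and 900
bytes would share one aggregation packet under these assumptions,
while a 5000-byte NAL unit would be marked for fragmentation.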

1740	6 De-packetization Process

1742	   The general concept behind de-packetization is to get the NAL units
1743	   out of the RTP packets in an RTP stream and all RTP streams the RTP
1744	   stream depends on, if any, and pass them to the decoder in the NAL
1745	   unit decoding order.

1747	   The de-packetization process is implementation dependent.
1748	   Therefore, the following description should be seen as an example of
1749	   a suitable implementation.  Other schemes may be used as well, as
1750	   long as the output for the same input is the same as the process
1751	   described below.  The output is the same when the set of output NAL
1752	   units and their order are both identical.  Optimizations relative to
1753	   the described algorithms are possible.

1755	   All normal RTP mechanisms related to buffer management apply.  In
1756	   particular, duplicated or outdated RTP packets (as indicated by the
1757	   RTP sequence number and the RTP timestamp) are removed.  To
1758	   determine the exact time for decoding, factors such as a possible
1759	   intentional delay to allow for proper inter-stream synchronization
1760	   must be factored in.

1762	   NAL units with NAL unit type values in the range of 0 to 47,
1763	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1764	   with NAL unit type values in the range of 48 to 63, inclusive, MUST
1765	   NOT be passed to the decoder.
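
The type check above can be sketched as follows.  The sketch assumes
the two-byte HEVC NAL unit header layout, in which the 6-bit type
field follows the leading forbidden-zero bit; the function names are
hypothetical.

```python
def nal_unit_type(nal: bytes) -> int:
    """Extract the 6-bit NAL unit type from a two-byte NAL unit header."""
    if len(nal) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    return (nal[0] >> 1) & 0x3F


def may_pass_to_decoder(nal: bytes) -> bool:
    """True for types 0..47; types 48..63 are payload-format structures."""
    return nal_unit_type(nal) <= 47
```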

1767	   The receiver includes a receiver buffer, which is used to compensate
1768	   for transmission delay jitter within individual RTP streams and
1769	   across RTP streams, to reorder NAL units from transmission order to
1770	   the NAL unit decoding order, and to recover the NAL unit decoding
1771	   order in MSM, when applicable.  In this section, the receiver
1772	   operation is described under the assumption that there is no
1773	   transmission delay jitter within an RTP stream and across RTP
1774	   streams.  To distinguish it from a practical receiver buffer that
1775	   is also used for compensation of transmission delay jitter, the
1776	   receiver buffer is hereafter called the de-packetization buffer in
1777	   this section.  Receivers should also prepare for transmission delay
1778	   jitter; i.e. either reserve separate buffers for transmission delay
1779	   jitter buffering and de-packetization buffering or use a receiver
1780	   buffer for both transmission delay jitter and de-packetization.
1781	   Moreover, receivers should take transmission delay jitter into
1782	   account in the buffering operation; e.g. by additional initial
1783	   buffering before starting of decoding and playback.

1785	   If only one RTP stream is being received and sprop-max-don-diff of
1786	   the only RTP stream being received is equal to 0, the de-
1787	   packetization buffer size is zero bytes, i.e. the NAL units carried
1788	   in the RTP stream are directly passed to the decoder in their
1789	   transmission order, which is identical to the decoding order of the
1790	   NAL units. Otherwise, the process described in the remainder of this
1791	   section applies.

1793	   There are two buffering states in the receiver: initial buffering
1794	   and buffering while playing.  Initial buffering starts when the
1795	   reception is initialized.  After initial buffering, decoding and
1796	   playback are started, and the buffering-while-playing mode is used.

1798	   Regardless of the buffering state, the receiver stores incoming NAL
1799	   units, in reception order, into the de-packetization buffer.  NAL
1800	   units carried in RTP packets are stored in the de-packetization
1801	   buffer individually, and the value of AbsDon is calculated and
1802	   stored for each NAL unit.  When MSM is in use, NAL units of all RTP
1803	   streams of a bitstream are stored in the same de-packetization
1804	   buffer.  When NAL units carried in any two RTP streams are available
1805	   to be placed into the de-packetization buffer, those NAL units
1806	   carried in the RTP stream that is lower in the dependency tree are
1807	   placed into the buffer first.  For example, if RTP stream A depends
1808	   on RTP stream B, then NAL units carried in RTP stream B are placed
1809	   into the buffer first.

1811	   Initial buffering lasts until condition A (the difference between
1812	   the greatest and smallest AbsDon values of the NAL units in the de-
1813	   packetization buffer is greater than or equal to the value of sprop-
1814	   max-don-diff of the highest RTP stream) or condition B (the number
1815	   of NAL units in the de-packetization buffer is greater than the
1816	   value of sprop-depack-buf-nalus) is true.

1818	   After initial buffering, whenever condition A or condition B is
1819	   true, the following operation is repeatedly applied until both
1820	   condition A and condition B become false:

1822	   o  The NAL unit in the de-packetization buffer with the smallest
1823	      value of AbsDon is removed from the de-packetization buffer and
1824	      passed to the decoder.

1826	   When no more NAL units are flowing into the de-packetization buffer,
1827	   all NAL units remaining in the de-packetization buffer are removed
1828	   from the buffer and passed to the decoder in the order of increasing
1829	   AbsDon values.
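
The buffering behaviour described in this section can be sketched as
below.  This is an illustration under stated assumptions, not a
normative algorithm: AbsDon is assumed to have been computed already
for each NAL unit, jitter is ignored as in the text, and the class
name DepackBuffer and its methods are hypothetical.  Because the same
two conditions end initial buffering and drive output afterwards, the
sketch does not track the two buffering states separately.

```python
import heapq
from typing import Any, List, Tuple


class DepackBuffer:
    """De-packetization buffer ordered by AbsDon (illustrative sketch)."""

    def __init__(self, sprop_max_don_diff: int, sprop_depack_buf_nalus: int):
        self.max_don_diff = sprop_max_don_diff
        self.max_nalus = sprop_depack_buf_nalus
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = 0  # tie-breaker: keeps reception order for equal AbsDon

    def _condition_a(self) -> bool:
        # Spread between greatest and smallest AbsDon reaches the limit.
        if not self._heap:
            return False
        greatest = max(entry[0] for entry in self._heap)
        return greatest - self._heap[0][0] >= self.max_don_diff

    def _condition_b(self) -> bool:
        # More NAL units are buffered than sprop-depack-buf-nalus.
        return len(self._heap) > self.max_nalus

    def push(self, abs_don: int, nal: Any) -> List[Any]:
        """Store one NAL unit; return the NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._seq, nal))
        self._seq += 1
        released = []
        while self._condition_a() or self._condition_b():
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self) -> List[Any]:
        """No more input: drain the buffer in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

A caller would feed each received NAL unit to push() and hand the
returned list to the decoder, calling flush() once the stream ends.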

1831	7 Payload Format Parameters

1833	   This section specifies the parameters that MAY be used to select
1834	   optional features of the payload format and certain features or
1835	   properties of the bitstream or the RTP stream.  The parameters are
1836	   specified here as part of the media type registration for the HEVC
1837	   codec.  A mapping of the parameters into the Session Description
1838	   Protocol (SDP) [RFC4566] is also provided for applications that use
1839	   SDP.  Equivalent parameters could be defined elsewhere for use with
1840	   control protocols that do not use SDP.

1842	7.1 Media Type Registration

1844	   The media subtype for the HEVC codec is allocated from the IETF
1845	   tree.

1847	   The receiver MUST ignore any unrecognized parameter.

1849	   Media Type name:     video

1851	   Media subtype name:  H265

1853	   Required parameters: none

1855	   OPTIONAL parameters:

1857	      profile-space, tier-flag, profile-id, profile-compatibility-
1858	      indicator, interop-constraints, and level-id:

1860	         These parameters indicate the profile, tier, default level,
1861	         and some constraints of the bitstream carried by the RTP
1862	         stream and all RTP streams the RTP stream depends on, or a
1863	         specific set of the profile, tier, default level, and some
1864	         constraints the receiver supports.

1866	         The profile and some constraints are indicated collectively by
1867	         profile-space, profile-id, profile-compatibility-indicator,
1868	         and interop-constraints.  The profile specifies the subset of
1869	         coding tools that may have been used to generate the bitstream
1870	         or that the receiver supports.

1872	            Informative note: There are 32 values of profile-id, and
1873	            there are 32 flags in profile-compatibility-indicator, each
1874	            flag corresponding to one value of profile-id.  According
1875	            to HEVC version 1 in [HEVC], when more than one of the 32
1876	            flags is set for a bitstream, the bitstream would comply
1877	            with all the profiles corresponding to the set flags.
1878	            However, in a draft of HEVC version 2 in [HEVC draft v2],
1879	            subclause A.3.5, 19 Format Range Extensions profiles have
1880	            been specified, all using the same value of profile-id (4),
1881	            differentiated by some of the 48 bits in interop-
1882	            constraints - this (rather unexpected way of profile
1883	            signalling) means that one of the 32 flags may correspond
1884	            to multiple profiles.  To be able to support whatever HEVC
1885	            extension profile that might be specified and indicated
1886	            using profile-space, profile-id, profile-compatibility-
1887	            indicator, and interop-constraints in the future, it would
1888	            be safe to require symmetric use of these parameters in SDP
1889	            offer/answer unless recv-sub-layer-id is included in the
1890	            SDP answer for choosing one of the sub-layers offered.

1892	         The tier is indicated by tier-flag.  The default level is
1893	         indicated by level-id.  The tier and the default level specify
1894	         the limits on values of syntax elements or arithmetic
1895	         combinations of values of syntax elements that are followed
1896	         when generating the bitstream or that the receiver supports.

1898	         A set of profile-space, tier-flag, profile-id, profile-
1899	         compatibility-indicator, interop-constraints, and level-id
1900	         parameters ptlA is said to be consistent with another set of
1901	         these parameters ptlB if any decoder that conforms to the
1902	         profile, tier, level, and constraints indicated by ptlB can
1903	         decode any bitstream that conforms to the profile, tier,
1904	         level, and constraints indicated by ptlA.

1906	         In SDP offer/answer, when the SDP answer does not include the
1907	         recv-sub-layer-id parameter that is less than the sprop-sub-
1908	         layer-id parameter in the SDP offer, the following applies:

1910	            o The profile-space, tier-flag, profile-id, profile-
1911	              compatibility-indicator, and interop-constraints
1912	              parameters MUST be used symmetrically, i.e. the value of
1913	              each of these parameters in the offer MUST be the same as
1914	              that in the answer, either explicitly signalled or
1915	              implicitly inferred.
1916	            o The level-id parameter is changeable as long as the
1917	              highest level indicated by the answer is either equal to
1918	              or lower than that in the offer.  Note that the highest
1919	              level is indicated by level-id and max-recv-level-id
1920	              together.

1922	         In SDP offer/answer, when the SDP answer does include the
1923	         recv-sub-layer-id parameter that is less than the sprop-sub-
1924	         layer-id parameter in the SDP offer, the set of profile-space,
1925	         tier-flag, profile-id, profile-compatibility-indicator,
1926	         interop-constraints, and level-id parameters included in the
1927	         answer MUST be consistent with that for the chosen sub-layer
1928	         representation as indicated in the SDP offer, with the
1929	         exception that the level-id parameter in the SDP answer is
1930	         changeable as long as the highest level indicated by the answer
1931	         is either lower than or equal to that in the offer.
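
For the case in which the answer does not select a lower sub-layer,
the two offer/answer rules above can be checked with a sketch such as
the following.  It assumes each side's parameters have already been
parsed into a dictionary with all defaults inferred, so that an
omitted parameter and its explicitly signalled default compare equal.
The function names are hypothetical.

```python
SYMMETRIC = (
    "profile-space",
    "tier-flag",
    "profile-id",
    "profile-compatibility-indicator",
    "interop-constraints",
)


def highest_level(params: dict) -> int:
    """Highest level: max-recv-level-id when present, else level-id."""
    if "max-recv-level-id" in params:
        return int(params["max-recv-level-id"])
    return int(params["level-id"])


def answer_is_acceptable(offer: dict, answer: dict) -> bool:
    """Check symmetric parameters and the level rule (no sub-layer choice)."""
    for name in SYMMETRIC:
        if offer[name] != answer[name]:
            return False
    # level-id may change, but the answer's highest level must not
    # exceed the offer's highest level.
    return highest_level(answer) <= highest_level(offer)
```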

1933	         More specifications of these parameters, including how they
1934	         relate to the values of the profile, tier, and level syntax
1935	         elements specified in [HEVC] are provided below.

1937	      profile-space, profile-id:

1939	         The value of profile-space MUST be in the range of 0 to 3,
1940	         inclusive.  The value of profile-id MUST be in the range of 0
1941	         to 31, inclusive.

1943	         When profile-space is not present, a value of 0 MUST be
1944	         inferred.  When profile-id is not present, a value of 1 (i.e.
1945	         the Main profile) MUST be inferred.

1947	         When used to indicate properties of a bitstream, profile-space
1948	         and profile-id are derived from the profile, tier, and level
1949	         syntax elements in SPS or VPS NAL units as follows, where
1950	         general_profile_space, general_profile_idc,
1951	         sub_layer_profile_space[j], and sub_layer_profile_idc[j] are
1952	         specified in [HEVC]:

1954	            If the RTP stream is the highest RTP stream, the following
1955	            applies:

1957	            o profile-space = general_profile_space
1958	            o profile-id = general_profile_idc

1960	            Otherwise (the RTP stream is a dependee RTP stream), the
1961	            following applies, with j being the value of the sprop-sub-
1962	            layer-id parameter:

1964	            o profile-space = sub_layer_profile_space[j]
1965	            o profile-id = sub_layer_profile_idc[j]

1967	      tier-flag, level-id:

1969	         The value of tier-flag MUST be in the range of 0 to 1,
1970	         inclusive.  The value of level-id MUST be in the range of 0
1971	         to 255, inclusive.

1973	         If the tier-flag and level-id parameters are used to indicate
1974	         properties of a bitstream, they indicate the tier and the
1975	         highest level the bitstream complies with.

1977	         If the tier-flag and level-id parameters are used for
1978	         capability exchange, the following applies.  If max-recv-
1979	         level-id is not present, the default level defined by level-id
1980	         indicates the highest level the codec wishes to support.
1981	         Otherwise, max-recv-level-id indicates the highest level the
1982	         codec supports for receiving.  For either receiving or
1983	         sending, all levels that are lower than the highest level
1984	         supported MUST also be supported.

1986	         If no tier-flag is present, a value of 0 MUST be inferred and
1987	         if no level-id is present, a value of 93 (i.e. level 3.1) MUST
1988	         be inferred.

1990	         When used to indicate properties of a bitstream, the tier-flag
1991	         and level-id parameters are derived from the profile, tier,
1992	         and level syntax elements in SPS or VPS NAL units as follows,
1993	         where general_tier_flag, general_level_idc,
1994	         sub_layer_tier_flag[j], and sub_layer_level_idc[j] are
1995	         specified in [HEVC]:

1997	            If the RTP stream is the highest RTP stream, the following
1998	            applies:

2000	            o tier-flag = general_tier_flag
2001	            o level-id = general_level_idc

2003	            Otherwise (the RTP stream is a dependee RTP stream), the
2004	            following applies, with j being the value of the sprop-sub-
2005	            layer-id parameter:

2007	            o tier-flag = sub_layer_tier_flag[j]
2008	            o level-id = sub_layer_level_idc[j]

2010	      interop-constraints:

2012	         A base16 [RFC4648] (hexadecimal) representation of six bytes
2013	         of data, consisting of progressive_source_flag,
2014	         interlaced_source_flag, non_packed_constraint_flag,
2015	         frame_only_constraint_flag, and reserved_zero_44bits.

2017	         If the interop-constraints parameter is not present, the
2018	         following MUST be inferred:

2020	            o progressive_source_flag = 1
2021	            o interlaced_source_flag = 0
2022	            o non_packed_constraint_flag = 1
2023	            o frame_only_constraint_flag = 1
2024	            o reserved_zero_44bits = 0

2026	         When the interop-constraints parameter is used to indicate
2027	         properties of a bitstream, the following applies, where
2028	         general_progressive_source_flag,
2029	         general_interlaced_source_flag,
2030	         general_non_packed_constraint_flag,
2032	         general_frame_only_constraint_flag,
2033	         general_reserved_zero_44bits,
2034	         sub_layer_progressive_source_flag[j],
2035	         sub_layer_interlaced_source_flag[j],
2036	         sub_layer_non_packed_constraint_flag[j],
2037	         sub_layer_frame_only_constraint_flag[j], and
2038	         sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:

2040	            If the RTP stream is the highest RTP stream, the following
2041	            applies:

2043	            o progressive_source_flag = general_progressive_source_flag
2044	            o interlaced_source_flag = general_interlaced_source_flag
2045	            o non_packed_constraint_flag =
2046	                              general_non_packed_constraint_flag
2047	            o frame_only_constraint_flag =
2048	                              general_frame_only_constraint_flag
2049	            o reserved_zero_44bits = general_reserved_zero_44bits

2051	            Otherwise (the RTP stream is a dependee RTP stream), the
2052	            following applies, with j being the value of the sprop-sub-
2053	            layer-id parameter:

2055	            o progressive_source_flag =
2056	                              sub_layer_progressive_source_flag[j]
2057	            o interlaced_source_flag =
2058	                              sub_layer_interlaced_source_flag[j]
2059	            o non_packed_constraint_flag =
2060	                              sub_layer_non_packed_constraint_flag[j]
2061	            o frame_only_constraint_flag =
2062	                              sub_layer_frame_only_constraint_flag[j]
2063	            o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]

2065	         Using interop-constraints for capability exchange results in a
2066	         requirement on any bitstream to be compliant with the interop-
2067	         constraints.
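
As an illustration of the six-byte value, the sketch below packs the
four flags and the 44 reserved bits into a base16 string.  It assumes
the flags occupy the most significant bits in the order listed above,
followed by the reserved bits, which is an assumption of this sketch
rather than something stated in this section.  The function name is
hypothetical.

```python
def pack_interop_constraints(progressive_source_flag: int = 1,
                             interlaced_source_flag: int = 0,
                             non_packed_constraint_flag: int = 1,
                             frame_only_constraint_flag: int = 1,
                             reserved_zero_44bits: int = 0) -> str:
    """Pack the flags (assumed MSB first) and reserved bits into base16."""
    flags = (progressive_source_flag, interlaced_source_flag,
             non_packed_constraint_flag, frame_only_constraint_flag)
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("each flag must be 0 or 1")
    if not 0 <= reserved_zero_44bits < (1 << 44):
        raise ValueError("reserved_zero_44bits must fit in 44 bits")
    value = 0
    for flag in flags:
        value = (value << 1) | flag
    value = (value << 44) | reserved_zero_44bits
    return value.to_bytes(6, "big").hex().upper()
```

Under that bit-order assumption, the defaults inferred when the
parameter is absent correspond to the string "B00000000000".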

2069	      profile-compatibility-indicator:

2071	         A base16 [RFC4648] representation of four bytes of data.

2073	         When profile-compatibility-indicator is used to indicate
2074	         properties of a bitstream, the following applies, where
2075	         general_profile_compatibility_flag[j] and
2076	         sub_layer_profile_compatibility_flag[i][j] are specified in
2077	         [HEVC]:

2079	            The profile-compatibility-indicator in this case indicates
2080	            the profiles the bitstream conforms to in addition to the
2081	            profile defined by profile-space, profile-id, and interop-
2082	            constraints.  A decoder that conforms to any of the
2083	            profiles the bitstream conforms to would be capable of
2084	            decoding the bitstream.  These additional profiles are
2085	            defined by profile-space, each set bit of profile-
2086	            compatibility-indicator, and interop-constraints.

2088	            If the RTP stream is the highest RTP stream, the following
2089	            applies for each value of j in the range of 0 to 31,
2090	            inclusive:

2092	            o bit j of profile-compatibility-indicator =
2093	                  general_profile_compatibility_flag[j]

2095	            Otherwise (the RTP stream is a dependee RTP stream), the
2096	            following applies for i equal to sprop-sub-layer-id and for
2097	            each value of j in the range of 0 to 31, inclusive:

2099	            o bit j of profile-compatibility-indicator =
2100	                  sub_layer_profile_compatibility_flag[i][j]

2102	         Using profile-compatibility-indicator for capability exchange
2103	         results in a requirement on any bitstream to be compliant with
2104	         the profile-compatibility-indicator.  This is intended to
2105	         handle cases where any future HEVC profile is defined as an
2106	         intersection of two or more profiles.

2108	         If this parameter is not present, this parameter defaults to
2109	         the following: bit j, with j equal to profile-id, of profile-
2110	         compatibility-indicator is inferred to be equal to 1, and all
2111	         other bits are inferred to be equal to 0.
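
The default above can be sketched as follows.  The mapping of "bit j"
onto the four bytes is an assumption here: the sketch treats bit 0 as
the most significant bit of the first byte, and an implementation
needs to confirm the bit order before relying on it.  The function
name is hypothetical.

```python
def default_profile_compatibility_indicator(profile_id: int = 1) -> str:
    """Base16 string with only bit profile_id set (bit 0 assumed MSB)."""
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id must be in the range 0..31")
    value = 1 << (31 - profile_id)
    return f"{value:08X}"
```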

2113	      sprop-sub-layer-id:

2115	         This parameter MAY be used to indicate the highest allowed
2116	         value of TID in the bitstream.  When not present, the value of
2117	         sprop-sub-layer-id is inferred to be equal to 6.

2119	         The value of sprop-sub-layer-id MUST be in the range of 0
2120	         to 6, inclusive.

2122	      recv-sub-layer-id:

2124	         This parameter MAY be used to signal a receiver's choice of
2125	         the offered or declared sub-layer representations in the
2126	         sprop-vps.  The value of recv-sub-layer-id indicates the TID
2127	         of the highest sub-layer of the bitstream that a receiver
2128	         supports.  When not present, the value of recv-sub-layer-id is
2129	         inferred to be equal to the value of the sprop-sub-layer-id
2130	         parameter in the SDP offer.

2132	         The value of recv-sub-layer-id MUST be in the range of 0 to 6,
2133	         inclusive.

2135	      max-recv-level-id:

2137	         This parameter MAY be used to indicate the highest level a
2138	         receiver supports.  The highest level the receiver supports is
2139	         equal to the value of max-recv-level-id divided by 30.

2141	         The value of max-recv-level-id MUST be in the range of 0
2142	         to 255, inclusive.

2144	         When max-recv-level-id is not present, the value is inferred
2145	         to be equal to level-id.

2147	         max-recv-level-id MUST NOT be present when the highest level
2148	         the receiver supports is not higher than the default level.
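
A small sketch of how a receiver's highest level could be derived from
level-id and max-recv-level-id, following the rules above.  The
function name is hypothetical, and the default of 93 for level-id is
the value this document infers when level-id is absent.

```python
from typing import Optional


def highest_receive_level(level_id: int = 93,
                          max_recv_level_id: Optional[int] = None) -> float:
    """Return the highest supported level number (parameter value / 30)."""
    if not 0 <= level_id <= 255:
        raise ValueError("level-id must be in the range 0..255")
    if max_recv_level_id is None:
        # Absent: inferred to be equal to level-id.
        return level_id / 30
    if not 0 <= max_recv_level_id <= 255:
        raise ValueError("max-recv-level-id must be in the range 0..255")
    if max_recv_level_id <= level_id:
        # The parameter is only allowed when it raises the level.
        raise ValueError("max-recv-level-id must exceed the default level")
    return max_recv_level_id / 30
```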

2150	      tx-mode:

2152	         This parameter indicates whether the transmission mode is SSM
2153	         or MSM.

2155	         The value of tx-mode MUST be equal to either "MSM" or "SSM".
2156	         When not present, the value of tx-mode is inferred to be equal
2157	         to "SSM".

2159	         If the value is equal to "MSM", MSM MUST be in use.  Otherwise
2160	         (the value is equal to "SSM"), SSM MUST be in use.

2162	         The value of tx-mode MUST be equal to "MSM" for all RTP sessions
2163	         in an MSM.

2165	      sprop-vps:

2167	         This parameter MAY be used to convey any video parameter set
2168	         NAL unit of the bitstream for out-of-band transmission of
2169	         video parameter sets.  The parameter MAY also be used for
2170	         capability exchange and to indicate sub-stream characteristics
2171	         (i.e. properties of sub-layer representations as defined in
2172	         [HEVC]).  The value of the parameter is a comma-separated
2173	         (',') list of base64 [RFC4648] representations of the video
2174	         parameter set NAL units as specified in Section 7.3.2.1 of
2175	         [HEVC].

2177	         The sprop-vps parameter MAY contain one or more video
2178	         parameter set NAL units.  However, all other video parameter
2179	         sets contained in the sprop-vps parameter MUST be consistent
2180	         with the first video parameter set in the sprop-vps parameter.
2181	         A video parameter set vpsB is said to be consistent with
2182	         another video parameter set vpsA if any decoder that conforms
2183	         to the profile, tier, level, and constraints indicated by the
2184	         12 bytes of data starting from the syntax element
2185	         general_profile_space to the syntax element general_level_idc,
2186	         inclusive, in the first profile_tier_level( ) syntax structure
2187	         in vpsA can decode any bitstream that conforms to the profile,
2188	         tier, level, and constraints indicated by the 12 bytes of data
2189	         starting from the syntax element general_profile_space to the
2190	         syntax element general_level_idc, inclusive, in the first
2191	         profile_tier_level( ) syntax structure in vpsB.

2193	      sprop-sps:

2195	         This parameter MAY be used to convey sequence parameter set
2196	         NAL units of the bitstream for out-of-band transmission of
2197	         sequence parameter sets.  The value of the parameter is a
2198	         comma-separated (',') list of base64 [RFC4648] representations
2199	         of the sequence parameter set NAL units as specified in
2200	         Section 7.3.2.2 of [HEVC].

2202	      sprop-pps:

2204	         This parameter MAY be used to convey picture parameter set NAL
2205	         units of the bitstream for out-of-band transmission of picture
2206	         parameter sets.  The value of the parameter is a comma-
2207	         separated (',') list of base64 [RFC4648] representations of
2208	         the picture parameter set NAL units as specified in Section
2209	         7.3.2.3 of [HEVC].
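
The value format shared by sprop-vps, sprop-sps, and sprop-pps can be
decoded with a sketch like the one below.  It only splits the
comma-separated list and decodes each base64 item; it does not check
that the decoded bytes are parameter sets of the expected kind.  The
function name is hypothetical.

```python
import base64
from typing import List


def decode_sprop(value: str) -> List[bytes]:
    """Decode a comma-separated list of base64 NAL units into bytes."""
    return [
        base64.b64decode(item, validate=True)
        for item in value.split(",")
        if item  # tolerate an empty item, e.g. from a trailing comma
    ]
```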

2211	      sprop-sei:

2213	         This parameter MAY be used to convey one or more SEI messages
2214	         that describe bitstream characteristics.  When present, a
2215	         decoder can rely on the bitstream characteristics that are
2216	         described in the SEI messages for the entire duration of the
2217	         session, independently from the persistence scopes of the SEI
2218	         messages as specified in [HEVC].

2220	         The value of the parameter is a comma-separated (',') list of
2221	         base64 [RFC4648] representations of SEI NAL units as specified
2222	         in Section 7.3.2.4 of [HEVC].

2224	            Informative note: Intentionally, no list of applicable or
2225	            inapplicable SEI messages is specified here.  Conveying
2226	            certain SEI messages in sprop-sei may be sensible in some
2227	            application scenarios and meaningless in others.  However,
2228	            a few examples are described below:

2230	           1) In an environment where the bitstream was created from
2231	               film-based source material, and no splicing is going to
2232	               occur during the lifetime of the session, the film grain
2233	               characteristics SEI message or the tone mapping
2234	               information SEI message are likely meaningful, and
2235	               sending them in sprop-sei rather than in the bitstream
2236	               at each entry point may help save bits and allows the
2237	               renderer to be configured only once, avoiding unwanted
2238	               artifacts.
2239	           2) The structure of pictures information SEI message in
2240	               sprop-sei can be used to inform a decoder of information
2241	               on the NAL unit types, picture order count values, and
2242	               prediction dependencies of a sequence of pictures.
2243	               Having such knowledge can be helpful for error recovery.
2244	           3) Examples of SEI messages that would be meaningless to
2245	               convey in sprop-sei include the decoded picture
2246	               hash SEI message (it is close to impossible that all
2247	               decoded pictures have the same hash-tag), the display
2248	               orientation SEI message when the device is a handheld
2249	               device (as the display orientation may change when the
2250	               handheld device is turned around), or the filler payload
2251	               SEI message (as there is no point in just having more
2252	               bits in SDP).

2254	      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

2256	         These parameters MAY be used to signal the capabilities of a
2257	         receiver implementation.  These parameters MUST NOT be used
2258	         for any other purpose.  The highest level (specified by max-
2259	         recv-level-id) MUST be a level that the receiver is fully
2260	         capable of supporting.  max-lsr, max-lps, max-cpb, max-dpb,
2261	         max-br, max-tr, and max-tc MAY be used to indicate capabilities
2262	         of the receiver that extend the required capabilities of the
2263	         highest level, as specified below.

2265	         When more than one parameter from the set (max-lsr, max-lps,
2266	         max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
2267	         receiver MUST support all signaled capabilities
2268	         simultaneously.  For example, if both max-lsr and max-br are
2269	         present, the highest level with the extension of both the
2270	         picture rate and bitrate is supported.  That is, the receiver
2271	         is able to decode bitstreams in which the luma sample rate is
2272	         up to max-lsr (inclusive), the bitrate is up to max-br
2273	         (inclusive), the coded picture buffer size is derived as
2274	         specified in the semantics of the max-br parameter below, and
2275	         the other properties comply with the highest level specified
2276	         by max-recv-level-id.

2278	            Informative note: When the OPTIONAL media type parameters
2279	            are used to signal the properties of a bitstream, and max-
2280	            lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc
2281	            are not present, the values of profile-space, tier-flag,
2282	            profile-id, profile-compatibility-indicator, interop-
2283	            constraints, and level-id must always be such that the
2284	            bitstream complies fully with the specified profile, tier,
2285	            and level.

2287	      max-lsr:
2288	         The value of max-lsr is an integer indicating the maximum
2289	         processing rate in units of luma samples per second.  The max-
2290	         lsr parameter signals that the receiver is capable of decoding
2291	         video at a higher rate than is required by the highest level.

2293	         When max-lsr is signaled, the receiver MUST be able to decode
2294	         bitstreams that conform to the highest level, with the
2295	         exception that the MaxLumaSR value in Table A-2 of [HEVC] for
2296	         the highest level is replaced with the value of max-lsr.
2297	         Senders MAY use this knowledge to send pictures of a given
2298	         size at a higher picture rate than is indicated in the highest
2299	         level.

2301	         When not present, the value of max-lsr is inferred to be equal
2302	         to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
2303	         highest level.

2305	         The value of max-lsr MUST be in the range of MaxLumaSR to
2306	         16 * MaxLumaSR, inclusive, where MaxLumaSR is given in Table
2307	         A-2 of [HEVC] for the highest level.
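
As an illustration only, the inference and range rules stated for
max-lsr (and repeated below for max-lps, max-cpb, max-br, max-tr, and
max-tc) can be sketched in Python.  The function name effective_limit
and the constant LEVEL_3_1_MAX_LUMA_SR are hypothetical and are not
defined by this memo; the constant is assumed to be the MaxLumaSR
value for Level 3.1 and should be checked against Table A-2 of [HEVC].

```python
def effective_limit(signaled, level_default):
    """Return the limit in effect for one capability-extension parameter.

    signaled      -- parsed integer value of the parameter, or None if absent
    level_default -- value given by [HEVC] for the highest level
    """
    if signaled is None:
        # When not present, the value is inferred from the highest level.
        return level_default
    # The signaled value must lie in [default, 16 * default], inclusive.
    if not level_default <= signaled <= 16 * level_default:
        raise ValueError("value outside the range allowed for the highest level")
    return signaled


# Assumed MaxLumaSR for Level 3.1 (verify against Table A-2 of [HEVC]).
LEVEL_3_1_MAX_LUMA_SR = 33177600

# 1920 x 1088 luma samples at 30 pictures per second.
limit = effective_limit(62668800, LEVEL_3_1_MAX_LUMA_SR)
```

A sender could apply the same check to each parameter of the set
before relying on the extended capability.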

2309	      max-lps:
2310	         The value of max-lps is an integer indicating the maximum
2311	         picture size in units of luma samples.  The max-lps parameter
2312	         signals that the receiver is capable of decoding larger
2313	         picture sizes than are required by the highest level.  When
2314	         max-lps is signaled, the receiver MUST be able to decode
2315	         bitstreams that conform to the highest level, with the
2316	         exception that the MaxLumaPS value in Table A-1 of [HEVC] for
2317	         the highest level is replaced with the value of max-lps.
2318	         Senders MAY use this knowledge to send larger pictures at a
2319	         proportionally lower picture rate than is indicated in the
2320	         highest level.

2322	         When not present, the value of max-lps is inferred to be equal
2323	         to the value of MaxLumaPS given in Table A-1 of [HEVC] for the
2324	         highest level.

2326	         The value of max-lps MUST be in the range of MaxLumaPS to
2327	         16 * MaxLumaPS, inclusive, where MaxLumaPS is given in Table
2328	         A-1 of [HEVC] for the highest level.

2330	      max-cpb:
2331	         The value of max-cpb is an integer indicating the maximum
2332	         coded picture buffer size in units of CpbBrVclFactor bits for
2333	         the VCL HRD parameters and in units of CpbBrNalFactor bits for
2334	         the NAL HRD parameters, where CpbBrVclFactor and
2335	         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The max-
2336	         cpb parameter signals that the receiver has more memory than
2337	         the minimum amount of coded picture buffer memory required by
2338	         the highest level.  When max-cpb is signaled, the receiver
2339	         MUST be able to decode bitstreams that conform to the highest
2340	         level, with the exception that the MaxCPB value in Table A-1
2341	         of [HEVC] for the highest level is replaced with the value of
2342	         max-cpb.  Senders MAY use this knowledge to construct coded
2343	         bitstreams with greater variation of bitrate than can be
2344	         achieved with the MaxCPB value in Table A-1 of [HEVC].

2346	         When not present, the value of max-cpb is inferred to be equal
2347	         to the value of MaxCPB given in Table A-1 of [HEVC] for the
2348	         highest level.

2350	         The value of max-cpb MUST be in the range of MaxCPB to
2351	         16 * MaxCPB, inclusive, where MaxCPB is given in Table A-1
2352	         of [HEVC] for the highest level.

2354	            Informative note: The coded picture buffer is used in the
2355	            hypothetical reference decoder (Annex C of HEVC).  The use
2356	            of the hypothetical reference decoder is recommended in
2357	            HEVC encoders to verify that the produced bitstream
2358	            conforms to the standard and to control the output bitrate.
2359	            Thus, the coded picture buffer is conceptually independent
2360	            of any other potential buffers in the receiver, including
2361	            de-packetization and de-jitter buffers.  The coded picture
2362	            buffer need not be implemented in decoders as specified in
2363	            Annex C of HEVC, but rather standard-compliant decoders can
2364	            have any buffering arrangements provided that they can
2365	            decode standard-compliant bitstreams.  Thus, in practice,
2366	            the input buffer for a video decoder can be integrated with
2367	            de-packetization and de-jitter buffers of the receiver.

2369	      max-dpb:
2370	         The value of max-dpb is an integer indicating the maximum
2371	         decoded picture buffer size in units of decoded pictures at the
2372	         MaxLumaPS for the highest level, i.e. the number of decoded
2373	         pictures at the maximum picture size defined by the highest
2374	         level.  The value of max-dpb MUST be in the range of 1 to 16,
2375	         inclusive.  The max-dpb parameter signals that the receiver
2376	         has more memory than the minimum amount of decoded picture
2377	         buffer memory required by default, which is MaxDpbPicBuf as
2378	         defined in [HEVC] (equal to 6).  When max-dpb is signaled, the
2379	         receiver MUST be able to decode bitstreams that conform to the
2380	         highest level, with the exception that the MaxDpbPicBuf value
2381	         defined in [HEVC] as 6 is replaced with the value of max-dpb.
2382	         Consequently, a receiver that signals max-dpb MUST be capable
2383	         of storing the following number of decoded pictures
2384	         (MaxDpbSize) in its decoded picture buffer:

2386	           if ( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
2387	              MaxDpbSize = Min( 4 * max-dpb, 16 )
2388	           else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
2389	              MaxDpbSize = Min( 2 * max-dpb, 16 )
2390	           else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
2391	              MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
2392	           else
2393	              MaxDpbSize = max-dpb

2395	         Wherein MaxLumaPS is given in Table A-1 of [HEVC] for the
2396	         highest level and PicSizeInSamplesY is the current size of each
2397	         decoded picture in units of luma samples as defined in [HEVC].
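
The derivation of MaxDpbSize above can be sketched in Python as
follows.  This is a non-normative sketch: the function and argument
names are hypothetical, "/" is assumed to be integer division with
truncation, and the Level 4 MaxLumaPS value used in the usage lines is
an assumption to be checked against Table A-1 of [HEVC].

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Number of decoded pictures the receiver must be able to store.

    max_dpb defaults to 6, the value inferred when max-dpb is absent.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    elif pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    elif pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: truncating integer division.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb


LEVEL_4_MAX_LUMA_PS = 2228224  # assumed value for Level 4

full_size = max_dpb_size(1920 * 1088, LEVEL_4_MAX_LUMA_PS, max_dpb=8)
half_size = max_dpb_size(1280 * 720, LEVEL_4_MAX_LUMA_PS, max_dpb=8)
```

Smaller pictures allow more stored pictures for the same memory, up to
the cap of 16.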

2399	         The value of max-dpb MUST be greater than or equal to the
2400	         value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].  Senders
2401	         MAY use this knowledge to construct coded bitstreams with
2402	         improved compression.

2404	         When not present, the value of max-dpb is inferred to be equal
2405	         to the value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].

2407	            Informative note: This parameter was added primarily to
2408	            complement a similar codepoint in the ITU-T Recommendation
2409	            H.245, so as to facilitate signaling gateway designs.  The
2410	            decoded picture buffer stores reconstructed samples.  There
2411	            is no relationship between the size of the decoded picture
2412	            buffer and the buffers used in RTP, especially de-
2413	            packetization and de-jitter buffers.

2415	      max-br:
2416	         The value of max-br is an integer indicating the maximum video
2417	         bitrate in units of CpbBrVclFactor bits per second for the VCL
2418	         HRD parameters and in units of CpbBrNalFactor bits per second
2419	         for the NAL HRD parameters, where CpbBrVclFactor and
2420	         CpbBrNalFactor are defined in Section A.4 of [HEVC].

2422	         The max-br parameter signals that the video decoder of the
2423	         receiver is capable of decoding video at a higher bitrate than
2424	         is required by the highest level.

2426	         When max-br is signaled, the video codec of the receiver MUST
2427	         be able to decode bitstreams that conform to the highest
2428	         level, with the following exceptions in the limits specified
2429	         by the highest level:

2431	          o The value of max-br replaces the MaxBR value in Table A-2
2432	            of [HEVC] for the highest level.
2433	          o When the max-cpb parameter is not present, the result of
2434	            the following formula replaces the value of MaxCPB in Table
2435	            A-1 of [HEVC]:

2437	               (MaxCPB of the highest level) * max-br / (MaxBR of the
2438	               highest level)

2440	         For example, if a receiver signals capability for Main profile
2441	         Level 2 with max-br equal to 2000, this indicates a maximum
2442	         video bitrate of 2000 kbits/sec for VCL HRD parameters, a
2443	         maximum video bitrate of 2200 kbits/sec for NAL HRD
2444	         parameters, and a CPB size of 2000000 bits (2000000 / 1500000
2445	         * 1500000).
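
The arithmetic of this example can be reproduced with the following
Python sketch.  The names are hypothetical.  The factors 1000 and 1100
are the CpbBrVclFactor and CpbBrNalFactor values implied by the
example for the Main profile, and the use of integer division in the
CPB formula is an assumption, since the text does not specify
rounding.

```python
CPB_BR_VCL_FACTOR = 1000  # implied by the example for the Main profile
CPB_BR_NAL_FACTOR = 1100  # implied by the example for the Main profile


def max_br_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Derive bitrate and CPB limits from max-br (and max-cpb if present)."""
    if max_cpb is None:
        # Scale the level's MaxCPB by the ratio max-br / MaxBR.
        # Assumption: integer division; rounding is not specified.
        cpb_units = level_max_cpb * max_br // level_max_br
    else:
        cpb_units = max_cpb
    return {
        "vcl_bits_per_second": max_br * CPB_BR_VCL_FACTOR,
        "nal_bits_per_second": max_br * CPB_BR_NAL_FACTOR,
        "vcl_cpb_bits": cpb_units * CPB_BR_VCL_FACTOR,
        "nal_cpb_bits": cpb_units * CPB_BR_NAL_FACTOR,
    }


# Main profile, Level 2 (MaxBR = MaxCPB = 1500), max-br = 2000.
limits = max_br_limits(2000, level_max_br=1500, level_max_cpb=1500)
```

With these inputs the sketch yields 2000000 and 2200000 bits per
second for the VCL and NAL HRD parameters, and a VCL CPB size of
2000000 bits, matching the figures in the example.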

2447	         Senders MAY use this knowledge to send higher bitrate video as
2448	         allowed in the level definition of Annex A of HEVC to achieve
2449	         improved video quality.

2451	         When not present, the value of max-br is inferred to be equal
2452	         to the value of MaxBR given in Table A-2 of [HEVC] for the
2453	         highest level.

2455	         The value of max-br MUST be in the range of MaxBR to
2456	         16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of
2457	         [HEVC] for the highest level.

2459	            Informative note: This parameter was added primarily to
2460	            complement a similar codepoint in the ITU-T Recommendation
2461	            H.245, so as to facilitate signaling gateway designs.  The
2462	            assumption that the network is capable of handling such
2463	            bitrates at any given time cannot be made from the value of
2464	            this parameter.  In particular, no conclusion can be drawn
2465	            that the signaled bitrate is possible under congestion
2466	            control constraints.

2468	      max-tr:
2469	         The value of max-tr is an integer indicating the maximum
2470	         number of tile rows.  The max-tr parameter signals that the
2471	         receiver is capable of decoding video with a larger number of
2472	         tile rows than the value allowed by the highest level.

2474	         When max-tr is signaled, the receiver MUST be able to decode
2475	         bitstreams that conform to the highest level, with the
2476	         exception that the MaxTileRows value in Table A-1 of [HEVC]
2477	         for the highest level is replaced with the value of max-tr.

2479	         Senders MAY use this knowledge to send pictures utilizing a
2480	         larger number of tile rows than the value allowed by the
2481	         highest level.

2483	         When not present, the value of max-tr is inferred to be equal
2484	         to the value of MaxTileRows given in Table A-1 of [HEVC] for
2485	         the highest level.

2487	         The value of max-tr MUST be in the range of MaxTileRows to
2488	         16 * MaxTileRows, inclusive, where MaxTileRows is given in
2489	         Table A-1 of [HEVC] for the highest level.

2491	      max-tc:
2492	         The value of max-tc is an integer indicating the maximum
2493	         number of tile columns.  The max-tc parameter signals that the
2494	         receiver is capable of decoding video with a larger number of
2495	         tile columns than the value allowed by the highest level.

2497	         When max-tc is signaled, the receiver MUST be able to decode
2498	         bitstreams that conform to the highest level, with the
2499	         exception that the MaxTileCols value in Table A-1 of [HEVC]
2500	         for the highest level is replaced with the value of max-tc.

2502	         Senders MAY use this knowledge to send pictures utilizing a
2503	         larger number of tile columns than the value allowed by the
2504	         highest level.

2506	         When not present, the value of max-tc is inferred to be equal
2507	         to the value of MaxTileCols given in Table A-1 of [HEVC] for
2508	         the highest level.

2510	         The value of max-tc MUST be in the range of MaxTileCols to
2511	         16 * MaxTileCols, inclusive, where MaxTileCols is given in
2512	         Table A-1 of [HEVC] for the highest level.

2514	      max-fps:

2516	         The value of max-fps is an integer indicating the maximum
2517	         picture rate in units of pictures per 100 seconds that can be
2518	         effectively processed by the receiver.  The max-fps parameter
2519	         MAY be used to signal that the receiver has a constraint in
2520	         that it is not capable of processing video effectively at the
2521	         full picture rate that is implied by the highest level and,
2522	         when present, one or more of the parameters max-lsr, max-lps,
2523	         and max-br.

2525	         The value of max-fps is not necessarily the picture rate at
2526	         which the maximum picture size can be sent; it constitutes a
2527	         constraint on maximum picture rate for all resolutions.

2529	            Informative note: The max-fps parameter is semantically
2530	            different from max-lsr, max-lps, max-cpb, max-dpb, max-br,
2531	            max-tr, and max-tc in that max-fps is used to signal a
2532	            constraint, lowering the maximum picture rate from what is
2533	            implied by other parameters.

2535	         The encoder MUST use a picture rate equal to or less than this
2536	         value.  In cases where the max-fps parameter is absent the
2537	         encoder is free to choose any picture rate according to the
2538	         highest level and any signaled optional parameters.

2540	         The value of max-fps MUST be smaller than or equal to the full
2541	         picture rate that is implied by the highest level and, when
2542	         present, one or more of the parameters max-lsr, max-lps, and
2543	         max-br.
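
Because max-fps is expressed in pictures per 100 seconds, a value of
2997 corresponds to 29.97 pictures per second.  The following Python
sketch shows the conversion and the encoder-side check; the function
names are hypothetical, and exact rational arithmetic is used only to
avoid floating-point rounding.

```python
from fractions import Fraction


def max_fps_to_picture_rate(max_fps):
    """Convert max-fps (pictures per 100 seconds) to pictures per second."""
    return Fraction(max_fps, 100)


def picture_rate_allowed(picture_rate, max_fps=None):
    """Check a candidate picture rate against max-fps only.

    When max-fps is absent this check imposes no constraint; the limits
    of the highest level and other signaled parameters still apply.
    """
    if max_fps is None:
        return True
    return Fraction(picture_rate) <= max_fps_to_picture_rate(max_fps)


ntsc_rate = Fraction(30000, 1001)  # slightly above 29.97
ok = picture_rate_allowed(ntsc_rate, max_fps=2997)
```

Note that 30000/1001 pictures per second slightly exceeds 29.97, so a
receiver intending to accept that rate would need to signal at least
2998 under this reading.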

2545	      sprop-max-don-diff:

2547	         The value of this parameter MUST be equal to 0, if the RTP
2548	         stream does not depend on other RTP streams and there is no
2549	         NAL unit naluA that is followed in transmission order by any
2550	         NAL unit preceding naluA in decoding order.  Otherwise, this
2551	         parameter specifies the maximum absolute difference between
2552	         the decoding order number (i.e., AbsDon) values of any two NAL
2553	         units naluA and naluB, where naluA follows naluB in decoding
2554	         order and precedes naluB in transmission order.

2556	         The value of sprop-max-don-diff MUST be an integer in the
2557	         range of 0 to 32767, inclusive.

2559	         When not present, the value of sprop-max-don-diff is inferred
2560	         to be equal to 0.

2562	         When the RTP stream depends on one or more other RTP streams
2563	         (in this case tx-mode MUST be equal to "MSM" and MSM is in
2564	         use), this parameter MUST be present and the value MUST be
2565	         greater than 0.

2567	            Informative note: When the RTP stream does not depend on
2568	            other RTP streams, either MSM or SSM may be in use.
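
For a single RTP stream, the value of sprop-max-don-diff defined above
could be computed by a sender from the AbsDon values of its NAL units
listed in transmission order.  The Python sketch below is one possible
way to do this; the function name is hypothetical, and decoding order
is assumed to be given by strictly increasing AbsDon.

```python
def sprop_max_don_diff(abs_dons_in_transmission_order):
    """Largest AbsDon difference between any pair (naluA, naluB) where
    naluA precedes naluB in transmission order but follows it in
    decoding order.  Returns 0 when no such pair exists."""
    max_diff = 0
    highest_seen = None  # largest AbsDon among NAL units sent so far
    for don in abs_dons_in_transmission_order:
        if highest_seen is not None and highest_seen > don:
            # An earlier-sent NAL unit follows this one in decoding order.
            max_diff = max(max_diff, highest_seen - don)
        if highest_seen is None or don > highest_seen:
            highest_seen = don
    return max_diff


in_order = sprop_max_don_diff([0, 1, 2, 3])
interleaved = sprop_max_don_diff([0, 3, 1, 2, 4])
```

The sketch does not cover the case where the RTP stream depends on
other RTP streams, for which the value is required to be greater than
0.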

2570	      sprop-depack-buf-nalus:

2572	         This parameter specifies the maximum number of NAL units that
2573	         precede a NAL unit in transmission order and follow the NAL
2574	         unit in decoding order.

2576	         The value of sprop-depack-buf-nalus MUST be an integer in the
2577	         range of 0 to 32767, inclusive.

2579	         When not present, the value of sprop-depack-buf-nalus is
2580	         inferred to be equal to 0.

2582	         When the RTP stream depends on one or more other RTP streams
2583	         (in this case tx-mode MUST be equal to "MSM" and MSM is in
2584	         use), this parameter MUST be present and the value MUST be
2585	         greater than 0.
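
For a single RTP stream, the definition of sprop-depack-buf-nalus can
be sketched in Python as a count over AbsDon values listed in
transmission order.  The function name is hypothetical, and decoding
order is assumed to be given by strictly increasing AbsDon.  The
quadratic loop favors clarity over speed.

```python
def sprop_depack_buf_nalus(abs_dons_in_transmission_order):
    """Maximum, over all NAL units, of the number of NAL units that
    precede it in transmission order and follow it in decoding order."""
    dons = list(abs_dons_in_transmission_order)
    return max(
        (
            sum(1 for earlier in dons[:index] if earlier > don)
            for index, don in enumerate(dons)
        ),
        default=0,
    )


in_order = sprop_depack_buf_nalus([0, 1, 2, 3])
interleaved = sprop_depack_buf_nalus([3, 4, 0, 1, 2])
```

In the second call, two NAL units (AbsDon 3 and 4) are sent before
each of the NAL units with AbsDon 0, 1, and 2, so the result is 2.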

2587	      sprop-depack-buf-bytes:

2589	         This parameter signals the required size of the de-
2590	         packetization buffer in units of bytes.  The value of the
2591	         parameter MUST be greater than or equal to the maximum buffer
2592	         occupancy (in units of bytes) of the de-packetization buffer
2593	         as specified in section 6.

2595	         The value of sprop-depack-buf-bytes MUST be an integer in the
2596	         range of 0 to 4294967295, inclusive.

2598	         When the RTP stream depends on one or more other RTP streams
2599	         (in this case tx-mode MUST be equal to "MSM" and MSM is in
2600	         use) or sprop-max-don-diff is present and greater than 0, this
2601	         parameter MUST be present and the value MUST be greater than
2602	         0.

2604	            Informative note: The value of sprop-depack-buf-bytes
2605	            indicates the required size of the de-packetization buffer
2606	            only.  When network jitter can occur, an appropriately
2607	            sized jitter buffer has to be available as well.

2609	      depack-buf-cap:

2611	         This parameter signals the capabilities of a receiver
2612	         implementation and indicates the amount of de-packetization
2613	         buffer space in units of bytes that the receiver has available
2614	         for reconstructing the NAL unit decoding order from NAL units
2615	         carried in one or more RTP streams.  A receiver is able to
2616	         handle any RTP stream, and all RTP streams the RTP stream
2617	         depends on, when present, for which the value of the sprop-
2618	         depack-buf-bytes parameter is smaller than or equal to this
2619	         parameter.

2621	         When not present, the value of depack-buf-cap is inferred to
2622	         be equal to 4294967295.  The value of depack-buf-cap MUST be
2623	         an integer in the range of 1 to 4294967295, inclusive.

2625	            Informative note: depack-buf-cap indicates the maximum
2626	            possible size of the de-packetization buffer of the
2627	            receiver only.  When network jitter can occur, an
2628	            appropriately sized jitter buffer has to be available as
2629	            well.
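
The comparison between depack-buf-cap and sprop-depack-buf-bytes can
be sketched in Python as follows.  The function name and its argument
layout (one sprop-depack-buf-bytes value for the RTP stream and one
for each RTP stream it depends on) are hypothetical.

```python
DEPACK_BUF_CAP_DEFAULT = 4294967295  # inferred when depack-buf-cap is absent


def receiver_can_handle(stream_depack_buf_bytes, depack_buf_cap=None):
    """Return True if every listed sprop-depack-buf-bytes value is
    smaller than or equal to the receiver's depack-buf-cap."""
    cap = DEPACK_BUF_CAP_DEFAULT if depack_buf_cap is None else depack_buf_cap
    if not 1 <= cap <= 4294967295:
        raise ValueError("depack-buf-cap out of range")
    for value in stream_depack_buf_bytes:
        if not 0 <= value <= 4294967295:
            raise ValueError("sprop-depack-buf-bytes out of range")
    return all(value <= cap for value in stream_depack_buf_bytes)


fits = receiver_can_handle([1000], depack_buf_cap=2000)
too_big = receiver_can_handle([1000, 3000], depack_buf_cap=2000)
```

Jitter buffering is outside the scope of this check, as the
informative note above points out.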

2631	      sprop-segmentation-id:

2633	         This parameter MAY be used to signal the segmentation tools
2634	         present in the bitstream and that can be used for
2635	         parallelization.  The value of sprop-segmentation-id MUST be
2636	         an integer in the range of 0 to 3, inclusive.  When not
2637	         present, the value of sprop-segmentation-id is inferred to be
2638	         equal to 0.

2640	         When sprop-segmentation-id is equal to 0, no information about
2641	         the segmentation tools is provided.  When sprop-segmentation-
2642	         id is equal to 1, it indicates that slices are present in the
2643	         bitstream.  When sprop-segmentation-id is equal to 2, it
2644	         indicates that tiles are present in the bitstream.  When
2645	         sprop-segmentation-id is equal to 3, it indicates that WPP is
2646	         used in the bitstream.

2648	      sprop-spatial-segmentation-idc:

2650	         A base16 [RFC4648] representation of the syntax element
2651	         min_spatial_segmentation_idc as specified in [HEVC].  This
2652	         parameter MAY be used to describe parallelization capabilities
2653	         of the bitstream.

2655	      dec-parallel-cap:

2657	         This parameter MAY be used to indicate the decoder's
2658	         additional decoding capabilities given the presence of tools
2659	         enabling parallel decoding, such as slices, tiles, and WPP, in
2660	         the bitstream.  The decoding capability of the decoder may
2661	         vary with the setting of the parallel decoding tools present
2662	         in the bitstream, e.g. the size of the tiles that are present
2663	         in a bitstream.  Therefore, multiple capability points may be
2664	         provided, each indicating the minimum required decoding
2665	         capability that is associated with a parallelism requirement,
2666	         which is a requirement on the bitstream that enables parallel
2667	         decoding.

2669	         Each capability point is defined as a combination of 1) a
2670	         parallelism requirement, 2) a profile (determined by profile-
2671	         space and profile-id), 3) a highest level, and 4) a maximum
2672	         processing rate, a maximum picture size, and a maximum video
2673	         bitrate that may be equal to or greater than that determined
2674	         by the highest level.  The parameter's syntax in ABNF
2675	         [RFC5234] is as follows:

2677	            dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
2678	                               cap-point) "}"

2680	            cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
2681	                         cap-parameter)

2683	            spatial-seg-idc = 1*4DIGIT ; (1-4095)
2684	            cap-parameter = tier-flag / level-id / max-lsr
2685	                            / max-lps / max-br

2687	            tier-flag = "tier-flag" EQ ("0" / "1")

2689	            level-id  = "level-id" EQ 1*3DIGIT ; (0-255)

2691	            max-lsr   = "max-lsr" EQ 1*20DIGIT
2692	                        ; (0-18,446,744,073,709,551,615)

2694	            max-lps   = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)

2696	            max-br    = "max-br"  EQ 1*20DIGIT
2697	                        ; (0-18,446,744,073,709,551,615)

2699	            EQ = "="

2701	         The set of capability points expressed by the dec-parallel-cap
2702	         parameter is enclosed in a pair of curly braces ("{}").  Each
2703	         set of two consecutive capability points is separated by a
2704	         comma (',').  Within each capability point, each set of two
2705	         consecutive parameters, and when present, their values, is
2706	         separated by a semicolon (';').

2708	         The profile of all capability points is determined by profile-
2709	         space and profile-id that are outside the dec-parallel-cap
2710	         parameter.

2712	         Each capability point starts with an indication of the
2713	         parallelism requirement, which consists of a parallel tool
2714	         type, which may be equal to 'w' or 't', and a decimal value of
2715	         the spatial-seg-idc parameter.  When the type is 'w', the
2716	         capability point is valid only for H.265 bitstreams with WPP
2717	         in use, i.e. entropy_coding_sync_enabled_flag equal to 1.
2718	         When the type is 't', the capability point is valid only for
2719	         H.265 bitstreams with WPP not in use (i.e.
2720	         entropy_coding_sync_enabled_flag equal to 0).  The capability
2721	         point is valid only for H.265 bitstreams with
2722	         min_spatial_segmentation_idc equal to or greater than spatial-
2723	         seg-idc.

2725	         After the parallelism requirement indication, each capability
2726	         point continues with one or more pairs of parameter and value
2727	         in any order for any of the following parameters:

2729	            o tier-flag
2730	            o level-id
2731	            o max-lsr
2732	            o max-lps
2733	            o max-br

2735	         At most one occurrence of each of the above five parameters is
2736	         allowed within each capability point.

2738	         The values of dec-parallel-cap.tier-flag and dec-parallel-
2739	         cap.level-id for a capability point indicate the highest level
2740	         of the capability point.  The values of dec-parallel-cap.max-
2741	         lsr, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for
2742	         a capability point indicate the maximum processing rate in
2743	         units of luma samples per second, the maximum picture size in
2744	         units of luma samples, and the maximum video bitrate (in units
2745	         of CpbBrVclFactor bits per second for the VCL HRD parameters
2746	         and in units of CpbBrNalFactor bits per second for the NAL HRD
2747	         parameters where CpbBrVclFactor and CpbBrNalFactor are defined
2748	         in Section A.4 of [HEVC]), respectively.

2750	         When not present, the value of dec-parallel-cap.tier-flag is
2751	         inferred to be equal to the value of tier-flag outside the
2752	         dec-parallel-cap parameter.  When not present, the value of
2753	         dec-parallel-cap.level-id is inferred to be equal to the value
2754	         of max-recv-level-id outside the dec-parallel-cap parameter.
2755	         When not present, the value of dec-parallel-cap.max-lsr, dec-
2756	         parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred
2757	         to be equal to the value of max-lsr, max-lps, or max-br,
2758	         respectively, outside the dec-parallel-cap parameter.

2760	         The general decoding capability, expressed by the set of
2761	         parameters outside of dec-parallel-cap, is defined as the
2762	         capability point that is determined by the following
2763	         combination of parameters: 1) the parallelism requirement
2764	         corresponding to the value of sprop-segmentation-id equal to 0
2765	         for a bitstream, 2) the profile determined by profile-space,
2766	         profile-id, profile-compatibility-indicator, and interop-
2767	         constraints, 3) the tier and the highest level determined by
2768	         tier-flag and max-recv-level-id, and 4) the maximum processing
2769	         rate, the maximum picture size, and the maximum video bitrate
2770	         determined by the highest level.  The general decoding
2771	         capability MUST NOT be included as one of the set of
2772	         capability points in the dec-parallel-cap parameter.

2774	         For example, the following parameters express the general
2775	         decoding capability of 720p30 (Level 3.1) plus an additional
2776	         decoding capability of 1080p30 (Level 4) given that the
2777	         spatially largest tile or slice used in the bitstream is equal
2778	         to or less than 1/3 of the picture size:

2780	            a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

2782	         For another example, the following parameters express an
2783	         additional decoding capability of 1080p30, using dec-parallel-
2784	         cap.max-lsr and dec-parallel-cap.max-lps, given that WPP is
2785	         used in the bitstream:

2787	            a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2788	                        max-lsr=62668800;max-lps=2088960}
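
A receiver of such a parameter might parse the value enclosed in curly
braces along the lines of the following Python sketch.  The function
name and the dictionary layout are hypothetical.  The sketch assumes
that whitespace, such as the line folding shown in the example above,
may be discarded, and it enforces the 1-4095 range given in the ABNF
comment for spatial-seg-idc.

```python
import re

_CAP_PARAMETERS = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}


def parse_dec_parallel_cap(value):
    """Parse the "{...}" value of dec-parallel-cap into a list of dicts."""
    value = "".join(value.split())  # assumption: whitespace is insignificant
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in braces")
    points = []
    for item in value[1:-1].split(","):
        head, *parameters = item.split(";")
        match = re.fullmatch(r"([wt]):([0-9]{1,4})", head)
        if match is None or not parameters:
            raise ValueError("malformed capability point")
        idc = int(match.group(2))
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc out of range")
        point = {"type": match.group(1), "spatial-seg-idc": idc}
        for parameter in parameters:
            name, _, text = parameter.partition("=")
            if name not in _CAP_PARAMETERS or name in point:
                raise ValueError("unknown or repeated parameter")
            if not (text.isascii() and text.isdigit()):
                raise ValueError("parameter value is not a decimal integer")
            point[name] = int(text)
        points.append(point)
    return points


tiles = parse_dec_parallel_cap("{t:8;level-id=120}")
wpp = parse_dec_parallel_cap("{w:8;\n   max-lsr=62668800;max-lps=2088960}")
```

Parameters missing from a capability point are left out of the
dictionary, so that the caller can apply the inference rules given
above.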

2790	            Informative note: When min_spatial_segmentation_idc is
2791	            present in a bitstream and WPP is not used, [HEVC]
2792	            specifies that there is no slice or no tile in the
2793	            bitstream containing more than 4 * PicSizeInSamplesY /
2794	            ( min_spatial_segmentation_idc + 4 ) luma samples.
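
The bound in this note can be evaluated with the short Python sketch
below.  The function name is hypothetical, and the result is returned
as an exact fraction because the note does not state a rounding rule.

```python
from fractions import Fraction


def max_segment_luma_samples(pic_size_in_samples_y,
                             min_spatial_segmentation_idc):
    """Upper bound on the luma samples in any slice or tile when WPP is
    not used: 4 * PicSizeInSamplesY / (min_spatial_segmentation_idc + 4)."""
    return Fraction(4 * pic_size_in_samples_y,
                    min_spatial_segmentation_idc + 4)


# A value of 8 bounds each slice or tile to 1/3 of a 1920 x 1088 picture.
bound = max_segment_luma_samples(1920 * 1088, 8)
```

This is consistent with the first example above, in which "t:8"
corresponds to tiles or slices no larger than 1/3 of the picture size.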

2796	      include-dph:

2798	         This parameter is used to indicate the capability and
2799	         preference to utilize or include decoded picture hash (DPH)
2800	         SEI messages (see Section D.3.19 of [HEVC]) in the bitstream.
2801	         DPH SEI messages can be used to detect picture corruption so
2802	         the receiver can request picture repair (see Section 8).  The
2803	         value is a comma separated list of hash types that is
2804	         supported or requested to be used, each hash type provided as
2805	         an unsigned integer value (0-255), with the hash types listed
2806	         from most preferred to the least preferred.  Example:

2808	         "include-dph=0,2", which indicates the capability for MD5
2809	         (most preferred) and Checksum (less preferred).  If the
2810	         parameter is not included or the value contains no hash types,
2811	         then no capability to utilize DPH SEI messages is assumed.
2812	         Note that DPH SEI messages MAY still be included in the
2813	         bitstream even when there is no declaration of capability to
2814	         use them, as in general SEI messages do not affect the
2815	         normative decoding process and decoders are allowed to ignore
2816	         SEI messages.
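
Parsing of the include-dph value can be sketched in Python as follows.
The function name is hypothetical, and tolerating whitespace around
list items is an assumption of the sketch.

```python
def parse_include_dph(value):
    """Return the hash types in preference order (most preferred first).

    A missing parameter (None) or an empty value yields an empty list,
    meaning no capability to utilize DPH SEI messages is assumed.
    """
    if value is None or value.strip() == "":
        return []
    hash_types = []
    for token in value.split(","):
        token = token.strip()  # assumption: surrounding whitespace ignored
        if not (token.isascii() and token.isdigit()) or int(token) > 255:
            raise ValueError("hash type must be an integer in 0-255")
        hash_types.append(int(token))
    return hash_types


preferred = parse_include_dph("0,2")  # MD5 first, then Checksum
```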

2818	      Encoding considerations:

2820	         This type is only defined for transfer via RTP (RFC 3550).

2822	      Security considerations:

2824	         See Section 9 of RFC XXXX.

2826	      Public specification:

2828	         Please refer to Section 13 of RFC XXXX.

2830	      Additional information: None

2832	      File extensions: none

2834	      Macintosh file type code: none

2836	      Object identifier or OID: none

2838	      Person & email address to contact for further information:

2840	         Ye-Kui Wang (yekuiw@qti.qualcomm.com).

2842	      Intended usage: COMMON

2844	      Author: See Section 14 of RFC XXXX.

2846	      Change controller:

2848	         IETF Audio/Video Transport Payloads working group delegated
2849	         from the IESG.

2851	7.2 SDP Parameters

2853	   The receiver MUST ignore any parameter unspecified in this memo.

2855	7.2.1 Mapping of Payload Type Parameters to SDP

2857	   The media type video/H265 string is mapped to fields in the Session
2858	   Description Protocol (SDP) [RFC4566] as follows:

2860	   o  The media name in the "m=" line of SDP MUST be video.

2862	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
2863	      media subtype).

2865	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2867	   o  The OPTIONAL parameters "profile-space", "profile-id", "tier-
2868	      flag", "level-id", "interop-constraints", "profile-compatibility-
2869	      indicator", "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv-
2870	      level-id", "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb",
2871	      "max-br", "max-tr", "max-tc", "max-fps", "sprop-max-don-diff",
2872	      "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-
2873	      cap", "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
2874	      "dec-parallel-cap", and "include-dph", when present, MUST be
2875	      included in the "a=fmtp" line of SDP.  These parameters are
2876	      expressed as a media type string, in the form of a semicolon
2877	      separated list of parameter=value pairs.

2879	   o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
2880	      pps", when present, MUST be included in the "a=fmtp" line of SDP
2881	      or conveyed using the "fmtp" source attribute as specified in
2882	      section 6.3 of [RFC5576].  For a particular media format (i.e.
2883	      RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST
2884	      NOT be both included in the "a=fmtp" line of SDP and conveyed
2885	      using the "fmtp" source attribute.  When included in the "a=fmtp"
2886	      line of SDP, these parameters are expressed as a media type
2887	      string, in the form of a semicolon separated list of
2888	      parameter=value pairs.  When conveyed in the "a=fmtp" line of SDP
2889	      for a particular payload type, the parameters "sprop-vps",
2890	      "sprop-sps", and "sprop-pps" MUST be applied to each SSRC with
2891	      the payload type.  When conveyed using the "fmtp" source
2892	      attribute, these parameters are only associated with the given
2893	      source and payload type as parts of the "fmtp" source attribute.

2895	          Informative note: Conveyance of "sprop-vps", "sprop-sps", and
2896	          "sprop-pps" using the "fmtp" source attribute allows for out-
2897	          of-band transport of parameter sets in topologies like Topo-
2898	          Video-switch-MCU as specified in [RFC5117].

2900	   An example of media representation in SDP is as follows:

2902	         m=video 49170 RTP/AVP 98
2903	         a=rtpmap:98 H265/90000
2904	         a=fmtp:98 profile-id=1;
2905	                   sprop-vps=<video parameter sets data>
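
Splitting an "a=fmtp" line into parameter=value pairs needs some care,
because the value of dec-parallel-cap itself contains semicolons
inside its curly braces.  The following Python sketch shows one way to
handle this.  The function name is hypothetical, and the sketch
assumes that the attribute has already been unfolded into a single
line.

```python
def parse_fmtp(line):
    """Split an "a=fmtp:<pt> name=value;..." line into (pt, dict).

    Semicolons inside "{...}" do not terminate a parameter.
    """
    prefix, _, rest = line.partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an fmtp attribute")
    payload_type = int(prefix[len("a=fmtp:"):])
    parameters = {}
    depth = 0
    token = []
    for char in rest + ";":  # trailing ";" flushes the last parameter
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
        if char == ";" and depth == 0:
            text = "".join(token).strip()
            token = []
            if text:
                # Split on the first "=" only; values may contain "=".
                name, _, value = text.partition("=")
                parameters[name.strip()] = value.strip()
        else:
            token.append(char)
    return payload_type, parameters


pt, params = parse_fmtp(
    "a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}")
```

Per Section 7.2, a receiver would then ignore any parameter name in
the result that is not specified in this memo.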

2907	7.2.2 Usage with SDP Offer/Answer Model

2909	   When HEVC is offered over RTP using SDP in an Offer/Answer model
2910	   [RFC3264] for negotiation for unicast usage, the following
2911	   limitations and rules apply:

2913	   o  The parameters identifying a media format configuration for HEVC
2914	      are profile-space, profile-id, tier-flag, level-id, interop-
2915	      constraints, profile-compatibility-indicator, and tx-mode.  These
2916	      media configuration parameters, except level-id, MUST be used
2917	      symmetrically when the answerer does not include recv-sub-layer-
2918	      id in the answer for the media format (payload type) or the
2919	      included recv-sub-layer-id is equal to sprop-sub-layer-id in the
2920	      offer.  The answerer MUST

2922	        1) maintain all configuration parameters with the values
2923	           remaining the same as in the offer for the media format
2924	           (payload type), with the exception that the value of level-
2925	           id is changeable as long as the highest level indicated by
2926	           the answer is not higher than that indicated by the offer;

2928	        2) include in the answer the recv-sub-layer-id parameter, with
2929	           a value less than the sprop-sub-layer-id parameter in the
2930	           offer, for the media format (payload type), and maintain all
2931	           configuration parameters with the values being the same as
2932	           signalled in the sprop-vps for the chosen sub-layer
2933	           representation, with the exception that the value of level-
2934	           id is changeable as long as the highest level indicated by
2935	           the answer is not higher than the level indicated by the
2936	           sprop-vps in offer for the chosen sub-layer representation;
2937	           or

2939	        3) remove the media format (payload type) completely (when one
2940	           or more of the parameter values are not supported).

2942	          Informative note: The above requirement for symmetric use
2943	          does not apply for level-id, and does not apply for the other
2944	          bitstream or RTP stream properties and capability parameters.

2946	   o  The profile-compatibility-indicator, when offered as sendonly,
2947	      describes bitstream properties.  The answerer MAY accept an RTP
2948	      payload type even if the decoder is not capable of handling the
2949	      profile indicated by the profile-space, profile-id, and interop-
2950	      constraints parameters, but capable of any of the profiles
2951	      indicated by the profile-space, profile-compatibility-indicator,
2952	      and interop-constraints.  However, when the profile-
2953	      compatibility-indicator is used in a recvonly or sendrecv media
2954	      description, the bitstream using this RTP payload type is
2955	      required to conform to all profiles indicated by profile-space,
2956	      profile-compatibility-indicator, and interop-constraints.

2958	   o  To simplify handling and matching of these configurations, the
2959	      same RTP payload type number used in the offer SHOULD also be
2960	      used in the answer, as specified in [RFC3264].

2962	   o  The same RTP payload type number used in the offer MUST be used
2963	      in the answer when the answer includes recv-sub-layer-id.  When
2964	      the answer does not include recv-sub-layer-id, the answer MUST
2965	      NOT contain a payload type number used in the offer unless the
2966	      configuration is exactly the same as in the offer or the
2967	      configuration in the answer only differs from that in the offer
2968	      with a different value of level-id.  The answer MAY contain the
2969	      recv-sub-layer-id parameter if an HEVC bitstream contains
2970	      multiple operation points (using temporal scalability and sub-
2971	      layers) and sprop-vps is included in the offer where information
2972	      on sub-layers is present in the first video parameter set
2973	      contained in sprop-vps.  If the sprop-vps is provided in an
2974	      offer, an answerer MAY select a particular operation point
2975	      indicated in the first video parameter set contained in sprop-
2976	      vps.  When the answer includes recv-sub-layer-id that is less
2977	      than sprop-sub-layer-id in the offer, all video parameter sets
2978	      contained in the sprop-vps parameter in the SDP answer and all
2979	      video parameter sets sent in-band for either the offerer-to-
2980	      answerer direction or the answerer-to-offerer direction MUST be
2981	      consistent with the first video parameter set in the sprop-vps
2982	      parameter of the offer (see the semantics of sprop-vps in section
2983	      7.1 of this document on one video parameter set being consistent
2984	      with another video parameter set), and the bitstream sent in
2985	      either direction MUST conform to the profile, tier, level, and
2986	      constraints of the chosen sub-layer representation as indicated
2987	      by the first profile_tier_level( ) syntax structure in the first
2988	      video parameter set in the sprop-vps parameter of the offer.

2990	          Informative note: When an offerer receives an answer that
2991	          does not include recv-sub-layer-id, it has to compare payload
2992	          types not declared in the offer based on the media type (i.e.
2993	          video/H265) and the above media configuration parameters with
2994	          any payload types it has already declared.  This will enable
2995	          it to determine whether the configuration in question is new
2996	          or if it is equivalent to configuration already offered,
2997	          since a different payload type number may be used in the
2998	          answer.  The ability to perform operation point selection
2999	          enables a receiver to utilize the temporal scalable nature of
3000	          an HEVC bitstream.

3002	   o  The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
3003	      sprop-depack-buf-bytes describe the properties of an RTP stream,
3004	      and all RTP streams the RTP stream depends on, when present, that
3005	      the offerer or the answerer is sending for the media format
3006	      configuration.  This differs from the normal usage of the
3007	      Offer/Answer parameters: normally such parameters declare the
3008	      properties of the bitstream or RTP stream that the offerer or the
3009	      answerer is able to receive.  When dealing with HEVC, the offerer
3010	      assumes that the answerer will be able to receive media encoded
3011	      using the configuration being offered.

3013	          Informative note:  The above parameters apply for any RTP
3014	          stream and all RTP streams the RTP stream depends on, when
3015	          present, sent by a declaring entity with the same
3016	          configuration; i.e. they are dependent on their source
3017	          endpoint.  Rather than being bound to the payload type, the
3018	          values may have to be applied to another payload type when
3019	          being sent, as they apply for the configuration.

3021	   o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb,
3022	      max-br, max-tr, and max-tc MAY be used to declare further
3023	      capabilities of the offerer or answerer for receiving.  These
3024	      parameters MUST NOT be present when the direction attribute is
3025	      "sendonly".

3027	   o  The capability parameter max-fps MAY be used to declare lower
3028	      capabilities of the offerer or answerer for receiving.  This
3029	      parameter MUST NOT be present when the direction attribute is
3030	      "sendonly".

3032	   o  The capability parameter dec-parallel-cap MAY be used to declare
3033	      additional decoding capabilities of the offerer or answerer for
3034	      receiving.  Upon receiving such a declaration of a receiver, a
3035	      sender MAY send a bitstream to the receiver utilizing those
3036	      capabilities under the assumption that the bitstream fulfills the
3037	      parallelism requirement.  A bitstream that is sent based on
3038	      choosing a capability point with parallel tool type 'w' from dec-
3039	      parallel-cap MUST have entropy_coding_sync_enabled_flag equal to
3040	      1 and min_spatial_segmentation_idc equal to or larger than dec-
3041	      parallel-cap.spatial-seg-idc of the capability point.  A
3042	      bitstream that is sent based on choosing a capability point with
3043	      parallel tool type 't' from dec-parallel-cap MUST have
3044	      entropy_coding_sync_enabled_flag equal to 0 and
3045	      min_spatial_segmentation_idc equal to or larger than dec-
3046	      parallel-cap.spatial-seg-idc of the capability point.

3048	   o  An offerer has to include the size of the de-packetization
3049	      buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff and
3050	      sprop-depack-buf-nalus, in the offer for an interleaved HEVC
3051	      bitstream or for the MSM transmission mode.  To enable the
3052	      offerer and answerer to inform each other about their
3053	      capabilities for de-packetization buffering in receiving RTP
3054	      streams, both parties are RECOMMENDED to include depack-buf-cap.
3055	      For interleaved RTP streams or in MSM, it is also RECOMMENDED to
3056	      consider offering multiple payload types with different buffering
3057	      requirements when the capabilities of the receiver are unknown.

3059	   o  The capability parameter include-dph MAY be used to declare the
3060	      capability to utilize decoded picture hash SEI messages and which
3061	      types of hashes in any HEVC RTP streams received by the offerer
3062	      or answerer.

3064	   o  The sprop-vps, sprop-sps, or sprop-pps, when present (included in
3065	      the "a=fmtp" line of SDP or conveyed using the "fmtp" source
3066	      attribute as specified in section 6.3 of [RFC5576]), are used for
3067	      out-of-band transport of the parameter sets (VPS, SPS, or PPS
3068	      respectively).

3070	   o  The answerer MAY use either out-of-band or in-band transport of
3071	      parameter sets for the bitstream it is sending, regardless of
3072	      whether out-of-band parameter sets transport has been used in the
3073	      offerer-to-answerer direction.  Parameter sets included in an
3074	      answer are independent of those parameter sets included in the
3075	      offer, as they are used for decoding two different bitstreams,
3076	      one from the answerer to the offerer and the other in the
3077	      opposite direction.  In case some RTP stream(s) are sent before
3078	      SDP offer/answer settles down, in-band parameter sets MUST be
3079	      used for those RTP stream parts sent before the SDP offer/answer.

3081	   o  The following rules apply to transport of parameter sets in the
3082	      offerer-to-answerer direction.

3084	       o An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
3085	          If none of these parameters is present in the offer, then
3086	          only in-band transport of parameter sets is used.

3088	       o If the level to use in the offerer-to-answerer direction is
3089	          equal to the default level in the offer, the answerer MUST be
3090	          prepared to use the parameter sets included in sprop-vps,
3091	          sprop-sps, and sprop-pps (either included in the "a=fmtp"
3092	          line of SDP or conveyed using the "fmtp" source attribute)
3093	          for decoding the incoming bitstream, e.g. by passing these
3094	          parameter set NAL units to the video decoder before passing
3095	          any NAL units carried in the RTP streams.  Otherwise, the
3096	          answerer MUST ignore sprop-vps, sprop-sps, and sprop-pps
3097	          (either included in the "a=fmtp" line of SDP or conveyed
3098	          using the "fmtp" source attribute) and the offerer MUST
3099	          transmit parameter sets in-band.

3101	       o In MSM, the answerer MUST be prepared to use the parameter
3102	          sets out-of-band transmitted for the RTP stream and all RTP
3103	          streams the RTP stream depends on, when present, for decoding
3104	          the incoming bitstream, e.g. by passing these parameter set
3105	          NAL units to the video decoder before passing any NAL units
3106	          carried in the RTP streams.

3108	   o  The following rules apply to transport of parameter sets in the
3109	      answerer-to-offerer direction.

3111	       o An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
3112	          If none of these parameters is present in the answer, then
3113	          only in-band transport of parameter sets is used.

3115	       o The offerer MUST be prepared to use the parameter sets
3116	          included in sprop-vps, sprop-sps, and sprop-pps (either
3117	          included in the "a=fmtp" line of SDP or conveyed using the
3118	          "fmtp" source attribute) for decoding the incoming bitstream,
3119	          e.g. by passing these parameter set NAL units to the video
3120	          decoder before passing any NAL units carried in the RTP
3121	          streams.

3123	       o In MSM, the offerer MUST be prepared to use the parameter
3124	          sets out-of-band transmitted for the RTP stream and all RTP
3125	          streams the RTP stream depends on, when present, for decoding
3126	          the incoming bitstream, e.g. by passing these parameter set
3127	          NAL units to the video decoder before passing any NAL units
3128	          carried in the RTP streams.

3130	   o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
3131	      the "fmtp" source attribute as specified in section 6.3 of
3132	      [RFC5576], the receiver of the parameters MUST store the
3133	      parameter sets included in sprop-vps, sprop-sps, and/or sprop-pps
3134	      and associate them with the source given as part of the "fmtp"
3135	      source attribute.  Parameter sets associated with one source
3136	      (given as part of the "fmtp" source attribute) MUST only be used
3137	      to decode NAL units conveyed in RTP packets from the same source
3138	      (given as part of the "fmtp" source attribute).  When this
3139	      mechanism is in use, SSRC collision detection and resolution MUST
3140	      be performed as specified in [RFC5576].
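
Informative sketch (not part of this specification): the symmetry rule in the first bullet above can be checked mechanically for the case in which the answer omits recv-sub-layer-id or repeats the offer's sprop-sub-layer-id.  The function name, the dictionary representation, and the example values are assumptions.  Both dictionaries are assumed to be fully populated, with any defaults already applied.  As a simplification, only level-id is compared; a selected sub-layer would require parsing sprop-vps, which is not attempted here.

```python
# Configuration parameters that must match between offer and answer.
# level-id is handled separately because the answer may lower it.
SYMMETRIC_PARAMS = (
    "profile-space", "profile-id", "tier-flag", "interop-constraints",
    "profile-compatibility-indicator", "tx-mode",
)


def unicast_answer_violations(offer, answer):
    """Return the names of parameters that break the symmetry rule."""
    recv = answer.get("recv-sub-layer-id")
    if recv is not None and recv != offer.get("sprop-sub-layer-id"):
        # The reference values would come from sprop-vps in this case.
        raise ValueError("sub-layer selection is not covered by this sketch")
    bad = [p for p in SYMMETRIC_PARAMS if offer[p] != answer[p]]
    # level-id may change, but the answer must not indicate a higher level.
    if int(answer["level-id"]) > int(offer["level-id"]):
        bad.append("level-id")
    return bad


# Illustrative values only.
offer = {
    "profile-space": "0", "profile-id": "1", "tier-flag": "0",
    "level-id": "93", "interop-constraints": "B00000000000",
    "profile-compatibility-indicator": "60000000", "tx-mode": "SST",
}
lower_level = dict(offer, **{"level-id": "90"})
changed = dict(offer, **{"profile-id": "2", "level-id": "120"})
```

An empty result means no violation was found by this check; it does not establish that the answer is acceptable in every other respect.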

3142	   For bitstreams being delivered over multicast, the following rules
3143	   apply:

3145	   o  The media format configuration is identified by profile-space,
3146	      profile-id, tier-flag, level-id, interop-constraints, profile-
3147	      compatibility-indicator, and tx-mode.  These media format
3148	      configuration parameters, including level-id, MUST be used
3149	      symmetrically; that is, the answerer MUST either maintain all
3150	      configuration parameters or remove the media format (payload
3151	      type) completely.  Note that this implies that the level-id for
3152	      Offer/Answer in multicast is not changeable.

3154	   o  To simplify the handling and matching of these configurations,
3155	      the same RTP payload type number used in the offer SHOULD also be
3156	      used in the answer, as specified in [RFC3264].  An answer MUST
3157	      NOT contain a payload type number used in the offer unless the
3158	      configuration is the same as in the offer.

3160	   o  Parameter sets received MUST be associated with the originating
3161	      source and MUST only be used in decoding the incoming bitstream
3162	      from the same source.

3164	   o  The rules for other parameters are the same as above for unicast
3165	      as long as the three above rules are obeyed.

3167	   Table 1 lists the interpretation of all the parameters that MUST be
3168	   used for the various combinations of offer, answer, and direction
3169	   attributes.  Note that the two columns wherein the recv-sub-layer-id
3170	   parameter is used only apply to answers, whereas the other columns
3171	   apply to both offers and answers.

3173	   Table 1.  Interpretation of parameters for various combinations of
3174	   offers, answers, direction attributes, with and without recv-sub-
3175	   layer-id.  Columns that do not indicate offer or answer apply to
3176	   both.

3178	                                          sendonly --+
3179	            answer: recvonly, recv-sub-layer-id --+  |
3180	              recvonly w/o recv-sub-layer-id --+  |  |
3181	      answer: sendrecv, recv-sub-layer-id --+  |  |  |
3182	        sendrecv w/o recv-sub-layer-id --+  |  |  |  |
3183	                                         |  |  |  |  |
3184	      profile-space                      C  D  C  D  P
3185	      profile-id                         C  D  C  D  P
3186	      tier-flag                          C  D  C  D  P
3187	      level-id                           D  D  D  D  P
3188	      interop-constraints                C  D  C  D  P
3189	      profile-compatibility-indicator    C  D  C  D  P
3190	      tx-mode                            C  C  C  C  P
3191	      max-recv-level-id                  R  R  R  R  -
3192	      sprop-max-don-diff                 P  P  -  -  P
3193	      sprop-depack-buf-nalus             P  P  -  -  P
3194	      sprop-depack-buf-bytes             P  P  -  -  P
3195	      depack-buf-cap                     R  R  R  R  -
3196	      sprop-segmentation-id              P  P  P  P  P
3197	      sprop-spatial-segmentation-idc     P  P  P  P  P
3198	      max-br                             R  R  R  R  -
3199	      max-cpb                            R  R  R  R  -
3200	      max-dpb                            R  R  R  R  -
3201	      max-lsr                            R  R  R  R  -
3202	      max-lps                            R  R  R  R  -
3203	      max-tr                             R  R  R  R  -
3204	      max-tc                             R  R  R  R  -
3205	      max-fps                            R  R  R  R  -
3206	      sprop-vps                          P  P  -  -  P
3207	      sprop-sps                          P  P  -  -  P
3208	      sprop-pps                          P  P  -  -  P
3209	      sprop-sub-layer-id                 P  P  -  -  P
3210	      recv-sub-layer-id                  X  O  X  O  -
3211	      dec-parallel-cap                   R  R  R  R  -
3212	      include-dph                        R  R  R  R  -

3214	     Legend:

3216	      C: configuration for sending and receiving bitstreams
3217	      D: changeable configuration, same as C except possible
3218	         to answer with a different but consistent value (see the
3219	         semantics of the six parameters related to profile, tier,
3220	         and level on these parameters being consistent)
3221	      P: properties of the bitstream to be sent
3222	      R: receiver capabilities
3223	      O: operation point selection
3224	      X: MUST NOT be present
3225	      -: not usable, when present SHOULD be ignored
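
Informative sketch (not part of this specification): Table 1 can be held as a lookup keyed by parameter name, with one letter per column in the left-to-right order of the table.  The names TABLE_1 and interpretation are assumptions; the letters are transcribed from the table and carry the meaning given in the legend.

```python
# Column order: sendrecv w/o recv-sub-layer-id, answer sendrecv with
# recv-sub-layer-id, recvonly w/o recv-sub-layer-id, answer recvonly
# with recv-sub-layer-id, sendonly.
_ROWS_BY_PATTERN = {
    "CDCDP": ["profile-space", "profile-id", "tier-flag",
              "interop-constraints", "profile-compatibility-indicator"],
    "DDDDP": ["level-id"],
    "CCCCP": ["tx-mode"],
    "RRRR-": ["max-recv-level-id", "depack-buf-cap", "max-br", "max-cpb",
              "max-dpb", "max-lsr", "max-lps", "max-tr", "max-tc",
              "max-fps", "dec-parallel-cap", "include-dph"],
    "PP--P": ["sprop-max-don-diff", "sprop-depack-buf-nalus",
              "sprop-depack-buf-bytes", "sprop-vps", "sprop-sps",
              "sprop-pps", "sprop-sub-layer-id"],
    "PPPPP": ["sprop-segmentation-id", "sprop-spatial-segmentation-idc"],
    "XOXO-": ["recv-sub-layer-id"],
}
TABLE_1 = {name: pattern
           for pattern, names in _ROWS_BY_PATTERN.items()
           for name in names}


def interpretation(param, direction, with_recv_sub_layer_id=False):
    """Return the legend letter for a parameter in a given column."""
    if direction == "sendonly":
        column = 4
    elif direction in ("sendrecv", "recvonly"):
        column = 0 if direction == "sendrecv" else 2
        if with_recv_sub_layer_id:
            column += 1  # the "answer: ..., recv-sub-layer-id" columns
    else:
        raise ValueError("unsupported direction: %r" % direction)
    return TABLE_1[param][column]
```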

3227	   Parameters used for declaring receiver capabilities are in general
3228	   downgradable; i.e. they express the upper limit for a sender's
3229	   possible behavior.  Thus, a sender MAY choose to configure its
3230	   encoder using only lower or equal values of these parameters.

3232	   When the answer does not include recv-sub-layer-id that is less than
3233	   the sprop-sub-layer-id in the offer, parameters declaring a
3234	   configuration point are not changeable, with the exception of the
3235	   level-id parameter for unicast usage, and these parameters express
3236	   values a receiver expects to be used and MUST be used verbatim in
3237	   the answer as in the offer.

3239	   When a sender's capabilities are declared with the configuration
3240	   parameters, these parameters express a configuration that is
3241	   acceptable for the sender to receive bitstreams.  In order to
3242	   achieve high interoperability levels, it is often advisable to offer
3243	   multiple alternative configurations.  It is impossible to offer
3244	   multiple configurations in a single payload type.  Thus, when
3245	   multiple configuration offers are made, each offer requires its own
3246	   RTP payload type associated with the offer.  However, it is possible
3247	   to offer multiple operation points using one configuration in a
3248	   single payload type by including sprop-vps in the offer and recv-
3249	   sub-layer-id in the answer.

3251	   A receiver SHOULD understand all media type parameters, even if it
3252	   only supports a subset of the payload format's functionality.  This
3253	   ensures that a receiver is capable of understanding when an offer to
3254	   receive media can be downgraded to what is supported by the receiver
3255	   of the offer.

3257	   An answerer MAY extend the offer with additional media format
3258	   configurations.  However, to enable their usage, in most cases a
3259	   second offer is required from the offerer to provide the bitstream
3260	   property parameters that the media sender will use.  This also has
3261	   the effect that the offerer has to be able to receive this media
3262	   format configuration, not only to send it.

3264	7.2.3 Usage in Declarative Session Descriptions

3266	   When HEVC over RTP is offered with SDP in a declarative style, as in
3267	   Real Time Streaming Protocol (RTSP) [RFC2326] or Session
3268	   Announcement Protocol (SAP) [RFC2974], the following considerations
3269	   are necessary.

3271	   o  All parameters capable of indicating both bitstream properties
3272	      and receiver capabilities are used to indicate only bitstream
3273	      properties.  For example, in this case, the parameter profile-
3274	      tier-level-id declares the values used by the bitstream, not the
3275	      capabilities for receiving bitstreams.  As a result, the
3276	      following interpretation of the parameters MUST be used:

3278	   Declaring actual configuration or bitstream properties:

3280	     - profile-space
3281	     - profile-id
3282	     - tier-flag
3283	     - level-id
3284	     - interop-constraints
3285	     - profile-compatibility-indicator
3286	     - tx-mode
3287	     - sprop-vps
3288	     - sprop-sps
3289	     - sprop-pps
3290	     - sprop-max-don-diff
3291	     - sprop-depack-buf-nalus
3292	     - sprop-depack-buf-bytes
3293	     - sprop-segmentation-id
3294	     - sprop-spatial-segmentation-idc

3296	   Not usable (when present, they SHOULD be ignored):

3298	     - max-lps
3299	     - max-lsr
3300	     - max-cpb
3301	     - max-dpb
3302	     - max-br
3303	     - max-tr
3304	     - max-tc
3305	     - max-fps
3306	     - max-recv-level-id
3307	     - depack-buf-cap
3308	     - sprop-sub-layer-id
3309	     - dec-parallel-cap
3310	     - include-dph

3312	   o  A receiver of the SDP is required to support all parameters and
3313	      values of the parameters provided; otherwise, the receiver MUST
3314	      reject (RTSP) or not participate in (SAP) the session.  It falls
3315	      on the creator of the session to use values that are expected to
3316	      be supported by the receiving application.
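
Informative sketch (not part of this specification): a receiver of a declarative session description can drop the parameters listed above as not usable before interpreting the rest.  The names NOT_USABLE and strip_not_usable are assumptions, and the example values are illustrative.

```python
# Parameters listed above as not usable in declarative descriptions.
NOT_USABLE = frozenset({
    "max-lps", "max-lsr", "max-cpb", "max-dpb", "max-br", "max-tr",
    "max-tc", "max-fps", "max-recv-level-id", "depack-buf-cap",
    "sprop-sub-layer-id", "dec-parallel-cap", "include-dph",
})


def strip_not_usable(params):
    """Return a copy of params without the parameters to be ignored."""
    return {k: v for k, v in params.items() if k not in NOT_USABLE}


declared = {"profile-id": "1", "level-id": "93", "max-br": "2000"}
usable = strip_not_usable(declared)
```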

3318	7.2.4 Parameter Sets Considerations

3320	   When out-of-band transport of parameter sets is used, parameter sets
3321	   MAY still be additionally transported in-band unless explicitly
3322	   disallowed by an application, and some of these additionally in-band
3323	   transported parameter sets may update some of the out-of-band
3324	   transported parameter sets.  Update of a parameter set refers to
3325	   sending of a parameter set of the same type using the same parameter
3326	   set ID but with different values for at least one other parameter of
3327	   the parameter set.

3337	7.2.5 Dependency Signaling in Multi-Stream Mode

3339	   If MSM is used, the rules on signaling media decoding dependency in
3340	   SDP as defined in [RFC5583] apply.  The rules on "hierarchical or
3341	   layered encoding" with multicast in Section 5.7 of [RFC4566] do not
3342	   apply, i.e. the notation for Connection Data "c=" SHALL NOT be used
3343	   with more than one address.  The order of session dependency is
3344	   given from the RTP stream containing the lowest temporal sub-layer
3345	   to the RTP stream containing the highest temporal sub-layer.

3347	8 Use with Feedback Messages

3349	   As specified in section 6.1 of RFC 4585 [RFC4585], payload-specific
3350	   feedback messages are identified by the RTCP packet type value PSFB
3351	   (206).  AVPF [RFC4585] defines three payload-specific feedback
3352	   messages and one application layer feedback message, and CCM
3353	   [RFC5104] specifies four payload-specific feedback messages.

3355	   These feedback messages are identified by means of the feedback
3356	   message type (FMT) parameter as follows:

3358	   Assigned in [RFC4585]:

3360	      1:     Picture Loss Indication (PLI)
3361	      2:     Slice Loss Indication (SLI)
3362	      3:     Reference Picture Selection Indication (RPSI)
3363	      15:    Application layer FB message
3364	      31:    reserved for future expansion of the number space

3366	   Assigned in [RFC5104]:

3368	      4:     Full Intra Request (FIR) Command
3369	      5:     Temporal-Spatial Trade-off Request (TSTR)
3370	      6:     Temporal-Spatial Trade-off Notification (TSTN)
3371	      7:     Video Back Channel Message (VBCM)

3373	   Unassigned:

3375	      0:      unassigned
3376	      8-14:   unassigned
3377	      16-30:  unassigned

3379	   The following subsections define the use of the PLI, SLI, RPSI, and
3380	   FIR feedback messages with HEVC.

3382	8.1 Picture Loss Indication (PLI)

3384	   As specified in RFC 4585 section 6.3.1, the reception of a picture
3385	   loss indication by a media sender indicates "the loss of an undefined
3386	   amount of coded video data belonging to one or more pictures".
3387	   Without having any specific knowledge of the setup of the bitstream
3388	   (such as use and location of in-band parameter sets, non-IDR decoder
3389	   refresh points, picture structures, and so forth), a reaction to the
3390	   reception of a PLI by an HEVC sender SHOULD be to send an IDR picture
3391	   and relevant parameter sets, potentially with sufficient redundancy
3392	   to ensure correct reception.  However, sometimes information about the
3393	   bitstream structure is known.  For example, state could have been
3394	   established outside of the mechanisms defined in this document that
3395	   parameter sets are conveyed out of band only, and stay static for the
3396	   duration of the session.  In that case, it is obviously unnecessary to
3397	   send them in-band as a result of the reception of a PLI.  Other examples
3398	   could be devised based on a priori knowledge of different aspects of
3399	   the bitstream structure.  In all cases, the timing and congestion
3400	   control mechanisms of RFC 4585 MUST be observed.

3402	8.2 Slice Loss Indication

3404	   RFC 4585's Slice Loss Indication can be used to indicate, to a sender,
3405	   the loss of a number of Coding Tree Blocks (CTBs) in CTB raster scan
3406	   order of a picture.  In the SLI's Feedback Control Indication (FCI)
3407	   field, the subfield "First" MUST be set to the CTB address of the first
3408	   lost CTB.  Note that the CTB address is in CTB raster scan order of a
3409	   picture.  For the first CTB of a slice segment, the CTB address is the
3410	   value of slice_segment_address when present, or 0 when
3411	   first_slice_segment_in_pic_flag is equal to 1; both syntax elements
3412	   are in the slice segment header.  The subfield "Number" MUST be set to
3413	   the number of consecutive lost CTBs, again in CTB raster scan order of
3414	   a picture.  Note that because both "First" and "Number" are counted
3415	   in CTBs in CTB raster scan order of a picture, not in tile scan order
3416	   (which is the bitstream order of CTBs), multiple SLI messages may be
3417	   needed to report the loss of one tile covering multiple CTB rows but
3418	   less wide than the picture.
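
Informative sketch (not part of this specification): because "First" and "Number" count CTBs in raster scan order, a lost tile narrower than the picture maps to one run per CTB row.  The function name and the way the tile rectangle is described (position and size in CTBs) are assumptions for illustration.

```python
def tile_sli_runs(pic_width_ctbs, tile_x, tile_y, tile_w, tile_h):
    """Return (first, number) pairs, one per SLI message, for a lost tile.

    All arguments are in units of CTBs; tile_x and tile_y give the
    column and row of the tile's top-left CTB.
    """
    if tile_w < 1 or tile_h < 1 or tile_x < 0 or tile_y < 0:
        raise ValueError("invalid tile rectangle")
    if tile_x + tile_w > pic_width_ctbs:
        raise ValueError("tile exceeds picture width")
    if tile_w == pic_width_ctbs:
        # A full-width tile is contiguous in raster scan order.
        return [(tile_y * pic_width_ctbs, tile_w * tile_h)]
    # Otherwise each CTB row of the tile is a separate raster-scan run.
    return [((tile_y + row) * pic_width_ctbs + tile_x, tile_w)
            for row in range(tile_h)]


# A picture 10 CTBs wide; the lost tile covers columns 5-9 of rows 0-2.
runs = tile_sli_runs(10, 5, 0, 5, 3)
```

Each pair would go into its own SLI message.  The sketch does not check the results against the width of the "First" and "Number" subfields.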

3420	   The subfield "PictureID" MUST be set to the 6 least significant bits
3421	   of a binary representation of the value of PicOrderCntVal, as defined
3422	   in [HEVC], of the picture for which the lost CTBs are indicated.  Note
3423	   that for IDR pictures the syntax element slice_pic_order_cnt_lsb is
3424	   not present, but then the value is inferred to be equal to 0.
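
Informative sketch (not part of this specification): the SLI FCI entry of RFC 4585 carries "First" (13 bits), "Number" (13 bits), and "PictureID" (6 bits) in one 32-bit word.  The function name is an assumption, as is the use of a two's complement representation when PicOrderCntVal is negative.

```python
import struct


def pack_sli_fci(first_ctb, num_ctbs, pic_order_cnt_val):
    """Pack one SLI FCI entry as 4 bytes in network byte order."""
    if not 0 <= first_ctb < (1 << 13):
        raise ValueError("First does not fit in 13 bits")
    if not 1 <= num_ctbs < (1 << 13):
        raise ValueError("Number does not fit in 13 bits")
    # 6 least significant bits of PicOrderCntVal (two's complement assumed).
    picture_id = pic_order_cnt_val & 0x3F
    word = (first_ctb << 19) | (num_ctbs << 6) | picture_id
    return struct.pack("!I", word)


# First lost CTB at address 5, 5 CTBs lost, PicOrderCntVal 70 (70 & 63 = 6).
fci = pack_sli_fci(5, 5, 70)
```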

3426	   As described in RFC 4585, an encoder in a media sender can use this
3427	   information to "clean up" the corrupted picture by sending intra
3428	   information, while observing the constraints described in RFC 4585, for
3429	   example with respect to congestion control.  In many cases, error
3430	   tracking is required to identify the corrupted region in the receiver's
3431	   state (reference pictures) because of error import in uncorrupted
3432	   regions of the picture through motion compensation.  Reference picture
3433	   selection can also be used to "clean up" the corrupted picture, which
3434	   is usually more efficient and less likely to generate congestion than
3435	   sending intra information.

3437	   In contrast to the video codecs contemplated in RFC 4585 and RFC 5104,
3438	   in HEVC, the "macroblock size" is not fixed to 16x16 luma samples, but
3439	   variable.  That, however, does not create a conceptual difficulty with
3440	   SLI, because the setting of the CTB size is a sequence-level
3441	   functionality, and using a slice loss indication across coded video
3442	   sequence boundaries is meaningless as there is no prediction across
3443	   sequence boundaries.  However, a proper use of SLI messages is not as
3444	   straightforward as it was with older, fixed-macroblock-sized video
3445	   codecs, as the state of the sequence parameter set (where the CTB size
3446	   is located) has to be taken into account when interpreting the "First"
3447	   subfield in the FCI.

3449	8.3 Use of HEVC with the RPSI Feedback Message

3451	   Feedback-based reference picture selection has been shown to be a
3452	   powerful tool to stop temporal error propagation for improved error
3453	   resilience [Girod99][Wang05].  In one approach, the decoder side
3454	   tracks errors in the decoded pictures, informs the encoder side
3455	   that a particular picture that has been decoded relatively
3456	   earlier is correct and still present in the decoded picture buffer,
3457	   and requests the encoder to use that correct picture for reference
3458	   when encoding the next picture, so as to stop further temporal error
3459	   propagation.  For this approach, the decoder side should use the
3460	   RPSI feedback message.

3462	   Encoders can encode some long-term reference pictures as specified
3463	   in H.264 or HEVC for purposes described in the previous paragraph
3464	   without the need of a huge decoded picture buffer.  As shown in
3465	   [Wang05], with a flexible reference picture management scheme as in
3466	   H.264 and HEVC, even a decoded picture buffer size of two would work
3467	   for the approach described in the previous paragraph.

3469	   The field "Native RPSI bit string defined per codec" is a base16
3470	   [RFC4648] representation of the 8 bits consisting of 2 most
3471	   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
3472	   in [HEVC], followed by the 32 bits representing the value of the
3473	   PicOrderCntVal (in network byte order), as defined in [HEVC], for
3474	   the picture that is requested to be used for reference when encoding
3475	   the next picture.
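
   As an informal illustration of the field layout described above,
   the following Python sketch builds and parses the bit string.  The
   function names are hypothetical, and treating PicOrderCntVal as a
   signed 32-bit two's complement integer is an assumption of this
   sketch, not something stated in this section.

```python
import base64
import struct
from typing import Tuple


def build_rpsi_bit_string(nuh_layer_id: int, pic_order_cnt_val: int) -> str:
    """Return the base16 text for one reference picture selection request."""
    if not 0 <= nuh_layer_id <= 0x3F:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    if not -(1 << 31) <= pic_order_cnt_val < (1 << 31):
        raise ValueError("PicOrderCntVal must fit in 32 bits")
    # "!" selects network byte order with no padding: one byte holding
    # 2 zero bits plus the 6-bit nuh_layer_id, then a 32-bit integer
    # (assumed signed here).
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw).decode("ascii")


def parse_rpsi_bit_string(text: str) -> Tuple[int, int]:
    """Return (nuh_layer_id, PicOrderCntVal) from the base16 text."""
    raw = base64.b16decode(text)
    if len(raw) != 5:
        raise ValueError("expected 40 bits (10 base16 characters)")
    first_byte, pic_order_cnt_val = struct.unpack("!Bi", raw)
    if first_byte & 0xC0:
        raise ValueError("the 2 most significant bits must be 0")
    return first_byte, pic_order_cnt_val
```

   For example, under these assumptions a request for the picture with
   nuh_layer_id 0 and PicOrderCntVal 37 is carried as the 10-character
   string "0000000025".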

3477	   The use of the RPSI feedback message as positive acknowledgement
3478	   with HEVC is deprecated.  In other words, the RPSI feedback message
3479	   MUST only be used as a reference picture selection request, such
3480	   that it can also be used in multicast.

3482	8.4 Full Intra Request (FIR)

3484	   The purpose of the FIR message is to force an encoder to send an
3485	   independent decoder refresh point as soon as possible (observing,
3486	   for example, the congestion control related constraints set out in
3487	   RFC 5104).

3489	   Upon reception of a FIR, a sender MUST send an IDR picture.
3490	   Parameter sets MUST also be sent, except when there is a priori
3491	   knowledge that the parameter sets have been correctly established.
3492	   A typical example for that is an understanding between sender and
3493	   receiver, established by means outside this document, that parameter
3494	   sets are exclusively sent out of band.

3496	9 Security Considerations

3498	   RTP packets using the payload format defined in this specification
3499	   are subject to the security considerations discussed in the RTP
3500	   specification [RFC3550], and in any applicable RTP profile such as
3501	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711] or
3502	   RTP/SAVPF [RFC5124].  However, as "Securing the RTP Protocol
3503	   Framework: Why RTP Does Not Mandate a Single Media Security
3504	   Solution" [I-D.ietf-avt-srtp-not-mandatory] discusses, it is not an
3505	   RTP payload format's responsibility to discuss or mandate what
3506	   solutions are used to meet the basic security goals like
3507	   confidentiality, integrity, and source authenticity for RTP in
3508	   general.  This responsibility lies with anyone using RTP in an
3509	   application.  They can find guidance on available security
3510	   mechanisms and important considerations as discussed in "Options for
3511	   Securing RTP Sessions" [I-D.ietf-avtcore-rtp-security-options].

3513	   The rest of this section discusses the security impacting properties
3514	   of the payload format itself.

3516	   Because the data compression used with this payload format is
3517	   applied end-to-end, any encryption needs to be performed after
3518	   compression.  A potential denial-of-service threat exists for data
3519	   encodings using compression techniques that have non-uniform
3520	   receiver-end computational load.  The attacker can inject
3521	   pathological datagrams into the bitstream that are complex to decode
3522	   and that cause the receiver to be overloaded.  H.265 is particularly
3523	   vulnerable to such attacks, as it is extremely simple to generate
3524	   datagrams containing NAL units that affect the decoding process of
3525	   many future NAL units.  Therefore, the usage of data origin
3526	   authentication and data integrity protection of at least the RTP
3527	   packet is RECOMMENDED, for example, with SRTP [RFC3711].

3529	   Note that the appropriate mechanism to ensure confidentiality and
3530	   integrity of RTP packets and their payloads is very dependent on the
3531	   application and on the transport and signaling protocols employed.
3532	   Thus, although SRTP is given as an example above, other possible
3533	   choices exist.

3535	   Decoders MUST exercise caution with respect to the handling of user
3536	   data SEI messages, particularly if they contain active elements, and
3537	   MUST restrict their domain of applicability to the presentation
3538	   containing the bitstream.

3540	   End-to-end security with authentication, integrity, or
3541	   confidentiality protection will prevent a MANE from performing
3542	   media-aware operations other than discarding complete packets.  In
3543	   the case of confidentiality protection, it will even be prevented
3544	   from discarding packets in a media-aware way.  To be allowed to
3545	   perform such operations, a MANE is required to be a trusted entity
3546	   that is included in the security context establishment.

3548	10 Congestion Control

3550	   Congestion control for RTP SHALL be used in accordance with RTP
3551	   [RFC3550] and with any applicable RTP profile, e.g. AVP [RFC3551].
3552	   If best-effort service is being used, an additional requirement is
3553	   that users of this payload format MUST monitor packet loss to ensure
3554	   that the packet loss rate is within an acceptable range.  Packet
3555	   loss is considered acceptable if a TCP flow across the same network
3556	   path, and experiencing the same network conditions, would achieve an
3557	   average throughput, measured on a reasonable timescale, that is not
3558	   less than what all RTP streams combined are achieving.  This
3559	   condition can be satisfied by implementing congestion control
3560	   mechanisms to adapt the transmission rate or the number of layers
3561	   subscribed for a layered multicast session, or by arranging for a
3562	   receiver to leave the session if the loss rate is unacceptably high.

3564	   The bitrate adaptation necessary for obeying the congestion control
3565	   principle is easily achievable when real-time encoding is used, for
3566	   example by adequately tuning the quantization parameter.

3568	   However, when pre-encoded content is being transmitted, bandwidth
3569	   adaptation requires the pre-coded bitstream to be tailored for such
3570	   adaptivity.  The key mechanism available in HEVC is temporal
3571	   scalability.  A media sender can remove NAL units belonging to
3572	   higher temporal sub-layers (i.e. those NAL units with a high value
3573	   of TID) until the sending bitrate drops to an acceptable range.
3574	   HEVC contains mechanisms that allow the lightweight identification
3575	   of switching points in temporal enhancement layers, as discussed in
3576	   Section 1.1.2 of this memo.  An HEVC media sender can send packets
3577	   belonging to NAL units of temporal enhancement layers starting from
3578	   these switching points to probe for available bandwidth and to
3579	   utilize bandwidth that has been shown to be available.
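
   The sub-layer removal described above can be sketched in Python as
   follows.  This is an illustration under stated assumptions, not a
   normative algorithm: the input is a list of bare HEVC NAL units
   (each beginning with its 2-byte NAL unit header, whose three least
   significant bits hold TemporalId plus 1), the function names are
   hypothetical, and a per-interval byte budget stands in for the
   target bitrate.

```python
from typing import List


def temporal_id(nal_unit: bytes) -> int:
    """TemporalId from the 2-byte NAL unit header (low 3 bits, minus 1)."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    return (nal_unit[1] & 0x07) - 1


def thin_to_budget(nal_units: List[bytes], byte_budget: int) -> List[bytes]:
    """Drop the highest temporal sub-layers until the budget is met.

    The base sub-layer (TemporalId 0) is never removed, so the result
    can still exceed the budget when the base sub-layer alone is too big.
    """
    kept = list(nal_units)
    if not kept:
        return kept
    highest = max(temporal_id(n) for n in kept)
    while highest > 0 and sum(len(n) for n in kept) > byte_budget:
        # Remove one whole sub-layer at a time, highest first.
        kept = [n for n in kept if temporal_id(n) < highest]
        highest -= 1
    return kept
```

   Removing whole sub-layers from the top keeps the remaining
   bitstream decodable, because lower temporal sub-layers do not
   depend on higher ones.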

3581	   The above mechanisms generally work within a defined profile and
3582	   level and, therefore, no renegotiation of the channel is required.
3583	   Only when non-downgradable parameters (such as profile) are required
3584	   to be changed does it become necessary to terminate and restart the
3585	   RTP stream(s).  This may be accomplished by using different RTP
3586	   payload types.

3588	   MANEs MAY remove certain unusable packets from the RTP stream when
3589	   that RTP stream was damaged due to previous packet losses.  This can
3590	   help reduce the network load in certain special cases.  For example,
3591	   MANEs can remove those FUs where the leading FUs belonging to the
3592	   same NAL unit have been lost or those dependent slice segments when
3593	   the leading slice segments belonging to the same slice have been
3594	   lost, because the trailing FUs or dependent slice segments are
3595	   meaningless to most decoders.  MANEs can also remove higher temporal
3596	   scalable layers if the outbound transmission (from the MANE's
3597	   viewpoint) experiences congestion.
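
   The FU case mentioned above can be sketched in Python as follows.
   The sketch assumes the FU layout defined earlier in this memo (a
   2-byte payload header with Type equal to 49, followed by an FU
   header whose two most significant bits are the S and E bits).  It
   further assumes, for simplicity, that packets arrive as (RTP
   sequence number, payload) pairs without reordering.  The function
   name is hypothetical, and dependent slice segments are not handled.

```python
from typing import Iterable, List, Optional, Tuple

FU_TYPE = 49  # payload header Type value that identifies an FU

Packet = Tuple[int, bytes]  # (RTP sequence number, RTP payload)


def filter_orphan_fus(packets: Iterable[Packet]) -> List[Packet]:
    """Forward packets, dropping FUs whose leading fragments were lost."""
    forwarded: List[Packet] = []
    # Sequence number the next fragment of an intact NAL unit must carry,
    # or None when no fragmented NAL unit is currently intact.
    expected_seq: Optional[int] = None
    for seq, payload in packets:
        is_fu = len(payload) >= 3 and ((payload[0] >> 1) & 0x3F) == FU_TYPE
        if not is_fu:
            forwarded.append((seq, payload))
            expected_seq = None
            continue
        start = bool(payload[2] & 0x80)  # S bit
        end = bool(payload[2] & 0x40)    # E bit
        if start or seq == expected_seq:
            forwarded.append((seq, payload))
            # RTP sequence numbers are 16 bits and wrap around.
            expected_seq = None if end else (seq + 1) & 0xFFFF
        else:
            # A leading fragment is missing, so this fragment and the rest
            # of the same NAL unit are dropped.
            expected_seq = None
    return forwarded
```

   A MANE that tolerates reordering would need to buffer packets
   briefly before declaring a leading FU lost.  That is omitted here
   to keep the sketch short.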

3599	11 IANA Considerations

3601	   A new media type, as specified in Section 7.1 of this memo, should
3602	   be registered with IANA.

3604	12 Acknowledgements

3606	   Muhammed Coban and Marta Karczewicz are thanked for discussions on
3607	   the specification of the use with feedback messages and other
3608	   aspects in this memo.  Jonathan Lennox and Jill Boyce are thanked
3609	   for their contributions to the PACI design included in this memo.
3610	   Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and
3611	   Tom Kristensen are thanked for their contributions to parallel
3612	   processing related signalling.  Magnus Westerlund, Jonathan Lennox,
3613	   Bernard Aboba, Jonatan Samuelsson, Roni Even, Rickard Sjoberg,
3614	   Sachin Deshpande, Woo Johnman, Mo Zanaty, Ross Finlayson, and Danny
3615	   Hong made valuable reviewing comments that led to improvements.

3617	   This document was prepared using 2-Word-v2.0.template.dot.

3619	13 References

3621	13.1 Normative References

3623	   [HEVC]    ITU-T Recommendation H.265, "High efficiency video
3624	             coding", April 2013.

3626	   [H.264]   ITU-T Recommendation H.264, "Advanced video coding for
3627	             generic audiovisual services", April 2013.

3629	   [RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding
3630	             Dependency in the Session Description Protocol (SDP)", RFC
3631	             5583, July 2009.

3633	   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
3634	             Payload Format for H.264 Video", RFC 6184, May 2011.

3636	   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
3637	             Eleftheriadis, "RTP Payload Format for Scalable Video
3638	             Coding", RFC 6190, May 2011.

3640	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
3641	             Requirement Levels", BCP 14, RFC 2119, March 1997.

3643	   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
3644	             with Session Description Protocol (SDP)", RFC 3264, June
3645	             2002.

3647	   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
3648	             Encodings", RFC 4648, October 2006.

3650	   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson,
3651	             V., "RTP: A Transport Protocol for Real-Time
3652	             Applications", STD 64, RFC 3550, July 2003.

3654	   [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session
3655	             Description Protocol", RFC 4566, July 2006.

3657	   [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific
3658	             Media Attributes in the Session Description Protocol", RFC
3659	             5576, June 2009.

3661	   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey,
3662	             J., "Extended RTP Profile for Real-time Transport Control
3663	             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
3664	             2006.

3666	   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman, B.,
3667	             "Codec Control Messages in the RTP Audio-Visual Profile
3668	             with Feedback (AVPF)", RFC 5104, February 2008.

3670	13.2 Informative References

3672	   [3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched
3673	             Streaming Service (PSS); Progressive Download and Dynamic
3674	             Adaptive Streaming over HTTP (3GP-DASH)", v12.1.0,
3675	             December 2013.

3677	   [3GPPFF]  3GPP TS 26.244, "Transparent end-to-end packet switched
3678	             streaming service (PSS); 3GPP file format (3GP)", v12.20,
3679	             December 2013.

3681	   [Girod99] Girod, B. and Faerber, F., "Feedback-based error control
3682	             for mobile video transmission", Proceedings IEEE, Vol. 87,
3683	             No. 10, pp. 1707-1723, October 1999.

3685	   [HEVC draft v2]
3686	             Draft version 2 of HEVC, "High Efficiency Video Coding
3687	             (HEVC) Range Extensions text specification: Draft 7", JCT-
3688	             VC document JCTVC-Q1005, 17th JCT-VC meeting, 27 March - 4
3689	             April 2014, Valencia, Spain.

3691	   [I-D.ietf-avt-srtp-not-mandatory]
3692	             Perkins, C. and M. Westerlund, "Securing the RTP
3693	             Protocol Framework: Why RTP Does Not Mandate a Single
3694	             Media Security Solution", draft-ietf-avt-srtp-not-
3695	             mandatory-16 (work in progress), January 2014.

3697	   [I-D.ietf-avtcore-rtp-security-options]
3698	             Westerlund, M. and C. Perkins, "Options for Securing RTP
3699	             Sessions", draft-ietf-avtcore-rtp-security-options-10
3700	             (work in progress), January 2014.

3702	   [I-D.ietf-avtcore-rtp-multi-stream]
3703	             Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
3704	             "Sending Multiple Media Streams in a Single RTP Session",
3705	             draft-ietf-avtcore-rtp-multi-stream-01 (work in progress),
3706	             July 2013.

3708	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
3709	             Holmberg, C., Alvestrand, H., and C. Jennings,
3710	             "Multiplexing Negotiation Using Session Description
3711	             Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
3712	             bundle-negotiation-05 (work in progress), October 2013.

3714	   [I-D.ietf-avtext-rtp-grouping-taxonomy]
3715	             Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
3716	             Burman, B., "A Taxonomy of Grouping Semantics and
3717	             Mechanisms for Real-Time Transport", draft-ietf-avtext-
3718	             rtp-grouping-taxonomy-01 (work in progress), February
3719	             2014.

3721	   [ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology -
3722	             Coding of audio-visual objects - Part 12: ISO base media
3723	             file format" | "Information technology - JPEG 2000 image
3724	             coding system - Part 12: ISO base media file format",
3725	             2012.

3727	   [JCTVC-J0107]
3728	             Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian, K.,
3729	             "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107, 10th
3730	             JCT-VC meeting, July 2012, Stockholm, Sweden.

3732	   [MPEG2S]  ISO/IEC 13818-1, "Information technology - Generic coding
3733	             of moving pictures and associated audio information:
3734	             Systems", 2013.

3736	   [MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic
3737	             adaptive streaming over HTTP (DASH) - Part 1: Media
3738	             presentation description and segment formats", 2012.

3740	   [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
3741	             Correction", RFC 5109, December 2007.

3743	   [Wang05]  Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video
3744	             coding using flexible reference frames", Visual
3745	             Communications and Image Processing 2005 (VCIP 2005), July
3746	             2005, Beijing, China.

3748	14 Authors' Addresses

3750	   Ye-Kui Wang
3751	   Qualcomm Incorporated
3752	   5775 Morehouse Drive
3753	   San Diego, CA 92121, USA
3754	   Phone: +1-858-651-8345
3755	   EMail: yekuiw@qti.qualcomm.com

3757	   Yago Sanchez
3758	   Fraunhofer HHI
3759	   Einsteinufer 37
3760	   D-10587 Berlin, Germany
3761	   Phone: +49-30-31002-227
3762	   EMail: yago.sanchez@hhi.fraunhofer.de

3764	   Thomas Schierl
3765	   Fraunhofer HHI
3766	   Einsteinufer 37
3767	   D-10587 Berlin, Germany
3768	   Phone: +49-30-31002-227
3769	   EMail: ts@thomas-schierl.de

3771	   Stephan Wenger
3772	   Vidyo, Inc.
3773	   433 Hackensack Ave., 7th floor
3774	   Hackensack, N.J. 07601, USA
3775	   Phone: +1-415-713-5473
3776	   EMail: stewe@stewe.org

3778	   Miska M. Hannuksela
3779	   Nokia Corporation
3780	   P.O. Box 1000
3781	   33721 Tampere, Finland
3782	   Phone: +358-7180-08000
3783	   EMail: miska.hannuksela@nokia.com