idnits 2.17.1 

draft-ietf-payload-rtp-h265-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 3 instances of too long lines in the document, the longest one
     being 14 characters in excess of 72.

  ** The abstract seems to contain references ([HEVC]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1346 has weird spacing: '...L  unit  into ...'

  == Line 3279 has weird spacing: '...UST  be  set  ...'

  == Line 3280 has weird spacing: '...ntation  of  t...'

  == Line 3304 has weird spacing: '...k-sized  video...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The FU payload consists of fragments of the payload of the
     fragmented NAL unit so that if the FU payloads of consecutive FUs,
     starting with an FU with the S bit equal to 1 and ending with an FU with
     the E bit equal to 1, are sequentially concatenated, the payload of the
     fragmented NAL unit can be reconstructed.  The NAL unit header of the
     fragmented NAL unit is not included as such in the FU payload, but rather
     the information of the NAL unit header of the fragmented NAL unit is
     conveyed in F, LayerId, and TID fields of the FU payload headers of the
     FUs and the FuType field of the FU header of the FUs.  An FU payload MUST
     not be empty.

  -- The document date (April 30, 2014) is 3649 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '3GP' is mentioned on line 274, but not defined

  -- Looks like a reference, but probably isn't: '0' on line 1076

  == Missing Reference: 'RFC5234' is mentioned on line 2620, but not defined

  == Missing Reference: 'RFC5117' is mentioned on line 2816, but not defined

  ** Obsolete undefined reference: RFC 5117 (Obsoleted by RFC 7667)

  == Missing Reference: 'RFC2326' is mentioned on line 3131, but not defined

  ** Obsolete undefined reference: RFC 2326 (Obsoleted by RFC 7826)

  == Missing Reference: 'RFC2974' is mentioned on line 3132, but not defined

  == Missing Reference: 'RFC3551' is mentioned on line 3360, but not defined

  == Missing Reference: 'RFC3711' is mentioned on line 3360, but not defined

  == Missing Reference: 'RFC5124' is mentioned on line 3361, but not defined

  == Missing Reference: 'RFC 3711' is mentioned on line 3386, but not defined

  == Missing Reference: 'RFC 3551' is mentioned on line 3410, but not defined

  == Unused Reference: '3GPPFF' is defined on line 3536, but no explicit
     reference was found in the text

  == Unused Reference: 'RFC5109' is defined on line 3592, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-01

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-05

  == Outdated reference: A later version (-08) exists of
     draft-ietf-avtext-rtp-grouping-taxonomy-01


     Summary: 5 errors (**), 0 flaws (~~), 22 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Y.-K. Wang
2	Internet Draft                                                 Qualcomm
3	Intended status: Standards track                             Y. Sanchez
4	Expires: October 2014                                        T. Schierl
5	                                                         Fraunhofer HHI
6	                                                              S. Wenger
7	                                                                  Vidyo
8	                                                       M. M. Hannuksela
9	                                                                  Nokia
10	                                                         April 30, 2014

12	            RTP Payload Format for High Efficiency Video Coding
13	                    draft-ietf-payload-rtp-h265-03.txt

15	Abstract

17	   This memo describes an RTP payload format for the video coding
18	   standard ITU-T Recommendation H.265 and ISO/IEC International
19	   Standard 23008-2, both also known as High Efficiency Video Coding
20	   (HEVC) [HEVC] and developed by the Joint Collaborative Team on Video
21	   Coding (JCT-VC).  The RTP payload format allows for packetization of
22	   one or more Network Abstraction Layer (NAL) units in each RTP packet
23	   payload, as well as fragmentation of a NAL unit into multiple RTP
24	   packets.  Furthermore, it supports transmission of an HEVC bitstream
25	   over a single as well as multiple RTP streams.  The payload format
26	   has wide applicability in videoconferencing, Internet video
27	   streaming, and high bit-rate entertainment-quality video, among
28	   others.

30	Status of this Memo

32	   This Internet-Draft is submitted to IETF in full conformance with
33	   the provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF), its areas, and its working groups.  Note that
37	   other groups may also distribute working documents as Internet-
38	   Drafts.

40	   Internet-Drafts are draft documents valid for a maximum of six
41	   months and may be updated, replaced, or obsoleted by other documents
42	   at any time.  It is inappropriate to use Internet-Drafts as
43	   reference material or to cite them other than as "work in progress."

45	   The list of current Internet-Drafts can be accessed at
46	   http://www.ietf.org/ietf/1id-abstracts.txt.

48	   The list of Internet-Draft Shadow Directories can be accessed at
49	   http://www.ietf.org/shadow.html.

51	   This Internet-Draft will expire on October 30, 2014.

53	Copyright and License Notice

55	   Copyright (c) 2014 IETF Trust and the persons identified as the
56	   document authors.  All rights reserved.

58	   This document is subject to BCP 78 and the IETF Trust's Legal
59	   Provisions Relating to IETF Documents
60	   (http://trustee.ietf.org/license-info) in effect on the date of
61	   publication of this document.  Please review these documents
62	   carefully, as they describe your rights and restrictions with
63	   respect to this document.  Code Components extracted from this
64	   document must include Simplified BSD License text as described in
65	   Section 4.e of the Trust Legal Provisions and are provided without
66	   warranty as described in the Simplified BSD License.

68	Table of Contents

70	   Abstract
71	   Status of this Memo
72	   Table of Contents
73	   1. Introduction
74	      1.1. Overview of the HEVC Codec
75	         1.1.1 Coding-Tool Features
76	         1.1.2 Systems and Transport Interfaces
77	         1.1.3 Parallel Processing Support
78	         1.1.4 NAL Unit Header
79	      1.2. Overview of the Payload Format
80	   2. Conventions
81	   3. Definitions and Abbreviations
82	      3.1 Definitions
83	         3.1.1 Definitions from the HEVC Specification
84	         3.1.2 Definitions Specific to This Memo
85	      3.2 Abbreviations
86	   4. RTP Payload Format
87	      4.1 RTP Header Usage
88	      4.2 Payload Header Usage
89	      4.3 Payload Structures
90	      4.4 Transmission Modes
91	      4.5 Decoding Order Number
92	      4.6 Single NAL Unit Packets
93	      4.7 Aggregation Packets (APs)
94	      4.8 Fragmentation Units (FUs)
95	      4.9 PACI packets
96	         4.9.1 Reasons for the PACI rules (informative)
97	         4.9.2 PACI extensions (Informative)
98	      4.10 Temporal Scalability Control Information
99	   5. Packetization Rules
100	   6. De-packetization Process
101	   7. Payload Format Parameters
102	      7.1 Media Type Registration
103	      7.2 SDP Parameters
104	         7.2.1 Mapping of Payload Type Parameters to SDP
105	         7.2.2 Usage with SDP Offer/Answer Model
106	         7.2.3 Usage in Declarative Session Descriptions
107	         7.2.4 Parameter Sets Considerations
108	         7.2.5 Dependency Signaling in Multi-Stream Transmission
109	   8. Use with Feedback Messages
110	      8.1 Picture Loss Indication (PLI)
111	      8.2 Slice Loss Indication
112	      8.3 Use of HEVC with the RPSI Feedback Message
113	      8.4 Full Intra Request (FIR)
114	   9. Security Considerations
115	   10. Congestion Control
116	   11. IANA Consideration
117	   12. Acknowledgements
118	   13. References
119	      13.1 Normative References
120	      13.2 Informative References
121	   14. Authors' Addresses

123	1. Introduction

125	1.1. Overview of the HEVC Codec

127	   High Efficiency Video Coding [HEVC], formally known as ITU-T
128	   Recommendation H.265 and ISO/IEC International Standard 23008-2, was
129	   ratified by ITU-T in April 2013 and reportedly provides significant
130	   coding efficiency gains over H.264 [H.264].

132	   As both H.264 [H.264] and its RTP payload format [RFC6184] are
133	   widely deployed and generally known in the relevant implementer
134	   communities, frequently only the differences between those two
135	   specifications are highlighted in non-normative, explanatory parts
136	   of this memo.  Basic familiarity with both specifications is assumed
137	   for those parts.  However, the normative parts of this memo do not
138	   require study of H.264 or its RTP payload format.

140	   H.264 and HEVC share a similar hybrid video codec design.
141	   Conceptually, both technologies include a video coding layer (VCL),
142	   which is often used to refer to the coding-tool features, and a
143	   network abstraction layer (NAL), which is often used to refer to the
144	   systems and transport interface aspects of the codecs.

146	1.1.1 Coding-Tool Features

148	   Similarly to earlier hybrid-video-coding-based standards, including
149	   H.264, the following basic video coding design is employed by HEVC.
150	   A prediction signal is first formed either by intra or motion
151	   compensated prediction, and the residual (the difference between the
152	   original and the prediction) is then coded.  The gains in coding
153	   efficiency are achieved by redesigning and improving almost all
154	   parts of the codec over earlier designs.  In addition, HEVC includes
155	   several tools to make the implementation on parallel architectures
156	   easier.  Below is a summary of HEVC coding-tool features.

158	   Quad-tree block and transform structure

160	   One of the major tools that contribute significantly to the coding
161	   efficiency of HEVC is the usage of flexible coding blocks and
162	   transforms, which are defined in a hierarchical quad-tree manner.
163	   Unlike H.264, where the basic coding block is a macroblock of fixed
164	   size 16x16, HEVC defines a Coding Tree Unit (CTU) of a maximum size
165	   of 64x64.  Each CTU can be divided into smaller units in a
166	   hierarchical quad-tree manner and can represent smaller blocks down
167	   to size 4x4.  Similarly, the transforms used in HEVC can have
168	   different sizes, starting from 4x4 and going up to 32x32.  Utilizing
169	   large blocks and transforms contributes to the major gain of HEVC,
170	   especially at high resolutions.
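
   As a rough illustration of the hierarchical quad-tree partitioning
   described above, the following Python sketch splits one 64x64 CTU
   into leaf blocks.  The function name, the split-decision callback
   and the 4x4 lower bound are assumptions made for this example only;
   a real encoder decides splits by rate-distortion analysis, and HEVC
   applies separate size limits to coding and transform blocks.

```python
def split_quadtree(x, y, size, should_split, min_size=4):
    """Return the leaf blocks (x, y, size) of a quad-tree rooted at one CTU.

    should_split is a hypothetical callback standing in for the encoder's
    mode decision; it is not part of HEVC or of this memo.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_quadtree(x + dx, y + dy, half,
                                         should_split, min_size)
        return leaves
    return [(x, y, size)]


# Illustrative split rule: keep refining only the top-left corner.
leaves = split_quadtree(0, 0, 64, lambda x, y, size: x == 0 and y == 0)
for block in leaves:
    print(block)
```

   The leaves always tile the CTU exactly, so large blocks can cover
   flat regions while detailed regions are refined to small blocks.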

172	   Entropy coding

174	   HEVC uses a single entropy coding engine, which is based on Context
175	   Adaptive Binary Arithmetic Coding (CABAC), whereas H.264 uses two
176	   distinct entropy coding engines.  CABAC in HEVC shares many
177	   similarities with CABAC of H.264, but contains several improvements.
178	   Those include improvements in coding efficiency and lowered
179	   implementation complexity, especially for parallel architectures.

181	   In-loop filtering

183	   H.264 includes an in-loop adaptive deblocking filter, where the
184	   blocking artifacts around the transform edges in the reconstructed
185	   picture are smoothed to improve the picture quality and compression
186	   efficiency.  In HEVC, a similar deblocking filter is employed but
187	   with somewhat lower complexity.  In addition, pictures undergo a
188	   subsequent filtering operation called Sample Adaptive Offset (SAO),
189	   which is a new design element in HEVC.  SAO basically adds a pixel-
190	   level offset in an adaptive manner and usually acts as a de-ringing
191	   filter.  It is observed that SAO improves the picture quality,
192	   especially around sharp edges, contributing substantially to visual
193	   quality improvements of HEVC.

195	   Motion prediction and coding

197	   There have been a number of improvements in this area that are
198	   summarized as follows.  The first category is motion merge and
199	   advanced motion vector prediction (AMVP) modes.  The motion
200	   information of a prediction block can be inferred from the spatially
201	   or temporally neighboring blocks.  This is similar to the DIRECT
202	   mode in H.264 but includes new aspects to incorporate the flexible
203	   quad-tree structure and methods to improve the parallel
204	   implementations.  In addition, the motion vector predictor can be
205	   signaled for improved efficiency.  The second category is high-
206	   precision interpolation.  The interpolation filter length is
207	   increased to 8-tap from 6-tap, which improves the coding efficiency
208	   but also comes with increased complexity.  In addition, the
209	   interpolation filter is defined with higher precision without any
210	   intermediate rounding operations to further improve the coding
211	   efficiency.
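
   The effect of the longer interpolation filter can be sketched in
   one dimension as follows.  The tap values are the half-sample luma
   coefficients as commonly cited for HEVC and H.264; they are not
   stated in this memo and should be checked against the codec
   specifications.  The helper name is hypothetical, and clipping and
   the two-dimensional case are omitted.

```python
# Half-sample luma filter taps (assumed from the codec specifications).
HEVC_HALF_PEL = (-1, 4, -11, 40, 40, -11, 4, -1)  # 8 taps, sum 64
H264_HALF_PEL = (1, -5, 20, 20, -5, 1)            # 6 taps, sum 32


def half_sample(samples, pos, taps):
    """Interpolate the value halfway between samples[pos] and samples[pos+1].

    The caller must ensure that enough neighbours exist on both sides.
    Rounding happens once, at the very end.
    """
    n = len(taps)
    start = pos - (n // 2 - 1)
    acc = sum(t * samples[start + i] for i, t in enumerate(taps))
    norm = sum(taps)
    return (acc + norm // 2) // norm


row = list(range(0, 100, 10))  # a linear ramp: 0, 10, ..., 90
print(half_sample(row, 4, HEVC_HALF_PEL))
print(half_sample(row, 4, H264_HALF_PEL))
```

   The 8-tap filter reads two more neighbouring samples per output
   value than the 6-tap filter, which is where the added complexity
   mentioned above comes from.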

213	   Intra prediction and intra coding

215	   Compared to 8 intra prediction modes in H.264, HEVC supports angular
216	   intra prediction with 33 directions.  This increased flexibility
217	   improves both objective coding efficiency and visual quality as the
218	   edges can be better predicted and ringing artifacts around the edges
219	   can be reduced.  In addition, the reference samples are adaptively
220	   smoothed based on the prediction direction.  To avoid contouring
221	   artifacts, a new interpolative prediction generation is included to
222	   improve the visual quality.  Furthermore, discrete sine transform
223	   (DST) is utilized instead of traditional discrete cosine transform
224	   (DCT) for 4x4 intra transform blocks.

226	   Other coding-tool features

228	   HEVC includes some tools for lossless coding and efficient screen
229	   content coding, such as skipping the transform for certain blocks.
230	   These tools are particularly useful, for example, when streaming the
231	   user interface of a mobile device to a large display.

233	1.1.2 Systems and Transport Interfaces

235	   HEVC inherited the basic systems and transport interface designs,
236	   such as the NAL-unit-based syntax structure, the hierarchical syntax
237	   and data unit structure from sequence-level parameter sets, multi-
238	   picture-level or picture-level parameter sets, slice-level header
239	   parameters, lower-level parameters, the supplemental enhancement
240	   information (SEI) message mechanism, the hypothetical reference
241	   decoder (HRD) based video buffering model, and so on.  The
242	   differences in these aspects compared to H.264 are summarized
243	   below.

245	   Video parameter set

247	   A new type of parameter set, called video parameter set (VPS), was
248	   introduced.  For the first (2013) version of [HEVC], the video
249	   parameter set NAL unit is required to be available prior to its
250	   activation, while the information contained in the video parameter
251	   set is not necessary for operation of the decoding process.  For
252	   future HEVC extensions, such as the 3D or scalable extensions, the
253	   video parameter set is expected to include information necessary for
254	   operation of the decoding process, e.g. decoding dependency or
255	   information for reference picture set construction of enhancement
256	   layers.  The VPS provides a "big picture" of a bitstream, including
257	   what types of operation points are provided, the profile, tier, and
258	   level of the operation points, and some other high-level properties
259	   of the bitstream that can be used as the basis for session
260	   negotiation and content selection, etc. (see section 7.1).

262	   Profile, tier and level

264	   The profile, tier and level syntax structure that can be included in
265	   both VPS and sequence parameter set (SPS) includes 12 bytes of data
266	   to describe the entire bitstream (including all temporally scalable
267	   layers, which are referred to as sub-layers in the HEVC
268	   specification), and can optionally include more profile, tier and
269	   level information pertaining to individual temporally scalable
270	   layers.  The profile indicator indicates the "best viewed as"
271	   profile when the bitstream conforms to multiple profiles, similar to
272	   the major brand concept in the ISO base media file format (ISOBMFF)
273	   [ISOBMFF] and file formats derived based on ISOBMFF, such as the
274	   3GPP file format [3GP].  The profile, tier and level syntax
275	   structure also includes the indications of whether the bitstream is
276	   free of frame-packed content, whether the bitstream is free of
277	   interlaced source content and free of field pictures, i.e. contains
278	   only frame pictures of progressive source, such that clients/players
279	   with no support of post-processing functionalities for handling of
280	   frame-packed or interlaced source content or field pictures can
281	   reject those bitstreams.
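
   A minimal sketch of reading the 12 bytes of general profile, tier
   and level data follows.  The field names and bit positions are
   assumptions taken from the HEVC syntax as commonly documented, not
   from this memo, and the input is assumed to have had any emulation
   prevention bytes removed already.

```python
def parse_general_ptl(data: bytes) -> dict:
    """Decode the 12-byte general profile/tier/level block (assumed layout)."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    flags = data[5]
    return {
        "profile_space": data[0] >> 6,
        "tier_flag": (data[0] >> 5) & 1,
        "profile_idc": data[0] & 0x1F,
        "profile_compatibility_flags": int.from_bytes(data[1:5], "big"),
        "progressive_source_flag": (flags >> 7) & 1,
        "interlaced_source_flag": (flags >> 6) & 1,
        "non_packed_constraint_flag": (flags >> 5) & 1,
        "frame_only_constraint_flag": (flags >> 4) & 1,
        # The low 4 bits of byte 5 and bytes 6..10 are reserved.
        "level_idc": data[11],
    }


def is_plain_progressive(ptl: dict) -> bool:
    """True if no frame-packed, interlaced or field content is indicated."""
    return (ptl["non_packed_constraint_flag"] == 1
            and ptl["progressive_source_flag"] == 1
            and ptl["interlaced_source_flag"] == 0
            and ptl["frame_only_constraint_flag"] == 1)


# Hand-made example bytes, not taken from a real stream.
example = bytes.fromhex("01 60000000 b0 0000000000 5d")
ptl = parse_general_ptl(example)
print(ptl["profile_idc"], ptl["level_idc"], is_plain_progressive(ptl))
```

   A player without support for frame-packed or interlaced content
   could use a check like is_plain_progressive() to decide whether to
   reject a bitstream.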

283	   Bitstream and elementary stream

285	   HEVC includes a definition of an elementary stream, which is new
286	   compared to H.264.  An elementary stream consists of a sequence of
287	   one or more bitstreams.  An elementary stream that consists of two
288	   or more bitstreams has typically been formed by splicing together
289	   two or more bitstreams (or parts thereof).  When an elementary
290	   stream contains more than one bitstream, the last NAL unit of the
291	   last access unit of a bitstream (except the last bitstream in the
292	   elementary stream) must contain an end of bitstream NAL unit and the
293	   first access unit of the subsequent bitstream must be an intra
294	   random access point (IRAP) access unit.  This IRAP access unit may
295	   be a clean random access (CRA), broken link access (BLA), or
296	   instantaneous decoding refresh (IDR) access unit.
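
   The splicing constraint above can be checked with a sketch like the
   following.  Each bitstream is modelled as a list of access units,
   and each access unit as a list of NAL unit type values.  The
   numeric type values (37 for end of bitstream, 16 to 21 for IRAP
   pictures) are assumptions taken from the HEVC specification, and
   the function name is hypothetical.

```python
EOB_NUT = 37                 # end of bitstream NAL unit type (assumed)
IRAP_TYPES = range(16, 22)   # BLA 16-18, IDR 19-20, CRA 21 (assumed)


def splice_is_valid(bitstreams):
    """Check the end-of-bitstream / IRAP rule between adjacent bitstreams."""
    for current, following in zip(bitstreams, bitstreams[1:]):
        last_nal_type = current[-1][-1]
        if last_nal_type != EOB_NUT:
            return False
        first_access_unit = following[0]
        if not any(t in IRAP_TYPES for t in first_access_unit):
            return False
    return True


first = [[32, 33, 34, 19], [1], [1, EOB_NUT]]   # ends with end of bitstream
second = [[21], [1]]                            # starts with a CRA picture
print(splice_is_valid([first, second]))
print(splice_is_valid([second, first]))
```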

298	   Random access support

300	   HEVC includes signaling in the NAL unit header, through NAL unit
301	   types, of IRAP pictures beyond IDR pictures.  Three types of IRAP
302	   pictures, namely IDR, CRA and BLA pictures, are supported, wherein
303	   IDR pictures are conventionally referred to as closed group-of-
304	   pictures (closed-GOP) random access points, and CRA and BLA pictures
305	   are those conventionally referred to as open-GOP random access
306	   points.  BLA pictures usually originate from splicing of two
307	   bitstreams or parts thereof at a CRA picture, e.g. during stream
308	   switching.  To enable better systems usage of IRAP pictures, six
309	   different NAL unit types are defined to signal their properties,
310	   which can be used to better match the stream access point (SAP)
311	   types as defined in the ISOBMFF [ISOBMFF], which are utilized for
312	   random access support in both 3GP-DASH [3GPDASH] and MPEG DASH
313	   [MPEGDASH].  Pictures following an IRAP picture in decoding order
314	   and preceding the IRAP picture in output order are referred to as
315	   leading pictures associated with the IRAP picture.  There are two
316	   types of leading pictures, namely random access decodable leading
317	   (RADL) pictures and random access skipped leading (RASL) pictures.
318	   RADL pictures are decodable when decoding starts at the
319	   associated IRAP picture, and RASL pictures are not decodable when
320	   decoding starts at the associated IRAP picture and are usually
321	   discarded.  HEVC provides mechanisms to enable the specification of
322	   conformance of bitstreams with RASL pictures being discarded, thus
323	   to provide a standard-compliant way to enable systems components to
324	   discard RASL pictures when needed.
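
   The handling of leading pictures on random access might look like
   the sketch below.  It assumes one NAL unit per picture, a stream
   that begins at the IRAP picture where decoding starts, and the NAL
   unit type values of the HEVC specification (RADL 6-7, RASL 8-9,
   IRAP 16-21); none of these values are given in this part of the
   memo, and the function name is hypothetical.

```python
RASL_TYPES = {8, 9}          # RASL_N, RASL_R (assumed values)
IRAP_TYPES = range(16, 22)   # BLA, IDR and CRA types (assumed values)


def drop_rasl_after_random_access(nal_types):
    """Discard RASL pictures associated with the first IRAP picture.

    RASL pictures of later IRAP pictures are kept, because by then the
    decoder holds the earlier pictures they refer to.
    """
    kept = []
    irap_seen = 0
    for nal_type in nal_types:
        if nal_type in IRAP_TYPES:
            irap_seen += 1
        if nal_type in RASL_TYPES and irap_seen == 1:
            continue
        kept.append(nal_type)
    return kept


stream = [21, 8, 9, 6, 1, 21, 9, 1]   # CRA, RASL, RASL, RADL, TRAIL, CRA, ...
print(drop_rasl_after_random_access(stream))
```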

326	   Temporal scalability support

328	   HEVC includes improved support for temporal scalability, by
329	   inclusion of the signaling of TemporalId in the NAL unit header, the
330	   restriction that pictures of a particular temporal sub-layer cannot
331	   be used for inter prediction reference by pictures of a lower
332	   temporal sub-layer, the sub-bitstream extraction process, and the
333	   requirement that each sub-bitstream extraction output be a
334	   conforming bitstream.  Media-aware network elements (MANEs) can
335	   utilize the TemporalId in the NAL unit header for stream adaptation
336	   purposes based on temporal scalability.
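
   A MANE thinning a stream by temporal sub-layer could be sketched as
   follows.  The Nal type and the thin() helper are hypothetical, and
   the TemporalId is assumed to have been read from each NAL unit
   header beforehand.

```python
from typing import NamedTuple


class Nal(NamedTuple):
    temporal_id: int   # TemporalId taken from the NAL unit header
    label: str         # free-form description, for the example only


def thin(nals, max_temporal_id):
    """Keep only NAL units whose TemporalId does not exceed the target."""
    return [n for n in nals if n.temporal_id <= max_temporal_id]


stream = [Nal(0, "I0"), Nal(2, "b1"), Nal(1, "B2"), Nal(2, "b3"), Nal(0, "P4")]
print([n.label for n in thin(stream, 1)])
```

   Because pictures of a lower sub-layer never reference pictures of a
   higher one, the remaining NAL units stay decodable.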

338	   Temporal sub-layer switching support

340	   HEVC specifies, through NAL unit types present in the NAL unit
341	   header, the signaling of temporal sub-layer access (TSA) and
342	   stepwise temporal sub-layer access (STSA).  A TSA picture and
343	   pictures following the TSA picture in decoding order do not use
344	   pictures prior to the TSA picture in decoding order with TemporalId
345	   greater than or equal to that of the TSA picture for inter
346	   prediction reference.  A TSA picture enables up-switching, at the
347	   TSA picture, to the sub-layer containing the TSA picture or any
348	   higher sub-layer, from the immediately lower sub-layer.  An STSA
349	   picture does not use pictures with the same TemporalId as the STSA
350	   picture for inter prediction reference.  Pictures following an STSA
351	   picture in decoding order with the same TemporalId as the STSA
352	   picture do not use pictures prior to the STSA picture in decoding
353	   order with the same TemporalId as the STSA picture for inter
354	   prediction reference.  An STSA picture enables up-switching, at the
355	   STSA picture, to the sub-layer containing the STSA picture, from the
356	   immediately lower sub-layer.
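
   The up-switching rules for TSA and STSA pictures can be condensed
   into a small predicate.  This is one reading of the paragraph
   above, not normative text, and the function and argument names are
   hypothetical.

```python
def can_switch_up(picture_kind, picture_tid, from_tid, to_tid):
    """Return True if up-switching at this picture is enabled.

    picture_kind: "TSA" or "STSA"
    picture_tid:  TemporalId of the TSA or STSA picture
    from_tid:     highest sub-layer decoded so far
    to_tid:       highest sub-layer to decode from this picture onwards
    """
    if from_tid != picture_tid - 1:
        return False            # only from the immediately lower sub-layer
    if picture_kind == "TSA":
        return to_tid >= picture_tid   # this sub-layer or any higher one
    if picture_kind == "STSA":
        return to_tid == picture_tid   # this sub-layer only
    return False


print(can_switch_up("TSA", 2, 1, 3))
print(can_switch_up("STSA", 2, 1, 3))
```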

358	   Sub-layer reference or non-reference pictures

360	   The concept and signaling of reference/non-reference pictures in
361	   HEVC are different from H.264.  In H.264, if a picture may be used
362	   by any other picture for inter prediction reference, it is a
363	   reference picture; otherwise it is a non-reference picture, and this
364	   is signaled by two bits in the NAL unit header.  In HEVC, a picture
365	   is called a reference picture only when it is marked as "used for
366	   reference".  In addition, the concept of sub-layer reference picture
367	   was introduced.  If a picture may be used by any other picture
368	   with the same TemporalId for inter prediction reference, it is a
369	   sub-layer reference picture; otherwise it is a sub-layer non-
370	   reference picture.  Whether a picture is a sub-layer reference
371	   picture or sub-layer non-reference picture is signaled through NAL
372	   unit type values.

374	   Extensibility

376	   Besides the TemporalId in the NAL unit header, HEVC also includes
377	   the signaling of a six-bit layer ID in the NAL unit header, which
378	   must be equal to 0 for a single-layer bitstream.  Extension
379	   mechanisms have been included in VPS, SPS, PPS, SEI NAL unit, slice
380	   headers, and so on.  All these extension mechanisms enable future
381	   extensions in a backward compatible manner, such that bitstreams
382	   encoded according to potential future HEVC extensions can be fed to
383	   then-legacy decoders (e.g. HEVC version 1 decoders) and the then-
384	   legacy decoders can decode and output the base layer bitstream.
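
   A sketch of reading the layer ID and TemporalId from the two-byte
   NAL unit header is given below.  The layout assumed here is a 1-bit
   F field, a 6-bit Type, a 6-bit LayerId and a 3-bit TID carrying
   TemporalId plus 1; the function name is hypothetical.

```python
def parse_nal_header(nal: bytes) -> dict:
    """Split the first two bytes of a NAL unit into the header fields."""
    if len(nal) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    header = int.from_bytes(nal[:2], "big")
    return {
        "forbidden_zero_bit": header >> 15,
        "nal_unit_type": (header >> 9) & 0x3F,
        "layer_id": (header >> 3) & 0x3F,
        "temporal_id": (header & 0x07) - 1,
    }


fields = parse_nal_header(bytes([0x40, 0x01]))
print(fields)
# In a single-layer bitstream the layer ID is expected to be 0.
print(fields["layer_id"] == 0)
```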

386	   Bitstream extraction

388	   HEVC includes a bitstream extraction process as an integral part of
389	   the overall decoding process, as well as specification of the use of
390	   the bitstream extraction process in description of bitstream
391	   conformance tests as part of the hypothetical reference decoder
392	   (HRD) specification.

394	   Reference picture management

396	   The reference picture management of HEVC, including reference
397	   picture marking and removal from the decoded picture buffer (DPB) as
398	   well as reference picture list construction (RPLC), differs from
399	   that of H.264.  Instead of the sliding window plus adaptive memory
400	   management control operation (MMCO) based reference picture marking
401	   mechanism in H.264, HEVC specifies a reference picture set (RPS)
402	   based reference picture management and marking mechanism, and the
403	   RPLC is consequently based on the RPS mechanism.  A reference
404	   picture set consists of a set of reference pictures associated with
405	   a picture, consisting of all reference pictures that are prior to
406	   the associated picture in decoding order, that may be used for inter
407	   prediction of the associated picture or any picture following the
408	   associated picture in decoding order.  The reference picture set
409	   consists of five lists of reference pictures: RefPicSetStCurrBefore,
410	   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
411	   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter and
412	   RefPicSetLtCurr contain all reference pictures that may be used in
413	   inter prediction of the current picture and that may be used in
414	   inter prediction of one or more of the pictures following the
415	   current picture in decoding order.  RefPicSetStFoll and
416	   RefPicSetLtFoll consist of all reference pictures that are not used
417	   in inter prediction of the current picture but may be used in inter
418	   prediction of one or more of the pictures following the current
419	   picture in decoding order.  RPS provides an "intra-coded" signaling
420	   of the DPB status, instead of an "inter-coded" signaling, mainly for
421	   improved error resilience.  The RPLC process in HEVC is based on the
422	   RPS, by signaling an index to an RPS subset for each reference
423	   index.  The RPLC process has been simplified compared to that in
424	   H.264, by removal of the reference picture list modification (also
425	   referred to as reference picture list reordering) process.
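
   The "intra-coded" nature of RPS signaling can be illustrated with a
   sketch in which the DPB reference status is rebuilt from the RPS of
   the current picture alone.  Pictures are identified by picture
   order count (POC) values.  The function name and the dictionary
   representation are assumptions for this example, as is the reading
   that pictures absent from the RPS are no longer kept as references.

```python
def apply_rps(dpb_pocs, rps):
    """Return (kept, missing) for the given DPB content and RPS lists.

    kept:    POCs in the DPB that remain reference pictures
    missing: POCs the RPS names but the DPB lacks, e.g. after a loss
    """
    referenced = set().union(*rps.values())
    kept = [poc for poc in dpb_pocs if poc in referenced]
    missing = sorted(referenced - set(dpb_pocs))
    return kept, missing


rps = {
    "RefPicSetStCurrBefore": [8],
    "RefPicSetStCurrAfter": [],
    "RefPicSetStFoll": [4],
    "RefPicSetLtCurr": [0],
    "RefPicSetLtFoll": [16],
}
kept, missing = apply_rps([8, 4, 2, 0], rps)
print(kept, missing)
```

   Since every picture carries the complete set, a receiver can detect
   a missing reference picture directly instead of inferring it from a
   history of marking commands.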

427	   Ultra low delay support

429	   HEVC specifies a sub-picture-level HRD operation, for support of the
430	   so-called ultra-low delay.  The mechanism specifies a standard-
431	   compliant way to enable delay reduction below one picture interval.
432	   Sub-picture-level coded picture buffer (CPB) and DPB parameters may
433	   be signaled, and utilization of this information for the derivation
434	   of CPB timing (wherein the CPB removal time corresponds to decoding
435	   time) and DPB output timing (display time) is specified.  Decoders
436	   are allowed to operate the HRD at the conventional access-unit-
437	   level, even when the sub-picture-level HRD parameters are present.

439	   New SEI messages

441	   HEVC inherits many H.264 SEI messages with changes in syntax and/or
442	   semantics making them applicable to HEVC.  Additionally, there are a
443	   few new SEI messages reviewed briefly in the following paragraphs.

445	   The display orientation SEI message informs the decoder of a
446	   transformation that is recommended to be applied to the cropped
447	   decoded picture prior to display, such that the pictures can be
448	   properly displayed, e.g. in an upside-up manner.

450	   The structure of pictures SEI message provides information on the
451	   NAL unit types, picture order count values, and prediction
452	   dependencies of a sequence of pictures.  The SEI message can be
453	   used, for example, to determine what impact a lost picture has on
454	   other pictures.

456	   The decoded picture hash SEI message provides a checksum derived
457	   from the sample values of a decoded picture.  It can be used for
458	   detecting whether a picture was correctly received and decoded.
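
   The use of a picture hash can be sketched as below.  MD5 over each
   colour plane is used purely as a stand-in digest, with 8-bit
   samples passed as lists of integers; the hash types and the exact
   sample ordering defined by HEVC are not reproduced here, and the
   function names are hypothetical.

```python
import hashlib


def plane_digests(planes):
    """Compute one MD5 digest per colour plane of 8-bit samples."""
    return [hashlib.md5(bytes(plane)).hexdigest() for plane in planes]


def picture_ok(decoded_planes, signalled_digests):
    """Compare digests of the decoded picture with the signalled ones."""
    return plane_digests(decoded_planes) == list(signalled_digests)


sent = [[16, 16, 17, 18], [128, 128], [128, 128]]       # Y, Cb, Cr
digests = plane_digests(sent)                            # carried in the SEI
received = [[16, 16, 17, 19], [128, 128], [128, 128]]    # one sample differs
print(picture_ok(sent, digests), picture_ok(received, digests))
```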

460	   The active parameter sets SEI message includes the IDs of the active
461	   video parameter set and the active sequence parameter set and can be
462	   used to activate VPSs and SPSs.  In addition, the SEI message
463	   includes the following indications: 1) An indication of whether
464	   "full random accessibility" is supported (when supported, all
465	   parameter sets needed for decoding of the remainder of the bitstream
466	   when random accessing from the beginning of the current coded video
467	   sequence by completely discarding all access units earlier in
468	   decoding order are present in the remaining bitstream and all coded
469	   pictures in the remaining bitstream can be correctly decoded); 2) An
470	   indication of whether there is no parameter set within the current
471	   coded video sequence that updates another parameter set of the same
472	   type preceding in decoding order.  An update of a parameter set
473	   refers to the use of the same parameter set ID but with some other
474	   parameters changed.  If this property is true for all coded video
475	   sequences in the bitstream, then all parameter sets can be sent out-
476	   of-band before session start.

478	   The decoding unit information SEI message provides coded picture
479	   buffer removal delay information for a decoding unit.  The message
480	   can be used in very-low-delay buffering operations.

482	   The region refresh information SEI message can be used together with
483	   the recovery point SEI message (present in both H.264 and HEVC) for
484	   improved support of gradual decoding refresh (GDR).  This supports
485	   random access from inter-coded pictures, wherein complete pictures
486	   can be correctly decoded or recovered after an indicated number of
487	   pictures in output/display order.

489	1.1.3 Parallel Processing Support

491	   The reportedly significantly higher encoding computational demand of
492	   HEVC over H.264, in conjunction with the ever increasing video
493	   resolution (both spatially and temporally) required by the market,
494	   led to the adoption of VCL coding tools specifically targeted to
495	   allow for parallelization on the sub-picture level.  That is,
496	   parallelization occurs, at the minimum, at the granularity of an
497	   integer number of CTUs.  The targets for this type of high-level
498	   parallelization are multicore CPUs and DSPs as well as
499	   multiprocessor systems.  In a system design, to be useful, these
500	   tools require signaling support, which is provided in Section 7 of
501	   this memo.  This section provides a brief overview of the tools
502	   available in [HEVC].

504	   Many of the tools incorporated in HEVC were designed keeping in mind
505	   the potential parallel implementations in multi-core/multi-processor
506	   architectures.  Specifically, for parallelization, four picture
507	   partition strategies are available.

509	   Slices are segments of the bitstream that can be reconstructed
510	   independently from other slices within the same picture (though
511	   there may still be interdependencies through loop filtering
512	   operations).  Slices are the only tool that can be used for
513	   parallelization that is also available, in virtually identical form,
514	   in H.264.  Slice-based parallelization does not require much inter-
515	   processor or inter-core communication (except for inter-processor or
516	   inter-core data sharing for motion compensation when decoding a
517	   predictively coded picture, which is typically much heavier than
518	   inter-processor or inter-core data sharing due to in-picture
519	   prediction), as slices are designed to be independently decodable.
520	   However, for the same reason, slices can require some coding
521	   overhead.  Further, slices (in contrast to some of the other tools
522	   mentioned below) also serve as the key mechanism for bitstream
523	   partitioning to match Maximum Transfer Unit (MTU) size requirements,
524	   due to the in-picture independence of slices and the fact that each
525	   regular slice is encapsulated in its own NAL unit.  In many cases,
526	   the goal of parallelization and the goal of MTU size matching can
527	   place contradictory demands on the slice layout in a picture.  The
528	   realization of this situation led to the development of the more
529	   advanced tools mentioned below.

531	   Dependent slice segments allow for fragmentation of a coded slice
532	   into fragments at CTU boundaries without breaking any in-picture
533	   prediction mechanism.  They are complementary to the fragmentation
534	   mechanism described in this memo in that they need the cooperation
535	   of the encoder.  As a dependent slice segment necessarily contains
536	   an integer number of CTUs, a decoder using multiple cores operating
537	   on CTUs can process a dependent slice segment without communicating
538	   parts of the slice segment's bitstream to other cores.
539	   Fragmentation, as specified in this memo, in contrast, does not
540	   guarantee that a fragment contains an integer number of CTUs.

542	   In wavefront parallel processing (WPP), the picture is partitioned
543	   into rows of CTUs.  Entropy decoding and prediction are allowed to
544	   use data from CTUs in other partitions.  Parallel processing is
545	   possible through parallel decoding of CTU rows, where the start of
546	   the decoding of a row is delayed by two CTUs, so as to ensure that
547	   data related to a CTU above and to the right of the subject CTU is
548	   available before the subject CTU is decoded.  Using this
549	   staggered start (which appears like a wavefront when represented
550	   graphically), parallelization is possible with up to as many
551	   processors/cores as the picture contains CTU rows.
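
   Informative example: The staggered start described above can be
   sketched in Python as follows.  The sketch assumes, purely for
   illustration, that one core is assigned to each CTU row and that
   every CTU takes exactly one time step to decode; neither assumption
   comes from [HEVC].  The function names are hypothetical.

```python
def wpp_start_step(row, col, delay=2):
    """Earliest time step at which CTU (row, col) can be decoded.

    Assumes one core per CTU row and unit decoding time per CTU.
    Each row starts `delay` CTUs after the row above it, so the CTU
    above and to the right has finished before this CTU starts.
    """
    return col + delay * row


def wpp_total_steps(rows, cols, delay=2):
    """Time steps needed for a picture of rows x cols CTUs."""
    return wpp_start_step(rows - 1, cols - 1, delay) + 1
```

   Under these assumptions, a picture of 4 CTU rows by 10 CTU columns
   completes in 16 steps, compared to 40 steps for strictly sequential
   decoding.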

553	   Because in-picture prediction between neighboring CTU rows within a
554	   picture is allowed, the required inter-processor/inter-core
555	   communication to enable in-picture prediction can be substantial.
556	   The WPP partitioning does not result in the creation of more NAL
557	   units compared to when it is not applied, thus WPP cannot be used
558	   for MTU size matching, though slices can be used in combination for
559	   that purpose.

561	   Tiles define horizontal and vertical boundaries that partition a
562	   picture into tile columns and rows.  The scan order of CTUs is
563	   changed to be local within a tile (in the order of a CTU raster scan
564	   of a tile), before decoding the top-left CTU of the next tile in the
565	   order of tile raster scan of a picture.  Similar to slices, tiles
566	   break in-picture prediction dependencies (including entropy decoding
567	   dependencies).  However, they do not need to be included in
568	   individual NAL units (same as WPP in this regard), hence tiles
569	   cannot be used for MTU size matching, though slices can be used in
570	   combination for that purpose.  Each tile can be processed by one
571	   processor/core, and the inter-processor/inter-core communication
572	   required for in-picture prediction between processing units decoding
573	   neighboring tiles is limited to conveying the shared slice header in
574	   cases where a slice spans more than one tile, and loop filtering
575	   related sharing of reconstructed samples and metadata.  As a result,
576	   tiles are less demanding in terms of inter-processor communication
577	   bandwidth compared to WPP due to the in-picture independence between
578	   two neighboring partitions.
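
   Informative example: The change of CTU scan order caused by tiles
   can be sketched in Python as follows.  The function name and its
   inputs (tile column widths and tile row heights, both in units of
   CTUs) are hypothetical and chosen for illustration only; they are
   not syntax elements of [HEVC].

```python
def tile_scan_order(col_widths, row_heights):
    """Return picture-raster CTU addresses in tile scan order.

    col_widths:  widths of the tile columns, in CTUs, left to right.
    row_heights: heights of the tile rows, in CTUs, top to bottom.
    """
    pic_width = sum(col_widths)
    order = []
    y0 = 0
    for height in row_heights:          # tiles in raster scan order
        x0 = 0
        for width in col_widths:
            # CTU raster scan local to the current tile
            for y in range(y0, y0 + height):
                for x in range(x0, x0 + width):
                    order.append(y * pic_width + x)
            x0 += width
        y0 += height
    return order
```

   For a picture of 4x2 CTUs split into two tile columns of width 2,
   tile_scan_order([2, 2], [2]) yields [0, 1, 4, 5, 2, 3, 6, 7]: all
   CTUs of the left tile precede the top-left CTU of the right tile.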

580	1.1.4 NAL Unit Header

582	   HEVC maintains the NAL unit concept of H.264 with modifications.
583	   HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
584	   payload of a NAL unit refers to the NAL unit excluding the NAL unit
585	   header.

587	                     +---------------+---------------+
588	                     |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
589	                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
590	                     |F|   Type    |  LayerId  | TID |
591	                     +-------------+-----------------+

593	              Figure 1 The structure of HEVC NAL unit header

595	   The semantics of the fields in the NAL unit header are as specified
596	   in [HEVC] and described briefly below for convenience.  In addition
597	   to the name and size of each field, the corresponding syntax element
598	   name in [HEVC] is also provided.

600	   F: 1 bit
601	      forbidden_zero_bit.  MUST be zero.  HEVC declares a value of 1 as
602	      a syntax violation.  Note that the inclusion of this bit in the
603	      NAL unit header is to enable transport of HEVC video over MPEG-2
604	      transport systems (avoidance of start code emulations) [MPEG2S].

606	   Type: 6 bits
607	      nal_unit_type.  This field specifies the NAL unit type as defined
608	      in Table 7-1 of [HEVC].  If the most significant bit of this
609	      field of a NAL unit is equal to 0 (i.e. the value of this field
610	      is less than 32), the NAL unit is a VCL NAL unit.  Otherwise, the
611	      NAL unit is a non-VCL NAL unit.  For a reference of all currently
612	      defined NAL unit types and their semantics, please refer to
613	      Section 7.4.1 in [HEVC].

615	   LayerId: 6 bits
616	      nuh_layer_id.  MUST be equal to zero.  It is anticipated that in
617	      future scalable or 3D video coding extensions of this
618	      specification, this syntax element will be used to identify
619	      additional layers that may be present in the coded video
620	      sequence, wherein a layer may be, e.g. a spatial scalable layer,
621	      a quality scalable layer, a texture view, or a depth view.

623	   TID: 3 bits
624	      nuh_temporal_id_plus1.  This field specifies the temporal
625	      identifier of the NAL unit plus 1.  The value of TemporalId is
626	      equal to TID minus 1.  A TID value of 0 is illegal to ensure that
627	      there is at least one bit in the NAL unit header equal to 1, so as
628	      to enable independent considerations of start code emulations in
629	      the NAL unit header and in the NAL unit payload data.
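
   Informative example: The field layout of Figure 1 can be unpacked
   as in the following Python sketch.  The names NalHeader,
   parse_nal_header, is_vcl, and temporal_id are hypothetical helpers
   introduced here for illustration.  The sketch rejects F equal to 1
   and TID equal to 0, but leaves the check of LayerId to the caller.

```python
from collections import namedtuple

NalHeader = namedtuple("NalHeader", "f type layer_id tid")


def parse_nal_header(data):
    """Split the two-byte HEVC NAL unit header into its four fields."""
    if len(data) < 2:
        raise ValueError("NAL unit header needs two bytes")
    hdr = (data[0] << 8) | data[1]
    f = hdr >> 15                  # forbidden_zero_bit, 1 bit
    nal_type = (hdr >> 9) & 0x3F   # nal_unit_type, 6 bits
    layer_id = (hdr >> 3) & 0x3F   # nuh_layer_id, 6 bits
    tid = hdr & 0x07               # nuh_temporal_id_plus1, 3 bits
    if f != 0:
        raise ValueError("forbidden_zero_bit is set")
    if tid == 0:
        raise ValueError("TID value of 0 is illegal")
    return NalHeader(f, nal_type, layer_id, tid)


def is_vcl(header):
    """A Type value below 32 identifies a VCL NAL unit."""
    return header.type < 32


def temporal_id(header):
    """TemporalId is equal to TID minus 1."""
    return header.tid - 1
```

   For instance, the header bytes 0x40 0x01 decode to Type 32,
   LayerId 0, and TID 1, that is, a non-VCL NAL unit with TemporalId
   equal to 0.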

631	1.2. Overview of the Payload Format

633	   This payload format defines the following processes required for
634	   transport of HEVC coded data over RTP [RFC3550]:

636	   o Usage of RTP header with this payload format

638	   o Packetization of HEVC coded NAL units into RTP packets using three
639	     types of payload structures, namely single NAL unit packet,
640	     aggregation packet, and fragmentation unit

642	   o Transmission of HEVC NAL units of the same bitstream within a
643	     single RTP stream or multiple RTP streams within one or more RTP
644	     sessions, where within an RTP stream transmission of NAL units may
645	     be either non-interleaved (i.e. the transmission order of NAL
646	     units is the same as their decoding order) or interleaved (i.e.
648	     the transmission order of NAL units is different from their
649	     decoding order)

651	   o Media type parameters to be used with the Session Description
652	     Protocol (SDP) [RFC4566]

654	   o A payload header extension mechanism and data structures for
655	     enhanced support of temporal scalability based on that extension
656	     mechanism.

658	2. Conventions

660	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
661	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
662	   document are to be interpreted as described in BCP 14, RFC 2119
663	   [RFC2119].

665	   In this document, these key words will appear with that
666	   interpretation only when in ALL CAPS.  Lower case uses of these
667	   words are not to be interpreted as carrying the RFC 2119
668	   significance.

670	   This specification uses the notion of setting and clearing a bit
671	   when bit fields are handled.  Setting a bit is the same as assigning
672	   that bit the value of 1 (On).  Clearing a bit is the same as
673	   assigning that bit the value of 0 (Off).

675	3. Definitions and Abbreviations

677	3.1 Definitions

679	   This document uses the terms and definitions of [HEVC].  Section
680	   3.1.1 lists relevant definitions copied from [HEVC] for convenience.
681	   Section 3.1.2 provides definitions specific to this memo.

683	3.1.1 Definitions from the HEVC Specification

685	   access unit: A set of NAL units that are associated with each other
686	   according to a specified classification rule, are consecutive in
687	   decoding order, and contain exactly one coded picture.

689	   BLA access unit: An access unit in which the coded picture is a BLA
690	   picture.

692	   BLA picture: An IRAP picture for which each VCL NAL unit has
693	   nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

695	   coded video sequence: A sequence of access units that consists, in
696	   decoding order, of an IRAP access unit with NoRaslOutputFlag equal
697	   to 1, followed by zero or more access units that are not IRAP access
698	   units with NoRaslOutputFlag equal to 1, including all subsequent
699	   access units up to but not including any subsequent access unit that
700	   is an IRAP access unit with NoRaslOutputFlag equal to 1.

702	      Informative note: An IRAP access unit may be an IDR access unit,
703	      a BLA access unit, or a CRA access unit.  The value of
704	      NoRaslOutputFlag is equal to 1 for each IDR access unit, each BLA
705	      access unit, and each CRA access unit that is the first access
706	      unit in the bitstream in decoding order, is the first access unit
707	      that follows an end of sequence NAL unit in decoding order, or
708	      has HandleCraAsBlaFlag equal to 1.

710	   CRA access unit: An access unit in which the coded picture is a CRA
711	   picture.

713	   CRA picture: An IRAP picture for which each VCL NAL unit has
714	   nal_unit_type equal to CRA_NUT.

716	   IDR access unit: An access unit in which the coded picture is an IDR
717	   picture.

719	   IDR picture: An IRAP picture for which each VCL NAL unit has
720	   nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

722	   IRAP access unit: An access unit in which the coded picture is an
723	   IRAP picture.

725	   IRAP picture: A coded picture for which each VCL NAL unit has
726	   nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23 (23),
727	   inclusive.

729	   layer: A set of VCL NAL units that all have a particular value of
730	   nuh_layer_id and the associated non-VCL NAL units, or one of a set
731	   of syntactical structures having a hierarchical relationship.

733	   operation point: A bitstream created from another bitstream by
734	   operation of the sub-bitstream extraction process with that other
735	   bitstream, a target highest TemporalId, and a target layer
736	   identifier list as inputs.

738	   random access: The act of starting the decoding process for a
739	   bitstream at a point other than the beginning of the bitstream.

741	   sub-layer: A temporal scalable layer of a temporal scalable
742	   bitstream consisting of VCL NAL units with a particular value of the
743	   TemporalId variable, and the associated non-VCL NAL units.

745	   tile: A rectangular region of coding tree blocks within a particular
746	   tile column and a particular tile row in a picture.

748	   tile column: A rectangular region of coding tree blocks having a
749	   height equal to the height of the picture and a width specified by
750	   syntax elements in the picture parameter set.

752	   tile row: A rectangular region of coding tree blocks having a height
753	   specified by syntax elements in the picture parameter set and a
754	   width equal to the width of the picture.

756	3.1.2 Definitions Specific to This Memo

758	   dependent RTP stream: An RTP stream on which another RTP stream
759	   depends.  All RTP streams in an MST except for the highest RTP
760	   stream are dependent RTP streams.

762	   highest RTP stream: The RTP stream on which no other RTP stream
763	   depends.  The RTP stream in an SST is the highest RTP stream.

765	   media aware network element (MANE): A network element, such as a
766	   middlebox, selective forwarding unit, or application layer gateway
767	   that is capable of parsing certain aspects of the RTP payload
768	   headers or the RTP payload and reacting to their contents.

770	      Informative note: The concept of a MANE goes beyond normal
771	      routers or gateways in that a MANE has to be aware of the
772	      signaling (e.g. to learn about the payload type mappings of the
773	      media streams), and in that it has to be trusted when working
774	      with SRTP.  The advantage of using MANEs is that they allow
775	      packets to be dropped according to the needs of the media coding.
776	      For example, if a MANE has to drop packets due to congestion on a
777	      certain link, it can identify and remove those packets whose
778	      elimination produces the least adverse effect on the user
779	      experience.  After dropping packets, MANEs must rewrite RTCP
780	      packets to match the changes to the RTP stream as specified in
781	      Section 7 of [RFC3550].

783	   multi-stream transmission (MST): Transmission of an HEVC bitstream
784	   using more than one RTP stream.

786	   NAL unit decoding order: A NAL unit order that conforms to the
787	   constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

789	   NAL-unit-like structure: A data structure that is similar to NAL
790	   units in the sense that it also has a NAL unit header and a payload,
791	   with a difference that the payload does not follow the start code
792	   emulation prevention mechanism required for the NAL unit syntax as
793	   specified in Section 7.3.1.1 of [HEVC].  Examples of NAL-unit-like
794	   structures defined in this memo are packet payloads of AP, PACI, and
795	   FU packets.

797	   NALU-time: The value that the RTP timestamp would have if the NAL
798	   unit were transported in its own RTP packet.

800	   packet stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy].  Within
801	   the scope of this memo, one RTP stream is utilized to transport one
802	   or more temporal sub-layers.

804	   single-stream transmission (SST): Transmission of an HEVC bitstream
805	   using only one RTP stream.

807	   transmission order: The order of packets in ascending RTP sequence
808	   number order (in modulo arithmetic).  Within an aggregation packet,
809	   the NAL unit transmission order is the same as the order of
810	   appearance of NAL units in the packet.

812	3.2 Abbreviations

814	   AP       Aggregation Packet

816	   BLA      Broken Link Access

818	   CRA      Clean Random Access

820	   CTB      Coding Tree Block

822	   CTU      Coding Tree Unit

824	   CVS      Coded Video Sequence

826	   FU       Fragmentation Unit

828	   GDR      Gradual Decoding Refresh

830	   HRD      Hypothetical Reference Decoder

832	   IDR      Instantaneous Decoding Refresh

834	   IRAP     Intra Random Access Point

836	   MANE     Media Aware Network Element

838	   MST      Multi-Stream Transmission

840	   MTU      Maximum Transfer Unit

842	   NAL      Network Abstraction Layer

844	   NALU     Network Abstraction Layer Unit

846	   PACI     PAyload Content Information

848	   PHES     Payload Header Extension Structure

850	   PPS      Picture Parameter Set

852	   RADL     Random Access Decodable Leading (Picture)

854	   RASL     Random Access Skipped Leading (Picture)
855	   RPS      Reference Picture Set

857	   SEI      Supplemental Enhancement Information

859	   SPS      Sequence Parameter Set

861	   SST      Single-Stream Transmission

863	   STSA     Step-wise Temporal Sub-layer Access

865	   TSA      Temporal Sub-layer Access

867	   TCSI     Temporal Scalability Control Information

869	   VCL      Video Coding Layer

871	   VPS      Video Parameter Set

873	4. RTP Payload Format

875	4.1 RTP Header Usage

877	   The format of the RTP header is specified in [RFC3550] and reprinted
878	   in Figure 2 for convenience.  This payload format uses the fields of
879	   the header in a manner consistent with that specification.

881	   The RTP payload (and the settings for some RTP header bits) for
882	   aggregation packets and fragmentation units are specified in
883	   Sections 4.7 and 4.8, respectively.

885	    0                   1                   2                   3
886	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
887	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
888	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
889	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
890	   |                           timestamp                           |
891	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
892	   |           synchronization source (SSRC) identifier            |
893	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
894	   |            contributing source (CSRC) identifiers             |
895	   |                             ....                              |
896	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

898	                Figure 2 RTP header according to [RFC3550]

900	   The RTP header information to be set according to this RTP payload
901	   format is set as follows:

903	   Marker bit (M): 1 bit

905	      Set for the last packet, carried in the current RTP stream, of
906	      the access unit, in line with the normal use of the M bit in
907	      video formats, to allow efficient playout buffer handling.
908	      When MST is in use, if an access unit appears in multiple RTP
909	      streams, the marker bit is set on each RTP stream's last packet
910	      of the access unit.

912	         Informative note: The content of a NAL unit does not tell
913	         whether or not the NAL unit is the last NAL unit, in decoding
914	         order, of an access unit.  An RTP sender implementation may
915	         obtain this information from the video encoder.  If, however,
916	         the implementation cannot obtain this information directly
917	         from the encoder, e.g. when the bitstream was pre-encoded, and
918	         also there is no timestamp allocated for each NAL unit, then
919	         the sender implementation can inspect subsequent NAL units in
920	         decoding order to determine whether or not the NAL unit is the
921	         last NAL unit of an access unit as follows.  A NAL unit naluX
922	         is the last NAL unit of an access unit if it is the last NAL
923	         unit of the bitstream or the next VCL NAL unit naluY in
924	         decoding order has the high-order bit of the first byte after
925	         its NAL unit header equal to 1, and all NAL units between
926	         naluX and naluY, when present, have nal_unit_type in the range
927	         of 32 to 35, inclusive, equal to 39, or in the ranges of 41 to
928	         44, inclusive, or 48 to 55, inclusive.
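
         Informative example: The inspection rule of the preceding
         note can be sketched in Python as follows.  The sketch
         assumes that each NAL unit is available as a byte string
         starting with its two-byte NAL unit header; the function
         names are hypothetical.  The note does not cover a NAL unit
         that is followed only by non-VCL NAL units, and the sketch
         reports False in that case.

```python
def nal_type(nalu):
    """Extract the 6-bit Type field from the NAL unit header."""
    return (nalu[0] >> 1) & 0x3F


def _allowed_between(t):
    """NAL unit types permitted between naluX and naluY."""
    return 32 <= t <= 35 or t == 39 or 41 <= t <= 44 or 48 <= t <= 55


def is_last_nalu_of_access_unit(nalus, i):
    """Apply the rule of the informative note to nalus[i].

    nalus: NAL units of the bitstream in decoding order.
    """
    if i == len(nalus) - 1:
        return True                      # last NAL unit of the bitstream
    for nalu in nalus[i + 1:]:
        t = nal_type(nalu)
        if t < 32:                       # naluY, the next VCL NAL unit
            # high-order bit of the first byte after the NAL unit header
            return len(nalu) > 2 and bool(nalu[2] & 0x80)
        if not _allowed_between(t):
            return False
    return False                         # no VCL NAL unit follows
```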

930	   Payload type (PT): 7 bits

932	      The assignment of an RTP payload type for this new packet format
933	      is outside the scope of this document and will not be specified
934	      here.  The assignment of a payload type has to be performed
935	      either through the profile used or in a dynamic way.

937	         Informative note: It is not required to use different payload
938	         type values for different RTP streams in MST.

940	   Sequence number (SN): 16 bits

942	      Set and used in accordance with RFC 3550.

944	   Timestamp: 32 bits

946	      The RTP timestamp is set to the sampling timestamp of the
947	      content.  A 90 kHz clock rate MUST be used.

949	      If the NAL unit has no timing properties of its own (e.g.
950	      parameter set and SEI NAL units), the RTP timestamp MUST be set
951	      to the RTP timestamp of the coded picture of the access unit in
952	      which the NAL unit (according to Section 7.4.2.4.4 of [HEVC]) is
953	      included.

955	      Receivers MUST use the RTP timestamp for the display process,
956	      even when the bitstream contains picture timing SEI messages or
957	      decoding unit information SEI messages as specified in [HEVC].
958	      However, this does not mean that picture timing SEI messages in
959	      the bitstream should be discarded, as picture timing SEI messages
960	      may contain frame-field information that is important in
961	      appropriately rendering interlaced video.
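
         Informative example: With the mandated 90 kHz clock, the
         RTP timestamp of a picture can be derived from its sampling
         instant as in the following Python sketch.  The sketch
         assumes a constant frame rate and identifies pictures by a
         frame index in output order; both the function name and its
         parameters are hypothetical.  The initial offset stands for
         the initial timestamp value chosen by the sender as described
         in [RFC3550].

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90000  # clock rate required by this payload format


def rtp_timestamp(frame_index, frame_rate, initial_offset=0):
    """RTP timestamp of the picture sampled at frame_index / frame_rate.

    frame_rate may be an int or a Fraction (e.g. Fraction(30000, 1001)).
    The result wraps modulo 2**32, the size of the timestamp field.
    """
    ticks = Fraction(frame_index) * RTP_CLOCK_HZ / Fraction(frame_rate)
    return (initial_offset + round(ticks)) % 2**32
```

         At 25 pictures per second consecutive pictures are 3600
         ticks apart; at 30000/1001 pictures per second they are 3003
         ticks apart.  Parameter set and SEI NAL units reuse the
         timestamp of the coded picture of their access unit.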

963	   Synchronization source (SSRC): 32 bits

965	      Used to identify the source of the RTP packets.  In SST, by
966	      definition a single SSRC is used for all parts of a single
967	      bitstream.  In MST, each SSRC is used for an RTP stream
968	      containing a subset of the sub-layers for a single (temporally
969	      scalable) bitstream.  A receiver is required to correctly
970	      associate the set of SSRCs that carry parts of the same
971	      bitstream.

973	         Informative note: The term "bitstream" in this document is
974	         equivalent to the term "encoded stream" in [I-D.ietf-avtext-
975	         rtp-grouping-taxonomy].

977	4.2 Payload Header Usage

979	   The TID value indicates (among other things) the relative importance
980	   of an RTP packet, for example because NAL units belonging to higher
981	   temporal sub-layers are not used for the decoding of lower temporal
982	   sub-layers.  A lower value of TID indicates a higher importance.
983	   More important NAL units MAY be better protected against
984	   transmission losses than less important NAL units.

986	4.3 Payload Structures

988	   The first two bytes of the payload of an RTP packet are referred to
989	   as the payload header.  The payload header consists of the same
990	   fields (F, Type, LayerId, and TID) as the NAL unit header as shown
991	   in section 1.1.4, irrespective of the type of the payload structure.

993	   Four different types of RTP packet payload structures are specified.
994	   A receiver can identify the type of an RTP packet payload through
995	   the Type field in the payload header.

997	   The four different payload structures are as follows:

999	   o  Single NAL unit packet: Contains a single NAL unit in the
1000	      payload, and the NAL unit header of the NAL unit also serves as
1001	      the payload header.  This payload structure is specified in
1002	      section 4.6.

1004	   o  Aggregation packet (AP): Contains more than one NAL unit within
1005	      one access unit.  This payload structure is specified in
1006	      section 4.7.

1008	   o  Fragmentation unit (FU): Contains a subset of a single NAL unit.
1009	      This payload structure is specified in section 4.8.

1011	   o  PACI carrying RTP packet: Contains a payload header (that differs
1012	      from other payload headers for efficiency), a Payload Header
1013	      Extension Structure (PHES), and a PACI payload.  This payload
1014	      structure is specified in section 4.9.
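
   Informative example: A receiver could dispatch on the Type field of
   the payload header as in the following Python sketch.  The Type
   values 48, 49, and 50 used for AP, FU, and PACI are an assumption
   of this sketch; they are not stated in this section and need to be
   checked against the sections that specify each payload structure.
   The function name is hypothetical.

```python
# Assumed Type values; verify against sections 4.7, 4.8, and 4.9.
AP_TYPE = 48
FU_TYPE = 49
PACI_TYPE = 50


def payload_structure(payload):
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload header needs two bytes")
    nal_type = (payload[0] >> 1) & 0x3F
    if nal_type == AP_TYPE:
        return "AP"
    if nal_type == FU_TYPE:
        return "FU"
    if nal_type == PACI_TYPE:
        return "PACI"
    # Any other value is treated here as a single NAL unit packet.
    return "single NAL unit"
```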

1016	4.4 Transmission Modes

1018	   This memo enables transmission of an HEVC bitstream over a single
1019	   RTP stream or multiple RTP streams.  The concept and working
1020	   principle are inherited from the design of what was called single and
1021	   multiple session transmission in [RFC6190] and follow a similar
1022	   design.  If only one RTP stream is used for transmission of the HEVC
1023	   bitstream, the transmission mode is referred to as single-stream
1024	   transmission (SST); otherwise (more than one RTP stream is used for
1025	   transmission of the HEVC bitstream), the transmission mode is
1026	   referred to as multi-stream transmission (MST).

1028	   Dependency of one RTP stream on another RTP stream is typically
1029	   indicated as specified in [RFC5583].  When an RTP stream A depends
1030	   on another RTP stream B, the RTP stream B is referred to as a
1031	   dependent RTP stream of the RTP stream A.

1033	      Informative note: An MST may involve one or more RTP sessions.
1034	      For example, each RTP stream in an MST may be in its own RTP
1035	      session.  For another example, a set of multiple RTP streams in
1036	      an MST may belong to the same RTP session, e.g. as indicated by
1037	      the mechanism specified in [I-D.ietf-avtcore-rtp-multi-stream] or
1038	      [I-D.ietf-mmusic-sdp-bundle-negotiation].

1040	   SST SHOULD be used for point-to-point unicast scenarios, while MST
1041	   SHOULD be used for point-to-multipoint multicast scenarios where
1042	   different receivers require different operation points of the same
1043	   HEVC bitstream, to improve bandwidth utilization efficiency.

1045	      Informative note: A multicast may degrade to a unicast after all
1046	      but one receiver have left (this is a justification of the first
1047	      "SHOULD" instead of "MUST"), and there might be scenarios where
1048	      MST is desirable but not possible, e.g. when IP multicast is not
1049	      deployed in a certain network (this is a justification of the
1050	      second "SHOULD" instead of "MUST").

1052	   The transmission mode is indicated by the tx-mode media parameter
1053	   (see section 7.1).  If tx-mode is equal to "SST", SST MUST be used.
1054	   Otherwise (tx-mode is equal to "MST"), MST MUST be used.

1056	   Receivers MUST support both SST and MST.

1058	4.5 Decoding Order Number

1060	   For each NAL unit, the variable AbsDon is derived, representing the
1061	   decoding order number that is indicative of the NAL unit decoding
1062	   order.

1064	   Let NAL unit n be the n-th NAL unit in transmission order within an
1065	   RTP stream.

1067	   If tx-mode is equal to "SST" and sprop-max-don-diff is equal to 0,
1068	   AbsDon[n], the value of AbsDon for NAL unit n, is derived as equal
1069	   to n.

1071	   Otherwise (tx-mode is equal to "MST" or sprop-max-don-diff is
1072	   greater than 0), AbsDon[n] is derived as follows, where DON[n] is
1073	   the value of the variable DON for NAL unit n:

1075	   o  If n is equal to 0 (i.e. NAL unit n is the very first NAL unit in
1076	      transmission order), AbsDon[0] is set equal to DON[0].

1078	   o  Otherwise (n is greater than 0), the following applies for
1079	      derivation of AbsDon[n]:

1081	            If DON[n] == DON[n-1],
1082	                AbsDon[n] = AbsDon[n-1]

1084	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1085	                AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1087	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1088	                AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1090	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1091	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

1093	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1094	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
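
   Informative example: The derivation above, for the case where
   tx-mode is equal to "MST" or sprop-max-don-diff is greater than 0,
   can be written in Python as follows.  The function name is
   hypothetical, and the input is assumed to be the list of 16-bit DON
   values of the NAL units in transmission order.

```python
def abs_don_sequence(dons):
    """Derive AbsDon[n] for each DON[n], given in transmission order."""
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)          # AbsDon[0] = DON[0]
            continue
        prev_don = dons[n - 1]
        prev_abs = abs_dons[n - 1]
        if don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            cur = prev_abs + don - prev_don                 # forward step
        elif don < prev_don and prev_don - don >= 32768:
            cur = prev_abs + 65536 - prev_don + don         # forward, wrapped
        elif don > prev_don and don - prev_don >= 32768:
            cur = prev_abs - (prev_don + 65536 - don)       # backward, wrapped
        else:  # don < prev_don and prev_don - don < 32768
            cur = prev_abs - (prev_don - don)               # backward step
        abs_dons.append(cur)
    return abs_dons
```

   For example, the DON values 65534, 65535, 0, 2, 1 yield the AbsDon
   values 65534, 65535, 65536, 65538, 65537, so the wrap of the 16-bit
   DON does not disturb the ordering.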

1096	   For any two NAL units m and n, the following applies:

1098	   o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
1099	      follows NAL unit m in NAL unit decoding order.

1101	   o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding order
1102	      of the two NAL units can be in either order.

1104	   o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n precedes
1105	      NAL unit m in decoding order.

1107	   When two consecutive NAL units in the NAL unit decoding order have
1108	   different values of AbsDon, the value of AbsDon for the second NAL
1109	   unit in decoding order MUST be greater than the value of AbsDon for
1110	   the first NAL unit, and the absolute difference between the two
1111	   AbsDon values MAY be greater than or equal to 1.

1113	      Informative note: There are multiple reasons to allow for the
1114	      absolute difference of the values of AbsDon for two consecutive
1115	      NAL units in the NAL unit decoding order to be greater than one.
1116	      An increment by one is not required, as at the time of
1117	      associating values of AbsDon to NAL units, it may not be known
1118	      whether all NAL units are to be delivered to the receiver.  For
1119	      example, a gateway may not forward VCL NAL units of higher sub-
1120	      layers or some SEI NAL units when there is congestion in the
1121	      network.  In another example, the first intra-coded picture of a
1122	      pre-encoded clip is transmitted in advance to ensure that it is
1123	      readily available in the receiver, and when transmitting the
1124	      first intra-coded picture, the originator does not exactly know
1125	      how many NAL units will be encoded before the first intra-coded
1126	      picture of the pre-encoded clip follows in decoding order.  Thus,
1127	      the values of AbsDon for the NAL units of the first intra-coded
1128	      picture of the pre-encoded clip have to be estimated when they
1129	      are transmitted, and gaps in values of AbsDon may occur.  Another
1130	      example is MST where the AbsDon values must indicate cross-layer
1131	      decoding order for NAL units conveyed in all the RTP streams.

1133	4.6 Single NAL Unit Packets

1135	   A single NAL unit packet contains exactly one NAL unit, and consists
1136	   of a payload header (denoted as PayloadHdr), a conditional 16-bit
1137	   DONL field (in network byte order), and the NAL unit payload data
1138	   (the NAL unit excluding its NAL unit header) of the contained NAL
1139	   unit, as shown in Figure 3.

1141	    0                   1                   2                   3
1142	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1143	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1144	   |           PayloadHdr          |      DONL (conditional)       |
1145	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1146	   |                                                               |
1147	   |                  NAL unit payload data                        |
1148	   |                                                               |
1149	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1150	   |                               :...OPTIONAL RTP padding        |
1151	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1153	              Figure 3 The structure of a single NAL unit packet

1155	   The payload header SHOULD be an exact copy of the NAL unit header of
1156	   the contained NAL unit.  However, the Type (i.e. nal_unit_type)
1157	   field MAY be changed, e.g. when it is desirable to handle a CRA
1158	   picture as a BLA picture [JCTVC-J0107].

1160	   The DONL field, when present, specifies the value of the 16 least
1161	   significant bits of the decoding order number of the contained NAL
1162	   unit.  If tx-mode is equal to "MST" or sprop-max-don-diff is greater
1163	   than 0, the DONL field MUST be present, and the variable DON for the
1164	   contained NAL unit is derived as equal to the value of the DONL
1165	   field.  Otherwise (tx-mode is equal to "SST" and sprop-max-don-diff
1166	   is equal to 0), the DONL field MUST NOT be present.
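
   A receiver-side sketch of the layout in Figure 3 follows.  It is a
   minimal illustration, not a reference implementation: the function
   name and the donl_present argument are hypothetical, the caller is
   assumed to have derived donl_present from tx-mode and
   sprop-max-don-diff, and RTP padding is assumed to have been
   removed already.

```python
import struct


def parse_single_nal_unit_packet(payload, donl_present):
    """Split the RTP payload of a single NAL unit packet.

    Returns (nal_unit, don).  nal_unit is the payload header followed by
    the NAL unit payload data; don is None when no DONL field is present.
    """
    if len(payload) < 2:
        raise ValueError("payload shorter than PayloadHdr")
    payload_hdr = payload[:2]
    offset = 2
    don = None
    if donl_present:
        if len(payload) < 4:
            raise ValueError("payload too short for DONL")
        (don,) = struct.unpack_from("!H", payload, offset)  # network order
        offset += 2
    # The payload header normally equals the NAL unit header, so putting
    # it back in front of the payload data rebuilds the NAL unit.
    return payload_hdr + payload[offset:], don
```

   Because the sender may have changed the Type field, the rebuilt
   header is whatever the payload header carried, which is not
   necessarily the header the encoder produced.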

1168	4.7 Aggregation Packets (APs)

1170	   Aggregation packets (APs) are introduced to enable the reduction of
1171	   packetization overhead for small NAL units, such as most of the non-
1172	   VCL NAL units, which are often only a few octets in size.

1174	   An AP aggregates NAL units within one access unit.  Each NAL unit to
1175	   be carried in an AP is encapsulated in an aggregation unit.  NAL
1176	   units aggregated in one AP are in NAL unit decoding order.

1178	   An AP consists of a payload header (denoted as PayloadHdr) followed
1179	   by two or more aggregation units, as shown in Figure 4.

1181	    0                   1                   2                   3
1182	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1183	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1184	   |    PayloadHdr (Type=48)       |                               |
1185	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1186	   |                                                               |
1187	   |             two or more aggregation units                     |
1188	   |                                                               |
1189	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1190	   |                               :...OPTIONAL RTP padding        |
1191	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1193	              Figure 4 The structure of an aggregation packet

1195	   The fields in the payload header are set as follows.  The F bit MUST
1196	   be equal to 0 if the F bit of each aggregated NAL unit is equal to
1197	   zero; otherwise, it MUST be equal to 1.  The Type field MUST be
1198	   equal to 48.  The value of LayerId MUST be equal to the lowest value
1199	   of LayerId of all the aggregated NAL units.  The value of TID MUST
1200	   be the lowest value of TID of all the aggregated NAL units.

1202	      Informative Note: All VCL NAL units in an AP have the same TID
1203	      value since they belong to the same access unit.  However, an AP
1204	      may contain non-VCL NAL units for which the TID value in the NAL
1205	      unit header may be different than the TID value of the VCL NAL
1206	      units in the same AP.
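
   The payload header rules for an AP can be sketched as follows.
   The helper names are hypothetical.  The sketch assumes the 16-bit
   header layout of F (1 bit), Type (6 bits), LayerId (6 bits), and
   TID (3 bits), and that each NAL unit is passed in complete,
   including its two-octet header.

```python
def nal_header_fields(hdr):
    """Decode the two-octet NAL unit header into its four fields."""
    word = (hdr[0] << 8) | hdr[1]
    return {
        "F": word >> 15,
        "Type": (word >> 9) & 0x3F,
        "LayerId": (word >> 3) & 0x3F,
        "TID": word & 0x07,
    }


def ap_payload_header(nal_units):
    """Build the PayloadHdr (Type=48) for an AP carrying nal_units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    fields = [nal_header_fields(nal[:2]) for nal in nal_units]
    f_bit = 1 if any(f["F"] for f in fields) else 0   # 0 only if all are 0
    layer_id = min(f["LayerId"] for f in fields)      # lowest LayerId
    tid = min(f["TID"] for f in fields)               # lowest TID
    word = (f_bit << 15) | (48 << 9) | (layer_id << 3) | tid
    return word.to_bytes(2, "big")
```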

1208	   An AP MUST carry at least two aggregation units and can carry as
1209	   many aggregation units as necessary; however, the total amount of
1210	   data in an AP obviously MUST fit into an IP packet, and the size
1211	   SHOULD be chosen so that the resulting IP packet is smaller than the
1212	   MTU size so as to avoid IP layer fragmentation.  An AP MUST NOT
1213	   contain Fragmentation Units (FUs) specified in section 4.8.  APs
1214	   MUST NOT be nested; i.e. an AP MUST NOT contain another AP.

1216	   The first aggregation unit in an AP consists of a conditional 16-bit
1217	   DONL field (in network byte order) followed by a 16-bit unsigned
1218	   size information (in network byte order) that indicates the size of
1219	   the NAL unit in bytes (excluding these two octets, but including the
1220	   NAL unit header), followed by the NAL unit itself, including its NAL
1221	   unit header, as shown in Figure 5.

1223	    0                   1                   2                   3
1224	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1225	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1226	                   :       DONL (conditional)      |   NALU size   |
1227	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1228	   |   NALU size   |                                               |
1229	   +-+-+-+-+-+-+-+-+         NAL unit                              |
1230	   |                                                               |
1231	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1232	   |                               :
1233	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1235	       Figure 5 The structure of the first aggregation unit in an AP

1237	   The DONL field, when present, specifies the value of the 16 least
1238	   significant bits of the decoding order number of the aggregated NAL
1239	   unit.

1241	   If tx-mode is equal to "MST" or sprop-max-don-diff is greater than
1242	   0, the DONL field MUST be present in an aggregation unit that is the
1243	   first aggregation unit in an AP, and the variable DON for the
1244	   aggregated NAL unit is derived as equal to the value of the DONL
1245	   field.  Otherwise (tx-mode is equal to "SST" and sprop-max-don-diff
1246	   is equal to 0), the DONL field MUST NOT be present in an aggregation
1247	   unit that is the first aggregation unit in an AP.

1249	   An aggregation unit that is not the first aggregation unit in an AP
1250	   consists of a conditional 8-bit DOND field followed by a 16-bit
1251	   unsigned size information (in network byte order) that indicates the
1252	   size of the NAL unit in bytes (excluding these two octets, but
1253	   including the NAL unit header), followed by the NAL unit itself,
1254	   including its NAL unit header, as shown in Figure 6.

1256	    0                   1                   2                   3
1257	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1258	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1259	                   : DOND (cond)   |          NALU size            |
1260	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1261	   |                                                               |
1262	   |                       NAL unit                                |
1263	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1264	   |                               :
1265	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1267	    Figure 6 The structure of an aggregation unit that is not the first
1268	                         aggregation unit in an AP

1270	   When present, the DOND field plus 1 specifies the difference between
1271	   the decoding order number values of the current aggregated NAL unit
1272	   and the preceding aggregated NAL unit in the same AP.

1274	   If tx-mode is equal to "MST" or sprop-max-don-diff is greater than
1275	   0, the DOND field MUST be present in an aggregation unit that is not
1276	   the first aggregation unit in an AP, and the variable DON for the
1277	   aggregated NAL unit is derived as equal to the DON of the preceding
1278	   aggregated NAL unit in the same AP plus the value of the DOND field
1279	   plus 1 modulo 65536.  Otherwise (tx-mode is equal to "SST" and
1280	   sprop-max-don-diff is equal to 0), the DOND field MUST NOT be
1281	   present in an aggregation unit that is not the first aggregation
1282	   unit in an AP, and in this case the transmission order and decoding
1283	   order of NAL units carried in the AP are the same as the order the
1284	   NAL units appear in the AP.
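
   The aggregation unit layouts of Figures 5 and 6 can be walked with
   a loop such as the one below.  It is a sketch under stated
   assumptions: the function name and the don_fields_present argument
   are hypothetical, the caller is assumed to have derived that flag
   from tx-mode and sprop-max-don-diff, and RTP padding is assumed to
   have been removed.

```python
import struct


def parse_ap(payload, don_fields_present):
    """Return [(nal_unit, don), ...] for the aggregation units in an AP.

    don is None for every unit when DONL/DOND are not present.
    """
    offset = 2  # skip PayloadHdr (Type=48)
    units = []
    don = None
    while offset < len(payload):
        if don_fields_present:
            if not units:
                # First aggregation unit: 16-bit DONL.
                (don,) = struct.unpack_from("!H", payload, offset)
                offset += 2
            else:
                # Later units: 8-bit DOND; DON advances by DOND + 1.
                don = (don + payload[offset] + 1) % 65536
                offset += 1
        (size,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        nal_unit = payload[offset:offset + size]
        if len(nal_unit) != size:
            raise ValueError("truncated aggregation unit")
        units.append((nal_unit, don))
        offset += size
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```

   When the DON fields are absent, the order of the returned list is
   both the transmission order and the decoding order.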

1286	   Figure 7 presents an example of an AP that contains two aggregation
1287	   units, labeled as 1 and 2 in the figure, without the DONL and DOND
1288	   fields being present.

1290	    0                   1                   2                   3
1291	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1292	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1293	   |                          RTP Header                           |
1294	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1295	   |   PayloadHdr (Type=48)        |         NALU 1 Size           |
1296	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1297	   |          NALU 1 HDR           |                               |
1298	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1299	   |                   . . .                                       |
1300	   |                                                               |
1301	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1302	   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1303	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1304	   | NALU 2 HDR    |                                               |
1305	   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1306	   |                   . . .                                       |
1307	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1308	   |                               :...OPTIONAL RTP padding        |
1309	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1311	   Figure 7 An example of an AP packet containing two aggregation units
1312	                     without the DONL and DOND fields

1314	   Figure 8 presents an example of an AP that contains two aggregation
1315	   units, labeled as 1 and 2 in the figure, with the DONL and DOND
1316	   fields being present.

1318	    0                   1                   2                   3
1319	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1320	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1321	   |                          RTP Header                           |
1322	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1323	   |   PayloadHdr (Type=48)        |        NALU 1 DONL            |
1324	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1325	   |          NALU 1 Size          |            NALU 1 HDR         |
1326	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1327	   |                                                               |
1328	   |                 NALU 1 Data   . . .                           |
1329	   |                                                               |
1330	   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1331	   |               |  NALU 2 DOND  |          NALU 2 Size          |
1332	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1333	   |          NALU 2 HDR           |                               |
1334	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1335	   |                                                               |
1336	   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1337	   |                               :...OPTIONAL RTP padding        |
1338	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1340	    Figure 8 An example of an AP containing two aggregation units with
1341	                         the DONL and DOND fields

1343	4.8 Fragmentation Units (FUs)

1345	   Fragmentation units (FUs) are introduced to enable fragmenting a
1346	   single NAL unit into multiple RTP packets, possibly without
1347	   cooperation or knowledge of the HEVC encoder.  A fragment of a NAL
1348	   unit consists of an integer number of consecutive octets of that NAL
1349	   unit.  Fragments of the same NAL unit MUST be sent in consecutive
1350	   order with ascending RTP sequence numbers (with no other RTP packets
1351	   within the same RTP stream being sent between the first and last
1352	   fragment).

1354	   When a NAL unit is fragmented and conveyed within FUs, it is
1355	   referred to as a fragmented NAL unit.  APs MUST NOT be fragmented.
1356	   FUs MUST NOT be nested; i.e. an FU MUST NOT contain a subset of
1357	   another FU.

1359	   The RTP timestamp of an RTP packet carrying an FU is set to the
1360	   NALU-time of the fragmented NAL unit.

1362	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1363	   header of one octet, a conditional 16-bit DONL field (in network
1364	   byte order), and an FU payload, as shown in Figure 9.

1366	    0                   1                   2                   3
1367	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1368	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1369	   |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
1370	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1371	   | DONL (cond)   |                                               |
1372	   +-+-+-+-+-+-+-+-+                                               |
1373	   |                         FU payload                            |
1374	   |                                                               |
1375	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1376	   |                               :...OPTIONAL RTP padding        |
1377	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1379	                      Figure 9 The structure of an FU

1381	   The fields in the payload header are set as follows.  The Type field
1382	   MUST be equal to 49.  The fields F, LayerId, and TID MUST be equal
1383	   to the fields F, LayerId, and TID, respectively, of the fragmented
1384	   NAL unit.

1386	   The FU header consists of an S bit, an E bit, and a 6-bit FuType
1387	   field, as shown in Figure 10.

1389	                            +---------------+
1390	                            |0|1|2|3|4|5|6|7|
1391	                            +-+-+-+-+-+-+-+-+
1392	                            |S|E|  FuType   |
1393	                            +---------------+

1395	                  Figure 10   The structure of FU header

1397	   The semantics of the FU header fields are as follows:
1398	   S: 1 bit
1399	      When set to one, the S bit indicates the start of a fragmented
1400	      NAL unit, i.e. the first byte of the FU payload is also the first
1401	      byte of the payload of the fragmented NAL unit.  When the FU
1402	      payload is not the start of the fragmented NAL unit payload, the
1403	      S bit MUST be set to zero.

1405	   E: 1 bit
1406	      When set to one, the E bit indicates the end of a fragmented NAL
1407	      unit, i.e. the last byte of the payload is also the last byte of
1408	      the fragmented NAL unit.  When the FU payload is not the last
1409	      fragment of a fragmented NAL unit, the E bit MUST be set to zero.

1411	   FuType: 6 bits
1412	      The field FuType MUST be equal to the field Type of the
1413	      fragmented NAL unit.

1415	   The DONL field, when present, specifies the value of the 16 least
1416	   significant bits of the decoding order number of the fragmented NAL
1417	   unit.

1419	   If tx-mode is equal to "MST" or sprop-max-don-diff is greater than
1420	   0, and the S bit is equal to 1, the DONL field MUST be present in
1421	   the FU, and the variable DON for the fragmented NAL unit is derived
1422	   as equal to the value of the DONL field.  Otherwise (tx-mode is
1423	   equal to "SST" and sprop-max-don-diff is equal to 0, or the S bit is
1424	   equal to 0), the DONL field MUST NOT be present in the FU.

1426	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
1427	   the Start bit and End bit MUST NOT both be set to one in the same FU
1428	   header.

1430	   The FU payload consists of fragments of the payload of the
1431	   fragmented NAL unit so that if the FU payloads of consecutive FUs,
1432	   starting with an FU with the S bit equal to 1 and ending with an FU
1433	   with the E bit equal to 1, are sequentially concatenated, the
1434	   payload of the fragmented NAL unit can be reconstructed.  The NAL
1435	   unit header of the fragmented NAL unit is not included as such in
1436	   the FU payload, but rather the information of the NAL unit header of
1437	   the fragmented NAL unit is conveyed in F, LayerId, and TID fields of
1438	   the FU payload headers of the FUs and the FuType field of the FU
1439	   header of the FUs.  An FU payload MUST NOT be empty.
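
   Fragmentation and reassembly as described above can be sketched as
   follows.  The function names and the max_fu_payload argument are
   hypothetical.  The sketch assumes the F/Type/LayerId/TID header
   layout, treats max_fu_payload as the number of NAL unit payload
   octets per FU (a DONL field makes the first FU two octets larger),
   and assumes the packets are given in order with none missing.

```python
TYPE_MASK = 0x3F << 9  # position of the Type field in the 16-bit header


def fragment_nal_unit(nal_unit, max_fu_payload, don=None):
    """Split one NAL unit into FU payloads (PayloadHdr + FU header + data)."""
    hdr = (nal_unit[0] << 8) | nal_unit[1]
    nal_type = (hdr >> 9) & 0x3F
    # Keep F, LayerId and TID; replace Type with 49.
    payload_hdr = ((hdr & ~TYPE_MASK) | (49 << 9)).to_bytes(2, "big")
    data = nal_unit[2:]  # the NAL unit header itself is not carried
    if max_fu_payload < 1 or len(data) <= max_fu_payload:
        raise ValueError("need at least two non-empty fragments")
    chunks = [data[i:i + max_fu_payload]
              for i in range(0, len(data), max_fu_payload)]
    packets = []
    for i, chunk in enumerate(chunks):
        s_bit = 1 if i == 0 else 0
        e_bit = 1 if i == len(chunks) - 1 else 0
        fu_header = bytes([(s_bit << 7) | (e_bit << 6) | nal_type])
        # DONL is carried only in the FU with the S bit set.
        donl = don.to_bytes(2, "big") if (don is not None and s_bit) else b""
        packets.append(payload_hdr + fu_header + donl + chunk)
    return packets


def reassemble_fus(packets, donl_present=False):
    """Rebuild the NAL unit from a complete, ordered list of FU payloads."""
    first, last = packets[0], packets[-1]
    if not (first[2] & 0x80) or not (last[2] & 0x40):
        raise ValueError("missing start or end fragment")
    hdr = (first[0] << 8) | first[1]
    fu_type = first[2] & 0x3F
    nal_hdr = ((hdr & ~TYPE_MASK) | (fu_type << 9)).to_bytes(2, "big")
    data = b""
    for i, pkt in enumerate(packets):
        start = 3 + (2 if (donl_present and i == 0) else 0)
        data += pkt[start:]
    return nal_hdr + data
```

   Requiring at least two fragments keeps the S and E bits from both
   being set in the same FU header.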

1441	   If an FU is lost, the receiver SHOULD discard all following
1442	   fragmentation units in transmission order corresponding to the same
1443	   fragmented NAL unit, unless the decoder in the receiver is known to
1444	   be prepared to gracefully handle incomplete NAL units.

1446	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1447	   fragments of a NAL unit to an (incomplete) NAL unit, even if
1448	   fragment n of that NAL unit is not received.  In this case, the
1449	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1450	   syntax violation.

1452	4.9 PACI packets

1454	   This section specifies the PACI packet structure.  The basic payload
1455	   header specified in this memo is intentionally limited to the 16
1456	   bits of the NAL unit header so as to keep the packetization overhead
1457	   to a minimum.  However, cases have been identified where it is
1458	   advisable to include control information in an easily accessible
1459	   position in the packet header, despite the additional overhead.  One
1460	   such control information is the Temporal Scalability Control
1461	   Information as specified in section 4.10 below.  PACI packets carry
1462	   this and future, similar structures.

1464	   The PACI packet structure is based on a payload header extension
1465	   mechanism that is generic and extensible to carry payload header
1466	   extensions.  In this section, the focus lies on the use within this
1467	   specification.  Section 4.9.2 below provides guidance for the
1468	   specification designers in how to employ the extension mechanism in
1469	   future specifications.

1471	   A PACI packet consists of a payload header (denoted as PayloadHdr),
1472	   for which the structure follows what is described in section 4.3
1473	   above.  The payload header is followed by the fields A, cType,
1474	   PHSsize, F[0..2] and Y.

1476	   Figure 11 shows a PACI packet in compliance with this memo; that is,
1477	   without any extensions.

1479	       0                   1                   2                   3
1480	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1481	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1482	      |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1483	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1484	      |        Payload Header Extension Structure (PHES)              |
1485	      |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
1486	      |                                                               |
1487	      |                  PACI payload: NAL unit                       |
1488	      |                   . . .                                       |
1489	      |                                                               |
1490	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1491	      |                               :...OPTIONAL RTP padding        |
1492	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

1494	                    Figure 11   The structure of a PACI

1496	   The fields in the payload header are set as follows.  The F bit MUST
1497	   be equal to 0.  The Type field MUST be equal to 50.  The value of
1498	   LayerId MUST be a copy of the LayerId field of the PACI payload NAL
1499	   unit or NAL-unit-like structure.  The value of TID MUST be a copy of
1500	   the TID field of the PACI payload NAL unit or NAL-unit-like
1501	   structure.

1503	   The semantics of other fields are as follows:

1505	   A: 1 bit
1506	      Copy of the F bit of the PACI payload NAL unit or NAL-unit-like
1507	      structure.

1509	   cType: 6 bits
1510	      Copy of the Type field of the PACI payload NAL unit or NAL-unit-
1511	      like structure.

1513	   PHSsize: 5 bits
1514	      Indicates the total length of the fields F[0..2], Y, and PHES.
1515	      The value is limited to be less than or equal to 32 octets, to
1516	      simplify encoder design for MTU size matching.

1518	   F0
1519	      This field equal to 1 specifies the presence of a temporal
1520	      scalability support extension in the PHES.

1522	   F1, F2
1523	      MUST be 0, available for future extensions, see section 4.9.2.

1525	   Y: 1 bit
1526	      MUST be 0, available for future extensions, see section 4.9.2.

1528	   PHES: variable number of octets
1529	      A variable number of octets as indicated by the value of PHSsize.

1531	   PACI Payload
1532	      The NAL unit or NAL-unit-like structure (such as: FU or AP) to be
1533	      carried, not including the first two octets.

1535	         Informative note: The first two octets of the NAL unit or NAL-
1536	         unit-like structure carried in the PACI payload are not
1537	         included in the PACI payload. Rather, the respective values
1538	         are copied in locations of the PayloadHdr of the RTP packet.
1539	         This design offers two advantages: first, the overall
1540	         structure of the payload header is preserved, i.e. there is no
1541	         special case of payload header structure that needs to be
1542	         implemented for PACI.  Second, no additional overhead is
1543	         introduced.

1545	      A PACI payload MAY be a single NAL unit, an FU, or an AP.  PACIs
1546	      MUST NOT be fragmented or aggregated.  The following subsection
1547	      documents the reasons for these design choices.
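
   The fixed fields of a PACI can be decoded as sketched below.  The
   function name and the returned dictionary are hypothetical, and
   the sketch deliberately stops at the first 32 bits: it does not
   interpret the PHES or locate the PACI payload.  It also shows how
   the first two octets of the carried NAL unit or NAL-unit-like
   structure can be rebuilt from A, cType, LayerId, and TID.

```python
def parse_paci_fields(payload):
    """Decode PayloadHdr and the A/cType/PHSsize/F0..2/Y word of a PACI."""
    if len(payload) < 4:
        raise ValueError("payload too short for a PACI")
    hdr = (payload[0] << 8) | payload[1]
    ext = (payload[2] << 8) | payload[3]
    fields = {
        "Type": (hdr >> 9) & 0x3F,
        "LayerId": (hdr >> 3) & 0x3F,
        "TID": hdr & 0x07,
        "A": ext >> 15,
        "cType": (ext >> 9) & 0x3F,
        "PHSsize": (ext >> 4) & 0x1F,
        "F0": (ext >> 3) & 1,
        "F1": (ext >> 2) & 1,
        "F2": (ext >> 1) & 1,
        "Y": ext & 1,
    }
    if fields["Type"] != 50:
        raise ValueError("not a PACI packet")
    # First two octets of the carried structure: F comes from A, Type
    # from cType, and LayerId/TID from the PayloadHdr.
    inner = ((fields["A"] << 15) | (fields["cType"] << 9)
             | (fields["LayerId"] << 3) | fields["TID"])
    fields["inner_header"] = inner.to_bytes(2, "big")
    return fields
```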

1549	4.9.1 Reasons for the PACI rules (informative)

1551	   A PACI cannot be fragmented.  If a PACI could be fragmented, and a
1552	   fragment other than the first fragment would get lost, access to the
1553	   information in the PACI would not be possible.  Therefore, a PACI
1554	   must not be fragmented.  In other words, an FU must not carry
1555	   (fragments of) a PACI.

1557	   A PACI cannot be aggregated.  Aggregation of PACIs is inadvisable
1558	   from a compression viewpoint because, in many cases, several NAL
1559	   units to be aggregated would share identical PACI fields and values,
1560	   which would be carried redundantly for no reason.  Most, if not all,
1561	   of the practical effects of PACI aggregation can be achieved by
1562	   aggregating NAL units and bundling them with a PACI (see below).
1563	   Therefore, a PACI must not be aggregated.  In other words, an AP
1564	   must not contain a PACI.

1566	   The payload of a PACI can be a fragment.  Both middleboxes and
1567	   sending systems with inflexible (often hardware-based) encoders
1568	   occasionally find themselves in situations where a PACI and its
1569	   headers, combined, are larger than the MTU size.  In such a
1570	   scenario, the middlebox or sender can fragment the NAL unit and
1571	   encapsulate the fragment in a PACI.  Doing so preserves the payload
1572	   header extension information for all fragments, allowing downstream
1573	   middleboxes and the receiver to take advantage of that information.
1574	   Therefore, a sender may place a fragment into a PACI, and a receiver
1575	   must be able to handle such a PACI.

1577	   The payload of a PACI can be an aggregation NAL unit.  HEVC
1578	   bitstreams can contain unevenly sized and/or small (when compared to
1579	   the MTU size) NAL units.  In order to efficiently packetize such
1580	   small NAL units, APs were introduced.  The benefits of APs are
1581	   independent from the need for a payload header extension.
1582	   Therefore, a sender may place an AP into a PACI, and a receiver must
1583	   be able to handle such a PACI.

1585	4.9.2 PACI extensions (Informative)

1587	   This subsection includes recommendations for future specification
1588	   designers on how to extend the PACI syntax to accommodate future
1589	   extensions.  Obviously, designers are free to specify whatever
1590	   appears to be appropriate to them at the time of their design.
1591	   However, a lot of thought has been invested into the extension
1592	   mechanism described below, and we suggest that deviations from it
1593	   warrant a good explanation.

1595	   This memo defines only a single payload header extension (Temporal
1596	   Scalability Control Information, described below in section 4.10),
1597	   and, therefore, only the F0 bit carries semantics.  F1 and F2 are
1598	   already named (and not just marked as reserved, as a typical video
1599	   spec designer would do).  They are intended to signal two additional
1600	   extensions.  The Y bit allows further F and Y bits to be added
1601	   recursively, extending the mechanism beyond 3 possible payload header
1602	   extensions.  It is suggested to define a new packet type (using a
1603	   different value for Type) when assigning the F1, F2, or Y bits
1604	   different semantics than what is suggested below.

1606	   When a Y bit is set, an 8 bit flag-extension is inserted after the Y
1607	   bit.  A flag-extension consists of 7 flags F[n..n+6], and another Y
1608	   bit.

1610	   The basic PACI header already includes F0, F1, and F2.  Therefore,
1611	   the Fx bits in the first flag-extensions are numbered F3, F4, ...,
1612	   F9, the F bits in the second flag-extension are numbered F10, F11,
1613	   ..., F16, and so forth.  As a result, at least 3 Fx bits are always
1614	   in the PACI, but the number of Fx bits (and associated types of
1615	   extensions), can be increased by setting the next Y bit and adding
1616	   an octet of flag-extensions, carrying 7 flags and another Y bit.
1617	   The size of this list of flags is subject to the limits specified in
1618	   section 4.9 (32 octets for all flag-extensions and the PHES
1619	   information combined).

1621	   Each of the F bits can indicate either the presence of information
1622	   in the Payload Header Extension Structure (PHES), described below,
1623	   or a given F bit can indicate a certain condition, without including
1624	   additional information in the PHES.

1626	   When a spec developer devises a new syntax that takes advantage of
1627	   the PACI extension mechanism, he/she must follow the constraints
1628	   listed below; otherwise the extension mechanism may break.

1630	     1) The fields added for a particular Fx bit MUST be fixed in
1631	        length and not depend on what other Fx bits are set (no parsing
1632	        dependency).
1633	     2) The Fx bits must be assigned in order.
1634	     3) An implementation that supports the n-th Fn bit for any value
1635	        of n must understand the syntax (though not necessarily the
1636	        semantics) of the fields Fk (with k < n), so as to be able to
1637	        either use those bits when present, or at least be able to skip
1638	        over them.

1640	4.10 Temporal Scalability Control Information

1642	   This section describes the single payload header extension defined
1643	   in this specification, known as Temporal Scalability Control
1644	   Information (TSCI).  If, in the future, additional payload header
1645	   extensions become necessary, they could be specified in this section
1646	   of an updated version of this document, or in their own documents.

1648	   When F0 is set to 1 in a PACI, this specifies that the PHES field
1649	   includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as follows:

1651	       0                   1                   2                   3
1652	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1653	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1654	      |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1655	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1656	      |   TL0PICIDX   |   IrapPicID   |S|E|RES|                       |
1657	      |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1658	      |                           ....                                |
1659	      |               PACI payload: NAL unit                          |
1660	      |                                                               |
1661	      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1662	      |                               :...OPTIONAL RTP padding        |
1663	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1665	     Figure 12   The structure of a PACI with a PHES containing a TSCI

1667	   TL0PICIDX (8 bits)
1668	      When present, the TL0PICIDX field MUST be set equal to
1669	      temporal_sub_layer_zero_idx as specified in Section D.3.32 of
1670	      [H.265] for the access unit containing the NAL unit in the PACI.

1672	   IrapPicID (8 bits)
1673	      When present, the IrapPicID field MUST be set equal to
1674	      irap_pic_id as specified in Section D.3.22 of [H.265] for the
1675	      access unit containing the NAL unit in the PACI.

1677	   S (1 bit)
1678	      The S bit MUST be set to 1 if any of the following conditions is
1679	      true and MUST be set to 0 otherwise:

1681	      . The NAL unit in the payload of the PACI is the first VCL NAL
1682	        unit, in decoding order, of a picture.
1683	      . The NAL unit in the payload of the PACI is an AP and the NAL
1684	        unit in the first contained aggregation unit is the first VCL
1685	        NAL unit, in decoding order, of a picture.
1686	      . The NAL unit in the payload of the PACI is an FU with its S bit
1687	        equal to 1 and the FU payload containing a fragment of the
1688	        first VCL NAL unit, in decoding order, of a picture.

1690	   E (1 bit)
1691	      The E bit MUST be set to 1 if any of the following conditions is
1692	      true and MUST be set to 0 otherwise:

1694	      . The NAL unit in the payload of the PACI is the last VCL NAL
1695	        unit, in decoding order, of a picture.
1696	      . The NAL unit in the payload of the PACI is an AP and the NAL
1697	        unit in the last contained aggregation unit is the last VCL NAL
1698	        unit, in decoding order, of a picture.
1699	      . The NAL unit in the payload of the PACI is an FU with its E bit
1700	        equal to 1 and the FU payload containing a fragment of the last
1701	        VCL NAL unit, in decoding order, of a picture.

1703	   RES (2 bits)
1704	      MUST be equal to 0.  Reserved for future extensions.

1706	   The value of PHSsize MUST be set to 3.  Receivers MUST allow other
1707	   values of the fields F0, F1, F2, Y, and PHSsize, and MUST ignore any
1708	   fields in the PHES, when present, other than those specified above.
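
As a reading aid, the first TSCI fields can be pulled out of the PHES
bytes roughly as follows.  This is a minimal sketch, not part of the
payload format: the names Tsci and parse_tsci are hypothetical, and the
bit positions of S and E are an assumption read off Figure 12 (the two
most significant bits of the third PHES byte).  The remaining bits of
that byte are left uninterpreted.

```python
from typing import NamedTuple


class Tsci(NamedTuple):
    """Hypothetical container for the TSCI fields decoded below."""
    tl0picidx: int    # temporal_sub_layer_zero_idx of the access unit
    irap_pic_id: int  # irap_pic_id of the access unit
    s: bool           # payload starts a picture (first VCL NAL unit)
    e: bool           # payload ends a picture (last VCL NAL unit)


def parse_tsci(phes: bytes) -> Tsci:
    """Decode TL0PICIDX, IrapPicID, S and E from a PHES carrying a TSCI."""
    if len(phes) < 3:
        raise ValueError("a TSCI needs at least 3 bytes of PHES")
    third = phes[2]
    return Tsci(
        tl0picidx=phes[0],
        irap_pic_id=phes[1],
        s=bool(third & 0x80),  # assumed: S is the top bit of byte 3
        e=bool(third & 0x40),  # assumed: E is the next bit
    )
```

A receiver following the rule above would still accept a PHES longer
than three bytes and ignore the extra bytes, which is why the sketch
only checks for a minimum length.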

1710	5. Packetization Rules

1712	   The following packetization rules apply:

1714	   o  If tx-mode is equal to "MST" or sprop-max-don-diff is greater
1715	      than 0 for an RTP stream, the transmission order of NAL units
1716	      carried in the RTP stream MAY be different from the NAL unit
1717	      decoding order.  Otherwise (tx-mode is equal to "SST" and sprop-
1718	      max-don-diff is equal to 0 for an RTP stream), the transmission
1719	      order of NAL units carried in the RTP stream MUST be the same as
1720	      the NAL unit decoding order.

1722	   o  A NAL unit of a small size SHOULD be encapsulated in an
1723	      aggregation packet together with one or more other NAL units in
1724	      order to avoid the unnecessary packetization overhead for small
1725	      NAL units.  For example, non-VCL NAL units such as access unit
1726	      delimiters, parameter sets, or SEI NAL units are typically small
1727	      and can often be aggregated with VCL NAL units without violating
1728	      MTU size constraints.

1730	   o  Each non-VCL NAL unit SHOULD, when possible from an MTU size
1731	      match viewpoint, be encapsulated in an aggregation packet
1732	      together with its associated VCL NAL unit, as typically a non-VCL
1733	      NAL unit would be meaningless without the associated VCL NAL unit
1734	      being available.

1736	   o  For carrying exactly one NAL unit in an RTP packet, a single NAL
1737	      unit packet MUST be used.
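
The aggregation rules above can be illustrated with a greedy grouping
pass over NAL unit sizes.  This is only a sketch under stated
assumptions: the function name and the two overhead constants are
illustrative (a 2-byte aggregation packet header and a 2-byte size
field per aggregated NAL unit, with no DONL or DOND fields present),
and max_payload stands for whatever payload budget the MTU leaves.

```python
from typing import List

AP_HEADER_BYTES = 2    # assumed payload header size of an aggregation packet
AP_PER_NALU_BYTES = 2  # assumed size field preceding each aggregated NAL unit


def group_for_aggregation(nalu_sizes: List[int], max_payload: int) -> List[List[int]]:
    """Greedily group consecutive NAL units (by index) into RTP payloads.

    A group holding one index stands for a single NAL unit packet; a group
    holding several indices stands for one aggregation packet.  Decoding
    order is preserved because only neighbouring units are grouped.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    used = AP_HEADER_BYTES
    for index, size in enumerate(nalu_sizes):
        cost = AP_PER_NALU_BYTES + size
        # Close the current group when adding this unit would exceed the budget.
        if current and used + cost > max_payload:
            groups.append(current)
            current, used = [], AP_HEADER_BYTES
        current.append(index)
        used += cost
    if current:
        groups.append(current)
    return groups
```

A NAL unit that is larger than the budget ends up alone in its group;
such a unit would need fragmentation, which this sketch does not model.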

1739	6. De-packetization Process

1741	   The general concept behind de-packetization is to get the NAL units
1742	   out of the RTP packets in an RTP stream and all the dependent RTP
1743	   streams, if any, and pass them to the decoder in the NAL unit
1744	   decoding order.

1746	   The de-packetization process is implementation dependent.
1747	   Therefore, the following description should be seen as an example of
1748	   a suitable implementation.  Other schemes may be used as well, as
1749	   long as the output for the same input is the same as that of the
1750	   process described below.  The output is the same when the set of
1751	   output NAL units and their order are both identical.  Optimizations
1752	   relative to the described algorithms are possible.

1754	   All normal RTP mechanisms related to buffer management apply.  In
1755	   particular, duplicated or outdated RTP packets (as indicated by the
1756	   RTP sequence number and the RTP timestamp) are removed.  To
1757	   determine the exact time for decoding, factors such as a possible
1758	   intentional delay to allow for proper inter-stream synchronization
1759	   must be taken into account.

1761	   NAL units with NAL unit type values in the range of 0 to 47,
1762	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1763	   with NAL unit type values in the range of 48 to 63, inclusive, MUST
1764	   NOT be passed to the decoder.
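
The type check described above could look like the following sketch.
The function names are hypothetical; the only layout fact relied on is
that the 6-bit NAL unit type follows the forbidden zero bit in the
first byte of the two-byte NAL unit header.

```python
def nal_unit_type(nal: bytes) -> int:
    """Return the 6-bit type from the first byte of the NAL unit header."""
    if len(nal) < 2:
        raise ValueError("the NAL unit header is two bytes long")
    return (nal[0] >> 1) & 0x3F


def may_pass_to_decoder(nal: bytes) -> bool:
    """True for types 0..47; types 48..63 stay inside the de-packetizer."""
    return nal_unit_type(nal) <= 47
```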

1766	   The receiver includes a receiver buffer, which is used to compensate
1767	   for transmission delay jitter within individual RTP streams and
1768	   across RTP streams, to reorder NAL units from transmission order to
1769	   the NAL unit decoding order, and to recover the NAL unit decoding
1770	   order in MST, when applicable.  In this section, the receiver
1771	   operation is described under the assumption that there is no
1772	   transmission delay jitter within a packet stream and across RTP
1773	   streams.  To distinguish it from a practical receiver buffer that
1774	   is also used for compensation of transmission delay jitter, the
1775	   receiver buffer is hereafter called the de-packetization buffer in
1776	   this section.  Receivers should also prepare for transmission delay
1777	   jitter; i.e. either reserve separate buffers for transmission delay
1778	   jitter buffering and de-packetization buffering or use a receiver
1779	   buffer for both transmission delay jitter and de-packetization.
1780	   Moreover, receivers should take transmission delay jitter into
1781	   account in the buffering operation; e.g. by additional initial
1782	   buffering before decoding and playback start.

1784	   If only one RTP stream is being received and sprop-max-don-diff of
1785	   the only RTP stream being received is equal to 0, the de-
1786	   packetization buffer size is zero bytes, i.e. the NAL units carried
1787	   in the RTP stream are directly passed to the decoder in their
1788	   transmission order, which is identical to the decoding order of the
1789	   NAL units. Otherwise, the process described in the remainder of this
1790	   section applies.

1792	   There are two buffering states in the receiver: initial buffering
1793	   and buffering while playing.  Initial buffering starts when the
1794	   reception is initialized.  After initial buffering, decoding and
1795	   playback are started, and the buffering-while-playing mode is used.

1797	   Regardless of the buffering state, the receiver stores incoming NAL
1798	   units, in reception order, into the de-packetization buffer.  NAL
1799	   units carried in RTP packets are stored in the de-packetization
1800	   buffer individually, and the value of AbsDon is calculated and
1801	   stored for each NAL unit.  When MST is in use, NAL units of all RTP
1802	   streams of a bitstream are stored in the same de-packetization
1803	   buffer.  When NAL units carried in any two RTP streams are available
1804	   to be placed into the de-packetization buffer, those NAL units
1805	   carried in the RTP stream that is lower in the dependency tree are
1806	   placed into the buffer first.  For example, if RTP stream A depends
1807	   on RTP stream B, then NAL units carried in RTP stream B are placed
1808	   into the buffer first.

1810	   Initial buffering lasts until condition A (the difference between
1811	   the greatest and smallest AbsDon values of the NAL units in the de-
1812	   packetization buffer is greater than or equal to the value of sprop-
1813	   max-don-diff of the highest RTP stream) or condition B (the number
1814	   of NAL units in the de-packetization buffer is greater than the
1815	   value of sprop-depack-buf-nalus) is true.

1817	   After initial buffering, whenever condition A or condition B is
1818	   true, the following operation is repeatedly applied until both
1819	   condition A and condition B become false:

1821	   o  The NAL unit in the de-packetization buffer with the smallest
1822	      value of AbsDon is removed from the de-packetization buffer and
1823	      passed to the decoder.

1825	   When no more NAL units are flowing into the de-packetization buffer,
1826	   all NAL units remaining in the de-packetization buffer are removed
1827	   from the buffer and passed to the decoder in the order of increasing
1828	   AbsDon values.
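
The buffering process of this section can be sketched as follows.  This
is one possible reading, not a reference implementation: the class and
method names are hypothetical, AbsDon is assumed to have been computed
by the caller, and jitter handling and the ordering of NAL units across
dependent RTP streams are left out.  Releasing nothing until condition
A or condition B first holds reproduces initial buffering, so the same
loop serves both buffering states.

```python
import heapq
from typing import List, Tuple


class DepacketizationBuffer:
    """Min-heap of NAL units keyed by AbsDon."""

    def __init__(self, sprop_max_don_diff: int, sprop_depack_buf_nalus: int):
        self.max_don_diff = sprop_max_don_diff
        self.max_nalus = sprop_depack_buf_nalus
        # Entries are (AbsDon, arrival counter, NAL unit); the counter keeps
        # ordering stable when two entries share an AbsDon value.
        self._heap: List[Tuple[int, int, bytes]] = []
        self._arrival = 0

    def _condition_a(self) -> bool:
        dons = [entry[0] for entry in self._heap]
        return bool(dons) and max(dons) - min(dons) >= self.max_don_diff

    def _condition_b(self) -> bool:
        return len(self._heap) > self.max_nalus

    def push(self, abs_don: int, nal: bytes) -> List[bytes]:
        """Store one NAL unit; return the units now due for the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrival, nal))
        self._arrival += 1
        out = []
        while self._condition_a() or self._condition_b():
            out.append(heapq.heappop(self._heap)[2])
        return out

    def flush(self) -> List[bytes]:
        """Drain the buffer in increasing AbsDon order at end of input."""
        out = []
        while self._heap:
            out.append(heapq.heappop(self._heap)[2])
        return out
```

Recomputing the maximum AbsDon on every check is linear in the buffer
size; it keeps the sketch short, and a real receiver would likely track
the maximum incrementally.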

1830	7. Payload Format Parameters

1832	   This section specifies the parameters that MAY be used to select
1833	   optional features of the payload format and certain features or
1834	   properties of the bitstream or the RTP stream.  The parameters are
1835	   specified here as part of the media type registration for the HEVC
1836	   codec.  A mapping of the parameters into the Session Description
1837	   Protocol (SDP) [RFC4566] is also provided for applications that use
1838	   SDP.  Equivalent parameters could be defined elsewhere for use with
1839	   control protocols that do not use SDP.

1841	7.1 Media Type Registration

1843	   The media subtype for the HEVC codec is allocated from the IETF
1844	   tree.

1846	   The receiver MUST ignore any unrecognized parameter.

1848	   Media Type name:     video

1850	   Media subtype name:  H265

1852	   Required parameters: none

1854	   OPTIONAL parameters:

1856	      profile-space, profile-id:

1858	         The profile-space parameter indicates the context for
1859	         interpretation of the profile-id parameter value.  The
1860	         profile, which specifies the subset of coding tools that may
1861	         have been used to generate the bitstream or that the receiver
1862	         supports, as specified in [HEVC], is defined by the
1863	         combination of profile-space and profile-id.

1865	         The value of profile-space MUST be in the range of 0 to 3,
1866	         inclusive.  The value of profile-id MUST be in the range of 0
1867	         to 31, inclusive.

1869	         If the profile-space and profile-id parameters are used to
1870	         indicate properties of a bitstream, they indicate that, to
1871	         decode the bitstream, the minimum subset of coding tools a
1872	         decoder has to support is the profile specified by both
1873	         parameters.

1875	         If the profile-space and profile-id parameters are used for
1876	         capability exchange or session setup, they indicate the subset
1877	         of coding tools, which is equal to the profile, that the codec
1878	         supports for both receiving and sending.

1880	         If no profile-space is present, a value of 0 MUST be inferred
1881	         and if no profile-id is present the Main profile (i.e. a value
1882	         of 1) MUST be inferred.

1884	         When used to indicate properties of a bitstream, the profile-
1885	         space and profile-id parameters are derived from the SPS or
1886	         VPS NAL units as follows, where general_profile_space,
1887	         general_profile_idc, sub_layer_profile_space[j], and
1888	         sub_layer_profile_idc[j] are specified in [HEVC].

1890	            If the RTP stream is the highest RTP stream, the following
1891	            applies:

1893	            o profile-space = general_profile_space
1894	            o profile-id = general_profile_idc

1896	            Otherwise (the RTP stream is a dependent RTP stream), the
1897	            following applies, with j being the value of the sprop-sub-
1898	            layer-id parameter:

1900	            o profile-space = sub_layer_profile_space[j]
1901	            o profile-id = sub_layer_profile_idc[j]
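
The inferred defaults and value ranges for these two parameters can be
applied as in the sketch below.  The function name and the idea of
passing the parameters as a string dictionary are assumptions made for
illustration; how the parameters are extracted from the signaling is
outside this sketch.

```python
from typing import Dict, Tuple


def profile_from_params(params: Dict[str, str]) -> Tuple[int, int]:
    """Return (profile-space, profile-id), applying the inferred defaults."""
    profile_space = int(params.get("profile-space", "0"))
    profile_id = int(params.get("profile-id", "1"))  # 1 is the Main profile
    if not 0 <= profile_space <= 3:
        raise ValueError("profile-space out of range 0..3")
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id out of range 0..31")
    return profile_space, profile_id
```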

1903	      tier-flag, level-id:

1905	         The tier-flag parameter indicates the context for
1906	         interpretation of the level-id value.  The default level,
1907	         which limits values of syntax elements or arithmetic
1908	         combinations of values of syntax elements, as specified in
1910	         [HEVC], is defined by the combination of tier-flag and level-
1911	         id.

1913	         The value of tier-flag MUST be in the range of 0 to 1,
1914	         inclusive.  The value of level-id MUST be in the range of 0
1915	         to 255, inclusive.

1917	         If the tier-flag and level-id parameters are used to indicate
1918	         properties of a bitstream, they indicate that, to decode the
1919	         bitstream, the lowest level the decoder has to support is the
1920	         default level.

1922	         If the tier-flag and level-id parameters are used for
1923	         capability exchange or session setup, the following applies.
1924	         If max-recv-level-id is not present, the default level defined
1925	         by tier-flag and level-id indicates the highest level the
1926	         codec wishes to support.  Otherwise, tier-flag and max-recv-
1927	         level-id indicate the highest level the codec supports for
1928	         receiving.  For either receiving or sending, all levels that
1929	         are lower than the highest level supported MUST also be
1930	         supported.

1932	         If no tier-flag is present, a value of 0 MUST be inferred and
1933	         if no level-id is present, a value of 93 (i.e. level 3.1) MUST
1934	         be inferred.

1936	         When used to indicate properties of a bitstream, the tier-flag
1937	         and level-id parameters are derived from the SPS or VPS NAL
1938	         units as follows, where general_tier_flag, general_level_idc,
1939	         sub_layer_tier_flag[j], and sub_layer_level_idc[j] are
1940	         specified in [HEVC].

1942	            If the RTP stream is the highest RTP stream, the following
1943	            applies:

1945	            o tier-flag = general_tier_flag
1946	            o level-id = general_level_idc

1948	            Otherwise (the RTP stream is a dependent RTP stream), the
1949	            following applies, with j being the value of the sprop-sub-
1950	            layer-id parameter:

1952	            o tier-flag = sub_layer_tier_flag[j]
1953	            o level-id = sub_layer_level_idc[j]
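
A small sketch of how the default level might be resolved from these
parameters follows.  The function name and the dictionary input are
hypothetical.  The conversion of level-id to a level number by dividing
by 30 is consistent with the default of 93 corresponding to level 3.1.

```python
from typing import Dict, Tuple


def default_level(params: Dict[str, str]) -> Tuple[str, float]:
    """Return (tier name, level number), applying the inferred defaults."""
    tier_flag = int(params.get("tier-flag", "0"))
    level_id = int(params.get("level-id", "93"))  # 93 corresponds to level 3.1
    if tier_flag not in (0, 1):
        raise ValueError("tier-flag out of range 0..1")
    if not 0 <= level_id <= 255:
        raise ValueError("level-id out of range 0..255")
    tier = "High" if tier_flag == 1 else "Main"
    return tier, level_id / 30
```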

1955	      interop-constraints:

1957	         A base16 [RFC4648] (hexadecimal) representation of six bytes
1958	         of data, consisting of progressive_source_flag,
1959	         interlaced_source_flag, non_packed_constraint_flag,
1960	         frame_only_constraint_flag, and reserved_zero_44bits.

1962	         If the interop-constraints parameter is not present, the
1963	         following MUST be inferred:

1965	            o progressive_source_flag = 1
1966	            o interlaced_source_flag = 0
1967	            o non_packed_constraint_flag = 1
1968	            o frame_only_constraint_flag = 1
1969	            o reserved_zero_44bits = 0

1971	         When the interop-constraints parameter is used to indicate
1972	         properties of a bitstream, the following applies, where
1973	         general_progressive_source_flag,
1974	         general_interlaced_source_flag,
1975	         general_non_packed_constraint_flag,
1977	         general_frame_only_constraint_flag,
1978	         general_reserved_zero_44bits,
1979	         sub_layer_progressive_source_flag[j],
1980	         sub_layer_interlaced_source_flag[j],
1981	         sub_layer_non_packed_constraint_flag[j],
1982	         sub_layer_frame_only_constraint_flag[j], and
1983	         sub_layer_reserved_zero_44bits[j] are specified in [HEVC].

1985	            If the RTP stream is the highest RTP stream, the following
1986	            applies:

1988	            o progressive_source_flag = general_progressive_source_flag
1989	            o interlaced_source_flag = general_interlaced_source_flag
1990	            o non_packed_constraint_flag =
1991	                              general_non_packed_constraint_flag
1992	            o frame_only_constraint_flag =
1993	                              general_frame_only_constraint_flag
1994	            o reserved_zero_44bits = general_reserved_zero_44bits

1996	            Otherwise (the RTP stream is a dependent RTP stream), the
1997	            following applies, with j being the value of the sprop-sub-
1998	            layer-id parameter:

2000	            o progressive_source_flag =
2001	                              sub_layer_progressive_source_flag[j]
2002	            o interlaced_source_flag =
2003	                              sub_layer_interlaced_source_flag[j]
2004	            o non_packed_constraint_flag =
2005	                              sub_layer_non_packed_constraint_flag[j]
2006	            o frame_only_constraint_flag =
2007	                              sub_layer_frame_only_constraint_flag[j]
2008	            o reserved_zero_44bits = sub_layer_reserved_zero_44bits[j]

2010	         When the interop-constraints parameter is used for capability
2011	         exchange or session setup, for both the sent bitstream, when
2012	         present, and the received bitstream, when present, the values
2013	         of general_progressive_source_flag,
2014	         general_interlaced_source_flag,
2015	         general_non_packed_constraint_flag,
2016	         general_frame_only_constraint_flag, and
2017	         general_reserved_zero_44bits in the SPS or VPS NAL units MUST
2018	         be equal to progressive_source_flag, interlaced_source_flag,
2019	         non_packed_constraint_flag, frame_only_constraint_flag, and
2020	         reserved_zero_44bits, respectively, and for any value of j,
2021	         the values of sub_layer_progressive_source_flag[j],
2022	         sub_layer_interlaced_source_flag[j],
2023	         sub_layer_non_packed_constraint_flag[j],
2024	         sub_layer_frame_only_constraint_flag[j], and
2025	         sub_layer_reserved_zero_44bits[j] in the SPS or VPS NAL units
2026	         MUST be equal to progressive_source_flag,
2027	         interlaced_source_flag, non_packed_constraint_flag,
2028	         frame_only_constraint_flag, and reserved_zero_44bits,
2029	         respectively.
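
For illustration, the six bytes can be unpacked as shown below.  This
is a sketch under one assumption that the text does not spell out: the
fields are taken to be packed in the listed order starting from the
most significant bit of the first byte.  The type and function names
are hypothetical.

```python
from typing import NamedTuple

# Inferred defaults: flags 1, 0, 1, 1 followed by 44 zero bits.
DEFAULT_INTEROP_CONSTRAINTS = "B00000000000"


class InteropConstraints(NamedTuple):
    progressive_source_flag: int
    interlaced_source_flag: int
    non_packed_constraint_flag: int
    frame_only_constraint_flag: int
    reserved_zero_44bits: int


def parse_interop_constraints(
    value: str = DEFAULT_INTEROP_CONSTRAINTS,
) -> InteropConstraints:
    """Decode a base16 interop-constraints value into its five fields."""
    raw = bytes.fromhex(value)
    if len(raw) != 6:
        raise ValueError("interop-constraints must encode exactly six bytes")
    bits = int.from_bytes(raw, "big")  # 48 bits, first flag in the top bit
    return InteropConstraints(
        (bits >> 47) & 1,
        (bits >> 46) & 1,
        (bits >> 45) & 1,
        (bits >> 44) & 1,
        bits & ((1 << 44) - 1),
    )
```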

2031	      profile-compatibility-indicator:

2033	         A base16 [RFC4648] representation of the four bytes
2034	         representing the 32 profile compatibility flags in the SPS or
2035	         VPS NAL units.  A decoder conforming to a certain profile may
2036	         be able to decode bitstreams conforming to other profiles.
2037	         The profile-compatibility-indicator provides exact information
2038	         on the ability of a decoder conforming to a certain profile to
2039	         decode bitstreams conforming to another profile.  More
2040	         concretely, if the profile compatibility flag corresponding to
2041	         the profile a decoder conforms to is set, then the decoder is
2042	         able to decode any bitstream with the flag set, irrespective
2043	         of the profile the bitstream conforms to (provided that the
2044	         decoder supports the highest level of the bitstream).

2046	         When profile-compatibility-indicator is used to indicate
2047	         properties of a bitstream, the following applies, where
2048	         general_profile_compatibility_flag[j] and
2049	         sub_layer_profile_compatibility_flag[i][j] are specified in
2050	         [HEVC].

2052	            If the RTP stream is the highest RTP stream, the following
2053	            applies with j = 0..31:

2055	            o The 32 flags = general_profile_compatibility_flag[j]

2057	            Otherwise (the RTP stream is a dependent RTP stream), the
2058	            following applies with i being the value of the sprop-sub-
2059	            layer-id parameter and j = 0..31:

2061	            o The 32 flags = sub_layer_profile_compatibility_flag[i][j]

2063	         When profile-compatibility-indicator is used for capability
2064	         exchange or session setup, the values of
2065	         general_profile_compatibility_flag[j] with j = 0..31 MUST be
2066	         equal to bits 0 to 31, inclusive, of profile-compatibility-
2067	         indicator, respectively, and for any value of i, the values of
2068	         sub_layer_profile_compatibility_flag[i][j] with j = 0..31 MUST
2069	         be equal to bits 0 to 31, inclusive, of profile-compatibility-
2070	         indicator, respectively.
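
The compatibility check described above might be sketched as follows.
The function names are hypothetical, and the mapping of "bit 0" to the
most significant bit of the first byte is an assumption (it mirrors the
order in which the flags appear in the bitstream).

```python
from typing import List


def profile_compatibility_flags(value: str) -> List[int]:
    """Expand a base16 profile-compatibility-indicator into 32 flags."""
    raw = bytes.fromhex(value)
    if len(raw) != 4:
        raise ValueError("profile-compatibility-indicator must be four bytes")
    bits = int.from_bytes(raw, "big")
    # Assumed: flag j sits at bit position 31 - j of the 32-bit value.
    return [(bits >> (31 - j)) & 1 for j in range(32)]


def decoder_can_decode(decoder_profile_id: int, value: str) -> bool:
    """True when the flag for the decoder's own profile is set."""
    return profile_compatibility_flags(value)[decoder_profile_id] == 1
```

With the illustrative value "60000000", flags 1 and 2 are set, so a
decoder whose profile-id is 1 or 2 would be able to decode the
bitstream, provided that it also supports the level.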

2072	      sprop-sub-layer-id:

2074	         This parameter MAY be used to indicate the highest allowed
2075	         value of TID in the bitstream.  When not present, the value of
2076	         sprop-sub-layer-id is inferred to be equal to 6.

2078	         The value of sprop-sub-layer-id MUST be in the range of 0
2079	         to 6, inclusive.

2081	      recv-sub-layer-id:

2083	         This parameter MAY be used to signal a receiver's choice of
2084	         the offered or declared sub-layers in the sprop-vps.  The
2085	         value of recv-sub-layer-id indicates the TID of the highest
2086	         sub-layer of the bitstream that a receiver supports.  When not
2087	         present, the value of recv-sub-layer-id is inferred to be
2088	         equal to sprop-sub-layer-id.

2090	         The value of recv-sub-layer-id MUST be in the range of 0 to 6,
2091	         inclusive.

2093	      max-recv-level-id:

2095	         This parameter MAY be used, together with tier-flag, to
2096	         indicate the highest level a receiver supports.  The highest
2097	         level the receiver supports is equal to the value of max-recv-
2098	         level-id divided by 30 for the Main or High tier (as
2099	         determined by tier-flag equal to 0 or 1, respectively).

2101	         The value of max-recv-level-id MUST be in the range of 0
2102	         to 255, inclusive.

2104	         When max-recv-level-id is not present, the value is inferred
2105	         to be equal to level-id.

2107	         max-recv-level-id MUST NOT be present when the highest level
2108	         the receiver supports is not higher than the default level.
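
The relation between level-id and max-recv-level-id can be sketched as
below.  The function name is hypothetical; the check mirrors the rule
that max-recv-level-id is only present when it signals a level higher
than the default level.

```python
from typing import Optional


def highest_receive_level(
    level_id: int = 93, max_recv_level_id: Optional[int] = None
) -> float:
    """Return the highest level number the receiver supports."""
    if max_recv_level_id is None:
        return level_id / 30  # inferred to be equal to level-id
    if not 0 <= max_recv_level_id <= 255:
        raise ValueError("max-recv-level-id out of range 0..255")
    if max_recv_level_id <= level_id:
        raise ValueError("max-recv-level-id must exceed level-id when present")
    return max_recv_level_id / 30
```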

2110	      tx-mode:

2112	         This parameter indicates whether the transmission mode is SST
2113	         or MST.

2115	         The value of tx-mode MUST be equal to either "MST" or "SST".
2116	         When not present, the value of tx-mode is inferred to be equal
2117	         to "SST".

2119	         If the value is equal to "MST", MST MUST be in use.  Otherwise
2120	         (the value is equal to "SST"), SST MUST be in use.

2122	         The value of tx-mode MUST be equal to "MST" for all RTP
2123	         sessions in an MST.

2125	      sprop-vps:

2127	         This parameter MAY be used to convey any video parameter set
2128	         NAL unit of the bitstream.  When present, the parameter MAY be
2129	         used to indicate codec capability and sub-stream
2130	         characteristics (i.e. properties of sub-layer representations
2131	         as defined in [HEVC]) as well as for out-of-band transmission
2132	         of video parameter sets.  The value of the parameter is a
2133	         comma-separated (',') list of base64 [RFC4648] representations
2134	         of the video parameter set NAL units as specified in Section
2135	         7.3.2.1 of [HEVC].

2137	      sprop-sps:

2139	         This parameter MAY be used to convey sequence parameter set
2140	         NAL units of the bitstream for out-of-band transmission of
2141	         sequence parameter sets.  The value of the parameter is a
2142	         comma-separated (',') list of base64 [RFC4648] representations
2143	         of the sequence parameter set NAL units as specified in
2144	         Section 7.3.2.2 of [HEVC].

2146	      sprop-pps:

2148	         This parameter MAY be used to convey picture parameter set NAL
2149	         units of the bitstream for out-of-band transmission of picture
2150	         parameter sets.  The value of the parameter is a comma-
2151	         separated (',') list of base64 [RFC4648] representations of
2152	         the picture parameter set NAL units as specified in Section
2153	         7.3.2.3 of [HEVC].
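
The value syntax shared by sprop-vps, sprop-sps, and sprop-pps can be
decoded with a few lines.  The function name is hypothetical, and the
sample input in the usage note is a pair of truncated two-byte NAL unit
headers chosen only to show the mechanics, not real parameter sets.

```python
import base64
from typing import List


def decode_sprop_list(value: str) -> List[bytes]:
    """Split a comma-separated base64 list into raw NAL units."""
    return [
        base64.b64decode(item.strip(), validate=True)
        for item in value.split(",")
        if item.strip()
    ]
```

For example, decode_sprop_list("QAE=,QgE=") yields two byte strings
whose first bytes are 0x40 and 0x42, that is, NAL unit types 32 and 33.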

2155	      sprop-sei:

2157	         This parameter MAY be used to convey one or more SEI messages
2158	         that describe bitstream characteristics.  When present, a
2159	         decoder can rely on the bitstream characteristics that are
2160	         described in the SEI messages for the entire duration of the
2161	         session, independently from the persistence scopes of the SEI
2162	         messages as specified in [HEVC].

2164	         The value of the parameter is a comma-separated (',') list of
2165	         base64 [RFC4648] representations of SEI NAL units as specified
2166	         in Section 7.3.2.4 of [HEVC].

2168	            Informative note: Intentionally, no list of applicable or
2169	            inapplicable SEI messages is specified here.  Conveying
2170	            certain SEI messages in sprop-sei may be sensible in some
2171	            application scenarios and meaningless in others.  However,
2172	            a few examples are described below:

2174	           1) In an environment where the encoded bitstream was
2175	               created from film-based source material, and no splicing
2176	               is going to occur during the lifetime of the session,
2177	               the film grain characteristics SEI message or the tone
2178	               mapping information SEI message are likely meaningful,
2179	               and sending them in sprop-sei rather than in the
2180	               bitstream at each entry point may help save bits and
2181	               allows the renderer to be configured only once, avoiding
2182	               unwanted artifacts.
2183	           2) The structure of pictures information SEI message in
2184	               sprop-sei can be used to inform a decoder of information
2185	               on the NAL unit types, picture order count values, and
2186	               prediction dependencies of a sequence of pictures.
2187	               Having such knowledge can be helpful for error recovery.
2188	           3) Examples of SEI messages that would be meaningless to
2189	               convey in sprop-sei include the decoded picture
2190	               hash SEI message (it is close to impossible that all
2191	               decoded pictures have the same hash-tag), the display
2192	               orientation SEI message when the device is a handheld
2193	               device (as the display orientation may change when the
2194	               handheld device is turned around), or the filler payload
2195	               SEI message (as there is no point in just having more
2196	               bits in SDP).

2198	      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

2200	         These parameters MAY be used to signal the capabilities of a
2201	         receiver implementation.  These parameters MUST NOT be used
2202	         for any other purpose.  The highest level (specified by tier-
2203	         flag and max-recv-level-id) MUST be the highest that the
2204	         receiver is fully capable of supporting.  max-lsr, max-lps,
2205	         max-cpb, max-dpb, max-br, max-tr, and max-tc MAY be used to
2206	         indicate capabilities of the receiver that extend the required
2207	         capabilities of the highest level, as specified below.

2209	         When more than one parameter from the set (max-lsr, max-lps,
2210	         max-cpb, max-dpb, max-br, max-tr, max-tc) is present, the
2211	         receiver MUST support all signaled capabilities
2212	         simultaneously.  For example, if both max-lsr and max-br are
2213	         present, the highest level with the extension of both the
2214	         picture rate and bitrate is supported.  That is, the receiver
2215	         is able to decode bitstreams in which the luma sample rate is
2216	         up to max-lsr (inclusive), the bitrate is up to max-br
2217	         (inclusive), the coded picture buffer size is derived as
2218	         specified in the semantics of the max-br parameter below, and
2219	         the other properties comply with the highest level specified
2220	         by tier-flag and max-recv-level-id.

2222	            Informative note: When the OPTIONAL media type parameters
2223	            are used to signal the properties of a bitstream, and max-
2224	            lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, and max-tc
2225	            are not present, the values of profile-space, profile-id,
2226	            tier-flag, and level-id must always be such that the
2227	            bitstream complies fully with the specified profile and
2228	            level.

2230	      max-lsr:
2231	         The value of max-lsr is an integer indicating the maximum
2232	         processing rate in units of luma samples per second.  The max-
2233	         lsr parameter signals that the receiver is capable of decoding
2234	         video at a higher rate than is required by the highest level.

2236	         When max-lsr is signaled, the receiver MUST be able to decode
2237	         bitstreams that conform to the highest level, with the
2238	         exception that the MaxLumaSR value in Table A-2 of [HEVC] for
2239	         the highest level is replaced with the value of max-lsr.
2240	         Senders MAY use this knowledge to send pictures of a given
2241	         size at a higher picture rate than is indicated in the highest
2242	         level.

2244	         When not present, the value of max-lsr is inferred to be equal
2245	         to the value of MaxLumaSR given in Table A-2 of [HEVC] for the
2246	         highest level.

2248	         The value of max-lsr MUST be in the range of MaxLumaSR to
2249	         16 * MaxLumaSR, inclusive, where MaxLumaSR is given in Table
2250	         A-2 of [HEVC] for the highest level.
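
A sender-side use of max-lsr might look like the sketch below.  Both
function names are hypothetical, the MaxLumaSR value for the highest
level is passed in by the caller rather than hard-coded, and treating
the picture rate as luma sample rate divided by picture size is a
simplification that ignores the other level limits.

```python
def check_max_lsr(max_lsr: int, level_max_luma_sr: int) -> int:
    """Validate max-lsr against the MaxLumaSR of the highest level."""
    if not level_max_luma_sr <= max_lsr <= 16 * level_max_luma_sr:
        raise ValueError("max-lsr outside MaxLumaSR..16 * MaxLumaSR")
    return max_lsr


def max_picture_rate(max_lsr: int, luma_samples_per_picture: int) -> float:
    """Upper bound on pictures per second for a given picture size."""
    return max_lsr / luma_samples_per_picture
```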

2252	      max-lps:
2253	         The value of max-lps is an integer indicating the maximum
2254	         picture size in units of luma samples.  The max-lps parameter
2255	         signals that the receiver is capable of decoding larger
2256	         picture sizes than are required by the highest level.  When
2257	         max-lps is signaled, the receiver MUST be able to decode
2258	         bitstreams that conform to the highest level, with the
2259	         exception that the MaxLumaPS value in Table A-1 of [HEVC] for
2260	         the highest level is replaced with the value of max-lps.
2261	         Senders MAY use this knowledge to send larger pictures at a
2262	         proportionally lower picture rate than is indicated in the
2263	         highest level.

2265	         When not present, the value of max-lps is inferred to be equal
2266	         to the value of MaxLumaPS given in Table A-1 of [HEVC] for the
2267	         highest level.

2269	         The value of max-lps MUST be in the range of MaxLumaPS to
2270	         16 * MaxLumaPS, inclusive, where MaxLumaPS is given in Table
2271	         A-1 of [HEVC] for the highest level.

2273	      max-cpb:
2274	         The value of max-cpb is an integer indicating the maximum
2275	         coded picture buffer size in units of CpbBrVclFactor bits for
2276	         the VCL HRD parameters and in units of CpbBrNalFactor bits for
2277	         the NAL HRD parameters, where CpbBrVclFactor and
2278	         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The max-
2279	         cpb parameter signals that the receiver has more memory than
2280	         the minimum amount of coded picture buffer memory required by
2281	         the highest level.  When max-cpb is signaled, the receiver
2282	         MUST be able to decode bitstreams that conform to the highest
2283	         level, with the exception that the MaxCPB value in Table A-1
2284	         of [HEVC] for the highest level is replaced with the value of
2285	         max-cpb.  Senders MAY use this knowledge to construct coded
2286	         bitstreams with greater variation of bitrate than can be
2287	         achieved with the MaxCPB value in Table A-1 of [HEVC].

2289	         When not present, the value of max-cpb is inferred to be equal
2290	         to the value of MaxCPB given in Table A-1 of [HEVC] for the
2291	         highest level.

2293	         The value of max-cpb MUST be in the range of MaxCPB to
2294	         16 * MaxCPB, inclusive, where MaxCPB is given in Table A-1
2295	         of [HEVC] for the highest level.

2297	            Informative note: The coded picture buffer is used in the
2298	            hypothetical reference decoder (Annex C of HEVC).  The use
2299	            of the hypothetical reference decoder is recommended in
2300	            HEVC encoders to verify that the produced bitstream
2301	            conforms to the standard and to control the output bitrate.
2302	            Thus, the coded picture buffer is conceptually independent
2303	            of any other potential buffers in the receiver, including
2304	            de-packetization and de-jitter buffers.  The coded picture
2305	            buffer need not be implemented in decoders as specified in
2306	            Annex C of HEVC, but rather standard-compliant decoders can
2307	            have any buffering arrangements provided that they can
2308	            decode standard-compliant bitstreams.  Thus, in practice,
2309	            the input buffer for a video decoder can be integrated with
2310	            de-packetization and de-jitter buffers of the receiver.

2312	      max-dpb:
2313	         The value of max-dpb is an integer indicating the maximum
2314	         decoded picture buffer size in units of decoded pictures at
2315	         the MaxLumaPS for the highest level, i.e. the number of
2316	         decoded pictures at the maximum picture size defined by the
2317	         highest level.  The value of max-dpb MUST be in the range of 1
2318	         to 16, inclusive.  The max-dpb parameter signals that the
2319	         receiver has more memory than the minimum amount of decoded
2320	         picture buffer memory required by default, which is
2321	         MaxDpbPicBuf as defined in [HEVC] (equal to 6).  When max-dpb
2322	         is signaled, the receiver MUST be able to decode bitstreams
2323	         that conform to the highest level, with the exception that the
2324	         MaxDpbPicBuf value defined in [HEVC] as 6 is replaced with the
2325	         value of max-dpb.  Consequently, a receiver that signals
2326	         max-dpb MUST be capable of storing the following number of
2327	         decoded pictures (MaxDpbSize) in its decoded picture buffer:

2329	           if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
2330	              MaxDpbSize = Min( 4 * max-dpb, 16 )
2331	           else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
2332	              MaxDpbSize = Min( 2 * max-dpb, 16 )
2333	           else if ( PicSizeInSamplesY <= ( ( 3 * MaxLumaPS ) >> 2 ) )
2334	              MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
2335	           else
2336	              MaxDpbSize = max-dpb

2338	         where MaxLumaPS is given in Table A-1 of [HEVC] for the highest
2339	         level and PicSizeInSamplesY is the current size of each decoded
2340	         picture in units of luma samples as defined in [HEVC].
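The MaxDpbSize derivation above can be sketched in Python as follows.
This is a minimal illustration, not normative text: the function and
argument names are hypothetical, and the division in the third branch
is assumed to be integer division with truncation.

```python
def max_dpb_size(pic_size_in_samples_y: int, max_luma_ps: int,
                 max_dpb: int = 6) -> int:
    """Return the number of decoded pictures the receiver must store.

    max_dpb defaults to 6, the inferred value when max-dpb is absent.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Assumption: "/" truncates toward zero for these positive values.
        return min((4 * max_dpb) // 3, 16)
    return max_dpb


# Illustrative value only; check Table A-1 of [HEVC] for the level in use.
ASSUMED_MAX_LUMA_PS = 2228224

print(max_dpb_size(1280 * 720, ASSUMED_MAX_LUMA_PS))        # default max-dpb
print(max_dpb_size(1920 * 1080, ASSUMED_MAX_LUMA_PS, 8))    # max-dpb=8
```

Smaller pictures allow more of them to fit in the same memory, which
is why the result grows as PicSizeInSamplesY shrinks, capped at 16.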

2342	         The value of max-dpb MUST be greater than or equal to the
2343	         value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].  Senders
2344	         MAY use this knowledge to construct coded bitstreams with
2345	         improved compression.

2347	         When not present, the value of max-dpb is inferred to be equal
2348	         to the value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].

2350	            Informative note: This parameter was added primarily to
2351	            complement a similar codepoint in the ITU-T Recommendation
2352	            H.245, so as to facilitate signaling gateway designs.  The
2353	            decoded picture buffer stores reconstructed samples.  There
2354	            is no relationship between the size of the decoded picture
2355	            buffer and the buffers used in RTP, especially de-
2356	            packetization and de-jitter buffers.

2358	      max-br:
2359	         The value of max-br is an integer indicating the maximum video
2360	         bitrate in units of CpbBrVclFactor bits per second for the VCL
2361	         HRD parameters and in units of CpbBrNalFactor bits per second
2362	         for the NAL HRD parameters, where CpbBrVclFactor and
2363	         CpbBrNalFactor are defined in Section A.4 of [HEVC].

2365	         The max-br parameter signals that the video decoder of the
2366	         receiver is capable of decoding video at a higher bitrate than
2367	         is required by the highest level.

2369	         When max-br is signaled, the video codec of the receiver MUST
2370	         be able to decode bitstreams that conform to the highest
2371	         level, with the following exceptions in the limits specified
2372	         by the highest level:

2374	          o The value of max-br replaces the MaxBR value in Table A-2
2375	            of [HEVC] for the highest level.
2376	          o When the max-cpb parameter is not present, the result of
2377	            the following formula replaces the value of MaxCPB in Table
2378	            A-1 of [HEVC]:

2380	               (MaxCPB of the highest level) * max-br / (MaxBR of the
2381	               highest level)

2383	         For example, if a receiver signals capability for Main profile
2384	         Level 2 with max-br equal to 2000, this indicates a maximum
2385	         video bitrate of 2000 kbits/sec for VCL HRD parameters, a
2386	         maximum video bitrate of 2200 kbits/sec for NAL HRD
2387	         parameters, and a CPB size of 2000000 bits (2000000 / 1500000
2388	         * 1500000).
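The arithmetic of this example can be reproduced with a short Python
sketch.  The names are hypothetical; the factors 1000 and 1100 and the
Level 2 limits of 1500 are the values implied by the example above, and
integer division is an assumption, since rounding is not specified
here.

```python
# Assumed Main-profile factors, as implied by the example in the text.
CPB_BR_VCL_FACTOR = 1000
CPB_BR_NAL_FACTOR = 1100


def effective_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Derive bitrate and CPB limits from a signaled max-br value."""
    if max_cpb is None:
        # max-cpb absent: scale the level's MaxCPB by max-br / MaxBR.
        max_cpb = level_max_cpb * max_br // level_max_br
    return {
        "vcl_bitrate_bps": max_br * CPB_BR_VCL_FACTOR,
        "nal_bitrate_bps": max_br * CPB_BR_NAL_FACTOR,
        "vcl_cpb_bits": max_cpb * CPB_BR_VCL_FACTOR,
    }


limits = effective_limits(max_br=2000, level_max_br=1500, level_max_cpb=1500)
print(limits)
```

When max-cpb is signaled as well, passing it as max_cpb bypasses the
scaling step, matching the condition stated in the second bullet.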

2390	         Senders MAY use this knowledge to send higher bitrate video as
2391	         allowed in the level definition of Annex A of HEVC to achieve
2392	         improved video quality.

2394	         When not present, the value of max-br is inferred to be equal
2395	         to the value of MaxBR given in Table A-2 of [HEVC] for the
2396	         highest level.

2398	         The value of max-br MUST be in the range of MaxBR to
2399	         16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of
2400	         [HEVC] for the highest level.

2402	            Informative note: This parameter was added primarily to
2403	            complement a similar codepoint in the ITU-T Recommendation
2404	            H.245, so as to facilitate signaling gateway designs.  The
2405	            assumption that the network is capable of handling such
2406	            bitrates at any given time cannot be made from the value of
2407	            this parameter.  In particular, no conclusion can be drawn
2408	            that the signaled bitrate is possible under congestion
2409	            control constraints.

2411	      max-tr:
2412	         The value of max-tr is an integer indicating the maximum
2413	         number of tile rows.  The max-tr parameter signals that the
2414	         receiver is capable of decoding video with a larger number of
2415	         tile rows than the value allowed by the highest level.

2417	         When max-tr is signaled, the receiver MUST be able to decode
2418	         bitstreams that conform to the highest level, with the
2419	         exception that the MaxTileRows value in Table A-1 of [HEVC]
2420	         for the highest level is replaced with the value of max-tr.

2422	         Senders MAY use this knowledge to send pictures utilizing a
2423	         larger number of tile rows than the value allowed by the
2424	         highest level.

2426	         When not present, the value of max-tr is inferred to be equal
2427	         to the value of MaxTileRows given in Table A-1 of [HEVC] for
2428	         the highest level.

2430	         The value of max-tr MUST be in the range of MaxTileRows to
2431	         16 * MaxTileRows, inclusive, where MaxTileRows is given in
2432	         Table A-1 of [HEVC] for the highest level.

2434	      max-tc:
2435	         The value of max-tc is an integer indicating the maximum
2436	         number of tile columns.  The max-tc parameter signals that the
2437	         receiver is capable of decoding video with a larger number of
2438	         tile columns than the value allowed by the highest level.

2440	         When max-tc is signaled, the receiver MUST be able to decode
2441	         bitstreams that conform to the highest level, with the
2442	         exception that the MaxTileCols value in Table A-1 of [HEVC]
2443	         for the highest level is replaced with the value of max-tc.

2445	         Senders MAY use this knowledge to send pictures utilizing a
2446	         larger number of tile columns than the value allowed by the
2447	         highest level.

2449	         When not present, the value of max-tc is inferred to be equal
2450	         to the value of MaxTileCols given in Table A-1 of [HEVC] for
2451	         the highest level.

2453	         The value of max-tc MUST be in the range of MaxTileCols to
2454	         16 * MaxTileCols, inclusive, where MaxTileCols is given in
2455	         Table A-1 of [HEVC] for the highest level.

2457	      max-fps:

2459	         The value of max-fps is an integer indicating the maximum
2460	         picture rate in units of pictures per 100 seconds that can be
2461	         effectively processed by the receiver.  The max-fps parameter
2462	         MAY be used to signal that the receiver has a constraint in
2463	         that it is not capable of processing video effectively at the
2464	         full picture rate that is implied by the highest level and,
2465	         when present, one or more of the parameters max-lsr, max-lps,
2466	         and max-br.

2468	         The value of max-fps is not necessarily the picture rate at
2469	         which the maximum picture size can be sent; it constitutes a
2470	         constraint on maximum picture rate for all resolutions.

2472	            Informative note: The max-fps parameter is semantically
2473	            different from max-lsr, max-lps, max-cpb, max-dpb, max-br,
2474	            max-tr, and max-tc in that max-fps is used to signal a
2475	            constraint, lowering the maximum picture rate from what is
2476	            implied by other parameters.

2478	         The encoder SHOULD use a picture rate equal to or less than
2479	         this value.  An exception is when sending a pre-encoded
2480	         bitstream, in which case the picture rate may be greater than
2481	         the value of max-fps.  In cases where the max-fps parameter is
2482	         absent, the encoder is free to choose any picture rate
2483	         according to the highest level and any signaled optional
2484	         parameters.

2486	         The value of max-fps MUST be smaller than or equal to the full
2487	         picture rate that is implied by the highest level and, when
2488	         present, one or more of the parameters max-lsr, max-lps, and
2489	         max-br.

2491	      sprop-max-don-diff:

2493	         The value of this parameter MUST be equal to 0, if the RTP
2494	         stream does not depend on other RTP streams and there is no
2495	         NAL unit naluA that is followed in transmission order by any
2496	         NAL unit preceding naluA in decoding order.  Otherwise, this
2497	         parameter specifies the maximum absolute difference between
2498	         the decoding order number (i.e., AbsDon) values of any two NAL
2499	         units naluA and naluB, where naluA follows naluB in decoding
2500	         order and precedes naluB in transmission order.
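As an illustration of this definition, the following Python sketch
derives the value from a list of AbsDon values given in transmission
order.  The function name is hypothetical, and the sketch assumes a
single RTP stream in which a larger AbsDon means later in decoding
order.

```python
def max_don_diff(abs_dons_in_tx_order):
    """Largest AbsDon gap between a NAL unit and a later-sent NAL unit
    that precedes it in decoding order; 0 when no reordering occurs."""
    largest_seen = None
    result = 0
    for don in abs_dons_in_tx_order:
        if largest_seen is not None and largest_seen > don:
            # An earlier-sent NAL unit follows this one in decoding order.
            result = max(result, largest_seen - don)
        if largest_seen is None or don > largest_seen:
            largest_seen = don
    return result


print(max_don_diff([0, 1, 2, 3]))     # transmission order equals decoding order
print(max_don_diff([0, 3, 1, 2, 4]))  # NAL unit 3 is sent ahead of 1 and 2
```

Tracking only the largest AbsDon seen so far is enough, because the
maximum difference for any NAL unit is always against that value.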

2502	         The value of sprop-max-don-diff MUST be an integer in the
2503	         range of 0 to 32767, inclusive.

2505	         When not present, the value of sprop-max-don-diff is inferred
2506	         to be equal to 0.

2508	         When the RTP stream depends on one or more other RTP streams
2509	         (in this case tx-mode MUST be equal to "MST" and MST is in
2510	         use), this parameter MUST be present and the value MUST be
2511	         greater than 0.

2513	            Informative note: When the RTP stream does not depend on
2514	            other RTP streams, either MST or SST may be in use.

2516	      sprop-depack-buf-nalus:

2518	         This parameter specifies the maximum number of NAL units that
2519	         precede a NAL unit in transmission order and follow the NAL
2520	         unit in decoding order.
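A direct, unoptimized reading of this definition is sketched below in
Python.  The function name is hypothetical, and the input is assumed
to be the AbsDon values of the NAL units in transmission order, with a
larger AbsDon meaning later in decoding order.

```python
def depack_buf_nalus(abs_dons_in_tx_order):
    """Maximum, over all NAL units, of the number of NAL units sent
    earlier that come later in decoding order."""
    worst = 0
    for index, don in enumerate(abs_dons_in_tx_order):
        # Count earlier-sent NAL units that follow this one in decoding order.
        waiting = sum(1 for earlier in abs_dons_in_tx_order[:index]
                      if earlier > don)
        worst = max(worst, waiting)
    return worst


print(depack_buf_nalus([0, 1, 2, 3]))     # no reordering
print(depack_buf_nalus([2, 3, 4, 0, 1]))  # three NAL units sent ahead of 0
```

The quadratic loop keeps the sketch close to the wording of the
definition; a real sender would likely track this incrementally.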

2522	         The value of sprop-depack-buf-nalus MUST be an integer in the
2523	         range of 0 to 32767, inclusive.

2525	         When not present, the value of sprop-depack-buf-nalus is
2526	         inferred to be equal to 0.

2528	         When the RTP stream depends on one or more other RTP streams
2529	         (in this case tx-mode MUST be equal to "MST" and MST is in
2530	         use), this parameter MUST be present and the value MUST be
2531	         greater than 0.

2533	      sprop-depack-buf-bytes:

2535	         This parameter signals the required size of the de-
2536	         packetization buffer in units of bytes.  The value of the
2537	         parameter MUST be greater than or equal to the maximum buffer
2538	         occupancy (in units of bytes) of the de-packetization buffer
2539	         as specified in section 6.

2541	         The value of sprop-depack-buf-bytes MUST be an integer in the
2542	         range of 0 to 4294967295, inclusive.

2544	         When the RTP stream depends on one or more other RTP streams
2545	         (in this case tx-mode MUST be equal to "MST" and MST is in
2546	         use) or sprop-max-don-diff is present and greater than 0, this
2547	         parameter MUST be present and the value MUST be greater than
2548	         0.

2550	            Informative note: The value of sprop-depack-buf-bytes
2551	            indicates the required size of the de-packetization buffer
2552	            only.  When network jitter can occur, an appropriately
2553	            sized jitter buffer has to be available as well.

2555	      depack-buf-cap:

2557	         This parameter signals the capabilities of a receiver
2558	         implementation and indicates the amount of de-packetization
2559	         buffer space in units of bytes that the receiver has available
2560	         for reconstructing the NAL unit decoding order from NAL units
2561	         carried in one or more RTP streams.  A receiver is able to
2562	         handle any RTP stream, and its dependent RTP streams, when
2563	         present, for which the value of the sprop-depack-buf-bytes
2564	         parameter is smaller than or equal to this parameter.

2566	         When not present, the value of depack-buf-cap is inferred to
2567	         be equal to 4294967295.  The value of depack-buf-cap MUST be
2568	         an integer in the range of 1 to 4294967295, inclusive.
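The comparison between the two parameters can be sketched as follows.
The function and constant names are hypothetical; the default and the
range check follow the text above.

```python
DEFAULT_DEPACK_BUF_CAP = 4294967295  # inferred when depack-buf-cap is absent


def receiver_can_handle(sprop_depack_buf_bytes, depack_buf_cap=None):
    """True when the stream's required de-packetization buffer size
    fits within the receiver's declared capability."""
    if depack_buf_cap is None:
        depack_buf_cap = DEFAULT_DEPACK_BUF_CAP
    if not 1 <= depack_buf_cap <= 4294967295:
        raise ValueError("depack-buf-cap out of range")
    return sprop_depack_buf_bytes <= depack_buf_cap


print(receiver_can_handle(65536, depack_buf_cap=32768))
print(receiver_can_handle(65536))  # capability not signaled
```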

2570	            Informative note: depack-buf-cap indicates the maximum
2571	            possible size of the de-packetization buffer of the
2572	            receiver only.  When network jitter can occur, an
2573	            appropriately sized jitter buffer has to be available as
2574	            well.

2576	      sprop-segmentation-id:

2578	         This parameter MAY be used to signal the segmentation tools
2579	         present in the bitstream and that can be used for
2580	         parallelization.  The value of sprop-segmentation-id MUST be
2581	         an integer in the range of 0 to 3, inclusive.  When not
2582	         present, the value of sprop-segmentation-id is inferred to be
2583	         equal to 0.

2585	         When sprop-segmentation-id is equal to 0, no information about
2586	         the segmentation tools is provided.  When sprop-segmentation-
2587	         id is equal to 1, it indicates that slices are present in the
2588	         bitstream.  When sprop-segmentation-id is equal to 2, it
2589	         indicates that tiles are present in the bitstream.  When
2590	         sprop-segmentation-id is equal to 3, it indicates that WPP is
2591	         used in the bitstream.

2593	      sprop-spatial-segmentation-idc:

2595	         A base16 [RFC4648] representation of the syntax element
2596	         min_spatial_segmentation_idc as specified in [HEVC].  This
2597	         parameter MAY be used to describe parallelization capabilities
2598	         of the bitstream.

2600	      dec-parallel-cap:

2602	         This parameter MAY be used to indicate the decoder's
2603	         additional decoding capabilities given the presence of tools
2604	         enabling parallel decoding, such as slices, tiles, and WPP, in
2605	         the bitstream.  The decoding capability of the decoder may
2606	         vary with the setting of the parallel decoding tools present
2607	         in the bitstream, e.g. the size of the tiles that are present
2608	         in a bitstream.  Therefore, multiple capability points may be
2609	         provided, each indicating the minimum required decoding
2610	         capability that is associated with a parallelism requirement,
2611	         which is a requirement on the bitstream that enables parallel
2612	         decoding.

2614	         Each capability point is defined as a combination of 1) a
2615	         parallelism requirement, 2) a profile (determined by profile-
2616	         space and profile-id), 3) a highest level, and 4) a maximum
2617	         processing rate, a maximum picture size, and a maximum video
2618	         bitrate that may be equal to or greater than that determined
2619	         by the highest level.  The parameter's syntax in ABNF
2620	         [RFC5234] is as follows:

2622	            dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
2623	                               cap-point) "}"

2625	            cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
2626	                         cap-parameter)

2628	            spatial-seg-idc = 1*4DIGIT ; (1-4095)

2630	            cap-parameter = tier-flag / level-id / max-lsr
2631	                            / max-lps / max-br

2633	            tier-flag = "tier-flag" EQ ("0" / "1")

2635	            level-id  = "level-id" EQ 1*3DIGIT ; (0-255)

2637	            max-lsr   = "max-lsr" EQ 1*20DIGIT
2638	                        ; (0-18,446,744,073,709,551,615)

2640	            max-lps   = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)

2642	            max-br    = "max-br"  EQ 1*20DIGIT
2643	                        ; (0-18,446,744,073,709,551,615)

2645	            EQ = "="

2647	         The set of capability points expressed by the dec-parallel-cap
2648	         parameter is enclosed in a pair of curly braces ("{}").  Each
2649	         set of two consecutive capability points is separated by a
2650	         comma (',').  Within each capability point, each set of two
2651	         consecutive parameters, and when present, their values, is
2652	         separated by a semicolon (';').

2654	         The profile of all capability points is determined by profile-
2655	         space and profile-id that are outside the dec-parallel-cap
2656	         parameter.

2658	         Each capability point starts with an indication of the
2659	         parallelism requirement, which consists of a parallel tool
2660	         type, which may be equal to 'w' or 't', and a decimal value of
2661	         the spatial-seg-idc parameter.  When the type is 'w', the
2662	         capability point is valid only for H.265 bitstreams with WPP
2663	         in use, i.e. entropy_coding_sync_enabled_flag equal to 1.
2664	         When the type is 't', the capability point is valid only for
2665	         H.265 bitstreams with WPP not in use (i.e.
2666	         entropy_coding_sync_enabled_flag equal to 0).  The capability
2667	         point is valid only for H.265 bitstreams with
2668	         min_spatial_segmentation_idc equal to or greater than spatial-
2669	         seg-idc.
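The validity rule for a capability point can be expressed as a small
predicate.  This Python sketch uses hypothetical names; the two
bitstream arguments stand for the syntax element values named in the
text.

```python
def cap_point_applies(tool, spatial_seg_idc,
                      entropy_coding_sync_enabled_flag,
                      min_spatial_segmentation_idc):
    """True when a bitstream satisfies the parallelism requirement of a
    capability point with the given tool type ('w' or 't')."""
    if tool not in ("w", "t"):
        raise ValueError("parallel tool type must be 'w' or 't'")
    wpp_in_use = entropy_coding_sync_enabled_flag == 1
    if tool == "w" and not wpp_in_use:
        return False
    if tool == "t" and wpp_in_use:
        return False
    return min_spatial_segmentation_idc >= spatial_seg_idc


print(cap_point_applies("t", 8, 0, 12))  # tiles or slices, idc high enough
print(cap_point_applies("w", 8, 0, 12))  # 'w' point, but WPP not in use
```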

2671	         After the parallelism requirement indication, each capability
2672	         point continues with one or more pairs of parameter and value
2673	         in any order for any of the following parameters:

2675	            o tier-flag
2676	            o level-id
2677	            o max-lsr
2678	            o max-lps
2679	            o max-br

2681	         At most one occurrence of each of the above five parameters is
2682	         allowed within each capability point.
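A parser for the syntax described above might look like the following
Python sketch.  It is an illustration under stated assumptions: all
names are hypothetical, the input is assumed to contain no whitespace
or line folding, and the per-parameter digit-count and value ranges
from the ABNF comments are not checked apart from spatial-seg-idc.

```python
import re

# One capability point: tool type, spatial-seg-idc, then 1*(";" name=value).
_CAP_POINT = re.compile(r"([wt]):(\d{1,4})((?:;[a-z-]+=\d+)+)")
_ALLOWED = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}


def parse_dec_parallel_cap(text):
    """Parse 'dec-parallel-cap={...}' into a list of capability points."""
    prefix = "dec-parallel-cap={"
    if not (text.startswith(prefix) and text.endswith("}")):
        raise ValueError("not a dec-parallel-cap parameter")
    points = []
    for raw in text[len(prefix):-1].split(","):
        match = _CAP_POINT.fullmatch(raw)
        if match is None:
            raise ValueError(f"malformed capability point: {raw!r}")
        idc = int(match.group(2))
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc out of range")
        params = {}
        for pair in match.group(3)[1:].split(";"):
            name, value = pair.split("=")
            if name not in _ALLOWED:
                raise ValueError(f"unknown parameter: {name}")
            if name in params:
                # At most one occurrence per capability point.
                raise ValueError(f"duplicate parameter: {name}")
            params[name] = int(value)
        points.append({"tool": match.group(1),
                       "spatial_seg_idc": idc,
                       "params": params})
    return points


print(parse_dec_parallel_cap("dec-parallel-cap={t:8;level-id=120}"))
```

Parameters missing from a capability point are left out of the result
so that the caller can apply the inference rules for absent values.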

2684	         The values of dec-parallel-cap.tier-flag and dec-parallel-
2685	         cap.level-id for a capability point indicate the highest level
2686	         of the capability point.  The values of dec-parallel-cap.max-
2687	         lsr, dec-parallel-cap.max-lps, and dec-parallel-cap.max-br for
2688	         a capability point indicate the maximum processing rate in
2689	         units of luma samples per second, the maximum picture size in
2690	         units of luma samples, and the maximum video bitrate (in units
2691	         of CpbBrVclFactor bits per second for the VCL HRD parameters
2692	         and in units of CpbBrNalFactor bits per second for the NAL HRD
2693	         parameters where CpbBrVclFactor and CpbBrNalFactor are defined
2694	         in Section A.4 of [HEVC]).

2696	         When not present, the value of dec-parallel-cap.tier-flag is
2697	         inferred to be equal to the value of tier-flag outside the
2698	         dec-parallel-cap parameter.  When not present, the value of
2699	         dec-parallel-cap.level-id is inferred to be equal to the value
2700	         of max-recv-level-id outside the dec-parallel-cap parameter.
2701	         When not present, the value of dec-parallel-cap.max-lsr, dec-
2702	         parallel-cap.max-lps, or dec-parallel-cap.max-br is inferred
2703	         to be equal to the value of max-lsr, max-lps, or max-br,
2704	         respectively, outside the dec-parallel-cap parameter.

2706	         The general decoding capability, expressed by the set of
2707	         parameters outside of dec-parallel-cap, is defined as the
2708	         capability point that is determined by the following
2709	         combination of parameters: 1) the parallelism requirement
2710	         corresponding to the value of sprop-segmentation-id equal to 0
2711	         for a bitstream, 2) the profile determined by profile-space
2712	         and profile-id, 3) the highest level determined by tier-flag
2713	         and max-recv-level-id, and 4) the maximum processing rate, the
2714	         maximum picture size, and the maximum video bitrate determined
2715	         by the highest level.  The general decoding capability MUST
2716	         NOT be included as one of the set of capability points in the
2717	         dec-parallel-cap parameter.

2719	         For example, the following parameters express the general
2720	         decoding capability of 720p30 (Level 3.1) plus an additional
2721	         decoding capability of 1080p30 (Level 4) given that the
2722	         spatially largest tile or slice used in the bitstream is equal
2723	         to or less than 1/3 of the picture size:

2725	            a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-id=120}

2727	         For another example, the following parameters express an
2728	         additional decoding capability of 1080p30, using dec-parallel-
2729	         cap.max-lsr and dec-parallel-cap.max-lps, given that WPP is
2730	         used in the bitstream:

2732	            a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2733	                        max-lsr=62668800;max-lps=2088960}

2735	            Informative note: When min_spatial_segmentation_idc is
2736	            present in a bitstream and WPP is not used, [HEVC]
2737	            specifies that there is no slice or no tile in the
2738	            bitstream containing more than 4 * PicSizeInSamplesY /
2739	            ( min_spatial_segmentation_idc + 4 ) luma samples.
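The bound in this note can be evaluated directly.  The Python sketch
below uses a hypothetical function name and assumes the division
rounds down.

```python
def max_segment_luma_samples(pic_size_in_samples_y,
                             min_spatial_segmentation_idc):
    """Upper bound on the luma samples in any one slice or tile when
    WPP is not used."""
    return (4 * pic_size_in_samples_y) // (min_spatial_segmentation_idc + 4)


# With min_spatial_segmentation_idc equal to 8, the bound is one third
# of the picture, matching the "t:8" example above.
print(max_segment_luma_samples(1920 * 1080, 8))
```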

2741	      Encoding considerations:

2743	         This type is only defined for transfer via RTP (RFC 3550).

2745	      Security considerations:

2747	         See Section 9 of RFC XXXX.

2749	      Public specification:

2751	         Please refer to Section 13 of RFC XXXX.

2753	      Additional information: None

2755	      File extensions: none

2757	      Macintosh file type code: none

2759	      Object identifier or OID: none

2761	      Person & email address to contact for further information:

2763	      Intended usage: COMMON

2765	      Author: See Section 14 of RFC XXXX.

2767	      Change controller:

2769	         IETF Audio/Video Transport Payloads working group delegated
2770	         from the IESG.

2772	7.2 SDP Parameters

2774	   The receiver MUST ignore any parameter unspecified in this memo.

2776	7.2.1 Mapping of Payload Type Parameters to SDP

2778	   The media type video/H265 string is mapped to fields in the Session
2779	   Description Protocol (SDP) [RFC4566] as follows:

2781	   o  The media name in the "m=" line of SDP MUST be video.

2783	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265 (the
2784	      media subtype).

2786	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2788	   o  The OPTIONAL parameters "profile-space", "profile-id", "tier-
2789	      flag", "level-id", "interop-constraints", "profile-compatibility-
2790	      indicator", "sprop-sub-layer-id", "recv-sub-layer-id", "max-recv-
2791	      level-id", "tx-mode", "max-lsr", "max-lps", "max-cpb", "max-dpb",
2792	      "max-br", "max-tr", "max-tc", "max-fps", "sprop-max-don-diff",
2793	      "sprop-depack-buf-nalus", "sprop-depack-buf-bytes", "depack-buf-
2794	      cap", "sprop-segmentation-id", "sprop-spatial-segmentation-idc",
2795	      and "dec-parallel-cap", when present, MUST be included in the
2796	      "a=fmtp" line of SDP.  These parameters are expressed as a media
2797	      type string, in the form of a semicolon separated list of
2798	      parameter=value pairs.

2800	   o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
2801	      pps", when present, MUST be included in the "a=fmtp" line of SDP
2802	      or conveyed using the "fmtp" source attribute as specified in
2803	      section 6.3 of [RFC5576].  For a particular media format (i.e.
2804	      RTP payload type), "sprop-vps", "sprop-sps", or "sprop-pps" MUST
2805	      NOT be both included in the "a=fmtp" line of SDP and conveyed
2806	      using the "fmtp" source attribute.  When included in the "a=fmtp"
2807	      line of SDP, these parameters are expressed as a media type
2808	      string, in the form of a semicolon separated list of
2809	      parameter=value pairs.  When conveyed using the "fmtp" source
2810	      attribute, these parameters are only associated with the given
2811	      source and payload type as parts of the "fmtp" source attribute.

2813	          Informative note: Conveyance of "sprop-vps", "sprop-sps", and
2814	          "sprop-pps" using the "fmtp" source attribute allows for out-
2815	          of-band transport of parameter sets in topologies like Topo-
2816	          Video-switch-MCU as specified in [RFC5117].

2818	   An example of media representation in SDP is as follows:

2820	         m=video 49170 RTP/AVP 98
2821	         a=rtpmap:98 H265/90000
2822	         a=fmtp:98 profile-id=1;
2823	                   sprop-vps=<video parameter sets data>
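A receiver has to split the "a=fmtp" value into parameter=value pairs.
The Python sketch below shows one way to do so; the function name is
hypothetical and the input is assumed to be a single unfolded line.
Because the dec-parallel-cap value itself contains semicolons inside
curly braces, the sketch splits only on semicolons outside braces.

```python
def parse_fmtp(line):
    """Split an 'a=fmtp:<pt> name=value;...' line into (pt, dict)."""
    head, _, params = line.partition(" ")
    if not head.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(head[len("a=fmtp:"):])

    pieces, current, depth = [], [], 0
    for ch in params:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            pieces.append("".join(current))
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))

    result = {}
    for piece in pieces:
        piece = piece.strip()
        if not piece:
            continue
        # Split on the first "=" only; values may contain "=" themselves.
        name, _, value = piece.partition("=")
        result[name.strip()] = value.strip()
    return payload_type, result


print(parse_fmtp("a=fmtp:98 level-id=93;"
                 "dec-parallel-cap={t:8;level-id=120}"))
```

Values are kept as strings; unknown parameter names can then be
dropped by the caller, in line with the rule at the start of Section
7.2.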

2825	7.2.2 Usage with SDP Offer/Answer Model

2827	   When HEVC is offered over RTP using SDP in an Offer/Answer model
2828	   [RFC3264] for negotiation for unicast usage, the following
2829	   limitations and rules apply:

2831	   o  The parameters identifying a media format configuration for HEVC
2832	      are profile-space, profile-id, tier-flag, level-id, interop-
2833	      constraints, profile-compatibility-indicator, and tx-mode.  These
2834	      media configuration parameters, except for level-id, MUST be used
2835	      symmetrically when the answerer does not include recv-sub-layer-
2836	      id in the answer for the media format (payload type).  In other
2837	      words, the answerer MUST 1) maintain all configuration parameters
2838	      for the media format (payload type), 2) include recv-sub-layer-id
2839	      in the answer for the media format (payload type), or 3) remove
2840	      the media format (payload type) completely (when one or more of
2841	      the parameter values are not supported).  The value of level-id
2842	      is changeable.

2844	          Informative note: The requirement for symmetric use does not
2845	          apply for level-id, and does not apply for the other
2846	          bitstream or RTP stream properties and capability parameters.

2848	   o  To simplify handling and matching of these configurations, the
2849	      same RTP payload type number used in the offer SHOULD also be
2850	      used in the answer, as specified in [RFC3264].  The same RTP
2851	      payload type number used in the offer MUST also be used in the
2852	      answer when the answer includes recv-sub-layer-id.  When the
2853	      answer does not include recv-sub-layer-id, the answer MUST NOT
2854	      contain a payload type number used in the offer unless the
2855	      configuration is exactly the same as in the offer or the
2856	      configuration in the answer only differs from that in the offer
2857	      with a different value of level-id.  The answer MAY contain the
2858	      recv-sub-layer-id parameter if an HEVC bitstream contains
2859	      multiple operation points (using temporal scalability and sub-
2860	      layers) and sprop-vps is included in the offer where sub-layers
2861	      are present in the video parameter set.  If the sprop-vps is
2862	      provided in an offer, an answerer MAY select a particular
2863	      operation point in the received and/or in the sent bitstream.
2864	      When recv-sub-layer-id is present in the answer, the media
2865	      configuration parameters MUST NOT be present in the answer.
2866	      Rather, the media configuration that the answerer will use for
2867	      receiving and/or sending is the one used for the selected
2868	      operation point as indicated in the offer.

2870	          Informative note: When an offerer receives an answer that
2871	          does not include recv-sub-layer-id, it has to compare payload
2872	          types not declared in the offer based on the media type (i.e.
2873	          video/H265) and the above media configuration parameters with
2874	          any payload types it has already declared.  This will enable
2875	          it to determine whether the configuration in question is new
2876	          or if it is equivalent to configuration already offered,
2877	          since a different payload type number may be used in the
2878	          answer.  The ability to perform operation point selection
2879	          enables a receiver to utilize the temporal scalable nature of
2880	          an HEVC bitstream.

2882	   o  The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
2883	      sprop-depack-buf-bytes describe the properties of an RTP stream,
2884	      and its dependent RTP streams, when present, that the offerer or
2885	      the answerer is sending for the media format configuration.  This
2886	      differs from the normal usage of the Offer/Answer parameters:
2887	      normally such parameters declare the properties of the bitstream
2888	      or RTP stream that the offerer or the answerer is able to
2889	      receive.  When dealing with HEVC, the offerer assumes that the
2890	      answerer will be able to receive media encoded using the
2891	      configuration being offered.

2893	          Informative note:  The above parameters apply for any RTP
2894	          stream and its dependent RTP streams, when present, sent by a
2895	          declaring entity with the same configuration; i.e. they are
2896	          dependent on their source endpoint.  Rather than being bound
2897	          to the payload type, the values may have to be applied to
2898	          another payload type when being sent, as they apply for the
2899	          configuration.

2901	   o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb,
2902	      max-br, max-tr, and max-tc MAY be used to declare further
2903	      capabilities of the offerer or answerer for receiving.  These
2904	      parameters MUST NOT be present when the direction attribute is
2905	      "sendonly".

2907	   o  The capability parameter max-fps MAY be used to declare lower
2908	      capabilities of the offerer or answerer for receiving.  The
2909	      parameter MUST NOT be present when the direction attribute is
2910	      "sendonly".

2912	   o  The capability parameter dec-parallel-cap MAY be used to declare
2913	      additional decoding capabilities of the offerer or answerer for
2914	      receiving.  Upon receiving such a declaration of a receiver, a
2915	      sender MAY send a bitstream to the receiver utilizing those
2916	      capabilities under the assumption that the bitstream fulfills the
2917	      parallelism requirement.  A bitstream that is sent based on
2918	      choosing a capability point with parallel tool type 'w' from dec-
2919	      parallel-cap MUST have entropy_coding_sync_enabled_flag equal to
2920	      1 and min_spatial_segmentation_idc equal to or larger than dec-
2921	      parallel-cap.spatial-seg-idc of the capability point.  A
2922	      bitstream that is sent based on choosing a capability point with
2923	      parallel tool type 't' from dec-parallel-cap MUST have
2924	      entropy_coding_sync_enabled_flag equal to 0 and
2925	      min_spatial_segmentation_idc equal to or larger than dec-
2926	      parallel-cap.spatial-seg-idc of the capability point.

2928	   o  An offerer has to include the size of the de-packetization
2929	      buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff and
2930	      sprop-depack-buf-nalus, in the offer for an interleaved HEVC
2931	      bitstream or for the MST transmission mode.  To enable the
2932	      offerer and answerer to inform each other about their
2933	      capabilities for de-packetization buffering in receiving RTP
2934	      streams, both parties are RECOMMENDED to include depack-buf-cap.
2935	      For interleaved RTP streams or in MST, it is also RECOMMENDED to
2936	      consider offering multiple payload types with different buffering
2937	      requirements when the capabilities of the receiver are unknown.

2939	   o  The sprop-vps, sprop-sps, or sprop-pps, when present (included in
2940	      the "a=fmtp" line of SDP or conveyed using the "fmtp" source
2941	      attribute as specified in section 6.3 of [RFC5576]), are used for
2942	      out-of-band transport of the parameter sets (VPS, SPS, or PPS
2943	      respectively).

2945	   o  The answerer MAY use either out-of-band or in-band transport of
2946	      parameter sets for the bitstream it is sending, regardless of
2947	      whether out-of-band parameter sets transport has been used in the
2948	      offerer-to-answerer direction.  Parameter sets included in an
2949	      answer are independent of those parameter sets included in the
2950	      offer, as they are used for decoding two different bitstreams,
2951	      one from the answerer to the offerer and the other in the
2952	      opposite direction.

2954	   o  The following rules apply to transport of parameter sets in the
2955	      offerer-to-answerer direction.

2957	       o An offer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
2958	          If none of these parameters is present in the offer, then
2959	          only in-band transport of parameter sets is used.

2961	       o If the level to use in the offerer-to-answerer direction is
2962	          equal to the default level in the offer, the answerer MUST be
2963	          prepared to use the parameter sets included in sprop-vps,
2964	          sprop-sps, and sprop-pps (either included in the "a=fmtp"
2965	          line of SDP or conveyed using the "fmtp" source attribute)
2966	          for decoding the incoming bitstream, e.g. by passing these
2967	          parameter set NAL units to the video decoder before passing
2968	          any NAL units carried in the RTP streams.  Otherwise, the
2969	          answerer MUST ignore sprop-vps, sprop-sps, and sprop-pps
2970	          (either included in the "a=fmtp" line of SDP or conveyed
2971	          using the "fmtp" source attribute) and the offerer MUST
2972	          transmit parameter sets in-band.

2974	       o In MST, the answerer MUST be prepared to use the parameter
2975	          sets out-of-band transmitted for the current RTP stream and
2976	          its dependent RTP streams, when present, for decoding the
2977	          incoming bitstream, e.g. by passing these parameter set NAL
2978	          units to the video decoder before passing any NAL units
2979	          carried in the RTP streams.

2981	   o  The following rules apply to transport of parameter sets in the
2982	      answerer-to-offerer direction.

2984	       o An answer MAY include sprop-vps, sprop-sps, and/or sprop-pps.
2985	          If none of these parameters is present in the answer, then
2986	          only in-band transport of parameter sets is used.

2988	       o The offerer MUST be prepared to use the parameter sets
2989	          included in sprop-vps, sprop-sps, and sprop-pps (either
2990	          included in the "a=fmtp" line of SDP or conveyed using the
2991	          "fmtp" source attribute) for decoding the incoming bitstream,
2992	          e.g. by passing these parameter set NAL units to the video
2993	          decoder before passing any NAL units carried in the RTP
2994	          streams.

2996	       o In MST, the offerer MUST be prepared to use the parameter
2997	          sets out-of-band transmitted for the current RTP stream and
2998	          its dependent RTP streams, when present, for decoding the
2999	          incoming bitstream, e.g. by passing these parameter set NAL
3000	          units to the video decoder before passing any NAL units
3001	          carried in the RTP streams.

3003	   o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
3004	      the "fmtp" source attribute as specified in section 6.3 of
3005	      [RFC5576], the receiver of the parameters MUST store the
3006	      parameter sets included in sprop-vps, sprop-sps, and/or sprop-pps
3007	      and associate them with the source given as part of the "fmtp"
3008	      source attribute.  Parameter sets associated with one source
3009	      (given as part of the "fmtp" source attribute) MUST only be used
3010	      to decode NAL units conveyed in RTP packets from the same source
3011	      (given as part of the "fmtp" source attribute).  When this
3012	      mechanism is in use, SSRC collision detection and resolution MUST
3013	      be performed as specified in [RFC5576].
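
   Informative sketch (not part of the payload format): the
   dec-parallel-cap rule in the list above can be checked mechanically.
   The Python function below is a minimal illustration only.  Its name
   and argument names are hypothetical, and rejecting parallel tool
   types other than 'w' and 't' is an assumption of the sketch.

```python
def satisfies_parallel_cap(tool_type: str,
                           cap_spatial_seg_idc: int,
                           entropy_coding_sync_enabled_flag: int,
                           min_spatial_segmentation_idc: int) -> bool:
    """Check a bitstream against one chosen dec-parallel-cap capability point.

    tool_type            -- parallel tool type of the capability point
    cap_spatial_seg_idc  -- dec-parallel-cap.spatial-seg-idc of that point
    The remaining two arguments are the values signalled in the bitstream.
    """
    if tool_type == "w":
        required_flag = 1   # text above: flag MUST be equal to 1 for 'w'
    elif tool_type == "t":
        required_flag = 0   # text above: flag MUST be equal to 0 for 't'
    else:
        # Assumption of this sketch: other tool types are not handled.
        raise ValueError("unsupported parallel tool type: %r" % tool_type)

    return (entropy_coding_sync_enabled_flag == required_flag
            and min_spatial_segmentation_idc >= cap_spatial_seg_idc)
```

   A sender could call such a check before committing to a capability
   point, and fall back to a non-parallel bitstream when it fails.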

3015	   For bitstreams being delivered over multicast, the following rules
3016	   apply:

3018	   o  The media format configuration is identified by profile-space,
3019	      profile-id, tier-flag, level-id, interop-constraints, profile-
3020	      compatibility-indicator, and tx-mode.  These media format
3021	      configuration parameters, including level-id, MUST be used
3022	      symmetrically; that is, the answerer MUST either maintain all
3023	      configuration parameters or remove the media format (payload
3024	      type) completely.  Note that this implies that the level-id for
3025	      Offer/Answer in multicast is not changeable.

3027	   o  To simplify the handling and matching of these configurations,
3028	      the same RTP payload type number used in the offer SHOULD also be
3029	      used in the answer, as specified in [RFC3264].  An answer MUST
3030	      NOT contain a payload type number used in the offer unless the
3031	      configuration is the same as in the offer.

3033	   o  Parameter sets received MUST be associated with the originating
3034	      source and MUST only be used in decoding the incoming bitstream
3035	      from the same source.

3037	   o  The rules for other parameters are the same as above for unicast
3038	      as long as the three above rules are obeyed.

3040	   Table 1 lists the interpretation of all the parameters that MUST be
3041	   used for the various combinations of offer, answer, and direction
3042	   attributes.  Note that the two columns wherein the recv-sub-layer-id
3043	   parameter is used only apply to answers, whereas the other columns
3044	   apply to both offers and answers.

3046	   Table 1.  Interpretation of parameters for various combinations of
3047	   offers, answers, direction attributes, with and without recv-sub-
3048	   layer-id.  Columns that do not indicate offer or answer apply to
3049	   both.

3051	                                          sendonly --+
3052	            answer: recvonly, recv-sub-layer-id --+  |
3053	              recvonly w/o recv-sub-layer-id --+  |  |
3054	      answer: sendrecv, recv-sub-layer-id --+  |  |  |
3055	        sendrecv w/o recv-sub-layer-id --+  |  |  |  |
3056	                                         |  |  |  |  |
3057	      profile-space                      C  X  C  X  P
3058	      profile-id                         C  X  C  X  P
3059	      tier-flag                          C  X  C  X  P
3060	      level-id                           C  X  C  X  P
3061	      interop-constraints                C  X  C  X  P
3062	      profile-compatibility-indicator    C  X  C  X  P
3063	      tx-mode                            C  X  C  X  P
3064	      max-recv-level-id                  R  R  R  R  -
3065	      sprop-max-don-diff                 P  P  -  -  P
3066	      sprop-depack-buf-nalus             P  P  -  -  P
3067	      sprop-depack-buf-bytes             P  P  -  -  P
3068	      depack-buf-cap                     R  R  R  R  -
3069	      sprop-segmentation-id              P  P  P  P  P
3070	      sprop-spatial-segmentation-idc     P  P  P  P  P
3071	      max-br                             R  R  R  R  -
3072	      max-cpb                            R  R  R  R  -
3073	      max-dpb                            R  R  R  R  -
3074	      max-lsr                            R  R  R  R  -
3075	      max-lps                            R  R  R  R  -
3076	      max-tr                             R  R  R  R  -
3077	      max-tc                             R  R  R  R  -
3078	      max-fps                            R  R  R  R  -
3079	      sprop-vps                          P  P  -  -  P
3080	      sprop-sps                          P  P  -  -  P
3081	      sprop-pps                          P  P  -  -  P
3082	      sprop-sub-layer-id                 P  P  -  -  P
3083	      recv-sub-layer-id                  X  O  X  O  -
3084	      dec-parallel-cap                   R  R  R  R  -

3086	     Legend:

3088	      C: configuration for sending and receiving bitstreams
3089	      P: properties of the bitstream to be sent
3090	      R: receiver capabilities
3091	      O: operation point selection
3092	      X: MUST NOT be present
3093	      -: not usable, when present SHOULD be ignored
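
   Informative sketch: Table 1 can be held as a small lookup
   structure.  The Python below transcribes the table rows and legend
   as given above; the names COLUMNS, TABLE_1, LEGEND, and
   interpretation() are hypothetical and exist only for illustration.

```python
# Column order follows Table 1, left to right.
COLUMNS = (
    "sendrecv w/o recv-sub-layer-id",
    "answer: sendrecv, recv-sub-layer-id",
    "recvonly w/o recv-sub-layer-id",
    "answer: recvonly, recv-sub-layer-id",
    "sendonly",
)

# Rows of Table 1, grouped by their five-column pattern.
_ROWS_BY_PATTERN = {
    "CXCXP": ("profile-space", "profile-id", "tier-flag", "level-id",
              "interop-constraints", "profile-compatibility-indicator",
              "tx-mode"),
    "RRRR-": ("max-recv-level-id", "depack-buf-cap", "max-br", "max-cpb",
              "max-dpb", "max-lsr", "max-lps", "max-tr", "max-tc",
              "max-fps", "dec-parallel-cap"),
    "PP--P": ("sprop-max-don-diff", "sprop-depack-buf-nalus",
              "sprop-depack-buf-bytes", "sprop-vps", "sprop-sps",
              "sprop-pps", "sprop-sub-layer-id"),
    "PPPPP": ("sprop-segmentation-id", "sprop-spatial-segmentation-idc"),
    "XOXO-": ("recv-sub-layer-id",),
}

TABLE_1 = {param: pattern
           for pattern, params in _ROWS_BY_PATTERN.items()
           for param in params}

LEGEND = {
    "C": "configuration for sending and receiving bitstreams",
    "P": "properties of the bitstream to be sent",
    "R": "receiver capabilities",
    "O": "operation point selection",
    "X": "MUST NOT be present",
    "-": "not usable, when present SHOULD be ignored",
}


def interpretation(param: str, column: str):
    """Return (legend code, legend text) for one cell of Table 1."""
    code = TABLE_1[param][COLUMNS.index(column)]
    return code, LEGEND[code]
```

   For example, interpretation("level-id", "sendonly") looks up the
   last column of the level-id row.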

3095	   Parameters used for declaring receiver capabilities are in general
3096	   downgradable; i.e. they express the upper limit for a sender's
3097	   possible behavior.  Thus, a sender MAY select to set its encoder
3098	   using only lower/lesser or equal values of these parameters.

3100	   Parameters declaring a configuration point are not changeable, with
3101	   the exception of the level-id parameter for unicast usage.  They
3102	   express values that a receiver expects to be used, and these values
3103	   MUST be used verbatim on the sender side.  If level-id is changed, an
3104	   answerer MUST NOT include the recv-sub-layer-id parameter.

3106	   When a sender's capabilities are declared, and non-changeable
3107	   parameters are used in this declaration, these parameters express a
3108	   configuration that is acceptable for the sender to receive
3109	   bitstreams.  In order to achieve high interoperability levels, it is
3110	   often advisable to offer multiple alternative configurations.  It is
3111	   impossible to offer multiple configurations in a single payload
3112	   type.  Thus, when multiple configuration offers are made, each offer
3113	   requires its own RTP payload type associated with the offer.

3115	   A receiver SHOULD understand all media type parameters, even if it
3116	   only supports a subset of the payload format's functionality.  This
3117	   ensures that a receiver is capable of understanding when an offer to
3118	   receive media can be downgraded to what is supported by the receiver
3119	   of the offer.

3121	   An answerer MAY extend the offer with additional media format
3122	   configurations.  However, to enable their usage, in most cases a
3123	   second offer is required from the offerer to provide the bitstream
3124	   property parameters that the media sender will use.  This also has
3125	   the effect that the offerer has to be able to receive this media
3126	   format configuration, not only to send it.

3128	7.2.3 Usage in Declarative Session Descriptions

3130	   When HEVC over RTP is offered with SDP in a declarative style, as in
3131	   Real Time Streaming Protocol (RTSP) [RFC2326] or Session
3132	   Announcement Protocol (SAP) [RFC2974], the following considerations
3133	   are necessary.

3135	   o  All parameters capable of indicating both bitstream properties
3136	      and receiver capabilities are used to indicate only bitstream
3137	      properties.  For example, in this case, the parameter profile-
3138	      tier-level-id declares the values used by the bitstream, not the
3139	      capabilities for receiving bitstreams.  As a result, the
3140	      following interpretation of the parameters MUST be used:

3142	   Declaring actual configuration or bitstream properties:

3144	     - profile-space
3145	     - profile-id
3146	     - tier-flag
3147	     - level-id
3148	     - interop-constraints
3149	     - profile-compatibility-indicator
3150	     - tx-mode
3151	     - sprop-vps
3152	     - sprop-sps
3153	     - sprop-pps
3154	     - sprop-max-don-diff
3155	     - sprop-depack-buf-nalus
3156	     - sprop-depack-buf-bytes
3157	     - sprop-segmentation-id
3158	     - sprop-spatial-segmentation-idc

3160	   Not usable (when present, they SHOULD be ignored):

3162	     - max-lps
3163	     - max-lsr
3164	     - max-cpb
3165	     - max-dpb
3166	     - max-br
3167	     - max-tr
3168	     - max-tc
3169	     - max-fps
3170	     - max-recv-level-id
3171	     - depack-buf-cap
3172	     - sprop-sub-layer-id
3173	     - dec-parallel-cap

3175	   o  A receiver of the SDP is required to support all parameters and
3176	      values of the parameters provided; otherwise, the receiver MUST
3177	      reject (RTSP) or not participate in (SAP) the session.  It falls
3178	      on the creator of the session to use values that are expected to
3179	      be supported by the receiving application.

3181	7.2.4 Parameter Sets Considerations

3183	   When out-of-band transport of parameter sets is used, parameter sets
3184	   MAY still be additionally transported in-band unless explicitly
3185	   disallowed by an application, and some of these additionally in-band
3186	   transported parameter sets may update some of the out-of-band
3187	   transported parameter sets.  Update of a parameter set refers to
3188	   sending of a parameter set of the same type using the same parameter
3189	   set ID but with different values for at least one other parameter of
3190	   the parameter set.

3200	7.2.5 Dependency Signaling in Multi-Stream Transmission

3202	   If MST is used, the rules on signaling media decoding dependency in
3203	   SDP as defined in [RFC5583] apply.  The rules on "hierarchical or
3204	   layered encoding" with multicast in Section 5.7 of [RFC4566] do not
3205	   apply, i.e. the notation for Connection Data "c=" SHALL NOT be used
3206	   with more than one address.  The order of session dependency is
3207	   given from the RTP stream containing the lowest temporal sub-layer
3208	   to the RTP stream containing the highest temporal sub-layer.

3210	8. Use with Feedback Messages

3212	   As specified in section 6.1 of RFC 4585 [RFC4585], Payload-Specific
3213	   Feedback messages are identified by the RTCP packet type value PSFB
3214	   (206).  AVPF [RFC4585] defines three payload-specific feedback
3215	   messages and one application layer feedback message, and CCM
3216	   [RFC5104] specifies four payload-specific feedback messages.

3218	   These feedback messages are identified by means of the feedback
3219	   message type (FMT) parameter as follows:

3221	   Assigned in [RFC4585]:

3223	      1:     Picture Loss Indication (PLI)
3224	      2:     Slice Loss Indication (SLI)
3225	      3:     Reference Picture Selection Indication (RPSI)
3226	      15:    Application layer FB message
3227	      31:    reserved for future expansion of the number space

3229	   Assigned in [RFC5104]:

3231	      4:     Full Intra Request (FIR) Command
3232	      5:     Temporal-Spatial Trade-off Request (TSTR)
3233	      6:     Temporal-Spatial Trade-off Notification (TSTN)
3234	      7:     Video Back Channel Message (VBCM)

3236	   Unassigned:

3238	      0:      unassigned
3239	      8-14:   unassigned
3240	      16-30:  unassigned

3242	   The following subsections define the use of the PLI, SLI, RPSI, and
3243	   FIR feedback messages with HEVC.
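
   Informative sketch: the FMT assignments listed above can be captured
   as an enumeration.  The Python below is illustrative only; the names
   PT_PSFB, PsfbFmt, HEVC_DEFINED, and describe_fmt() are hypothetical.
   The 0..31 range check reflects the number space shown in the list.

```python
from enum import IntEnum

PT_PSFB = 206  # RTCP packet type for payload-specific feedback


class PsfbFmt(IntEnum):
    """FMT values for payload-specific feedback, as listed above."""
    PLI = 1    # Picture Loss Indication
    SLI = 2    # Slice Loss Indication
    RPSI = 3   # Reference Picture Selection Indication
    FIR = 4    # Full Intra Request
    TSTR = 5   # Temporal-Spatial Trade-off Request
    TSTN = 6   # Temporal-Spatial Trade-off Notification
    VBCM = 7   # Video Back Channel Message
    AFB = 15   # Application layer FB message


# Messages whose use with HEVC is defined in the following subsections.
HEVC_DEFINED = frozenset(
    {PsfbFmt.PLI, PsfbFmt.SLI, PsfbFmt.RPSI, PsfbFmt.FIR})


def describe_fmt(fmt: int) -> str:
    """Classify an FMT value according to the list above."""
    if not 0 <= fmt <= 31:
        raise ValueError("FMT out of range: %d" % fmt)
    if fmt == 31:
        return "reserved"
    try:
        return PsfbFmt(fmt).name
    except ValueError:
        return "unassigned"
```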

3245	8.1 Picture Loss Indication (PLI)

3247	   As specified in RFC 4585 section 6.3.1, the reception of a picture
3248	   loss indication by a media sender indicates "the loss of an
3249	   undefined amount of coded video data belonging to one or more
3250	   pictures".  Without having any specific knowledge of the setup of
3251	   the bitstream (such as use and location of in-band parameter sets,
3252	   non-IDR decoder refresh points, picture structures, and so forth), a
3253	   reaction to the reception of a PLI by an HEVC sender SHOULD be to
3254	   send an IDR picture and relevant parameter sets, potentially with
3255	   sufficient redundancy so as to ensure correct reception.  However,
3256	   sometimes information about the bitstream structure is known.  For
3257	   example, state could have been established outside of the mechanisms
3258	   defined in this document that parameter sets are conveyed out of
3259	   band only, and stay static for the duration of the session.  In that
3260	   case, it is obviously unnecessary to send them in-band as a result
3261	   of the reception of a PLI.  Other examples could be devised based on
3262	   a priori knowledge of different aspects of the bitstream structure.
3263	   In all cases, the timing and congestion control mechanisms of RFC
3264	   4585 MUST be observed.

3266	8.2 Slice Loss Indication

3268	   RFC 4585's Slice Loss Indication can be used to indicate, to a
3269	   sender, the loss of a number of Coding Tree Blocks (CTBs) in CTB
3270	   raster scan order of a picture.  In the SLI's Feedback Control
3271	   Indication (FCI) field, the subfield "First" MUST be set to the CTB
3272	   address of the first lost CTB.  Note that the CTB address is in CTB
3273	   raster scan order of a picture.  For the first CTB of a slice
3274	   segment, the CTB address is the value of slice_segment_address when
3275	   present; or 0 when first_slice_segment_in_pic_flag is equal to 1;
3276	   both syntax elements are in the slice segment header.  The subfield
3277	   "Number" MUST be set to the number of consecutive lost CTBs, again
3278	   in CTB raster scan order of a picture.  The subfield "PictureID"
3279	   MUST be set to the 6 least significant bits of a binary
3280	   representation of the value of slice_pic_order_cnt_lsb of the
3281	   picture for which the lost CTBs are indicated.  Note that for IDR
3282	   pictures the syntax element slice_pic_order_cnt_lsb is not present,
3283	   but then the value is inferred to be equal to 0.
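
   Informative sketch: the three SLI subfields described above can be
   packed as follows.  The field widths (13 bits for "First", 13 bits
   for "Number", 6 bits for "PictureID") are those of the SLI FCI in
   RFC 4585; the function name and argument names are hypothetical.

```python
import struct


def pack_sli_fci(first_ctb_addr: int,
                 num_lost_ctbs: int,
                 slice_pic_order_cnt_lsb: int) -> bytes:
    """Pack one 32-bit SLI FCI entry for HEVC.

    first_ctb_addr          -- CTB address (raster scan) of the first lost CTB
    num_lost_ctbs           -- number of consecutive lost CTBs
    slice_pic_order_cnt_lsb -- value for the affected picture (0 for IDR)
    """
    if not 0 <= first_ctb_addr < (1 << 13):
        raise ValueError("First does not fit in 13 bits")
    if not 0 <= num_lost_ctbs < (1 << 13):
        raise ValueError("Number does not fit in 13 bits")
    if slice_pic_order_cnt_lsb < 0:
        raise ValueError("slice_pic_order_cnt_lsb must be non-negative")

    # PictureID carries only the 6 least significant bits.
    picture_id = slice_pic_order_cnt_lsb & 0x3F

    word = (first_ctb_addr << 19) | (num_lost_ctbs << 6) | picture_id
    return struct.pack("!I", word)  # network byte order
```

   Because the CTB size is set in the sequence parameter set, a caller
   would need the active SPS to derive first_ctb_addr correctly.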

3285	   As described in RFC 4585, an encoder in a media sender can use this
3286	   information to "clean up" the corrupted picture by sending intra
3287	   information, while observing the constraints described in RFC 4585,
3288	   for example with respect to congestion control.  In many cases,
3289	   error tracking is required to identify the corrupted region in the
3290	   receiver's state (reference pictures) because of error import in
3291	   uncorrupted regions of the picture through motion compensation, and
3292	   reference picture selection can also be used to "clean up" the
3293	   corrupted picture, which is usually more efficient and less likely
3294	   to generate congestion than sending intra information.

3296	   In contrast to the video codecs contemplated in RFC 4585 and RFC
3297	   5104, in HEVC, the "macroblock size" is not fixed to 16x16 luma
3298	   samples, but variable.  That, however, does not create a conceptual
3299	   difficulty with SLI, because the setting of the CTB size is a
3300	   sequence-level functionality, and using a slice loss indication
3301	   across coded video sequence boundaries is meaningless as there is no
3302	   prediction across sequence boundaries.  However, a proper use of SLI
3303	   messages is not as straightforward as it was with older, fixed-
3304	   macroblock-sized video codecs, as the state of the sequence
3305	   parameter set (where the CTB size is located) has to be taken into
3306	   account when interpreting the "First" subfield in the FCI.

3308	8.3 Use of HEVC with the RPSI Feedback Message

3310	   Feedback-based reference picture selection has been shown to be a
3311	   powerful tool to stop temporal error propagation for improved error
3312	   resilience [Girod99][Wang05].  In one approach, the decoder side
3313	   tracks errors in the decoded pictures and informs the encoder side
3314	   that a particular picture that has been decoded relatively
3315	   earlier is correct and still present in the decoded picture buffer,
3316	   and requests the encoder to use that correct picture for reference
3317	   when encoding the next picture, so as to stop further temporal error
3318	   propagation.  For this approach, the decoder side should use the
3319	   RPSI feedback message.

3321	   Encoders can encode some long-term reference pictures as specified
3322	   in H.264 or HEVC for purposes described in the previous paragraph
3323	   without the need of a huge decoded picture buffer.  As shown in
3324	   [Wang05], with a flexible reference picture management scheme as in
3325	   H.264 and HEVC, even a decoded picture buffer size of two would work
3326	   for the approach described in the previous paragraph.

3328	   The field "Native RPSI bit string defined per codec" is a base16
3329	   [RFC4648] representation of the 8 bits consisting of 2 most
3330	   significant bits equal to 0 and 6 bits of nuh_layer_id, as defined
3331	   in [HEVC], followed by the 32 bits representing the value of the
3332	   PicOrderCntVal (in network byte order), as defined in [HEVC], for
3333	   the picture that is requested to be used for reference when encoding
3334	   the next picture.
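
   Informative sketch: one reading of the paragraph above is five
   octets (one octet holding two zero bits and the 6-bit nuh_layer_id,
   then four octets of PicOrderCntVal in network byte order), rendered
   in base16.  The Python below follows that reading.  The function
   name is hypothetical, and treating PicOrderCntVal as a signed 32-bit
   integer is an assumption of the sketch.

```python
import base64
import struct


def native_rpsi_bit_string(nuh_layer_id: int, pic_order_cnt_val: int) -> str:
    """Build the base16 text for the requested reference picture."""
    if not 0 <= nuh_layer_id <= 0x3F:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    if not -(1 << 31) <= pic_order_cnt_val < (1 << 31):
        raise ValueError("PicOrderCntVal must fit in 32 bits")

    # "!" selects network byte order without padding:
    #   B -> 1 octet  (2 zero MSBs + 6 bits of nuh_layer_id)
    #   i -> 4 octets (PicOrderCntVal, signed by assumption)
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw).decode("ascii")
```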

3336	   The use of the RPSI feedback message as positive acknowledgement
3337	   with HEVC is deprecated.  In other words, the RPSI feedback message
3338	   MUST only be used as a reference picture selection request, such
3339	   that it can also be used in multicast.

3341	8.4 Full Intra Request (FIR)

3343	   The purpose of the FIR message is to force an encoder to send an
3344	   independent decoder refresh point as soon as possible (observing,
3345	   for example, the congestion control related constraints set out in
3346	   RFC 5104).

3348	   Upon reception of a FIR, a sender MUST send an IDR picture.
3349	   Parameter sets MUST also be sent, except when there is a priori
3350	   knowledge that the parameter sets have been correctly established.
3351	   (A typical example for that is an understanding between sender and
3352	   receiver, established by means outside this document, that parameter
3353	   sets are exclusively sent out of band.)

3355	9. Security Considerations

3357	   RTP packets using the payload format defined in this specification
3358	   are subject to the security considerations discussed in the RTP
3359	   specification [RFC3550], and in any applicable RTP profile such as
3360	   RTP/AVP [RFC3551], RTP/AVPF [RFC4585], RTP/SAVP [RFC3711] or
3361	   RTP/SAVPF [RFC5124].  However, as "Securing the RTP Protocol
3362	   Framework: Why RTP Does Not Mandate a Single Media Security
3363	   Solution" [I-D.ietf-avt-srtp-not-mandatory] discusses, it is not an
3364	   RTP payload format's responsibility to discuss or mandate what
3365	   solutions are used to meet the basic security goals like
3366	   confidentiality, integrity, and source authenticity for RTP in
3367	   general.  This responsibility lies with anyone using RTP in an
3368	   application.  They can find guidance on available security
3369	   mechanisms and important considerations as discussed in "Options for
3370	   Securing RTP Sessions" [I-D.ietf-avtcore-rtp-security-options].

3372	   The rest of this section discusses the security impacting properties
3373	   of the payload format itself.

3375	   Because the data compression used with this payload format is
3376	   applied end-to-end, any encryption needs to be performed after
3377	   compression.  A potential denial-of-service threat exists for data
3378	   encodings using compression techniques that have non-uniform
3379	   receiver-end computational load.  The attacker can inject
3380	   pathological datagrams into the bitstream that are complex to decode
3381	   and that cause the receiver to be overloaded.  H.265 is particularly
3382	   vulnerable to such attacks, as it is extremely simple to generate
3383	   datagrams containing NAL units that affect the decoding process of
3384	   many future NAL units.  Therefore, the usage of data origin
3385	   authentication and data integrity protection of at least the RTP
3386	   packet is RECOMMENDED, for example, with SRTP [RFC3711].

3388	   Note that the appropriate mechanism to ensure confidentiality and
3389	   integrity of RTP packets and their payloads is very dependent on the
3390	   application and on the transport and signaling protocols employed.
3391	   Thus, although SRTP is given as an example above, other possible
3392	   choices exist.

3394	   Decoders MUST exercise caution with respect to the handling of user
3395	   data SEI messages, particularly if they contain active elements, and
3396	   MUST restrict their domain of applicability to the presentation
3397	   containing the bitstream.

3399	   End-to-end security with authentication, integrity, or
3400	   confidentiality protection will prevent a MANE from performing
3401	   media-aware operations other than discarding complete packets.  In
3402	   the case of confidentiality protection, it will even be prevented
3403	   from discarding packets in a media-aware way.  To be allowed to
3404	   perform such operations, a MANE is required to be a trusted entity
3405	   that is included in the security context establishment.

3407	10. Congestion Control

3409	   Congestion control for RTP SHALL be used in accordance with RTP
3410	   [RFC3550] and with any applicable RTP profile, e.g. AVP [RFC3551].
3411	   If best-effort service is being used, an additional requirement is
3412	   that users of this payload format MUST monitor packet loss to ensure
3413	   that the packet loss rate is within an acceptable range.  Packet
3414	   loss is considered acceptable if a TCP flow across the same network
3415	   path, and experiencing the same network conditions, would achieve an
3416	   average throughput, measured on a reasonable timescale, that is not
3417	   less than what all RTP streams combined are achieving.  This
3418	   condition can be satisfied by implementing congestion control
3419	   mechanisms to adapt the transmission rate or the number of layers
3420	   subscribed for a layered multicast session, or by arranging for a
3421	   receiver to leave the session if the loss rate is unacceptably high.

3423	   The bitrate adaptation necessary for obeying the congestion control
3424	   principle is easily achievable when real-time encoding is used, for
3425	   example by adequately tuning the quantization parameter.

3427	   However, when pre-encoded content is being transmitted, bandwidth
3428	   adaptation requires the pre-coded bitstream to be tailored for such
3429	   adaptivity.  The key mechanism available in HEVC is temporal
3430	   scalability.  A media sender can remove NAL units belonging to
3431	   higher temporal sub-layers (i.e. those NAL units with a high value
3432	   of TID) until the sending bitrate drops to an acceptable range.
3433	   HEVC contains mechanisms that allow the lightweight identification
3434	   of switching points in temporal enhancement layers, as discussed in
3435	   Section 1.1.2 of this memo.  An HEVC media sender can send packets
3436	   belonging to NAL units of temporal enhancement layers starting from
3437	   these switching points to probe for available bandwidth and to
3438	   utilize bandwidth that has been shown to be available.
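
   Informative sketch: the thinning of pre-encoded content described
   above can be illustrated as follows.  The Python reads the 3-bit TID
   field from the low bits of the second octet of the two-octet HEVC
   NAL unit header and drops the highest temporal sub-layers until a
   size budget is met.  The function names and the per-interval byte
   budget are hypothetical; a real sender would also respect the
   switching points mentioned above when adding sub-layers back.

```python
def tid_of(nal_unit: bytes) -> int:
    """Return the 3-bit TID field of an HEVC NAL unit header."""
    if len(nal_unit) < 2:
        raise ValueError("HEVC NAL unit header is two octets long")
    return nal_unit[1] & 0x07


def thin_to_budget(nal_units, byte_budget: int):
    """Drop the highest temporal sub-layers until the units fit the budget.

    nal_units   -- NAL units (bytes) queued for one sending interval
    byte_budget -- hypothetical number of bytes allowed for that interval
    Returns the NAL units that remain, in their original order.
    """
    kept = list(nal_units)
    while kept and sum(len(n) for n in kept) > byte_budget:
        tids = [tid_of(n) for n in kept]
        highest, lowest = max(tids), min(tids)
        if highest == lowest:
            break  # only one sub-layer is left; nothing more to remove
        kept = [n for n in kept if tid_of(n) < highest]
    return kept
```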

3440	   These mechanisms generally work within a defined profile and level
3441	   and, therefore, no renegotiation of the channel is required.  Only
3442	   when non-downgradable parameters (such as profile) are required to
3443	   be changed does it become necessary to terminate and restart the RTP
3444	   stream(s).  This may be accomplished by using different RTP payload
3445	   types.

3447	   MANEs MAY remove certain unusable packets from the RTP stream when
3448	   that RTP stream was damaged due to previous packet losses.  This can
3449	   help reduce the network load in certain special cases.  For example,
3450	   MANEs can remove those FUs where the leading FUs belonging to the
3451	   same NAL unit have been lost or those dependent slice segments when
3452	   the leading slice segments belonging to the same slice have been
3453	   lost, because the trailing FUs or dependent slice segments are
3454	   meaningless to most decoders.  MANEs can also remove higher temporal
3455	   scalable layers if the outbound transmission (from the MANE's
3456	   viewpoint) experiences congestion.

3458	11. IANA Considerations

3460	   A new media type, as specified in Section 7.1 of this memo, should
3461	   be registered with IANA.

3463	12. Acknowledgements

3465	   Muhammed Coban and Marta Karczewicz are thanked for discussions on
3466	   the specification of the use with feedback messages and other
3467	   aspects in this memo.  Jonathan Lennox and Jill Boyce are thanked
3468	   for their contributions to the PACI design included in this memo.
3469	   Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund, and
3470	   Tom Kristensen are thanked for their contributions to parallel
3471	   processing related signalling.  Magnus Westerlund, Jonathan Lennox,
3472	   Bernard Aboba, Jonatan Samuelsson, Roni Even, Rickard Sjoberg,
3473	   Sachin Deshpande, Woo Johnman, Mo Zanaty, and Ross Finlayson made
3474	   valuable reviewing comments that led to improvements.

3476	   This document was prepared using 2-Word-v2.0.template.dot.

3478	13. References

3480	13.1 Normative References

3482	   [HEVC]    ITU-T Recommendation H.265, "High efficiency video
3483	             coding", April 2013.

3485	   [H.264]   ITU-T Recommendation H.264, "Advanced video coding for
3486	             generic audiovisual services", April 2013.

3488	   [RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding
3489	             Dependency in the Session Description Protocol (SDP)", RFC
3490	             5583, July 2009.

3492	   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup, "RTP
3493	             Payload Format for H.264 Video", RFC 6184, May 2011.

3495	   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
3496	             Eleftheriadis, "RTP Payload Format for Scalable Video
3497	             Coding", RFC 6190, May 2011.

3499	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
3500	             Requirement Levels", BCP 14, RFC 2119, March 1997.

3502	   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
3503	             with Session Description Protocol (SDP)", RFC 3264, June
3504	             2002.

3506	   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
3507	             Encodings", RFC 4648, October 2006.

3509	   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson,
3510	             V., "RTP: A Transport Protocol for Real-Time
3511	             Applications", STD 64, RFC 3550, July 2003.

3513	   [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session
3514	             Description Protocol", RFC 4566, July 2006.

3516	   [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific
3517	             Media Attributes in the Session Description Protocol", RFC
3518	             5576, June 2009.

3520	   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey,
3521	             J., "Extended RTP Profile for Real-time Transport Control
3522	             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
3523	             2006.

3525	   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman, B.,
3526	             "Codec Control Messages in the RTP Audio-Visual Profile
3527	             with Feedback (AVPF)", RFC 5104, February 2008.

3529	13.2 Informative References

3531	   [3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched
3532	             Streaming Service (PSS); Progressive Download and Dynamic
3533	             Adaptive Streaming over HTTP (3GP-DASH)", v12.1.0,
3534	             December 2013.

3536	   [3GPPFF]  3GPP TS 26.244, "Transparent end-to-end packet switched
3537	             streaming service (PSS); 3GPP file format (3GP)", v12.2.0,
3538	             December 2013.

3540	   [Girod99] Girod, B. and Faerber, F., "Feedback-based error control
3541	             for mobile video transmission", Proceedings IEEE, Vol. 87,
3542	             No. 10, pp. 1707-1723, October 1999.

3544	   [I-D.ietf-avt-srtp-not-mandatory]
3545	             Perkins, C. and M. Westerlund, "Securing the RTP
3546	             Protocol Framework: Why RTP Does Not Mandate a Single
3547	             Media Security Solution", draft-ietf-avt-srtp-not-
3548	             mandatory-16 (work in progress), January 2014.

3550	   [I-D.ietf-avtcore-rtp-security-options]
3551	             Westerlund, M. and C. Perkins, "Options for Securing RTP
3552	             Sessions", draft-ietf-avtcore-rtp-security-options-10
3553	             (work in progress), January 2014.

3555	   [I-D.ietf-avtcore-rtp-multi-stream]
3556	             Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
3557	             "Sending Multiple Media Streams in a Single RTP Session",
3558	             draft-ietf-avtcore-rtp-multi-stream-01 (work in progress),
3559	             July 2013.

3561	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
3562	             Holmberg, C., Alvestrand, H., and C. Jennings,
3563	             "Multiplexing Negotiation Using Session Description
3564	             Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
3565	             bundle-negotiation-05 (work in progress), October 2013.

3567	   [I-D.ietf-avtext-rtp-grouping-taxonomy]
3568	             Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
3569	             Burman, B., "A Taxonomy of Grouping Semantics and
3570	             Mechanisms for Real-Time Transport", draft-ietf-avtext-
3571	             rtp-grouping-taxonomy-01 (work in progress), February
3572	             2014.

3574	   [ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology -
3575	             Coding of audio-visual objects - Part 12: ISO base media
3576	             file format" | "Information technology - JPEG 2000 image
3577	             coding system - Part 12: ISO base media file format",
3578	             2012.

3580	   [JCTVC-J0107] Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian,
3581	             K., "AHG9: On RAP pictures", JCT-VC document JCTVC-J0107,
3582	             10th JCT-VC meeting, July 2012, Stockholm, Sweden.

3584	   [MPEG2S]  ISO/IEC 13818-1, "Information technology - Generic coding
3585	             of moving pictures and associated audio information:
3586	             Systems", 2013.

3588	   [MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic
3589	             adaptive streaming over HTTP (DASH) - Part 1: Media
3590	             presentation description and segment formats", 2012.

3592	   [RFC5109] Li, A., "RTP Payload Format for Generic Forward Error
3593	             Correction", RFC 5109, December 2007.

3595	   [Wang05]  Wang, Y.-K., Zhu, C., and Li, H., "Error resilient video
3596	             coding using flexible reference frames", Visual
3597	             Communications and Image Processing 2005 (VCIP 2005), July
3598	             2005, Beijing, China.

3600	14. Authors' Addresses

3602	   Ye-Kui Wang
3603	   Qualcomm Incorporated
3604	   5775 Morehouse Drive
3605	   San Diego, CA 92121
3606	   USA
3607	   Phone: +1-858-651-8345
3608	   EMail: yekuiw@qti.qualcomm.com

3610	   Yago Sanchez
3611	   Fraunhofer HHI
3612	   Einsteinufer 37
3613	   D-10587 Berlin
3614	   Germany
3615	   Phone: +49-30-31002-227
3616	   EMail: yago.sanchez@hhi.fraunhofer.de

3618	   Thomas Schierl
3619	   Fraunhofer HHI
3620	   Einsteinufer 37
3621	   D-10587 Berlin
3622	   Germany
3623	   Phone: +49-30-31002-227
3624	   EMail: ts@thomas-schierl.de

3626	   Stephan Wenger
3627	   Vidyo, Inc.
3628	   433 Hackensack Ave., 7th floor
3629	   Hackensack, N.J. 07601
3630	   USA
3631	   Phone: +1-415-713-5473
3632	   EMail: stewe@stewe.org

3634	   Miska M. Hannuksela
3635	   Nokia Corporation
3636	   P.O. Box 1000
3637	   33721 Tampere
3638	   Finland
3639	   Phone: +358-7180-08000
3640	   EMail: miska.hannuksela@nokia.com