idnits 2.17.1 

draft-ietf-payload-rtp-h265-15.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1630 has weird spacing: '...   This  memo ...'

  == Line 1635 has weird spacing: '... signal  two  ...'

  -- The document date (November 5, 2015) is 3095 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 1767

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-09

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-23

  -- Obsolete informational reference (is this intentional?): RFC 2326
     (Obsoleted by RFC 7826)

  -- Obsolete informational reference (is this intentional?): RFC 5117
     (Obsoleted by RFC 7667)


     Summary: 1 error (**), 0 flaws (~~), 6 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                        Y.-K. Wang
2	Internet Draft                                                 Qualcomm
3	Intended status: Standards track                             Y. Sanchez
4	Expires: May 2016                                            T. Schierl
5	                                                         Fraunhofer HHI
6	                                                              S. Wenger
7	                                                                  Vidyo
8	                                                       M. M. Hannuksela
9	                                                                  Nokia
10	                                                       November 5, 2015

12	                RTP Payload Format for H.265/HEVC Video
13	                   draft-ietf-payload-rtp-h265-15.txt

15	Abstract

17	   This memo describes an RTP payload format for the video coding
18	   standard ITU-T Recommendation H.265 and ISO/IEC International
19	   Standard 23008-2, both also known as High Efficiency Video Coding
20	   (HEVC) and developed by the Joint Collaborative Team on Video
21	   Coding (JCT-VC).  The RTP payload format allows for packetization
22	   of one or more Network Abstraction Layer (NAL) units in each RTP
23	   packet payload, as well as fragmentation of a NAL unit into
24	   multiple RTP packets.  Furthermore, it supports transmission of
25	   an HEVC bitstream over a single as well as multiple RTP streams.
26	   When multiple RTP streams are used, a single or multiple
27	   transports may be utilized.  The payload format has wide
28	   applicability in videoconferencing, Internet video streaming, and
29	   high bit-rate entertainment-quality video, among others.

31	Status of this Memo

33	   This Internet-Draft is submitted to IETF in full conformance with
34	   the provisions of BCP 78 and BCP 79.

36	   Internet-Drafts are working documents of the Internet Engineering
37	   Task Force (IETF), its areas, and its working groups.  Note that
38	   other groups may also distribute working documents as Internet-
39	   Drafts.

41	   Internet-Drafts are draft documents valid for a maximum of six
42	   months and may be updated, replaced, or obsoleted by other
43	   documents at any time.  It is inappropriate to use Internet-
44	   Drafts as reference material or to cite them other than as "work
45	   in progress."

47	   The list of current Internet-Drafts can be accessed at
48	   http://www.ietf.org/ietf/1id-abstracts.txt.

50	   The list of Internet-Draft Shadow Directories can be accessed at
51	   http://www.ietf.org/shadow.html.

53	   This Internet-Draft will expire on May 5, 2016.

55	Copyright and License Notice

57	   Copyright (c) 2015 IETF Trust and the persons identified as the
58	   document authors.  All rights reserved.

60	   This document is subject to BCP 78 and the IETF Trust's Legal
61	   Provisions Relating to IETF Documents
62	   (http://trustee.ietf.org/license-info) in effect on the date of
63	   publication of this document.  Please review these documents
64	   carefully, as they describe your rights and restrictions with
65	   respect to this document.  Code Components extracted from this
66	   document must include Simplified BSD License text as described in
67	   Section 4.e of the Trust Legal Provisions and are provided
68	   without warranty as described in the Simplified BSD License.

70	Table of Contents

72	   Abstract
73	   Status of this Memo
74	   Table of Contents
75	   1 Introduction
76	      1.1 Overview of the HEVC Codec
77	         1.1.1 Coding-Tool Features
78	         1.1.2 Systems and Transport Interfaces
79	         1.1.3 Parallel Processing Support
80	         1.1.4 NAL Unit Header
81	      1.2 Overview of the Payload Format
82	   2 Conventions
83	   3 Definitions and Abbreviations
84	      3.1 Definitions
85	         3.1.1 Definitions from the HEVC Specification
86	         3.1.2 Definitions Specific to This Memo
87	      3.2 Abbreviations
88	   4 RTP Payload Format
89	      4.1 RTP Header Usage
90	      4.2 Payload Header Usage
91	      4.3 Transmission Modes
92	      4.4 Payload Structures
93	         4.4.1 Single NAL Unit Packets
94	         4.4.2 Aggregation Packets (APs)
95	         4.4.3 Fragmentation Units (FUs)
96	         4.4.4 PACI packets
97	            4.4.4.1 Reasons for the PACI rules (informative)
98	            4.4.4.2 PACI extensions (Informative)
99	      4.5 Temporal Scalability Control Information
100	      4.6 Decoding Order Number
101	   5 Packetization Rules
102	   6 De-packetization Process
103	   7 Payload Format Parameters
104	      7.1 Media Type Registration
105	      7.2 SDP Parameters
106	         7.2.1 Mapping of Payload Type Parameters to SDP
107	         7.2.2 Usage with SDP Offer/Answer Model
108	         7.2.3 Usage in Declarative Session Descriptions
109	         7.2.4 Parameter Sets Considerations
110	         7.2.5 Dependency Signaling in Multi-Stream Mode
111	   8 Use with Feedback Messages
112	      8.1 Picture Loss Indication (PLI)
113	      8.2 Slice Loss Indication (SLI)
114	      8.3 Reference Picture Selection Indication (RPSI)
115	      8.4 Full Intra Request (FIR)
116	   9 Security Considerations
117	   10 Congestion Control
118	   11 IANA Consideration
119	   12 Acknowledgements
120	   13 References
121	      13.1 Normative References
122	      13.2 Informative References
123	   14 Authors' Addresses

125	1 Introduction

127	   The High Efficiency Video Coding [HEVC], formally known as ITU-T
128	   Recommendation H.265 and ISO/IEC International Standard 23008-2,
129	   was ratified by ITU-T in April 2013 and reportedly provides
130	   significant coding efficiency gains over H.264 [H.264].

132	   This memo describes an RTP payload format for HEVC.  It shares
133	   its basic design with the RTP payload formats of [RFC6184] and
134	   [RFC6190].  With respect to design philosophy, security,
135	   congestion control, and overall implementation complexity, it has
136	   similar properties to those earlier payload format
137	   specifications.  This is a conscious choice, as at least RFC6184
138	   is widely deployed and generally known in the relevant
139	   implementer communities.  Mechanisms from RFC6190 were
140	   incorporated as HEVC version 1 supports temporal scalability.

142	   In order to help the overlapping implementer community,
143	   frequently only the differences between RFC6184/RFC6190 and the
144	   HEVC payload format are highlighted in non-normative, explanatory
145	   parts of this memo.  Basic familiarity with both specifications
146	   is assumed for those parts.  However, the normative parts of this
147	   memo do not require study of RFC6184 or RFC6190.

149	1.1 Overview of the HEVC Codec

151	   H.264 and HEVC share a similar hybrid video codec design.  In
152	   this memo, we provide a very brief overview of those features of
153	   HEVC that are in some form addressed by the payload format
154	   specified herein.  Implementers have to read, understand, and
155	   apply the ITU-T/ISO/IEC specifications pertaining to HEVC to
156	   arrive at interoperable, well-performing implementations.
157	   Implementers should consider testing their design (including the
158	   interworking between the payload format implementation and the
159	   core video codec) using the tools provided by ITU-T/ISO/IEC; for
160	   example, conformance bitstreams as specified in [add conformance
161	   spec].  Not doing so has historically led to badly performing and
162	   insecure systems.

164	   Conceptually, both H.264 and HEVC include a video coding layer
165	   (VCL), which is often used to refer to the coding-tool features,
166	   and a network abstraction layer (NAL), which is often used to
167	   refer to the systems and transport interface aspects of the
168	   codecs.

170	1.1.1 Coding-Tool Features

172	   Similarly to earlier hybrid-video-coding-based standards,
173	   including H.264, the following basic video coding design is
174	   employed by HEVC.  A prediction signal is first formed either by
175	   intra or motion compensated prediction, and the residual (the
176	   difference between the original and the prediction) is then
177	   coded.  The gains in coding efficiency are achieved by
178	   redesigning and improving almost all parts of the codec over
179	   earlier designs.  In addition, HEVC includes several tools to
180	   make the implementation on parallel architectures easier.  Below
181	   is a summary of HEVC coding-tool features.

183	   Quad-tree block and transform structure

185	   One of the major tools that contribute significantly to the
186	   coding efficiency of HEVC is the usage of flexible coding blocks
187	   and transforms, which are defined in a hierarchical quad-tree
188	   manner.  Unlike H.264, where the basic coding block is a
189	   macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit
190	   (CTU) of a maximum size of 64x64.  Each CTU can be divided into
191	   smaller units in a hierarchical quad-tree manner and can
192	   represent smaller blocks down to size 4x4.  Similarly, the
193	   transforms used in HEVC can have different sizes, starting from
194	   4x4 and going up to 32x32.  Utilizing large blocks and transforms
195	   contributes to the major gain of HEVC, especially at high
196	   resolutions.
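
   As a non-normative illustration of the hierarchical quad-tree
   partitioning described above, the following Python sketch
   recursively splits a 64x64 CTU into leaf blocks.  The function
   name, the split predicate, and the 8x8 minimum block size are
   assumptions made for this example only; the actual split decisions
   and size limits are conveyed in the bitstream as specified in
   [HEVC].

```python
def split_ctu(x, y, size, should_split, min_size=8):
    """Return the leaf blocks (x, y, size) of a quad-tree rooted at (x, y)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants: top-left, top-right, bottom-left, bottom-right.
        for dy in (0, half):
            for dx in (0, half):
                leaves.extend(
                    split_ctu(x + dx, y + dy, half, should_split, min_size))
        return leaves
    return [(x, y, size)]


# Hypothetical split rule: keep splitting only the block at the top-left corner.
blocks = split_ctu(0, 0, 64, lambda x, y, size: x == 0 and y == 0)
```

   Whatever split rule is used, the leaf blocks always tile the CTU
   exactly, so their areas sum to 64x64 samples.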

198	   Entropy coding

200	   HEVC uses a single entropy coding engine, which is based on
201	   Context Adaptive Binary Arithmetic Coding (CABAC) [CABAC],
202	   whereas H.264 uses two distinct entropy coding engines.  CABAC in
203	   HEVC shares many similarities with CABAC of H.264, but contains
204	   several improvements.  Those include improvements in coding
205	   efficiency and lowered implementation complexity, especially for
206	   parallel architectures.

208	   In-loop filtering

210	   H.264 includes an in-loop adaptive deblocking filter, where the
211	   blocking artifacts around the transform edges in the
212	   reconstructed picture are smoothed to improve the picture quality
213	   and compression efficiency.  In HEVC, a similar deblocking filter
214	   is employed but with somewhat lower complexity.  In addition,
215	   pictures undergo a subsequent filtering operation called Sample
216	   Adaptive Offset (SAO), which is a new design element in HEVC.
217	   SAO basically adds a pixel-level offset in an adaptive manner and
218	   usually acts as a de-ringing filter.  It is observed that SAO
219	   improves the picture quality, especially around sharp edges,
220	   contributing substantially to visual quality improvements of
221	   HEVC.

223	   Motion prediction and coding

225	   There have been a number of improvements in this area that are
226	   summarized as follows.  The first category is motion merge and
227	   advanced motion vector prediction (AMVP) modes.  The motion
228	   information of a prediction block can be inferred from the
229	   spatially or temporally neighboring blocks.  This is similar to
230	   the DIRECT mode in H.264 but includes new aspects to incorporate
231	   the flexible quad-tree structure and methods to improve the
232	   parallel implementations.  In addition, the motion vector
233	   predictor can be signaled for improved efficiency.  The second
234	   category is high-precision interpolation.  The interpolation
235	   filter length is increased to 8-tap from 6-tap, which improves
236	   the coding efficiency but also comes with increased complexity.
237	   In addition, the interpolation filter is defined with higher
238	   precision without any intermediate rounding operations to further
239	   improve the coding efficiency.

241	   Intra prediction and intra coding

243	   Compared to 8 intra prediction modes in H.264, HEVC supports
244	   angular intra prediction with 33 directions.  This increased
245	   flexibility improves both objective coding efficiency and visual
246	   quality as the edges can be better predicted and ringing
247	   artifacts around the edges can be reduced.  In addition, the
248	   reference samples are adaptively smoothed based on the prediction
249	   direction.  To avoid contouring artifacts, a new interpolative
250	   prediction generation is included to improve the visual quality.
251	   Furthermore, discrete sine transform (DST) is utilized instead of
252	   traditional discrete cosine transform (DCT) for 4x4 intra
253	   transform blocks.

255	   Other coding-tool features

257	   HEVC includes some tools for lossless coding and efficient screen
258	   content coding, such as skipping the transform for certain
259	   blocks.  These tools are particularly useful, for example, when
260	   streaming the user-interface of a mobile device to a large
261	   display.

263	1.1.2 Systems and Transport Interfaces

265	   HEVC inherited the basic systems and transport interfaces
266	   designs, such as the NAL-unit-based syntax structure, the
267	   hierarchical syntax and data unit structure from sequence-level
268	   parameter sets, multi-picture-level or picture-level parameter
269	   sets, slice-level header parameters, lower-level parameters, the
270	   supplemental enhancement information (SEI) message mechanism, the
271	   hypothetical reference decoder (HRD) based video buffering model,
272	   and so on.  In the following, a list of differences in these
273	   aspects compared to H.264 is summarized.

275	   Video parameter set

277	   A new type of parameter set, called video parameter set (VPS),
278	   was introduced.  For the first (2013) version of [HEVC], the
279	   video parameter set NAL unit is required to be available prior to
280	   its activation, while the information contained in the video
281	   parameter set is not necessary for operation of the decoding
282	   process.  For future HEVC extensions, such as the 3D or scalable
283	   extensions, the video parameter set is expected to include
284	   information necessary for operation of the decoding process, e.g.
285	   decoding dependency or information for reference picture set
286	   construction of enhancement layers.  The VPS provides a "big
287	   picture" of a bitstream, including what types of operation points
288	   are provided, the profile, tier, and level of the operation
289	   points, and some other high-level properties of the bitstream
290	   that can be used as the basis for session negotiation and content
291	   selection, etc. (see Section 7.1).

293	   Profile, tier and level

295	   The profile, tier and level syntax structure that can be included
296	   in both VPS and sequence parameter set (SPS) includes 12 bytes of
297	   data to describe the entire bitstream (including all temporally
298	   scalable layers, which are referred to as sub-layers in the HEVC
299	   specification), and can optionally include more profile, tier and
300	   level information pertaining to individual temporally scalable
301	   layers.  The profile indicator indicates the "best viewed as"
302	   profile when the bitstream conforms to multiple profiles, similar
303	   to the major brand concept in the ISO base media file format
304	   (ISOBMFF) [ISOBMFF] and file formats derived based on ISOBMFF,
305	   such as the 3GPP file format [3GPPFF].  The profile, tier and
306	   level syntax structure also includes indications such as 1)
307	   whether the bitstream is free of frame-packed content, 2) whether
308	   the bitstream is free of interlaced source content, and 3)
309	   whether the bitstream is free of field pictures.  When the answer
310	   is yes for both 2) and 3), the bitstream contains only frame
311	   pictures of progressive source.  Based on these indications,
312	   clients/players without support of post-processing
313	   functionalities for handling of frame-packed, interlaced source
314	   content or field pictures can reject those bitstreams that
315	   contain such pictures.
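
   The rejection logic described above can be sketched as follows.
   This is a minimal, non-normative Python example; the class, field,
   and function names are hypothetical and do not correspond to the
   syntax element names in [HEVC].  Each field is true when the
   bitstream is declared free of the corresponding kind of content.

```python
from dataclasses import dataclass


@dataclass
class PtlIndications:
    # Hypothetical names for the three indications listed in the text.
    no_frame_packing: bool       # 1) free of frame-packed content
    no_interlaced_source: bool   # 2) free of interlaced source content
    no_field_pictures: bool      # 3) free of field pictures


def is_progressive_frames_only(ptl):
    """True when both 2) and 3) hold: only frame pictures of progressive source."""
    return ptl.no_interlaced_source and ptl.no_field_pictures


def can_accept(ptl, handles_frame_packing=False,
               handles_interlaced=False, handles_fields=False):
    """A player accepts a bitstream if every kind of content that may be
    present is one it can post-process."""
    return ((ptl.no_frame_packing or handles_frame_packing)
            and (ptl.no_interlaced_source or handles_interlaced)
            and (ptl.no_field_pictures or handles_fields))
```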

317	   Bitstream and elementary stream

319	   HEVC includes a definition of an elementary stream, which is new
320	   compared to H.264.  An elementary stream consists of a sequence
321	   of one or more bitstreams.  An elementary stream that consists of
322	   two or more bitstreams has typically been formed by splicing
323	   together two or more bitstreams (or parts thereof).  When an
324	   elementary stream contains more than one bitstream, the last NAL
325	   unit of the last access unit of a bitstream (except the last
326	   bitstream in the elementary stream) must contain an end of
327	   bitstream NAL unit and the first access unit of the subsequent
328	   bitstream must be an intra random access point (IRAP) access
329	   unit.  This IRAP access unit may be a clean random access (CRA),
330	   broken link access (BLA), or instantaneous decoding refresh (IDR)
331	   access unit.
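
   The splicing constraint above can be checked with a short,
   non-normative Python sketch.  It assumes a simplified model in
   which each bitstream is a list of access units and each access
   unit is a list of NAL unit type values; the function name is
   hypothetical.  The numeric NAL unit type values are those of
   [HEVC].

```python
EOB_NUT = 37                 # end of bitstream NAL unit type
IRAP_TYPES = range(16, 24)   # BLA_W_LP (16) through RSV_IRAP_VCL23 (23)


def is_valid_elementary_stream(bitstreams):
    """bitstreams: list of bitstreams, each a list of access units,
    each access unit a list of NAL unit type values in decoding order."""
    last_index = len(bitstreams) - 1
    for i, bitstream in enumerate(bitstreams):
        # Every bitstream except the last must end with an end of
        # bitstream NAL unit.
        if i < last_index and bitstream[-1][-1] != EOB_NUT:
            return False
        # Every bitstream after the first must start with an IRAP access unit.
        if i > 0 and not any(t in IRAP_TYPES for t in bitstream[0]):
            return False
    return True
```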

333	   Random access support

335	   HEVC includes signaling in the NAL unit header, through NAL unit
336	   types, of IRAP pictures beyond IDR pictures.  Three types of IRAP
337	   pictures, namely IDR, CRA and BLA pictures, are supported, wherein
338	   IDR pictures are conventionally referred to as closed group-of-
339	   pictures (closed-GOP) random access points, and CRA and BLA
340	   pictures are those conventionally referred to as open-GOP random
341	   access points.  BLA pictures usually originate from splicing of
342	   two bitstreams or parts thereof at a CRA picture, e.g. during
343	   stream switching.  To enable better systems usage of IRAP
344	   pictures, altogether six different NAL units are defined to
345	   signal the properties of the IRAP pictures, which can be used to
346	   better match the stream access point (SAP) types as defined in
347	   the ISOBMFF [ISOBMFF], which are utilized for random access
348	   support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH].
349	   Pictures following an IRAP picture in decoding order and
350	   preceding the IRAP picture in output order are referred to as
351	   leading pictures associated with the IRAP picture.  There are two
352	   types of leading pictures, namely random access decodable leading
353	   (RADL) pictures and random access skipped leading (RASL)
354	   pictures.  RADL pictures are decodable when decoding starts
355	   at the associated IRAP picture, whereas RASL pictures are not
356	   decodable when decoding starts at the associated IRAP
357	   picture and are usually discarded.  HEVC provides mechanisms to
358	   enable the specification of conformance of bitstreams with RASL
359	   pictures being discarded, thus to provide a standard-compliant
360	   way to enable systems components to discard RASL pictures when
361	   needed.
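
   As a non-normative sketch of the random access behavior described
   above, the following Python example drops everything before the
   first IRAP picture and the RASL pictures associated with it.  It
   assumes a simplified input with one NAL unit type value per
   picture, in decoding order; the function name is hypothetical.
   The numeric NAL unit type values are those of [HEVC].

```python
RASL_TYPES = {8, 9}          # RASL_N, RASL_R
IRAP_TYPES = range(16, 24)   # BLA_W_LP (16) through RSV_IRAP_VCL23 (23)


def pictures_to_decode(picture_types):
    """Filter per-picture NAL unit types when tuning in mid-stream."""
    kept = []
    iraps_seen = 0
    for nal_type in picture_types:
        if nal_type in IRAP_TYPES:
            iraps_seen += 1
        if iraps_seen == 0:
            continue   # nothing is decodable before the first IRAP picture
        if iraps_seen == 1 and nal_type in RASL_TYPES:
            continue   # RASL pictures of the starting IRAP are skipped
        kept.append(nal_type)
    return kept
```

   RASL pictures associated with later IRAP pictures are kept, since
   by then the pictures they reference have been decoded.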

363	   Temporal scalability support

365	   HEVC includes improved support for temporal scalability, by
366	   inclusion of the signaling of TemporalId in the NAL unit header,
367	   the restriction that pictures of a particular temporal sub-layer
368	   cannot be used for inter prediction reference by pictures of a
369	   lower temporal sub-layer, the sub-bitstream extraction process,
370	   and the requirement that each sub-bitstream extraction output be
371	   a conforming bitstream.  Media-aware network elements (MANEs) can
372	   utilize the TemporalId in the NAL unit header for stream
373	   adaptation purposes based on temporal scalability.
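
   The following non-normative Python sketch illustrates how a MANE
   might use the TemporalId for stream adaptation.  It assumes the
   two-byte NAL unit header layout of [HEVC] (1-bit forbidden zero
   bit, 6-bit NAL unit type, 6-bit layer ID, and 3-bit TemporalId
   plus 1); the function and key names are hypothetical.

```python
def parse_nal_header(nal_unit):
    """Decode the first two bytes of an HEVC NAL unit."""
    b0, b1 = nal_unit[0], nal_unit[1]
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        "layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),
        "temporal_id": (b1 & 0x07) - 1,   # header carries TemporalId + 1
    }


def keep_sub_layers(nal_units, max_temporal_id):
    """Forward only NAL units whose TemporalId does not exceed the target."""
    return [n for n in nal_units
            if parse_nal_header(n)["temporal_id"] <= max_temporal_id]


# Example: a VPS (type 32, TemporalId 0) and a TRAIL_R slice with TemporalId 2.
vps = bytes([0x40, 0x01])
slice_tid2 = bytes([0x02, 0x03])
forwarded = keep_sub_layers([vps, slice_tid2], max_temporal_id=1)
```

   Because pictures of a sub-layer never reference pictures of a
   higher sub-layer, the forwarded NAL units remain decodable.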

375	   Temporal sub-layer switching support

377	   HEVC specifies, through NAL unit types present in the NAL unit
378	   header, the signaling of temporal sub-layer access (TSA) and
379	   stepwise temporal sub-layer access (STSA).  A TSA picture and
380	   pictures following the TSA picture in decoding order do not use
381	   pictures prior to the TSA picture in decoding order with
382	   TemporalId greater than or equal to that of the TSA picture for
383	   inter prediction reference.  A TSA picture enables up-switching,
384	   at the TSA picture, to the sub-layer containing the TSA picture
385	   or any higher sub-layer, from the immediately lower sub-layer.
386	   An STSA picture does not use pictures with the same TemporalId as
387	   the STSA picture for inter prediction reference.  Pictures
388	   following an STSA picture in decoding order with the same
389	   TemporalId as the STSA picture do not use pictures prior to the
390	   STSA picture in decoding order with the same TemporalId as the
391	   STSA picture for inter prediction reference.  An STSA picture
392	   enables up-switching, at the STSA picture, to the sub-layer
393	   containing the STSA picture, from the immediately lower sub-
394	   layer.
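
   The up-switching rules above can be summarized in a short,
   non-normative Python sketch.  The function name and its arguments
   are hypothetical; the numeric NAL unit type values are those of
   [HEVC].

```python
TSA_TYPES = {2, 3}    # TSA_N, TSA_R
STSA_TYPES = {4, 5}   # STSA_N, STSA_R


def up_switch_target(nal_type, temporal_id, current_max_tid, highest_tid):
    """Return the highest TemporalId that may be decoded from this picture
    on, given that sub-layers up to current_max_tid are being decoded."""
    if temporal_id != current_max_tid + 1:
        return current_max_tid   # switching starts from the next lower sub-layer
    if nal_type in TSA_TYPES:
        return highest_tid       # this sub-layer or any higher sub-layer
    if nal_type in STSA_TYPES:
        return temporal_id       # this sub-layer only
    return current_max_tid       # not a switching point
```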

396	   Sub-layer reference or non-reference pictures

398	   The concept and signaling of reference/non-reference pictures in
399	   HEVC are different from H.264.  In H.264, if a picture may be
400	   used by any other picture for inter prediction reference, it is a
401	   reference picture; otherwise it is a non-reference picture, and
402	   this is signaled by two bits in the NAL unit header.  In HEVC, a
403	   picture is called a reference picture only when it is marked as
404	   "used for reference".  In addition, the concept of sub-layer
405	   reference picture was introduced.  If a picture may be used by
406	   another picture with the same TemporalId for inter
407	   prediction reference, it is a sub-layer reference picture;
408	   otherwise it is a sub-layer non-reference picture.  Whether a
409	   picture is a sub-layer reference picture or sub-layer non-
410	   reference picture is signaled through NAL unit type values.
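
   As a non-normative illustration, the following Python sketch
   derives this property from the NAL unit type.  It relies on the
   NAL unit type numbering of [HEVC], in which the sub-layer
   non-reference picture types (TRAIL_N, TSA_N, STSA_N, RADL_N,
   RASL_N, and the reserved types RSV_VCL_N10, RSV_VCL_N12, and
   RSV_VCL_N14) are the even values from 0 to 14; the function name
   is hypothetical.

```python
def is_sub_layer_non_reference(nal_unit_type):
    """True for the even NAL unit types 0..14 (the *_N types)."""
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0
```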

412	   Extensibility

414	   Besides the TemporalId in the NAL unit header, HEVC also includes
415	   the signaling of a six-bit layer ID in the NAL unit header, which
416	   must be equal to 0 for a single-layer bitstream.  Extension
417	   mechanisms have been included in VPS, SPS, PPS, SEI NAL unit,
418	   slice headers, and so on.  All these extension mechanisms enable
419	   future extensions in a backward compatible manner, such that
420	   bitstreams encoded according to potential future HEVC extensions
421	   can be fed to then-legacy decoders (e.g. HEVC version 1 decoders)
422	   and the then-legacy decoders can decode and output the base layer
423	   bitstream.

425	   Bitstream extraction

427	   HEVC includes a bitstream extraction process as an integral part
428	   of the overall decoding process, as well as specification of the
429	   use of the bitstream extraction process in description of
430	   bitstream conformance tests as part of the hypothetical reference
431	   decoder (HRD) specification.

433	   Reference picture management

435	   The reference picture management of HEVC, including reference
436	   picture marking and removal from the decoded picture buffer (DPB)
437	   as well as reference picture list construction (RPLC), differs
438	   from that of H.264.  Instead of the sliding window plus adaptive
439	   memory management control operation (MMCO) based reference
440	   picture marking mechanism in H.264, HEVC specifies a reference
441	   picture set (RPS) based reference picture management and marking
442	   mechanism, and the RPLC is consequently based on the RPS
443	   mechanism.  A reference picture set consists of a set of
444	   reference pictures associated with a picture, consisting of all
445	   reference pictures that are prior to the associated picture in
446	   decoding order, that may be used for inter prediction of the
447	   associated picture or any picture following the associated
448	   picture in decoding order.  The reference picture set consists of
449	   five lists of reference pictures; RefPicSetStCurrBefore,
450	   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
451	   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter and
452	   RefPicSetLtCurr contain all reference pictures that may be used
453	   in inter prediction of the current picture and that may be used
454	   in inter prediction of one or more of the pictures following the
455	   current picture in decoding order.  RefPicSetStFoll and
456	   RefPicSetLtFoll consist of all reference pictures that are not
457	   used in inter prediction of the current picture but may be used
458	   in inter prediction of one or more of the pictures following the
459	   current picture in decoding order.  RPS provides an "intra-coded"
460	   signaling of the DPB status, instead of an "inter-coded"
461	   signaling, mainly for improved error resilience.  The RPLC
462	   process in HEVC is based on the RPS, by signaling an index to an
463	   RPS subset for each reference index; this process is simpler than
464	   the RPLC process in H.264.
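
   The following non-normative Python sketch models the five RPS
   lists and shows why the "intra-coded" signaling helps error
   resilience: because each picture carries the complete DPB status,
   a receiver can tell directly which reference pictures are missing.
   The class, field, and function names are hypothetical, and
   pictures are identified by picture order count values for
   simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ReferencePictureSet:
    """The five RPS lists, holding picture order count (POC) values."""
    st_curr_before: List[int] = field(default_factory=list)
    st_curr_after: List[int] = field(default_factory=list)
    st_foll: List[int] = field(default_factory=list)
    lt_curr: List[int] = field(default_factory=list)
    lt_foll: List[int] = field(default_factory=list)

    def current(self) -> Set[int]:
        """Pictures that may be referenced by the current picture."""
        return (set(self.st_curr_before) | set(self.st_curr_after)
                | set(self.lt_curr))

    def all_pictures(self) -> Set[int]:
        """Pictures that must stay marked as reference pictures."""
        return self.current() | set(self.st_foll) | set(self.lt_foll)


def apply_rps(dpb_pocs, rps):
    """Return (kept, released, missing) POC sets for the current picture."""
    dpb = set(dpb_pocs)
    wanted = rps.all_pictures()
    return dpb & wanted, dpb - wanted, wanted - dpb


rps = ReferencePictureSet(st_curr_before=[8, 6], st_curr_after=[12], st_foll=[4])
kept, released, missing = apply_rps([2, 4, 6, 8], rps)
```

   In this example, picture 2 is no longer needed for reference, and
   picture 12 is detected as missing from the DPB.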

466	   Ultra low delay support

468	   HEVC specifies a sub-picture-level HRD operation, for support of
469	   the so-called ultra-low delay.  The mechanism specifies a
470	   standard-compliant way to enable delay reduction below one
471	   picture interval.  Sub-picture-level coded picture buffer (CPB)
472	   and DPB parameters may be signaled, and utilization of this
473	   information for the derivation of CPB timing (wherein the CPB
474	   removal time corresponds to decoding time) and DPB output timing
475	   (display time) is specified.  Decoders are allowed to operate the
476	   HRD at the conventional access-unit-level, even when the sub-
477	   picture-level HRD parameters are present.

479	   New SEI messages

481	   HEVC inherits many H.264 SEI messages with changes in syntax
482	   and/or semantics making them applicable to HEVC.  Additionally,
483	   there are a few new SEI messages reviewed briefly in the
484	   following paragraphs.

486	   The display orientation SEI message informs the decoder of a
487	   transformation that is recommended to be applied to the cropped
488	   decoded picture prior to display, such that the pictures can be
489	   properly displayed, e.g. in an upside-up manner.

491	   The structure of pictures SEI message provides information on the
492	   NAL unit types, picture order count values, and prediction
493	   dependencies of a sequence of pictures.  The SEI message can be
494	   used, for example, to determine what impact a lost picture has on
495	   other pictures.

497	   The decoded picture hash SEI message provides a checksum derived
498	   from the sample values of a decoded picture.  It can be used for
499	   detecting whether a picture was correctly received and decoded.
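
   As a non-normative sketch of this check, the following Python
   example compares per-plane digests of a decoded picture against
   signaled values.  MD5 is used here as one possible hash; the
   function names and the simple raster-order arrangement of 8-bit
   samples are assumptions for illustration, and the exact
   computation is specified in [HEVC].

```python
import hashlib


def plane_md5(samples):
    """MD5 digest over one colour plane given as 8-bit sample values."""
    return hashlib.md5(bytes(samples)).digest()


def picture_matches(decoded_planes, signaled_digests):
    """True when every decoded plane hashes to the signaled digest."""
    if len(decoded_planes) != len(signaled_digests):
        return False
    return all(plane_md5(plane) == digest
               for plane, digest in zip(decoded_planes, signaled_digests))


planes = [[16] * 16, [128] * 4, [128] * 4]   # tiny Y, Cb, Cr planes
digests = [plane_md5(p) for p in planes]     # what a sender would signal
damaged = [[16] * 15 + [17], planes[1], planes[2]]
```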

501	   The active parameter sets SEI message includes the IDs of the
502	   active video parameter set and the active sequence parameter set
503	   and can be used to activate VPSs and SPSs.  In addition, the SEI
504	   message includes the following indications: 1) An indication of
505	   whether "full random accessibility" is supported (when supported,
506	   all parameter sets needed for decoding of the remainder of the
507	   bitstream when random accessing from the beginning of the current
508	   CVS by completely discarding all access units earlier in decoding
509	   order are present in the remaining bitstream and all coded
510	   pictures in the remaining bitstream can be correctly decoded); 2)
511	   An indication of whether there is no parameter set within the
512	   current CVS that updates another parameter set of the same type
513	   preceding in decoding order.  An update of a parameter set refers
514	   to the use of the same parameter set ID but with some other
515	   parameters changed.  If this property is true for all CVSs in the
516	   bitstream, then all parameter sets can be sent out-of-band before
517	   session start.

519	   The decoding unit information SEI message provides coded picture
520	   buffer removal delay information for a decoding unit.  The
521	   message can be used in very-low-delay buffering operations.

523	   The region refresh information SEI message can be used together
524	   with the recovery point SEI message (present in both H.264 and
525	   HEVC) for improved support of gradual decoding refresh.  This
526	   supports random access from inter-coded pictures, wherein
527	   complete pictures can be correctly decoded or recovered after an
528	   indicated number of pictures in output/display order.

530	1.1.3 Parallel Processing Support

532	   The reportedly significantly higher encoding computational demand
533	   of HEVC over H.264, in conjunction with the ever increasing video
534	   resolution (both spatially and temporally) required by the
535	   market, led to the adoption of VCL coding tools specifically
536	   targeted to allow for parallelization on the sub-picture level.
537	   That is, parallelization occurs, at the minimum, at the
538	   granularity of an integer number of CTUs.  The targets for this
539	   type of high-level parallelization are multicore CPUs and DSPs as
540	   well as multiprocessor systems.  In a system design, to be
541	   useful, these tools require signaling support, which is provided
542	   in Section 7 of this memo.  This section provides a brief
543	   overview of the tools available in [HEVC].

545	   Many of the tools incorporated in HEVC were designed keeping in
546	   mind the potential parallel implementations in multi-core/multi-
547	   processor architectures.  Specifically, for parallelization, four
548	   picture partition strategies, as described below, are available.

550	   Slices are segments of the bitstream that can be reconstructed
551	   independently from other slices within the same picture (though
552	   there may still be interdependencies through loop filtering
553	   operations).  Slices are the only tool that can be used for
554	   parallelization that is also available, in virtually identical
555	   form, in H.264.  Slice-based parallelization does not require
556	   much inter-processor or inter-core communication (except for
557	   inter-processor or inter-core data sharing for motion
558	   compensation when decoding a predictively coded picture, which is
559	   typically much heavier than inter-processor or inter-core data
560	   sharing due to in-picture prediction), as slices are designed to
561	   be independently decodable.  However, for the same reason, slices
562	   can require some coding overhead.  Further, slices (in contrast
563	   to some of the other tools mentioned below) also serve as the key
564	   mechanism for bitstream partitioning to match Maximum Transfer
565	   Unit (MTU) size requirements, due to the in-picture independence
566	   of slices and the fact that each regular slice is encapsulated in
567	   its own NAL unit.  In many cases, the goal of parallelization and
568	   the goal of MTU size matching can place contradicting demands on
569	   the slice layout in a picture.  The realization of this situation
570	   led to the development of the more advanced tools mentioned
571	   below.

573	   Dependent slice segments allow for fragmentation of a coded slice
574	   into fragments at CTU boundaries without breaking any in-picture
575	   prediction mechanism.  They are complementary to the
576	   fragmentation mechanism described in this memo in that they need
577	   the cooperation of the encoder.  As a dependent slice segment
578	   necessarily contains an integer number of CTUs, a decoder using
579	   multiple cores operating on CTUs can process a dependent slice
580	   segment without communicating parts of the slice segment's
581	   bitstream to other cores.  Fragmentation, as specified in this
582	   memo, in contrast, does not guarantee that a fragment contains an
583	   integer number of CTUs.

585	   In wavefront parallel processing (WPP), the picture is
586	   partitioned into rows of CTUs.  Entropy decoding and prediction
587	   are allowed to use data from CTUs in other partitions.  Parallel
588	   processing is possible through parallel decoding of CTU rows,
589	   where the start of the decoding of a row is delayed by two CTUs,
590	   so as to ensure that data related to a CTU above and to the right
591	   of the subject CTU is available before the subject CTU is
592	   decoded.  Using this staggered start (which appears like a
593	   wavefront when represented graphically), parallelization is
594	   possible with up to as many processors/cores as the picture
595	   contains CTU rows.

597	   Because in-picture prediction between neighboring CTU rows within
598	   a picture is allowed, the required inter-processor/inter-core
599	   communication to enable in-picture prediction can be substantial.
600	   The WPP partitioning does not result in the creation of more NAL
601	   units compared to when it is not applied, thus WPP cannot be used
602	   for MTU size matching, though slices can be used in combination
603	   for that purpose.
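
	   The staggered start can be illustrated with a small scheduling
	   sketch.  It assumes, purely for illustration, one core per CTU
	   row and one time step per CTU; the function names are
	   hypothetical and not part of [HEVC] or this memo.

```python
def wpp_start_step(row: int, col: int) -> int:
    """Earliest step at which CTU (row, col) can start decoding.

    Assumes one core per CTU row and one time step per CTU, so each
    row starts two CTUs later than the row above it.
    """
    return col + 2 * row


def wpp_total_steps(ctu_cols: int, ctu_rows: int) -> int:
    """Steps needed to decode the whole picture under the same assumptions."""
    return wpp_start_step(ctu_rows - 1, ctu_cols - 1) + 1


def above_right_ready(row: int, col: int, ctu_cols: int) -> bool:
    """Check that the CTU above and to the right has finished in time."""
    if row == 0:
        return True
    ref_col = min(col + 1, ctu_cols - 1)  # clamp at the right picture edge
    return wpp_start_step(row - 1, ref_col) + 1 <= wpp_start_step(row, col)
```

	   Under these assumptions, a picture of 4 by 3 CTUs finishes in 8
	   steps on three cores, compared with 12 steps on a single core.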

605	   Tiles define horizontal and vertical boundaries that partition a
606	   picture into tile columns and rows.  The scan order of CTUs is
607	   changed to be local within a tile (in the order of a CTU raster
608	   scan of a tile), before decoding the top-left CTU of the next
609	   tile in the order of tile raster scan of a picture.  Similar to
610	   slices, tiles break in-picture prediction dependencies (including
611	   entropy decoding dependencies).  However, they do not need to be
612	   included into individual NAL units (same as WPP in this regard),
613	   hence tiles cannot be used for MTU size matching, though slices
614	   can be used in combination for that purpose.  Each tile can be
615	   processed by one processor/core, and the inter-processor/inter-
616	   core communication required for in-picture prediction between
617	   processing units decoding neighboring tiles is limited to
618	   conveying the shared slice header in cases where a slice spans
619	   more than one tile, and loop-filtering-related sharing of
620	   reconstructed samples and metadata.  In this respect, tiles are
621	   less demanding in terms of inter-processor communication
622	   bandwidth than WPP due to the in-picture independence between
623	   two neighboring partitions.
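
	   The change of CTU scan order introduced by tiles can be sketched
	   as follows.  The helper below is hypothetical; it takes the tile
	   column widths and tile row heights (in CTUs) as plain lists and
	   returns picture raster-scan CTU addresses in the order in which
	   they would be decoded.

```python
from typing import List


def tile_scan_order(col_widths: List[int], row_heights: List[int]) -> List[int]:
    """Return raster-scan CTU addresses listed in tile scan order.

    Tiles are visited in raster order over the picture; inside each
    tile, CTUs are visited in raster order over that tile.
    """
    pic_width = sum(col_widths)
    order: List[int] = []
    y0 = 0
    for height in row_heights:          # tile rows, top to bottom
        x0 = 0
        for width in col_widths:        # tile columns, left to right
            for y in range(y0, y0 + height):
                for x in range(x0, x0 + width):
                    order.append(y * pic_width + x)
            x0 += width
        y0 += height
    return order
```

	   For a picture of 4 by 2 CTUs split into two tile columns of
	   width 2, the left tile (addresses 0, 1, 4, 5) is decoded before
	   the right tile (addresses 2, 3, 6, 7).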

625	1.1.4 NAL Unit Header

627	   HEVC maintains the NAL unit concept of H.264 with modifications.
628	   HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
629	   payload of a NAL unit refers to the NAL unit excluding the NAL
630	   unit header.

632	                   +---------------+---------------+
633	                   |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
634	                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
635	                   |F|   Type    |  LayerId  | TID |
636	                   +-------------+-----------------+

638	             Figure 1 The structure of HEVC NAL unit header

640	   The semantics of the fields in the NAL unit header are as
641	   specified in [HEVC] and described briefly below for convenience.
642	   In addition to the name and size of each field, the corresponding
643	   syntax element name in [HEVC] is also provided.

645	   F: 1 bit
646	      forbidden_zero_bit.  Required to be zero in [HEVC].  Note that
647	      the inclusion of this bit in the NAL unit header was to enable
648	      transport of HEVC video over MPEG-2 transport systems
649	      (avoidance of start code emulations) [MPEG2S].  In the context
650	      of this memo, the value 1 may be used to indicate a syntax
651	      violation, e.g. for a NAL unit resulting from aggregating a
652	      number of fragmented units of a NAL unit but missing the last
653	      fragment, as described in Section 4.4.3.

655	   Type: 6 bits
656	      nal_unit_type.  This field specifies the NAL unit type as
657	      defined in Table 7-1 of [HEVC].  If the most significant bit
658	      of this field of a NAL unit is equal to 0 (i.e. the value of
659	      this field is less than 32), the NAL unit is a VCL NAL unit.
660	      Otherwise, the NAL unit is a non-VCL NAL unit.  For a
661	      reference of all currently defined NAL unit types and their
662	      semantics, please refer to Section 7.4.1 in [HEVC].

664	   LayerId: 6 bits
665	      nuh_layer_id.  Required to be equal to zero in [HEVC].  It is
666	      anticipated that in future scalable or 3D video coding
667	      extensions of this specification, this syntax element will be
668	      used to identify additional layers that may be present in the
669	      CVS, wherein a layer may be, e.g. a spatial scalable layer, a
670	      quality scalable layer, a texture view, or a depth view.

672	   TID: 3 bits
673	      nuh_temporal_id_plus1.  This field specifies the temporal
674	      identifier of the NAL unit plus 1.  The value of TemporalId is
675	      equal to TID minus 1.  A TID value of 0 is illegal to ensure
676	      that there is at least one bit in the NAL unit header equal to
677	      1, so as to enable independent considerations of start code
678	      emulations in the NAL unit header and in the NAL unit payload
679	      data.
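
	   As an illustration of the layout in Figure 1, the following
	   sketch splits the two header bytes into the four fields.  The
	   type and function names are hypothetical and chosen only for
	   this example.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int          # forbidden_zero_bit (1 bit)
    nal_type: int   # nal_unit_type (6 bits)
    layer_id: int   # nuh_layer_id (6 bits)
    tid: int        # nuh_temporal_id_plus1 (3 bits)


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into the Figure 1 fields."""
    if len(data) < 2:
        raise ValueError("a NAL unit header is two bytes long")
    word = (data[0] << 8) | data[1]
    header = NalHeader(
        f=(word >> 15) & 0x01,
        nal_type=(word >> 9) & 0x3F,
        layer_id=(word >> 3) & 0x3F,
        tid=word & 0x07,
    )
    if header.tid == 0:
        raise ValueError("a TID value of 0 is illegal")
    return header


def is_vcl(header: NalHeader) -> bool:
    # Type values below 32 (most significant Type bit equal to 0) are VCL.
    return header.nal_type < 32


def temporal_id(header: NalHeader) -> int:
    # TemporalId is equal to TID minus 1.
    return header.tid - 1
```

	   For example, the bytes 0x40 0x01 yield F=0, Type=32, LayerId=0,
	   and TID=1, i.e. a non-VCL NAL unit with TemporalId equal to 0.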

681	1.2 Overview of the Payload Format

683	   This payload format defines the following processes required for
684	   transport of HEVC coded data over RTP [RFC3550]:

686	   o Usage of RTP header with this payload format

688	   o Packetization of HEVC coded NAL units into RTP packets using
689	     three types of payload structures, namely single NAL unit
690	     packet, aggregation packet, and fragment unit

692	   o Transmission of HEVC NAL units of the same bitstream within a
693	     single RTP stream or multiple RTP streams (within one or more
694	     RTP sessions), where within an RTP stream transmission of NAL
695	     units may be either non-interleaved (i.e. the transmission
696	     order of NAL units is the same as their decoding order) or
697	     interleaved (i.e. the transmission order of NAL units is
698	     different from their decoding order)

700	   o Media type parameters to be used with the Session Description
701	     Protocol (SDP) [RFC4566]

703	   o A payload header extension mechanism and data structures for
704	     enhanced support of temporal scalability based on that
705	     extension mechanism.

707	2 Conventions

709	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
710	   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
711	   "OPTIONAL" in this document are to be interpreted as described in
712	   BCP 14, RFC 2119 [RFC2119].

714	   In this document, these key words will appear with that
715	   interpretation only when in ALL CAPS.  Lower case uses of these
716	   words are not to be interpreted as carrying the RFC 2119
717	   significance.

719	   This specification uses the notion of setting and clearing a bit
720	   when bit fields are handled.  Setting a bit is the same as
721	   assigning that bit the value of 1 (On).  Clearing a bit is the
722	   same as assigning that bit the value of 0 (Off).

724	3 Definitions and Abbreviations

726	3.1 Definitions

728	   This document uses the terms and definitions of [HEVC].  Section
729	   3.1.1 lists relevant definitions copied from [HEVC] (the April
730	   2013 version of the H.265 specification) for convenience.
731	   Section 3.1.2 provides definitions specific to this memo.

733	3.1.1 Definitions from the HEVC Specification

735	   access unit: A set of NAL units that are associated with each
736	   other according to a specified classification rule, are
737	   consecutive in decoding order, and contain exactly one coded
738	   picture.

740	   BLA access unit: An access unit in which the coded picture is a
741	   BLA picture.

743	   BLA picture: An IRAP picture for which each VCL NAL unit has
744	   nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

746	   coded video sequence (CVS): A sequence of access units that
747	   consists, in decoding order, of an IRAP access unit with
748	   NoRaslOutputFlag equal to 1, followed by zero or more access
749	   units that are not IRAP access units with NoRaslOutputFlag equal
750	   to 1, including all subsequent access units up to but not
751	   including any subsequent access unit that is an IRAP access unit
752	   with NoRaslOutputFlag equal to 1.

754	      Informative note: An IRAP access unit may be an IDR access
755	      unit, a BLA access unit, or a CRA access unit.  The value of
756	      NoRaslOutputFlag is equal to 1 for each IDR access unit, each
757	      BLA access unit, and each CRA access unit that is the first
758	      access unit in the bitstream in decoding order, is the first
759	      access unit that follows an end of sequence NAL unit in
760	      decoding order, or has HandleCraAsBlaFlag equal to 1.

762	   CRA access unit: An access unit in which the coded picture is a
763	   CRA picture.

765	   CRA picture: An IRAP picture for which each VCL NAL unit has
766	   nal_unit_type equal to CRA_NUT.

768	   IDR access unit: An access unit in which the coded picture is an
769	   IDR picture.

771	   IDR picture: An IRAP picture for which each VCL NAL unit has
772	   nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

774	   IRAP access unit: An access unit in which the coded picture is an
775	   IRAP picture.

777	   IRAP picture: A coded picture for which each VCL NAL unit has
778	   nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23
779	   (23), inclusive.

781	   layer: A set of VCL NAL units that all have a particular value of
782	   nuh_layer_id and the associated non-VCL NAL units, or one of a
783	   set of syntactical structures having a hierarchical relationship.

785	   operation point: A bitstream created from another bitstream by
786	   operation of the sub-bitstream extraction process with that
787	   other bitstream, a target highest TemporalId, and a target
788	   layer identifier list as inputs.

790	   random access: The act of starting the decoding process for a
791	   bitstream at a point other than the beginning of the bitstream.

793	   sub-layer: A temporal scalable layer of a temporal scalable
794	   bitstream consisting of VCL NAL units with a particular value of
795	   the TemporalId variable, and the associated non-VCL NAL units.

797	   sub-layer representation: A subset of the bitstream consisting of
798	   NAL units of a particular sub-layer and the lower sub-layers.

800	   tile: A rectangular region of coding tree blocks within a
801	   particular tile column and a particular tile row in a picture.

803	   tile column: A rectangular region of coding tree blocks having a
804	   height equal to the height of the picture and a width specified
805	   by syntax elements in the picture parameter set.

807	   tile row: A rectangular region of coding tree blocks having a
808	   height specified by syntax elements in the picture parameter set
809	   and a width equal to the width of the picture.
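
	   The BLA, IDR, CRA, and IRAP picture definitions above can be
	   turned into a small classifier on nal_unit_type.  In this
	   sketch only the values 16 and 23 are taken from the text above;
	   the remaining numeric values are recalled from Table 7-1 of
	   [HEVC] and should be checked against that table.  The function
	   names are hypothetical.

```python
# nal_unit_type values.  16 and 23 are quoted in the IRAP definition;
# the others are assumptions to be verified against Table 7-1 of [HEVC].
BLA_W_LP, BLA_W_RADL, BLA_N_LP = 16, 17, 18
IDR_W_RADL, IDR_N_LP = 19, 20
CRA_NUT = 21
RSV_IRAP_VCL23 = 23


def is_irap(nal_unit_type: int) -> bool:
    """True for nal_unit_type in the range 16 to 23, inclusive."""
    return BLA_W_LP <= nal_unit_type <= RSV_IRAP_VCL23


def irap_kind(nal_unit_type: int) -> str:
    """Return "BLA", "IDR", "CRA", "reserved IRAP", or "non-IRAP"."""
    if nal_unit_type in (BLA_W_LP, BLA_W_RADL, BLA_N_LP):
        return "BLA"
    if nal_unit_type in (IDR_W_RADL, IDR_N_LP):
        return "IDR"
    if nal_unit_type == CRA_NUT:
        return "CRA"
    if is_irap(nal_unit_type):
        return "reserved IRAP"
    return "non-IRAP"
```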

811	3.1.2 Definitions Specific to This Memo

813	   dependee RTP stream: An RTP stream on which another RTP stream
814	   depends.  All RTP streams in an MRST or MRMT except for the
815	   highest RTP stream are dependee RTP streams.

817	   highest RTP stream: The RTP stream on which no other RTP stream
818	   depends.  The RTP stream in an SRST is the highest RTP stream.

820	   media aware network element (MANE): A network element, such as a
821	   middlebox, selective forwarding unit, or application layer
822	   gateway that is capable of parsing certain aspects of the RTP
823	   payload headers or the RTP payload and reacting to their
824	   contents.

826	      Informative note: The concept of a MANE goes beyond normal
827	      routers or gateways in that a MANE has to be aware of the
828	      signaling (e.g. to learn about the payload type mappings of
829	      the media streams), and in that it has to be trusted when
830	      working with SRTP.  The advantage of using MANEs is that they
831	      allow packets to be dropped according to the needs of the
832	      media coding.  For example, if a MANE has to drop packets due
833	      to congestion on a certain link, it can identify and remove
834	      those packets whose elimination produces the least adverse
835	      effect on the user experience.  After dropping packets, MANEs
836	      must rewrite RTCP packets to match the changes to the RTP
837	      stream as specified in Section 7 of [RFC3550].

839	   Media Transport: As used in the MRST, MRMT, and SRST definitions
840	   below, Media Transport denotes the transport of packets over a
841	   transport association identified by a 5-tuple (source address,
842	   source port, destination address, destination port, transport
843	   protocol).  See also Section 2.1.13 of [I-D.ietf-avtext-rtp-
844	   grouping-taxonomy].

846	      Informative note: The term "bitstream" in this document is
847	      equivalent to the term "encoded stream" in [I-D.ietf-avtext-
848	      rtp-grouping-taxonomy].

850	   Multiple RTP streams on a Single Transport (MRST):  Multiple RTP
851	   streams carrying a single HEVC bitstream on a Single Transport.
852	   See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy].

854	   Multiple RTP streams on Multiple Transports (MRMT):  Multiple RTP
855	   streams carrying a single HEVC bitstream on Multiple Transports.
856	   See also Section 3.5 of [I-D.ietf-avtext-rtp-grouping-taxonomy].

858	   NAL unit decoding order: A NAL unit order that conforms to the
859	   constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

861	   NAL unit output order: A NAL unit order in which NAL units of
862	   different access units are in the output order of the decoded
863	   pictures corresponding to the access units, as specified in
864	   [HEVC], and in which NAL units within an access unit are in their
865	   decoding order.

867	   NAL-unit-like structure: A data structure that is similar to NAL
868	   units in the sense that it also has a NAL unit header and a
869	   payload, with a difference that the payload does not follow the
870	   start code emulation prevention mechanism required for the NAL
871	   unit syntax as specified in Section 7.3.1.1 of [HEVC].  Examples
872	   of NAL-unit-like structures defined in this memo are packet
873	   payloads of AP, PACI, and FU packets.

875	   NALU-time: The value that the RTP timestamp would have if the NAL
876	   unit were transported in its own RTP packet.

878	   RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy].  Within
879	   the scope of this memo, one RTP stream is utilized to transport
880	   one or more temporal sub-layers.

882	   Single RTP stream on a Single Transport (SRST):  Single RTP
883	   stream carrying a single HEVC bitstream on a Single (Media)
884	   Transport.  See also Section 3.5 of [I-D.ietf-avtext-rtp-
885	   grouping-taxonomy].

887	   transmission order: The order of packets in ascending RTP
888	   sequence number order (in modulo arithmetic).  Within an
889	   aggregation packet, the NAL unit transmission order is the same
890	   as the order of appearance of NAL units in the packet.
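
	   Because RTP sequence numbers are 16 bits wide, "ascending order
	   in modulo arithmetic" has to take wrap-around into account.  A
	   minimal sketch follows; the helper names are hypothetical, and
	   the comparison is only meaningful for sequence numbers that are
	   less than 32768 apart.

```python
from typing import Iterable, List


def seq_precedes(a: int, b: int) -> bool:
    """True if 16-bit sequence number a comes before b (modulo 65536)."""
    return a != b and ((b - a) & 0xFFFF) < 0x8000


def in_transmission_order(seqs: Iterable[int], first: int) -> List[int]:
    """Sort sequence numbers by their distance from `first` (modulo 65536).

    `first` is assumed to be the earliest sequence number in the window.
    """
    return sorted(seqs, key=lambda s: (s - first) & 0xFFFF)
```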

892	3.2 Abbreviations

894	   AP       Aggregation Packet

896	   BLA      Broken Link Access

898	   CRA      Clean Random Access

900	   CTB      Coding Tree Block

902	   CTU      Coding Tree Unit
903	   CVS      Coded Video Sequence

905	   DPH      Decoded Picture Hash

907	   FU       Fragmentation Unit

909	   HRD      Hypothetical Reference Decoder

911	   IDR      Instantaneous Decoding Refresh

913	   IRAP     Intra Random Access Point

915	   MANE     Media Aware Network Element

917	   MRMT     Multiple RTP streams on Multiple Transports

919	   MRST     Multiple RTP streams on a Single Transport

921	   MTU      Maximum Transfer Unit

923	   NAL      Network Abstraction Layer

925	   NALU     Network Abstraction Layer Unit

927	   PACI     PAyload Content Information

929	   PHES     Payload Header Extension Structure

931	   PPS      Picture Parameter Set

933	   RADL     Random Access Decodable Leading (Picture)

935	   RASL     Random Access Skipped Leading (Picture)

937	   RPS      Reference Picture Set

939	   SEI      Supplemental Enhancement Information

941	   SPS      Sequence Parameter Set

943	   SRST     Single RTP stream on a Single Transport

945	   STSA     Step-wise Temporal Sub-layer Access
946	   TSA      Temporal Sub-layer Access

948	   TSCI     Temporal Scalability Control Information

950	   VCL      Video Coding Layer

952	   VPS      Video Parameter Set

954	4 RTP Payload Format

956	4.1 RTP Header Usage

958	   The format of the RTP header is specified in [RFC3550] and
959	   reprinted in Figure 2 for convenience.  This payload format uses
960	   the fields of the header in a manner consistent with that
961	   specification.

963	   The RTP payload (and the settings for some RTP header bits) for
964	   aggregation packets and fragmentation units are specified in
965	   Sections 4.4.2 and 4.4.3, respectively.

967	    0                   1                   2                   3
968	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
969	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
970	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
971	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
972	   |                           timestamp                           |
973	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
974	   |           synchronization source (SSRC) identifier            |
975	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
976	   |            contributing source (CSRC) identifiers             |
977	   |                             ....                              |
978	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

980	               Figure 2 RTP header according to [RFC3550]

982	   The RTP header information to be set according to this RTP
983	   payload format is set as follows:

985	   Marker bit (M): 1 bit

987	      Set for the last packet of the access unit, carried in the
988	      current RTP stream.  This is in line with the normal use of
989	      the M bit in video formats to allow an efficient playout
990	      buffer handling.  When MRST or MRMT is in use, if an access
991	      unit appears in multiple RTP streams, the marker bit is set on
992	      each RTP stream's last packet of the access unit.

994	         Informative note: The content of a NAL unit does not tell
995	         whether or not the NAL unit is the last NAL unit, in
996	         decoding order, of an access unit.  An RTP sender
997	         implementation may obtain this information from the video
998	         encoder.  If, however, the implementation cannot obtain
999	         this information directly from the encoder, e.g. when the
1000	         bitstream was pre-encoded, and also there is no timestamp
1001	         allocated for each NAL unit, then the sender implementation
1002	         can inspect subsequent NAL units in decoding order to
1003	         determine whether or not the NAL unit is the last NAL unit
1004	         of an access unit as follows.  A NAL unit is determined to
1005	         be the last NAL unit of an access unit if it is the last
1006	         NAL unit of the bitstream.  A NAL unit naluX is also
1007	         determined to be the last NAL unit of an access unit if
1008	         both the following conditions are true: 1) the next VCL NAL
1009	         unit naluY in decoding order has the high-order bit of the
1010	         first byte after its NAL unit header equal to 1, and 2) all
1011	         NAL units between naluX and naluY, when present, have
1012	         nal_unit_type in the range of 32 to 35, inclusive, equal to
1013	         39, or in the ranges of 41 to 44, inclusive, or 48 to 55,
1014	         inclusive.
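
	   The inspection procedure in the informative note above can be
	   sketched as follows.  The sketch applies the two conditions
	   exactly as worded and does not attempt to decide which access
	   unit a non-VCL naluX itself belongs to.  Each NAL unit is
	   assumed to be a byte string that starts with its two-byte NAL
	   unit header; the function names are hypothetical.  When no VCL
	   NAL unit follows naluX, the note gives no rule, and the sketch
	   conservatively returns False.

```python
from typing import Sequence


def nal_unit_type(nalu: bytes) -> int:
    """Extract the 6-bit Type field from the first header byte."""
    return (nalu[0] >> 1) & 0x3F


def _allowed_between(t: int) -> bool:
    # Types permitted between naluX and naluY by condition 2 of the note.
    return 32 <= t <= 35 or t == 39 or 41 <= t <= 44 or 48 <= t <= 55


def is_last_nalu_of_access_unit(nalus: Sequence[bytes], index: int) -> bool:
    """Apply the note's test to nalus[index] (naluX)."""
    if index == len(nalus) - 1:
        return True  # last NAL unit of the bitstream
    for nalu in nalus[index + 1:]:
        t = nal_unit_type(nalu)
        if t < 32:
            # naluY: the next VCL NAL unit.  Condition 1 checks the
            # high-order bit of the first byte after its header.
            return len(nalu) > 2 and bool(nalu[2] & 0x80)
        if not _allowed_between(t):
            return False  # condition 2 violated
    return False  # no VCL NAL unit follows; not covered by the note
```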

1016	   Payload type (PT): 7 bits

1018	      The assignment of an RTP payload type for this new packet
1019	      format is outside the scope of this document and will not be
1020	      specified here.  The assignment of a payload type has to be
1021	      performed either through the profile used or in a dynamic way.

1023	         Informative note: It is not required to use different
1024	         payload type values for different RTP streams in MRST or
1025	         MRMT.

1027	   Sequence number (SN): 16 bits

1029	      Set and used in accordance with RFC 3550 [RFC3550].

1031	   Timestamp: 32 bits

1033	      The RTP timestamp is set to the sampling timestamp of the
1034	      content.  A 90 kHz clock rate MUST be used.

1036	      If the NAL unit has no timing properties of its own (e.g.
1037	      parameter set and SEI NAL units), the RTP timestamp MUST be
1038	      set to the RTP timestamp of the coded picture of the access
1039	      unit in which the NAL unit (according to Section 7.4.2.4.4 of
1040	      [HEVC]) is included.

1042	      Receivers MUST use the RTP timestamp for the display process,
1043	      even when the bitstream contains picture timing SEI messages
1044	      or decoding unit information SEI messages as specified in
1045	      [HEVC].  However, this does not mean that picture timing SEI
1046	      messages in the bitstream should be discarded, as picture
1047	      timing SEI messages may contain frame-field information that
1048	      is important in appropriately rendering interlaced video.
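
	   A minimal sketch of deriving the 32-bit RTP timestamp from a
	   sampling time with the mandated 90 kHz clock is given below.
	   The function name and the use of an exact fraction for the
	   sampling time are choices made for this example; the initial
	   offset parameter reflects the random initial timestamp value
	   described in [RFC3550].

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def rtp_timestamp(sampling_time_s: Fraction, initial_offset: int = 0) -> int:
    """Convert a sampling time in seconds to a 32-bit RTP timestamp.

    All NAL units of an access unit, including parameter set and SEI
    NAL units, would carry the value computed for its coded picture.
    """
    ticks = round(sampling_time_s * RTP_CLOCK_HZ)
    return (initial_offset + ticks) & 0xFFFFFFFF
```

	   For example, frame 30 of a 30000/1001 fps sequence is sampled
	   at 1.001 seconds, which corresponds to 90090 clock ticks.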

1050	   Synchronization source (SSRC): 32 bits

1052	      Used to identify the source of the RTP packets.  When using
1053	      SRST, by definition a single SSRC is used for all parts of a
1054	      single bitstream.  In MRST or MRMT, different SSRCs are used
1055	      for each RTP stream containing a subset of the sub-layers of
1056	      the single (temporally scalable) bitstream.  A receiver is
1057	      required to correctly associate the set of SSRCs that
1058	      carry parts of the same bitstream.

1060	4.2 Payload Header Usage

1062	   The first two bytes of the payload of an RTP packet are referred
1063	   to as the payload header.  The payload header consists of the
1064	   same fields (F, Type, LayerId, and TID) as the NAL unit header as
1065	   shown in Section 1.1.4, irrespective of the type of the payload
1066	   structure.

1068	   The TID value indicates (among other things) the relative
1069	   importance of an RTP packet, for example because NAL units
1070	   belonging to higher temporal sub-layers are not used for the
1071	   decoding of lower temporal sub-layers.  A lower value of TID
1072	   indicates a higher importance.  More important NAL units MAY be
1073	   better protected against transmission losses than less important
1074	   NAL units.
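
	   One way an implementation might act on the TID value is shown
	   below: reading TID from the payload header, ordering payloads
	   by importance, and discarding the higher temporal sub-layers.
	   This is only a sketch; the function names are hypothetical, and
	   a real sender or MANE would apply its own policy.

```python
from typing import List, Sequence


def payload_tid(payload: bytes) -> int:
    """Read the 3-bit TID field from the two-byte payload header."""
    if len(payload) < 2:
        raise ValueError("payload header is two bytes long")
    return payload[1] & 0x07


def by_importance(payloads: Sequence[bytes]) -> List[bytes]:
    """Order payloads from most important (lowest TID) to least."""
    return sorted(payloads, key=payload_tid)


def keep_up_to_tid(payloads: Sequence[bytes], max_tid: int) -> List[bytes]:
    """Drop payloads whose TID is greater than max_tid."""
    return [p for p in payloads if payload_tid(p) <= max_tid]
```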

1076	4.3 Transmission Modes

1078	   This memo enables transmission of an HEVC bitstream over

1080	     . a single RTP stream on a single Media Transport (SRST),
1081	     . multiple RTP streams over a single Media Transport (MRST),
1082	        or
1083	     . multiple RTP streams over multiple Media Transports (MRMT).

1085	     Informative note: While this specification enables the use of
1086	     MRST within the H.265 RTP payload, the signaling of MRST within
1087	     SDP Offer/Answer is not fully specified at the time of this
1088	     writing. See [RFC5576] and [RFC5583] for what is supported
1089	     today as well as [I-D.ietf-avtcore-rtp-multi-stream] and
1090	     [I-D.ietf-mmusic-sdp-bundle-negotiation] for future directions.

1092	   When in MRMT, the dependency of one RTP stream on another RTP
1093	   stream is typically indicated as specified in [RFC5583].
1094	   [RFC5583] can also be utilized to specify dependencies within
1095	   MRST, but only if the RTP streams utilize distinct payload types.

1097	   SRST or MRST SHOULD be used for point-to-point unicast scenarios,
1098	   while MRMT SHOULD be used for point-to-multipoint multicast
1099	   scenarios where different receivers require different operation
1100	   points of the same HEVC bitstream, to improve bandwidth
1101	   utilization efficiency.

1103	      Informative note: A multicast may degrade to a unicast after
1104	      all but one receiver has left (this is a justification of
1105	      the first "SHOULD" instead of "MUST"), and there might be
1106	      scenarios where MRMT is desirable but not possible, e.g. when
1107	      IP multicast is not deployed in a certain network (this is a
1108	      justification of the second "SHOULD" instead of "MUST").

1110	   The transmission mode is indicated by the tx-mode media parameter
1111	   (see Section 7.1).  If tx-mode is equal to "SRST", SRST MUST be
1112	   used.  Otherwise, if tx-mode is equal to "MRST", MRST MUST be
1113	   used.  Otherwise (tx-mode is equal to "MRMT"), MRMT MUST be used.

1115	      Informative note: When an RTP stream does not depend on other
1116	      RTP streams, any of SRST, MRST and MRMT may be in use for the
1117	      RTP stream.

1119	   Receivers MUST support all of SRST, MRST, and MRMT.

1121	      Informative note: The required support of MRMT by receivers
1122	      does not imply that multicast must be supported by receivers.

1124	4.4 Payload Structures

1126	   Four different types of RTP packet payload structures are
1127	   specified.  A receiver can identify the type of an RTP packet
1128	   payload through the Type field in the payload header.

1130	   The four different payload structures are as follows:

1132	   o  Single NAL unit packet: Contains a single NAL unit in the
1133	      payload, and the NAL unit header of the NAL unit also serves
1134	      as the payload header.  This payload structure is specified in
1135	      Section 4.4.1.

1137	   o  Aggregation packet (AP): Contains more than one NAL unit
1138	      within one access unit.  This payload structure is specified
1139	      in Section 4.4.2.

1141	   o  Fragmentation unit (FU): Contains a subset of a single NAL
1142	      unit.  This payload structure is specified in Section 4.4.3.

1144	   o  PACI carrying RTP packet: Contains a payload header (that
1145	      differs from other payload headers for efficiency), a Payload
1146	      Header Extension Structure (PHES), and a PACI payload.  This
1147	      payload structure is specified in Section 4.4.4.
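
	   A receiver's dispatch on the Type field might look like the
	   sketch below.  The value 48 for aggregation packets appears in
	   Figure 4; the values 49 for fragmentation units and 50 for PACI
	   packets are assumed here from Sections 4.4.3 and 4.4.4, which
	   are not reproduced in this part of the memo.  The function name
	   is hypothetical.

```python
# Type 48 is shown in Figure 4.  Types 49 and 50 are assumptions taken
# from Sections 4.4.3 and 4.4.4 and should be checked against them.
_STRUCTURES = {48: "AP", 49: "FU", 50: "PACI"}


def payload_structure(payload: bytes) -> str:
    """Name the payload structure indicated by the payload header Type."""
    if len(payload) < 2:
        raise ValueError("payload header is two bytes long")
    nal_type = (payload[0] >> 1) & 0x3F
    return _STRUCTURES.get(nal_type, "single NAL unit")
```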

1149	4.4.1 Single NAL Unit Packets

1151	   A single NAL unit packet contains exactly one NAL unit, and
1152	   consists of a payload header (denoted as PayloadHdr), a
1153	   conditional 16-bit DONL field (in network byte order), and the
1154	   NAL unit payload data (the NAL unit excluding its NAL unit
1155	   header) of the contained NAL unit, as shown in Figure 3.

1157	    0                   1                   2                   3
1158	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1159	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1160	   |           PayloadHdr          |      DONL (conditional)       |
1161	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1162	   |                                                               |
1163	   |                  NAL unit payload data                        |
1164	   |                                                               |
1165	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1166	   |                               :...OPTIONAL RTP padding        |
1167	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1169	            Figure 3 The structure of a single NAL unit packet

1171	   The payload header SHOULD be an exact copy of the NAL unit header
1172	   of the contained NAL unit.  However, the Type (i.e.
1173	   nal_unit_type) field MAY be changed, e.g. when it is desirable to
1174	   handle a CRA picture as a BLA picture [JCTVC-J0107].

1176	   The DONL field, when present, specifies the value of the 16 least
1177	   significant bits of the decoding order number of the contained
1178	   NAL unit.  If sprop-max-don-diff is greater than 0 for any of the
1179	   RTP streams, the DONL field MUST be present, and the variable DON
1180	   for the contained NAL unit is derived as equal to the value of
1181	   the DONL field.  Otherwise (sprop-max-don-diff is equal to 0 for
1182	   all the RTP streams), the DONL field MUST NOT be present.
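
   The layout above can be illustrated with a minimal parsing sketch.
   It assumes that any RTP padding has already been removed and that
   the caller knows from sprop-max-don-diff whether the DONL field is
   present.  The function name and the returned tuple are
   illustrative only and are not defined by this memo.

```python
import struct


def parse_single_nal_unit_packet(payload: bytes, donl_present: bool):
    """Split the RTP payload of a single NAL unit packet.

    Returns (nal_unit, don).  don is None when the DONL field is absent.
    """
    if len(payload) < 2:
        raise ValueError("payload shorter than the 2-octet payload header")
    payload_hdr = payload[:2]
    offset = 2
    don = None
    if donl_present:
        if len(payload) < 4:
            raise ValueError("payload too short to hold the DONL field")
        # DONL is a 16-bit field in network byte order.
        (don,) = struct.unpack_from("!H", payload, offset)
        offset += 2
    # The payload header normally equals the NAL unit header, so the
    # NAL unit is the payload header followed by the payload data.
    return payload_hdr + payload[offset:], don
```

   A sender may have changed the Type field of the payload header,
   in which case the returned NAL unit carries the changed type.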

1184	4.4.2 Aggregation Packets (APs)

1186	   Aggregation packets (APs) are introduced to enable the reduction
1187	   of packetization overhead for small NAL units, such as most of
1188	   the non-VCL NAL units, which are often only a few octets in size.

1190	   An AP aggregates NAL units within one access unit.  Each NAL unit
1191	   to be carried in an AP is encapsulated in an aggregation unit.
1192	   NAL units aggregated in one AP are in NAL unit decoding order.

1194	   An AP consists of a payload header (denoted as PayloadHdr)
1195	   followed by two or more aggregation units, as shown in Figure 4.

1197	   0                   1                   2                   3
1198	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1199	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1200	   |    PayloadHdr (Type=48)       |                               |
1201	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1202	   |                                                               |
1203	   |             two or more aggregation units                     |
1204	   |                                                               |
1205	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1206	   |                               :...OPTIONAL RTP padding        |
1207	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1209	            Figure 4 The structure of an aggregation packet

1211	   The fields in the payload header are set as follows.  The F bit
1212	   MUST be equal to 0 if the F bit of each aggregated NAL unit is
1213	   equal to zero; otherwise, it MUST be equal to 1.  The Type field
1214	   MUST be equal to 48.  The value of LayerId MUST be equal to the
1215	   lowest value of LayerId of all the aggregated NAL units.  The
1216	   value of TID MUST be the lowest value of TID of all the
1217	   aggregated NAL units.

1219	      Informative Note: All VCL NAL units in an AP have the same TID
1220	      value since they belong to the same access unit.  However, an
1221	      AP may contain non-VCL NAL units for which the TID value in
1222	      the NAL unit header may be different than the TID value of the
1223	      VCL NAL units in the same AP.
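
   As a sketch of the rules above, the AP payload header can be
   derived from the two-octet headers of the NAL units to be
   aggregated.  The helper names are hypothetical; the bit layout
   used (F: 1 bit, Type: 6 bits, LayerId: 6 bits, TID: 3 bits) is
   that of the HEVC NAL unit header.

```python
def nal_header_fields(hdr: bytes):
    """Return (F, Type, LayerId, TID) from a 2-octet NAL unit header."""
    h = (hdr[0] << 8) | hdr[1]
    return (h >> 15) & 1, (h >> 9) & 0x3F, (h >> 3) & 0x3F, h & 0x7


def build_ap_payload_header(nal_units):
    """Build the 2-octet PayloadHdr (Type=48) for an aggregation packet."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    fields = [nal_header_fields(nal[:2]) for nal in nal_units]
    # F is 0 only if the F bit of every aggregated NAL unit is 0.
    f_bit = 1 if any(f for f, _, _, _ in fields) else 0
    # LayerId and TID take the lowest value among the aggregated units.
    layer_id = min(layer for _, _, layer, _ in fields)
    tid = min(t for _, _, _, t in fields)
    h = (f_bit << 15) | (48 << 9) | (layer_id << 3) | tid
    return bytes([h >> 8, h & 0xFF])
```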

1225	   An AP MUST carry at least two aggregation units and can carry as
1226	   many aggregation units as necessary; however, the total amount of
1227	   data in an AP MUST fit into an IP packet, and the size
1228	   SHOULD be chosen so that the resulting IP packet is smaller than
1229	   the MTU size so as to avoid IP layer fragmentation.  An AP MUST NOT
1230	   contain Fragmentation Units (FUs) specified in Section 4.4.3.
1231	   APs MUST NOT be nested; i.e. an AP must not contain another AP.

1233	   The first aggregation unit in an AP consists of a conditional 16-
1234	   bit DONL field (in network byte order) followed by a 16-bit
1235	   unsigned size information (in network byte order) that indicates
1236	   the size of the NAL unit in bytes (excluding these two octets,
1237	   but including the NAL unit header), followed by the NAL unit
1238	   itself, including its NAL unit header, as shown in Figure 5.

1240	   0                   1                   2                   3
1241	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1242	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1243	                   :       DONL (conditional)      |   NALU size   |
1244	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1245	   |   NALU size   |                                               |
1246	   +-+-+-+-+-+-+-+-+         NAL unit                              |
1247	   |                                                               |
1248	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1249	   |                               :
1250	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1252	     Figure 5 The structure of the first aggregation unit in an AP

1254	   The DONL field, when present, specifies the value of the 16 least
1255	   significant bits of the decoding order number of the aggregated
1256	   NAL unit.

1258	   If sprop-max-don-diff is greater than 0 for any of the RTP
1259	   streams, the DONL field MUST be present in an aggregation unit
1260	   that is the first aggregation unit in an AP, and the variable DON
1261	   for the aggregated NAL unit is derived as equal to the value of
1262	   the DONL field.  Otherwise (sprop-max-don-diff is equal to 0 for
1263	   all the RTP streams), the DONL field MUST NOT be present in an
1264	   aggregation unit that is the first aggregation unit in an AP.

1266	   An aggregation unit that is not the first aggregation unit in an
1267	   AP consists of a conditional 8-bit DOND field followed by a 16-
1268	   bit unsigned size information (in network byte order) that
1269	   indicates the size of the NAL unit in bytes (excluding these two
1270	   octets, but including the NAL unit header), followed by the NAL
1271	   unit itself, including its NAL unit header, as shown in Figure 6.

1273	   0                   1                   2                   3
1274	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1275	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1276	                   : DOND (cond)   |          NALU size            |
1277	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1278	   |                                                               |
1279	   |                       NAL unit                                |
1280	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1281	   |                               :
1282	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1284	     Figure 6 The structure of an aggregation unit that is not the
1285	                    first aggregation unit in an AP

1287	   When present, the DOND field plus 1 specifies the difference
1288	   between the decoding order number values of the current
1289	   aggregated NAL unit and the preceding aggregated NAL unit in the
1290	   same AP.

1292	   If sprop-max-don-diff is greater than 0 for any of the RTP
1293	   streams, the DOND field MUST be present in an aggregation unit
1294	   that is not the first aggregation unit in an AP, and the variable
1295	   DON for the aggregated NAL unit is derived as equal to (the DON
1296	   of the preceding aggregated NAL unit in the same AP + the value
1297	   of the DOND field + 1) modulo 65536.  Otherwise
1298	   (sprop-max-don-diff is equal to 0 for all the RTP streams), the
1299	   DOND field MUST NOT be present in an aggregation unit that is not
1300	   the first aggregation unit in an AP, and in this case the
1301	   transmission order and decoding order of NAL units carried in the
1302	   AP are the same as the order the NAL units appear in the AP.

1304	   Figure 7 presents an example of an AP that contains two
1305	   aggregation units, labeled as 1 and 2 in the figure, without the
1306	   DONL and DOND fields being present.

1308	    0                   1                   2                   3
1309	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1310	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1311	   |                          RTP Header                           |
1312	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1313	   |   PayloadHdr (Type=48)        |         NALU 1 Size           |
1314	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1315	   |          NALU 1 HDR           |                               |
1316	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1317	   |                   . . .                                       |
1318	   |                                                               |
1319	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1320	   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1321	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1322	   | NALU 2 HDR    |                                               |
1323	   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1324	   |                   . . .                                       |
1325	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1326	   |                               :...OPTIONAL RTP padding        |
1327	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1329	     Figure 7 An example of an AP packet containing two aggregation
1330	                 units without the DONL and DOND fields

1332	   Figure 8 presents an example of an AP that contains two
1333	   aggregation units, labeled as 1 and 2 in the figure, with the
1334	   DONL and DOND fields being present.

1336	    0                   1                   2                   3
1337	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1338	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1339	   |                          RTP Header                           |
1340	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1341	   |   PayloadHdr (Type=48)        |        NALU 1 DONL            |
1342	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1343	   |          NALU 1 Size          |            NALU 1 HDR         |
1344	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1345	   |                                                               |
1346	   |                 NALU 1 Data   . . .                           |
1347	   |                                                               |
1348	   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1349	   |               |  NALU 2 DOND  |          NALU 2 Size          |
1350	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1351	   |          NALU 2 HDR           |                               |
1352	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1353	   |                                                               |
1354	   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1355	   |                               :...OPTIONAL RTP padding        |
1356	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1358	     Figure 8 An example of an AP containing two aggregation units
1359	                     with the DONL and DOND fields
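
   The two layouts in Figures 7 and 8 can be walked with one loop.
   The following is a minimal sketch, assuming RTP padding has been
   removed and that the caller knows whether the DONL and DOND fields
   are present.  The function name is illustrative.  A payload that
   ends in the middle of a field raises struct.error or IndexError
   rather than being handled gracefully.

```python
import struct


def parse_aggregation_packet(payload: bytes, don_present: bool):
    """Return a list of (nal_unit, don) pairs from an AP payload."""
    offset = 2  # skip the PayloadHdr (Type=48)
    units = []
    don = None
    while offset < len(payload):
        if don_present:
            if not units:
                # First aggregation unit: 16-bit DONL.
                (don,) = struct.unpack_from("!H", payload, offset)
                offset += 2
            else:
                # Later aggregation units: 8-bit DOND, relative to
                # the preceding aggregated NAL unit.
                dond = payload[offset]
                offset += 1
                don = (don + dond + 1) % 65536
        (size,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        nal_unit = payload[offset:offset + size]
        if len(nal_unit) != size:
            raise ValueError("NALU size exceeds the remaining payload")
        offset += size
        units.append((nal_unit, don))
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```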

1361	4.4.3 Fragmentation Units (FUs)

1363	   Fragmentation units (FUs) are introduced to enable fragmenting a
1364	   single NAL unit into multiple RTP packets, possibly without
1365	   cooperation or knowledge of the HEVC encoder.  A fragment of a
1366	   NAL unit consists of an integer number of consecutive octets of
1367	   that NAL unit.  Fragments of the same NAL unit MUST be sent in
1368	   consecutive order with ascending RTP sequence numbers (with no
1369	   other RTP packets within the same RTP stream being sent between
1370	   the first and last fragment).

1372	   When a NAL unit is fragmented and conveyed within FUs, it is
1373	   referred to as a fragmented NAL unit.  APs MUST NOT be
1374	   fragmented.  FUs MUST NOT be nested; i.e. an FU must not contain
1375	   a subset of another FU.

1377	   The RTP timestamp of an RTP packet carrying an FU is set to the
1378	   NALU-time of the fragmented NAL unit.

1380	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1381	   header of one octet, a conditional 16-bit DONL field (in network
1382	   byte order), and an FU payload, as shown in Figure 9.

1384	    0                   1                   2                   3
1385	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1386	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1387	   |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
1388	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1389	   | DONL (cond)   |                                               |
1390	   |-+-+-+-+-+-+-+-+                                               |
1391	   |                         FU payload                            |
1392	   |                                                               |
1393	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1394	   |                               :...OPTIONAL RTP padding        |
1395	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1397	                    Figure 9 The structure of an FU

1399	   The fields in the payload header are set as follows.  The Type
1400	   field MUST be equal to 49.  The fields F, LayerId, and TID MUST
1401	   be equal to the fields F, LayerId, and TID, respectively, of the
1402	   fragmented NAL unit.

1404	   The FU header consists of an S bit, an E bit, and a 6-bit FuType
1405	   field, as shown in Figure 10.

1407	                            +---------------+
1408	                            |0|1|2|3|4|5|6|7|
1409	                            +-+-+-+-+-+-+-+-+
1410	                            |S|E|  FuType   |
1411	                            +---------------+

1413	                 Figure 10   The structure of FU header

1415	   The semantics of the FU header fields are as follows:
1416	   S: 1 bit
1417	      When set to one, the S bit indicates the start of a fragmented
1418	      NAL unit, i.e. the first byte of the FU payload is also the
1419	      first byte of the payload of the fragmented NAL unit.  When
1420	      the FU payload is not the start of the fragmented NAL unit
1421	      payload, the S bit MUST be set to zero.

1423	   E: 1 bit
1424	      When set to one, the E bit indicates the end of a fragmented
1425	      NAL unit, i.e. the last byte of the payload is also the last
1426	      byte of the fragmented NAL unit.  When the FU payload is not
1427	      the last fragment of a fragmented NAL unit, the E bit MUST be
1428	      set to zero.

1430	   FuType: 6 bits
1431	      The field FuType MUST be equal to the field Type of the
1432	      fragmented NAL unit.

1434	   The DONL field, when present, specifies the value of the 16 least
1435	   significant bits of the decoding order number of the fragmented
1436	   NAL unit.

1438	   If sprop-max-don-diff is greater than 0 for any of the RTP
1439	   streams, and the S bit is equal to 1, the DONL field MUST be
1440	   present in the FU, and the variable DON for the fragmented NAL
1441	   unit is derived as equal to the value of the DONL field.
1442	   Otherwise (sprop-max-don-diff is equal to 0 for all the RTP
1443	   streams, or the S bit is equal to 0), the DONL field MUST NOT be
1444	   present in the FU.

1446	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
1447	   the Start bit and End bit must not both be set to one in the same
1448	   FU header.

1450	   The FU payload consists of fragments of the payload of the
1451	   fragmented NAL unit so that if the FU payloads of consecutive
1452	   FUs, starting with an FU with the S bit equal to 1 and ending
1453	   with an FU with the E bit equal to 1, are sequentially
1454	   concatenated, the payload of the fragmented NAL unit can be
1455	   reconstructed.  The NAL unit header of the fragmented NAL unit is
1456	   not included as such in the FU payload, but rather the
1457	   information of the NAL unit header of the fragmented NAL unit is
1458	   conveyed in F, LayerId, and TID fields of the FU payload headers
1459	   of the FUs and the FuType field of the FU header of the FUs.  An
1460	   FU payload MUST NOT be empty.
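
   The FU rules above can be sketched as a pair of helper functions:
   one that splits a NAL unit into FU packet payloads, and one that
   rebuilds the NAL unit from a complete, in-order run of FUs.  The
   names and the max_fu_payload parameter are assumptions made for
   this illustration; loss detection and RTP sequence number checks
   are left out.

```python
def fragment_nal_unit(nal: bytes, max_fu_payload: int, don=None):
    """Split one NAL unit into FU packet payloads (PayloadHdr onward)."""
    if max_fu_payload < 1:
        raise ValueError("an FU payload must not be empty")
    hdr = (nal[0] << 8) | nal[1]
    nal_type = (hdr >> 9) & 0x3F
    # Keep F, LayerId and TID; replace the Type field with 49.
    fu_hdr16 = (hdr & 0x81FF) | (49 << 9)
    payload_hdr = bytes([fu_hdr16 >> 8, fu_hdr16 & 0xFF])
    body = nal[2:]  # the NAL unit header itself is not carried
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    if len(chunks) < 2:
        raise ValueError("S and E must not both be set in one FU")
    packets = []
    for i, chunk in enumerate(chunks):
        s_bit = 1 if i == 0 else 0
        e_bit = 1 if i == len(chunks) - 1 else 0
        fu_header = bytes([(s_bit << 7) | (e_bit << 6) | nal_type])
        # DONL is carried only in the FU whose S bit is 1.
        donl = b""
        if don is not None and s_bit:
            donl = bytes([(don >> 8) & 0xFF, don & 0xFF])
        packets.append(payload_hdr + fu_header + donl + chunk)
    return packets


def reassemble_fus(packets, donl_present=False):
    """Rebuild a NAL unit from a complete, in-order run of its FUs."""
    first, last = packets[0], packets[-1]
    if not first[2] & 0x80 or not last[2] & 0x40:
        raise ValueError("run must start with S=1 and end with E=1")
    hdr = (first[0] << 8) | first[1]
    # Restore the original Type from FuType.
    nal_hdr16 = (hdr & 0x81FF) | ((first[2] & 0x3F) << 9)
    out = bytearray([nal_hdr16 >> 8, nal_hdr16 & 0xFF])
    for i, packet in enumerate(packets):
        start = 5 if (donl_present and i == 0) else 3
        out += packet[start:]
    return bytes(out)
```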

1462	   If an FU is lost, the receiver SHOULD discard all following
1463	   fragmentation units in transmission order corresponding to the
1464	   same fragmented NAL unit, unless the decoder in the receiver is
1465	   known to be prepared to gracefully handle incomplete NAL units.

1467	   A receiver in an endpoint or in a MANE MAY aggregate the first
1468	   n-1 fragments of a NAL unit to an (incomplete) NAL unit, even if
1469	   fragment n of that NAL unit is not received.  In this case, the
1470	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate
1471	   a syntax violation.

1473	4.4.4 PACI packets

1475	   This section specifies the PACI packet structure.  The basic
1476	   payload header specified in this memo is intentionally limited to
1477	   the 16 bits of the NAL unit header so as to keep the packetization
1478	   overhead to a minimum.  However, cases have been identified where
1479	   it is advisable to include control information in an easily
1480	   accessible position in the packet header, despite the additional
1481	   overhead.  One such piece of control information is the Temporal
1482	   Scalability Control Information as specified in Section 4.5
1483	   below.  PACI packets carry this and future, similar structures.

1485	   The PACI packet structure is based on a payload header extension
1486	   mechanism that is generic and extensible to carry payload header
1487	   extensions.  In this section, the focus lies on the use within
1488	   this specification.  Section 4.4.4.2 below provides guidance for
1489	   the specification designers in how to employ the extension
1490	   mechanism in future specifications.

1492	   A PACI packet consists of a payload header (denoted as
1493	   PayloadHdr), for which the structure follows what is described in
1494	   Section 4.2 above.  The payload header is followed by the fields
1495	   A, cType, PHSsize, F[0..2] and Y.

1497	   Figure 11 shows a PACI packet in compliance with this memo; that
1498	   is, without any extensions.

1500	    0                   1                   2                   3
1501	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1502	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1503	   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1504	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1505	   |        Payload Header Extension Structure (PHES)              |
1506	   |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
1507	   |                                                               |
1508	   |                  PACI payload: NAL unit                       |
1509	   |                   . . .                                       |
1510	   |                                                               |
1511	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1512	   |                               :...OPTIONAL RTP padding        |
1513	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1515	                  Figure 11   The structure of a PACI

1517	   The fields in the payload header are set as follows.  The F bit
1518	   MUST be equal to 0.  The Type field MUST be equal to 50.  The
1519	   value of LayerId MUST be a copy of the LayerId field of the PACI
1520	   payload NAL unit or NAL-unit-like structure.  The value of TID
1521	   MUST be a copy of the TID field of the PACI payload NAL unit or
1522	   NAL-unit-like structure.

1524	   The semantics of other fields are as follows:

1526	   A: 1 bit
1527	      Copy of the F bit of the PACI payload NAL unit or NAL-unit-
1528	      like structure.

1530	   cType: 6 bits
1531	      Copy of the Type field of the PACI payload NAL unit or NAL-
1532	      unit-like structure.

1534	   PHSsize: 5 bits
1535	      Indicates the length of the PHES field.  The value is limited
1536	      to be less than or equal to 32 octets, to simplify encoder
1537	      design for MTU size matching.

1539	   F0
1540	      This field equal to 1 specifies the presence of a temporal
1541	      scalability support extension in the PHES.

1543	   F1, F2
1544	      MUST be 0, available for future extensions, see Section
1545	      4.4.4.2.  Receivers compliant with this version of the HEVC
1546	      payload format MUST ignore F1=1 and/or F2=1, and also ignore
1547	      any information in the PHES indicated as present by F1=1
1548	      and/or F2=1.

1550	         Informative note: The receiver can do that by first
1551	         decoding information associated with F0=1, and then
1552	         skipping over any remaining bytes of the PHES based on the
1553	         value of PHSsize.

1555	   Y: 1 bit
1556	      MUST be 0, available for future extensions, see Section
1557	      4.4.4.2.  Receivers compliant with this version of the HEVC
1558	      payload format MUST ignore Y=1, and also ignore any
1559	      information in the PHES indicated as present by Y.

1561	   PHES: variable number of octets
1562	      A variable number of octets as indicated by the value of
1563	      PHSsize.

1565	   PACI Payload
1566	      The single NAL unit packet or NAL-unit-like structure (such
1567	      as: FU or AP) to be carried, not including the first two
1568	      octets.

1570	         Informative note: The first two octets of the NAL unit or
1571	         NAL-unit-like structure carried in the PACI payload are not
1572	         included in the PACI payload. Rather, the respective values
1573	         are copied in locations of the PayloadHdr of the RTP
1574	         packet.  This design offers two advantages: first, the
1575	         overall structure of the payload header is preserved, i.e.
1576	         there is no special case of payload header structure that
1577	         needs to be implemented for PACI.  Second, no additional
1578	         overhead is introduced.

1580	      A PACI payload MAY be a single NAL unit, an FU, or an AP.
1581	      PACIs MUST NOT be fragmented or aggregated.  The following
1582	      subsection documents the reasons for these design choices.
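
   The field layout of Figure 11 can be unpacked as sketched below.
   The sketch assumes RTP padding has been removed; the function name
   and dictionary keys are illustrative.  It also rebuilds the two
   octets that were removed from the carried structure, using A as
   the F bit, cType as the Type, and LayerId and TID from the PACI
   PayloadHdr.

```python
def parse_paci(payload: bytes):
    """Unpack a PACI packet payload into its fields."""
    if len(payload) < 4:
        raise ValueError("PACI needs PayloadHdr plus 16 bits of fields")
    hdr = (payload[0] << 8) | payload[1]
    if (hdr >> 9) & 0x3F != 50:
        raise ValueError("not a PACI packet (Type != 50)")
    ext = (payload[2] << 8) | payload[3]
    a_bit = (ext >> 15) & 1
    c_type = (ext >> 9) & 0x3F
    phs_size = (ext >> 4) & 0x1F
    f0, f1, f2 = (ext >> 3) & 1, (ext >> 2) & 1, (ext >> 1) & 1
    y_bit = ext & 1
    phes = payload[4:4 + phs_size]
    if len(phes) != phs_size:
        raise ValueError("PHSsize exceeds the remaining payload")
    # Rebuild the first two octets of the carried structure:
    # F <- A, Type <- cType, LayerId and TID <- PACI PayloadHdr.
    inner_hdr = (a_bit << 15) | (c_type << 9) | (hdr & 0x01FF)
    inner = bytes([inner_hdr >> 8, inner_hdr & 0xFF]) + payload[4 + phs_size:]
    return {
        "a": a_bit, "c_type": c_type, "phs_size": phs_size,
        "f0": f0, "f1": f1, "f2": f2, "y": y_bit,
        "phes": phes, "inner": inner,
    }
```

   The "inner" value can then be handled as a single NAL unit, an FU,
   or an AP, depending on c_type.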

1584	4.4.4.1 Reasons for the PACI rules (informative)

1586	   A PACI cannot be fragmented.  If a PACI could be fragmented, and
1587	   a fragment other than the first fragment would get lost, access
1588	   to the information in the PACI would not be possible.  Therefore,
1589	   a PACI must not be fragmented.  In other words, an FU must not
1590	   carry (fragments of) a PACI.

1592	   A PACI cannot be aggregated.  Aggregation of PACIs is inadvisable
1593	   from a compression viewpoint, as, in many cases, several NAL
1594	   units to be aggregated would share identical PACI fields and
1595	   values, which would be carried redundantly for no reason.  Most,
1596	   if not all, of the practical effects of PACI aggregation can be
1597	   achieved by aggregating NAL units and bundling them with a PACI
1598	   (see below).  Therefore, a PACI must not be aggregated.  In other
1599	   words, an AP must not contain a PACI.

1601	   The payload of a PACI can be a fragment.  Both middleboxes and
1602	   sending systems with inflexible (often hardware-based) encoders
1603	   occasionally find themselves in situations where a PACI and its
1604	   headers, combined, are larger than the MTU size.  In such a
1605	   scenario, the middlebox or sender can fragment the NAL unit and
1606	   encapsulate the fragment in a PACI.  Doing so preserves the
1607	   payload header extension information for all fragments, allowing
1608	   downstream middleboxes and the receiver to take advantage of that
1609	   information.  Therefore, a sender may place a fragment into a
1610	   PACI, and a receiver must be able to handle such a PACI.

1612	   The payload of a PACI can be an aggregation NAL unit.  HEVC
1613	   bitstreams can contain unevenly sized and/or small (when compared
1614	   to the MTU size) NAL units.  In order to efficiently packetize
1615	   such small NAL units, APs were introduced.  The benefits of APs
1616	   are independent from the need for a payload header extension.
1617	   Therefore, a sender may place an AP into a PACI, and a receiver
1618	   must be able to handle such a PACI.

1620	4.4.4.2 PACI extensions (Informative)

1622	   This section includes recommendations for future specification
1623	   designers on how to extend the PACI syntax to accommodate future
1624	   extensions.  Obviously, designers are free to specify whatever
1625	   appears to be appropriate to them at the time of their design.
1626	   However, a lot of thought has been invested into the extension
1627	   mechanism described below, and we suggest that deviations from it
1628	   warrant a good explanation.

1630	   This memo defines only a single payload header extension
1631	   (Temporal Scalability Control Information, described below in
1632	   Section 4.5), and, therefore, only the F0 bit carries semantics.
1633	   F1 and F2 are already named (and not just marked as reserved, as
1634	   a typical video spec designer would do).  They are intended to
1635	   signal two additional extensions.  The Y bit allows further F
1636	   and Y bits to be added recursively, extending the mechanism
1637	   beyond 3 possible payload header extensions.  It is suggested to
1638	   define a new packet type (using a different value for Type) when
1639	   assigning the F1, F2, or Y bits different semantics than what is
1640	   suggested below.

1642	   When a Y bit is set, an 8 bit flag-extension is inserted after
1643	   the Y bit.  A flag-extension consists of 7 flags F[n..n+6], and
1644	   another Y bit.

1646	   The basic PACI header already includes F0, F1, and F2.
1647	   Therefore, the Fx bits in the first flag-extensions are numbered
1648	   F3, F4, ..., F9, the F bits in the second flag-extension are
1649	   numbered F10, F11, ..., F16, and so forth.  As a result, at least
1650	   3 Fx bits are always in the PACI, but the number of Fx bits (and
1651	   associated types of extensions), can be increased by setting the
1652	   next Y bit and adding an octet of flag-extensions, carrying 7
1653	   flags and another Y bit.  The size of this list of flags is
1654	   subject to the limits specified in Section 4.4.4 (32 octets for
1655	   all flag-extensions and the PHES information combined).

1657	   Each of the F bits can indicate either the presence of
1658	   information in the Payload Header Extension Structure (PHES),
1659	   described below, or a given F bit can indicate a certain
1660	   condition, without including additional information in the PHES.

1662	   When a spec developer devises a new syntax that takes advantage
1663	   of the PACI extension mechanism, he/she must follow the
1664	   constraints listed below; otherwise the extension mechanism may
1665	   break.

1667	     1) The fields added for a particular Fx bit MUST be fixed in
1668	        length and not depend on what other Fx bits are set (no
1669	        parsing dependency).
1670	     2) The Fx bits must be assigned in order.
1671	     3) An implementation that supports the n-th Fn bit for any
1672	        value of n must understand the syntax (though not
1673	        necessarily the semantics) of the fields Fk (with k < n), so
1674	        to be able to either use those bits when present, or at
1675	        least be able to skip over them.

1677	4.5 Temporal Scalability Control Information

1679	   This section describes the single payload header extension
1680	   defined in this specification, known as Temporal Scalability
1681	   Control Information (TSCI).  If, in the future, additional
1682	   payload header extensions become necessary, they could be
1683	   specified in this section of an updated version of this document,
1684	   or in their own documents.

1686	   When F0 is set to 1 in a PACI, this specifies that the PHES field
1687	   includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as
1688	   follows:

1690	    0                   1                   2                   3
1691	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1692	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1693	   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1694	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1695	   |   TL0PICIDX   |   IrapPicID   |S|E|    RES    |               |
1696	   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1697	   |                           ....                                |
1698	   |               PACI payload: NAL unit                          |
1699	   |                                                               |
1700	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1701	   |                               :...OPTIONAL RTP padding        |
1702	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1704	   Figure 12   The structure of a PACI with a PHES containing a TSCI

1706	   TL0PICIDX (8 bits)
1707	      When present, the TL0PICIDX field MUST be set equal to
1708	      temporal_sub_layer_zero_idx as specified in Section D.3.22 of
1709	      [H.265] for the access unit containing the NAL unit in the
1710	      PACI.

1712	   IrapPicID (8 bits)
1713	      When present, the IrapPicID field MUST be set equal to
1714	      irap_pic_id as specified in Section D.3.22 of [H.265] for the
1715	      access unit containing the NAL unit in the PACI.

1717	   S (1 bit)
1718	      The S bit MUST be set to 1 if any of the following conditions
1719	      is true and MUST be set to 0 otherwise:
1720	      o The NAL unit in the payload of the PACI is the first VCL NAL
1721	        unit, in decoding order, of a picture.
1722	      o The NAL unit in the payload of the PACI is an AP and the NAL
1723	        unit in the first contained aggregation unit is the first
1724	        VCL NAL unit, in decoding order, of a picture.
1725	      o The NAL unit in the payload of the PACI is an FU with its S
1726	        bit equal to 1 and the FU payload containing a fragment of
1727	        the first VCL NAL unit, in decoding order, of a picture.

1729	   E (1 bit)
1730	      The E bit MUST be set to 1 if any of the following conditions
1731	      is true and MUST be set to 0 otherwise:
1732	      o The NAL unit in the payload of the PACI is the last VCL NAL
1733	        unit, in decoding order, of a picture.
1734	      o The NAL unit in the payload of the PACI is an AP and the NAL
1735	        unit in the last contained aggregation unit is the last VCL
1736	        NAL unit, in decoding order, of a picture.
1737	      o The NAL unit in the payload of the PACI is an FU with its E
1738	        bit equal to 1 and the FU payload containing a fragment of
1739	        the last VCL NAL unit, in decoding order, of a picture.

1741	   RES (6 bits)
1742	      MUST be equal to 0.  Reserved for future extensions.

1744	   The value of PHSsize MUST be set to 3.  Receivers MUST allow
1745	   other values of the fields F0, F1, F2, Y, and PHSsize, and MUST
1746	   ignore any fields in the PHES, when present, other than those
1747	   specified above.
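
   A minimal sketch of reading the TSCI fields from the PHES, for the
   case where F0 is equal to 1, is shown here.  The function name and
   dictionary keys are illustrative.  Octets beyond the first three
   are left untouched, in line with the requirement to ignore
   additional fields.

```python
def parse_tsci(phes: bytes):
    """Read the TSCI fields from the first three octets of the PHES."""
    if len(phes) < 3:
        raise ValueError("TSCI occupies three octets of the PHES")
    return {
        "tl0picidx": phes[0],
        "irap_pic_id": phes[1],
        "s": (phes[2] >> 7) & 1,
        "e": (phes[2] >> 6) & 1,
        "res": phes[2] & 0x3F,  # reserved; senders set this to 0
    }
```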

1749	4.6 Decoding Order Number

1751	   For each NAL unit, the variable AbsDon is derived, representing
1752	   the decoding order number that is indicative of the NAL unit
1753	   decoding order.

1755	   Let NAL unit n be the n-th NAL unit in transmission order within
1756	   an RTP stream.

1758	   If sprop-max-don-diff is equal to 0 for all the RTP streams
1759	   carrying the HEVC bitstream, AbsDon[n], the value of AbsDon for
1760	   NAL unit n, is derived as equal to n.

1762	   Otherwise (sprop-max-don-diff is greater than 0 for any of the
1763	   RTP streams), AbsDon[n] is derived as follows, where DON[n] is
1764	   the value of the variable DON for NAL unit n:

1766	   o  If n is equal to 0 (i.e. NAL unit n is the very first NAL unit
1767	      in transmission order), AbsDon[0] is set equal to DON[0].

1769	   o  Otherwise (n is greater than 0), the following applies for
1770	      derivation of AbsDon[n]:

1772	            If DON[n] == DON[n-1],
1773	                AbsDon[n] = AbsDon[n-1]

1775	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1776	                AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1778	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1779	                AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1781	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1782	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 - DON[n])

1785	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1786	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
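
	   The derivation above can be sketched in code.  The following is
	   a minimal, non-normative illustration; the function name
	   abs_don_sequence and the list-based interface are assumptions
	   made for this example and are not defined by this memo.  It
	   takes the 16-bit DON values in transmission order and returns
	   the corresponding AbsDon values.

```python
def abs_don_sequence(dons):
    """Derive AbsDon[n] from 16-bit DON values given in transmission order.

    Hypothetical helper: mirrors the five cases of Section 4.6 one by one.
    """
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            # The very first NAL unit: AbsDon[0] = DON[0].
            abs_dons.append(don)
            continue
        prev_don, prev_abs = dons[n - 1], abs_dons[n - 1]
        if don == prev_don:
            cur = prev_abs
        elif don > prev_don and don - prev_don < 32768:
            # Small forward step, no wrap-around.
            cur = prev_abs + don - prev_don
        elif don < prev_don and prev_don - don >= 32768:
            # Forward step across the 65535 -> 0 wrap-around.
            cur = prev_abs + 65536 - prev_don + don
        elif don > prev_don and don - prev_don >= 32768:
            # Backward step across the 0 -> 65535 wrap-around.
            cur = prev_abs - (prev_don + 65536 - don)
        else:
            # don < prev_don and prev_don - don < 32768: small backward step.
            cur = prev_abs - (prev_don - don)
        abs_dons.append(cur)
    return abs_dons
```

	   For example, the DON sequence 65534, 65535, 0, 2, 1 maps to the
	   AbsDon sequence 65534, 65535, 65536, 65538, 65537: the wrap-
	   around of DON does not disturb the ordering expressed by AbsDon.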

1788	   For any two NAL units m and n, the following applies:

1790	   o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
1791	      follows NAL unit m in NAL unit decoding order.

1793	   o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding
1794	      order of the two NAL units can be in either order.

1796	   o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n
1797	      precedes NAL unit m in decoding order.

1799	      Informative note: When two consecutive NAL units in the NAL
1800	      unit decoding order have different values of AbsDon, the
1801	      absolute difference between the two AbsDon values may be
1802	      greater than or equal to 1.

1804	      Informative note: There are multiple reasons to allow for the
1805	      absolute difference of the values of AbsDon for two
1806	      consecutive NAL units in the NAL unit decoding order to be
1807	      greater than one.  An increment by one is not required, as at
1808	      the time of associating values of AbsDon to NAL units, it may
1809	      not be known whether all NAL units are to be delivered to the
1810	      receiver.  For example, a gateway may not forward VCL NAL
1811	      units of higher sub-layers or some SEI NAL units when there is
1812	      congestion in the network.  In another example, the first
1813	      intra-coded picture of a pre-encoded clip is transmitted in
1814	      advance to ensure that it is readily available in the
1815	      receiver, and when transmitting the first intra-coded picture,
1816	      the originator does not exactly know how many NAL units will
1817	      be encoded before the first intra-coded picture of the pre-
1818	      encoded clip follows in decoding order.  Thus, the values of
1819	      AbsDon for the NAL units of the first intra-coded picture of
1820	      the pre-encoded clip have to be estimated when they are
1821	      transmitted, and gaps in values of AbsDon may occur.  Another
1822	      example is MRST or MRMT with sprop-max-don-diff greater than
1823	      0, where the AbsDon values must indicate cross-layer decoding
1824	      order for NAL units conveyed in all the RTP streams.

1826	5 Packetization Rules

1828	   The following packetization rules apply:

1830	   o  If sprop-max-don-diff is greater than 0 for any of the RTP
1831	      streams, the transmission order of NAL units carried in the
1832	      RTP stream MAY be different than the NAL unit decoding order
1833	      and the NAL unit output order.  Otherwise (sprop-max-don-diff
1834	      is equal to 0 for all the RTP streams), the transmission order
1835	      of NAL units carried in the RTP stream MUST be the same as the
1836	      NAL unit decoding order, and, when tx-mode is equal to "MRST"
1837	      or "MRMT", MUST also be the same as the NAL unit output order.

1839	   o  A NAL unit of a small size SHOULD be encapsulated in an
1840	      aggregation packet together with one or more other NAL units
1841	      in order to avoid the unnecessary packetization overhead for
1842	      small NAL units.  For example, non-VCL NAL units such as
1843	      access unit delimiters, parameter sets, or SEI NAL units are
1844	      typically small and can often be aggregated with VCL NAL units
1845	      without violating MTU size constraints.

1847	   o  Each non-VCL NAL unit SHOULD, when MTU size constraints
1848	      permit, be encapsulated in an aggregation packet together
1849	      with its associated VCL NAL unit, as a non-VCL NAL unit is
1850	      typically meaningless without the associated VCL NAL unit
1851	      being available.

1853	   o  For carrying exactly one NAL unit in an RTP packet, a single
1854	      NAL unit packet MUST be used.

1856	6 De-packetization Process

1858	   The general concept behind de-packetization is to get the NAL
1859	   units out of the RTP packets in an RTP stream and all RTP streams
1860	   the RTP stream depends on, if any, and pass them to the decoder
1861	   in the NAL unit decoding order.

1863	   The de-packetization process is implementation dependent.
1864	   Therefore, the following description should be seen as an example
1865	   of a suitable implementation.  Other schemes may be used as well
1866	   as long as the output for the same input is the same as the
1867	   process described below.  The output is the same when the set of
1868	   output NAL units and their order are both identical.
1869	   Optimizations relative to the described algorithms are possible.

1871	   All normal RTP mechanisms related to buffer management apply.  In
1872	   particular, duplicated or outdated RTP packets (as indicated by
1873	   the RTP sequence number and the RTP timestamp) are removed.  To
1874	   determine the exact time for decoding, factors such as a possible
1875	   intentional delay to allow for proper inter-stream
1876	   synchronization must be taken into account.

1878	   NAL units with NAL unit type values in the range of 0 to 47,
1879	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1880	   with NAL unit type values in the range of 48 to 63, inclusive,
1881	   MUST NOT be passed to the decoder.
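
	   As a non-normative illustration, the check above can be written
	   as a small filter.  The sketch assumes the two-byte NAL unit
	   header of [HEVC], in which the 6-bit type field follows the F
	   bit in the first byte; the function names are hypothetical.

```python
def nal_unit_type(nal_unit: bytes) -> int:
    """Return the 6-bit type field that follows the F bit in the first byte."""
    return (nal_unit[0] >> 1) & 0x3F


def may_pass_to_decoder(nal_unit: bytes) -> bool:
    """True for types 0..47; types 48..63 must not reach the decoder."""
    return 0 <= nal_unit_type(nal_unit) <= 47
```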

1883	   The receiver includes a receiver buffer, which is used to
1884	   compensate for transmission delay jitter within individual RTP
1885	   streams and across RTP streams, to reorder NAL units from
1886	   transmission order to the NAL unit decoding order, and to recover
1887	   the NAL unit decoding order in MRST or MRMT, when applicable.  In
1888	   this section, the receiver operation is described under the
1889	   assumption that there is no transmission delay jitter within an
1890	   RTP stream and across RTP streams.  To distinguish it from a
1891	   practical receiver buffer that is also used for compensation of
1892	   transmission delay jitter, the receiver buffer is hereafter
1893	   called the de-packetization buffer in this section.  Receivers
1894	   should also prepare for transmission delay jitter; i.e. either
1895	   reserve separate buffers for transmission delay jitter buffering
1896	   and de-packetization buffering or use a receiver buffer for both
1897	   transmission delay jitter and de-packetization.  Moreover,
1898	   receivers should take transmission delay jitter into account in
1899	   the buffering operation; e.g. by additional initial buffering
1900	   before starting of decoding and playback.

1902	   When sprop-max-don-diff is equal to 0 for all the received RTP
1903	   streams, the de-packetization buffer size is zero bytes and the
1904	   process described in the remainder of this paragraph applies.
1905	   When there is only one RTP stream received, the NAL units carried
1906	   in the single RTP stream are directly passed to the decoder in
1907	   their transmission order, which is identical to their decoding
1908	   order.  When there is more than one RTP stream received, the NAL
1909	   units carried in the multiple RTP streams are passed to the
1910	   decoder in their NTP timestamp order.  When there are several NAL
1911	   units of different RTP streams with the same NTP timestamp, the
1912	   order to pass them to the decoder is their dependency order,
1913	   where NAL units of a dependee RTP stream are passed to the
1914	   decoder prior to the NAL units of the dependent RTP stream.  When
1915	   there are several NAL units of the same RTP stream with the same
1916	   NTP timestamp, the order to pass them to the decoder is their
1917	   transmission order.

1919	         Informative note: The mapping between RTP and NTP
1920	         timestamps is conveyed in RTCP SR packets.  In addition,
1921	         the mechanisms for faster media timestamp synchronization
1922	         discussed in [RFC6051] may be used to speed up the
1923	         acquisition of the RTP-to-wall-clock mapping.
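
	   The ordering rules for the case of more than one received RTP
	   stream amount to a three-level sort.  The sketch below is non-
	   normative and makes several assumptions for illustration: the
	   ReceivedNalu record and its field names are hypothetical, the
	   NTP time is taken as already derived from the RTP timestamp,
	   and the dependency relation is reduced to an integer rank in
	   which a dependee RTP stream has a lower rank than any RTP
	   stream that depends on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceivedNalu:
    ntp_time: float       # wall-clock time mapped from the RTP timestamp
    dependency_rank: int  # assumption: dependee streams have lower rank
    tx_index: int         # transmission order within its own RTP stream
    payload: bytes


def decoding_order(nalus):
    """Order by NTP time, then dependee-before-dependent, then transmission order."""
    return sorted(nalus, key=lambda u: (u.ntp_time, u.dependency_rank, u.tx_index))
```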

1925	   When sprop-max-don-diff is greater than 0 for any of the received
1926	   RTP streams, the process described in the remainder of this
1927	   section applies.

1929	   There are two buffering states in the receiver: initial buffering
1930	   and buffering while playing.  Initial buffering starts when the
1931	   reception is initialized.  After initial buffering, decoding and
1932	   playback are started, and the buffering-while-playing mode is
1933	   used.

1935	   Regardless of the buffering state, the receiver stores incoming
1936	   NAL units, in reception order, into the de-packetization buffer.
1937	   NAL units carried in RTP packets are stored in the de-
1938	   packetization buffer individually, and the value of AbsDon is
1939	   calculated and stored for each NAL unit.  When MRST or MRMT is in
1940	   use, NAL units of all RTP streams of a bitstream are stored in
1941	   the same de-packetization buffer.  When NAL units carried in any
1942	   two RTP streams are available to be placed into the de-
1943	   packetization buffer, those NAL units carried in the RTP stream
1944	   that is lower in the dependency tree are placed into the buffer
1945	   first.  For example, if RTP stream A depends on RTP stream B,
1946	   then NAL units carried in RTP stream B are placed into the buffer
1947	   first.

1949	   Initial buffering lasts until condition A (the difference between
1950	   the greatest and smallest AbsDon values of the NAL units in the
1951	   de-packetization buffer is greater than or equal to the value of
1952	   sprop-max-don-diff of the highest RTP stream) or condition B (the
1953	   number of NAL units in the de-packetization buffer is greater
1954	   than the value of sprop-depack-buf-nalus) is true.

1956	   After initial buffering, whenever condition A or condition B is
1957	   true, the following operation is repeatedly applied until both
1958	   condition A and condition B become false:

1960	   o  The NAL unit in the de-packetization buffer with the smallest
1961	      value of AbsDon is removed from the de-packetization buffer
1962	      and passed to the decoder.

1964	   When no more NAL units are flowing into the de-packetization
1965	   buffer, all NAL units remaining in the de-packetization buffer
1966	   are removed from the buffer and passed to the decoder in the
1967	   order of increasing AbsDon values.
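
	   The buffering process described above can be sketched as
	   follows.  This is a minimal, non-normative illustration under
	   the same no-jitter assumption as the rest of this section; the
	   class and method names are hypothetical, AbsDon is assumed to
	   have been computed by the caller, and NAL units with equal
	   AbsDon are released in arrival order, which is one of the
	   orders the text permits.  Because neither condition is true
	   during initial buffering, a single loop covers both buffering
	   states.

```python
import heapq
import itertools


class DepacketizationBuffer:
    """Reorders NAL units from reception order into increasing AbsDon order."""

    def __init__(self, max_don_diff, depack_buf_nalus):
        self.max_don_diff = max_don_diff          # sprop-max-don-diff of the highest RTP stream
        self.depack_buf_nalus = depack_buf_nalus  # sprop-depack-buf-nalus
        self._heap = []                           # entries: (AbsDon, arrival index, NAL unit)
        self._arrival = itertools.count()         # tie-breaker for equal AbsDon values

    def _condition_a(self):
        abs_dons = [entry[0] for entry in self._heap]
        return bool(abs_dons) and max(abs_dons) - min(abs_dons) >= self.max_don_diff

    def _condition_b(self):
        return len(self._heap) > self.depack_buf_nalus

    def push(self, abs_don, nal_unit):
        """Store one NAL unit; return the NAL units now due for the decoder."""
        heapq.heappush(self._heap, (abs_don, next(self._arrival), nal_unit))
        released = []
        # Remove the smallest AbsDon until both conditions are false.
        while self._condition_a() or self._condition_b():
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self):
        """No more input: drain everything in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

	   With sprop-max-don-diff equal to 2, pushing NAL units with
	   AbsDon 1, 0, and 2 releases nothing until the third push, at
	   which point the NAL unit with AbsDon 0 is passed to the decoder.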

1969	7 Payload Format Parameters

1971	   This section specifies the parameters that MAY be used to select
1972	   optional features of the payload format and certain features or
1973	   properties of the bitstream or the RTP stream.  The parameters
1974	   are specified here as part of the media type registration for the
1975	   HEVC codec.  A mapping of the parameters into the Session
1976	   Description Protocol (SDP) [RFC4566] is also provided for
1977	   applications that use SDP.  Equivalent parameters could be
1978	   defined elsewhere for use with control protocols that do not use
1979	   SDP.

1981	7.1 Media Type Registration

1983	   The media subtype for the HEVC codec is allocated from the IETF
1984	   tree.

1986	   The receiver MUST ignore any unrecognized parameter.

1988	   Media Type name:     video

1990	   Media subtype name:  H265

1992	   Required parameters: none

1994	   OPTIONAL parameters:

1996	      profile-space, tier-flag, profile-id, profile-compatibility-
1997	      indicator, interop-constraints, and level-id:

1999	         These parameters indicate the profile, tier, default level,
2000	         and some constraints of the bitstream carried by the RTP
2001	         stream and all RTP streams the RTP stream depends on, or a
2002	         specific set of the profile, tier, default level, and some
2003	         constraints the receiver supports.

2005	         The profile and some constraints are indicated collectively
2006	         by profile-space, profile-id, profile-compatibility-
2007	         indicator, and interop-constraints.  The profile specifies
2008	         the subset of coding tools that may have been used to
2009	         generate the bitstream or that the receiver supports.

2011	            Informative note: There are 32 values of profile-id, and
2012	            there are 32 flags in profile-compatibility-indicator,
2013	            each flag corresponding to one value of profile-id.
2014	            According to HEVC version 1 in [HEVC], when more than
2015	            one of the 32 flags is set for a bitstream, the
2016	            bitstream would comply with all the profiles
2017	            corresponding to the set flags.  However, in a draft of
2018	            HEVC version 2 in [HEVC draft v2], subclause A.3.5, 19
2019	            Format Range Extensions profiles have been specified,
2020	            all using the same value of profile-id (4),
2021	            differentiated by some of the 48 bits in interop-
2022	            constraints.  This rather unexpected way of profile
2023	            signalling means that one of the 32 flags may
2024	            correspond to multiple profiles.  To be able to support
2025	            whatever HEVC extension profile that might be specified
2026	            and indicated using profile-space, profile-id, profile-
2027	            compatibility-indicator, and interop-constraints in the
2028	            future, it would be safe to require symmetric use of
2029	            these parameters in SDP offer/answer unless recv-sub-
2030	            layer-id is included in the SDP answer for choosing one
2031	            of the sub-layers offered.

2033	         The tier is indicated by tier-flag.  The default level is
2034	         indicated by level-id.  The tier and the default level
2035	         specify the limits on values of syntax elements or
2036	         arithmetic combinations of values of syntax elements that
2037	         are followed when generating the bitstream or that the
2038	         receiver supports.

2040	         A set of profile-space, tier-flag, profile-id, profile-
2041	         compatibility-indicator, interop-constraints, and level-id
2042	         parameters ptlA is said to be consistent with another set
2043	         of these parameters ptlB if any decoder that conforms to
2044	         the profile, tier, level, and constraints indicated by ptlB
2045	         can decode any bitstream that conforms to the profile,
2046	         tier, level, and constraints indicated by ptlA.

2048	         In SDP offer/answer, when the SDP answer does not include
2049	         the recv-sub-layer-id parameter that is less than the
2050	         sprop-sub-layer-id parameter in the SDP offer, the
2051	         following applies:

2053	            o The profile-space, tier-flag, profile-id, profile-
2054	              compatibility-indicator, and interop-constraints
2055	              parameters MUST be used symmetrically, i.e. the value
2056	              of each of these parameters in the offer MUST be the
2057	              same as that in the answer, either explicitly
2058	              signalled or implicitly inferred.

2060	            o The level-id parameter is changeable as long as the
2061	              highest level indicated by the answer is either equal
2062	              to or lower than that in the offer.  Note that the
2063	              highest level is indicated by level-id and max-recv-
2064	              level-id together.
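
	         The two rules above can be sketched as a check on an
	         offer/answer pair.  This is a non-normative illustration
	         that applies only when the answer does not select a lower
	         sub-layer through recv-sub-layer-id.  The function names
	         and the dictionary representation of the parameters are
	         hypothetical.  Two further assumptions are made when
	         inferring defaults: the fields of interop-constraints are
	         packed most significant bit first, and bit 0 of profile-
	         compatibility-indicator is the most significant bit of its
	         four bytes.

```python
HEX_PARAMS = ("profile-compatibility-indicator", "interop-constraints")
SYMMETRIC_PARAMS = ("profile-space", "tier-flag", "profile-id") + HEX_PARAMS


def with_inferred_defaults(params):
    """Fill in the values that are inferred when a parameter is absent."""
    p = dict(params)
    p.setdefault("profile-space", "0")
    p.setdefault("profile-id", "1")
    p.setdefault("tier-flag", "0")
    p.setdefault("level-id", "93")
    # ASSUMPTION: flags 1,0,1,1 followed by 44 zero bits, MSB first.
    p.setdefault("interop-constraints", "B00000000000")
    # ASSUMPTION: bit j is counted from the most significant bit.
    p.setdefault("profile-compatibility-indicator",
                 format(1 << (31 - int(p["profile-id"])), "08X"))
    return p


def as_int(name, value):
    """Compare by value so that hex case and leading zeros do not matter."""
    return int(value, 16 if name in HEX_PARAMS else 10)


def highest_level(params):
    """Highest level is given by level-id and max-recv-level-id together."""
    return int(params.get("max-recv-level-id", params["level-id"]))


def answer_is_acceptable(offer, answer):
    o, a = with_inferred_defaults(offer), with_inferred_defaults(answer)
    symmetric = all(as_int(k, o[k]) == as_int(k, a[k]) for k in SYMMETRIC_PARAMS)
    return symmetric and highest_level(a) <= highest_level(o)
```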

2066	         In SDP offer/answer, when the SDP answer does include the
2067	         recv-sub-layer-id parameter that is less than the sprop-
2068	         sub-layer-id parameter in the SDP offer, the set of
2069	         profile-space, tier-flag, profile-id, profile-
2070	         compatibility-indicator, interop-constraints, and level-id
2071	         parameters included in the answer MUST be consistent with
2072	         that for the chosen sub-layer representation as indicated
2073	         in the SDP offer, with the exception that the level-id
2074	         parameter in the SDP answer is changeable as long as the
2075	         highest level indicated by the answer is either lower than
2076	         or equal to that in the offer.

2078	         More specifications of these parameters, including how they
2079	         relate to the values of the profile, tier, and level syntax
2080	         elements specified in [HEVC] are provided below.

2082	      profile-space, profile-id:

2084	         The value of profile-space MUST be in the range of 0 to 3,
2085	         inclusive.  The value of profile-id MUST be in the range of
2086	         0 to 31, inclusive.

2088	         When profile-space is not present, a value of 0 MUST be
2089	         inferred.  When profile-id is not present, a value of 1
2090	         (i.e. the Main profile) MUST be inferred.

2092	         When used to indicate properties of a bitstream, profile-
2093	         space and profile-id are derived from the profile, tier,
2094	         and level syntax elements in SPS or VPS NAL units as
2095	         follows, where general_profile_space, general_profile_idc,
2096	         sub_layer_profile_space[j], and sub_layer_profile_idc[j]
2097	         are specified in [HEVC]:

2099	            If the RTP stream is the highest RTP stream, the
2100	            following applies:

2102	            o profile-space = general_profile_space
2103	            o profile-id = general_profile_idc

2105	            Otherwise (the RTP stream is a dependee RTP stream), the
2106	            following applies, with j being the value of the sprop-
2107	            sub-layer-id parameter:

2109	            o profile-space = sub_layer_profile_space[j]
2110	            o profile-id = sub_layer_profile_idc[j]

2112	      tier-flag, level-id:

2114	         The value of tier-flag MUST be in the range of 0 to 1,
2115	         inclusive.  The value of level-id MUST be in the range of 0
2116	         to 255, inclusive.

2118	         If the tier-flag and level-id parameters are used to
2119	         indicate properties of a bitstream, they indicate the tier
2120	         and the highest level the bitstream complies with.

2122	         If the tier-flag and level-id parameters are used for
2123	         capability exchange, the following applies.  If max-recv-
2124	         level-id is not present, the default level defined by
2125	         level-id indicates the highest level the codec wishes to
2126	         support.  Otherwise, max-recv-level-id indicates the
2127	         highest level the codec supports for receiving.  For either
2128	         receiving or sending, all levels that are lower than the
2129	         highest level supported MUST also be supported.

2131	         If no tier-flag is present, a value of 0 MUST be inferred
2132	         and if no level-id is present, a value of 93 (i.e. level
2133	         3.1) MUST be inferred.

2135	         When used to indicate properties of a bitstream, the tier-
2136	         flag and level-id parameters are derived from the profile,
2137	         tier, and level syntax elements in SPS or VPS NAL units as
2138	         follows, where general_tier_flag, general_level_idc,
2139	         sub_layer_tier_flag[j], and sub_layer_level_idc[j] are
2140	         specified in [HEVC]:

2142	            If the RTP stream is the highest RTP stream, the
2143	            following applies:

2145	            o tier-flag = general_tier_flag
2146	            o level-id = general_level_idc

2148	            Otherwise (the RTP stream is a dependee RTP stream), the
2149	            following applies, with j being the value of the sprop-
2150	            sub-layer-id parameter:

2152	            o tier-flag = sub_layer_tier_flag[j]
2153	            o level-id = sub_layer_level_idc[j]

2155	      interop-constraints:

2157	         A base16 [RFC4648] (hexadecimal) representation of six
2158	         bytes of data, consisting of progressive_source_flag,
2159	         interlaced_source_flag, non_packed_constraint_flag,
2160	         frame_only_constraint_flag, and reserved_zero_44bits.

2162	         If the interop-constraints parameter is not present, the
2163	         following MUST be inferred:

2165	            o progressive_source_flag = 1
2166	            o interlaced_source_flag = 0
2167	            o non_packed_constraint_flag = 1
2168	            o frame_only_constraint_flag = 1
2169	            o reserved_zero_44bits = 0
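
	         As a non-normative illustration, the inferred default can
	         be turned into the six-byte base16 string as follows.  The
	         sketch assumes that the fields are packed in the order
	         listed above, most significant bit first, as they appear
	         in the bitstream; the function name is hypothetical.

```python
def pack_interop_constraints(progressive_source_flag=1,
                             interlaced_source_flag=0,
                             non_packed_constraint_flag=1,
                             frame_only_constraint_flag=1,
                             reserved_zero_44bits=0):
    """Pack four 1-bit flags and 44 reserved bits into 12 hex digits.

    ASSUMPTION: MSB-first packing in the order the fields are listed.
    Defaults are the values inferred when the parameter is absent.
    """
    flags = (progressive_source_flag, interlaced_source_flag,
             non_packed_constraint_flag, frame_only_constraint_flag)
    if any(f not in (0, 1) for f in flags):
        raise ValueError("flags must be 0 or 1")
    if not 0 <= reserved_zero_44bits < (1 << 44):
        raise ValueError("reserved field must fit in 44 bits")
    value = reserved_zero_44bits
    for shift, flag in zip((47, 46, 45, 44), flags):
        value |= flag << shift
    return value.to_bytes(6, "big").hex().upper()
```

	         Under that assumption, the inferred default corresponds to
	         the string "B00000000000".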

2171	         When the interop-constraints parameter is used to indicate
2172	         properties of a bitstream, the following applies, where
2173	         general_progressive_source_flag,
2174	         general_interlaced_source_flag,
2175	         general_non_packed_constraint_flag,
2177	         general_frame_only_constraint_flag,
2178	         general_reserved_zero_44bits,
2179	         sub_layer_progressive_source_flag[j],
2180	         sub_layer_interlaced_source_flag[j],
2181	         sub_layer_non_packed_constraint_flag[j],
2182	         sub_layer_frame_only_constraint_flag[j], and
2183	         sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:

2185	            If the RTP stream is the highest RTP stream, the
2186	            following applies:

2188	            o progressive_source_flag =
2189	                              general_progressive_source_flag
2190	            o interlaced_source_flag =
2191	                              general_interlaced_source_flag
2192	            o non_packed_constraint_flag =
2193	                              general_non_packed_constraint_flag
2194	            o frame_only_constraint_flag =
2195	                              general_frame_only_constraint_flag
2196	            o reserved_zero_44bits = general_reserved_zero_44bits

2198	            Otherwise (the RTP stream is a dependee RTP stream), the
2199	            following applies, with j being the value of the sprop-
2200	            sub-layer-id parameter:

2202	            o progressive_source_flag =
2203	                              sub_layer_progressive_source_flag[j]
2204	            o interlaced_source_flag =
2205	                              sub_layer_interlaced_source_flag[j]
2206	            o non_packed_constraint_flag =
2208	                              sub_layer_non_packed_constraint_flag[j]
2209	            o frame_only_constraint_flag =
2211	                              sub_layer_frame_only_constraint_flag[j]
2212	            o reserved_zero_44bits =
2213	                              sub_layer_reserved_zero_44bits[j]

2215	         Using interop-constraints for capability exchange results
2216	         in a requirement on any bitstream to be compliant with the
2217	         interop-constraints.

2219	      profile-compatibility-indicator:

2221	         A base16 [RFC4648] representation of four bytes of data.

2223	         When profile-compatibility-indicator is used to indicate
2224	         properties of a bitstream, the following applies, where
2225	         general_profile_compatibility_flag[j] and
2226	         sub_layer_profile_compatibility_flag[i][j] are specified in
2227	         [HEVC]:

2229	            The profile-compatibility-indicator in this case
2230	            indicates additional profiles to the profile defined by
2231	            profile-space, profile-id, and interop-constraints the
2232	            bitstream conforms to.  A decoder that conforms to any
2233	            of the profiles the bitstream conforms to would be
2234	            capable of decoding the bitstream.  These additional
2235	            profiles are defined by profile-space, each set bit of
2236	            profile-compatibility-indicator, and interop-
2237	            constraints.

2239	            If the RTP stream is the highest RTP stream, the
2240	            following applies for each value of j in the range of 0
2241	            to 31, inclusive:

2243	            o bit j of profile-compatibility-indicator =
2244	                  general_profile_compatibility_flag[j]

2246	            Otherwise (the RTP stream is a dependee RTP stream), the
2247	            following applies for i equal to sprop-sub-layer-id and
2248	            for each value of j in the range of 0 to 31, inclusive:

2250	            o bit j of profile-compatibility-indicator =
2251	                  sub_layer_profile_compatibility_flag[i][j]

2253	         Using profile-compatibility-indicator for capability
2254	         exchange results in a requirement on any bitstream to be
2255	         compliant with the profile-compatibility-indicator.  This
2256	         is intended to handle cases where any future HEVC profile
2257	         is defined as an intersection of two or more profiles.

2259	         If this parameter is not present, this parameter defaults
2260	         to the following: bit j, with j equal to profile-id, of
2261	         profile-compatibility-indicator is inferred to be equal to
2262	         1, and all other bits are inferred to be equal to 0.
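
	         The default can be illustrated with a short, non-normative
	         sketch.  The text above does not state how bit j maps onto
	         the four bytes; the sketch assumes that bit 0 is the most
	         significant bit, matching the order in which the
	         compatibility flags appear in the bitstream.  The function
	         name is hypothetical.

```python
def default_profile_compatibility_indicator(profile_id=1):
    """Return 8 hex digits with only bit j = profile_id set.

    ASSUMPTION: bit 0 is the most significant bit of the four bytes.
    """
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id must be in the range 0..31")
    value = 1 << (31 - profile_id)
    return value.to_bytes(4, "big").hex().upper()
```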

2264	      sprop-sub-layer-id:

2266	         This parameter MAY be used to indicate the highest allowed
2267	         value of TID in the bitstream.  When not present, the value
2268	         of sprop-sub-layer-id is inferred to be equal to 6.

2270	         The value of sprop-sub-layer-id MUST be in the range of 0
2271	         to 6, inclusive.

2273	      recv-sub-layer-id:

2275	         This parameter MAY be used to signal a receiver's choice of
2276	         the offered or declared sub-layer representations in the
2277	         sprop-vps.  The value of recv-sub-layer-id indicates the
2278	         TID of the highest sub-layer of the bitstream that a
2279	         receiver supports.  When not present, the value of recv-
2280	         sub-layer-id is inferred to be equal to the value of the
2281	         sprop-sub-layer-id parameter in the SDP offer.

2283	         The value of recv-sub-layer-id MUST be in the range of 0 to
2284	         6, inclusive.

2286	      max-recv-level-id:

2288	         This parameter MAY be used to indicate the highest level a
2289	         receiver supports.  The highest level the receiver supports
2290	         is equal to the value of max-recv-level-id divided by 30.
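
	         As a non-normative illustration of the division by 30,
	         the following sketch maps a level identifier to a level
	         number.  The function name is hypothetical.

```python
def level_from_level_id(level_id: int) -> float:
    """Map level-id or max-recv-level-id (0..255) to a level number."""
    if not 0 <= level_id <= 255:
        raise ValueError("level identifier must be in the range 0..255")
    return level_id / 30
```

	         For example, the default level-id of 93 corresponds to
	         level 3.1.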

2292	         The value of max-recv-level-id MUST be in the range of 0
2293	         to 255, inclusive.

2295	         When max-recv-level-id is not present, the value is
2296	         inferred to be equal to level-id.

2298	         max-recv-level-id MUST NOT be present when the highest
2299	         level the receiver supports is not higher than the default
2300	         level.

2302	      tx-mode:

2304	         This parameter indicates whether the transmission mode is
2305	         SRST, MRST, or MRMT.

2307	         The value of tx-mode MUST be equal to "SRST", "MRST" or
2308	         "MRMT".  When not present, the value of tx-mode is inferred
2309	         to be equal to "SRST".

2311	         If the value is equal to "MRST", MRST MUST be in use.
2312	         Otherwise, if the value is equal to "MRMT", MRMT MUST be in
2313	         use.  Otherwise (the value is equal to "SRST"), SRST MUST
2314	         be in use.

2316	         The value of tx-mode MUST be equal to "MRST" for all RTP
2317	         streams in an MRST.

2319	         The value of tx-mode MUST be equal to "MRMT" for all RTP
2320	         streams in an MRMT.

2322	      sprop-vps:

2324	         This parameter MAY be used to convey any video parameter
2325	         set NAL unit of the bitstream for out-of-band transmission
2326	         of video parameter sets.  The parameter MAY also be used
2327	         for capability exchange and to indicate sub-stream
2328	         characteristics (i.e. properties of sub-layer
2329	         representations as defined in [HEVC]).  The value of the
2330	         parameter is a comma-separated (',') list of base64
2331	         [RFC4648] representations of the video parameter set NAL
2332	         units as specified in Section 7.3.2.1 of [HEVC].

2334	         The sprop-vps parameter MAY contain one or more than one
2335	         video parameter set NAL unit. However, all other video
2336	         parameter sets contained in the sprop-vps parameter MUST be
2337	         consistent with the first video parameter set in the sprop-
2338	         vps parameter.  A video parameter set vpsB is said to be
2339	         consistent with another video parameter set vpsA if any
2340	         decoder that conforms to the profile, tier, level, and
2341	         constraints indicated by the 12 bytes of data starting from
2342	         the syntax element general_profile_space to the syntax
2343	         element general_level_idc, inclusive, in the first
2344	         profile_tier_level( ) syntax structure in vpsA can decode
2345	         any bitstream that conforms to the profile, tier, level,
2346	         and constraints indicated by the 12 bytes of data starting
2347	         from the syntax element general_profile_space to the syntax
2348	         element general_level_idc, inclusive, in the first
2349	         profile_tier_level( ) syntax structure in vpsB.

2351	      sprop-sps:

2353	         This parameter MAY be used to convey sequence parameter set
2354	         NAL units of the bitstream for out-of-band transmission of
2355	         sequence parameter sets.  The value of the parameter is a
2356	         comma-separated (',') list of base64 [RFC4648]
2357	         representations of the sequence parameter set NAL units as
2358	         specified in Section 7.3.2.2 of [HEVC].

2360	      sprop-pps:

2362	         This parameter MAY be used to convey picture parameter set
2363	         NAL units of the bitstream for out-of-band transmission of
2364	         picture parameter sets.  The value of the parameter is a
2365	         comma-separated (',') list of base64 [RFC4648]
2366	         representations of the picture parameter set NAL units as
2367	         specified in Section 7.3.2.3 of [HEVC].
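
	         The value format shared by sprop-vps, sprop-sps, and
	         sprop-pps can be parsed as sketched below.  This is a non-
	         normative illustration; the function name is hypothetical,
	         and the type field is read from the first byte of the
	         two-byte NAL unit header of [HEVC].

```python
import base64


def parse_sprop_parameter_sets(value: str):
    """Split a comma-separated base64 list into (nal_unit_type, bytes) pairs."""
    result = []
    for item in value.split(","):
        if not item:
            continue  # tolerate an empty list element
        nal_unit = base64.b64decode(item, validate=True)
        # The 6-bit type field follows the F bit in the first header byte.
        result.append(((nal_unit[0] >> 1) & 0x3F, nal_unit))
    return result
```

	         A receiver could use the returned type values to confirm
	         that each parameter carries only NAL units of the expected
	         kind before handing them to the decoder.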

2369	      sprop-sei:

2371	         This parameter MAY be used to convey one or more SEI
2372	         messages that describe bitstream characteristics.  When
2373	         present, a decoder can rely on the bitstream
2374	         characteristics that are described in the SEI messages for
2375	         the entire duration of the session, independently from the
2376	         persistence scopes of the SEI messages as specified in
2377	         [HEVC].

2379	         The value of the parameter is a comma-separated (',') list
2380	         of base64 [RFC4648] representations of SEI NAL units as
2381	         specified in Section 7.3.2.4 of [HEVC].

2383	            Informative note: Intentionally, no list of applicable
2384	            or inapplicable SEI messages is specified here.
2385	            Conveying certain SEI messages in sprop-sei may be
2386	            sensible in some application scenarios and meaningless
2387	            in others.  However, a few examples are described below:

2389	           1) In an environment where the bitstream was created
2390	               from film-based source material, and no splicing is
2391	               going to occur during the lifetime of the session,
2392	               the film grain characteristics SEI message or the
2393	               tone mapping information SEI message are likely
2394	               meaningful, and sending them in sprop-sei rather than
2395	               in the bitstream at each entry point may help save
2396	               bits and allows the renderer to be configured only
2397	               once, avoiding unwanted artifacts.
2398	           2) The structure of pictures information SEI message in
2399	               sprop-sei can be used to inform a decoder of
2400	               information on the NAL unit types, picture order
2401	               count values, and prediction dependencies of a
2402	               sequence of pictures.  Having such knowledge can be
2403	               helpful for error recovery.
2404	           3) Examples of SEI messages that would be meaningless
2405	               to convey in sprop-sei include the decoded
2406	               picture hash SEI message (it is close to impossible
2407	               that all decoded pictures have the same hash value),
2408	               the display orientation SEI message when the device
2409	               is a handheld device (as the display orientation may
2410	               change when the handheld device is turned around), or
2411	               the filler payload SEI message (as there is no point
2412	               in just having more bits in SDP).

2414	      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

2416	         These parameters MAY be used to signal the capabilities of
2417	         a receiver implementation.  These parameters MUST NOT be
2418	         used for any other purpose.  The highest level (specified
2419	         by max-recv-level-id) MUST be the highest that the receiver
2420	         is fully capable of supporting.  max-lsr, max-lps, max-cpb,
2421	         max-dpb, max-br, max-tr, and max-tc MAY be used to indicate
2422	         capabilities of the receiver that extend the required
2423	         capabilities of the highest level, as specified below.

2425	         When more than one parameter from the set (max-lsr, max-
2426	         lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is present,
2427	         the receiver MUST support all signaled capabilities
2428	         simultaneously.  For example, if both max-lsr and max-br
2429	         are present, the highest level with the extension of both
2430	         the picture rate and bitrate is supported.  That is, the
2431	         receiver is able to decode bitstreams in which the luma
2432	         sample rate is up to max-lsr (inclusive), the bitrate is up
2433	         to max-br (inclusive), the coded picture buffer size is
2434	         derived as specified in the semantics of the max-br
2435	         parameter below, and the other properties comply with the
2436	         highest level specified by max-recv-level-id.

2438	            Informative note: When the OPTIONAL media type
2439	            parameters are used to signal the properties of a
2440	            bitstream, and max-lsr, max-lps, max-cpb, max-dpb, max-
2441	            br, max-tr, and max-tc are not present, the values of
2442	            profile-space, tier-flag, profile-id, profile-
2443	            compatibility-indicator, interop-constraints, and level-
2444	            id must always be such that the bitstream complies fully
2445	            with the specified profile, tier, and level.

2447	      max-lsr:
2448	         The value of max-lsr is an integer indicating the maximum
2449	         processing rate in units of luma samples per second.  The
2450	         max-lsr parameter signals that the receiver is capable of
2451	         decoding video at a higher rate than is required by the
2452	         highest level.

2454	         When max-lsr is signaled, the receiver MUST be able to
2455	         decode bitstreams that conform to the highest level, with
2456	         the exception that the MaxLumaSR value in Table A-2 of
2457	         [HEVC] for the highest level is replaced with the value of
2458	         max-lsr.  Senders MAY use this knowledge to send pictures
2459	         of a given size at a higher picture rate than is indicated
2460	         in the highest level.

2462	         When not present, the value of max-lsr is inferred to be
2463	         equal to the value of MaxLumaSR given in Table A-2 of
2464	         [HEVC] for the highest level.

2466	         The value of max-lsr MUST be in the range of MaxLumaSR to
2467	         16 * MaxLumaSR, inclusive, where MaxLumaSR is given in
2468	         Table A-2 of [HEVC] for the highest level.
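
	         A minimal sketch of the inference and range rules
	         above follows.  The helper names are hypothetical, and
	         the level limit is supplied by the caller; the figure
	         used in the example is a placeholder standing in for
	         the Table A-2 entry of the highest level and should be
	         checked against [HEVC].

```python
from typing import Optional


def in_extension_range(value: int, level_limit: int) -> bool:
    """True if value lies in [level_limit, 16 * level_limit], inclusive."""
    return level_limit <= value <= 16 * level_limit


def effective_limit(signaled: Optional[int], level_limit: int) -> int:
    """Apply the 'when not present' inference and the range check."""
    if signaled is None:
        return level_limit          # inferred from the highest level
    if not in_extension_range(signaled, level_limit):
        raise ValueError("signaled value outside MaxX..16*MaxX")
    return signaled


# Placeholder for MaxLumaSR of the highest level (assumed value).
ASSUMED_MAX_LUMA_SR = 3686400
```

	         The same pattern fits max-lps, max-cpb, max-br, max-tr,
	         and max-tc, each of which is bounded by 16 times the
	         corresponding level limit.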

2470	      max-lps:
2471	         The value of max-lps is an integer indicating the maximum
2472	         picture size in units of luma samples.  The max-lps
2473	         parameter signals that the receiver is capable of decoding
2474	         larger picture sizes than are required by the highest
2475	         level.  When max-lps is signaled, the receiver MUST be able
2476	         to decode bitstreams that conform to the highest level,
2477	         with the exception that the MaxLumaPS value in Table A-1 of
2478	         [HEVC] for the highest level is replaced with the value of
2479	         max-lps.  Senders MAY use this knowledge to send larger
2480	         pictures at a proportionally lower picture rate than is
2481	         indicated in the highest level.

2483	         When not present, the value of max-lps is inferred to be
2484	         equal to the value of MaxLumaPS given in Table A-1 of
2485	         [HEVC] for the highest level.

2487	         The value of max-lps MUST be in the range of MaxLumaPS to
2488	         16 * MaxLumaPS, inclusive, where MaxLumaPS is given in
2489	         Table A-1 of [HEVC] for the highest level.

2491	      max-cpb:
2492	         The value of max-cpb is an integer indicating the maximum
2493	         coded picture buffer size in units of CpbBrVclFactor bits
2494	         for the VCL HRD parameters and in units of CpbBrNalFactor
2495	         bits for the NAL HRD parameters, where CpbBrVclFactor and
2496	         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The
2497	         max-cpb parameter signals that the receiver has more memory
2498	         than the minimum amount of coded picture buffer memory
2499	         required by the highest level.  When max-cpb is signaled,
2500	         the receiver MUST be able to decode bitstreams that conform
2501	         to the highest level, with the exception that the MaxCPB
2502	         value in Table A-1 of [HEVC] for the highest level is
2503	         replaced with the value of max-cpb.  Senders MAY use this
2504	         knowledge to construct coded bitstreams with greater
2505	         variation of bitrate than can be achieved with the MaxCPB
2506	         value in Table A-1 of [HEVC].

2508	         When not present, the value of max-cpb is inferred to be
2509	         equal to the value of MaxCPB given in Table A-1 of [HEVC]
2510	         for the highest level.

2512	         The value of max-cpb MUST be in the range of MaxCPB to
2513	         16 * MaxCPB, inclusive, where MaxCPB is given in Table
2514	         A-1 of [HEVC] for the highest level.

2516	            Informative note: The coded picture buffer is used in
2517	            the hypothetical reference decoder (Annex C of HEVC).
2518	            The use of the hypothetical reference decoder is
2519	            recommended in HEVC encoders to verify that the produced
2520	            bitstream conforms to the standard and to control the
2521	            output bitrate.  Thus, the coded picture buffer is
2522	            conceptually independent of any other potential buffers
2523	            in the receiver, including de-packetization and de-
2524	            jitter buffers.  The coded picture buffer need not be
2525	            implemented in decoders as specified in Annex C of HEVC,
2526	            but rather standard-compliant decoders can have any
2527	            buffering arrangements provided that they can decode
2528	            standard-compliant bitstreams.  Thus, in practice, the
2529	            input buffer for a video decoder can be integrated with
2530	            de-packetization and de-jitter buffers of the receiver.

2532	      max-dpb:
2533	         The value of max-dpb is an integer indicating the maximum
2534	         decoded picture buffer size in units of decoded pictures
2535	         at the MaxLumaPS for the highest level, i.e. the number of
2536	         decoded pictures at the maximum picture size defined by the
2537	         highest level.  The value of max-dpb MUST be in the range
2538	         of 1 to 16, inclusive.  The max-dpb parameter signals
2539	         that the receiver has more memory than the minimum amount
2540	         of decoded picture buffer memory required by default, which
2541	         is MaxDpbPicBuf as defined in [HEVC] (equal to 6).  When
2542	         max-dpb is signaled, the receiver MUST be able to decode
2543	         bitstreams that conform to the highest level, with the
2544	         exception that the MaxDpbPicBuf value defined in [HEVC] as
2545	         6 is replaced with the value of max-dpb.  Consequently, a
2546	         receiver that signals max-dpb MUST be capable of storing
2547	         the following number of decoded pictures (MaxDpbSize) in
2548	         its decoded picture buffer:

2550	           if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
2551	              MaxDpbSize = Min( 4 * max-dpb, 16 )
2552	           else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
2553	              MaxDpbSize = Min( 2 * max-dpb, 16 )
2554	           else if ( PicSizeInSamplesY <=
2555	                     ( ( 3 * MaxLumaPS ) >> 2 ) )
2556	              MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
2557	           else
2558	              MaxDpbSize = max-dpb

2560	         Wherein MaxLumaPS is given in Table A-1 of [HEVC] for the
2561	         highest level and PicSizeInSamplesY is the current size of
2562	         each decoded picture in units of luma samples as defined in
2563	         [HEVC].
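
	         The derivation above can be sketched in executable
	         form as follows.  The function name is hypothetical,
	         the division is assumed to truncate as integer division
	         does in [HEVC], and the MaxLumaPS constant is an
	         assumed example value rather than a figure taken from
	         this memo.

```python
def max_dpb_size(pic_size_in_samples_y: int,
                 max_luma_ps: int,
                 max_dpb: int = 6) -> int:
    """Number of decoded pictures the receiver must be able to store.

    max_dpb defaults to 6, the inferred value when max-dpb is absent.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb) // 3, 16)   # truncating division (assumption)
    return max_dpb


# Assumed MaxLumaPS of the highest level, for illustration only.
ASSUMED_MAX_LUMA_PS = 2228224
```

	         With these assumptions, a 1280x720 picture falls in
	         the second branch, so a receiver signaling max-dpb=8
	         would need room for 16 decoded pictures.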

2565	         The value of max-dpb MUST be greater than or equal to the
2566	         value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].
2567	         Senders MAY use this knowledge to construct coded
2568	         bitstreams with improved compression.

2570	         When not present, the value of max-dpb is inferred to be
2571	         equal to the value of MaxDpbPicBuf (i.e. 6) as defined in
2572	         [HEVC].

2574	            Informative note: This parameter was added primarily to
2575	            complement a similar codepoint in the ITU-T
2576	            Recommendation H.245, so as to facilitate signaling
2577	            gateway designs.  The decoded picture buffer stores
2578	            reconstructed samples.  There is no relationship between
2579	            the size of the decoded picture buffer and the buffers
2580	            used in RTP, especially de-packetization and de-jitter
2581	            buffers.

2583	      max-br:
2584	         The value of max-br is an integer indicating the maximum
2585	         video bitrate in units of CpbBrVclFactor bits per second
2586	         for the VCL HRD parameters and in units of CpbBrNalFactor
2587	         bits per second for the NAL HRD parameters, where
2588	         CpbBrVclFactor and CpbBrNalFactor are defined in Section
2589	         A.4 of [HEVC].

2591	         The max-br parameter signals that the video decoder of the
2592	         receiver is capable of decoding video at a higher bitrate
2593	         than is required by the highest level.

2595	         When max-br is signaled, the video codec of the receiver
2596	         MUST be able to decode bitstreams that conform to the
2597	         highest level, with the following exceptions in the limits
2598	         specified by the highest level:

2600	          o The value of max-br replaces the MaxBR value in Table A-
2601	            2 of [HEVC] for the highest level.
2602	          o When the max-cpb parameter is not present, the result of
2603	            the following formula replaces the value of MaxCPB in
2604	            Table A-1 of [HEVC]:

2606	               (MaxCPB of the highest level) * max-br / (MaxBR of
2607	               the highest level)

2609	         For example, if a receiver signals capability for Main
2610	         profile Level 2 with max-br equal to 2000, this indicates a
2611	         maximum video bitrate of 2000 kbits/sec for VCL HRD
2612	         parameters, a maximum video bitrate of 2200 kbits/sec for
2613	         NAL HRD parameters, and a CPB size of 2000000 bits (2000000
2614	         / 1500000 * 1500000).
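
	         The arithmetic of this example can be reproduced with
	         the sketch below.  The function and key names are
	         hypothetical.  The factors 1000 and 1100 and the Level
	         2 limits of 1500 are the values implied by the example
	         above; the use of truncating division when scaling
	         MaxCPB is an assumption.

```python
from typing import Dict, Optional

CPB_BR_VCL_FACTOR = 1000   # implied by the example above
CPB_BR_NAL_FACTOR = 1100   # implied by the example above


def limits_from_max_br(max_br: int,
                       level_max_br: int,
                       level_max_cpb: int,
                       max_cpb: Optional[int] = None) -> Dict[str, int]:
    """Derive bitrate and CPB limits, in bits, from a signaled max-br."""
    if max_cpb is None:
        # MaxCPB is scaled by max-br / MaxBR when max-cpb is absent.
        cpb_units = level_max_cpb * max_br // level_max_br
    else:
        cpb_units = max_cpb
    return {
        "vcl_bitrate_bps": max_br * CPB_BR_VCL_FACTOR,
        "nal_bitrate_bps": max_br * CPB_BR_NAL_FACTOR,
        "vcl_cpb_bits": cpb_units * CPB_BR_VCL_FACTOR,
        "nal_cpb_bits": cpb_units * CPB_BR_NAL_FACTOR,
    }


level2 = limits_from_max_br(max_br=2000, level_max_br=1500, level_max_cpb=1500)
```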

2616	         Senders MAY use this knowledge to send higher bitrate video
2617	         as allowed in the level definition of Annex A of HEVC to
2618	         achieve improved video quality.

2620	         When not present, the value of max-br is inferred to be
2621	         equal to the value of MaxBR given in Table A-2 of [HEVC]
2622	         for the highest level.

2624	         The value of max-br MUST be in the range of MaxBR to
2625	         16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of
2626	         [HEVC] for the highest level.

2628	            Informative note: This parameter was added primarily to
2629	            complement a similar codepoint in the ITU-T
2630	            Recommendation H.245, so as to facilitate signaling
2631	            gateway designs.  The assumption that the network is
2632	            capable of handling such bitrates at any given time
2633	            cannot be made from the value of this parameter.  In
2634	            particular, no conclusion can be drawn that the signaled
2635	            bitrate is possible under congestion control
2636	            constraints.

2638	      max-tr:
2639	         The value of max-tr is an integer indicating the maximum
2640	         number of tile rows.  The max-tr parameter signals that the
2641	         receiver is capable of decoding video with a larger number
2642	         of tile rows than the value allowed by the highest level.

2644	         When max-tr is signaled, the receiver MUST be able to
2645	         decode bitstreams that conform to the highest level, with
2646	         the exception that the MaxTileRows value in Table A-1 of
2647	         [HEVC] for the highest level is replaced with the value of
2648	         max-tr.

2650	         Senders MAY use this knowledge to send pictures utilizing a
2651	         larger number of tile rows than the value allowed by the
2652	         highest level.

2654	         When not present, the value of max-tr is inferred to be
2655	         equal to the value of MaxTileRows given in Table A-1 of
2656	         [HEVC] for the highest level.

2658	         The value of max-tr MUST be in the range of MaxTileRows to
2659	         16 * MaxTileRows, inclusive, where MaxTileRows is given in
2660	         Table A-1 of [HEVC] for the highest level.

2662	      max-tc:
2663	         The value of max-tc is an integer indicating the maximum
2664	         number of tile columns.  The max-tc parameter signals that
2665	         the receiver is capable of decoding video with a larger
2666	         number of tile columns than the value allowed by the
2667	         highest level.

2669	         When max-tc is signaled, the receiver MUST be able to
2670	         decode bitstreams that conform to the highest level, with
2671	         the exception that the MaxTileCols value in Table A-1 of
2672	         [HEVC] for the highest level is replaced with the value of
2673	         max-tc.

2675	         Senders MAY use this knowledge to send pictures utilizing a
2676	         larger number of tile columns than the value allowed by the
2677	         highest level.

2679	         When not present, the value of max-tc is inferred to be
2680	         equal to the value of MaxTileCols given in Table A-1 of
2681	         [HEVC] for the highest level.

2683	         The value of max-tc MUST be in the range of MaxTileCols to
2684	         16 * MaxTileCols, inclusive, where MaxTileCols is given in
2685	         Table A-1 of [HEVC] for the highest level.

2687	      max-fps:

2689	         The value of max-fps is an integer indicating the maximum
2690	         picture rate in units of pictures per 100 seconds that can
2691	         be effectively processed by the receiver.  The max-fps
2692	         parameter MAY be used to signal that the receiver has a
2693	         constraint in that it is not capable of processing video
2694	         effectively at the full picture rate that is implied by the
2695	         highest level and, when present, one or more of the
2696	         parameters max-lsr, max-lps, and max-br.

2698	         The value of max-fps is not necessarily the picture rate at
2699	         which the maximum picture size can be sent; it constitutes
2700	         a constraint on maximum picture rate for all resolutions.

2702	            Informative note: The max-fps parameter is semantically
2703	            different from max-lsr, max-lps, max-cpb, max-dpb, max-
2704	            br, max-tr, and max-tc in that max-fps is used to signal
2705	            a constraint, lowering the maximum picture rate from
2706	            what is implied by other parameters.

2708	         The encoder MUST use a picture rate equal to or less than
2709	         this value.  In cases where the max-fps parameter is absent
2710	         the encoder is free to choose any picture rate according to
2711	         the highest level and any signaled optional parameters.

2713	         The value of max-fps MUST be smaller than or equal to the
2714	         full picture rate that is implied by the highest level and,
2715	         when present, one or more of the parameters max-lsr, max-
2716	         lps, and max-br.
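
	         Because max-fps is expressed in pictures per 100
	         seconds, a value of 2997 corresponds to 29.97 pictures
	         per second.  A sketch of how an encoder might apply the
	         constraint follows; the function names are hypothetical,
	         and the limits implied by the highest level and other
	         parameters are assumed to be enforced elsewhere.

```python
from fractions import Fraction
from typing import Optional


def max_picture_rate(max_fps: int) -> Fraction:
    """Convert max-fps (pictures per 100 seconds) to pictures per second."""
    return Fraction(max_fps, 100)


def clamp_picture_rate(desired: Fraction,
                       max_fps: Optional[int] = None) -> Fraction:
    """Lower the desired picture rate to the max-fps constraint, if any."""
    if max_fps is None:
        return desired               # no max-fps constraint signaled
    return min(desired, max_picture_rate(max_fps))
```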

2718	      sprop-max-don-diff:

2720	         If tx-mode is equal to "SRST" and there is no NAL unit
2721	         naluA that is followed in transmission order by any NAL
2722	         unit preceding naluA in decoding order (i.e. the
2723	         transmission order of the NAL units is the same as the
2724	         decoding order), the value of this parameter MUST be equal
2725	         to 0.

2727	         Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the
2728	         decoding order of the NAL units of all the RTP streams is
2729	         the same as the NAL unit transmission order and the NAL
2730	         unit output order, the value of this parameter MUST be
2731	         equal to either 0 or 1.

2733	         Otherwise, if tx-mode is equal to "MRST" or "MRMT" and the
2734	         decoding order of the NAL units of all the RTP streams is
2735	         the same as the NAL unit transmission order but not the
2736	         same as the NAL unit output order, the value of this
2737	         parameter MUST be equal to 1.

2739	         Otherwise, this parameter specifies the maximum absolute
2740	         difference between the decoding order number (i.e., AbsDon)
2741	         values of any two NAL units naluA and naluB, where naluA
2742	         follows naluB in decoding order and precedes naluB in
2743	         transmission order.

2745	         The value of sprop-max-don-diff MUST be an integer in the
2746	         range of 0 to 32767, inclusive.

2748	         When not present, the value of sprop-max-don-diff is
2749	         inferred to be equal to 0.
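
	         For the general case in the last "Otherwise" clause
	         above, the value could be computed by a sender from the
	         AbsDon values listed in transmission order, as in the
	         sketch below.  The function name and the input
	         representation are hypothetical, and the special cases
	         tied to tx-mode are not modeled.

```python
from typing import List


def max_don_diff(abs_dons_in_tx_order: List[int]) -> int:
    """Largest AbsDon(naluA) - AbsDon(naluB) where naluA is transmitted
    before naluB but follows naluB in decoding order; 0 if none exists."""
    worst = 0
    highest_seen = None   # largest AbsDon among NAL units sent so far
    for don in abs_dons_in_tx_order:
        if highest_seen is not None and highest_seen > don:
            worst = max(worst, highest_seen - don)
        if highest_seen is None or don > highest_seen:
            highest_seen = don
    return worst
```

	         Tracking only the largest AbsDon seen so far suffices,
	         since for a given naluB the largest earlier AbsDon
	         yields the largest difference.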

2751	      sprop-depack-buf-nalus:

2753	         This parameter specifies the maximum number of NAL units
2754	         that precede a NAL unit in transmission order and follow
2755	         the NAL unit in decoding order.

2757	         The value of sprop-depack-buf-nalus MUST be an integer in
2758	         the range of 0 to 32767, inclusive.

2760	         When not present, the value of sprop-depack-buf-nalus is
2761	         inferred to be equal to 0.

2763	         When sprop-max-don-diff is present and greater than 0, this
2764	         parameter MUST be present and the value MUST be greater
2765	         than 0.
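
	         A sketch of one way to derive this value from AbsDon
	         values listed in transmission order follows, together
	         with the presence rule stated above.  The names are
	         hypothetical, and the quadratic scan is chosen for
	         clarity rather than speed.

```python
from typing import List, Optional


def depack_buf_nalus(abs_dons_in_tx_order: List[int]) -> int:
    """Maximum count of NAL units that precede some NAL unit in
    transmission order while following it in decoding order."""
    worst = 0
    for i, don in enumerate(abs_dons_in_tx_order):
        ahead = sum(1 for earlier in abs_dons_in_tx_order[:i] if earlier > don)
        worst = max(worst, ahead)
    return worst


def nalus_rule_ok(sprop_max_don_diff: Optional[int],
                  sprop_depack_buf_nalus: Optional[int]) -> bool:
    """Check: max-don-diff > 0 requires depack-buf-nalus present and > 0."""
    if sprop_max_don_diff is not None and sprop_max_don_diff > 0:
        return sprop_depack_buf_nalus is not None and sprop_depack_buf_nalus > 0
    return True
```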

2767	      sprop-depack-buf-bytes:

2769	         This parameter signals the required size of the de-
2770	         packetization buffer in units of bytes.  The value of the
2771	         parameter MUST be greater than or equal to the maximum
2772	         buffer occupancy (in units of bytes) of the de-
2773	         packetization buffer as specified in Section 6.

2775	         The value of sprop-depack-buf-bytes MUST be an integer in
2776	         the range of 0 to 4294967295, inclusive.

2778	         When sprop-max-don-diff is present and greater than 0, this
2779	         parameter MUST be present and the value MUST be greater
2780	         than 0.  When not present, the value of sprop-depack-buf-
2781	         bytes is inferred to be equal to 0.

2783	            Informative note: The value of sprop-depack-buf-bytes
2784	            indicates the required size of the de-packetization
2785	            buffer only.  When network jitter can occur, an
2786	            appropriately sized jitter buffer has to be available as
2787	            well.

2789	      depack-buf-cap:

2791	         This parameter signals the capabilities of a receiver
2792	         implementation and indicates the amount of de-packetization
2793	         buffer space in units of bytes that the receiver has
2794	         available for reconstructing the NAL unit decoding order
2795	         from NAL units carried in one or more RTP streams.  A
2796	         receiver is able to handle any RTP stream, and all RTP
2797	         streams the RTP stream depends on, when present, for which
2798	         the value of the sprop-depack-buf-bytes parameter is
2799	         smaller than or equal to this parameter.

2801	         When not present, the value of depack-buf-cap is inferred
2802	         to be equal to 4294967295.  The value of depack-buf-cap
2803	         MUST be an integer in the range of 1 to 4294967295,
2804	         inclusive.

2806	            Informative note: depack-buf-cap indicates the maximum
2807	            possible size of the de-packetization buffer of the
2808	            receiver only, without allowing for network jitter.
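
	         The comparison between depack-buf-cap and
	         sprop-depack-buf-bytes, including the inferred values
	         for absent parameters, might be sketched as follows.
	         The function name is hypothetical.

```python
from typing import Optional

UINT32_MAX = 4294967295


def receiver_can_handle(sprop_depack_buf_bytes: Optional[int] = None,
                        depack_buf_cap: Optional[int] = None) -> bool:
    """True if the stream's required buffer fits the receiver's capacity."""
    # Absent parameters take the inferred values given in the text.
    required = 0 if sprop_depack_buf_bytes is None else sprop_depack_buf_bytes
    available = UINT32_MAX if depack_buf_cap is None else depack_buf_cap
    if not 0 <= required <= UINT32_MAX:
        raise ValueError("sprop-depack-buf-bytes out of range")
    if not 1 <= available <= UINT32_MAX:
        raise ValueError("depack-buf-cap out of range")
    return required <= available
```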

2810	      sprop-segmentation-id:

2812	         This parameter MAY be used to signal the segmentation tools
2813	         present in the bitstream and that can be used for
2814	         parallelization.  The value of sprop-segmentation-id MUST
2815	         be an integer in the range of 0 to 3, inclusive.  When not
2816	         present, the value of sprop-segmentation-id is inferred to
2817	         be equal to 0.

2819	         When sprop-segmentation-id is equal to 0, no information
2820	         about the segmentation tools is provided.  When sprop-
2821	         segmentation-id is equal to 1, it indicates that slices are
2822	         present in the bitstream.  When sprop-segmentation-id is
2823	         equal to 2, it indicates that tiles are present in the
2824	         bitstream.  When sprop-segmentation-id is equal to 3, it
2825	         indicates that WPP is used in the bitstream.

2827	      sprop-spatial-segmentation-idc:

2829	         A base16 [RFC4648] representation of the syntax element
2830	         min_spatial_segmentation_idc as specified in [HEVC].  This
2831	         parameter MAY be used to describe parallelization
2832	         capabilities of the bitstream.

2834	      dec-parallel-cap:

2836	         This parameter MAY be used to indicate the decoder's
2837	         additional decoding capabilities given the presence of
2838	         tools enabling parallel decoding, such as slices, tiles,
2839	         and WPP, in the bitstream.  The decoding capability of the
2840	         decoder may vary with the setting of the parallel decoding
2841	         tools present in the bitstream, e.g. the size of the tiles
2842	         that are present in a bitstream.  Therefore, multiple
2843	         capability points may be provided, each indicating the
2844	         minimum required decoding capability that is associated
2845	         with a parallelism requirement, which is a requirement on
2846	         the bitstream that enables parallel decoding.

2848	         Each capability point is defined as a combination of 1) a
2849	         parallelism requirement, 2) a profile (determined by
2850	         profile-space and profile-id), 3) a highest level, and 4) a
2851	         maximum processing rate, a maximum picture size, and a
2852	         maximum video bitrate that may be equal to or greater than
2853	         that determined by the highest level.  The parameter's
2854	         syntax in ABNF [RFC5234] is as follows:

2856	            dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
2857	                               cap-point) "}"

2859	            cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
2860	                         cap-parameter)

2862	            spatial-seg-idc = 1*4DIGIT ; (1-4095)

2864	            cap-parameter = tier-flag / level-id / max-lsr
2865	                            / max-lps / max-br

2867	            tier-flag = "tier-flag" EQ ("0" / "1")

2869	            level-id  = "level-id" EQ 1*3DIGIT ; (0-255)

2871	            max-lsr   = "max-lsr" EQ 1*20DIGIT
2872	                        ; (0-18,446,744,073,709,551,615)

2874	            max-lps   = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)

2876	            max-br    = "max-br"  EQ 1*20DIGIT
2877	                        ; (0-18,446,744,073,709,551,615)

2879	            EQ = "="

2881	         The set of capability points expressed by the dec-parallel-
2882	         cap parameter is enclosed in a pair of curly braces ("{}").
2883	         Each set of two consecutive capability points is separated
2884	         by a comma (',').  Within each capability point, each set
2885	         of two consecutive parameters, and when present, their
2886	         values, is separated by a semicolon (';').

2888	         The profile of all capability points is determined by
2889	         profile-space and profile-id that are outside the dec-
2890	         parallel-cap parameter.

2892	         Each capability point starts with an indication of the
2893	         parallelism requirement, which consists of a parallel tool
2894	         type, which may be equal to 'w' or 't', and a decimal value
2895	         of the spatial-seg-idc parameter.  When the type is 'w',
2896	         the capability point is valid only for H.265 bitstreams
2897	         with WPP in use, i.e. entropy_coding_sync_enabled_flag
2898	         equal to 1.  When the type is 't', the capability point is
2899	         valid only for H.265 bitstreams with WPP not in use (i.e.
2900	         entropy_coding_sync_enabled_flag equal to 0).  The
2901	         capability-point is valid only for H.265 bitstreams with
2902	         min_spatial_segmentation_idc equal to or greater than
2903	         spatial-seg-idc.

2905	         After the parallelism requirement indication, each
2906	         capability point continues with one or more pairs of
2907	         parameter and value in any order for any of the following
2908	         parameters:

2910	            o tier-flag
2911	            o level-id
2912	            o max-lsr
2913	            o max-lps
2914	            o max-br

2916	         At most one occurrence of each of the above five parameters
2917	         is allowed within each capability point.

2919	         The values of dec-parallel-cap.tier-flag and dec-parallel-
2920	         cap.level-id for a capability point indicate the highest
2921	         level of the capability point.  The values of dec-parallel-
2922	         cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel-
2923	         cap.max-br for a capability point indicate the maximum
2924	         processing rate in units of luma samples per second, the
2925	         maximum picture size in units of luma samples, and the
2926	         maximum video bitrate (in units of CpbBrVclFactor bits per
2927	         second for the VCL HRD parameters and in units of
2928	         CpbBrNalFactor bits per second for the NAL HRD parameters,
2929	         where CpbBrVclFactor and CpbBrNalFactor are defined in
2930	         Section A.4 of [HEVC]), respectively.

2932	         When not present, the value of dec-parallel-cap.tier-flag
2933	         is inferred to be equal to the value of tier-flag outside
2934	         the dec-parallel-cap parameter.  When not present, the
2935	         value of dec-parallel-cap.level-id is inferred to be equal
2936	         to the value of max-recv-level-id outside the dec-parallel-
2937	         cap parameter.  When not present, the value of dec-
2938	         parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec-
2939	         parallel-cap.max-br is inferred to be equal to the value of
2940	         max-lsr, max-lps, or max-br, respectively, outside the dec-
2941	         parallel-cap parameter.

2943	         The general decoding capability, expressed by the set of
2944	         parameters outside of dec-parallel-cap, is defined as the
2945	         capability point that is determined by the following
2946	         combination of parameters: 1) the parallelism requirement
2947	         corresponding to the value of sprop-segmentation-id equal
2948	         to 0 for a bitstream, 2) the profile determined by profile-
2949	         space, profile-id, profile-compatibility-indicator, and
2950	         interop-constraints, 3) the tier and the highest level
2951	         determined by tier-flag and max-recv-level-id, and 4) the
2952	         maximum processing rate, the maximum picture size, and the
2953	         maximum video bitrate determined by the highest level.  The
2954	         general decoding capability MUST NOT be included as one of
2955	         the set of capability points in the dec-parallel-cap
2956	         parameter.

2958	         For example, the following parameters express the general
2959	         decoding capability of 720p30 (Level 3.1) plus an
2960	         additional decoding capability of 1080p30 (Level 4) given
2961	         that the spatially largest tile or slice used in the
2962	         bitstream is equal to or less than 1/3 of the picture size:

2964	            a=fmtp:98 level-id=93;dec-parallel-cap={t:8;
2965	                        level-id=120}

2967	         For another example, the following parameters express an
2968	         additional decoding capability of 1080p30, using dec-
2969	         parallel-cap.max-lsr and dec-parallel-cap.max-lps, given
2970	         that WPP is used in the bitstream:

2972	            a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2973	                        max-lsr=62668800;max-lps=2088960}
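
	         A minimal parser for the value syntax, exercised with
	         the two examples above, could look like the sketch
	         below.  The function name and the dictionary layout are
	         hypothetical.  Whitespace is stripped on the assumption
	         that the line break in the second example is only a
	         display fold, and numeric ranges beyond the digit
	         counts are not checked.

```python
import re
from typing import Dict, List, Union

CAP_PARAMS = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}


def parse_dec_parallel_cap(value: str) -> List[Dict[str, Union[str, int]]]:
    """Parse the braces-enclosed value of dec-parallel-cap into cap points."""
    value = "".join(value.split())            # drop folding whitespace
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in braces")
    points = []
    for raw in value[1:-1].split(","):
        head, *params = raw.split(";")
        match = re.fullmatch(r"([wt]):([0-9]{1,4})", head)
        if match is None or not params:
            raise ValueError("bad capability point: " + raw)
        point = {"type": match.group(1), "spatial-seg-idc": int(match.group(2))}
        for param in params:
            name, _, val = param.partition("=")
            if name not in CAP_PARAMS or not (val.isascii() and val.isdigit()):
                raise ValueError("bad cap-parameter: " + param)
            if name in point:
                raise ValueError("duplicate cap-parameter: " + name)
            point[name] = int(val)
        points.append(point)
    return points


first = parse_dec_parallel_cap("{t:8;level-id=120}")
second = parse_dec_parallel_cap("{w:8;\n  max-lsr=62668800;max-lps=2088960}")
```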

2975	            Informative note: When min_spatial_segmentation_idc is
2976	            present in a bitstream and WPP is not used, [HEVC]
2977	            specifies that there is no slice or no tile in the
2978	            bitstream containing more than 4 * PicSizeInSamplesY /
2979	            ( min_spatial_segmentation_idc + 4 ) luma samples.
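
	            The bound in the note above can be evaluated as
	            sketched below.  The function name is hypothetical,
	            and the result is rounded down because a sample count
	            is an integer.

```python
def max_segment_luma_samples(pic_size_in_samples_y: int,
                             min_spatial_segmentation_idc: int) -> int:
    """Upper bound on luma samples in any one slice or tile (WPP not used)."""
    return (4 * pic_size_in_samples_y) // (min_spatial_segmentation_idc + 4)
```

	            With a value of 8, as in the first example above,
	            the bound is one third of the picture size.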

2981	      include-dph:

2983	         This parameter is used to indicate the capability and
2984	         preference to utilize or include decoded picture hash (DPH)
2985	         SEI messages (see Section D.3.19 of [HEVC]) in the
2986	         bitstream.  DPH SEI messages can be used to detect picture
2987	         corruption so the receiver can request picture repair; see
2988	         Section 8.  The value is a comma-separated list of hash
2989	         types that are supported or requested to be used, each hash
2990	         type provided as an unsigned integer value (0-255), with
2991	         the hash types listed from most preferred to the least
2992	         preferred.  Example: "include-dph=0,2", which indicates the
2993	         capability for MD5 (most preferred) and Checksum (less
2994	         preferred).  If the parameter is not included or the value
2995	         contains no hash types, then no capability to utilize DPH
2996	         SEI messages is assumed.  Note that DPH SEI messages MAY
2997	         still be included in the bitstream even when there is no
2998	         declaration of capability to use them, as in general SEI
2999	         messages do not affect the normative decoding process and
3000	         decoders are allowed to ignore SEI messages.
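
Informative example: a receiver of the include-dph value might parse it
along the following lines.  This is a minimal sketch, not a normative
parser.  The function name and the name table are hypothetical; hash
types 0 (MD5) and 2 (Checksum) are named in the text above, while the
name for hash type 1 (CRC) is taken from [HEVC] and is an assumption of
this sketch.

```python
# Names for hash types; 1 (CRC) is an assumption taken from [HEVC].
HASH_TYPE_NAMES = {0: "MD5", 1: "CRC", 2: "Checksum"}


def parse_include_dph(value: str) -> list[int]:
    """Return hash types in preference order; an empty list means that
    no capability to utilize DPH SEI messages is declared."""
    hash_types = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate an empty value such as "include-dph="
        hash_type = int(token)
        if not 0 <= hash_type <= 255:
            raise ValueError(f"hash type out of range: {hash_type}")
        hash_types.append(hash_type)
    return hash_types


preferred = parse_include_dph("0,2")
print([HASH_TYPE_NAMES.get(h, "unknown") for h in preferred])
# ['MD5', 'Checksum']
```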

3002	      Encoding considerations:

3004	         This type is only defined for transfer via RTP (RFC 3550).

3006	      Security considerations:

3008	         See Section 9 of RFC XXXX.

3010	      Public specification:

3012	         Please refer to Section 13 of RFC XXXX.

3014	      Additional information: None

3016	      File extensions: none

3018	      Macintosh file type code: none

3020	      Object identifier or OID: none

3022	      Person & email address to contact for further information:

3024	         Ye-Kui Wang (yekuiw@qti.qualcomm.com).

3026	      Intended usage: COMMON

3028	      Author: See Section 14 of RFC XXXX.

3030	      Change controller:

3032	         IETF Audio/Video Transport Payloads working group delegated
3033	         from the IESG.

3035	7.2 SDP Parameters

3037	   The receiver MUST ignore any parameter unspecified in this memo.

3039	7.2.1 Mapping of Payload Type Parameters to SDP

3041	   The media type video/H265 string is mapped to fields in the
3042	   Session Description Protocol (SDP) [RFC4566] as follows:

3044	   o  The media name in the "m=" line of SDP MUST be video.

3046	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265
3047	      (the media subtype).

3049	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

3051	   o  The OPTIONAL parameters "profile-space", "profile-id", "tier-
3052	      flag", "level-id", "interop-constraints", "profile-
3053	      compatibility-indicator", "sprop-sub-layer-id", "recv-sub-
3054	      layer-id", "max-recv-level-id", "tx-mode", "max-lsr", "max-
3055	      lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
3056	      "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus",
3057	      "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-
3058	      segmentation-id", "sprop-spatial-segmentation-idc", "dec-
3059	      parallel-cap", and "include-dph", when present, MUST be
3060	      included in the "a=fmtp" line of SDP.  These parameters are
3061	      expressed as a media type string, in the form of a semicolon
3062	      separated list of parameter=value pairs.

3064	   o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
3065	      pps", when present, MUST be included in the "a=fmtp" line of
3066	      SDP or conveyed using the "fmtp" source attribute as specified
3067	      in Section 6.3 of [RFC5576].  For a particular media format
3068	      (i.e. RTP payload type), "sprop-vps", "sprop-sps", or "sprop-
3069	      pps" MUST NOT be both included in the "a=fmtp" line of SDP and
3070	      conveyed using the "fmtp" source attribute.  When included in
3071	      the "a=fmtp" line of SDP, these parameters are expressed as a
3072	      media type string, in the form of a semicolon separated list
3073	      of parameter=value pairs.  When conveyed in the "a=fmtp" line
3074	      of SDP for a particular payload type, the parameters "sprop-
3075	      vps", "sprop-sps", and "sprop-pps" MUST be applied to each
3076	      SSRC with the payload type.  When conveyed using the "fmtp"
3077	      source attribute, these parameters are only associated with
3078	      the given source and payload type as parts of the "fmtp"
3079	      source attribute.

3081	          Informative note: Conveyance of "sprop-vps", "sprop-sps",
3082	          and "sprop-pps" using the "fmtp" source attribute allows
3083	          for out-of-band transport of parameter sets in topologies
3084	          like Topo-Video-switch-MCU as specified in [RFC5117].

3086	   An example of media representation in SDP is as follows:

3088	         m=video 49170 RTP/AVP 98
3089	         a=rtpmap:98 H265/90000
3090	         a=fmtp:98 profile-id=1;
3091	                   sprop-vps=<video parameter sets data>
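
Informative example: splitting an "a=fmtp" value into parameter=value
pairs needs some care, because a dec-parallel-cap value contains
semicolons inside its braces.  The following is a minimal sketch of a
brace-aware splitter.  The function name is hypothetical, and the input
line is the dec-parallel-cap example from Section 7.1 joined onto one
line.

```python
def parse_fmtp_params(params: str) -> dict[str, str]:
    """Split 'name=value;name=value' on semicolons that are outside braces."""
    result = {}
    depth = 0
    current = []
    for ch in params + ";":  # trailing ';' flushes the last pair
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            item = "".join(current).strip()
            current = []
            if item:
                name, _, value = item.partition("=")
                result[name.strip()] = value.strip()
        else:
            current.append(ch)
    return result


line = ("a=fmtp:98 level-id=93;"
        "dec-parallel-cap={w:8;max-lsr=62668800;max-lps=2088960}")
payload_type, _, params = line[len("a=fmtp:"):].partition(" ")
print(payload_type, parse_fmtp_params(params))
```

Per Section 7.2, a receiver would then ignore any parameter name in the
resulting dictionary that is not specified in this memo.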

3093	7.2.2 Usage with SDP Offer/Answer Model

3095	   When HEVC is offered over RTP using SDP in an Offer/Answer model
3096	   [RFC3264] for negotiation for unicast usage, the following
3097	   limitations and rules apply:

3099	   o  The parameters identifying a media format configuration for
3100	      HEVC are profile-space, profile-id, tier-flag, level-id,
3101	      interop-constraints, profile-compatibility-indicator, and tx-
3102	      mode.  These media configuration parameters, except level-id,
3103	      MUST be used symmetrically when the answerer does not include
3104	      recv-sub-layer-id in the answer for the media format (payload
3105	      type) or the included recv-sub-layer-id is equal to sprop-sub-
3106	      layer-id in the offer.  The answerer MUST

3108	        1) maintain all configuration parameters with the values
3109	           remaining the same as in the offer for the media format
3110	           (payload type), with the exception that the value of
3111	           level-id is changeable as long as the highest level
3112	           indicated by the answer is not higher than that indicated
3113	           by the offer;

3115	        2) include in the answer the recv-sub-layer-id parameter,
3116	           with a value less than the sprop-sub-layer-id parameter
3117	           in the offer, for the media format (payload type), and
3118	           maintain all configuration parameters with the values
3119	           being the same as signalled in the sprop-vps for the
3120	           chosen sub-layer representation, with the exception that
3121	           the value of level-id is changeable as long as the
3122	           highest level indicated by the answer is not higher than
3123	           the level indicated by the sprop-vps in offer for the
3124	           chosen sub-layer representation; or

3126	        3) remove the media format (payload type) completely (when
3127	           one or more of the parameter values are not supported).

3129	          Informative note: The above requirement for symmetric use
3130	          does not apply for level-id, and does not apply for the
3131	          other bitstream or RTP stream properties and capability
3132	          parameters.

3134	   o  The profile-compatibility-indicator, when offered as sendonly,
3135	      describes bitstream properties.  The answerer MAY accept an RTP
3136	      payload type even if the decoder is not capable of handling
3137	      the profile indicated by the profile-space, profile-id, and
3138	      interop-constraints parameters, but capable of any of the
3139	      profiles indicated by the profile-space, profile-
3140	      compatibility-indicator, and interop-constraints.  However,
3141	      when the profile-compatibility-indicator is used in a recvonly
3142	      or sendrecv media description, the bitstream using this RTP
3143	      payload type is required to conform to all profiles indicated
3144	      by profile-space, profile-compatibility-indicator, and
3145	      interop-constraints.

3147	   o  To simplify handling and matching of these configurations, the
3148	      same RTP payload type number used in the offer SHOULD also be
3149	      used in the answer, as specified in [RFC3264].

3151	   o  The same RTP payload type number used in the offer for the
3152	      media subtype H265 MUST be used in the answer when the answer
3153	      includes recv-sub-layer-id.  When the answer does not include
3154	      recv-sub-layer-id, the answer MUST NOT contain a payload type
3155	      number used in the offer for the media subtype H265 unless the
3156	      configuration is exactly the same as in the offer or the
3157	      configuration in the answer only differs from that in the
3158	      offer with a different value of level-id.  The answer MAY
3159	      contain the recv-sub-layer-id parameter if an HEVC bitstream
3160	      contains multiple operation points (using temporal scalability
3161	      and sub-layers) and sprop-vps is included in the offer where
3162	      information on sub-layers is present in the first video
3163	      parameter set contained in sprop-vps.  If the sprop-vps is
3164	      provided in an offer, an answerer MAY select a particular
3165	      operation point indicated in the first video parameter set
3166	      contained in sprop-vps.  When the answer includes recv-sub-
3167	      layer-id that is less than sprop-sub-layer-id in the offer,
3168	      all video parameter sets contained in the sprop-vps parameter
3169	      in the SDP answer and all video parameter sets sent in-band
3170	      for either the offerer-to-answerer direction or the answerer-
3171	      to-offerer direction MUST be consistent with the first video
3172	      parameter set in the sprop-vps parameter of the offer (see the
3173	      semantics of sprop-vps in Section 7.1 of this document on one
3174	      video parameter set being consistent with another video
3175	      parameter set), and the bitstream sent in either direction
3176	      MUST conform to the profile, tier, level, and constraints of
3177	      the chosen sub-layer representation as indicated by the first
3178	      profile_tier_level( ) syntax structure in the first video
3179	      parameter set in the sprop-vps parameter of the offer.

3181	          Informative note: When an offerer receives an answer that
3182	          does not include recv-sub-layer-id, it has to compare
3183	          payload types not declared in the offer based on the media
3184	          type (i.e. video/H265) and the above media configuration
3185	          parameters with any payload types it has already declared.
3186	          This will enable it to determine whether the configuration
3187	          in question is new or if it is equivalent to configuration
3188	          already offered, since a different payload type number may
3189	          be used in the answer.  The ability to perform operation
3190	          point selection enables a receiver to utilize the temporal
3191	          scalable nature of an HEVC bitstream.

3193	   o  The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
3194	      sprop-depack-buf-bytes describe the properties of an RTP
3195	      stream, and all RTP streams the RTP stream depends on, when
3196	      present, that the offerer or the answerer is sending for the
3197	      media format configuration.  This differs from the normal
3198	      usage of the Offer/Answer parameters: normally such parameters
3199	      declare the properties of the bitstream or RTP stream that the
3200	      offerer or the answerer is able to receive.  When dealing with
3201	      HEVC, the offerer assumes that the answerer will be able to
3202	      receive media encoded using the configuration being offered.

3204	          Informative note:  The above parameters apply for any RTP
3205	          stream and all RTP streams the RTP stream depends on, when
3206	          present, sent by a declaring entity with the same
3207	          configuration.  In other words, the applicability of the
3208	          above parameters to RTP streams depends on the source
3209	          endpoint.  Rather than being bound to the payload type,
3210	          the values may have to be applied to another payload type
3211	          when being sent, as they apply for the configuration.

3213	   o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb,
3214	      max-br, max-tr, and max-tc MAY be used to declare further
3215	      capabilities of the offerer or answerer for receiving.  These
3216	      parameters MUST NOT be present when the direction attribute is
3217	      "sendonly".

3219	   o  The capability parameter max-fps MAY be used to declare lower
3220	      capabilities of the offerer or answerer for receiving.  The
3221	      parameter MUST NOT be present when the direction attribute is
3222	      "sendonly".

3224	   o  The capability parameter dec-parallel-cap MAY be used to
3225	      declare additional decoding capabilities of the offerer or
3226	      answerer for receiving.  Upon receiving such a declaration of
3227	      a receiver, a sender MAY send a bitstream to the receiver
3228	      utilizing those capabilities under the assumption that the
3229	      bitstream fulfills the parallelism requirement.  A bitstream
3230	      that is sent based on choosing a capability point with
3231	      parallel tool type 'w' from dec-parallel-cap MUST have
3232	      entropy_coding_sync_enabled_flag equal to 1 and
3233	      min_spatial_segmentation_idc equal to or larger than dec-
3234	      parallel-cap.spatial-seg-idc of the capability point.  A
3235	      bitstream that is sent based on choosing a capability point
3236	      with parallel tool type 't' from dec-parallel-cap MUST have
3237	      entropy_coding_sync_enabled_flag equal to 0 and
3238	      min_spatial_segmentation_idc equal to or larger than dec-
3239	      parallel-cap.spatial-seg-idc of the capability point.

3241	   o  An offerer has to include the size of the de-packetization
3242	      buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff
3243	      and sprop-depack-buf-nalus, in the offer for an interleaved
3244	      HEVC bitstream or for the MRST or MRMT transmission mode when
3245	      sprop-max-don-diff is greater than 0 for at least one of the
3246	      RTP streams.  To enable the offerer and answerer to inform
3247	      each other about their capabilities for de-packetization
3248	      buffering in receiving RTP streams, both parties are
3249	      RECOMMENDED to include depack-buf-cap.  For interleaved RTP
3250	      streams or in MRST or MRMT, it is also RECOMMENDED to consider
3251	      offering multiple payload types with different buffering
3252	      requirements when the capabilities of the receiver are
3253	      unknown.

3255	   o  The capability parameter include-dph MAY be used to declare
3256	      the capability to utilize decoded picture hash SEI messages
3257	      and which types of hashes in any HEVC RTP streams received by
3258	      the offerer or answerer.

3260	   o  The sprop-vps, sprop-sps, or sprop-pps, when present (included
3261	      in the "a=fmtp" line of SDP or conveyed using the "fmtp"
3262	      source attribute as specified in Section 6.3 of [RFC5576]),
3263	      are used for out-of-band transport of the parameter sets (VPS,
3264	      SPS, or PPS respectively).

3266	   o  The answerer MAY use either out-of-band or in-band transport
3267	      of parameter sets for the bitstream it is sending, regardless
3268	      of whether out-of-band parameter sets transport has been used
3269	      in the offerer-to-answerer direction.  Parameter sets included
3270	      in an answer are independent of those parameter sets included
3271	      in the offer, as they are used for decoding two different
3272	      bitstreams, one from the answerer to the offerer and the other
3273	      in the opposite direction.  In case some RTP stream(s) are
3274	      sent before SDP offer/answer settles down, in-band parameter
3275	      sets MUST be used for those RTP stream parts sent before the
3276	      SDP offer/answer.

3278	   o  The following rules apply to transport of parameter sets in the
3279	      offerer-to-answerer direction.

3281	       o An offer MAY include sprop-vps, sprop-sps, and/or sprop-
3282	          pps.  If none of these parameters is present in the offer,
3283	          then only in-band transport of parameter sets is used.

3285	       o If the level to use in the offerer-to-answerer direction
3286	          is equal to the default level in the offer, the answerer
3287	          MUST be prepared to use the parameter sets included in
3288	          sprop-vps, sprop-sps, and sprop-pps (either included in
3289	          the "a=fmtp" line of SDP or conveyed using the "fmtp"
3290	          source attribute) for decoding the incoming bitstream,
3291	          e.g. by passing these parameter set NAL units to the video
3292	          decoder before passing any NAL units carried in the RTP
3293	          streams.  Otherwise, the answerer MUST ignore sprop-vps,
3294	          sprop-sps, and sprop-pps (either included in the "a=fmtp"
3295	          line of SDP or conveyed using the "fmtp" source attribute)
3296	          and the offerer MUST transmit parameter sets in-band.

3298	       o In MRST or MRMT, the answerer MUST be prepared to use the
3299	          parameter sets out-of-band transmitted for the RTP stream
3300	          and all RTP streams the RTP stream depends on, when
3301	          present, for decoding the incoming bitstream, e.g. by
3302	          passing these parameter set NAL units to the video decoder
3303	          before passing any NAL units carried in the RTP streams.

3305	   o  The following rules apply to transport of parameter sets in the
3306	      answerer-to-offerer direction.

3308	       o An answer MAY include sprop-vps, sprop-sps, and/or sprop-
3309	          pps.  If none of these parameters is present in the
3310	          answer, then only in-band transport of parameter sets is
3311	          used.

3313	       o The offerer MUST be prepared to use the parameter sets
3314	          included in sprop-vps, sprop-sps, and sprop-pps (either
3315	          included in the "a=fmtp" line of SDP or conveyed using the
3316	          "fmtp" source attribute) for decoding the incoming
3317	          bitstream, e.g. by passing these parameter set NAL units
3318	          to the video decoder before passing any NAL units carried
3319	          in the RTP streams.

3321	       o In MRST or MRMT, the offerer MUST be prepared to use the
3322	          parameter sets out-of-band transmitted for the RTP stream
3323	          and all RTP streams the RTP stream depends on, when
3324	          present, for decoding the incoming bitstream, e.g. by
3325	          passing these parameter set NAL units to the video decoder
3326	          before passing any NAL units carried in the RTP streams.

3328	   o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
3329	      the "fmtp" source attribute as specified in Section 6.3 of
3330	      [RFC5576], the receiver of the parameters MUST store the
3331	      parameter sets included in sprop-vps, sprop-sps, and/or sprop-
3332	      pps and associate them with the source given as part of the
3333	      "fmtp" source attribute.  Parameter sets associated with one
3334	      source (given as part of the "fmtp" source attribute) MUST
3335	      only be used to decode NAL units conveyed in RTP packets from
3336	      the same source (given as part of the "fmtp" source
3337	      attribute).  When this mechanism is in use, SSRC collision
3338	      detection and resolution MUST be performed as specified in
3339	      [RFC5576].
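
Informative example: two of the unicast rules above lend themselves to a
short sketch.  The first function checks option 1) of the symmetric-use
rule (same configuration, with level-id not higher than offered); the
second checks a bitstream against one dec-parallel-cap capability point.
Both function names are hypothetical.  The sketch assumes that parameters
are held as dictionaries of explicit string values, does not apply
defaults for absent parameters, does not cover option 2) (an answer with
recv-sub-layer-id), and compares only level-id when checking the level.

```python
# Configuration parameters that must match between offer and answer;
# level-id is handled separately because the answer may lower it.
SYMMETRIC_PARAMS = (
    "profile-space", "profile-id", "tier-flag", "interop-constraints",
    "profile-compatibility-indicator", "tx-mode",
)


def answer_keeps_configuration(offer: dict, answer: dict) -> bool:
    """Option 1): same configuration, level-id not higher than offered.

    Assumes both dicts carry an explicit level-id value.
    """
    if any(offer.get(p) != answer.get(p) for p in SYMMETRIC_PARAMS):
        return False
    return int(answer["level-id"]) <= int(offer["level-id"])


def bitstream_fits_capability_point(tool: str, spatial_seg_idc: int,
                                    entropy_coding_sync_enabled_flag: int,
                                    min_spatial_segmentation_idc: int) -> bool:
    """Check a bitstream against one dec-parallel-cap capability point."""
    required_flag = {"w": 1, "t": 0}[tool]  # KeyError for other tool types
    return (entropy_coding_sync_enabled_flag == required_flag
            and min_spatial_segmentation_idc >= spatial_seg_idc)


offer = {"profile-id": "1", "tier-flag": "0", "level-id": "120"}
answer = {**offer, "level-id": "93"}
print(answer_keeps_configuration(offer, answer))      # True
print(bitstream_fits_capability_point("w", 8, 1, 8))  # True
```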

3341	   For bitstreams being delivered over multicast, the following
3342	   rules apply:

3344	   o  The media format configuration is identified by profile-space,
3345	      profile-id, tier-flag, level-id, interop-constraints, profile-
3346	      compatibility-indicator, and tx-mode.  These media format
3347	      configuration parameters, including level-id, MUST be used
3348	      symmetrically; that is, the answerer MUST either maintain all
3349	      configuration parameters or remove the media format (payload
3350	      type) completely.  Note that this implies that the level-id
3351	      for Offer/Answer in multicast is not changeable.

3353	   o  To simplify the handling and matching of these configurations,
3354	      the same RTP payload type number used in the offer SHOULD also
3355	      be used in the answer, as specified in [RFC3264].  An answer
3356	      MUST NOT contain a payload type number used in the offer
3357	      unless the configuration is the same as in the offer.

3359	   o  Parameter sets received MUST be associated with the
3360	      originating source and MUST only be used in decoding the
3361	      incoming bitstream from the same source.

3363	   o  The rules for other parameters are the same as above for
3364	      unicast as long as the three above rules are obeyed.

3366	   Table 1 lists the interpretation of all the parameters that MUST
3367	   be used for the various combinations of offer, answer, and
3368	   direction attributes.  Note that the two columns wherein the
3369	   recv-sub-layer-id parameter is used only apply to answers,
3370	   whereas the other columns apply to both offers and answers.

3372	   Table 1.  Interpretation of parameters for various combinations
3373	   of offers, answers, direction attributes, with and without recv-
3374	   sub-layer-id.  Columns that do not indicate offer or answer apply
3375	   to both.

3377	                                          sendonly --+
3378	            answer: recvonly, recv-sub-layer-id --+  |
3379	              recvonly w/o recv-sub-layer-id --+  |  |
3380	      answer: sendrecv, recv-sub-layer-id --+  |  |  |
3381	        sendrecv w/o recv-sub-layer-id --+  |  |  |  |
3382	                                         |  |  |  |  |
3383	      profile-space                      C  D  C  D  P
3384	      profile-id                         C  D  C  D  P
3385	      tier-flag                          C  D  C  D  P
3386	      level-id                           D  D  D  D  P
3387	      interop-constraints                C  D  C  D  P
3388	      profile-compatibility-indicator    C  D  C  D  P
3389	      tx-mode                            C  C  C  C  P
3390	      max-recv-level-id                  R  R  R  R  -
3391	      sprop-max-don-diff                 P  P  -  -  P
3392	      sprop-depack-buf-nalus             P  P  -  -  P
3393	      sprop-depack-buf-bytes             P  P  -  -  P
3394	      depack-buf-cap                     R  R  R  R  -
3395	      sprop-segmentation-id              P  P  P  P  P
3396	      sprop-spatial-segmentation-idc     P  P  P  P  P
3397	      max-br                             R  R  R  R  -
3398	      max-cpb                            R  R  R  R  -
3399	      max-dpb                            R  R  R  R  -
3400	      max-lsr                            R  R  R  R  -
3401	      max-lps                            R  R  R  R  -
3402	      max-tr                             R  R  R  R  -
3403	      max-tc                             R  R  R  R  -
3404	      max-fps                            R  R  R  R  -
3405	      sprop-vps                          P  P  -  -  P
3406	      sprop-sps                          P  P  -  -  P
3407	      sprop-pps                          P  P  -  -  P
3408	      sprop-sub-layer-id                 P  P  -  -  P
3409	      recv-sub-layer-id                  X  O  X  O  -
3410	      dec-parallel-cap                   R  R  R  R  -
3411	      include-dph                        R  R  R  R  -

3413	     Legend:

3415	      C: configuration for sending and receiving bitstreams
3416	      D: changeable configuration, same as C except possible
3417	         to answer with a different but consistent value (see the
3418	         semantics of the six parameters related to profile, tier,
3419	         and level on these parameters being consistent)
3420	      P: properties of the bitstream to be sent
3421	      R: receiver capabilities
3422	      O: operation point selection
3423	      X: MUST NOT be present
3424	      -: not usable, when present MUST be ignored
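
Informative example: Table 1 can be held as a lookup structure so that an
implementation can find, for a given direction attribute, how each
parameter is to be interpreted.  The sketch below transcribes the table
rows as data; the names TABLE_1, COLUMNS, interpret, and ignored_params
are hypothetical and not defined by this memo.

```python
# Column keys: (direction, answer includes recv-sub-layer-id).
COLUMNS = (
    ("sendrecv", False), ("sendrecv", True),
    ("recvonly", False), ("recvonly", True),
    ("sendonly", False),
)

TABLE_1 = """
profile-space                      C  D  C  D  P
profile-id                         C  D  C  D  P
tier-flag                          C  D  C  D  P
level-id                           D  D  D  D  P
interop-constraints                C  D  C  D  P
profile-compatibility-indicator    C  D  C  D  P
tx-mode                            C  C  C  C  P
max-recv-level-id                  R  R  R  R  -
sprop-max-don-diff                 P  P  -  -  P
sprop-depack-buf-nalus             P  P  -  -  P
sprop-depack-buf-bytes             P  P  -  -  P
depack-buf-cap                     R  R  R  R  -
sprop-segmentation-id              P  P  P  P  P
sprop-spatial-segmentation-idc     P  P  P  P  P
max-br                             R  R  R  R  -
max-cpb                            R  R  R  R  -
max-dpb                            R  R  R  R  -
max-lsr                            R  R  R  R  -
max-lps                            R  R  R  R  -
max-tr                             R  R  R  R  -
max-tc                             R  R  R  R  -
max-fps                            R  R  R  R  -
sprop-vps                          P  P  -  -  P
sprop-sps                          P  P  -  -  P
sprop-pps                          P  P  -  -  P
sprop-sub-layer-id                 P  P  -  -  P
recv-sub-layer-id                  X  O  X  O  -
dec-parallel-cap                   R  R  R  R  -
include-dph                        R  R  R  R  -
"""

# parameter name -> {column key -> legend letter}
INTERPRETATION = {
    fields[0]: dict(zip(COLUMNS, fields[1:]))
    for fields in (line.split() for line in TABLE_1.strip().splitlines())
}


def interpret(param: str, direction: str,
              answer_has_recv_sub_layer_id: bool = False) -> str:
    """Return the legend letter for one parameter and column."""
    return INTERPRETATION[param][(direction, answer_has_recv_sub_layer_id)]


def ignored_params(direction: str,
                   answer_has_recv_sub_layer_id: bool = False) -> list[str]:
    """Parameters marked '-' (MUST be ignored when present)."""
    key = (direction, answer_has_recv_sub_layer_id)
    return [p for p, row in INTERPRETATION.items() if row[key] == "-"]


print(interpret("level-id", "sendrecv"))  # D
print(ignored_params("recvonly"))
```

Because Table 1 has no sendonly column with recv-sub-layer-id, a lookup
for that combination raises KeyError in this sketch.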

3426	   Parameters used for declaring receiver capabilities are in
3427	   general downgradable; i.e. they express the upper limit for a
3428	   sender's possible behavior.  Thus, a sender MAY select to set its
3429	   encoder using only lower/lesser or equal values of these
3430	   parameters.

3432	   When the answer does not include recv-sub-layer-id that is less
3433	   than the sprop-sub-layer-id in the offer, parameters declaring a
3434	   configuration point are not changeable, with the exception of the
3435	   level-id parameter for unicast usage, and these parameters
3436	   express values a receiver expects to be used and MUST be used
3437	   verbatim in the answer as in the offer.

3439	   When a sender's capabilities are declared with the configuration
3440	   parameters, these parameters express a configuration that is
3441	   acceptable for the sender to receive bitstreams.  In order to
3442	   achieve high interoperability levels, it is often advisable to
3443	   offer multiple alternative configurations.  It is impossible to
3444	   offer multiple configurations in a single payload type.  Thus,
3445	   when multiple configuration offers are made, each offer requires
3446	   its own RTP payload type associated with the offer.  However, it
3447	   is possible to offer multiple operation points using one
3448	   configuration in a single payload type by including sprop-vps in
3449	   the offer and recv-sub-layer-id in the answer.

3451	   A receiver SHOULD understand all media type parameters, even if
3452	   it only supports a subset of the payload format's functionality.
3453	   This ensures that a receiver is capable of understanding when an
3454	   offer to receive media can be downgraded to what is supported by
3455	   the receiver of the offer.

3457	   An answerer MAY extend the offer with additional media format
3458	   configurations.  However, to enable their usage, in most cases a
3459	   second offer is required from the offerer to provide the
3460	   bitstream property parameters that the media sender will use.
3461	   This also has the effect that the offerer has to be able to
3462	   receive this media format configuration, not only to send it.

3464	7.2.3 Usage in Declarative Session Descriptions

3466	   When HEVC over RTP is offered with SDP in a declarative style, as
3467	   in Real Time Streaming Protocol (RTSP) [RFC2326] or Session
3468	   Announcement Protocol (SAP) [RFC2974], the following
3469	   considerations are necessary.

3471	   o  All parameters capable of indicating both bitstream properties
3472	      and receiver capabilities are used to indicate only bitstream
3473	      properties.  For example, in this case, the parameters profile-
3474	      id, tier-flag, and level-id declare the values used by the
3475	      bitstream, not the capabilities for receiving bitstreams.  As a
3476	      result, the following interpretation of the parameters MUST be
3477	      used:

3479	      o Declaring actual configuration or bitstream properties:
3480	         - profile-space
3481	         - profile-id
3482	         - tier-flag
3483	         - level-id
3484	         - interop-constraints
3485	         - profile-compatibility-indicator
3486	         - tx-mode
3487	         - sprop-vps
3488	         - sprop-sps
3489	         - sprop-pps
3490	         - sprop-max-don-diff
3491	         - sprop-depack-buf-nalus
3492	         - sprop-depack-buf-bytes
3493	         - sprop-segmentation-id
3494	         - sprop-spatial-segmentation-idc

3496	      o Not usable (when present, they MUST be ignored):
3497	         - max-lps
3498	         - max-lsr
3499	         - max-cpb
3500	         - max-dpb
3501	         - max-br
3502	         - max-tr
3503	         - max-tc
3504	         - max-fps
3505	         - max-recv-level-id
3506	         - depack-buf-cap
3507	         - sprop-sub-layer-id
3508	         - dec-parallel-cap
3509	         - include-dph

3511	   o  A receiver of the SDP is required to support all parameters
3512	      and values of the parameters provided; otherwise, the receiver
3513	      MUST reject (RTSP) or not participate in (SAP) the session.
3514	      It falls on the creator of the session to use values that are
3515	      expected to be supported by the receiving application.

3517	7.2.4 Parameter Sets Considerations

3519	   When out-of-band transport of parameter sets is used, parameter
3520	   sets MAY still be additionally transported in-band unless
3521	   explicitly disallowed by an application, and some of these
3522	   additionally in-band transported parameter sets may update some
3523	   of the out-of-band transported parameter sets.  Update of a
3524	   parameter set refers to sending of a parameter set of the same
3525	   type using the same parameter set ID but with different values
3526	   for at least one other parameter of the parameter set.
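
Informative example: the notion of an update can be illustrated with a
small store keyed by parameter set type and ID.  This is a sketch under
stated assumptions: the class and method names are hypothetical, the
payloads are placeholder byte strings rather than real NAL units, and a
byte-wise difference is used as a stand-in for "different values for at
least one other parameter".

```python
class ParameterSetStore:
    """Keeps the most recently received parameter set per (type, ID)."""

    def __init__(self) -> None:
        self._sets: dict[tuple[str, int], bytes] = {}

    def put(self, ps_type: str, ps_id: int, payload: bytes) -> bool:
        """Store a parameter set; return True if it updates a stored one."""
        key = (ps_type, ps_id)
        is_update = key in self._sets and self._sets[key] != payload
        self._sets[key] = payload
        return is_update

    def get(self, ps_type: str, ps_id: int):
        """Return the stored payload, or None if nothing is stored."""
        return self._sets.get((ps_type, ps_id))


store = ParameterSetStore()
store.put("SPS", 0, b"out-of-band SPS")              # e.g. from sprop-sps
print(store.put("SPS", 0, b"in-band SPS, changed"))  # True: same type and ID
print(store.put("PPS", 0, b"in-band PPS"))           # False: first PPS with ID 0
```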

3528	7.2.5 Dependency Signaling in Multi-Stream Mode

3530	   If MRST or MRMT is used, the rules on signaling media decoding
3531	   dependency in SDP as defined in [RFC5583] apply.  The rules on
3532	   "hierarchical or layered encoding" with multicast in Section 5.7
3533	   of [RFC4566] do not apply.  This means that the notation for
3534	   Connection Data "c=" SHALL NOT be used with more than one
3535	   address, i.e. the sub-field <number of addresses> in the sub-
3536	   field <connection-address> of the "c=" field, described in
3537	   [RFC4566], must not be present.  The order of session dependency
3538	   is given from the RTP stream containing the lowest temporal sub-
3539	   layer to the RTP stream containing the highest temporal sub-
3540	   layer.

3542	8 Use with Feedback Messages

3544	   The following subsections define the use of the Picture Loss
3545	   Indication (PLI), Slice Loss Indication (SLI), Reference Picture
3546	   Selection Indication (RPSI), and Full Intra Request (FIR)
3547	   feedback messages with HEVC.  The PLI, SLI, and RPSI messages are
3548	   defined in RFC 4585 [RFC4585], and the FIR message is defined in
3549	   RFC 5104 [RFC5104].

3551	8.1 Picture Loss Indication (PLI)

3553	   As specified in RFC 4585 Section 6.3.1, the reception of a
3554	   picture loss indication by a media sender indicates "the loss of
3555	   an undefined amount of coded video data belonging to one or more
3556	   pictures."  Without having any specific knowledge of the setup of
3557	   the bitstream (such as: use and location of in-band parameter
3558	   sets, non-IDR decoder refresh points, picture structures, and so
3559	   forth), a reaction to the reception of a PLI by an HEVC sender
3560	   SHOULD be to send an IDR picture and relevant parameter sets,
3561	   potentially with sufficient redundancy so as to ensure correct
3562	   reception.  However, sometimes information about the bitstream
3563	   structure is known.  For example, state could have been
3564	   established outside of the mechanisms defined in this document
3565	   that parameter sets are conveyed out of band only, and stay
3566	   static for the duration of the session.  In that case, it is
3567	   obviously unnecessary to send them in-band as a result of the
3568	   reception of a PLI.  Other examples could be devised based on a
3569	   priori knowledge of different aspects of the bitstream structure.
3570	   In all cases, the timing and congestion control mechanisms of RFC
3571	   4585 MUST be observed.

3573	8.2 Slice Loss Indication (SLI)

3575	   RFC 4585's Slice Loss Indication can be used to indicate, to a
3576	   sender, the loss of a number of Coding Tree Blocks (CTBs) in CTB
3577	   raster scan order of a picture.  In the SLI's Feedback Control
3578	   Indication (FCI) field, the subfield "First" MUST be set to the
3579	   CTB address of the first lost CTB.  Note that the CTB address is
3580	   in CTB raster scan order of a picture.  For the first CTB of a
3581	   slice segment, the CTB address is the value of
3582	   slice_segment_address when present; or 0 when the value of
3583	   first_slice_segment_in_pic_flag is equal to 1; both syntax
3584	   elements are in the slice segment header.  The subfield "Number"
3585	   MUST be set to the number of consecutive lost CTBs, again in CTB
3586	   raster scan order of a picture.  Note that because both
3587	   "First" and "Number" are counted in CTBs in CTB raster scan
3588	   order of a picture, not in tile scan order (which is the
3589	   bitstream order of CTBs), multiple SLI messages may be needed to
3590	   report the loss of one tile covering multiple CTB rows but
3591	   narrower than the picture.

3593	   The subfield "PictureID" MUST be set to the 6 least significant
3594	   bits of a binary representation of the value of PicOrderCntVal,
3595	   as defined in [HEVC], of the picture for which the lost CTBs are
3596	   indicated.  Note that for IDR pictures the syntax element
3597	   slice_pic_order_cnt_lsb is not present, but then the value is
3598	   inferred to be equal to 0.
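
   A minimal Python sketch of how the three subfields could be
   packed into one 32-bit SLI FCI entry follows.  The field widths
   (13 bits "First", 13 bits "Number", 6 bits "PictureID") are taken
   from RFC 4585; the function name is hypothetical, and the sketch
   assumes PicOrderCntVal is available as a signed integer.

```python
import struct


def sli_fci(first, number, pic_order_cnt_val):
    """Pack one SLI FCI entry: First(13) | Number(13) | PictureID(6)."""
    if not 0 <= first < (1 << 13) or not 0 <= number < (1 << 13):
        raise ValueError("First and Number must each fit in 13 bits")
    # PictureID: the 6 least significant bits of PicOrderCntVal.
    # Python's & on a negative int behaves like two's complement.
    picture_id = pic_order_cnt_val & 0x3F
    word = (first << 19) | (number << 6) | picture_id
    return struct.pack("!I", word)  # network byte order


print(sli_fci(24, 3, 70).hex())
```

   For an IDR picture, PicOrderCntVal is 0, so the PictureID
   subfield is 0 as well.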

3600	   As described in RFC 4585, an encoder in a media sender can use
3601	   this information to "clean up" the corrupted picture by sending
3602	   intra information, while observing the constraints described in
3603	   RFC 4585, for example with respect to congestion control.  In
3604	   many cases, error tracking is required to identify the corrupted
3605	   region in the receiver's state (reference pictures), because
3606	   motion compensation imports errors into otherwise uncorrupted
3607	   regions of the picture.  Reference picture selection can also be
3608	   used to "clean up" the corrupted picture, which is usually more
3609	   efficient and less likely to generate congestion than sending
3610	   intra information.

3612	   In contrast to the video codecs contemplated in RFC 4585 and RFC
3613	   5104 [RFC5104], in HEVC, the "macroblock size" is not fixed to
3614	   16x16 luma samples, but variable.  That, however, does not create
3615	   a conceptual difficulty with SLI, because the setting of the CTB
3616	   size is a sequence-level functionality, and using a slice loss
3617	   indication across CVS boundaries is meaningless as there is no
3618	   prediction across sequence boundaries.  However, a proper use of
3619	   SLI messages is not as straightforward as it was with older,
3620	   fixed-macroblock-sized video codecs, as the state of the sequence
3621	   parameter set (where the CTB size is located) has to be taken
3622	   into account when interpreting the "First" subfield in the FCI.
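
   The dependency on the CTB size can be illustrated with a short
   Python sketch.  The function names are hypothetical; the picture
   width in luma samples and the CTB size are assumed to have been
   derived from the active sequence parameter set.

```python
def ctb_address(x_luma, y_luma, pic_width_luma, ctb_size):
    """CTB raster-scan address of the CTB containing a luma sample."""
    pic_width_in_ctbs = -(-pic_width_luma // ctb_size)  # ceiling division
    return (y_luma // ctb_size) * pic_width_in_ctbs + (x_luma // ctb_size)


def ctb_origin(address, pic_width_luma, ctb_size):
    """Top-left luma sample (x, y) of the CTB with the given address."""
    pic_width_in_ctbs = -(-pic_width_luma // ctb_size)
    row, col = divmod(address, pic_width_in_ctbs)
    return (col * ctb_size, row * ctb_size)


# The same "First" value points at different picture regions
# depending on the CTB size signalled in the SPS.
print(ctb_origin(70, 1920, 64), ctb_origin(70, 1920, 16))
```

   A receiver and a sender that disagree on the active sequence
   parameter set would therefore interpret the same "First" value as
   different picture regions.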

3624	8.3 Reference Picture Selection Indication (RPSI)

3626	   Feedback-based reference picture selection has been shown to be
3627	   a powerful tool to stop temporal error propagation for improved
3628	   error resilience [Girod99][Wang05].  In one approach, the decoder
3629	   side tracks errors in the decoded pictures and informs the
3630	   encoder side that a particular picture, decoded relatively
3631	   earlier, is correct and still present in the decoded picture
3632	   buffer.  It also requests that the encoder use this
3633	   picture-availability information when encoding the next picture,
3634	   so as to stop further temporal error propagation.  For this
3635	   approach, the decoder side should use the RPSI feedback message.

3637	   Encoders can encode some long-term reference pictures as
3638	   specified in H.264 or HEVC for purposes described in the previous
3639	   paragraph without the need for a huge decoded picture buffer.  As
3640	   shown in [Wang05], with a flexible reference picture management
3641	   scheme as in H.264 and HEVC, even a decoded picture buffer size
3642	   of two picture storage buffers would work for the approach
3643	   described in the previous paragraph.

3645	   The field "Native RPSI bit string defined per codec" is a base16
3646	   [RFC4648] representation of the 8 bits consisting of 2 most
3647	   significant bits equal to 0 and 6 bits of nuh_layer_id, as
3648	   defined in [HEVC], followed by the 32 bits representing the value
3649	   of the PicOrderCntVal (in network byte order), as defined in
3650	   [HEVC], for the picture that is indicated by the RPSI feedback
3651	   message.
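
   The layout described above can be sketched in Python as follows.
   The function name is hypothetical, and the sketch assumes that
   the 32-bit PicOrderCntVal is carried as a signed two's complement
   integer; the standard library calls perform the network byte
   order packing and the base16 encoding of [RFC4648].

```python
import base64
import struct


def rpsi_native_bit_string(nuh_layer_id, pic_order_cnt_val):
    """Build the base16 string for the RPSI native bit string field."""
    if not 0 <= nuh_layer_id < 64:
        raise ValueError("nuh_layer_id must fit in 6 bits")
    # One byte: two zero bits followed by the 6-bit nuh_layer_id,
    # then PicOrderCntVal as 32 bits in network byte order.
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw)


print(rpsi_native_bit_string(0, 16))
```

   The range check on nuh_layer_id is what keeps the two most
   significant bits of the first byte equal to 0.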

3653	   The use of the RPSI feedback message as positive acknowledgement
3654	   with HEVC is deprecated.  In other words, the RPSI feedback
3655	   message MUST only be used as a reference picture selection
3656	   request, such that it can also be used in multicast.

3658	8.4 Full Intra Request (FIR)

3660	   The purpose of the FIR message is to force an encoder to send an
3661	   independent decoder refresh point as soon as possible (observing,
3662	   for example, the congestion control related constraints set out
3663	   in RFC 5104).

3665	   Upon reception of a FIR, a sender MUST send an IDR picture.
3666	   Parameter sets MUST also be sent, except when there is a priori
3667	   knowledge that the parameter sets have been correctly
3668	   established.  A typical example of that is an understanding
3669	   between sender and receiver, established by means outside this
3670	   document, that parameter sets are exclusively sent out of band.
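
   The sender-side rule above can be summarized in a few lines of
   Python.  This is only a sketch: the function name, the flag, and
   the string labels are hypothetical, and the timing constraints of
   RFC 5104 are not modeled.

```python
def units_to_send_on_fir(parameter_sets_known_out_of_band):
    """List what a sender transmits in response to a FIR message."""
    units = []
    if not parameter_sets_known_out_of_band:
        # Without a priori knowledge, parameter sets accompany the IDR.
        units += ["VPS", "SPS", "PPS"]
    units.append("IDR")  # an IDR picture is sent in every case
    return units


print(units_to_send_on_fir(False))
```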

3672	9 Security Considerations

3674	   The scope of this Security Considerations section is limited to
3675	   the payload format itself, and to one feature of HEVC that may
3676	   pose a particularly serious security risk if implemented naively.
3677	   The payload format, in isolation, does not form a complete
3678	   system.  Implementers are advised to read and understand relevant
3679	   security related documents, especially those pertaining to RTP
3680	   (see the security considerations section in RFC 3550 [RFC3550]),
3681	   and the security of the call control stack chosen (that may make
3682	   use of the media type registration of this memo).  Implementers
3683	   should also consider known security vulnerabilities of video
3684	   coding and decoding implementations in general and avoid those.

3686	   Within this RTP payload format, and with the exception of the
3687	   user data SEI message as described below, no security threats
3688	   other than those common to RTP payload formats are known.  In
3689	   other words, neither the various media plane based mechanisms,
3690	   nor the signaling part of this memo, seems to pose a security
3691	   risk beyond those common to all RTP based systems.

3693	   RTP packets using the payload format defined in this
3694	   specification are subject to the security considerations
3695	   discussed in the RTP specification [RFC3550], and in any
3696	   applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF
3697	   [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124].  However,
3698	   as "Securing the RTP Framework: Why RTP Does Not Mandate a Single
3699	   Media Security Solution" [RFC7202] discusses, it is not an RTP
3700	   payload format's responsibility to discuss or mandate what
3701	   solutions are used to meet the basic security goals like
3702	   confidentiality, integrity, and source authenticity for RTP in
3703	   general.  This responsibility lies with anyone using RTP in an
3704	   application.  They can find guidance on available security
3705	   mechanisms and important considerations in "Options for Securing
3706	   RTP Sessions" [RFC7201].  Applications SHOULD use one or more
3707	   appropriate strong security mechanisms.  The rest of this
3708	   Security Considerations section discusses the security-impacting
3709	   properties of the payload format itself.

3711	   Because the data compression used with this payload format is
3712	   applied end-to-end, any encryption needs to be performed after
3713	   compression.  A potential denial-of-service threat exists for
3714	   data encodings using compression techniques that have non-uniform
3715	   receiver-end computational load.  The attacker can inject
3716	   pathological datagrams into the bitstream that are complex to
3717	   decode and that cause the receiver to be overloaded.  H.265 is
3718	   particularly vulnerable to such attacks, as it is extremely
3719	   simple to generate datagrams containing NAL units that affect the
3720	   decoding process of many future NAL units.  Therefore, the usage
3721	   of data origin authentication and data integrity protection of at
3722	   least the RTP packet is RECOMMENDED, for example, with SRTP
3723	   [RFC3711].

3725	   Like [H.264], HEVC includes a user data Supplemental Enhancement
3726	   Information (SEI) message.  This SEI message allows inclusion of
3727	   an arbitrary bitstring into the video bitstream. Such a bitstring
3728	   could include JavaScript, machine code, and other active content.
3729	   HEVC leaves the handling of this SEI message to the receiving
3730	   system.  In order to avoid harmful side effects of the user data
3731	   SEI message, decoder implementations cannot naively trust its
3732	   content.  For example, it would be a bad and insecure
3733	   implementation practice to forward any JavaScript a decoder
3734	   implementation detects to a web browser.  The safest way to deal
3735	   with user data SEI messages is to simply discard them, but that
3736	   can have negative side effects on the user's quality of
3737	   experience.
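
   The "discard" policy could be implemented along the lines of the
   following Python sketch.  It is not a complete SEI parser.  It
   assumes the NAL unit type values 39 and 40 for prefix and suffix
   SEI NAL units and the SEI payload types 4 and 5 for the
   registered and unregistered user data messages, as understood
   from [HEVC]; all function names are hypothetical.

```python
SEI_NAL_TYPES = {39, 40}          # PREFIX_SEI_NUT, SUFFIX_SEI_NUT (assumed)
USER_DATA_PAYLOAD_TYPES = {4, 5}  # registered / unregistered user data


def strip_emulation_prevention(data):
    """Remove each 0x03 byte that follows two or more 0x00 bytes."""
    out, zeros = bytearray(), 0
    for b in data:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def sei_payload_types(nal):
    """Return the payloadType of every SEI message in an SEI NAL unit."""
    rbsp = strip_emulation_prevention(nal[2:])  # skip 2-byte NAL header
    types, pos = [], 0
    while pos < len(rbsp) - 1:  # the last byte holds the trailing bits
        ptype = 0
        while rbsp[pos] == 0xFF:
            ptype, pos = ptype + 255, pos + 1
        ptype, pos = ptype + rbsp[pos], pos + 1
        size = 0
        while rbsp[pos] == 0xFF:
            size, pos = size + 255, pos + 1
        size, pos = size + rbsp[pos], pos + 1
        types.append(ptype)
        pos += size  # skip the payload itself without interpreting it
    return types


def should_discard(nal):
    """True if the NAL unit is an SEI carrying user data (or malformed)."""
    if len(nal) < 2 or (nal[0] >> 1) & 0x3F not in SEI_NAL_TYPES:
        return False  # not an SEI NAL unit; this filter leaves it alone
    try:
        return any(t in USER_DATA_PAYLOAD_TYPES for t in sei_payload_types(nal))
    except IndexError:
        return True  # truncated SEI: safest to drop it


sei = bytes([0x4E, 0x01, 5, 3]) + b"abc" + b"\x80"
print(should_discard(sei))
```

   The sketch drops the whole SEI NAL unit when any contained
   message is user data, which is a conservative choice; an
   implementation that needs the other messages would have to
   rewrite the NAL unit instead.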

3739	   End-to-end security with authentication, integrity, or
3740	   confidentiality protection will prevent a MANE from performing
3741	   media-aware operations other than discarding complete packets.
3742	   In the case of confidentiality protection, it will even be
3743	   prevented from discarding packets in a media-aware way.  To be
3744	   allowed to perform such operations, a MANE is required to be a
3745	   trusted entity that is included in the security context
3746	   establishment.

3748	10 Congestion Control

3750	   Congestion control for RTP SHALL be used in accordance with RTP
3751	   [RFC3550] and with any applicable RTP profile, e.g., AVP
3752	   [RFC3551].  If best-effort service is being used, an additional
3753	   requirement is that users of this payload format MUST monitor
3754	   packet loss to ensure that the packet loss rate is within an
3755	   acceptable range.  Packet loss is considered acceptable if a TCP
3756	   flow across the same network path, and experiencing the same
3757	   network conditions, would achieve an average throughput, measured
3758	   on a reasonable timescale, that is not less than what all RTP
3759	   streams combined are achieving.  This condition can be satisfied
3760	   by implementing congestion control mechanisms to adapt the
3761	   transmission rate or the number of layers subscribed for a
3762	   layered multicast session, or by arranging for a receiver to
3763	   leave the session if the loss rate is unacceptably high.

3765	   The bitrate adaptation necessary for obeying the congestion
3766	   control principle is easily achievable when real-time encoding is
3767	   used, for example by adequately tuning the quantization
3768	   parameter.

3770	   However, when pre-encoded content is being transmitted, bandwidth
3771	   adaptation requires the pre-coded bitstream to be tailored for
3772	   such adaptivity.  The key mechanism available in HEVC is temporal
3773	   scalability.  A media sender can remove NAL units belonging to
3774	   higher temporal sub-layers (i.e., those NAL units with a high
3775	   value of TID) until the sending bitrate drops to an acceptable
3776	   range.  HEVC contains mechanisms that allow the lightweight
3777	   identification of switching points in temporal enhancement
3778	   layers, as discussed in Section 1.1.2 of this memo.  An HEVC
3779	   media sender can send packets belonging to NAL units of temporal
3780	   enhancement layers starting from these switching points to probe
3781	   for available bandwidth and to utilize bandwidth that has been
3782	   shown to be available.
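
   The thinning of a pre-encoded bitstream by temporal sub-layer
   could look like the following Python sketch.  It assumes that
   each NAL unit is available as a byte string starting with its
   two-byte NAL unit header, reads TID from the three least
   significant bits of the second header byte, and ignores RTP
   packetization overhead.  The function names are hypothetical.

```python
def temporal_id(nal):
    """TemporalId of a NAL unit: the header's TID field minus 1."""
    return (nal[1] & 0x07) - 1


def bitrate_bps(nal_units, duration_s):
    """Average bitrate of the given NAL units over duration_s seconds."""
    return 8 * sum(len(n) for n in nal_units) / duration_s


def thin_to_bitrate(nal_units, duration_s, target_bps):
    """Drop the highest temporal sub-layers until the target is met."""
    if not nal_units:
        return []
    kept = list(nal_units)
    max_tid = max(temporal_id(n) for n in kept)
    # The base sub-layer (TemporalId 0) is never removed.
    while max_tid > 0 and bitrate_bps(kept, duration_s) > target_bps:
        kept = [n for n in kept if temporal_id(n) < max_tid]
        max_tid -= 1
    return kept


def make_nal(tid, size):
    """Build a dummy NAL unit of `size` bytes with the given TemporalId."""
    return bytes([0x02, tid + 1]) + bytes(size - 2)


stream = [make_nal(t, 100) for t in (0, 1, 2, 2, 0, 1, 2, 2)]
print(len(thin_to_bitrate(stream, 1.0, 3500)))
```

   If removing every enhancement sub-layer still does not reach the
   target, the sketch returns the base sub-layer unchanged; a real
   sender would then need another means of rate reduction.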

3784	   The above mechanisms generally work within a defined profile and
3785	   level and, therefore, no renegotiation of the channel is
3786	   required.  Only when non-downgradable parameters (such as
3787	   profile) are required to be changed does it become necessary to
3788	   terminate and restart the RTP stream(s).  This may be
3789	   accomplished by using different RTP payload types.

3791	   MANEs MAY remove certain unusable packets from the RTP stream
3792	   when that RTP stream was damaged due to previous packet losses.
3793	   This can help reduce the network load in certain special cases.
3794	   For example, MANEs can remove those FUs where the leading FUs
3795	   belonging to the same NAL unit have been lost or those dependent
3796	   slice segments when the leading slice segments belonging to the
3797	   same slice have been lost, because the trailing FUs or dependent
3798	   slice segments are meaningless to most decoders.  MANEs can also
3799	   remove higher temporal scalable layers if the outbound
3800	   transmission (from the MANE's viewpoint) experiences congestion.
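
   The FU case can be sketched in Python as below.  The sketch
   assumes packets are given as (sequence number, payload) pairs in
   sequence order with losses visible as gaps, that FUs use payload
   header type 49 followed by a one-byte FU header whose two most
   significant bits are the S and E bits, and that no DONL field is
   present.  The function name is hypothetical.

```python
FU_TYPE = 49  # payload header type of a Fragmentation Unit (assumed)


def drop_orphan_fus(packets):
    """Keep only FUs that continue an unbroken chain from a start FU."""
    kept = []
    expected = None  # sequence number of the next FU of an intact chain
    for seq, payload in packets:
        is_fu = len(payload) >= 3 and (payload[0] >> 1) & 0x3F == FU_TYPE
        if not is_fu:
            expected = None
            kept.append((seq, payload))
            continue
        start = bool(payload[2] & 0x80)  # S bit
        end = bool(payload[2] & 0x40)    # E bit
        if start or seq == expected:
            kept.append((seq, payload))
            expected = None if end else (seq + 1) & 0xFFFF
        else:
            expected = None  # a leading FU was lost: drop the remainder
    return kept


def make_fu(flags):
    """Build a dummy FU payload; flags holds the S/E bits of the FU header."""
    return bytes([FU_TYPE << 1, 0x01, flags | 0x01]) + b"data"


received = [(10, make_fu(0x80)), (11, make_fu(0x00)), (12, make_fu(0x40)),
            (13, make_fu(0x80)), (15, make_fu(0x00)), (16, make_fu(0x40)),
            (17, bytes([0x02, 0x01]) + b"data")]
print([seq for seq, _ in drop_orphan_fus(received)])
```

   In the usage above, packet 14 is missing, so the FUs with
   sequence numbers 15 and 16 are removed, while the start FU 13 is
   forwarded because no leading FU of its NAL unit was lost before
   it.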

3802	11 IANA Considerations

3804	   A new media type, as specified in Section 7.1 of this memo,
3805	   should be registered with IANA.

3807	12 Acknowledgements

3809	   Muhammed Coban and Marta Karczewicz are thanked for discussions
3810	   on the specification of the use with feedback messages and other
3811	   aspects in this memo.  Jonathan Lennox and Jill Boyce are thanked
3812	   for their contributions to the PACI design included in this memo.
3813	   Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund,
3814	   and Tom Kristensen are thanked for their contributions to
3815	   parallel processing related signalling.  Magnus Westerlund,
3816	   Jonathan Lennox, Bernard Aboba, Jonatan Samuelsson, Roni Even,
3817	   Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, Ross
3818	   Finlayson, Danny Hong, Bo Burman, Ben Campbell, Brian Carpenter,
3819	   Qin Wu, and Stephen Farrell made valuable reviewing comments that
3820	   led to improvements.

3822	   This document was prepared using 2-Word-v2.0.template.dot, and
3823	   the .txt file was generated using the online Word-post processor
3824	   available here: http://www.isi.edu/touch/tools/rfc-word-
3825	   template.html.

3827	13 References

3829	13.1 Normative References

3831	   [HEVC]    ITU-T Recommendation H.265, "High efficiency video
3832	             coding", April 2013.

3834	   [H.264]   ITU-T Recommendation H.264, "Advanced video coding for
3835	             generic audiovisual services", April 2013.

3837	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
3838	             Requirement Levels", BCP 14, RFC 2119, March 1997.

3840	   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
3841	             Model with Session Description Protocol (SDP)", RFC
3842	             3264, June 2002.

3844	   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and
3845	             Jacobson, V., "RTP: A Transport Protocol for Real-Time
3846	             Applications", STD 64, RFC 3550, July 2003.

3848	   [RFC3551] Schulzrinne, H. and Casner, S., "RTP Profile for Audio
3849	             and Video Conferences with Minimal Control", STD 65,
3850	             RFC 3551, July 2003.

3852	   [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
3853	             Norrman, K., "The Secure Real-time Transport Protocol
3854	             (SRTP)", RFC 3711, March 2004.

3856	   [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP:
3857	             Session Description Protocol", RFC 4566, July 2006.

3859	   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey,
3860	             J., "Extended RTP Profile for Real-time Transport
3861	             Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC
3862	             4585, July 2006.

3864	   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
3865	             Encodings", RFC 4648, October 2006.

3867	   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman,
3868	             B., "Codec Control Messages in the RTP Audio-Visual
3869	             Profile with Feedback (AVPF)", RFC 5104, February 2008.

3871	   [RFC5124] Ott, J. and Carrara, E., "Extended Secure RTP Profile
3872	             for Real-time Transport Control Protocol (RTCP)-Based
3873	             Feedback (RTP/SAVPF)", RFC 5124, February 2008.

3875	   [RFC5234] Crocker, D. and Overell, P., "Augmented BNF for Syntax
3876	             Specifications: ABNF", RFC 5234, January 2008.

3878	   [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific
3879	             Media Attributes in the Session Description Protocol",
3880	             RFC 5576, June 2009.

3882	   [RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding
3883	             Dependency in the Session Description Protocol (SDP)",
3884	             RFC 5583, July 2009.

3886	13.2 Informative References

3888	   [3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched
3889	             Streaming Service (PSS); Progressive Download and
3890	             Dynamic Adaptive Streaming over HTTP (3GP-DASH)",
3891	             v12.1.0, December 2013.

3893	   [3GPPFF]  3GPP TS 26.244, "Transparent end-to-end packet switched
3894	             streaming service (PSS); 3GPP file format (3GP)",
3895	             v12.20, December 2013.

3897	   [CABAC]   Sole, J., Joshi, R., Nguyen, N., Ji, T., Karczewicz,
3898	             M., Clare, G., Henry, F., and Duenas, A., "Transform
3899	             coefficient coding in HEVC", IEEE Transactions on
3900	             Circuits and Systems for Video Technology, Vol. 22, No.
3901	             12, pp. 1765-1777, December 2012.

3903	   [Girod99] Girod, B. and Faerber, F., "Feedback-based error
3904	             control for mobile video transmission", Proceedings
3905	             IEEE, Vol. 87, No. 10, pp. 1707-1723, October 1999.

3907	   [HEVC draft v2]
3908	             Draft version 2 of HEVC, "High Efficiency Video Coding
3909	             (HEVC) Range Extensions text specification: Draft 7",
3910	             JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting, 27
3911	             March - 4 April 2014, Valencia, Spain.

3913	   [I-D.ietf-avtcore-rtp-multi-stream]
3914	             Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
3915	             "Sending Multiple Media Streams in a Single RTP
3916	             Session", draft-ietf-avtcore-rtp-multi-stream-09 (work
3917	             in progress), September 2015.

3919	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
3920	             Holmberg, C., Alvestrand, H., and C. Jennings,
3921	             "Multiplexing Negotiation Using Session Description
3922	             Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
3923	             bundle-negotiation-23 (work in progress), July 2015.

3925	   [I-D.ietf-avtext-rtp-grouping-taxonomy]
3926	             Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G.,
3927	             and Burman, B., "A Taxonomy of Grouping Semantics and
3928	             Mechanisms for Real-Time Transport", draft-ietf-avtext-
3929	             rtp-grouping-taxonomy-08 (work in progress), July 2015.

3931	   [ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology -
3932	             Coding of audio-visual objects - Part 12: ISO base
3933	             media file format" | "Information technology - JPEG
3934	             2000 image coding system - Part 12: ISO base media file
3935	             format", 2012.

3937	   [JCTVC-J0107]
3938	             Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian,
3939	             K., "AHG9: On RAP pictures", JCT-VC document JCTVC-
3940	             J0107, 10th JCT-VC meeting, July 2012, Stockholm,
3941	             Sweden.

3943	   [MPEG2S]  ISO/IEC 13818-1, "Information technology - Generic
3944	             coding of moving pictures and associated audio
3945	             information: Systems", 2013.

3947	   [MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic
3948	             adaptive streaming over HTTP (DASH) - Part 1: Media
3949	             presentation description and segment formats", 2012.

3951	   [RFC2326] Schulzrinne, H., Rao, A., and Lanphier, R., "Real Time
3952	             Streaming Protocol (RTSP)", RFC 2326, April 1998.

3954	   [RFC2974] Handley, M., Perkins, C., and Whelan, E., "Session
3955	             Announcement Protocol", RFC 2974, October 2000.

3957	   [RFC5117] Westerlund, M. and Wenger, S., "RTP Topologies", RFC
3958	             5117, January 2008.

3960	   [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of
3961	             RTP Flows", RFC 6051, November 2010.

3963	   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup,
3964	             "RTP Payload Format for H.264 Video", RFC 6184, May
3965	             2011.

3967	   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
3968	             Eleftheriadis, "RTP Payload Format for Scalable Video
3969	             Coding", RFC 6190, May 2011.

3971	   [RFC7201] Westerlund, M. and Perkins, C., "Options for Securing
3972	             RTP Sessions", RFC 7201, April 2014.

3974	   [RFC7202] Perkins, C. and Westerlund, M., "Securing the RTP
3975	             Framework: Why RTP Does Not Mandate a Single Media
3976	             Security Solution", RFC 7202, April 2014.

3978	   [Wang05]  Wang, Y.-K., Zhu, C., and Li, H., "Error resilient
3979	             video coding using flexible reference frames", Visual
3980	             Communications and Image Processing 2005 (VCIP 2005),
3981	             July 2005, Beijing, China.

3983	14 Authors' Addresses

3985	   Ye-Kui Wang
3986	   Qualcomm Incorporated
3987	   5775 Morehouse Drive
3988	   San Diego, CA 92121, USA
3989	   Phone: +1-858-651-8345
3990	   EMail: yekui.wang@gmail.com

3992	   Yago Sanchez
3993	   Fraunhofer HHI
3994	   Einsteinufer 37
3995	   D-10587 Berlin, Germany
3996	   Phone: +49-30-31002-227
3997	   Email: yago.sanchez@hhi.fraunhofer.de

3999	   Thomas Schierl
4000	   Fraunhofer HHI
4001	   Einsteinufer 37
4002	   D-10587 Berlin, Germany
4003	   Phone: +49-30-31002-227
4004	   Email: ts@thomas-schierl.de

4006	   Stephan Wenger
4007	   Vidyo, Inc.
4008	   433 Hackensack Ave., 7th floor
4009	   Hackensack, N.J. 07601, USA
4010	   Phone: +1-415-713-5473
4011	   EMail: stewe@stewe.org

4013	   Miska M. Hannuksela
4014	   Nokia Corporation
4015	   P.O. Box 1000
4016	   33721 Tampere, Finland
4017	   Phone: +358-7180-08000
4018	   EMail: miska.hannuksela@nokia.com