idnits 2.17.1 

draft-ietf-payload-rtp-h265-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (August 13, 2014) is 3534 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '0' on line 1113

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HEVC'

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-05

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-07

  == Outdated reference: A later version (-08) exists of
     draft-ietf-avtext-rtp-grouping-taxonomy-02

  -- Obsolete informational reference (is this intentional?): RFC 2326
     (Obsoleted by RFC 7826)

  -- Obsolete informational reference (is this intentional?): RFC 5117
     (Obsoleted by RFC 7667)


     Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Network Working Group                                      Y.-K. Wang
2	Internet Draft                                               Qualcomm
3	Intended status: Standards track                           Y. Sanchez
4	Expires: February 2015                                     T. Schierl
5	                                                       Fraunhofer HHI
6	                                                          S. Wenger
7	                                                              Vidyo
8	                                                     M. M. Hannuksela
9	                                                                Nokia
10	                                                      August 13, 2014

12	          RTP Payload Format for High Efficiency Video Coding
13	                   draft-ietf-payload-rtp-h265-06.txt

15	Abstract

17	   This memo describes an RTP payload format for the video coding
18	   standard ITU-T Recommendation H.265 and ISO/IEC International
19	   Standard 23008-2, both also known as High Efficiency Video Coding
20	   (HEVC) and developed by the Joint Collaborative Team on Video
21	   Coding (JCT-VC).  The RTP payload format allows for packetization
22	   of one or more Network Abstraction Layer (NAL) units in each RTP
23	   packet payload, as well as fragmentation of a NAL unit into
24	   multiple RTP packets.  Furthermore, it supports transmission of
25	   an HEVC bitstream over a single as well as multiple RTP streams.
26	   The payload format has wide applicability in videoconferencing,
27	   Internet video streaming, and high bit-rate entertainment-quality
28	   video, among others.

30	Status of this Memo

32	   This Internet-Draft is submitted to IETF in full conformance with
33	   the provisions of BCP 78 and BCP 79.

35	   Internet-Drafts are working documents of the Internet Engineering
36	   Task Force (IETF), its areas, and its working groups.  Note that
37	   other groups may also distribute working documents as Internet-
38	   Drafts.

40	   Internet-Drafts are draft documents valid for a maximum of six
41	   months and may be updated, replaced, or obsoleted by other
42	   documents at any time.  It is inappropriate to use Internet-
43	   Drafts as reference material or to cite them other than as "work
44	   in progress."

46	   The list of current Internet-Drafts can be accessed at
47	   http://www.ietf.org/ietf/1id-abstracts.txt.

49	   The list of Internet-Draft Shadow Directories can be accessed at
50	   http://www.ietf.org/shadow.html.

52	   This Internet-Draft will expire on February 13, 2015.

54	Copyright and License Notice

56	   Copyright (c) 2014 IETF Trust and the persons identified as the
57	   document authors.  All rights reserved.

59	   This document is subject to BCP 78 and the IETF Trust's Legal
60	   Provisions Relating to IETF Documents
61	   (http://trustee.ietf.org/license-info) in effect on the date of
62	   publication of this document.  Please review these documents
63	   carefully, as they describe your rights and restrictions with
64	   respect to this document.  Code Components extracted from this
65	   document must include Simplified BSD License text as described in
66	   Section 4.e of the Trust Legal Provisions and are provided
67	   without warranty as described in the Simplified BSD License.

69	Table of Contents

71	   Abstract
72	   Status of this Memo
73	   Table of Contents
74	   1 Introduction
75	      1.1 Overview of the HEVC Codec
76	         1.1.1 Coding-Tool Features
77	         1.1.2 Systems and Transport Interfaces
78	         1.1.3 Parallel Processing Support
79	         1.1.4 NAL Unit Header
80	      1.2 Overview of the Payload Format
81	   2 Conventions
82	   3 Definitions and Abbreviations
83	      3.1 Definitions
84	         3.1.1 Definitions from the HEVC Specification
85	         3.1.2 Definitions Specific to This Memo
86	      3.2 Abbreviations
87	   4 RTP Payload Format
88	      4.1 RTP Header Usage
89	      4.2 Payload Header Usage
90	      4.3 Payload Structures
91	      4.4 Transmission Modes
92	      4.5 Decoding Order Number
93	      4.6 Single NAL Unit Packets
94	      4.7 Aggregation Packets (APs)
95	      4.8 Fragmentation Units (FUs)
96	      4.9 PACI packets
97	         4.9.1 Reasons for the PACI rules (informative)
98	         4.9.2 PACI extensions (Informative)
99	      4.10 Temporal Scalability Control Information
100	   5 Packetization Rules
101	   6 De-packetization Process
102	   7 Payload Format Parameters
103	      7.1 Media Type Registration
104	      7.2 SDP Parameters
105	         7.2.1 Mapping of Payload Type Parameters to SDP
106	         7.2.2 Usage with SDP Offer/Answer Model
107	         7.2.3 Usage in Declarative Session Descriptions
108	         7.2.4 Parameter Sets Considerations
109	         7.2.5 Dependency Signaling in Multi-Stream Mode
110	   8 Use with Feedback Messages
111	      8.1 Picture Loss Indication (PLI)
112	      8.2 Slice Loss Indication
113	      8.3 Use of HEVC with the RPSI Feedback Message
114	      8.4 Full Intra Request (FIR)
115	   9 Security Considerations
116	   10 Congestion Control
117	   11 IANA Consideration
118	   12 Acknowledgements
119	   13 References
120	      13.1 Normative References
121	      13.2 Informative References
122	   14 Authors' Addresses

124	1 Introduction

126	1.1 Overview of the HEVC Codec

128	   High Efficiency Video Coding [HEVC], formally known as ITU-T
129	   Recommendation H.265 and ISO/IEC International Standard 23008-2,
130	   was ratified by ITU-T in April 2013 and reportedly provides
131	   significant coding efficiency gains over H.264 [H.264].

133	   As both H.264 [H.264] and its RTP payload format [RFC6184] are
134	   widely deployed and generally known in the relevant implementer
135	   communities, frequently only the differences between those two
136	   specifications are highlighted in non-normative, explanatory
137	   parts of this memo.  Basic familiarity with both specifications
138	   is assumed for those parts.  However, the normative parts of this
139	   memo do not require study of H.264 or its RTP payload format.

141	   H.264 and HEVC share a similar hybrid video codec design.
142	   Conceptually, both technologies include a video coding layer
143	   (VCL), which is often used to refer to the coding-tool features,
144	   and a network abstraction layer (NAL), which is often used to
145	   refer to the systems and transport interface aspects of the
146	   codecs.

148	1.1.1 Coding-Tool Features

150	   Similarly to earlier hybrid-video-coding-based standards,
151	   including H.264, the following basic video coding design is
152	   employed by HEVC.  A prediction signal is first formed either by
153	   intra or motion compensated prediction, and the residual (the
154	   difference between the original and the prediction) is then
155	   coded.  The gains in coding efficiency are achieved by
156	   redesigning and improving almost all parts of the codec over
157	   earlier designs.  In addition, HEVC includes several tools to
158	   make the implementation on parallel architectures easier.  Below
159	   is a summary of HEVC coding-tool features.

161	   Quad-tree block and transform structure

163	   One of the major tools that contribute significantly to the
164	   coding efficiency of HEVC is the usage of flexible coding blocks
165	   and transforms, which are defined in a hierarchical quad-tree
166	   manner.  Unlike H.264, where the basic coding block is a
167	   macroblock of fixed size 16x16, HEVC defines a Coding Tree Unit
168	   (CTU) of a maximum size of 64x64.  Each CTU can be divided into
169	   smaller units in a hierarchical quad-tree manner and can
170	   represent smaller blocks down to size 4x4.  Similarly, the
171	   transforms used in HEVC can have different sizes, starting from
172	   4x4 and going up to 32x32.  Utilizing large blocks and transforms
173	   contributes to the major gain of HEVC, especially at high
174	   resolutions.
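
   As a rough illustration of the hierarchical quad-tree partitioning
   described above, the following Python sketch recursively splits a
   64x64 CTU into leaf blocks.  The function name split_ctu, the
   should_split callback, and the toy splitting rule are assumptions
   made for this example only; a real encoder would typically choose
   splits by rate-distortion cost, and [HEVC] constrains the block
   sizes that are actually allowed.

```python
def split_ctu(x, y, size, min_size, should_split):
    """Return the leaf blocks (x, y, size) of a quad-tree rooted at a CTU.

    should_split is a hypothetical caller-supplied decision function;
    min_size stops the recursion at the smallest permitted block.
    """
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        # Visit the four quadrants in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_ctu(x + dx, y + dy, half,
                                    min_size, should_split)
        return leaves
    return [(x, y, size)]


def corner_only(x, y, size):
    """Toy rule: keep splitting only the block at the top-left corner."""
    return x == 0 and y == 0


leaves = split_ctu(0, 0, 64, 4, corner_only)
for block in leaves:
    print(block)
```

   Whatever splitting rule is used, the leaves always tile the CTU
   exactly, so their areas sum to 64 * 64 samples.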

176	   Entropy coding

178	   HEVC uses a single entropy coding engine, which is based on
179	   Context Adaptive Binary Arithmetic Coding (CABAC), whereas H.264
180	   uses two distinct entropy coding engines.  CABAC in HEVC shares
181	   many similarities with CABAC of H.264, but contains several
182	   improvements.  Those include improvements in coding efficiency
183	   and lowered implementation complexity, especially for parallel
184	   architectures.

186	   In-loop filtering

188	   H.264 includes an in-loop adaptive deblocking filter, where the
189	   blocking artifacts around the transform edges in the
190	   reconstructed picture are smoothed to improve the picture quality
191	   and compression efficiency.  In HEVC, a similar deblocking filter
192	   is employed but with somewhat lower complexity.  In addition,
193	   pictures undergo a subsequent filtering operation called Sample
194	   Adaptive Offset (SAO), which is a new design element in HEVC.
195	   SAO basically adds a pixel-level offset in an adaptive manner and
196	   usually acts as a de-ringing filter.  It is observed that SAO
197	   improves the picture quality, especially around sharp edges,
198	   contributing substantially to visual quality improvements of
199	   HEVC.

201	   Motion prediction and coding

203	   There have been a number of improvements in this area that are
204	   summarized as follows.  The first category is motion merge and
205	   advanced motion vector prediction (AMVP) modes.  The motion
206	   information of a prediction block can be inferred from the
207	   spatially or temporally neighboring blocks.  This is similar to
208	   the DIRECT mode in H.264 but includes new aspects to incorporate
209	   the flexible quad-tree structure and methods to improve the
210	   parallel implementations.  In addition, the motion vector
211	   predictor can be signaled for improved efficiency.  The second
212	   category is high-precision interpolation.  The interpolation
213	   filter length is increased to 8-tap from 6-tap, which improves
214	   the coding efficiency but also comes with increased complexity.
215	   In addition, the interpolation filter is defined with higher
216	   precision without any intermediate rounding operations to further
217	   improve the coding efficiency.

219	   Intra prediction and intra coding

221	   Compared to 8 intra prediction modes in H.264, HEVC supports
222	   angular intra prediction with 33 directions.  This increased
223	   flexibility improves both objective coding efficiency and visual
224	   quality as the edges can be better predicted and ringing
225	   artifacts around the edges can be reduced.  In addition, the
226	   reference samples are adaptively smoothed based on the prediction
227	   direction.  To avoid contouring artifacts, a new interpolative
228	   prediction generation is included to improve the visual quality.
229	   Furthermore, discrete sine transform (DST) is utilized instead of
230	   traditional discrete cosine transform (DCT) for 4x4 intra
231	   transform blocks.

233	   Other coding-tool features

235	   HEVC includes some tools for lossless coding and efficient screen
236	   content coding, such as skipping the transform for certain
237	   blocks.  These tools are particularly useful, for example, when
238	   streaming the user-interface of a mobile device to a large
239	   display.

241	1.1.2 Systems and Transport Interfaces

243	   HEVC inherited the basic systems and transport interface designs
244	   from H.264, such as the NAL-unit-based syntax structure, the
245	   hierarchical syntax and data unit structure from sequence-level
246	   parameter sets, multi-picture-level or picture-level parameter
247	   sets, slice-level header parameters, lower-level parameters, the
248	   supplemental enhancement information (SEI) message mechanism, the
249	   hypothetical reference decoder (HRD) based video buffering model,
250	   and so on.  The differences in these aspects compared to H.264
251	   are summarized below.

253	   Video parameter set

255	   A new type of parameter set, called video parameter set (VPS),
256	   was introduced.  For the first (2013) version of [HEVC], the
257	   video parameter set NAL unit is required to be available prior to
258	   its activation, while the information contained in the video
259	   parameter set is not necessary for operation of the decoding
260	   process.  For future HEVC extensions, such as the 3D or scalable
261	   extensions, the video parameter set is expected to include
262	   information necessary for operation of the decoding process, e.g.
263	   decoding dependency or information for reference picture set
264	   construction of enhancement layers.  The VPS provides a "big
265	   picture" of a bitstream, including what types of operation points
266	   are provided, the profile, tier, and level of the operation
267	   points, and some other high-level properties of the bitstream
268	   that can be used as the basis for session negotiation and content
269	   selection, etc. (see section 7.1).

271	   Profile, tier and level

273	   The profile, tier and level syntax structure that can be included
274	   in both VPS and sequence parameter set (SPS) includes 12 bytes of
275	   data to describe the entire bitstream (including all temporally
276	   scalable layers, which are referred to as sub-layers in the HEVC
277	   specification), and can optionally include more profile, tier and
278	   level information pertaining to individual temporally scalable
279	   layers.  The profile indicator indicates the "best viewed as"
280	   profile when the bitstream conforms to multiple profiles, similar
281	   to the major brand concept in the ISO base media file format
282	   (ISOBMFF) [ISOBMFF] and file formats derived based on ISOBMFF,
283	   such as the 3GPP file format [3GPPFF].  The profile, tier and
284	   level syntax structure also includes the indications of whether
285	   the bitstream is free of frame-packed content, whether the
286	   bitstream is free of interlaced source content and free of field
287	   pictures, i.e. contains only frame pictures of progressive
288	   source, such that clients/players with no support of post-
289	   processing functionalities for handling of frame-packed or
290	   interlaced source content or field pictures can reject those
291	   bitstreams.
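
   The 12 bytes of general profile, tier and level data mentioned
   above can be unpacked as in the following Python sketch.  The
   field names and bit positions reflect the general part of the
   profile_tier_level() syntax of [HEVC] as understood here and
   should be checked against the specification; the function names
   and the sample bytes are assumptions made for this example.

```python
import struct


def parse_general_ptl(data: bytes) -> dict:
    """Unpack the 12-byte general profile/tier/level fields."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    # Byte 0: profile space (2 bits), tier flag (1), profile idc (5).
    # Bytes 1-4: 32 profile compatibility flags.
    first, compat = struct.unpack(">BI", data[:5])
    flags = data[5]  # top 4 bits carry the source/constraint flags
    return {
        "profile_space": first >> 6,
        "tier_flag": (first >> 5) & 1,
        "profile_idc": first & 0x1F,
        "profile_compatibility_flags": compat,
        "progressive_source_flag": (flags >> 7) & 1,
        "interlaced_source_flag": (flags >> 6) & 1,
        "non_packed_constraint_flag": (flags >> 5) & 1,
        "frame_only_constraint_flag": (flags >> 4) & 1,
        # The remaining 44 bits up to byte 10 are skipped here.
        "level_idc": data[11],
    }


def needs_no_post_processing(ptl: dict) -> bool:
    """True if the stream signals progressive, non-packed frames only."""
    return (ptl["progressive_source_flag"] == 1
            and ptl["interlaced_source_flag"] == 0
            and ptl["non_packed_constraint_flag"] == 1
            and ptl["frame_only_constraint_flag"] == 1)


# Hypothetical sample: profile_idc 1, tier 0, level_idc 93.
sample = bytes([0x01, 0x60, 0x00, 0x00, 0x00, 0xB0,
                0x00, 0x00, 0x00, 0x00, 0x00, 0x5D])
ptl = parse_general_ptl(sample)
print(ptl["profile_idc"], ptl["tier_flag"], ptl["level_idc"])
print(needs_no_post_processing(ptl))
```

   A player without support for frame-packed or interlaced content
   could use a check like needs_no_post_processing() to decide
   whether to reject a bitstream.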

293	   Bitstream and elementary stream

295	   HEVC includes a definition of an elementary stream, which is new
296	   compared to H.264.  An elementary stream consists of a sequence
297	   of one or more bitstreams.  An elementary stream that consists of
298	   two or more bitstreams has typically been formed by splicing
299	   together two or more bitstreams (or parts thereof).  When an
300	   elementary stream contains more than one bitstream, the last NAL
301	   unit of the last access unit of a bitstream (except the last
302	   bitstream in the elementary stream) must be an end of
303	   bitstream NAL unit and the first access unit of the subsequent
304	   bitstream must be an intra random access point (IRAP) access
305	   unit.  This IRAP access unit may be a clean random access (CRA),
306	   broken link access (BLA), or instantaneous decoding refresh (IDR)
307	   access unit.

309	   Random access support

311	   HEVC includes signaling in the NAL unit header, through NAL unit
312	   types, of IRAP pictures beyond IDR pictures.  Three types of IRAP
313	   pictures, namely IDR, CRA and BLA pictures, are supported, wherein
314	   IDR pictures are conventionally referred to as closed group-of-
315	   pictures (closed-GOP) random access points, and CRA and BLA
316	   pictures are those conventionally referred to as open-GOP random
317	   access points.  BLA pictures usually originate from splicing of
318	   two bitstreams or parts thereof at a CRA picture, e.g. during
319	   stream switching.  To enable better systems usage of IRAP
320	   pictures, altogether six different NAL unit types are defined to
321	   signal the properties of the IRAP pictures, which can be used to
322	   better match the stream access point (SAP) types as defined in
323	   the ISOBMFF [ISOBMFF], which are utilized for random access
324	   support in both 3GP-DASH [3GPDASH] and MPEG DASH [MPEGDASH].
325	   Pictures following an IRAP picture in decoding order and
326	   preceding the IRAP picture in output order are referred to as
327	   leading pictures associated with the IRAP picture.  There are two
328	   types of leading pictures, namely random access decodable leading
329	   (RADL) pictures and random access skipped leading (RASL)
330	   pictures.  RADL pictures are decodable when decoding starts at
331	   the associated IRAP picture, whereas RASL pictures are not
332	   decodable when decoding starts at the associated IRAP
333	   picture and are usually discarded.  HEVC provides mechanisms to
334	   enable the specification of conformance of bitstreams with RASL
335	   pictures being discarded, thus providing a standard-compliant
336	   way to enable systems components to discard RASL pictures when
337	   needed.
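
   The discarding of RASL pictures described above can be sketched
   in Python as follows.  The NAL unit type values and names are
   those of [HEVC] as understood here; the function name and the
   simplification of one NAL unit type per picture are assumptions
   made for this example.  The sketch also treats RASL pictures
   associated with a BLA picture as undecodable, which is an
   interpretation of [HEVC] rather than something stated in this
   memo.

```python
# The six IRAP NAL unit types (values as understood from [HEVC]).
IRAP_TYPES = {
    16: "BLA_W_LP", 17: "BLA_W_RADL", 18: "BLA_N_LP",
    19: "IDR_W_RADL", 20: "IDR_N_LP", 21: "CRA_NUT",
}
BLA_TYPES = {16, 17, 18}
RADL_TYPES = {6, 7}   # RADL_N, RADL_R: decodable leading pictures
RASL_TYPES = {8, 9}   # RASL_N, RASL_R: skipped leading pictures


def drop_undecodable_rasl(picture_types):
    """Filter a decoding-order list that starts at an IRAP picture.

    RASL pictures are dropped when their associated IRAP picture is
    the one where decoding started, or is a BLA picture.
    """
    kept = []
    first_irap_seen = False
    rasl_undecodable = False
    for nal_type in picture_types:
        if nal_type in IRAP_TYPES:
            rasl_undecodable = (not first_irap_seen
                                or nal_type in BLA_TYPES)
            first_irap_seen = True
        elif nal_type in RASL_TYPES and rasl_undecodable:
            continue  # discard: references may be unavailable
        kept.append(nal_type)
    return kept


# CRA, RASL_R, RADL_R, TRAIL_R, CRA, RASL_R, TRAIL_R
stream = [21, 9, 7, 1, 21, 9, 1]
print(drop_undecodable_rasl(stream))
```

   Only the RASL picture that follows the first CRA picture is
   removed; the one after the second CRA picture is kept because
   its reference pictures have been decoded by then.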

339	   Temporal scalability support

341	   HEVC includes improved support for temporal scalability, by
342	   inclusion of the signaling of TemporalId in the NAL unit header,
343	   the restriction that pictures of a particular temporal sub-layer
344	   cannot be used for inter prediction reference by pictures of a
345	   lower temporal sub-layer, the sub-bitstream extraction process,
346	   and the requirement that each sub-bitstream extraction output be
347	   a conforming bitstream.  Media-aware network elements (MANEs) can
348	   utilize the TemporalId in the NAL unit header for stream
349	   adaptation purposes based on temporal scalability.
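
   A minimal Python sketch of the kind of TemporalId-based stream
   adaptation a MANE might perform is given below.  It assumes that
   each NAL unit starts with the two-byte HEVC NAL unit header and
   that the three least significant bits of the second byte carry
   TemporalId plus 1; the function names are hypothetical.

```python
def temporal_id(nal_unit: bytes) -> int:
    """Read TemporalId from the second byte of the NAL unit header."""
    return (nal_unit[1] & 0x07) - 1


def keep_sub_layers(nal_units, highest_tid):
    """Keep only NAL units whose TemporalId is at most highest_tid."""
    return [n for n in nal_units if temporal_id(n) <= highest_tid]


# Header-only toy NAL units with TemporalId 0, 2, 1, 2, 0.
units = [b"\x02\x01", b"\x00\x03", b"\x02\x02", b"\x00\x03", b"\x02\x01"]
thinned = keep_sub_layers(units, 1)
print([temporal_id(n) for n in thinned])
```

   Because pictures of a sub-layer are never referenced by pictures
   of a lower sub-layer, the thinned stream remains decodable.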

351	   Temporal sub-layer switching support

353	   HEVC specifies, through NAL unit types present in the NAL unit
354	   header, the signaling of temporal sub-layer access (TSA) and
355	   stepwise temporal sub-layer access (STSA).  A TSA picture and
356	   pictures following the TSA picture in decoding order do not use
357	   pictures prior to the TSA picture in decoding order with
358	   TemporalId greater than or equal to that of the TSA picture for
359	   inter prediction reference.  A TSA picture enables up-switching,
360	   at the TSA picture, to the sub-layer containing the TSA picture
361	   or any higher sub-layer, from the immediately lower sub-layer.
362	   An STSA picture does not use pictures with the same TemporalId as
363	   the STSA picture for inter prediction reference.  Pictures
364	   following an STSA picture in decoding order with the same
365	   TemporalId as the STSA picture do not use pictures prior to the
366	   STSA picture in decoding order with the same TemporalId as the
367	   STSA picture for inter prediction reference.  An STSA picture
368	   enables up-switching, at the STSA picture, to the sub-layer
369	   containing the STSA picture, from the immediately lower sub-
370	   layer.
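
   The up-switching rules above can be expressed as a small Python
   predicate.  This is a sketch under stated assumptions: the NAL
   unit type values for TSA and STSA pictures are those of [HEVC]
   as understood here, and the function and parameter names are
   hypothetical.

```python
TSA_TYPES = {2, 3}    # TSA_N, TSA_R
STSA_TYPES = {4, 5}   # STSA_N, STSA_R


def can_switch_up(nal_unit_type, picture_tid, decoded_tid, target_tid):
    """Decide whether up-switching to target_tid may start here.

    picture_tid is the TemporalId of the current picture,
    decoded_tid the highest sub-layer decoded so far, and
    target_tid the sub-layer the receiver wants to reach.
    """
    # Switching is only enabled from the immediately lower sub-layer.
    if decoded_tid != picture_tid - 1:
        return False
    if nal_unit_type in TSA_TYPES:
        # TSA: own sub-layer or any higher sub-layer.
        return target_tid >= picture_tid
    if nal_unit_type in STSA_TYPES:
        # STSA: only the sub-layer containing the picture.
        return target_tid == picture_tid
    return False


print(can_switch_up(3, 1, 0, 2))  # TSA_R in sub-layer 1
print(can_switch_up(5, 1, 0, 2))  # STSA_R in sub-layer 1
```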

372	   Sub-layer reference or non-reference pictures

374	   The concept and signaling of reference/non-reference pictures in
375	   HEVC are different from H.264.  In H.264, if a picture may be
376	   used by any other picture for inter prediction reference, it is a
377	   reference picture; otherwise it is a non-reference picture, and
378	   this is signaled by two bits in the NAL unit header.  In HEVC, a
379	   picture is called a reference picture only when it is marked as
380	   "used for reference".  In addition, the concept of sub-layer
381	   reference picture was introduced.  If a picture may be used by
382	   another picture with the same TemporalId for inter
383	   prediction reference, it is a sub-layer reference picture;
384	   otherwise it is a sub-layer non-reference picture.  Whether a
385	   picture is a sub-layer reference picture or sub-layer non-
386	   reference picture is signaled through NAL unit type values.
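
   As a hedged illustration of this signaling, the Python sketch
   below classifies a picture from its NAL unit type.  It assumes
   the [HEVC] numbering in which the VCL NAL unit types 0 to 15
   alternate between sub-layer non-reference ("_N", even values)
   and sub-layer reference ("_R", odd values) pictures; the
   function name is hypothetical.

```python
NAMES = {
    0: "TRAIL_N", 1: "TRAIL_R", 2: "TSA_N", 3: "TSA_R",
    4: "STSA_N", 5: "STSA_R", 6: "RADL_N", 7: "RADL_R",
    8: "RASL_N", 9: "RASL_R",
}


def is_sub_layer_non_reference(nal_unit_type: int) -> bool:
    """True for the even-valued VCL NAL unit types 0, 2, ..., 14."""
    return 0 <= nal_unit_type <= 14 and nal_unit_type % 2 == 0


for value, name in NAMES.items():
    kind = "non-reference" if is_sub_layer_non_reference(value) \
        else "reference"
    print(f"{name}: sub-layer {kind} picture")
```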

388	   Extensibility

390	   Besides the TemporalId in the NAL unit header, HEVC also includes
391	   the signaling of a six-bit layer ID in the NAL unit header, which
392	   must be equal to 0 for a single-layer bitstream.  Extension
393	   mechanisms have been included in VPS, SPS, PPS, SEI NAL unit,
394	   slice headers, and so on.  All these extension mechanisms enable
395	   future extensions in a backward compatible manner, such that
396	   bitstreams encoded according to potential future HEVC extensions
397	   can be fed to then-legacy decoders (e.g. HEVC version 1 decoders)
398	   and the then-legacy decoders can decode and output the base layer
399	   bitstream.
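
   The NAL unit header fields mentioned above can be extracted as
   in the following Python sketch.  It assumes the two-byte header
   layout of [HEVC]: one forbidden zero bit, a six-bit NAL unit
   type, the six-bit layer ID, and three bits carrying TemporalId
   plus 1.  The class and function names are assumptions made for
   this example.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_unit_type: int
    layer_id: int
    temporal_id: int


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into header fields."""
    if len(data) < 2:
        raise ValueError("NAL unit header needs two bytes")
    value = int.from_bytes(data[:2], "big")
    tid_plus1 = value & 0x07
    if tid_plus1 == 0:
        raise ValueError("TemporalId plus 1 must not be zero")
    return NalHeader(
        forbidden_zero_bit=value >> 15,
        nal_unit_type=(value >> 9) & 0x3F,
        layer_id=(value >> 3) & 0x3F,
        temporal_id=tid_plus1 - 1,
    )


header = parse_nal_header(b"\x40\x01")
print(header)
# In a single-layer bitstream the layer ID is expected to be 0.
print(header.layer_id == 0)
```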

401	   Bitstream extraction

403	   HEVC includes a bitstream extraction process as an integral part
404	   of the overall decoding process, as well as specification of the
405	   use of the bitstream extraction process in description of
406	   bitstream conformance tests as part of the hypothetical reference
407	   decoder (HRD) specification.

409	   Reference picture management

411	   The reference picture management of HEVC, including reference
412	   picture marking and removal from the decoded picture buffer (DPB)
413	   as well as reference picture list construction (RPLC), differs
414	   from that of H.264.  Instead of the sliding window plus adaptive
415	   memory management control operation (MMCO) based reference
416	   picture marking mechanism in H.264, HEVC specifies a reference
417	   picture set (RPS) based reference picture management and marking
418	   mechanism, and the RPLC is consequently based on the RPS
419	   mechanism.  A reference picture set consists of a set of
420	   reference pictures associated with a picture, consisting of all
421	   reference pictures that are prior to the associated picture in
422	   decoding order, that may be used for inter prediction of the
423	   associated picture or any picture following the associated
424	   picture in decoding order.  The reference picture set consists of
425	   five lists of reference pictures: RefPicSetStCurrBefore,
426	   RefPicSetStCurrAfter, RefPicSetStFoll, RefPicSetLtCurr and
427	   RefPicSetLtFoll.  RefPicSetStCurrBefore, RefPicSetStCurrAfter and
428	   RefPicSetLtCurr contain all reference pictures that may be used
429	   in inter prediction of the current picture and that may be used
430	   in inter prediction of one or more of the pictures following the
431	   current picture in decoding order.  RefPicSetStFoll and
432	   RefPicSetLtFoll consist of all reference pictures that are not
433	   used in inter prediction of the current picture but may be used
434	   in inter prediction of one or more of the pictures following the
435	   current picture in decoding order.  RPS provides an "intra-coded"
436	   signaling of the DPB status, instead of an "inter-coded"
437	   signaling, mainly for improved error resilience.  The RPLC
438	   process in HEVC is based on the RPS, by signaling an index to an
439	   RPS subset for each reference index; this process is simpler than
440	   the RPLC process in H.264.
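
   The "intra-coded" nature of the RPS signaling can be illustrated
   with the Python sketch below.  Because every picture carries the
   complete set of pictures that must remain in the DPB, a receiver
   can both release unneeded pictures and detect missing ones
   without relying on earlier signaling.  The class layout, the use
   of picture order count (POC) values as picture identifiers, and
   the function name apply_rps are assumptions made for this
   example.

```python
from dataclasses import dataclass, field


@dataclass
class ReferencePictureSet:
    """The five RPS lists, holding POC values in this sketch."""
    st_curr_before: list = field(default_factory=list)
    st_curr_after: list = field(default_factory=list)
    st_foll: list = field(default_factory=list)
    lt_curr: list = field(default_factory=list)
    lt_foll: list = field(default_factory=list)

    def current_references(self):
        """Pictures that may be used by the current picture."""
        return self.st_curr_before + self.st_curr_after + self.lt_curr

    def all_references(self):
        """Every picture that must be kept in the DPB."""
        return self.current_references() + self.st_foll + self.lt_foll


def apply_rps(dpb_pocs, rps):
    """Return (kept, released, missing) sets of POC values."""
    wanted = set(rps.all_references())
    present = set(dpb_pocs)
    return present & wanted, present - wanted, wanted - present


rps = ReferencePictureSet(st_curr_before=[4, 2],
                          st_curr_after=[8], lt_curr=[0])
print(apply_rps({0, 1, 2, 4, 8}, rps))  # POC 1 can be released
print(apply_rps({0, 2, 8}, rps))        # POC 4 is detected as missing
```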

442	   Ultra low delay support

444	   HEVC specifies a sub-picture-level HRD operation, for support of
445	   the so-called ultra-low delay.  The mechanism specifies a
446	   standard-compliant way to enable delay reduction below one
447	   picture interval.  Sub-picture-level coded picture buffer (CPB)
448	   and DPB parameters may be signaled, and utilization of this
449	   information for the derivation of CPB timing (wherein the CPB
450	   removal time corresponds to decoding time) and DPB output timing
451	   (display time) is specified.  Decoders are allowed to operate the
452	   HRD at the conventional access-unit-level, even when the sub-
453	   picture-level HRD parameters are present.

455	   New SEI messages

457	   HEVC inherits many H.264 SEI messages with changes in syntax
458	   and/or semantics making them applicable to HEVC.  Additionally,
459	   there are a few new SEI messages reviewed briefly in the
460	   following paragraphs.

462	   The display orientation SEI message informs the decoder of a
463	   transformation that is recommended to be applied to the cropped
464	   decoded picture prior to display, such that the pictures can be
465	   properly displayed, e.g. in an upside-up manner.

467	   The structure of pictures SEI message provides information on the
468	   NAL unit types, picture order count values, and prediction
469	   dependencies of a sequence of pictures.  The SEI message can be
470	   used, for example, to determine what impact a lost picture has on
471	   other pictures.

473	   The decoded picture hash SEI message provides a checksum derived
474	   from the sample values of a decoded picture.  It can be used for
475	   detecting whether a picture was correctly received and decoded.

477	   The active parameter sets SEI message includes the IDs of the
478	   active video parameter set and the active sequence parameter set
479	   and can be used to activate VPSs and SPSs.  In addition, the SEI
480	   message includes the following indications: 1) An indication of
481	   whether "full random accessibility" is supported (when supported,
482	   all parameter sets needed for decoding of the remainder of the
483	   bitstream when random accessing from the beginning of the current
484	   coded video sequence by completely discarding all access units
485	   earlier in decoding order are present in the remaining bitstream
486	   and all coded pictures in the remaining bitstream can be
487	   correctly decoded); 2) An indication of whether there is no
488	   parameter set within the current coded video sequence that
489	   updates another parameter set of the same type preceding in
490	   decoding order.  An update of a parameter set refers to the use
491	   of the same parameter set ID but with some other parameters
492	   changed.  If this property is true for all coded video sequences
493	   in the bitstream, then all parameter sets can be sent out-of-band
494	   before session start.

496	   The decoding unit information SEI message provides coded picture
497	   buffer removal delay information for a decoding unit.  The
498	   message can be used in very-low-delay buffering operations.

500	   The region refresh information SEI message can be used together
501	   with the recovery point SEI message (present in both H.264 and
502	   HEVC) for improved support of gradual decoding refresh (GDR).
503	   This supports random access from inter-coded pictures, wherein
504	   complete pictures can be correctly decoded or recovered after an
505	   indicated number of pictures in output/display order.

507	1.1.3 Parallel Processing Support

509	   The reportedly significantly higher encoding computational demand
510	   of HEVC over H.264, in conjunction with the ever increasing video
511	   resolution (both spatially and temporally) required by the
512	   market, led to the adoption of VCL coding tools specifically
513	   targeted to allow for parallelization on the sub-picture level.
514	   That is, parallelization occurs, at the minimum, at the
515	   granularity of an integer number of CTUs.  The targets for this
516	   type of high-level parallelization are multicore CPUs and DSPs as
517	   well as multiprocessor systems.  In a system design, to be
518	   useful, these tools require signaling support, which is provided
519	   in Section 7 of this memo.  This section provides a brief
520	   overview of the tools available in [HEVC].

522	   Many of the tools incorporated in HEVC were designed keeping in
523	   mind the potential parallel implementations in multi-core/multi-
524	   processor architectures.  Specifically, for parallelization, four
525	   picture partition strategies are available.

527	   Slices are segments of the bitstream that can be reconstructed
528	   independently from other slices within the same picture (though
529	   there may still be interdependencies through loop filtering
530	   operations).  Slices are the only tool that can be used for
531	   parallelization that is also available, in virtually identical
532	   form, in H.264.  Slice-based parallelization does not require
533	   much inter-processor or inter-core communication (except for
534	   inter-processor or inter-core data sharing for motion
535	   compensation when decoding a predictively coded picture, which is
536	   typically much heavier than inter-processor or inter-core data
537	   sharing due to in-picture prediction), as slices are designed to
538	   be independently decodable.  However, for the same reason, slices
539	   can require some coding overhead.  Further, slices (in contrast
540	   to some of the other tools mentioned below) also serve as the key
541	   mechanism for bitstream partitioning to match Maximum Transfer
542	   Unit (MTU) size requirements, due to the in-picture independence
543	   of slices and the fact that each regular slice is encapsulated in
544	   its own NAL unit.  In many cases, the goal of parallelization and
545	   the goal of MTU size matching can place contradicting demands on
546	   the slice layout in a picture.  The realization of this situation
547	   led to the development of the more advanced tools mentioned
548	   below.

550	   Dependent slice segments allow for fragmentation of a coded slice
551	   into fragments at CTU boundaries without breaking any in-picture
552	   prediction mechanism.  They are complementary to the
553	   fragmentation mechanism described in this memo in that they need
554	   the cooperation of the encoder.  As a dependent slice segment
555	   necessarily contains an integer number of CTUs, a decoder using
556	   multiple cores operating on CTUs can process a dependent slice
557	   segment without communicating parts of the slice segment's
558	   bitstream to other cores.  Fragmentation, as specified in this
559	   memo, in contrast, does not guarantee that a fragment contains an
560	   integer number of CTUs.

562	   In wavefront parallel processing (WPP), the picture is
563	   partitioned into rows of CTUs.  Entropy decoding and prediction
564	   are allowed to use data from CTUs in other partitions.  Parallel
565	   processing is possible through parallel decoding of CTU rows,
566	   where the start of the decoding of a row is delayed by two CTUs
567	   relative to the row above, so as to ensure that data related to a
568	   CTU above and to the right of the subject CTU is available before
569	   the subject CTU is decoded.  Using this staggered start (which
570	   appears like a wavefront when represented graphically),
571	   parallelization is possible with up to as many processors/cores as
572	   the picture contains CTU rows.

574	   Because in-picture prediction between neighboring CTU rows within
575	   a picture is allowed, the required inter-processor/inter-core
576	   communication to enable in-picture prediction can be substantial.
577	   The WPP partitioning does not result in the creation of more NAL
578	   units compared to when it is not applied, thus WPP cannot be used
579	   for MTU size matching, though slices can be used in combination
580	   for that purpose.
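
   Informative example: the staggered start can be sketched as
   follows.  The sketch assumes one core per CTU row and a uniform
   cost of one time step per CTU; both are simplifications made
   here for illustration, and the function names are hypothetical.

```python
def wpp_start_step(row: int, col: int) -> int:
    """Earliest time step at which CTU (row, col) can start decoding.

    Assumes one core per CTU row and one time step per CTU.  Each row
    starts two CTUs after the row above it, so the CTU above and to
    the right, (row - 1, col + 1), has finished by this step.
    """
    return 2 * row + col


def wpp_total_steps(ctu_rows: int, ctu_cols: int) -> int:
    """Time steps needed to decode the whole picture under the same
    assumptions: the last CTU starts at 2*(rows-1) + (cols-1)."""
    return wpp_start_step(ctu_rows - 1, ctu_cols - 1) + 1
```

   Under these assumptions, a picture of 3 CTU rows by 5 CTU columns
   takes 9 time steps with three cores, compared to 15 steps when
   decoded sequentially.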

582	   Tiles define horizontal and vertical boundaries that partition a
583	   picture into tile columns and rows.  The scan order of CTUs is
584	   changed to be local within a tile (in the order of a CTU raster
585	   scan of a tile), before decoding the top-left CTU of the next
586	   tile in the order of tile raster scan of a picture.  Similar to
587	   slices, tiles break in-picture prediction dependencies (including
588	   entropy decoding dependencies).  However, they do not need to be
589	   included into individual NAL units (same as WPP in this regard),
590	   hence tiles cannot be used for MTU size matching, though slices
591	   can be used in combination for that purpose.  Each tile can be
592	   processed by one processor/core, and the inter-processor/inter-
593	   core communication required for in-picture prediction between
594	   processing units decoding neighboring tiles is limited to
595	   conveying the shared slice header in cases where a slice spans
596	   more than one tile, and loop-filtering-related sharing of
597	   reconstructed samples and metadata.  In this respect, tiles are
598	   less demanding in terms of inter-processor communication bandwidth
599	   compared to WPP due to the in-picture independence between two
600	   neighboring partitions.
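
   Informative example: the changed CTU scan order can be sketched
   as follows.  The sketch assumes that tile column widths and tile
   row heights are given in units of CTUs, and it identifies each
   CTU by its address in picture raster scan.  The function name and
   parameters are hypothetical.

```python
from typing import List


def tile_scan_order(col_widths: List[int], row_heights: List[int]) -> List[int]:
    """Return picture-raster CTU addresses in tile scan order.

    Tiles are visited in raster order over the picture; within each
    tile, CTUs are visited in raster order over the tile.
    """
    pic_width = sum(col_widths)
    order: List[int] = []
    y0 = 0
    for tile_height in row_heights:
        x0 = 0
        for tile_width in col_widths:
            for y in range(y0, y0 + tile_height):
                for x in range(x0, x0 + tile_width):
                    order.append(y * pic_width + x)
            x0 += tile_width
        y0 += tile_height
    return order
```

   For a picture of 4 by 2 CTUs split into two tile columns of width
   2, the resulting order is 0, 1, 4, 5 for the first tile, followed
   by 2, 3, 6, 7 for the second.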

602	1.1.4 NAL Unit Header

604	   HEVC maintains the NAL unit concept of H.264 with modifications.
605	   HEVC uses a two-byte NAL unit header, as shown in Figure 1.  The
606	   payload of a NAL unit refers to the NAL unit excluding the NAL
607	   unit header.

609	                   +---------------+---------------+
610	                   |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
611	                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
612	                   |F|   Type    |  LayerId  | TID |
613	                   +-------------+-----------------+

615	             Figure 1 The structure of HEVC NAL unit header

617	   The semantics of the fields in the NAL unit header are as
618	   specified in [HEVC] and described briefly below for convenience.
619	   In addition to the name and size of each field, the corresponding
620	   syntax element name in [HEVC] is also provided.

622	   F: 1 bit
623	      forbidden_zero_bit.  Required to be zero in [HEVC].  HEVC
624	      declares a value of 1 as a syntax violation.  Note that the
625	      inclusion of this bit in the NAL unit header is to enable
626	      transport of HEVC video over MPEG-2 transport systems
627	      (avoidance of start code emulations) [MPEG2S].

629	   Type: 6 bits
630	      nal_unit_type.  This field specifies the NAL unit type as
631	      defined in Table 7-1 of [HEVC].  If the most significant bit
632	      of this field of a NAL unit is equal to 0 (i.e. the value of
633	      this field is less than 32), the NAL unit is a VCL NAL unit.
634	      Otherwise, the NAL unit is a non-VCL NAL unit.  For a
635	      reference of all currently defined NAL unit types and their
636	      semantics, please refer to Section 7.4.1 in [HEVC].

638	   LayerId: 6 bits
639	      nuh_layer_id.  Required to be equal to zero in [HEVC].  It is
640	      anticipated that in future scalable or 3D video coding
641	      extensions of this specification, this syntax element will be
642	      used to identify additional layers that may be present in the
643	      coded video sequence, wherein a layer may be, e.g. a spatial
644	      scalable layer, a quality scalable layer, a texture view, or a
645	      depth view.

647	   TID: 3 bits
648	      nuh_temporal_id_plus1.  This field specifies the temporal
649	      identifier of the NAL unit plus 1.  The value of TemporalId is
650	      equal to TID minus 1.  A TID value of 0 is illegal to ensure
651	      that there is at least one bit in the NAL unit header equal to
652	      1, so as to enable independent consideration of start code
653	      emulations in the NAL unit header and in the NAL unit payload
654	      data.
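
   Informative example: the field layout of Figure 1 can be unpacked
   as sketched below.  The class and function names are hypothetical
   and the sketch performs no checks beyond the header length.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int          # forbidden_zero_bit, 1 bit
    nal_type: int   # nal_unit_type, 6 bits
    layer_id: int   # nuh_layer_id, 6 bits
    tid: int        # nuh_temporal_id_plus1, 3 bits

    @property
    def temporal_id(self) -> int:
        # TemporalId is equal to TID minus 1.
        return self.tid - 1

    @property
    def is_vcl(self) -> bool:
        # Types below 32 (most significant Type bit 0) are VCL NAL units.
        return self.nal_type < 32

    @property
    def is_valid(self) -> bool:
        # F is required to be zero and a TID value of 0 is illegal.
        return self.f == 0 and self.tid != 0


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into the four fields."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is two bytes long")
    hdr = (data[0] << 8) | data[1]
    return NalHeader(
        f=hdr >> 15,
        nal_type=(hdr >> 9) & 0x3F,
        layer_id=(hdr >> 3) & 0x3F,
        tid=hdr & 0x07,
    )
```

   For instance, the two bytes 0x40 0x01 yield F equal to 0, Type
   equal to 32 (a non-VCL NAL unit), LayerId equal to 0, and TID
   equal to 1, i.e. TemporalId equal to 0.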

656	1.2 Overview of the Payload Format

658	   This payload format defines the following processes required for
659	   transport of HEVC coded data over RTP [RFC3550]:

661	   o Usage of RTP header with this payload format

663	   o Packetization of HEVC coded NAL units into RTP packets using
664	     three types of payload structures, namely single NAL unit
665	     packet, aggregation packet, and fragmentation unit

667	   o Transmission of HEVC NAL units of the same bitstream within a
668	     single RTP stream or multiple RTP streams within one or more
669	     RTP sessions, where within an RTP stream transmission of NAL
670	     units may be either non-interleaved (i.e. the transmission
671	     order of NAL units is the same as their decoding order) or
672	     interleaved (i.e. the transmission order of NAL units is
673	     different from their decoding order)

675	   o Media type parameters to be used with the Session Description
676	     Protocol (SDP) [RFC4566]

678	   o A payload header extension mechanism and data structures for
679	     enhanced support of temporal scalability based on that
680	     extension mechanism.

682	2 Conventions

684	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
685	   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
686	   "OPTIONAL" in this document are to be interpreted as described in
687	   BCP 14, RFC 2119 [RFC2119].

689	   In this document, these key words will appear with that
690	   interpretation only when in ALL CAPS.  Lower case uses of these
691	   words are not to be interpreted as carrying the RFC 2119
692	   significance.

694	   This specification uses the notion of setting and clearing a bit
695	   when bit fields are handled.  Setting a bit is the same as
696	   assigning that bit the value of 1 (On).  Clearing a bit is the
697	   same as assigning that bit the value of 0 (Off).

699	3 Definitions and Abbreviations

701	3.1 Definitions

703	   This document uses the terms and definitions of [HEVC].  Section
704	   3.1.1 lists relevant definitions copied from [HEVC] for
705	   convenience.  Section 3.1.2 provides definitions specific to this
706	   memo.

708	3.1.1 Definitions from the HEVC Specification

710	   access unit: A set of NAL units that are associated with each
711	   other according to a specified classification rule, are
712	   consecutive in decoding order, and contain exactly one coded
713	   picture.

715	   BLA access unit: An access unit in which the coded picture is a
716	   BLA picture.

718	   BLA picture: An IRAP picture for which each VCL NAL unit has
719	   nal_unit_type equal to BLA_W_LP, BLA_W_RADL, or BLA_N_LP.

721	   coded video sequence: A sequence of access units that consists,
722	   in decoding order, of an IRAP access unit with NoRaslOutputFlag
723	   equal to 1, followed by zero or more access units that are not
724	   IRAP access units with NoRaslOutputFlag equal to 1, including all
725	   subsequent access units up to but not including any subsequent
726	   access unit that is an IRAP access unit with NoRaslOutputFlag
727	   equal to 1.

729	      Informative note: An IRAP access unit may be an IDR access
730	      unit, a BLA access unit, or a CRA access unit.  The value of
731	      NoRaslOutputFlag is equal to 1 for each IDR access unit, each
732	      BLA access unit, and each CRA access unit that is the first
733	      access unit in the bitstream in decoding order, is the first
734	      access unit that follows an end of sequence NAL unit in
735	      decoding order, or has HandleCraAsBlaFlag equal to 1.

737	   CRA access unit: An access unit in which the coded picture is a
738	   CRA picture.

740	   CRA picture: An IRAP picture for which each VCL NAL unit has
741	   nal_unit_type equal to CRA_NUT.

743	   IDR access unit: An access unit in which the coded picture is an
744	   IDR picture.

746	   IDR picture: An IRAP picture for which each VCL NAL unit has
747	   nal_unit_type equal to IDR_W_RADL or IDR_N_LP.

749	   IRAP access unit: An access unit in which the coded picture is an
750	   IRAP picture.

752	   IRAP picture: A coded picture for which each VCL NAL unit has
753	   nal_unit_type in the range of BLA_W_LP (16) to RSV_IRAP_VCL23
754	   (23), inclusive.

756	   layer: A set of VCL NAL units that all have a particular value of
757	   nuh_layer_id and the associated non-VCL NAL units, or one of a
758	   set of syntactical structures having a hierarchical relationship.

760	   operation point: A bitstream created from another bitstream by
761	   operation of the sub-bitstream extraction process with that
762	   other bitstream, a target highest TemporalId, and a target
763	   layer identifier list as inputs.

765	   random access: The act of starting the decoding process for a
766	   bitstream at a point other than the beginning of the bitstream.

768	   sub-layer: A temporal scalable layer of a temporal scalable
769	   bitstream consisting of VCL NAL units with a particular value of
770	   the TemporalId variable, and the associated non-VCL NAL units.

772	   sub-layer representation: A subset of the bitstream consisting of
773	   NAL units of a particular sub-layer and the lower sub-layers.

775	   tile: A rectangular region of coding tree blocks within a
776	   particular tile column and a particular tile row in a picture.

778	   tile column: A rectangular region of coding tree blocks having a
779	   height equal to the height of the picture and a width specified
780	   by syntax elements in the picture parameter set.

782	   tile row: A rectangular region of coding tree blocks having a
783	   height specified by syntax elements in the picture parameter set
784	   and a width equal to the width of the picture.

786	3.1.2 Definitions Specific to This Memo

788	   dependee RTP stream: An RTP stream on which another RTP stream
789	   depends.  All RTP streams in an MSM except for the highest RTP
790	   stream are dependee RTP streams.

792	   highest RTP stream: The RTP stream on which no other RTP stream
793	   depends.  The RTP stream in an SSM is the highest RTP stream.

795	   media aware network element (MANE): A network element, such as a
796	   middlebox, selective forwarding unit, or application layer
797	   gateway that is capable of parsing certain aspects of the RTP
798	   payload headers or the RTP payload and reacting to their
799	   contents.

801	      Informative note: The concept of a MANE goes beyond normal
802	      routers or gateways in that a MANE has to be aware of the
803	      signaling (e.g. to learn about the payload type mappings of
804	      the media streams), and in that it has to be trusted when
805	      working with SRTP.  The advantage of using MANEs is that they
806	      allow packets to be dropped according to the needs of the
807	      media coding.  For example, if a MANE has to drop packets due
808	      to congestion on a certain link, it can identify and remove
809	      those packets whose elimination produces the least adverse
810	      effect on the user experience.  After dropping packets, MANEs
811	      must rewrite RTCP packets to match the changes to the RTP
812	      stream as specified in Section 7 of [RFC3550].

814	   multi-stream mode (MSM): Transmission of an HEVC bitstream using
815	   more than one RTP stream.

817	   NAL unit decoding order: A NAL unit order that conforms to the
818	   constraints on NAL unit order given in Section 7.4.2.4 in [HEVC].

820	   NAL-unit-like structure: A data structure that is similar to NAL
821	   units in the sense that it also has a NAL unit header and a
822	   payload, with a difference that the payload does not follow the
823	   start code emulation prevention mechanism required for the NAL
824	   unit syntax as specified in Section 7.3.1.1 of [HEVC].  Examples
825	   of NAL-unit-like structures defined in this memo are the packet
826	   payloads of AP, PACI, and FU packets.

828	   NALU-time: The value that the RTP timestamp would have if the NAL
829	   unit were transported in its own RTP packet.

831	   RTP stream: See [I-D.ietf-avtext-rtp-grouping-taxonomy].  Within
832	   the scope of this memo, one RTP stream is utilized to transport
833	   one or more temporal sub-layers.

835	   single-stream mode (SSM): Transmission of an HEVC bitstream using
836	   only one RTP stream.

838	   transmission order: The order of packets in ascending RTP
839	   sequence number order (in modulo arithmetic).  Within an
840	   aggregation packet, the NAL unit transmission order is the same
841	   as the order of appearance of NAL units in the packet.

843	3.2 Abbreviations

845	   AP       Aggregation Packet

847	   BLA      Broken Link Access

849	   CRA      Clean Random Access
850	   CTB      Coding Tree Block

852	   CTU      Coding Tree Unit

854	   CVS      Coded Video Sequence

856	   DPH      Decoded Picture Hash

858	   FU       Fragmentation Unit

860	   GDR      Gradual Decoding Refresh

862	   HRD      Hypothetical Reference Decoder

864	   IDR      Instantaneous Decoding Refresh

866	   IRAP     Intra Random Access Point

868	   MANE     Media Aware Network Element

870	   MSM      Multi-Stream Mode

872	   MTU      Maximum Transfer Unit

874	   NAL      Network Abstraction Layer

876	   NALU     Network Abstraction Layer Unit

878	   PACI     PAyload Content Information

880	   PHES     Payload Header Extension Structure

882	   PPS      Picture Parameter Set

884	   RADL     Random Access Decodable Leading (Picture)

886	   RASL     Random Access Skipped Leading (Picture)

888	   RPS      Reference Picture Set

890	   SEI      Supplemental Enhancement Information

892	   SPS      Sequence Parameter Set
893	   SSM      Single-Stream Mode

895	   STSA     Step-wise Temporal Sub-layer Access

897	   TSA      Temporal Sub-layer Access

899	   TCSI     Temporal Scalability Control Information

901	   VCL      Video Coding Layer

903	   VPS      Video Parameter Set

905	4 RTP Payload Format

907	4.1 RTP Header Usage

909	   The format of the RTP header is specified in [RFC3550] and
910	   reprinted in Figure 2 for convenience.  This payload format uses
911	   the fields of the header in a manner consistent with that
912	   specification.

914	   The RTP payload (and the settings for some RTP header bits) for
915	   aggregation packets and fragmentation units are specified in
916	   Sections 4.7 and 4.8, respectively.

918	    0                   1                   2                   3
919	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
920	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
921	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
922	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
923	   |                           timestamp                           |
924	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
925	   |           synchronization source (SSRC) identifier            |
926	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
927	   |            contributing source (CSRC) identifiers             |
928	   |                             ....                              |
929	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

931	               Figure 2 RTP header according to [RFC3550]

933	   The RTP header fields are set according to this RTP payload
934	   format as follows:

936	   Marker bit (M): 1 bit

938	      Set for the last packet, carried in the current RTP stream, of
939	      the access unit, in line with the normal use of the M bit in
940	      video formats, to allow efficient playout buffer handling.
941	      When MSM is in use, if an access unit appears in multiple RTP
942	      streams, the marker bit is set on each RTP stream's last
943	      packet of the access unit.

945	         Informative note: The content of a NAL unit does not tell
946	         whether or not the NAL unit is the last NAL unit, in
947	         decoding order, of an access unit.  An RTP sender
948	         implementation may obtain this information from the video
949	         encoder.  If, however, the implementation cannot obtain
950	         this information directly from the encoder, e.g. when the
951	         bitstream was pre-encoded, and also there is no timestamp
952	         allocated for each NAL unit, then the sender implementation
953	         can inspect subsequent NAL units in decoding order to
954	         determine whether or not the NAL unit is the last NAL unit
955	         of an access unit as follows.  A NAL unit naluX is the last
956	         NAL unit of an access unit if it is the last NAL unit of
957	         the bitstream or the next VCL NAL unit naluY in decoding
958	         order has the high-order bit of the first byte after its
959	         NAL unit header equal to 1, and all NAL units between naluX
960	         and naluY, when present, have nal_unit_type in the range of
961	         32 to 35, inclusive, equal to 39, or in the ranges of 41 to
962	         44, inclusive, or 48 to 55, inclusive.
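
         Informative example: the inspection rule above can be
         transcribed as sketched below.  The sketch assumes that
         the NAL units are available as byte strings in decoding
         order, each including its two-byte NAL unit header.  It
         follows the rule literally and does not attempt to cover
         cases the rule does not address.  All names are
         hypothetical.

```python
from typing import Sequence


def nal_unit_type(nalu: bytes) -> int:
    """Extract the 6-bit Type field from the first header byte."""
    return (nalu[0] >> 1) & 0x3F


def _allowed_between(nal_type: int) -> bool:
    # Types that may appear between naluX and naluY under the rule:
    # 32..35, 39, 41..44, and 48..55 (all ranges inclusive).
    return (32 <= nal_type <= 35 or nal_type == 39
            or 41 <= nal_type <= 44 or 48 <= nal_type <= 55)


def is_last_nal_of_access_unit(nalus: Sequence[bytes], x: int) -> bool:
    """Apply the rule to nalus[x], with nalus in decoding order."""
    if x == len(nalus) - 1:
        return True  # last NAL unit of the bitstream
    for y in range(x + 1, len(nalus)):
        nal_type = nal_unit_type(nalus[y])
        if nal_type < 32:
            # Next VCL NAL unit: test the high-order bit of the first
            # byte after its NAL unit header.
            return len(nalus[y]) > 2 and bool(nalus[y][2] & 0x80)
        if not _allowed_between(nal_type):
            return False
    return False  # no further VCL NAL unit was found
```

         A sender that packetizes a pre-encoded bitstream could call
         such a function for the last NAL unit placed in each packet
         to decide whether to set the marker bit.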

964	   Payload type (PT): 7 bits

966	      The assignment of an RTP payload type for this new packet
967	      format is outside the scope of this document and will not be
968	      specified here.  The assignment of a payload type has to be
969	      performed either through the profile used or in a dynamic way.

971	         Informative note: It is not required to use different
972	         payload type values for different RTP streams in MSM.

974	   Sequence number (SN): 16 bits

976	      Set and used in accordance with RFC 3550 [RFC3550].

978	   Timestamp: 32 bits

980	      The RTP timestamp is set to the sampling timestamp of the
981	      content.  A 90 kHz clock rate MUST be used.

983	      If the NAL unit has no timing properties of its own (e.g.
984	      parameter set and SEI NAL units), the RTP timestamp MUST be
985	      set to the RTP timestamp of the coded picture of the access
986	      unit in which the NAL unit (according to Section 7.4.2.4.4 of
987	      [HEVC]) is included.

989	      Receivers MUST use the RTP timestamp for the display process,
990	      even when the bitstream contains picture timing SEI messages
991	      or decoding unit information SEI messages as specified in
992	      [HEVC].  However, this does not mean that picture timing SEI
993	      messages in the bitstream should be discarded, as picture
994	      timing SEI messages may contain frame-field information that
995	      is important in appropriately rendering interlaced video.
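
      Informative example: with the 90 kHz clock, a sampling instant
      can be mapped to a 32-bit RTP timestamp as sketched below.
      The function name and the initial_timestamp parameter (the
      timestamp assigned to sampling time zero) are assumptions
      made for illustration.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def rtp_timestamp(sampling_time_s, initial_timestamp: int = 0) -> int:
    """Map a sampling time in seconds to a 32-bit RTP timestamp.

    sampling_time_s may be an int, a float, or a Fraction; using a
    Fraction avoids rounding drift for frame rates such as 30 Hz.
    """
    ticks = round(Fraction(sampling_time_s) * RTP_CLOCK_HZ)
    return (initial_timestamp + ticks) % (1 << 32)
```

      At 30 pictures per second, consecutive pictures are 3000
      timestamp units apart, and the result wraps around modulo
      2^32.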

997	   Synchronization source (SSRC): 32 bits

999	      Used to identify the source of the RTP packets.  In SSM, by
1000	      definition a single SSRC is used for all parts of a single
1001	      bitstream.  In MSM, each SSRC is used for an RTP stream
1002	      containing a subset of the sub-layers for a single (temporally
1003	      scalable) bitstream.  A receiver is required to correctly
1004	      associate the set of SSRCs that carry parts of the same
1005	      bitstream.

1007	         Informative note: The term "bitstream" in this document is
1008	         equivalent to the term "encoded stream" in [I-D.ietf-
1009	         avtext-rtp-grouping-taxonomy].

1011	4.2 Payload Header Usage

1013	   The TID value indicates (among other things) the relative
1014	   importance of an RTP packet, for example because NAL units
1015	   belonging to higher temporal sub-layers are not used for the
1016	   decoding of lower temporal sub-layers.  A lower value of TID
1017	   indicates a higher importance.  More important NAL units MAY be
1018	   better protected against transmission losses than less important
1019	   NAL units.
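
   Informative example: an entity that needs to reduce the bit rate
   could use the TID value as sketched below.  The sketch assumes
   that each packet payload starts with the two-byte payload header
   and reads TID from the three least significant bits of its second
   byte.  The function names are hypothetical, and the sketch ignores
   any other considerations that apply when removing sub-layers.

```python
from typing import Iterable, List


def payload_tid(payload: bytes) -> int:
    """Return the TID field (3 bits) of the two-byte payload header."""
    return payload[1] & 0x07


def drop_above_tid(payloads: Iterable[bytes], max_tid: int) -> List[bytes]:
    """Keep only payloads whose TID does not exceed max_tid.

    A lower TID indicates a higher importance, so the payloads that
    remain are the more important ones.
    """
    return [p for p in payloads if payload_tid(p) <= max_tid]
```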

1021	4.3 Payload Structures

1023	   The first two bytes of the payload of an RTP packet are referred
1024	   to as the payload header.  The payload header consists of the
1025	   same fields (F, Type, LayerId, and TID) as the NAL unit header as
1026	   shown in section 1.1.4, irrespective of the type of the payload
1027	   structure.

1029	   Four different types of RTP packet payload structures are
1030	   specified.  A receiver can identify the type of an RTP packet
1031	   payload through the Type field in the payload header.

1033	   The four different payload structures are as follows:

1035	   o  Single NAL unit packet: Contains a single NAL unit in the
1036	      payload, and the NAL unit header of the NAL unit also serves
1037	      as the payload header.  This payload structure is specified in
1038	      section 4.6.

1040	   o  Aggregation packet (AP): Contains more than one NAL unit
1041	      within one access unit.  This payload structure is specified
1042	      in section 4.7.

1044	   o  Fragmentation unit (FU): Contains a subset of a single NAL
1045	      unit.  This payload structure is specified in section 4.8.

1047	   o  PACI carrying RTP packet: Contains a payload header (that
1048	      differs from other payload headers for efficiency), a Payload
1049	      Header Extension Structure (PHES), and a PACI payload.  This
1050	      payload structure is specified in section 4.9.
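
   Informative example: a receiver could identify the payload
   structure from the Type field as sketched below.  Type 48 for
   aggregation packets is shown in Figure 4.  The values 49 for
   fragmentation units and 50 for PACI packets are assumptions taken
   from Sections 4.8 and 4.9 and should be checked against those
   sections.  The function name is hypothetical.

```python
AP_TYPE = 48    # aggregation packet (Figure 4)
FU_TYPE = 49    # fragmentation unit (assumed, see Section 4.8)
PACI_TYPE = 50  # PACI carrying RTP packet (assumed, see Section 4.9)


def payload_structure(payload: bytes) -> str:
    """Classify an RTP payload by the Type field of its payload header."""
    if len(payload) < 2:
        raise ValueError("payload is shorter than the payload header")
    nal_type = (payload[0] >> 1) & 0x3F
    if nal_type == AP_TYPE:
        return "AP"
    if nal_type == FU_TYPE:
        return "FU"
    if nal_type == PACI_TYPE:
        return "PACI"
    # Any other Type value is treated here as a single NAL unit packet.
    return "single NAL unit packet"
```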

1052	4.4 Transmission Modes

1054	   This memo enables transmission of an HEVC bitstream over a single
1055	   RTP stream or multiple RTP streams.  The concept and working
1056	   principle are inherited from the design of what was called
1057	   single-session and multi-session transmission in [RFC6190].
1058	   If only one RTP stream is used for transmission
1059	   of the HEVC bitstream, the transmission mode is referred to as
1060	   single-stream mode (SSM); otherwise (more than one RTP stream is
1061	   used for transmission of the HEVC bitstream), the transmission
1062	   mode is referred to as multi-stream mode (MSM).

1064	   Dependency of one RTP stream on another RTP stream is typically
1065	   indicated as specified in [RFC5583].  When an RTP stream A
1066	   depends on another RTP stream B, the RTP stream B is referred to
1067	   as a dependee RTP stream of the RTP stream A.

1069	      Informative note: An MSM may involve one or more RTP sessions.
1070	      Each RTP stream in an MSM may be in its own RTP session or a
1071	      set of multiple RTP streams in an MSM may belong to the same
1072	      RTP session, e.g. as indicated by the mechanism specified in
1073	      the Internet-Draft [I-D.ietf-avtcore-rtp-multi-stream] or in
1074	      [I-D.ietf-mmusic-sdp-bundle-negotiation].

1076	   SSM SHOULD be used for point-to-point unicast scenarios, while
1077	   MSM SHOULD be used for point-to-multipoint multicast scenarios
1078	   where different receivers require different operation points of
1079	   the same HEVC bitstream, to improve bandwidth utilization
1080	   efficiency.

1082	      Informative note: A multicast may degrade to a unicast after
1083	      all receivers but one have left (this is a justification of
1084	      the first "SHOULD" instead of "MUST"), and there might be
1085	      scenarios where MSM is desirable but not possible, e.g. when
1086	      IP multicast is not deployed in a certain network (this is a
1087	      justification of the second "SHOULD" instead of "MUST").

1089	   The transmission mode is indicated by the tx-mode media parameter
1090	   (see section 7.1).  If tx-mode is equal to "SSM", SSM MUST be
1091	   used.  Otherwise (tx-mode is equal to "MSM"), MSM MUST be used.

1093	   Receivers MUST support both SSM and MSM.

1095	4.5 Decoding Order Number

1097	   For each NAL unit, the variable AbsDon is derived, representing
1098	   the decoding order number that is indicative of the NAL unit
1099	   decoding order.

1101	   Let NAL unit n be the n-th NAL unit in transmission order within
1102	   an RTP stream.

1104	   If tx-mode is equal to "SSM" and sprop-max-don-diff is equal to
1105	   0, AbsDon[n], the value of AbsDon for NAL unit n, is derived as
1106	   equal to n.

1108	   Otherwise (tx-mode is equal to "MSM" or sprop-max-don-diff is
1109	   greater than 0), AbsDon[n] is derived as follows, where DON[n] is
1110	   the value of the variable DON for NAL unit n:

1112	   o  If n is equal to 0 (i.e. NAL unit n is the very first NAL unit
1113	      in transmission order), AbsDon[0] is set equal to DON[0].

1115	   o  Otherwise (n is greater than 0), the following applies for
1116	      derivation of AbsDon[n]:

1118	            If DON[n] == DON[n-1],
1119	                AbsDon[n] = AbsDon[n-1]

1121	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] < 32768),
1122	                AbsDon[n] = AbsDon[n-1] + DON[n] - DON[n-1]

1124	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] >= 32768),
1125	                AbsDon[n] = AbsDon[n-1] + 65536 - DON[n-1] + DON[n]

1127	            If (DON[n] > DON[n-1] and DON[n] - DON[n-1] >= 32768),
1128	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] + 65536 -
1129	                                           DON[n])

1131	            If (DON[n] < DON[n-1] and DON[n-1] - DON[n] < 32768),
1132	                AbsDon[n] = AbsDon[n-1] - (DON[n-1] - DON[n])
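
   Informative example: the derivation for the case where DON values
   are present can be sketched as follows.  The input is assumed to
   be the list of 16-bit DON values of the NAL units in transmission
   order within an RTP stream; the function name is hypothetical.

```python
from typing import List


def derive_abs_don(dons: List[int]) -> List[int]:
    """Derive AbsDon[n] from the 16-bit DON[n] values."""
    abs_dons: List[int] = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)  # AbsDon[0] = DON[0]
            continue
        prev, prev_abs = dons[n - 1], abs_dons[n - 1]
        if don == prev:
            cur = prev_abs
        elif don > prev and don - prev < 32768:
            cur = prev_abs + don - prev               # forward step
        elif don < prev and prev - don >= 32768:
            cur = prev_abs + 65536 - prev + don       # forward, wrapped
        elif don > prev and don - prev >= 32768:
            cur = prev_abs - (prev + 65536 - don)     # backward, wrapped
        else:  # don < prev and prev - don < 32768
            cur = prev_abs - (prev - don)             # backward step
        abs_dons.append(cur)
    return abs_dons
```

   For the DON sequence 65534, 65535, 0, 2, 1, this yields the AbsDon
   values 65534, 65535, 65536, 65538, 65537: the wrap from 65535 to 0
   is treated as a forward step, and the final value indicates a NAL
   unit that precedes its predecessor in decoding order.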

1134	   For any two NAL units m and n, the following applies:

1136	   o  AbsDon[n] greater than AbsDon[m] indicates that NAL unit n
1137	      follows NAL unit m in NAL unit decoding order.

1139	   o  When AbsDon[n] is equal to AbsDon[m], the NAL unit decoding
1140	      order of the two NAL units can be in either order.

1142	   o  AbsDon[n] less than AbsDon[m] indicates that NAL unit n
1143	      precedes NAL unit m in decoding order.

1145	   When two consecutive NAL units in the NAL unit decoding order
1146	   have different values of AbsDon, the value of AbsDon for the
1147	   second NAL unit in decoding order MUST be greater than the value
1148	   of AbsDon for the first NAL unit, and the absolute difference
1149	   between the two AbsDon values MAY be greater than or equal to 1.

1151	      Informative note: There are multiple reasons to allow for the
1152	      absolute difference of the values of AbsDon for two
1153	      consecutive NAL units in the NAL unit decoding order to be
1154	      greater than one.  An increment by one is not required, as at
1155	      the time of associating values of AbsDon to NAL units, it may
1156	      not be known whether all NAL units are to be delivered to the
1157	      receiver.  For example, a gateway may not forward VCL NAL
1158	      units of higher sub-layers or some SEI NAL units when there is
1159	      congestion in the network.  In another example, the first
1160	      intra-coded picture of a pre-encoded clip is transmitted in
1161	      advance to ensure that it is readily available in the
1162	      receiver, and when transmitting the first intra-coded picture,
1163	      the originator does not exactly know how many NAL units will
1164	      be encoded before the first intra-coded picture of the pre-
1165	      encoded clip follows in decoding order.  Thus, the values of
1166	      AbsDon for the NAL units of the first intra-coded picture of
1167	      the pre-encoded clip have to be estimated when they are
1168	      transmitted, and gaps in values of AbsDon may occur.  Another
1169	      example is MSM where the AbsDon values must indicate cross-
1170	      layer decoding order for NAL units conveyed in all the RTP
1171	      streams.

1173	4.6 Single NAL Unit Packets

1175	   A single NAL unit packet contains exactly one NAL unit, and
1176	   consists of a payload header (denoted as PayloadHdr), a
1177	   conditional 16-bit DONL field (in network byte order), and the
1178	   NAL unit payload data (the NAL unit excluding its NAL unit
1179	   header) of the contained NAL unit, as shown in Figure 3.

1181	    0                   1                   2                   3
1182	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1183	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1184	   |           PayloadHdr          |      DONL (conditional)       |
1185	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1186	   |                                                               |
1187	   |                  NAL unit payload data                        |
1188	   |                                                               |
1189	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1190	   |                               :...OPTIONAL RTP padding        |
1191	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1193	            Figure 3 The structure of a single NAL unit packet

1195	   The payload header SHOULD be an exact copy of the NAL unit header
1196	   of the contained NAL unit.  However, the Type (i.e.
1197	   nal_unit_type) field MAY be changed, e.g. when it is desirable to
1198	   handle a CRA picture as a BLA picture [JCTVC-J0107].

1200	   The DONL field, when present, specifies the value of the 16 least
1201	   significant bits of the decoding order number of the contained
1202	   NAL unit.  If tx-mode is equal to "MSM" or sprop-max-don-diff is
1203	   greater than 0, the DONL field MUST be present, and the variable
1204	   DON for the contained NAL unit is derived as equal to the value
1205	   of the DONL field.  Otherwise (tx-mode is equal to "SSM" and
1206	   sprop-max-don-diff is equal to 0), the DONL field MUST NOT be
1207	   present.
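
   As an illustration of Figure 3 and the DONL rule above, the
   following Python sketch parses a single NAL unit packet.  It is not
   part of this specification.  The function names are hypothetical,
   RTP padding is assumed to have been removed already, and the payload
   header is taken to use the HEVC NAL unit header layout (F: 1 bit,
   Type: 6 bits, LayerId: 6 bits, TID: 3 bits).

```python
import struct


def parse_payload_hdr(two_octets):
    """Split a 16-bit payload header into F, Type, LayerId and TID."""
    (hdr,) = struct.unpack("!H", bytes(two_octets[:2]))
    return {
        "F": hdr >> 15,
        "Type": (hdr >> 9) & 0x3F,
        "LayerId": (hdr >> 3) & 0x3F,
        "TID": hdr & 0x07,
    }


def donl_present(tx_mode, sprop_max_don_diff):
    """DONL is present for MSM, or when sprop-max-don-diff > 0."""
    return tx_mode == "MSM" or sprop_max_don_diff > 0


def parse_single_nal_unit_packet(payload, tx_mode="SSM",
                                 sprop_max_don_diff=0):
    """Return (header fields, DON or None, reconstructed NAL unit)."""
    if len(payload) < 2:
        raise ValueError("payload too short for a payload header")
    fields = parse_payload_hdr(payload)
    offset, don = 2, None
    if donl_present(tx_mode, sprop_max_don_diff):
        if len(payload) < 4:
            raise ValueError("payload too short for the DONL field")
        (don,) = struct.unpack("!H", payload[2:4])
        offset = 4
    # The payload header doubles as the NAL unit header, so the NAL
    # unit is the header followed by the NAL unit payload data.
    nal_unit = bytes(payload[:2]) + bytes(payload[offset:])
    return fields, don, nal_unit
```

   Because the sender may have changed the Type field, the
   reconstructed header is the payload header as received, which is
   not necessarily the header that the encoder produced.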

1209	4.7 Aggregation Packets (APs)

1211	   Aggregation packets (APs) are introduced to enable the reduction
1212	   of packetization overhead for small NAL units, such as most of
1213	   the non-VCL NAL units, which are often only a few octets in size.

1215	   An AP aggregates NAL units within one access unit.  Each NAL unit
1216	   to be carried in an AP is encapsulated in an aggregation unit.
1217	   NAL units aggregated in one AP are in NAL unit decoding order.

1219	   An AP consists of a payload header (denoted as PayloadHdr)
1220	   followed by two or more aggregation units, as shown in Figure 4.

1222	   0                   1                   2                   3
1223	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1224	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1225	   |    PayloadHdr (Type=48)       |                               |
1226	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1227	   |                                                               |
1228	   |             two or more aggregation units                     |
1229	   |                                                               |
1230	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1231	   |                               :...OPTIONAL RTP padding        |
1232	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1234	            Figure 4 The structure of an aggregation packet

1236	   The fields in the payload header are set as follows.  The F bit
1237	   MUST be equal to 0 if the F bit of each aggregated NAL unit is
1238	   equal to zero; otherwise, it MUST be equal to 1.  The Type field
1239	   MUST be equal to 48.  The value of LayerId MUST be equal to the
1240	   lowest value of LayerId of all the aggregated NAL units.  The
1241	   value of TID MUST be the lowest value of TID of all the
1242	   aggregated NAL units.

1244	      Informative Note: All VCL NAL units in an AP have the same TID
1245	      value since they belong to the same access unit.  However, an
1246	      AP may contain non-VCL NAL units for which the TID value in
1247	      the NAL unit header may be different than the TID value of the
1248	      VCL NAL units in the same AP.
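
   The payload header rules for an AP can be sketched as follows.  This
   is a minimal illustration rather than a normative procedure; the
   function name is hypothetical, and each input is assumed to be a
   complete NAL unit starting with its two-octet NAL unit header.

```python
AP_TYPE = 48


def build_ap_payload_hdr(nal_units):
    """Derive the two-octet AP payload header from the aggregated NAL units."""
    if len(nal_units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    f_bit, layer_id, tid = 0, 0x3F, 0x07
    for nalu in nal_units:
        hdr = int.from_bytes(nalu[:2], "big")
        f_bit |= hdr >> 15                             # 1 if any F bit is 1
        layer_id = min(layer_id, (hdr >> 3) & 0x3F)    # lowest LayerId
        tid = min(tid, hdr & 0x07)                     # lowest TID
    hdr = (f_bit << 15) | (AP_TYPE << 9) | (layer_id << 3) | tid
    return hdr.to_bytes(2, "big")
```

   For example, aggregating a NAL unit whose header is 0x4001 (TID
   field 1) with one whose header is 0x0202 (TID field 2) gives the AP
   payload header 0x6001.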

1250	   An AP MUST carry at least two aggregation units and can carry as
1251	   many aggregation units as necessary; however, the total amount of
1252	   data in an AP obviously MUST fit into an IP packet, and the size
1253	   SHOULD be chosen so that the resulting IP packet is smaller than
1254	   the MTU size so as to avoid IP layer fragmentation.  An AP MUST NOT
1255	   contain Fragmentation Units (FUs) specified in section 4.8.  APs
1256	   MUST NOT be nested; i.e. an AP MUST NOT contain another AP.

1258	   The first aggregation unit in an AP consists of a conditional 16-
1259	   bit DONL field (in network byte order) followed by a 16-bit
1260	   unsigned size information (in network byte order) that indicates
1261	   the size of the NAL unit in bytes (excluding these two octets,
1262	   but including the NAL unit header), followed by the NAL unit
1263	   itself, including its NAL unit header, as shown in Figure 5.

1265	   0                   1                   2                   3
1266	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1267	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1268	                   :       DONL (conditional)      |   NALU size   |
1269	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1270	   |   NALU size   |                                               |
1271	   +-+-+-+-+-+-+-+-+         NAL unit                              |
1272	   |                                                               |
1273	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1274	   |                               :
1275	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1277	     Figure 5 The structure of the first aggregation unit in an AP

1279	   The DONL field, when present, specifies the value of the 16 least
1280	   significant bits of the decoding order number of the aggregated
1281	   NAL unit.

1283	   If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
1284	   than 0, the DONL field MUST be present in an aggregation unit
1285	   that is the first aggregation unit in an AP, and the variable DON
1286	   for the aggregated NAL unit is derived as equal to the value of
1287	   the DONL field.  Otherwise (tx-mode is equal to "SSM" and sprop-
1288	   max-don-diff is equal to 0), the DONL field MUST NOT be present
1289	   in an aggregation unit that is the first aggregation unit in an
1290	   AP.

1292	   An aggregation unit that is not the first aggregation unit in an
1293	   AP consists of a conditional 8-bit DOND field followed by a 16-
1294	   bit unsigned size information (in network byte order) that
1295	   indicates the size of the NAL unit in bytes (excluding these two
1296	   octets, but including the NAL unit header), followed by the NAL
1297	   unit itself, including its NAL unit header, as shown in Figure 6.

1299	   0                   1                   2                   3
1300	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1301	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1302	                   : DOND (cond)   |          NALU size            |
1303	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1304	   |                                                               |
1305	   |                       NAL unit                                |
1306	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1307	   |                               :
1308	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1310	     Figure 6 The structure of an aggregation unit that is not the
1311	                    first aggregation unit in an AP

1313	   When present, the DOND field plus 1 specifies the difference
1314	   between the decoding order number values of the current
1315	   aggregated NAL unit and the preceding aggregated NAL unit in the
1316	   same AP.

1318	   If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
1319	   than 0, the DOND field MUST be present in an aggregation unit
1320	   that is not the first aggregation unit in an AP, and the variable
1321	   DON for the aggregated NAL unit is derived as equal to the DON of
1322	   the preceding aggregated NAL unit in the same AP plus the value
1323	   of the DOND field plus 1 modulo 65536.  Otherwise (tx-mode is
1324	   equal to "SSM" and sprop-max-don-diff is equal to 0), the DOND
1325	   field MUST NOT be present in an aggregation unit that is not the
1326	   first aggregation unit in an AP, and in this case the
1327	   transmission order and decoding order of NAL units carried in the
1328	   AP are the same as the order the NAL units appear in the AP.
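
   The aggregation unit layouts and the DON derivation described above
   can be sketched in Python as shown here.  This is an illustrative
   sketch only.  The function name and the don_fields_present
   parameter are hypothetical; the parameter stands for the condition
   "tx-mode is equal to MSM or sprop-max-don-diff is greater than 0".
   RTP padding is assumed to have been removed before parsing.

```python
def parse_ap(payload, don_fields_present):
    """Return a list of (DON or None, NAL unit) for one aggregation packet."""
    if len(payload) < 2 or ((payload[0] >> 1) & 0x3F) != 48:
        raise ValueError("not an aggregation packet (Type != 48)")
    pos = 2  # skip the two-octet PayloadHdr

    def read(n):
        nonlocal pos
        if pos + n > len(payload):
            raise ValueError("truncated aggregation packet")
        chunk = bytes(payload[pos:pos + n])
        pos += n
        return chunk

    units, don = [], None
    while pos < len(payload):
        if don_fields_present:
            if not units:
                # First aggregation unit: 16-bit DONL gives DON directly.
                don = int.from_bytes(read(2), "big")
            else:
                # Later units: DON = previous DON + DOND + 1, modulo 65536.
                don = (don + read(1)[0] + 1) % 65536
        size = int.from_bytes(read(2), "big")
        units.append((don, read(size)))
    if len(units) < 2:
        raise ValueError("an AP carries at least two aggregation units")
    return units
```

   When the DON fields are absent, the list order is both the
   transmission order and the decoding order of the NAL units.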

1330	   Figure 7 presents an example of an AP that contains two
1331	   aggregation units, labeled as 1 and 2 in the figure, without the
1332	   DONL and DOND fields being present.

1334	    0                   1                   2                   3
1335	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1336	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1337	   |                          RTP Header                           |
1338	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1339	   |   PayloadHdr (Type=48)        |         NALU 1 Size           |
1340	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1341	   |          NALU 1 HDR           |                               |
1342	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         NALU 1 Data           |
1343	   |                   . . .                                       |
1344	   |                                                               |
1345	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1346	   |  . . .        | NALU 2 Size                   | NALU 2 HDR    |
1347	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1348	   | NALU 2 HDR    |                                               |
1349	   +-+-+-+-+-+-+-+-+              NALU 2 Data                      |
1350	   |                   . . .                                       |
1351	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1352	   |                               :...OPTIONAL RTP padding        |
1353	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1355	     Figure 7 An example of an AP packet containing two aggregation
1356	                 units without the DONL and DOND fields

1358	   Figure 8 presents an example of an AP that contains two
1359	   aggregation units, labeled as 1 and 2 in the figure, with the
1360	   DONL and DOND fields being present.

1362	    0                   1                   2                   3
1363	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1364	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1365	   |                          RTP Header                           |
1366	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1367	   |   PayloadHdr (Type=48)        |        NALU 1 DONL            |
1368	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1369	   |          NALU 1 Size          |            NALU 1 HDR         |
1370	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1371	   |                                                               |
1372	   |                 NALU 1 Data   . . .                           |
1373	   |                                                               |
1374	   +     . . .     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1375	   |               |  NALU 2 DOND  |          NALU 2 Size          |
1376	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1377	   |          NALU 2 HDR           |                               |
1378	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+          NALU 2 Data          |
1379	   |                                                               |
1380	   |        . . .                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1381	   |                               :...OPTIONAL RTP padding        |
1382	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1384	     Figure 8 An example of an AP containing two aggregation units
1385	                     with the DONL and DOND fields

1387	4.8 Fragmentation Units (FUs)

1389	   Fragmentation units (FUs) are introduced to enable fragmenting a
1390	   single NAL unit into multiple RTP packets, possibly without
1391	   cooperation or knowledge of the HEVC encoder.  A fragment of a NAL
1392	   unit consists of an integer number of consecutive octets of that
1393	   NAL unit.  Fragments of the same NAL unit MUST be sent in consecutive
1394	   order with ascending RTP sequence numbers (with no other RTP packets
1395	   within the same RTP stream being sent between the first and last
1396	   fragment).

1398	   When a NAL unit is fragmented and conveyed within FUs, it is
1399	   referred to as a fragmented NAL unit.  APs MUST NOT be
1400	   fragmented.  FUs MUST NOT be nested; i.e. an FU MUST NOT contain
1401	   a subset of another FU.

1403	   The RTP timestamp of an RTP packet carrying an FU is set to the
1404	   NALU-time of the fragmented NAL unit.

1406	   An FU consists of a payload header (denoted as PayloadHdr), an FU
1407	   header of one octet, a conditional 16-bit DONL field (in network
1408	   byte order), and an FU payload, as shown in Figure 9.

1410	    0                   1                   2                   3
1411	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1412	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1413	   |    PayloadHdr (Type=49)       |   FU header   | DONL (cond)   |
1414	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1415	   | DONL (cond)   |                                               |
1416	   |-+-+-+-+-+-+-+-+                                               |
1417	   |                         FU payload                            |
1418	   |                                                               |
1419	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1420	   |                               :...OPTIONAL RTP padding        |
1421	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1423	                    Figure 9 The structure of an FU

1425	   The fields in the payload header are set as follows.  The Type
1426	   field MUST be equal to 49.  The fields F, LayerId, and TID MUST
1427	   be equal to the fields F, LayerId, and TID, respectively, of the
1428	   fragmented NAL unit.

1430	   The FU header consists of an S bit, an E bit, and a 6-bit FuType
1431	   field, as shown in Figure 10.

1433	                            +---------------+
1434	                            |0|1|2|3|4|5|6|7|
1435	                            +-+-+-+-+-+-+-+-+
1436	                            |S|E|  FuType   |
1437	                            +---------------+

1439	                 Figure 10   The structure of FU header

1441	   The semantics of the FU header fields are as follows:
1442	   S: 1 bit
1443	      When set to one, the S bit indicates the start of a fragmented
1444	      NAL unit, i.e. the first byte of the FU payload is also the
1445	      first byte of the payload of the fragmented NAL unit.  When
1446	      the FU payload is not the start of the fragmented NAL unit
1447	      payload, the S bit MUST be set to zero.

1449	   E: 1 bit
1450	      When set to one, the E bit indicates the end of a fragmented
1451	      NAL unit, i.e. the last byte of the payload is also the last
1452	      byte of the fragmented NAL unit.  When the FU payload is not
1453	      the last fragment of a fragmented NAL unit, the E bit MUST be
1454	      set to zero.

1456	   FuType: 6 bits
1457	      The field FuType MUST be equal to the field Type of the
1458	      fragmented NAL unit.

1460	   The DONL field, when present, specifies the value of the 16 least
1461	   significant bits of the decoding order number of the fragmented
1462	   NAL unit.

1464	   If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
1465	   than 0, and the S bit is equal to 1, the DONL field MUST be
1466	   present in the FU, and the variable DON for the fragmented NAL
1467	   unit is derived as equal to the value of the DONL field.
1468	   Otherwise (tx-mode is equal to "SSM" and sprop-max-don-diff is
1469	   equal to 0, or the S bit is equal to 0), the DONL field MUST NOT
1470	   be present in the FU.

1472	   A non-fragmented NAL unit MUST NOT be transmitted in one FU; i.e.
1473	   the Start bit and End bit MUST NOT both be set to one in the same
1474	   FU header.

1476	   The FU payload consists of fragments of the payload of the
1477	   fragmented NAL unit so that if the FU payloads of consecutive
1478	   FUs, starting with an FU with the S bit equal to 1 and ending
1479	   with an FU with the E bit equal to 1, are sequentially
1480	   concatenated, the payload of the fragmented NAL unit can be
1481	   reconstructed.  The NAL unit header of the fragmented NAL unit is
1482	   not included as such in the FU payload, but rather the
1483	   information of the NAL unit header of the fragmented NAL unit is
1484	   conveyed in F, LayerId, and TID fields of the FU payload headers
1485	   of the FUs and the FuType field of the FU header of the FUs.  An
1486	   FU payload MUST NOT be empty.
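
   The FU rules above can be illustrated with the following Python
   sketch of fragmentation and reassembly.  It is an example under
   stated assumptions, not a normative algorithm: the function names
   and the max_fu_payload parameter are hypothetical, max_fu_payload
   limits only the FU payload octets (the sketch does not budget for
   the headers or DONL), and reassembly assumes that all fragments
   arrived in order.

```python
FU_TYPE = 49


def fragment_nal_unit(nalu, max_fu_payload, donl=None):
    """Split one NAL unit into FU packet payloads (at least two)."""
    if max_fu_payload < 1:
        raise ValueError("an FU payload must not be empty")
    hdr = int.from_bytes(nalu[:2], "big")
    nal_type = (hdr >> 9) & 0x3F
    body = nalu[2:]  # the NAL unit header itself is not carried
    chunks = [body[i:i + max_fu_payload]
              for i in range(0, len(body), max_fu_payload)]
    if len(chunks) < 2:
        raise ValueError("fits in one packet; use a single NAL unit packet")
    # Keep F, LayerId and TID; replace Type with 49.
    payload_hdr = ((hdr & 0x81FF) | (FU_TYPE << 9)).to_bytes(2, "big")
    packets = []
    for i, chunk in enumerate(chunks):
        s = 1 if i == 0 else 0
        e = 1 if i == len(chunks) - 1 else 0
        fu_header = bytes([(s << 7) | (e << 6) | nal_type])
        # DONL is carried only in the FU with the S bit set.
        don = donl.to_bytes(2, "big") if (s and donl is not None) else b""
        packets.append(payload_hdr + fu_header + don + bytes(chunk))
    return packets


def reassemble_fus(packets, donl_present=False):
    """Rebuild the NAL unit from a complete, ordered list of FUs."""
    hdr = int.from_bytes(packets[0][:2], "big")
    fu_type = packets[0][2] & 0x3F
    nal_hdr = ((hdr & 0x81FF) | (fu_type << 9)).to_bytes(2, "big")
    body = b""
    for i, pkt in enumerate(packets):
        s, e = pkt[2] >> 7, (pkt[2] >> 6) & 1
        if s != (i == 0) or e != (i == len(packets) - 1):
            raise ValueError("unexpected S/E bits for this position")
        offset = 5 if (s and donl_present) else 3
        body += bytes(pkt[offset:])
    return nal_hdr + body
```

   A receiver that detects a missing fragment would not call
   reassemble_fus on the remainder unless its decoder is known to
   handle incomplete NAL units.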

1488	   If an FU is lost, the receiver SHOULD discard all following
1489	   fragmentation units in transmission order corresponding to the
1490	   same fragmented NAL unit, unless the decoder in the receiver is
1491	   known to be prepared to gracefully handle incomplete NAL units.

1493	   A receiver in an endpoint or in a MANE MAY aggregate the first n-
1494	   1 fragments of a NAL unit to an (incomplete) NAL unit, even if
1495	   fragment n of that NAL unit is not received.  In this case, the
1496	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate
1497	   a syntax violation.

1499	4.9 PACI packets

1501	   This section specifies the PACI packet structure.  The basic
1502	   payload header specified in this memo is intentionally limited to
1503	   the 16 bits of the NAL unit header so as to keep the packetization
1504	   overhead to a minimum.  However, cases have been identified where
1505	   it is advisable to include control information in an easily
1506	   accessible position in the packet header, despite the additional
1507	   overhead.  One such piece of control information is the Temporal
1508	   Scalability Control Information as specified in section 4.10
1509	   below.  PACI packets carry this and future, similar structures.

1511	   The PACI packet structure is based on a payload header extension
1512	   mechanism that is generic and extensible to carry payload header
1513	   extensions.  In this section, the focus lies on the use within
1514	   this specification.  Section 4.9.2 below provides guidance for
1515	   the specification designers in how to employ the extension
1516	   mechanism in future specifications.

1518	   A PACI packet consists of a payload header (denoted as
1519	   PayloadHdr), for which the structure follows what is described in
1520	   section 4.3 above.  The payload header is followed by the fields
1521	   A, cType, PHSsize, F[0..2] and Y.

1523	   Figure 11 shows a PACI packet in compliance with this memo; that
1524	   is, without any extensions.

1526	    0                   1                   2                   3
1527	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1528	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1529	   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1530	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1531	   |        Payload Header Extension Structure (PHES)              |
1532	   |=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=|
1533	   |                                                               |
1534	   |                  PACI payload: NAL unit                       |
1535	   |                   . . .                                       |
1536	   |                                                               |
1537	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1538	   |                               :...OPTIONAL RTP padding        |
1539	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1547	                  Figure 11   The structure of a PACI

1549	   The fields in the payload header are set as follows.  The F bit
1550	   MUST be equal to 0.  The Type field MUST be equal to 50.  The
1551	   value of LayerId MUST be a copy of the LayerId field of the PACI
1552	   payload NAL unit or NAL-unit-like structure.  The value of TID
1553	   MUST be a copy of the TID field of the PACI payload NAL unit or
1554	   NAL-unit-like structure.

1556	   The semantics of other fields are as follows:

1558	   A: 1 bit
1559	      Copy of the F bit of the PACI payload NAL unit or NAL-unit-
1560	      like structure.

1562	   cType: 6 bits
1563	      Copy of the Type field of the PACI payload NAL unit or NAL-
1564	      unit-like structure.

1566	   PHSsize: 5 bits
1567	      Indicates the length of the PHES field.  The value is limited
1568	      to be less than or equal to 32 octets, to simplify encoder
1569	      design for MTU size matching.

1571	   F0
1572	      This field equal to 1 specifies the presence of a temporal
1573	      scalability support extension in the PHES.

1575	   F1, F2
1576	      MUST be 0, available for future extensions, see section 4.9.2.

1578	   Y: 1 bit
1579	      MUST be 0, available for future extensions, see section 4.9.2.

1581	   PHES: variable number of octets
1582	      A variable number of octets as indicated by the value of
1583	      PHSsize.

1585	   PACI Payload
1586	      The single NAL unit packet or NAL-unit-like structure (such
1587	      as: FU or AP) to be carried, not including the first two
1588	      octets.

1590	         Informative note: The first two octets of the NAL unit or
1591	         NAL-unit-like structure carried in the PACI payload are not
1592	         included in the PACI payload. Rather, the respective values
1593	         are copied in locations of the PayloadHdr of the RTP
1594	         packet.  This design offers two advantages: first, the
1595	         overall structure of the payload header is preserved, i.e.
1596	         there is no special case of payload header structure that
1597	         needs to be implemented for PACI.  Second, no additional
1598	         overhead is introduced.

1600	      A PACI payload MAY be a single NAL unit, an FU, or an AP.
1601	      PACIs MUST NOT be fragmented or aggregated.  The following
1602	      subsection documents the reasons for these design choices.
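
   The field layout of a PACI can be illustrated with the following
   Python sketch, which is not part of this specification.  The
   function name is hypothetical.  The sketch handles only the case
   defined in this memo, in which the Y bit is 0, and it assumes that
   RTP padding has been removed.  It rebuilds the carried structure by
   combining A and cType with the LayerId and TID of the PayloadHdr.

```python
PACI_TYPE = 50


def parse_paci(payload):
    """Split a PACI into its header fields, PHES and carried structure."""
    if len(payload) < 4:
        raise ValueError("payload too short for a PACI")
    hdr = int.from_bytes(payload[0:2], "big")
    if (hdr >> 9) & 0x3F != PACI_TYPE:
        raise ValueError("not a PACI (Type != 50)")
    ext = int.from_bytes(payload[2:4], "big")
    fields = {
        "A": ext >> 15,               # copy of the F bit of the payload
        "cType": (ext >> 9) & 0x3F,   # copy of the Type of the payload
        "PHSsize": (ext >> 4) & 0x1F,
        "F0": (ext >> 3) & 1,
        "F1": (ext >> 2) & 1,
        "F2": (ext >> 1) & 1,
        "Y": ext & 1,
    }
    if fields["Y"]:
        # Flag-extensions are outside the scope of this sketch.
        raise ValueError("Y bit set; flag-extensions not handled here")
    end_of_phes = 4 + fields["PHSsize"]
    if len(payload) < end_of_phes:
        raise ValueError("PHES is truncated")
    phes = bytes(payload[4:end_of_phes])
    # Restore the two octets that were moved into the PACI header:
    # F and Type come from A and cType, LayerId and TID from PayloadHdr.
    inner_hdr = (fields["A"] << 15) | (fields["cType"] << 9) | (hdr & 0x01FF)
    inner = inner_hdr.to_bytes(2, "big") + bytes(payload[end_of_phes:])
    return fields, phes, inner
```

   If cType is 48 or 49, the rebuilt structure is an AP or an FU and
   would be processed accordingly; otherwise it is a single NAL unit.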

1604	4.9.1 Reasons for the PACI rules (informative)

1606	   A PACI cannot be fragmented.  If a PACI could be fragmented, and
1607	   a fragment other than the first fragment were lost, access
1608	   to the information in the PACI would not be possible.  Therefore,
1609	   a PACI must not be fragmented.  In other words, an FU must not
1610	   carry (fragments of) a PACI.

1612	   A PACI cannot be aggregated.  Aggregation of PACIs is inadvisable
1613	   from a compression viewpoint, as, in many cases, several to be
1614	   aggregated NAL units would share identical PACI fields and values
1615	   which would be carried redundantly for no reason.  Most, if not
1616	   all, of the practical effects of PACI aggregation can be achieved
1617	   by aggregating NAL units and bundling them with a PACI (see below).
1618	   Therefore, a PACI must not be aggregated.  In other words, an AP
1619	   must not contain a PACI.

1621	   The payload of a PACI can be a fragment.  Both middleboxes and
1622	   sending systems with inflexible (often hardware-based) encoders
1623	   occasionally find themselves in situations where a PACI and its
1624	   headers, combined, are larger than the MTU size.  In such a
1625	   scenario, the middlebox or sender can fragment the NAL unit and
1626	   encapsulate the fragment in a PACI.  Doing so preserves the
1627	   payload header extension information for all fragments, allowing
1628	   downstream middleboxes and the receiver to take advantage of that
1629	   information.  Therefore, a sender may place a fragment into a
1630	   PACI, and a receiver must be able to handle such a PACI.

1632	   The payload of a PACI can be an aggregation NAL unit.  HEVC
1633	   bitstreams can contain unevenly sized and/or small (when compared
1634	   to the MTU size) NAL units.  In order to efficiently packetize
1635	   such small NAL units, APs were introduced.  The benefits of APs
1636	   are independent from the need for a payload header extension.
1637	   Therefore, a sender may place an AP into a PACI, and a receiver
1638	   must be able to handle such a PACI.

1640	4.9.2 PACI extensions (Informative)

1642	   This subsection includes recommendations for future specification
1643	   designers on how to extend the PACI syntax to accommodate future
1644	   extensions.  Obviously, designers are free to specify whatever
1645	   appears to be appropriate to them at the time of their design.
1646	   However, a lot of thought has been invested into the extension
1647	   mechanism described below, and we suggest that deviations from it
1648	   warrant a good explanation.

1650	   This memo defines only a single payload header extension (Temporal
1651	   Scalability Control Information, described below in section 4.10),
1652	   and, therefore, only the F0 bit carries semantics.  F1 and F2 are
1653	   already named (and not just marked as reserved, as a typical video
1654	   spec designer would do).  They are intended to signal two additional
1655	   extensions.  The Y bit allows further F and Y bits to be added
1656	   recursively, extending the mechanism beyond 3 possible payload header
1657	   extensions.  It is suggested to define a new packet type (using a
1658	   different value for Type) when assigning the F1, F2, or Y bits
1659	   different semantics than what is suggested below.

1661	   When a Y bit is set, an 8 bit flag-extension is inserted after
1662	   the Y bit.  A flag-extension consists of 7 flags F[n..n+6], and
1663	   another Y bit.

1665	   The basic PACI header already includes F0, F1, and F2.
1666	   Therefore, the Fx bits in the first flag-extension are numbered
1667	   F3, F4, ..., F9, the F bits in the second flag-extension are
1668	   numbered F10, F11, ..., F16, and so forth.  As a result, at least
1669	   3 Fx bits are always in the PACI, but the number of Fx bits (and
1670	   associated types of extensions), can be increased by setting the
1671	   next Y bit and adding an octet of flag-extensions, carrying 7
1672	   flags and another Y bit.  The size of this list of flags is
1673	   subject to the limits specified in section 4.9 (32 octets for all
1674	   flag-extensions and the PHES information combined).
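
   The numbering of the Fx bits follows a simple arithmetic pattern,
   which the following short Python sketch makes explicit.  The
   function names are hypothetical and are used only for illustration.

```python
BASE_FLAGS = 3           # F0, F1 and F2 in the basic PACI header
FLAGS_PER_EXTENSION = 7  # each flag-extension octet: 7 flags + 1 Y bit


def flag_extension_indices(k):
    """Return (first, last) Fx index carried by the k-th flag-extension."""
    if k < 1:
        raise ValueError("flag-extensions are counted from 1")
    first = BASE_FLAGS + FLAGS_PER_EXTENSION * (k - 1)
    return first, first + FLAGS_PER_EXTENSION - 1


def total_f_bits(num_flag_extensions):
    """Number of Fx bits available with the given number of extensions."""
    return BASE_FLAGS + FLAGS_PER_EXTENSION * num_flag_extensions
```

   For k equal to 1 and 2 this yields F3 to F9 and F10 to F16,
   matching the numbering given above.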

1676	   Each of the F bits can indicate either the presence of
1677	   information in the Payload Header Extension Structure (PHES),
1678	   described below, or a given F bit can indicate a certain
1679	   condition, without including additional information in the PHES.

1681	   When a spec developer devises a new syntax that takes advantage
1682	   of the PACI extension mechanism, he/she must follow the
1683	   constraints listed below; otherwise the extension mechanism may
1684	   break.

1686	     1) The fields added for a particular Fx bit MUST be fixed in
1687	        length and not depend on what other Fx bits are set (no
1688	        parsing dependency).
1689	     2) The Fx bits must be assigned in order.
1690	     3) An implementation that supports the n-th Fn bit for any
1691	        value of n must understand the syntax (though not
1692	        necessarily the semantics) of the fields Fk (with k < n), so
1693	        as to be able to either use those bits when present, or at
1694	        least be able to skip over them.

1696	4.10 Temporal Scalability Control Information

1698	   This section describes the single payload header extension
1699	   defined in this specification, known as Temporal Scalability
1700	   Control Information (TSCI).  If, in the future, additional
1701	   payload header extensions become necessary, they could be
1702	   specified in this section of an updated version of this document,
1703	   or in their own documents.

1705	   When F0 is set to 1 in a PACI, this specifies that the PHES field
1706	   includes the TSCI fields TL0PICIDX, IrapPicID, S, and E as
1707	   follows:

1709	    0                   1                   2                   3
1710	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1711	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1712	   |    PayloadHdr (Type=50)       |A|   cType   | PHSsize |F0..2|Y|
1713	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1714	   |   TL0PICIDX   |   IrapPicID   |S|E|    RES    |               |
1715	   |-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1716	   |                           ....                                |
1717	   |               PACI payload: NAL unit                          |
1718	   |                                                               |
1719	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1720	   |                               :...OPTIONAL RTP padding        |
1721	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1728	   Figure 12   The structure of a PACI with a PHES containing a TSCI

1730	   TL0PICIDX (8 bits)
1731	      When present, the TL0PICIDX field MUST be set equal to
1732	      temporal_sub_layer_zero_idx as specified in Section D.3.22 of
1733	      [H.265] for the access unit containing the NAL unit in the
1734	      PACI.

1736	   IrapPicID (8 bits)
1737	      When present, the IrapPicID field MUST be set equal to
1738	      irap_pic_id as specified in Section D.3.22 of [H.265] for the
1739	      access unit containing the NAL unit in the PACI.

1741	   S (1 bit)
1742	      The S bit MUST be set to 1 if any of the following conditions
1743	      is true and MUST be set to 0 otherwise:
1744	      o The NAL unit in the payload of the PACI is the first VCL NAL
1745	        unit, in decoding order, of a picture.

1747	      o The NAL unit in the payload of the PACI is an AP and the NAL
1748	        unit in the first contained aggregation unit is the first
1749	        VCL NAL unit, in decoding order, of a picture.
1750	      o The NAL unit in the payload of the PACI is an FU with its S
1751	        bit equal to 1 and the FU payload containing a fragment of
1752	        the first VCL NAL unit, in decoding order, of a picture.

1754	   E (1 bit)
1755	      The E bit MUST be set to 1 if any of the following conditions
1756	      is true and MUST be set to 0 otherwise:
1757	      o The NAL unit in the payload of the PACI is the last VCL NAL
1758	        unit, in decoding order, of a picture.
1759	      o The NAL unit in the payload of the PACI is an AP and the NAL
1760	        unit in the last contained aggregation unit is the last VCL
1761	        NAL unit, in decoding order, of a picture.
1762	      o The NAL unit in the payload of the PACI is an FU with its E
1763	        bit equal to 1 and the FU payload containing a fragment of
1764	        the last VCL NAL unit, in decoding order, of a picture.

1766	   RES (6 bits)
1767	      MUST be equal to 0.  Reserved for future extensions.

1769	   The value of PHSsize MUST be set to 3.  Receivers MUST allow
1770	   other values of the fields F0, F1, F2, Y, and PHSsize, and MUST
1771	   ignore any fields in the PHES, when present, other than those
1772	   specified above.
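
   The TSCI layout in Figure 12 can be illustrated with the following
   Python sketch.  The function names are hypothetical, and the sketch
   assumes that the caller has already extracted the PHES octets from
   the PACI.  Octets beyond the first three are ignored, in line with
   the receiver rule above.

```python
TSCI_SIZE = 3  # value of PHSsize when only the TSCI is present


def build_tsci(tl0picidx, irap_pic_id, s_bit, e_bit):
    """Encode the three TSCI octets; RES is written as 0."""
    last = ((s_bit & 1) << 7) | ((e_bit & 1) << 6)
    return bytes([tl0picidx & 0xFF, irap_pic_id & 0xFF, last])


def parse_tsci(phes):
    """Decode the TSCI from the start of the PHES octets."""
    if len(phes) < TSCI_SIZE:
        raise ValueError("PHES too short to contain a TSCI")
    return {
        "TL0PICIDX": phes[0],
        "IrapPicID": phes[1],
        "S": phes[2] >> 7,
        "E": (phes[2] >> 6) & 1,
        "RES": phes[2] & 0x3F,  # expected to be 0 in this memo
    }
```

   A sender following this memo would set PHSsize to TSCI_SIZE and F0
   to 1 when placing the output of build_tsci into the PHES.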

1774	5 Packetization Rules

1776	   The following packetization rules apply:

1778	   o  If tx-mode is equal to "MSM" or sprop-max-don-diff is greater
1779	      than 0 for an RTP stream, the transmission order of NAL units
1780	      carried in the RTP stream MAY be different than the NAL unit
1781	      decoding order.  Otherwise (tx-mode is equal to "SSM" and sprop-
1782	      max-don-diff is equal to 0 for an RTP stream), the transmission
1783	      order of NAL units carried in the RTP stream MUST be the same as
1784	      the NAL unit decoding order.

1786	   o  A NAL unit of a small size SHOULD be encapsulated in an
1787	      aggregation packet together with one or more other NAL units
1788	      in order to avoid the unnecessary packetization overhead for
1789	      small NAL units.  For example, non-VCL NAL units such as
1790	      access unit delimiters, parameter sets, or SEI NAL units are
1791	      typically small and can often be aggregated with VCL NAL units
1792	      without violating MTU size constraints.

1794	   o  Each non-VCL NAL unit SHOULD, when MTU size constraints
1795	      permit, be encapsulated in an aggregation packet
1796	      together with its associated VCL NAL unit, as typically a non-
1797	      VCL NAL unit would be meaningless without the associated VCL
1798	      NAL unit being available.

1800	   o  For carrying exactly one NAL unit in an RTP packet, a single
1801	      NAL unit packet MUST be used.
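The first rule above can be expressed as a small predicate. This is a minimal sketch rather than part of the payload format: the function name is hypothetical, and the caller is assumed to have already applied the inference rules for absent parameters.

```python
def transmission_order_may_differ(tx_mode: str, sprop_max_don_diff: int) -> bool:
    """Return True when NAL units MAY be sent out of decoding order.

    Hypothetical helper: both arguments are assumed to be the values in
    effect for the RTP stream after any default inference.
    """
    if tx_mode not in ("SSM", "MSM"):
        raise ValueError('tx-mode must be "SSM" or "MSM"')
    if sprop_max_don_diff < 0:
        raise ValueError("sprop-max-don-diff must not be negative")
    # MSM, or a non-zero sprop-max-don-diff, permits reordering.
    # Otherwise transmission order MUST equal decoding order.
    return tx_mode == "MSM" or sprop_max_don_diff > 0
```

A sender could consult this predicate before interleaving NAL units; a receiver could use it to decide whether a de-packetization buffer is needed for a single stream.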

1803	6 De-packetization Process

1805	   The general concept behind de-packetization is to get the NAL
1806	   units out of the RTP packets in an RTP stream and all RTP streams
1807	   the RTP stream depends on, if any, and pass them to the decoder
1808	   in the NAL unit decoding order.

1810	   The de-packetization process is implementation dependent.
1811	   Therefore, the following description should be seen as an example
1812	   of a suitable implementation.  Other schemes may be used as well
1813	   as long as the output for the same input is the same as the
1814	   process described below.  The output is the same when the set of
1815	   output NAL units and their order are both identical.
1816	   Optimizations relative to the described algorithms are possible.

1818	   All normal RTP mechanisms related to buffer management apply.  In
1819	   particular, duplicated or outdated RTP packets (as indicated by
1820	   the RTP sequence number and the RTP timestamp) are removed.  To
1821	   determine the exact time for decoding, factors such as a possible
1822	   intentional delay to allow for proper inter-stream
1823	   synchronization must be taken into account.

1825	   NAL units with NAL unit type values in the range of 0 to 47,
1826	   inclusive, may be passed to the decoder.  NAL-unit-like structures
1827	   with NAL unit type values in the range of 48 to 63, inclusive,
1828	   MUST NOT be passed to the decoder.
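The type check above can be sketched as follows. The sketch assumes the two-byte HEVC NAL unit header layout (one F bit, a six-bit Type field, six bits of LayerId, three bits of TID); the function names are hypothetical.

```python
def nal_unit_type(nal: bytes) -> int:
    """Extract the 6-bit Type field from a two-byte NAL unit header."""
    if len(nal) < 2:
        raise ValueError("NAL unit is shorter than its two-byte header")
    # Assumed layout of the first byte: F (1 bit), Type (6 bits), then
    # the most significant bit of LayerId.
    return (nal[0] >> 1) & 0x3F


def may_pass_to_decoder(nal: bytes) -> bool:
    """True for types 0..47; False for types 48..63, which MUST NOT
    be passed to the decoder."""
    return nal_unit_type(nal) <= 47
```

For instance, a structure whose first byte is 0x60 has type 48 and is rejected, while one whose first byte is 0x40 has type 32 and is accepted.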

1830	   The receiver includes a receiver buffer, which is used to
1831	   compensate for transmission delay jitter within individual RTP
1832	   streams and across RTP streams, to reorder NAL units from
1833	   transmission order to the NAL unit decoding order, and to recover
1834	   the NAL unit decoding order in MSM, when applicable.  In this
1835	   section, the receiver operation is described under the assumption
1836	   that there is no transmission delay jitter within an RTP stream
1837	   and across RTP streams.  To distinguish it from a practical
1838	   receiver buffer that is also used for compensation of
1839	   transmission delay jitter, the receiver buffer is hereafter
1840	   called the de-packetization buffer in this section.  Receivers
1841	   should also prepare for transmission delay jitter; i.e. either
1842	   reserve separate buffers for transmission delay jitter buffering
1843	   and de-packetization buffering or use a receiver buffer for both
1844	   transmission delay jitter and de-packetization.  Moreover,
1845	   receivers should take transmission delay jitter into account in
1846	   the buffering operation; e.g. by additional initial buffering
1847	   before starting decoding and playback.

1849	   If only one RTP stream is being received and sprop-max-don-diff
1850	   of the only RTP stream being received is equal to 0, the de-
1851	   packetization buffer size is zero bytes, i.e. the NAL units
1852	   carried in the RTP stream are directly passed to the decoder in
1853	   their transmission order, which is identical to the decoding
1854	   order of the NAL units. Otherwise, the process described in the
1855	   remainder of this section applies.

1857	   There are two buffering states in the receiver: initial buffering
1858	   and buffering while playing.  Initial buffering starts when the
1859	   reception is initialized.  After initial buffering, decoding and
1860	   playback are started, and the buffering-while-playing mode is
1861	   used.

1863	   Regardless of the buffering state, the receiver stores incoming
1864	   NAL units, in reception order, into the de-packetization buffer.
1865	   NAL units carried in RTP packets are stored in the de-
1866	   packetization buffer individually, and the value of AbsDon is
1867	   calculated and stored for each NAL unit.  When MSM is in use, NAL
1868	   units of all RTP streams of a bitstream are stored in the same
1869	   de-packetization buffer.  When NAL units carried in any two RTP
1870	   streams are available to be placed into the de-packetization
1871	   buffer, those NAL units carried in the RTP stream that is lower
1872	   in the dependency tree are placed into the buffer first.  For
1873	   example, if RTP stream A depends on RTP stream B, then NAL units
1874	   carried in RTP stream B are placed into the buffer first.

1876	   Initial buffering lasts until condition A (the difference between
1877	   the greatest and smallest AbsDon values of the NAL units in the
1878	   de-packetization buffer is greater than or equal to the value of
1879	   sprop-max-don-diff of the highest RTP stream) or condition B (the
1880	   number of NAL units in the de-packetization buffer is greater
1881	   than the value of sprop-depack-buf-nalus) is true.

1883	   After initial buffering, whenever condition A or condition B is
1884	   true, the following operation is repeatedly applied until both
1885	   condition A and condition B become false:

1887	   o  The NAL unit in the de-packetization buffer with the smallest
1888	      value of AbsDon is removed from the de-packetization buffer
1889	      and passed to the decoder.

1891	   When no more NAL units are flowing into the de-packetization
1892	   buffer, all NAL units remaining in the de-packetization buffer
1893	   are removed from the buffer and passed to the decoder in the
1894	   order of increasing AbsDon values.
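The buffering process described in this section can be sketched as below. This is one possible illustration, not a required implementation. The class and method names are hypothetical, AbsDon is assumed to be computed by the caller as specified elsewhere in this document, and jitter handling is left out.

```python
import heapq


class DepacketizationBuffer:
    """Illustrative de-packetization buffer keyed by AbsDon."""

    def __init__(self, sprop_max_don_diff: int, sprop_depack_buf_nalus: int):
        self.max_don_diff = sprop_max_don_diff
        self.max_nalus = sprop_depack_buf_nalus
        self._heap = []        # entries: (abs_don, arrival_index, nal)
        self._arrivals = 0     # tie-breaker so NAL payloads are never compared
        self.playing = False   # False during initial buffering

    def _condition_a(self) -> bool:
        # Greatest minus smallest AbsDon >= sprop-max-don-diff.
        if not self._heap:
            return False
        greatest = max(entry[0] for entry in self._heap)
        smallest = self._heap[0][0]
        return greatest - smallest >= self.max_don_diff

    def _condition_b(self) -> bool:
        # Number of buffered NAL units > sprop-depack-buf-nalus.
        return len(self._heap) > self.max_nalus

    def push(self, abs_don: int, nal) -> list:
        """Store one NAL unit; return the NAL units released to the decoder."""
        heapq.heappush(self._heap, (abs_don, self._arrivals, nal))
        self._arrivals += 1
        released = []
        if not self.playing:
            if not (self._condition_a() or self._condition_b()):
                return released          # still in initial buffering
            self.playing = True
        # Remove the smallest AbsDon until both conditions are false.
        while self._condition_a() or self._condition_b():
            released.append(heapq.heappop(self._heap)[2])
        return released

    def flush(self) -> list:
        """No more input: drain everything in increasing AbsDon order."""
        released = []
        while self._heap:
            released.append(heapq.heappop(self._heap)[2])
        return released
```

A heap keeps removal of the smallest AbsDon cheap; the linear scan for the greatest AbsDon is kept for readability and could be replaced by incremental tracking.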

1896	7 Payload Format Parameters

1898	   This section specifies the parameters that MAY be used to select
1899	   optional features of the payload format and certain features or
1900	   properties of the bitstream or the RTP stream.  The parameters
1901	   are specified here as part of the media type registration for the
1902	   HEVC codec.  A mapping of the parameters into the Session
1903	   Description Protocol (SDP) [RFC4566] is also provided for
1904	   applications that use SDP.  Equivalent parameters could be
1905	   defined elsewhere for use with control protocols that do not use
1906	   SDP.

1908	7.1 Media Type Registration

1910	   The media subtype for the HEVC codec is allocated from the IETF
1911	   tree.

1913	   The receiver MUST ignore any unrecognized parameter.

1915	   Media Type name:     video

1917	   Media subtype name:  H265

1919	   Required parameters: none

1921	   OPTIONAL parameters:

1923	      profile-space, tier-flag, profile-id, profile-compatibility-
1924	      indicator, interop-constraints, and level-id:

1926	         These parameters indicate the profile, tier, default level,
1927	         and some constraints of the bitstream carried by the RTP
1928	         stream and all RTP streams the RTP stream depends on, or a
1929	         specific set of the profile, tier, default level, and some
1930	         constraints the receiver supports.

1932	         The profile and some constraints are indicated collectively
1933	         by profile-space, profile-id, profile-compatibility-
1934	         indicator, and interop-constraints.  The profile specifies
1935	         the subset of coding tools that may have been used to
1936	         generate the bitstream or that the receiver supports.

1938	            Informative note: There are 32 values of profile-id, and
1939	            there are 32 flags in profile-compatibility-indicator,
1940	            each flag corresponding to one value of profile-id.
1941	            According to HEVC version 1 in [HEVC], when more than
1942	            one of the 32 flags is set for a bitstream, the
1943	            bitstream would comply with all the profiles
1944	            corresponding to the set flags.  However, in a draft of
1945	            HEVC version 2 in [HEVC draft v2], subclause A.3.5, 19
1946	            Format Range Extensions profiles have been specified,
1947	            all using the same value of profile-id (4),
1948	            differentiated by some of the 48 bits in interop-
1949	            constraints - this (rather unexpected way of profile
1950	            signalling) means that one of the 32 flags may
1951	            correspond to multiple profiles.  To be able to support
1952	            whatever HEVC extension profile that might be specified
1953	            and indicated using profile-space, profile-id, profile-
1954	            compatibility-indicator, and interop-constraints in the
1955	            future, it would be safe to require symmetric use of
1956	            these parameters in SDP offer/answer unless recv-sub-
1957	            layer-id is included in the SDP answer for choosing one
1958	            of the sub-layers offered.

1960	         The tier is indicated by tier-flag.  The default level is
1961	         indicated by level-id.  The tier and the default level
1962	         specify the limits on values of syntax elements or
1963	         arithmetic combinations of values of syntax elements that
1964	         are followed when generating the bitstream or that the
1965	         receiver supports.

1967	         A set of profile-space, tier-flag, profile-id, profile-
1968	         compatibility-indicator, interop-constraints, and level-id
1969	         parameters ptlA is said to be consistent with another set
1970	         of these parameters ptlB if any decoder that conforms to
1971	         the profile, tier, level, and constraints indicated by ptlB
1972	         can decode any bitstream that conforms to the profile,
1973	         tier, level, and constraints indicated by ptlA.

1975	         In SDP offer/answer, when the SDP answer does not include
1976	         the recv-sub-layer-id parameter that is less than the
1977	         sprop-sub-layer-id parameter in the SDP offer, the
1978	         following applies:

1980	            o The profile-space, tier-flag, profile-id, profile-
1981	              compatibility-indicator, and interop-constraints
1982	              parameters MUST be used symmetrically, i.e. the value
1983	              of each of these parameters in the offer MUST be the
1984	              same as that in the answer, either explicitly
1985	              signalled or implicitly inferred.
1986	            o The level-id parameter is changeable as long as the
1987	              highest level indicated by the answer is either equal
1988	              to or lower than that in the offer.  Note that the
1989	              highest level is indicated by level-id and max-recv-
1990	              level-id together.

1992	         In SDP offer/answer, when the SDP answer does include the
1993	         recv-sub-layer-id parameter that is less than the sprop-
1994	         sub-layer-id parameter in the SDP offer, the set of
1995	         profile-space, tier-flag, profile-id, profile-
1996	         compatibility-indicator, interop-constraints, and level-id
1997	         parameters included in the answer MUST be consistent with
1998	         that for the chosen sub-layer representation as indicated
1999	         in the SDP offer, with the exception that the level-id
2000	         parameter in the SDP answer is changeable as long as the
2001	         highest level indicated by the answer is either lower than
2002	         or equal to that in the offer.

2004	         More specifications of these parameters, including how they
2005	         relate to the values of the profile, tier, and level syntax
2006	         elements specified in [HEVC], are provided below.

2008	      profile-space, profile-id:

2010	         The value of profile-space MUST be in the range of 0 to 3,
2011	         inclusive.  The value of profile-id MUST be in the range of
2012	         0 to 31, inclusive.

2014	         When profile-space is not present, a value of 0 MUST be
2015	         inferred.  When profile-id is not present, a value of 1
2016	         (i.e. the Main profile) MUST be inferred.

2018	         When used to indicate properties of a bitstream, profile-
2019	         space and profile-id are derived from the profile, tier,
2020	         and level syntax elements in SPS or VPS NAL units as
2021	         follows, where general_profile_space, general_profile_idc,
2022	         sub_layer_profile_space[j], and sub_layer_profile_idc[j]
2023	         are specified in [HEVC]:

2025	            If the RTP stream is the highest RTP stream, the
2026	            following applies:

2028	            o profile-space = general_profile_space
2029	            o profile-id = general_profile_idc
2030	            Otherwise (the RTP stream is a dependee RTP stream), the
2031	            following applies, with j being the value of the sprop-
2032	            sub-layer-id parameter:

2034	            o profile-space = sub_layer_profile_space[j]
2035	            o profile-id = sub_layer_profile_idc[j]

2037	      tier-flag, level-id:

2039	         The value of tier-flag MUST be in the range of 0 to 1,
2040	         inclusive.  The value of level-id MUST be in the range of 0
2041	         to 255, inclusive.

2043	         If the tier-flag and level-id parameters are used to
2044	         indicate properties of a bitstream, they indicate the tier
2045	         and the highest level the bitstream complies with.

2047	         If the tier-flag and level-id parameters are used for
2048	         capability exchange, the following applies.  If max-recv-
2049	         level-id is not present, the default level defined by
2050	         level-id indicates the highest level the codec wishes to
2051	         support.  Otherwise, max-recv-level-id indicates the
2052	         highest level the codec supports for receiving.  For either
2053	         receiving or sending, all levels that are lower than the
2054	         highest level supported MUST also be supported.

2056	         If no tier-flag is present, a value of 0 MUST be inferred
2057	         and if no level-id is present, a value of 93 (i.e. level
2058	         3.1) MUST be inferred.

2060	         When used to indicate properties of a bitstream, the tier-
2061	         flag and level-id parameters are derived from the profile,
2062	         tier, and level syntax elements in SPS or VPS NAL units as
2063	         follows, where general_tier_flag, general_level_idc,
2064	         sub_layer_tier_flag[j], and sub_layer_level_idc[j] are
2065	         specified in [HEVC]:

2067	            If the RTP stream is the highest RTP stream, the
2068	            following applies:

2070	            o tier-flag = general_tier_flag
2071	            o level-id = general_level_idc

2073	            Otherwise (the RTP stream is a dependee RTP stream), the
2074	            following applies, with j being the value of the sprop-
2075	            sub-layer-id parameter:

2077	            o tier-flag = sub_layer_tier_flag[j]
2078	            o level-id = sub_layer_level_idc[j]

2080	      interop-constraints:

2082	         A base16 [RFC4648] (hexadecimal) representation of six
2083	         bytes of data, consisting of progressive_source_flag,
2084	         interlaced_source_flag, non_packed_constraint_flag,
2085	         frame_only_constraint_flag, and reserved_zero_44bits.

2087	         If the interop-constraints parameter is not present, the
2088	         following MUST be inferred:

2090	            o progressive_source_flag = 1
2091	            o interlaced_source_flag = 0
2092	            o non_packed_constraint_flag = 1
2093	            o frame_only_constraint_flag = 1
2094	            o reserved_zero_44bits = 0
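As an illustration of the inferred default above, the following sketch packs the five fields into six bytes and prints them in base16. It assumes the fields are packed most-significant-bit first in the order listed, which is the order in which they are named in this parameter description; the function name is hypothetical.

```python
def pack_interop_constraints(progressive_source_flag: int = 1,
                             interlaced_source_flag: int = 0,
                             non_packed_constraint_flag: int = 1,
                             frame_only_constraint_flag: int = 1,
                             reserved_zero_44bits: int = 0) -> str:
    """Return the base16 form of the six-byte interop-constraints value.

    ASSUMPTION: fields are packed MSB first in the order listed.
    Defaults are the values inferred when the parameter is absent.
    """
    flags = (progressive_source_flag, interlaced_source_flag,
             non_packed_constraint_flag, frame_only_constraint_flag)
    if any(flag not in (0, 1) for flag in flags):
        raise ValueError("each flag must be 0 or 1")
    if not 0 <= reserved_zero_44bits < (1 << 44):
        raise ValueError("reserved_zero_44bits must fit in 44 bits")
    nibble = 0
    for flag in flags:
        nibble = (nibble << 1) | flag
    # Four flag bits followed by 44 reserved bits make 48 bits in total.
    value = (nibble << 44) | reserved_zero_44bits
    return value.to_bytes(6, "big").hex().upper()
```

Under the stated packing assumption, calling the function with no arguments yields the string that corresponds to an absent interop-constraints parameter.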

2096	         When the interop-constraints parameter is used to indicate
2097	         properties of a bitstream, the following applies, where
2098	         general_progressive_source_flag,
2099	         general_interlaced_source_flag,
2100	         general_non_packed_constraint_flag,
2102	         general_frame_only_constraint_flag,
2103	         general_reserved_zero_44bits,
2104	         sub_layer_progressive_source_flag[j],
2105	         sub_layer_interlaced_source_flag[j],
2106	         sub_layer_non_packed_constraint_flag[j],
2107	         sub_layer_frame_only_constraint_flag[j], and
2108	         sub_layer_reserved_zero_44bits[j] are specified in [HEVC]:

2110	            If the RTP stream is the highest RTP stream, the
2111	            following applies:

2113	            o progressive_source_flag =
2114	                              general_progressive_source_flag
2115	            o interlaced_source_flag =
2116	                              general_interlaced_source_flag
2117	            o non_packed_constraint_flag =
2118	                              general_non_packed_constraint_flag
2119	            o frame_only_constraint_flag =
2120	                              general_frame_only_constraint_flag
2121	            o reserved_zero_44bits = general_reserved_zero_44bits

2123	            Otherwise (the RTP stream is a dependee RTP stream), the
2124	            following applies, with j being the value of the sprop-
2125	            sub-layer-id parameter:

2127	            o progressive_source_flag =
2128	                              sub_layer_progressive_source_flag[j]
2129	            o interlaced_source_flag =
2130	                              sub_layer_interlaced_source_flag[j]
2131	            o non_packed_constraint_flag =
2133	                              sub_layer_non_packed_constraint_flag[j]
2134	            o frame_only_constraint_flag =
2136	                              sub_layer_frame_only_constraint_flag[j]
2137	            o reserved_zero_44bits =
2138	                              sub_layer_reserved_zero_44bits[j]

2140	         Using interop-constraints for capability exchange results
2141	         in a requirement on any bitstream to be compliant with the
2142	         interop-constraints.

2144	      profile-compatibility-indicator:

2146	         A base16 [RFC4648] representation of four bytes of data.

2148	         When profile-compatibility-indicator is used to indicate
2149	         properties of a bitstream, the following applies, where
2150	         general_profile_compatibility_flag[j] and
2151	         sub_layer_profile_compatibility_flag[i][j] are specified in
2152	         [HEVC]:

2154	            The profile-compatibility-indicator in this case
2155	            indicates the profiles the bitstream conforms to in
2156	            addition to the profile defined by profile-space,
2157	            profile-id, and interop-constraints.  A decoder that
2158	            conforms to any of the profiles the bitstream conforms
2159	            to would be capable of decoding the bitstream.  These
2160	            additional profiles are defined by profile-space, each
2161	            set bit of profile-compatibility-indicator, and
2162	            interop-constraints.

2164	            If the RTP stream is the highest RTP stream, the
2165	            following applies for each value of j in the range of 0
2166	            to 31, inclusive:

2168	            o bit j of profile-compatibility-indicator =
2169	                  general_profile_compatibility_flag[j]

2171	            Otherwise (the RTP stream is a dependee RTP stream), the
2172	            following applies for i equal to sprop-sub-layer-id and
2173	            for each value of j in the range of 0 to 31, inclusive:

2175	            o bit j of profile-compatibility-indicator =
2176	                  sub_layer_profile_compatibility_flag[i][j]

2178	         Using profile-compatibility-indicator for capability
2179	         exchange results in a requirement on any bitstream to be
2180	         compliant with the profile-compatibility-indicator.  This
2181	         is intended to handle cases where any future HEVC profile
2182	         is defined as an intersection of two or more profiles.

2184	         If this parameter is not present, this parameter defaults
2185	         to the following: bit j, with j equal to profile-id, of
2186	         profile-compatibility-indicator is inferred to be equal to
2187	         1, and all other bits are inferred to be equal to 0.
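The default above can be illustrated with a short sketch. The text does not state how "bit j" maps onto the four bytes, so the sketch assumes that bit 0 is the most significant bit of the first byte; an implementation should confirm the bit numbering before relying on this. The function name is hypothetical.

```python
def default_profile_compatibility_indicator(profile_id: int = 1) -> str:
    """Return the base16 default for an absent profile-compatibility-indicator.

    ASSUMPTION: bit 0 is the most significant bit of the first byte.
    Only bit j, with j equal to profile-id, is set.
    """
    if not 0 <= profile_id <= 31:
        raise ValueError("profile-id must be in the range 0 to 31")
    value = 1 << (31 - profile_id)
    return value.to_bytes(4, "big").hex().upper()
```

The default argument of 1 mirrors the inferred value of profile-id when that parameter is also absent.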

2189	      sprop-sub-layer-id:

2191	         This parameter MAY be used to indicate the highest allowed
2192	         value of TID in the bitstream.  When not present, the value
2193	         of sprop-sub-layer-id is inferred to be equal to 6.

2195	         The value of sprop-sub-layer-id MUST be in the range of 0
2196	         to 6, inclusive.

2198	      recv-sub-layer-id:

2200	         This parameter MAY be used to signal a receiver's choice of
2201	         the offered or declared sub-layer representations in the
2202	         sprop-vps.  The value of recv-sub-layer-id indicates the
2203	         TID of the highest sub-layer of the bitstream that a
2204	         receiver supports.  When not present, the value of recv-
2205	         sub-layer-id is inferred to be equal to the value of the
2206	         sprop-sub-layer-id parameter in the SDP offer.

2208	         The value of recv-sub-layer-id MUST be in the range of 0 to
2209	         6, inclusive.

2211	      max-recv-level-id:

2213	         This parameter MAY be used to indicate the highest level a
2214	         receiver supports.  The highest level the receiver supports
2215	         is equal to the value of max-recv-level-id divided by 30.

2217	         The value of max-recv-level-id MUST be in the range of 0
2218	         to 255, inclusive.

2220	         When max-recv-level-id is not present, the value is
2221	         inferred to be equal to level-id.

2223	         max-recv-level-id MUST NOT be present when the highest
2224	         level the receiver supports is not higher than the default
2225	         level.
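The relationship between level-id, max-recv-level-id, and the level number can be sketched as follows. The helper names are hypothetical; the default of 93 for level-id is the inferred value given earlier in this section.

```python
from fractions import Fraction
from typing import Optional


def highest_level_id(level_id: int = 93,
                     max_recv_level_id: Optional[int] = None) -> int:
    """Return the identifier of the highest level a receiver supports."""
    if not 0 <= level_id <= 255:
        raise ValueError("level-id must be in the range 0 to 255")
    if max_recv_level_id is None:
        # Absent: inferred to be equal to level-id.
        return level_id
    if not 0 <= max_recv_level_id <= 255:
        raise ValueError("max-recv-level-id must be in the range 0 to 255")
    if max_recv_level_id <= level_id:
        # MUST NOT be present unless it exceeds the default level.
        raise ValueError("max-recv-level-id must be higher than level-id")
    return max_recv_level_id


def level_number(level_id: int) -> Fraction:
    """Level number is the identifier divided by 30 (93 gives 3.1)."""
    return Fraction(level_id, 30)
```

Using an exact fraction avoids floating-point rounding when level numbers are compared.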

2227	      tx-mode:

2229	         This parameter indicates whether the transmission mode is SSM
2230	         or MSM.

2232	         The value of tx-mode MUST be equal to either "MSM" or "SSM".
2233	         When not present, the value of tx-mode is inferred to be
2234	         equal to "SSM".

2236	         If the value is equal to "MSM", MSM MUST be in use.  Otherwise
2237	         (the value is equal to "SSM"), SSM MUST be in use.

2239	         The value of tx-mode MUST be equal to "MSM" for all RTP
2240	         sessions in an MSM.

2242	      sprop-vps:

2244	         This parameter MAY be used to convey any video parameter
2245	         set NAL unit of the bitstream for out-of-band transmission
2246	         of video parameter sets.  The parameter MAY also be used
2247	         for capability exchange and to indicate sub-stream
2248	         characteristics (i.e. properties of sub-layer
2249	         representations as defined in [HEVC]).  The value of the
2250	         parameter is a comma-separated (',') list of base64
2251	         [RFC4648] representations of the video parameter set NAL
2252	         units as specified in Section 7.3.2.1 of [HEVC].
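Decoding the parameter value can be sketched as below. The function name is hypothetical, and treating an empty list item as an error is a choice made for this sketch rather than a rule stated in this document. The same comma-separated base64 form is used by the sprop-sps, sprop-pps, and sprop-sei parameters.

```python
import base64
from typing import List


def parse_sprop_list(value: str) -> List[bytes]:
    """Split a comma-separated list of base64 items into raw NAL units."""
    nal_units = []
    for item in value.split(","):
        if not item:
            # Sketch-level policy: reject empty items instead of skipping.
            raise ValueError("empty item in parameter value")
        # validate=True rejects characters outside the base64 alphabet.
        nal_units.append(base64.b64decode(item, validate=True))
    return nal_units
```

Each returned byte string starts with the NAL unit header, so the caller can still check that the NAL unit type matches the parameter it was carried in.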

2254	         The sprop-vps parameter MAY contain one or more video
2255	         parameter set NAL units.  However, all other video
2256	         parameter sets contained in the sprop-vps parameter MUST be
2257	         consistent with the first video parameter set in the sprop-
2258	         vps parameter.  A video parameter set vpsB is said to be
2259	         consistent with another video parameter set vpsA if any
2260	         decoder that conforms to the profile, tier, level, and
2261	         constraints indicated by the 12 bytes of data starting from
2262	         the syntax element general_profile_space to the syntax
2263	         element general_level_idc, inclusive, in the first
2264	         profile_tier_level( ) syntax structure in vpsA can decode
2265	         any bitstream that conforms to the profile, tier, level,
2266	         and constraints indicated by the 12 bytes of data starting
2267	         from the syntax element general_profile_space to the syntax
2268	         element general_level_idc, inclusive, in the first
2269	         profile_tier_level( ) syntax structure in vpsB.

2271	      sprop-sps:

2273	         This parameter MAY be used to convey sequence parameter set
2274	         NAL units of the bitstream for out-of-band transmission of
2275	         sequence parameter sets.  The value of the parameter is a
2276	         comma-separated (',') list of base64 [RFC4648]
2277	         representations of the sequence parameter set NAL units as
2278	         specified in Section 7.3.2.2 of [HEVC].

2280	      sprop-pps:

2282	         This parameter MAY be used to convey picture parameter set
2283	         NAL units of the bitstream for out-of-band transmission of
2284	         picture parameter sets.  The value of the parameter is a
2285	         comma-separated (',') list of base64 [RFC4648]
2286	         representations of the picture parameter set NAL units as
2287	         specified in Section 7.3.2.3 of [HEVC].

2289	      sprop-sei:

2291	         This parameter MAY be used to convey one or more SEI
2292	         messages that describe bitstream characteristics.  When
2293	         present, a decoder can rely on the bitstream
2294	         characteristics that are described in the SEI messages for
2295	         the entire duration of the session, independently from the
2296	         persistence scopes of the SEI messages as specified in
2297	         [HEVC].

2299	         The value of the parameter is a comma-separated (',') list
2300	         of base64 [RFC4648] representations of SEI NAL units as
2301	         specified in Section 7.3.2.4 of [HEVC].

2303	            Informative note: Intentionally, no list of applicable
2304	            or inapplicable SEI messages is specified here.
2305	            Conveying certain SEI messages in sprop-sei may be
2306	            sensible in some application scenarios and meaningless
2307	            in others.  However, a few examples are described below:

2309	           1) In an environment where the bitstream was created
2310	               from film-based source material, and no splicing is
2311	               going to occur during the lifetime of the session,
2312	               the film grain characteristics SEI message or the
2313	               tone mapping information SEI message are likely
2314	               meaningful, and sending them in sprop-sei rather than
2315	               in the bitstream at each entry point may help save
2316	               bits and allows the renderer to be configured only once,
2317	               avoiding unwanted artifacts.
2318	           2) The structure of pictures information SEI message in
2319	               sprop-sei can be used to inform a decoder of
2320	               information on the NAL unit types, picture order
2321	               count values, and prediction dependencies of a
2322	               sequence of pictures.  Having such knowledge can be
2323	               helpful for error recovery.
2324	           3) Examples of SEI messages that would be meaningless
2325	               to convey in sprop-sei include the decoded
2326	               picture hash SEI message (it is close to impossible
2327	               that all decoded pictures have the same hash-tag),
2328	               the display orientation SEI message when the device
2329	               is a handheld device (as the display orientation may
2330	               change when the handheld device is turned around), or
2331	               the filler payload SEI message (as there is no point
2332	               in just having more bits in SDP).

2334	      max-lsr, max-lps, max-cpb, max-dpb, max-br, max-tr, max-tc:

2336	         These parameters MAY be used to signal the capabilities of
2337	         a receiver implementation.  These parameters MUST NOT be
2338	         used for any other purpose.  The highest level (specified
2339	         by max-recv-level-id) MUST be a level that the receiver is
2340	         fully capable of supporting.  max-lsr, max-lps, max-cpb,
2341	         max-dpb, max-br, max-tr, and max-tc MAY be used to indicate
2342	         capabilities of the receiver that extend the required
2343	         capabilities of the highest level, as specified below.

2345	         When more than one parameter from the set (max-lsr, max-
2346	         lps, max-cpb, max-dpb, max-br, max-tr, max-tc) is present,
2347	         the receiver MUST support all signaled capabilities
2348	         simultaneously.  For example, if both max-lsr and max-br
2349	         are present, the highest level with the extension of both
2350	         the picture rate and bitrate is supported.  That is, the
2351	         receiver is able to decode bitstreams in which the luma
2352	         sample rate is up to max-lsr (inclusive), the bitrate is up
2353	         to max-br (inclusive), the coded picture buffer size is
2354	         derived as specified in the semantics of the max-br
2355	         parameter below, and the other properties comply with the
2356	         highest level specified by max-recv-level-id.

2358	            Informative note: When the OPTIONAL media type
2359	            parameters are used to signal the properties of a
2360	            bitstream, and max-lsr, max-lps, max-cpb, max-dpb, max-
2361	            br, max-tr, and max-tc are not present, the values of
2362	            profile-space, tier-flag, profile-id, profile-
2363	            compatibility-indicator, interop-constraints, and level-
2364	            id must always be such that the bitstream complies fully
2365	            with the specified profile, tier, and level.

2367	      max-lsr:
2368	         The value of max-lsr is an integer indicating the maximum
2369	         processing rate in units of luma samples per second.  The
2370	         max-lsr parameter signals that the receiver is capable of
2371	         decoding video at a higher rate than is required by the
2372	         highest level.

2374	         When max-lsr is signaled, the receiver MUST be able to
2375	         decode bitstreams that conform to the highest level, with
2376	         the exception that the MaxLumaSR value in Table A-2 of
2377	         [HEVC] for the highest level is replaced with the value of
2378	         max-lsr.  Senders MAY use this knowledge to send pictures
2379	         of a given size at a higher picture rate than is indicated
2380	         in the highest level.

2382	         When not present, the value of max-lsr is inferred to be
2383	         equal to the value of MaxLumaSR given in Table A-2 of
2384	         [HEVC] for the highest level.

2386	         The value of max-lsr MUST be in the range of MaxLumaSR to
2387	         16 * MaxLumaSR, inclusive, where MaxLumaSR is given in
2388	         Table A-2 of [HEVC] for the highest level.
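The inference and range rules for max-lsr can be sketched as below. The helper name is hypothetical, and the level limit (MaxLumaSR for the highest level) is passed in by the caller because Table A-2 of [HEVC] is not reproduced here.

```python
from typing import Optional


def effective_limit(signalled: Optional[int], level_limit: int,
                    name: str = "max-lsr") -> int:
    """Return the limit in effect for a capability parameter.

    level_limit is the value for the highest level taken from [HEVC],
    for example MaxLumaSR when name is "max-lsr".
    """
    if level_limit <= 0:
        raise ValueError("level limit must be positive")
    if signalled is None:
        # Absent: inferred to be equal to the limit of the highest level.
        return level_limit
    if not level_limit <= signalled <= 16 * level_limit:
        raise ValueError(
            f"{name} must be in the range {level_limit} to "
            f"{16 * level_limit}, inclusive")
    return signalled
```

The numbers used when calling this helper must come from [HEVC] for the level actually negotiated; the sketch only enforces the range of one to sixteen times that limit.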

2390	      max-lps:
2391	         The value of max-lps is an integer indicating the maximum
2392	         picture size in units of luma samples.  The max-lps
2393	         parameter signals that the receiver is capable of decoding
2394	         larger picture sizes than are required by the highest
2395	         level.  When max-lps is signaled, the receiver MUST be able
2396	         to decode bitstreams that conform to the highest level,
2397	         with the exception that the MaxLumaPS value in Table A-1 of
2398	         [HEVC] for the highest level is replaced with the value of
2399	         max-lps.  Senders MAY use this knowledge to send larger
2400	         pictures at a proportionally lower picture rate than is
2401	         indicated in the highest level.

2403	         When not present, the value of max-lps is inferred to be
2404	         equal to the value of MaxLumaPS given in Table A-1 of
2405	         [HEVC] for the highest level.

2407	         The value of max-lps MUST be in the range of MaxLumaPS to
2408	         16 * MaxLumaPS, inclusive, where MaxLumaPS is given in
2409	         Table A-1 of [HEVC] for the highest level.

2411	      max-cpb:
2412	         The value of max-cpb is an integer indicating the maximum
2413	         coded picture buffer size in units of CpbBrVclFactor bits
2414	         for the VCL HRD parameters and in units of CpbBrNalFactor
2415	         bits for the NAL HRD parameters, where CpbBrVclFactor and
2416	         CpbBrNalFactor are defined in Section A.4 of [HEVC].  The
2417	         max-cpb parameter signals that the receiver has more memory
2418	         than the minimum amount of coded picture buffer memory
2419	         required by the highest level.  When max-cpb is signaled,
2420	         the receiver MUST be able to decode bitstreams that conform
2421	         to the highest level, with the exception that the MaxCPB
2422	         value in Table A-1 of [HEVC] for the highest level is
2423	         replaced with the value of max-cpb.  Senders MAY use this
2424	         knowledge to construct coded bitstreams with greater
2425	         variation of bitrate than can be achieved with the MaxCPB
2426	         value in Table A-1 of [HEVC].

2428	         When not present, the value of max-cpb is inferred to be
2429	         equal to the value of MaxCPB given in Table A-1 of [HEVC]
2430	         for the highest level.

2432	         The value of max-cpb MUST be in the range of MaxCPB to
2433	         16 * MaxCPB, inclusive, where MaxCPB is given in Table
2434	         A-1 of [HEVC] for the highest level.

2436	            Informative note: The coded picture buffer is used in
2437	            the hypothetical reference decoder (Annex C of HEVC).
2438	            The use of the hypothetical reference decoder is
2439	            recommended in HEVC encoders to verify that the produced
2440	            bitstream conforms to the standard and to control the
2441	            output bitrate.  Thus, the coded picture buffer is
2442	            conceptually independent of any other potential buffers
2443	            in the receiver, including de-packetization and de-
2444	            jitter buffers.  The coded picture buffer need not be
2445	            implemented in decoders as specified in Annex C of HEVC,
2446	            but rather standard-compliant decoders can have any
2447	            buffering arrangements provided that they can decode
2448	            standard-compliant bitstreams.  Thus, in practice, the
2449	            input buffer for a video decoder can be integrated with
2450	            de-packetization and de-jitter buffers of the receiver.

2452	      max-dpb:
2453	         The value of max-dpb is an integer indicating the maximum
2454	         decoded picture buffer size in units of decoded pictures at
2455	         the MaxLumaPS for the highest level, i.e. the number of
2456	         decoded pictures at the maximum picture size defined by the
2457	         highest level.  The value of max-dpb MUST be in the range
2458	         of 1 to 16, inclusive.  The max-dpb parameter signals
2459	         that the receiver has more memory than the minimum amount
2460	         of decoded picture buffer memory required by default, which
2461	         is MaxDpbPicBuf as defined in [HEVC] (equal to 6).  When
2462	         max-dpb is signaled, the receiver MUST be able to decode
2463	         bitstreams that conform to the highest level, with the
2464	         exception that the MaxDpbPicBuf value defined in [HEVC] as
2465	         6 is replaced with the value of max-dpb.  Consequently, a
2466	         receiver that signals max-dpb MUST be capable of storing
2467	         the following number of decoded pictures (MaxDpbSize) in
2468	         its decoded picture buffer:

2470	           if( PicSizeInSamplesY <= ( MaxLumaPS >> 2 ) )
2471	              MaxDpbSize = Min( 4 * max-dpb, 16 )
2472	           else if ( PicSizeInSamplesY <= ( MaxLumaPS >> 1 ) )
2473	              MaxDpbSize = Min( 2 * max-dpb, 16 )
2474	           else if ( PicSizeInSamplesY <=
2475	                     ( ( 3 * MaxLumaPS ) >> 2 ) )
2476	              MaxDpbSize = Min( (4 * max-dpb) / 3, 16 )
2477	           else
2478	              MaxDpbSize = max-dpb

2480	         where MaxLumaPS is given in Table A-1 of [HEVC] for the
2481	         highest level and PicSizeInSamplesY is the current size of
2482	         each decoded picture in units of luma samples as defined in
2483	         [HEVC].
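
Informative example (not part of the normative text): the MaxDpbSize derivation above can be sketched in Python as follows. The function name and the sample level limit are hypothetical, and floor division is assumed for the "(4 * max-dpb) / 3" case. The real MaxLumaPS has to be taken from Table A-1 of [HEVC] for the highest level.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb=6):
    """Return MaxDpbSize for one picture size.

    max_dpb defaults to 6, the value inferred when max-dpb is absent.
    Floor division is an assumption for the (4 * max-dpb) / 3 branch.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb, 16)
    if pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb, 16)
    if pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb) // 3, 16)
    return max_dpb


# Hypothetical level limit, used only to exercise the branches.
EXAMPLE_MAX_LUMA_PS = 2_000_000
print(max_dpb_size(1280 * 720, EXAMPLE_MAX_LUMA_PS, max_dpb=8))
```

Smaller pictures allow more frames to be held in the same memory, which is why the result grows as the picture size shrinks, up to the cap of 16.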

2485	         The value of max-dpb MUST be greater than or equal to the
2486	         value of MaxDpbPicBuf (i.e. 6) as defined in [HEVC].
2487	         Senders MAY use this knowledge to construct coded
2488	         bitstreams with improved compression.

2490	         When not present, the value of max-dpb is inferred to be
2491	         equal to the value of MaxDpbPicBuf (i.e. 6) as defined in
2492	         [HEVC].

2494	            Informative note: This parameter was added primarily to
2495	            complement a similar codepoint in the ITU-T
2496	            Recommendation H.245, so as to facilitate signaling
2497	            gateway designs.  The decoded picture buffer stores
2498	            reconstructed samples.  There is no relationship between
2499	            the size of the decoded picture buffer and the buffers
2500	            used in RTP, especially de-packetization and de-jitter
2501	            buffers.

2503	      max-br:
2504	         The value of max-br is an integer indicating the maximum
2505	         video bitrate in units of CpbBrVclFactor bits per second
2506	         for the VCL HRD parameters and in units of CpbBrNalFactor
2507	         bits per second for the NAL HRD parameters, where
2508	         CpbBrVclFactor and CpbBrNalFactor are defined in Section
2509	         A.4 of [HEVC].

2511	         The max-br parameter signals that the video decoder of the
2512	         receiver is capable of decoding video at a higher bitrate
2513	         than is required by the highest level.

2515	         When max-br is signaled, the video codec of the receiver
2516	         MUST be able to decode bitstreams that conform to the
2517	         highest level, with the following exceptions in the limits
2518	         specified by the highest level:

2520	          o The value of max-br replaces the MaxBR value in Table A-
2521	            2 of [HEVC] for the highest level.
2522	          o When the max-cpb parameter is not present, the result of
2523	            the following formula replaces the value of MaxCPB in
2524	            Table A-1 of [HEVC]:

2526	               (MaxCPB of the highest level) * max-br / (MaxBR of
2527	               the highest level)

2529	         For example, if a receiver signals capability for Main
2530	         profile Level 2 with max-br equal to 2000, this indicates a
2531	         maximum video bitrate of 2000 kbits/sec for VCL HRD
2532	         parameters, a maximum video bitrate of 2200 kbits/sec for
2533	         NAL HRD parameters, and a CPB size of 2000000 bits (2000000
2534	         / 1500000 * 1500000).
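
Informative example (not part of the normative text): the arithmetic of the example above can be sketched as follows. The function name is hypothetical. The default factors of 1000 and 1100 and the level limits of 1500 are the values implied by the example in the text. Floor division in the CPB scaling is an assumption, since the text does not state a rounding rule.

```python
def max_br_limits(max_br, level_max_br, level_max_cpb,
                  vcl_factor=1000, nal_factor=1100, max_cpb=None):
    """Derive bitrate and CPB limits from a signaled max-br.

    level_max_br and level_max_cpb are the MaxBR and MaxCPB values of
    the highest level, supplied by the caller from [HEVC] Annex A.
    """
    if not level_max_br <= max_br <= 16 * level_max_br:
        raise ValueError("max-br must be within MaxBR..16*MaxBR")
    if max_cpb is None:
        # max-cpb absent: scale MaxCPB by max-br / MaxBR.
        cpb_units = level_max_cpb * max_br // level_max_br
    else:
        cpb_units = max_cpb
    return {
        "vcl_bitrate_bps": max_br * vcl_factor,
        "nal_bitrate_bps": max_br * nal_factor,
        "cpb_units": cpb_units,
        "vcl_cpb_bits": cpb_units * vcl_factor,
    }


limits = max_br_limits(2000, level_max_br=1500, level_max_cpb=1500)
print(limits)
```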

2536	         Senders MAY use this knowledge to send higher bitrate video
2537	         as allowed in the level definition of Annex A of HEVC to
2538	         achieve improved video quality.

2540	         When not present, the value of max-br is inferred to be
2541	         equal to the value of MaxBR given in Table A-2 of [HEVC]
2542	         for the highest level.

2544	         The value of max-br MUST be in the range of MaxBR to
2545	         16 * MaxBR, inclusive, where MaxBR is given in Table A-2 of
2546	         [HEVC] for the highest level.

2548	            Informative note: This parameter was added primarily to
2549	            complement a similar codepoint in the ITU-T
2550	            Recommendation H.245, so as to facilitate signaling
2551	            gateway designs.  The assumption that the network is
2552	            capable of handling such bitrates at any given time
2553	            cannot be made from the value of this parameter.  In
2554	            particular, no conclusion can be drawn that the signaled
2555	            bitrate is possible under congestion control
2556	            constraints.

2558	      max-tr:
2559	         The value of max-tr is an integer indicating the maximum
2560	         number of tile rows.  The max-tr parameter signals that the
2561	         receiver is capable of decoding video with a larger number
2562	         of tile rows than the value allowed by the highest level.

2564	         When max-tr is signaled, the receiver MUST be able to
2565	         decode bitstreams that conform to the highest level, with
2566	         the exception that the MaxTileRows value in Table A-1 of
2567	         [HEVC] for the highest level is replaced with the value of
2568	         max-tr.

2570	         Senders MAY use this knowledge to send pictures utilizing a
2571	         larger number of tile rows than the value allowed by the
2572	         highest level.

2574	         When not present, the value of max-tr is inferred to be
2575	         equal to the value of MaxTileRows given in Table A-1 of
2576	         [HEVC] for the highest level.

2578	         The value of max-tr MUST be in the range of MaxTileRows to
2579	         16 * MaxTileRows, inclusive, where MaxTileRows is given in
2580	         Table A-1 of [HEVC] for the highest level.

2582	      max-tc:
2583	         The value of max-tc is an integer indicating the maximum
2584	         number of tile columns.  The max-tc parameter signals that
2585	         the receiver is capable of decoding video with a larger
2586	         number of tile columns than the value allowed by the
2587	         highest level.

2589	         When max-tc is signaled, the receiver MUST be able to
2590	         decode bitstreams that conform to the highest level, with
2591	         the exception that the MaxTileCols value in Table A-1 of
2592	         [HEVC] for the highest level is replaced with the value of
2593	         max-tc.

2595	         Senders MAY use this knowledge to send pictures utilizing a
2596	         larger number of tile columns than the value allowed by the
2597	         highest level.

2599	         When not present, the value of max-tc is inferred to be
2600	         equal to the value of MaxTileCols given in Table A-1 of
2601	         [HEVC] for the highest level.

2603	         The value of max-tc MUST be in the range of MaxTileCols to
2604	         16 * MaxTileCols, inclusive, where MaxTileCols is given in
2605	         Table A-1 of [HEVC] for the highest level.
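
Informative example (not part of the normative text): max-lsr, max-lps, max-cpb, max-br, max-tr, and max-tc share the same pattern, namely an inferred default equal to the level limit and an allowed range of one to sixteen times that limit. A minimal sketch of that pattern, using a hypothetical helper name and caller-supplied level limits, could look like this:

```python
def effective_limit(signaled, level_limit):
    """Return the limit a sender may rely on for one max-* parameter.

    signaled is the parsed parameter value, or None when it is absent.
    level_limit is the corresponding value from [HEVC] Annex A for the
    highest level; it must be looked up by the caller.
    """
    if signaled is None:
        return level_limit  # inferred default
    if not level_limit <= signaled <= 16 * level_limit:
        raise ValueError("value outside level_limit..16*level_limit")
    return signaled


# Placeholder limit of 20, chosen only for illustration.
print(effective_limit(None, 20), effective_limit(40, 20))
```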

2607	      max-fps:

2609	         The value of max-fps is an integer indicating the maximum
2610	         picture rate in units of pictures per 100 seconds that can
2611	         be effectively processed by the receiver.  The max-fps
2612	         parameter MAY be used to signal that the receiver has a
2613	         constraint in that it is not capable of processing video
2614	         effectively at the full picture rate that is implied by the
2615	         highest level and, when present, one or more of the
2616	         parameters max-lsr, max-lps, and max-br.

2618	         The value of max-fps is not necessarily the picture rate at
2619	         which the maximum picture size can be sent; it constitutes
2620	         a constraint on maximum picture rate for all resolutions.

2622	            Informative note: The max-fps parameter is semantically
2623	            different from max-lsr, max-lps, max-cpb, max-dpb, max-
2624	            br, max-tr, and max-tc in that max-fps is used to signal
2625	            a constraint, lowering the maximum picture rate from
2626	            what is implied by other parameters.

2628	         The encoder MUST use a picture rate equal to or less than
2629	         this value.  In cases where the max-fps parameter is absent
2630	         the encoder is free to choose any picture rate according to
2631	         the highest level and any signaled optional parameters.

2633	         The value of max-fps MUST be smaller than or equal to the
2634	         full picture rate that is implied by the highest level and,
2635	         when present, one or more of the parameters max-lsr, max-
2636	         lps, and max-br.
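
Informative example (not part of the normative text): because max-fps is expressed in pictures per 100 seconds, a value of 2997 corresponds to 29.97 pictures per second. The sketch below, with hypothetical function names, converts the unit and applies the cap to a rate the encoder would otherwise choose.

```python
from fractions import Fraction


def max_fps_to_pictures_per_second(max_fps):
    """Convert max-fps (pictures per 100 seconds) to pictures/second."""
    return Fraction(max_fps, 100)


def capped_picture_rate(wanted_rate, max_fps=None):
    """Return the picture rate to use, honoring max-fps when present."""
    wanted = Fraction(wanted_rate)
    if max_fps is None:
        return wanted  # no constraint signaled by the receiver
    return min(wanted, max_fps_to_pictures_per_second(max_fps))


print(capped_picture_rate(30, max_fps=1500))
```

Exact fractions are used so that rates such as 29.97 are not subject to floating-point rounding.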

2638	      sprop-max-don-diff:

2640	         The value of this parameter MUST be equal to 0, if the RTP
2641	         stream does not depend on other RTP streams and there is no
2642	         NAL unit naluA that is followed in transmission order by
2643	         any NAL unit preceding naluA in decoding order.  Otherwise,
2644	         this parameter specifies the maximum absolute difference
2645	         between the decoding order number (i.e., AbsDon) values of
2646	         any two NAL units naluA and naluB, where naluA follows
2647	         naluB in decoding order and precedes naluB in transmission
2648	         order.

2650	         The value of sprop-max-don-diff MUST be an integer in the
2651	         range of 0 to 32767, inclusive.

2653	         When not present, the value of sprop-max-don-diff is
2654	         inferred to be equal to 0.

2656	         When the RTP stream depends on one or more other RTP
2657	         streams (in this case tx-mode MUST be equal to "MSM" and
2658	         MSM is in use), this parameter MUST be present and the
2659	         value MUST be greater than 0.

2661	            Informative note: When the RTP stream does not depend on
2662	            other RTP streams, either MSM or SSM may be in use.
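
Informative example (not part of the normative text): a sender that knows the AbsDon of each NAL unit in transmission order could derive a value for sprop-max-don-diff as sketched below. The function name is hypothetical, and the sketch assumes that decoding order corresponds to increasing AbsDon.

```python
def sprop_max_don_diff(abs_dons_in_tx_order):
    """Largest AbsDon gap between an earlier-sent, later-decoded NAL
    unit and a later-sent, earlier-decoded one (0 if never reordered)."""
    max_diff = 0
    running_max = None  # highest AbsDon transmitted so far
    for abs_don in abs_dons_in_tx_order:
        if running_max is not None and running_max > abs_don:
            max_diff = max(max_diff, running_max - abs_don)
        if running_max is None or abs_don > running_max:
            running_max = abs_don
    return max_diff


print(sprop_max_don_diff([0, 3, 1, 2, 4]))
```

Tracking only the running maximum is sufficient because, for any later NAL unit, the largest difference is always against the highest AbsDon already sent.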

2664	      sprop-depack-buf-nalus:

2666	         This parameter specifies the maximum number of NAL units
2667	         that precede a NAL unit in transmission order and follow
2668	         the NAL unit in decoding order.

2670	         The value of sprop-depack-buf-nalus MUST be an integer in
2671	         the range of 0 to 32767, inclusive.

2673	         When not present, the value of sprop-depack-buf-nalus is
2674	         inferred to be equal to 0.

2676	         When the RTP stream depends on one or more other RTP
2677	         streams (in this case tx-mode MUST be equal to "MSM" and
2678	         MSM is in use), this parameter MUST be present and the
2679	         value MUST be greater than 0.
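
Informative example (not part of the normative text): the count described for sprop-depack-buf-nalus can be sketched as follows. The function name is hypothetical, and decoding order is assumed to correspond to increasing AbsDon.

```python
def sprop_depack_buf_nalus(abs_dons_in_tx_order):
    """For each NAL unit, count the units sent before it that are
    decoded after it, and return the largest such count."""
    worst = 0
    for j, don_j in enumerate(abs_dons_in_tx_order):
        earlier = abs_dons_in_tx_order[:j]
        count = sum(1 for don_i in earlier if don_i > don_j)
        worst = max(worst, count)
    return worst


print(sprop_depack_buf_nalus([2, 3, 0, 1]))
```

The quadratic loop is kept for clarity; it mirrors the definition directly rather than optimizing it.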

2681	      sprop-depack-buf-bytes:

2683	         This parameter signals the required size of the de-
2684	         packetization buffer in units of bytes.  The value of the
2685	         parameter MUST be greater than or equal to the maximum
2686	         buffer occupancy (in units of bytes) of the de-
2687	         packetization buffer as specified in section 6.

2689	         The value of sprop-depack-buf-bytes MUST be an integer in
2690	         the range of 0 to 4294967295, inclusive.

2692	         When the RTP stream depends on one or more other RTP
2693	         streams (in this case tx-mode MUST be equal to "MSM" and
2694	         MSM is in use) or sprop-max-don-diff is present and greater
2695	         than 0, this parameter MUST be present and the value MUST
2696	         be greater than 0.

2698	            Informative note: The value of sprop-depack-buf-bytes
2699	            indicates the required size of the de-packetization
2700	            buffer only.  When network jitter can occur, an
2701	            appropriately sized jitter buffer has to be available as
2702	            well.

2704	      depack-buf-cap:

2706	         This parameter signals the capabilities of a receiver
2707	         implementation and indicates the amount of de-packetization
2708	         buffer space in units of bytes that the receiver has
2709	         available for reconstructing the NAL unit decoding order
2710	         from NAL units carried in one or more RTP streams.  A
2711	         receiver is able to handle any RTP stream, and all RTP
2712	         streams the RTP stream depends on, when present, for which
2713	         the value of the sprop-depack-buf-bytes parameter is
2714	         smaller than or equal to this parameter.

2716	         When not present, the value of depack-buf-cap is inferred
2717	         to be equal to 4294967295.  The value of depack-buf-cap
2718	         MUST be an integer in the range of 1 to 4294967295,
2719	         inclusive.

2721	            Informative note: depack-buf-cap indicates the maximum
2722	            possible size of the de-packetization buffer of the
2723	            receiver only.  When network jitter can occur, an
2724	            appropriately sized jitter buffer has to be available as
2725	            well.
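
Informative example (not part of the normative text): the comparison between sprop-depack-buf-bytes and depack-buf-cap can be sketched as below. The function name is hypothetical. The default mirrors the value inferred when depack-buf-cap is absent.

```python
DEFAULT_DEPACK_BUF_CAP = 4294967295  # inferred when depack-buf-cap is absent


def receiver_can_handle(sprop_depack_buf_bytes, depack_buf_cap=None):
    """Return True when the stream's required de-packetization buffer
    fits within the receiver's signaled capability."""
    cap = DEFAULT_DEPACK_BUF_CAP if depack_buf_cap is None else depack_buf_cap
    if not 1 <= cap <= 4294967295:
        raise ValueError("depack-buf-cap out of range")
    if not 0 <= sprop_depack_buf_bytes <= 4294967295:
        raise ValueError("sprop-depack-buf-bytes out of range")
    return sprop_depack_buf_bytes <= cap


print(receiver_can_handle(5000, depack_buf_cap=4096))
```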

2727	      sprop-segmentation-id:

2729	         This parameter MAY be used to signal the segmentation tools
2730	         present in the bitstream and that can be used for
2731	         parallelization.  The value of sprop-segmentation-id MUST
2732	         be an integer in the range of 0 to 3, inclusive.  When not
2733	         present, the value of sprop-segmentation-id is inferred to
2734	         be equal to 0.

2736	         When sprop-segmentation-id is equal to 0, no information
2737	         about the segmentation tools is provided.  When sprop-
2738	         segmentation-id is equal to 1, it indicates that slices are
2739	         present in the bitstream.  When sprop-segmentation-id is
2740	         equal to 2, it indicates that tiles are present in the
2741	         bitstream.  When sprop-segmentation-id is equal to 3, it
2742	         indicates that WPP is used in the bitstream.
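
Informative example (not part of the normative text): a receiver could map sprop-segmentation-id to the tool it signals as sketched below. The table and function names are hypothetical.

```python
SEGMENTATION_TOOLS = {
    0: "no information",
    1: "slices",
    2: "tiles",
    3: "WPP",
}


def describe_segmentation_id(value=None):
    """Map sprop-segmentation-id to a description; absent means 0."""
    if value is None:
        value = 0  # inferred default
    if value not in SEGMENTATION_TOOLS:
        raise ValueError("sprop-segmentation-id must be 0..3")
    return SEGMENTATION_TOOLS[value]


print(describe_segmentation_id(2))
```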

2744	      sprop-spatial-segmentation-idc:

2746	         A base16 [RFC4648] representation of the syntax element
2747	         min_spatial_segmentation_idc as specified in [HEVC].  This
2748	         parameter MAY be used to describe parallelization
2749	         capabilities of the bitstream.

2751	      dec-parallel-cap:

2753	         This parameter MAY be used to indicate the decoder's
2754	         additional decoding capabilities given the presence of
2755	         tools enabling parallel decoding, such as slices, tiles,
2756	         and WPP, in the bitstream.  The decoding capability of the
2757	         decoder may vary with the setting of the parallel decoding
2758	         tools present in the bitstream, e.g. the size of the tiles
2759	         that are present in a bitstream.  Therefore, multiple
2760	         capability points may be provided, each indicating the
2761	         minimum required decoding capability that is associated
2762	         with a parallelism requirement, which is a requirement on
2763	         the bitstream that enables parallel decoding.

2765	         Each capability point is defined as a combination of 1) a
2766	         parallelism requirement, 2) a profile (determined by
2767	         profile-space and profile-id), 3) a highest level, and 4) a
2768	         maximum processing rate, a maximum picture size, and a
2769	         maximum video bitrate that may be equal to or greater than
2770	         that determined by the highest level.  The parameter's
2771	         syntax in ABNF [RFC5234] is as follows:

2773	            dec-parallel-cap = "dec-parallel-cap={" cap-point *(","
2774	                               cap-point) "}"

2776	            cap-point = ("w" / "t") ":" spatial-seg-idc 1*(";"
2777	                         cap-parameter)

2779	            spatial-seg-idc = 1*4DIGIT ; (1-4095)

2781	            cap-parameter = tier-flag / level-id / max-lsr
2782	                            / max-lps / max-br

2784	            tier-flag = "tier-flag" EQ ("0" / "1")

2786	            level-id  = "level-id" EQ 1*3DIGIT ; (0-255)

2788	            max-lsr   = "max-lsr" EQ 1*20DIGIT
2789	                        ; (0-18,446,744,073,709,551,615)

2791	            max-lps   = "max-lps" EQ 1*10DIGIT ; (0-4,294,967,295)

2793	            max-br    = "max-br"  EQ 1*20DIGIT
2794	                        ; (0-18,446,744,073,709,551,615)

2796	            EQ = "="

2798	         The set of capability points expressed by the dec-parallel-
2799	         cap parameter is enclosed in a pair of curly braces ("{}").
2800	         Each set of two consecutive capability points is separated
2801	         by a comma (',').  Within each capability point, each set
2802	         of two consecutive parameters, and when present, their
2803	         values, is separated by a semicolon (';').

2805	         The profile of all capability points is determined by
2806	         profile-space and profile-id that are outside the dec-
2807	         parallel-cap parameter.

2809	         Each capability point starts with an indication of the
2810	         parallelism requirement, which consists of a parallel tool
2811	         type, which may be equal to 'w' or 't', and a decimal value
2812	         of the spatial-seg-idc parameter.  When the type is 'w',
2813	         the capability point is valid only for H.265 bitstreams
2814	         with WPP in use, i.e. entropy_coding_sync_enabled_flag
2815	         equal to 1.  When the type is 't', the capability point is
2816	         valid only for H.265 bitstreams with WPP not in use (i.e.
2817	         entropy_coding_sync_enabled_flag equal to 0).  The
2818	         capability point is valid only for H.265 bitstreams with
2819	         min_spatial_segmentation_idc equal to or greater than
2820	         spatial-seg-idc.

2822	         After the parallelism requirement indication, each
2823	         capability point continues with one or more pairs of
2824	         parameter and value in any order for any of the following
2825	         parameters:

2827	            o tier-flag
2828	            o level-id
2829	            o max-lsr
2830	            o max-lps
2831	            o max-br

2833	         At most one occurrence of each of the above five parameters
2834	         is allowed within each capability point.
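
Informative example (not part of the normative text): a minimal parser for the brace-enclosed value of dec-parallel-cap, following the ABNF above, could look like the sketch below. The function name and the returned structure are hypothetical. Removing whitespace before parsing is an assumption made to tolerate folded lines. Per-parameter value ranges (for example, tier-flag being 0 or 1) are not checked here.

```python
import re

_CAP_POINT = re.compile(r"([wt]):(\d{1,4})((?:;[a-z-]+=\d+)+)")
_ALLOWED = {"tier-flag", "level-id", "max-lsr", "max-lps", "max-br"}


def parse_dec_parallel_cap(value):
    """Parse '{cap-point,cap-point,...}' into a list of dicts."""
    value = "".join(value.split())  # assumption: ignore folding whitespace
    if not (value.startswith("{") and value.endswith("}")):
        raise ValueError("capability points must be enclosed in braces")
    points = []
    for text in value[1:-1].split(","):
        match = _CAP_POINT.fullmatch(text)
        if match is None:
            raise ValueError("malformed capability point: %r" % text)
        idc = int(match.group(2))
        if not 1 <= idc <= 4095:
            raise ValueError("spatial-seg-idc must be 1..4095")
        params = {}
        for pair in match.group(3)[1:].split(";"):
            name, _, number = pair.partition("=")
            if name not in _ALLOWED:
                raise ValueError("unknown cap-parameter: %r" % name)
            if name in params:
                raise ValueError("duplicate cap-parameter: %r" % name)
            params[name] = int(number)
        points.append({"tool": match.group(1),
                       "spatial-seg-idc": idc,
                       "params": params})
    return points


print(parse_dec_parallel_cap("{t:8;level-id=120}"))
```

Parameters missing from a capability point are left out of the result so that the caller can apply the inference rules for absent values.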

2836	         The values of dec-parallel-cap.tier-flag and dec-parallel-
2837	         cap.level-id for a capability point indicate the highest
2838	         level of the capability point.  The values of dec-parallel-
2839	         cap.max-lsr, dec-parallel-cap.max-lps, and dec-parallel-
2840	         cap.max-br for a capability point indicate the maximum
2841	         processing rate in units of luma samples per second, the
2842	         maximum picture size in units of luma samples, and the
2843	         maximum video bitrate (in units of CpbBrVclFactor bits per
2844	         second for the VCL HRD parameters and in units of
2845	         CpbBrNalFactor bits per second for the NAL HRD parameters
2846	         where CpbBrVclFactor and CpbBrNalFactor are defined in
2847	         Section A.4 of [HEVC]).

2849	         When not present, the value of dec-parallel-cap.tier-flag
2850	         is inferred to be equal to the value of tier-flag outside
2851	         the dec-parallel-cap parameter.  When not present, the
2852	         value of dec-parallel-cap.level-id is inferred to be equal
2853	         to the value of max-recv-level-id outside the dec-parallel-
2854	         cap parameter.  When not present, the value of dec-
2855	         parallel-cap.max-lsr, dec-parallel-cap.max-lps, or dec-
2856	         parallel-cap.max-br is inferred to be equal to the value of
2857	         max-lsr, max-lps, or max-br, respectively, outside the dec-
2858	         parallel-cap parameter.

2860	         The general decoding capability, expressed by the set of
2861	         parameters outside of dec-parallel-cap, is defined as the
2862	         capability point that is determined by the following
2863	         combination of parameters: 1) the parallelism requirement
2864	         corresponding to the value of sprop-segmentation-id equal
2865	         to 0 for a bitstream, 2) the profile determined by profile-
2866	         space, profile-id, profile-compatibility-indicator, and
2867	         interop-constraints, 3) the tier and the highest level
2868	         determined by tier-flag and max-recv-level-id, and 4) the
2869	         maximum processing rate, the maximum picture size, and the
2870	         maximum video bitrate determined by the highest level.  The
2871	         general decoding capability MUST NOT be included as one of
2872	         the set of capability points in the dec-parallel-cap
2873	         parameter.

2875	         For example, the following parameters express the general
2876	         decoding capability of 720p30 (Level 3.1) plus an
2877	         additional decoding capability of 1080p30 (Level 4) given
2878	         that the spatially largest tile or slice used in the
2879	         bitstream is equal to or less than 1/3 of the picture size:

2881	            a=fmtp:98 level-id=93;dec-parallel-cap={t:8;level-
2882	            id=120}

2884	         For another example, the following parameters express an
2885	         additional decoding capability of 1080p30, using dec-
2886	         parallel-cap.max-lsr and dec-parallel-cap.max-lps, given
2887	         that WPP is used in the bitstream:

2889	            a=fmtp:98 level-id=93;dec-parallel-cap={w:8;
2890	                        max-lsr=62668800;max-lps=2088960}

2892	            Informative note: When min_spatial_segmentation_idc is
2893	            present in a bitstream and WPP is not used, [HEVC]
2894	            specifies that there is no slice or no tile in the
2895	            bitstream containing more than 4 * PicSizeInSamplesY /
2896	            ( min_spatial_segmentation_idc + 4 ) luma samples.
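
Informative example (not part of the normative text): the bound in the note above can be evaluated as sketched below. The function name is hypothetical. Floor division is used on the assumption that a segment holds a whole number of luma samples.

```python
def max_segment_luma_samples(pic_size_in_samples_y, min_spatial_segmentation_idc):
    """Upper bound on luma samples per slice or tile when WPP is not
    used: 4 * PicSizeInSamplesY / (min_spatial_segmentation_idc + 4)."""
    if not 0 <= min_spatial_segmentation_idc <= 4095:
        raise ValueError("min_spatial_segmentation_idc must be 0..4095")
    return (4 * pic_size_in_samples_y) // (min_spatial_segmentation_idc + 4)


# A spatial-seg-idc of 8 bounds each segment to 1/3 of the picture.
print(max_segment_luma_samples(1920 * 1080, 8))
```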

2898	      include-dph:

2900	         This parameter is used to indicate the capability and
2901	         preference to utilize or include decoded picture hash (DPH)
2902	         SEI messages (see Section D.3.19 of [HEVC]) in the
2903	         bitstream.  DPH SEI messages can be used to detect picture
2904	         corruption so that the receiver can request picture repair
2905	         (see Section 8).  The value is a comma-separated list of
2906	         hash types that are supported or requested, each hash
2907	         type provided as an unsigned integer value (0-255), with
2908	         the hash types listed from most preferred to the least
2909	         preferred.  Example: "include-dph=0,2", which indicates the
2910	         capability for MD5 (most preferred) and Checksum (less
2911	         preferred).  If the parameter is not included or the value
2912	         contains no hash types, then no capability to utilize DPH
2913	         SEI messages is assumed.  Note that DPH SEI messages MAY
2914	         still be included in the bitstream even when there is no
2915	         declaration of capability to use them, as in general SEI
2916	         messages do not affect the normative decoding process and
2917	         decoders are allowed to ignore SEI messages.
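
Informative example (not part of the normative text): the include-dph value can be parsed into an ordered preference list as sketched below. The function name is hypothetical.

```python
def parse_include_dph(value=None):
    """Return hash types in preference order; [] means no capability."""
    if value is None or value.strip() == "":
        return []
    hash_types = []
    for item in value.split(","):
        number = int(item.strip())  # raises ValueError on non-integers
        if not 0 <= number <= 255:
            raise ValueError("hash type must be 0..255")
        hash_types.append(number)
    return hash_types


print(parse_include_dph("0,2"))
```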

2919	      Encoding considerations:

2921	         This type is only defined for transfer via RTP (RFC 3550).

2923	      Security considerations:

2925	         See Section 9 of RFC XXXX.

2927	      Public specification:

2929	         Please refer to Section 13 of RFC XXXX.

2931	      Additional information: None

2933	      File extensions: none

2935	      Macintosh file type code: none

2937	      Object identifier or OID: none

2939	      Person & email address to contact for further information:

2941	         Ye-Kui Wang (yekuiw@qti.qualcomm.com).

2943	      Intended usage: COMMON

2945	      Author: See Section 14 of RFC XXXX.

2947	      Change controller:

2949	         IETF Audio/Video Transport Payloads working group delegated
2950	         from the IESG.

2952	7.2 SDP Parameters

2954	   The receiver MUST ignore any parameter unspecified in this memo.

2956	7.2.1 Mapping of Payload Type Parameters to SDP

2958	   The media type video/H265 string is mapped to fields in the
2959	   Session Description Protocol (SDP) [RFC4566] as follows:

2961	   o  The media name in the "m=" line of SDP MUST be video.

2963	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H265
2964	      (the media subtype).

2966	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2968	   o  The OPTIONAL parameters "profile-space", "profile-id", "tier-
2969	      flag", "level-id", "interop-constraints", "profile-
2970	      compatibility-indicator", "sprop-sub-layer-id", "recv-sub-
2971	      layer-id", "max-recv-level-id", "tx-mode", "max-lsr", "max-
2972	      lps", "max-cpb", "max-dpb", "max-br", "max-tr", "max-tc",
2973	      "max-fps", "sprop-max-don-diff", "sprop-depack-buf-nalus",
2974	      "sprop-depack-buf-bytes", "depack-buf-cap", "sprop-
2975	      segmentation-id", "sprop-spatial-segmentation-idc", "dec-
2976	      parallel-cap", and "include-dph", when present, MUST be
2977	      included in the "a=fmtp" line of SDP.  These parameters are
2978	      expressed as a media type string, in the form of a semicolon
2979	      separated list of parameter=value pairs.

2981	   o  The OPTIONAL parameters "sprop-vps", "sprop-sps", and "sprop-
2982	      pps", when present, MUST be included in the "a=fmtp" line of
2983	      SDP or conveyed using the "fmtp" source attribute as specified
2984	      in section 6.3 of [RFC5576].  For a particular media format
2985	      (i.e. RTP payload type), "sprop-vps", "sprop-sps", or "sprop-
2986	      pps" MUST NOT be both included in the "a=fmtp" line of SDP and
2987	      conveyed using the "fmtp" source attribute.  When included in
2988	      the "a=fmtp" line of SDP, these parameters are expressed as a
2989	      media type string, in the form of a semicolon separated list
2990	      of parameter=value pairs.  When conveyed in the "a=fmtp" line
2991	      of SDP for a particular payload type, the parameters "sprop-
2992	      vps", "sprop-sps", and "sprop-pps" MUST be applied to each
2993	      SSRC with the payload type.  When conveyed using the "fmtp"
2994	      source attribute, these parameters are only associated with
2995	      the given source and payload type as parts of the "fmtp"
2996	      source attribute.

2998	          Informative note: Conveyance of "sprop-vps", "sprop-sps",
2999	          and "sprop-pps" using the "fmtp" source attribute allows
3000	          for out-of-band transport of parameter sets in topologies
3001	          like Topo-Video-switch-MCU as specified in [RFC5117].

3003	   An example of media representation in SDP is as follows:

3005	         m=video 49170 RTP/AVP 98
3006	         a=rtpmap:98 H265/90000
3007	         a=fmtp:98 profile-id=1;
3008	                   sprop-vps=<video parameter sets data>
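
Informative example (not part of the normative text): an "a=fmtp" line of the form shown above can be split into parameter=value pairs as sketched below. The function name is hypothetical, and the input is assumed to be a single unfolded line. Semicolons inside curly braces are not treated as separators, since dec-parallel-cap uses them internally.

```python
def parse_fmtp(line):
    """Return (payload_type, {name: value}) for one a=fmtp line."""
    prefix, _, rest = line.partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    payload_type = int(prefix[len("a=fmtp:"):])
    pieces, current, depth = [], [], 0
    for ch in rest:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == ";" and depth == 0:
            pieces.append("".join(current))  # top-level separator
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current))
    params = {}
    for piece in pieces:
        piece = piece.strip()
        if not piece:
            continue
        name, sep, value = piece.partition("=")
        if not sep:
            raise ValueError("expected parameter=value: %r" % piece)
        params[name.strip()] = value.strip()
    return payload_type, params


print(parse_fmtp("a=fmtp:98 profile-id=1; level-id=93"))
```

Values are returned as strings so that each parameter can be decoded according to its own definition.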

3010	7.2.2 Usage with SDP Offer/Answer Model

3012	   When HEVC is offered over RTP using SDP in an Offer/Answer model
3013	   [RFC3264] for negotiation for unicast usage, the following
3014	   limitations and rules apply:

3016	   o  The parameters identifying a media format configuration for
3017	      HEVC are profile-space, profile-id, tier-flag, level-id,
3018	      interop-constraints, profile-compatibility-indicator, and tx-
3019	      mode.  These media configuration parameters, except level-id,
3020	      MUST be used symmetrically when the answerer does not include
3021	      recv-sub-layer-id in the answer for the media format (payload
3022	      type) or the included recv-sub-layer-id is equal to sprop-sub-
3023	      layer-id in the offer.  The answerer MUST

3025	        1) maintain all configuration parameters with the values
3026	           remaining the same as in the offer for the media format
3027	           (payload type), with the exception that the value of
3028	           level-id is changeable as long as the highest level
3029	           indicated by the answer is not higher than that indicated
3030	           by the offer;

3032	        2) include in the answer the recv-sub-layer-id parameter,
3033	           with a value less than the sprop-sub-layer-id parameter
3034	           in the offer, for the media format (payload type), and
3035	           maintain all configuration parameters with the values
3036	           being the same as signalled in the sprop-vps for the
3037	           chosen sub-layer representation, with the exception that
3038	           the value of level-id is changeable as long as the
3039	           highest level indicated by the answer is not higher than
3040	           the level indicated by the sprop-vps in offer for the
3041	           chosen sub-layer representation; or

3043	        3) remove the media format (payload type) completely (when
3044	           one or more of the parameter values are not supported).

3046	          Informative note: The above requirement for symmetric use
3047	          does not apply for level-id, and does not apply for the
3048	          other bitstream or RTP stream properties and capability
3049	          parameters.

3051	   o  The profile-compatibility-indicator, when offered as sendonly,
3052	      describes bitstream properties.  The answerer MAY accept an RTP
3053	      payload type even if the decoder is not capable of handling
3054	      the profile indicated by the profile-space, profile-id, and
3055	      interop-constraints parameters, but is capable of handling any
3056	      of the profiles indicated by the profile-space, profile-
3057	      compatibility-indicator, and interop-constraints.  However,
3058	      when the profile-compatibility-indicator is used in a recvonly
3059	      or sendrecv media description, the bitstream using this RTP
3060	      payload type is required to conform to all profiles indicated
3061	      by profile-space, profile-compatibility-indicator, and
3062	      interop-constraints.

3064	   o  To simplify handling and matching of these configurations, the
3065	      same RTP payload type number used in the offer SHOULD also be
3066	      used in the answer, as specified in [RFC3264].

3068	   o  The same RTP payload type number used in the offer MUST be
3069	      used in the answer when the answer includes recv-sub-layer-id.
3070	      When the answer does not include recv-sub-layer-id, the answer
3071	      MUST NOT contain a payload type number used in the offer
3072	      unless the configuration is exactly the same as in the offer
3073	      or the configuration in the answer only differs from that in
3074	      the offer with a different value of level-id.  The answer MAY
3075	      contain the recv-sub-layer-id parameter if an HEVC bitstream
3076	      contains multiple operation points (using temporal scalability
3077	      and sub-layers) and sprop-vps is included in the offer where
3078	      information on sub-layers is present in the first video
3079	      parameter set contained in sprop-vps.  If the sprop-vps is
3080	      provided in an offer, an answerer MAY select a particular
3081	      operation point indicated in the first video parameter set
3082	      contained in sprop-vps.  When the answer includes recv-sub-
3083	      layer-id that is less than sprop-sub-layer-id in the offer,
3084	      all video parameter sets contained in the sprop-vps parameter
3085	      in the SDP answer and all video parameter sets sent in-band
3086	      for either the offerer-to-answerer direction or the answerer-
3087	      to-offerer direction MUST be consistent with the first video
3088	      parameter set in the sprop-vps parameter of the offer (see the
3089	      semantics of sprop-vps in section 7.1 of this document on one
3090	      video parameter set being consistent with another video
3091	      parameter set), and the bitstream sent in either direction
3092	      MUST conform to the profile, tier, level, and constraints of
3093	      the chosen sub-layer representation as indicated by the first
3094	      profile_tier_level( ) syntax structure in the first video
3095	      parameter set in the sprop-vps parameter of the offer.

3097	          Informative note: When an offerer receives an answer that
3098	          does not include recv-sub-layer-id, it has to compare
3099	          payload types not declared in the offer based on the media
3100	          type (i.e. video/H265) and the above media configuration
3101	          parameters with any payload types it has already declared.
3102	          This will enable it to determine whether the configuration
3103	          in question is new or if it is equivalent to a configuration
3104	          already offered, since a different payload type number may
3105	          be used in the answer.  The ability to perform operation
3106	          point selection enables a receiver to utilize the temporal
3107	          scalable nature of an HEVC bitstream.

3109	   o  The parameters sprop-max-don-diff, sprop-depack-buf-nalus, and
3110	      sprop-depack-buf-bytes describe the properties of an RTP
3111	      stream, and all RTP streams the RTP stream depends on, when
3112	      present, that the offerer or the answerer is sending for the
3113	      media format configuration.  This differs from the normal
3114	      usage of the Offer/Answer parameters: normally such parameters
3115	      declare the properties of the bitstream or RTP stream that the
3116	      offerer or the answerer is able to receive.  When dealing with
3117	      HEVC, the offerer assumes that the answerer will be able to
3118	      receive media encoded using the configuration being offered.

3120	          Informative note:  The above parameters apply for any RTP
3121	          stream and all RTP streams the RTP stream depends on, when
3122	          present, sent by a declaring entity with the same
3123	          configuration; i.e. they are dependent on their source
3124	          endpoint.  Rather than being bound to the payload type,
3125	          the values may have to be applied to another payload type
3126	          when being sent, as they apply for the configuration.

3128	   o  The capability parameters max-lsr, max-lps, max-cpb, max-dpb,
3129	      max-br, max-tr, and max-tc MAY be used to declare further
3130	      capabilities of the offerer or answerer for receiving.  These
3131	      parameters MUST NOT be present when the direction attribute is
3132	      "sendonly".

3134	   o  The capability parameter max-fps MAY be used to declare lower
3135	      capabilities of the offerer or answerer for receiving.  This
3136	      parameter MUST NOT be present when the direction attribute is
3137	      "sendonly".

3139	   o  The capability parameter dec-parallel-cap MAY be used to
3140	      declare additional decoding capabilities of the offerer or
3141	      answerer for receiving.  Upon receiving such a declaration of
3142	      a receiver, a sender MAY send a bitstream to the receiver
3143	      utilizing those capabilities under the assumption that the
3144	      bitstream fulfills the parallelism requirement.  A bitstream
3145	      that is sent based on choosing a capability point with
3146	      parallel tool type 'w' from dec-parallel-cap MUST have
3147	      entropy_coding_sync_enabled_flag equal to 1 and
3148	      min_spatial_segmentation_idc equal to or larger than dec-
3149	      parallel-cap.spatial-seg-idc of the capability point.  A
3150	      bitstream that is sent based on choosing a capability point
3151	      with parallel tool type 't' from dec-parallel-cap MUST have
3152	      entropy_coding_sync_enabled_flag equal to 0 and
3153	      min_spatial_segmentation_idc equal to or larger than dec-
3154	      parallel-cap.spatial-seg-idc of the capability point.

3156	   o  An offerer has to include the size of the de-packetization
3157	      buffer, sprop-depack-buf-bytes, as well as sprop-max-don-diff
3158	      and sprop-depack-buf-nalus, in the offer for an interleaved
3159	      HEVC bitstream or for the MSM transmission mode.  To enable
3160	      the offerer and answerer to inform each other about their
3161	      capabilities for de-packetization buffering in receiving RTP
3162	      streams, both parties are RECOMMENDED to include depack-buf-
3163	      cap.  For interleaved RTP streams or in MSM, it is also
3164	      RECOMMENDED to consider offering multiple payload types with
3165	      different buffering requirements when the capabilities of the
3166	      receiver are unknown.

3168	   o  The capability parameter include-dph MAY be used to declare
3169	      the capability to utilize decoded picture hash SEI messages,
3170	      and the supported hash types, in any HEVC RTP streams received
3171	      by the offerer or answerer.

3173	   o  The sprop-vps, sprop-sps, or sprop-pps, when present (included
3174	      in the "a=fmtp" line of SDP or conveyed using the "fmtp"
3175	      source attribute as specified in section 6.3 of [RFC5576]),
3176	      are used for out-of-band transport of the parameter sets (VPS,
3177	      SPS, or PPS respectively).

3179	   o  The answerer MAY use either out-of-band or in-band transport
3180	      of parameter sets for the bitstream it is sending, regardless
3181	      of whether out-of-band parameter sets transport has been used
3182	      in the offerer-to-answerer direction.  Parameter sets included
3183	      in an answer are independent of those parameter sets included
3184	      in the offer, as they are used for decoding two different
3185	      bitstreams, one from the answerer to the offerer and the other
3186	      in the opposite direction.  If any RTP streams are sent
3187	      before the SDP offer/answer exchange completes, in-band
3188	      parameter sets MUST be used for the parts of those RTP
3189	      streams sent before the exchange completes.

3191	   o  The following rules apply to transport of parameter sets in the
3192	      offerer-to-answerer direction.

3194	       o An offer MAY include sprop-vps, sprop-sps, and/or sprop-
3195	          pps.  If none of these parameters is present in the offer,
3196	          then only in-band transport of parameter sets is used.

3198	       o If the level to use in the offerer-to-answerer direction
3199	          is equal to the default level in the offer, the answerer
3200	          MUST be prepared to use the parameter sets included in
3201	          sprop-vps, sprop-sps, and sprop-pps (either included in
3202	          the "a=fmtp" line of SDP or conveyed using the "fmtp"
3203	          source attribute) for decoding the incoming bitstream,
3204	          e.g. by passing these parameter set NAL units to the video
3205	          decoder before passing any NAL units carried in the RTP
3206	          streams.  Otherwise, the answerer MUST ignore sprop-vps,
3207	          sprop-sps, and sprop-pps (either included in the "a=fmtp"
3208	          line of SDP or conveyed using the "fmtp" source attribute)
3209	          and the offerer MUST transmit parameter sets in-band.

3211	       o In MSM, the answerer MUST be prepared to use the parameter
3212	          sets out-of-band transmitted for the RTP stream and all
3213	          RTP streams the RTP stream depends on, when present, for
3214	          decoding the incoming bitstream, e.g. by passing these
3215	          parameter set NAL units to the video decoder before
3216	          passing any NAL units carried in the RTP streams.

3218	   o  The following rules apply to transport of parameter sets in the
3219	      answerer-to-offerer direction.

3221	       o An answer MAY include sprop-vps, sprop-sps, and/or sprop-
3222	          pps.  If none of these parameters is present in the
3223	          answer, then only in-band transport of parameter sets is
3224	          used.

3226	       o The offerer MUST be prepared to use the parameter sets
3227	          included in sprop-vps, sprop-sps, and sprop-pps (either
3228	          included in the "a=fmtp" line of SDP or conveyed using the
3229	          "fmtp" source attribute) for decoding the incoming
3230	          bitstream, e.g. by passing these parameter set NAL units
3231	          to the video decoder before passing any NAL units carried
3232	          in the RTP streams.

3234	       o In MSM, the offerer MUST be prepared to use the parameter
3235	          sets out-of-band transmitted for the RTP stream and all
3236	          RTP streams the RTP stream depends on, when present, for
3237	          decoding the incoming bitstream, e.g. by passing these
3238	          parameter set NAL units to the video decoder before
3239	          passing any NAL units carried in the RTP streams.

3241	   o  When sprop-vps, sprop-sps, and/or sprop-pps are conveyed using
3242	      the "fmtp" source attribute as specified in section 6.3 of
3243	      [RFC5576], the receiver of the parameters MUST store the
3244	      parameter sets included in sprop-vps, sprop-sps, and/or sprop-
3245	      pps and associate them with the source given as part of the
3246	      "fmtp" source attribute.  Parameter sets associated with one
3247	      source (given as part of the "fmtp" source attribute) MUST
3248	      only be used to decode NAL units conveyed in RTP packets from
3249	      the same source (given as part of the "fmtp" source
3250	      attribute).  When this mechanism is in use, SSRC collision
3251	      detection and resolution MUST be performed as specified in
3252	      [RFC5576].
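
   The unicast rule on symmetric configuration parameters can be
   sketched as follows for the case in which the answer carries no
   recv-sub-layer-id.  This is a simplified illustration under
   stated assumptions: both parameter sets are given as dictionaries
   of explicit string values (no defaults are applied), only level-id
   is compared for the level (max-recv-level-id is ignored), and the
   function name is hypothetical.

```python
# Configuration parameters that must match between offer and answer
# (level-id is handled separately because it may be lowered).
SYMMETRIC = (
    "profile-space",
    "profile-id",
    "tier-flag",
    "interop-constraints",
    "profile-compatibility-indicator",
    "tx-mode",
)


def answer_keeps_configuration(offer, answer):
    """Return True if the answer keeps the offered configuration.

    Covers only the case without recv-sub-layer-id in the answer;
    operation point selection would need sprop-vps parsing.
    """
    if "recv-sub-layer-id" in answer:
        raise ValueError("recv-sub-layer-id case is not covered by this sketch")
    for name in SYMMETRIC:
        if offer.get(name) != answer.get(name):
            return False
    # The answered level must not be higher than the offered level.
    return int(answer["level-id"]) <= int(offer["level-id"])


offer = {"profile-id": "1", "tier-flag": "0", "level-id": "93"}
lowered = dict(offer, **{"level-id": "90"})
raised = dict(offer, **{"level-id": "120"})
```

   With these inputs, answer_keeps_configuration(offer, lowered)
   holds, while the call with "raised" does not, because the
   answered level exceeds the offered one.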

3254	   For bitstreams being delivered over multicast, the following
3255	   rules apply:

3257	   o  The media format configuration is identified by profile-space,
3258	      profile-id, tier-flag, level-id, interop-constraints, profile-
3259	      compatibility-indicator, and tx-mode.  These media format
3260	      configuration parameters, including level-id, MUST be used
3261	      symmetrically; that is, the answerer MUST either maintain all
3262	      configuration parameters or remove the media format (payload
3263	      type) completely.  Note that this implies that the level-id
3264	      for Offer/Answer in multicast is not changeable.

3266	   o  To simplify the handling and matching of these configurations,
3267	      the same RTP payload type number used in the offer SHOULD also
3268	      be used in the answer, as specified in [RFC3264].  An answer
3269	      MUST NOT contain a payload type number used in the offer
3270	      unless the configuration is the same as in the offer.

3272	   o  Parameter sets received MUST be associated with the
3273	      originating source and MUST only be used in decoding the
3274	      incoming bitstream from the same source.

3276	   o  The rules for other parameters are the same as above for
3277	      unicast as long as the three above rules are obeyed.

3279	   Table 1 lists the interpretation of all the parameters that MUST
3280	   be used for the various combinations of offer, answer, and
3281	   direction attributes.  Note that the two columns wherein the
3282	   recv-sub-layer-id parameter is used only apply to answers,
3283	   whereas the other columns apply to both offers and answers.

3285	   Table 1.  Interpretation of parameters for various combinations
3286	   of offers, answers, direction attributes, with and without recv-
3287	   sub-layer-id.  Columns that do not indicate offer or answer apply
3288	   to both.

3290	                                          sendonly --+
3291	            answer: recvonly, recv-sub-layer-id --+  |
3292	              recvonly w/o recv-sub-layer-id --+  |  |
3293	      answer: sendrecv, recv-sub-layer-id --+  |  |  |
3294	        sendrecv w/o recv-sub-layer-id --+  |  |  |  |
3295	                                         |  |  |  |  |
3296	      profile-space                      C  D  C  D  P
3297	      profile-id                         C  D  C  D  P
3298	      tier-flag                          C  D  C  D  P
3299	      level-id                           D  D  D  D  P
3300	      interop-constraints                C  D  C  D  P
3301	      profile-compatibility-indicator    C  D  C  D  P
3302	      tx-mode                            C  C  C  C  P
3303	      max-recv-level-id                  R  R  R  R  -
3304	      sprop-max-don-diff                 P  P  -  -  P
3305	      sprop-depack-buf-nalus             P  P  -  -  P
3306	      sprop-depack-buf-bytes             P  P  -  -  P
3307	      depack-buf-cap                     R  R  R  R  -
3308	      sprop-segmentation-id              P  P  P  P  P
3309	      sprop-spatial-segmentation-idc     P  P  P  P  P
3310	      max-br                             R  R  R  R  -
3311	      max-cpb                            R  R  R  R  -
3312	      max-dpb                            R  R  R  R  -
3313	      max-lsr                            R  R  R  R  -
3314	      max-lps                            R  R  R  R  -
3315	      max-tr                             R  R  R  R  -
3316	      max-tc                             R  R  R  R  -
3317	      max-fps                            R  R  R  R  -
3318	      sprop-vps                          P  P  -  -  P
3319	      sprop-sps                          P  P  -  -  P
3320	      sprop-pps                          P  P  -  -  P
3321	      sprop-sub-layer-id                 P  P  -  -  P
3322	      recv-sub-layer-id                  X  O  X  O  -
3323	      dec-parallel-cap                   R  R  R  R  -
3324	      include-dph                        R  R  R  R  -

3326	     Legend:

3328	      C: configuration for sending and receiving bitstreams
3329	      D: changeable configuration, same as C except possible
3330	         to answer with a different but consistent value (see the
3331	         semantics of the six parameters related to profile, tier,
3332	         and level on these parameters being consistent)
3333	      P: properties of the bitstream to be sent
3334	      R: receiver capabilities
3335	      O: operation point selection
3336	      X: MUST NOT be present
3337	      -: not usable, when present SHOULD be ignored
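
   For implementers who want Table 1 in machine-readable form, the
   following Python sketch transcribes the rows above into a lookup
   structure.  The column keys and the function name interpret are
   hypothetical; the letter codes are those of the legend.

```python
# Column order follows Table 1, left to right.
COLUMNS = (
    "sendrecv",
    "sendrecv+recv-sub-layer-id",   # answers only
    "recvonly",
    "recvonly+recv-sub-layer-id",   # answers only
    "sendonly",
)

TABLE_1 = """
profile-space                    C D C D P
profile-id                       C D C D P
tier-flag                        C D C D P
level-id                         D D D D P
interop-constraints              C D C D P
profile-compatibility-indicator  C D C D P
tx-mode                          C C C C P
max-recv-level-id                R R R R -
sprop-max-don-diff               P P - - P
sprop-depack-buf-nalus           P P - - P
sprop-depack-buf-bytes           P P - - P
depack-buf-cap                   R R R R -
sprop-segmentation-id            P P P P P
sprop-spatial-segmentation-idc   P P P P P
max-br                           R R R R -
max-cpb                          R R R R -
max-dpb                          R R R R -
max-lsr                          R R R R -
max-lps                          R R R R -
max-tr                           R R R R -
max-tc                           R R R R -
max-fps                          R R R R -
sprop-vps                        P P - - P
sprop-sps                        P P - - P
sprop-pps                        P P - - P
sprop-sub-layer-id               P P - - P
recv-sub-layer-id                X O X O -
dec-parallel-cap                 R R R R -
include-dph                      R R R R -
"""

# Build {parameter: {column: code}} from the transcribed rows.
INTERPRETATION = {}
for row in TABLE_1.split("\n"):
    if row.strip():
        name, *codes = row.split()
        INTERPRETATION[name] = dict(zip(COLUMNS, codes))


def interpret(parameter, direction, recv_sub_layer_id=False):
    """Return the Table 1 code for a parameter and direction attribute."""
    column = direction
    if recv_sub_layer_id and direction != "sendonly":
        column += "+recv-sub-layer-id"
    return INTERPRETATION[parameter][column]
```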

3339	   Parameters used for declaring receiver capabilities are in
3340	   general downgradable; i.e. they express the upper limit for a
3341	   sender's possible behavior.  Thus, a sender MAY choose to set its
3342	   encoder using only lower/lesser or equal values of these
3343	   parameters.

3345	   When the answer does not include recv-sub-layer-id that is less
3346	   than the sprop-sub-layer-id in the offer, parameters declaring a
3347	   configuration point are not changeable, with the exception of the
3348	   level-id parameter for unicast usage, and these parameters
3349	   express values a receiver expects to be used and MUST be used
3350	   verbatim in the answer as in the offer.

3352	   When a sender's capabilities are declared with the configuration
3353	   parameters, these parameters express a configuration that is
3354	   acceptable for the sender to receive bitstreams.  In order to
3355	   achieve high interoperability levels, it is often advisable to
3356	   offer multiple alternative configurations.  It is impossible to
3357	   offer multiple configurations in a single payload type.  Thus,
3358	   when multiple configuration offers are made, each offer requires
3359	   its own RTP payload type associated with the offer.  However, it
3360	   is possible to offer multiple operation points using one
3361	   configuration in a single payload type by including sprop-vps in
3362	   the offer and recv-sub-layer-id in the answer.

3364	   A receiver SHOULD understand all media type parameters, even if
3365	   it only supports a subset of the payload format's functionality.
3366	   This ensures that a receiver is capable of understanding when an
3367	   offer to receive media can be downgraded to what is supported by
3368	   the receiver of the offer.

3370	   An answerer MAY extend the offer with additional media format
3371	   configurations.  However, to enable their usage, in most cases a
3372	   second offer is required from the offerer to provide the
3373	   bitstream property parameters that the media sender will use.

3375	   This also has the effect that the offerer has to be able to
3376	   receive this media format configuration, not only to send it.

3378	7.2.3 Usage in Declarative Session Descriptions

3380	   When HEVC over RTP is offered with SDP in a declarative style, as
3381	   in Real Time Streaming Protocol (RTSP) [RFC2326] or Session
3382	   Announcement Protocol (SAP) [RFC2974], the following
3383	   considerations are necessary.

3385	   o  All parameters capable of indicating both bitstream properties
3386	      and receiver capabilities are used to indicate only bitstream
3387	      properties.  For example, in this case, the parameter profile-
3388	      tier-level-id declares the values used by the bitstream, not
3389	      the capabilities for receiving bitstreams.  As a result, the
3390	      following interpretation of the parameters MUST be
3391	      used:

3393	      o Declaring actual configuration or bitstream properties:
3394	         - profile-space
3395	         - profile-id
3396	         - tier-flag
3397	         - level-id
3398	         - interop-constraints
3399	         - profile-compatibility-indicator
3400	         - tx-mode
3401	         - sprop-vps
3402	         - sprop-sps
3403	         - sprop-pps
3404	         - sprop-max-don-diff
3405	         - sprop-depack-buf-nalus
3406	         - sprop-depack-buf-bytes
3407	         - sprop-segmentation-id
3408	         - sprop-spatial-segmentation-idc

3410	      o Not usable (when present, they SHOULD be ignored):
3411	         - max-lps
3412	         - max-lsr
3413	         - max-cpb
3414	         - max-dpb
3415	         - max-br
3416	         - max-tr
3417	         - max-tc
3418	         - max-fps
3419	         - max-recv-level-id
3420	         - depack-buf-cap
3421	         - sprop-sub-layer-id
3422	         - dec-parallel-cap
3423	         - include-dph

3425	   o  A receiver of the SDP is required to support all parameters
3426	      and values of the parameters provided; otherwise, the receiver
3427	      MUST reject (RTSP) or not participate in (SAP) the session.
3428	      It falls on the creator of the session to use values that are
3429	      expected to be supported by the receiving application.
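
   A receiver of a declarative session description could apply the
   classification above as in the following Python sketch, which
   drops the parameters listed as not usable.  The function name is
   hypothetical, and the input is assumed to be a dictionary of
   already parsed "a=fmtp" parameters.

```python
# Parameters that are not usable in declarative session descriptions
# and that SHOULD be ignored when present.
NOT_USABLE = frozenset({
    "max-lps", "max-lsr", "max-cpb", "max-dpb", "max-br", "max-tr",
    "max-tc", "max-fps", "max-recv-level-id", "depack-buf-cap",
    "sprop-sub-layer-id", "dec-parallel-cap", "include-dph",
})


def declarative_properties(params):
    """Keep only the parameters that declare bitstream properties."""
    return {k: v for k, v in params.items() if k not in NOT_USABLE}


kept = declarative_properties(
    {"profile-id": "1", "level-id": "93", "max-br": "2000"}
)
```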

3431	7.2.4 Parameter Sets Considerations

3433	   When out-of-band transport of parameter sets is used, parameter
3434	   sets MAY still be additionally transported in-band unless
3435	   explicitly disallowed by an application, and some of these
3436	   additionally in-band transported parameter sets may update some
3437	   of the out-of-band transported parameter sets.  Update of a
3438	   parameter set refers to sending of a parameter set of the same
3439	   type using the same parameter set ID but with different values
3440	   for at least one other parameter of the parameter set.

3451	7.2.5 Dependency Signaling in Multi-Stream Mode

3453	   If MSM is used, the rules on signaling media decoding dependency
3454	   in SDP as defined in [RFC5583] apply.  The rules on "hierarchical
3455	   or layered encoding" with multicast in Section 5.7 of [RFC4566]
3456	   do not apply, i.e. the notation for Connection Data "c=" SHALL
3457	   NOT be used with more than one address.  The order of session
3458	   dependency is given from the RTP stream containing the lowest
3459	   temporal sub-layer to the RTP stream containing the highest
3460	   temporal sub-layer.

3462	8 Use with Feedback Messages

3464	   As specified in section 6.1 of RFC 4585 [RFC4585], Payload-
3465	   Specific Feedback messages are identified by the RTCP packet type
3466	   value PSFB (206).  AVPF [RFC4585] defines three payload-specific
3467	   feedback messages and one application layer feedback message, and
3468	   CCM [RFC5104] specifies four payload-specific feedback messages.

3470	   These feedback messages are identified by means of the feedback
3471	   message type (FMT) parameter as follows:

3473	   Assigned in [RFC4585]:

3475	      1:     Picture Loss Indication (PLI)
3476	      2:     Slice Loss Indication (SLI)
3477	      3:     Reference Picture Selection Indication (RPSI)
3478	      15:    Application layer FB message
3479	      31:    reserved for future expansion of the number space

3481	   Assigned in [RFC5104]:

3483	      4:     Full Intra Request (FIR) Command
3484	      5:     Temporal-Spatial Trade-off Request (TSTR)
3485	      6:     Temporal-Spatial Trade-off Notification (TSTN)
3486	      7:     Video Back Channel Message (VBCM)

3488	   Unassigned:

3490	      0:      unassigned
3491	      8-14:   unassigned
3492	      16-30:  unassigned

3494	   The following subsections define the use of the PLI, SLI, RPSI,
3495	   and FIR feedback messages with HEVC.
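
   The FMT assignments listed above can be captured in a small
   lookup, sketched here in Python.  The names PSFB_FMT,
   HEVC_DEFINED, and describe_fmt are hypothetical; the 0-31 range
   check assumes the 5-bit FMT field of RFC 4585.

```python
# FMT values for payload-specific feedback (RTCP packet type PSFB, 206).
PSFB_FMT = {
    1: "Picture Loss Indication (PLI)",
    2: "Slice Loss Indication (SLI)",
    3: "Reference Picture Selection Indication (RPSI)",
    4: "Full Intra Request (FIR) Command",
    5: "Temporal-Spatial Trade-off Request (TSTR)",
    6: "Temporal-Spatial Trade-off Notification (TSTN)",
    7: "Video Back Channel Message (VBCM)",
    15: "Application layer FB message",
}

# Messages whose use with HEVC is defined in the following subsections.
HEVC_DEFINED = {1, 2, 3, 4}


def describe_fmt(fmt):
    """Return a human-readable name for a PSFB FMT value."""
    if not 0 <= fmt <= 31:
        raise ValueError("FMT is a 5-bit field")
    if fmt == 31:
        return "reserved for future expansion of the number space"
    return PSFB_FMT.get(fmt, "unassigned")
```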

3497	8.1 Picture Loss Indication (PLI)

3499	   As specified in RFC 4585 section 6.3.1, the reception of a
3500	   picture loss indication by a media sender indicates "the loss of
3501	   an undefined amount of coded video data belonging to one or more
3502	   pictures."  Without having any specific knowledge of the setup of
3503	   the bitstream (such as: use and location of in-band parameter
3504	   sets, non-IDR decoder refresh points, picture structures, and so
3505	   forth), a reaction to the reception of a PLI by an HEVC sender
3506	   SHOULD be to send an IDR picture and relevant parameter sets,
3507	   potentially with sufficient redundancy so as to ensure correct
3508	   reception.  However, sometimes information about the bitstream
3509	   structure is known.  For example, state could have been
3510	   established outside of the mechanisms defined in this document
3511	   that parameter sets are conveyed out of band only, and stay
3512	   static for the duration of the session.  In that case, it is
3513	   obviously unnecessary to send them in-band as a result of the
3514	   reception of a PLI.  Other examples could be devised based on a
3515	   priori knowledge of different aspects of the bitstream structure.
3516	   In all cases, the timing and congestion control mechanisms of RFC
3517	   4585 MUST be observed.

3519	8.2 Slice Loss Indication (SLI)

3521	   RFC 4585's Slice Loss Indication can be used to indicate, to a
3522	   sender, the loss of a number of Coding Tree Blocks (CTBs) in CTB
3523	   raster scan order of a picture.  In the SLI's Feedback Control
3524	   Indication (FCI) field, the subfield "First" MUST be set to the
3525	   CTB address of the first lost CTB.  Note that the CTB address is
3526	   in CTB raster scan order of a picture.  For the first CTB of a
3527	   slice segment, the CTB address is the value of
3528	   slice_segment_address when present; or 0 when the value of
3529	   first_slice_segment_in_pic_flag is equal to 1; both syntax
3530	   elements are in the slice segment header.  The subfield "Number"
3531	   MUST be set to the number of consecutive lost CTBs, again in CTB
3532	   raster scan order of a picture.  Note that because both
3533	   "First" and "Number" are counted in CTBs in CTB raster scan
3534	   order of a picture, not in tile scan order (which is the
3535	   bitstream order of CTBs), multiple SLI messages may be needed to
3536	   report the loss of one tile covering multiple CTB rows but
3537	   narrower than the picture.

3539	   The subfield "PictureID" MUST be set to the 6 least significant
3540	   bits of a binary representation of the value of PicOrderCntVal,
3541	   as defined in [HEVC], of the picture for which the lost CTBs are
3542	   indicated.  Note that for IDR pictures the syntax element
3543	   slice_pic_order_cnt_lsb is not present, but then the value is
3544	   inferred to be equal to 0.
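
   The following Python sketch illustrates how the "First",
   "Number", and "PictureID" subfields could be filled in, and why a
   lost tile narrower than the picture needs one SLI entry per CTB
   row.  It assumes the SLI FCI layout of RFC 4585 (13 bits for
   "First", 13 bits for "Number", 6 bits for "PictureID"); the
   function names and the tile geometry arguments are hypothetical.

```python
import struct


def sli_fci(first, number, pic_order_cnt_val):
    """Pack one 32-bit SLI FCI entry in network byte order."""
    if not 0 <= first < (1 << 13):
        raise ValueError("First does not fit in 13 bits")
    if not 1 <= number < (1 << 13):
        raise ValueError("Number does not fit in 13 bits")
    # PictureID: 6 least significant bits of PicOrderCntVal.
    picture_id = pic_order_cnt_val & 0x3F
    return struct.pack("!I", (first << 19) | (number << 6) | picture_id)


def tile_loss_runs(pic_width_ctbs, col0, row0, tile_w, tile_h):
    """Return (First, Number) runs, in CTB raster scan order, for a lost tile.

    col0/row0 locate the tile's top-left CTB; tile_w/tile_h are in CTBs.
    """
    if tile_w == pic_width_ctbs:
        # A full-width tile is one contiguous run in raster scan order.
        return [(row0 * pic_width_ctbs, tile_w * tile_h)]
    # Otherwise each CTB row of the tile is a separate run.
    return [((row0 + r) * pic_width_ctbs + col0, tile_w)
            for r in range(tile_h)]


# A 5x2 CTB tile in the right half of a picture that is 10 CTBs wide.
runs = tile_loss_runs(10, 5, 0, 5, 2)
messages = [sli_fci(first, number, 65) for first, number in runs]
```

   In this example the tile spans two CTB rows, so two FCI entries
   result, each covering five consecutive CTBs.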

3546	   As described in RFC 4585, an encoder in a media sender can use
3547	   this information to "clean up" the corrupted picture by sending
3548	   intra information, while observing the constraints described in
3549	   RFC 4585, for example with respect to congestion control.  In many
3550	   cases, error tracking is required to identify the corrupted
3551	   region in the receiver's state (reference pictures) because errors
3552	   propagate into uncorrupted regions of the picture through motion
3553	   compensation.  Reference picture selection can also be used to
3554	   "clean up" the corrupted picture, which is usually more efficient
3555	   and less likely to generate congestion than sending intra
3556	   information.

3558	   In contrast to the video codecs contemplated in RFC 4585 and RFC
3559	   5104 [RFC5104], in HEVC, the "macroblock size" is not fixed to
3560	   16x16 luma samples, but variable.  That, however, does not create
3561	   a conceptual difficulty with SLI, because the setting of the CTB
3562	   size is a sequence-level functionality, and using a slice loss
3563	   indication across coded video sequence boundaries is meaningless
3564	   as there is no prediction across sequence boundaries.  However, a
3565	   proper use of SLI messages is not as straightforward as it was
3566	   with older, fixed-macroblock-sized video codecs, as the state of
3567	   the sequence parameter set (where the CTB size is located) has to
3568	   be taken into account when interpreting the "First" subfield in
3569	   the FCI.

3571	8.3 Use of HEVC with the RPSI Feedback Message

3573	   Feedback-based reference picture selection has been shown to be a
3574	   powerful tool to stop temporal error propagation for improved
3575	   error resilience [Girod99][Wang05].  In one approach, the decoder
3576	   side tracks errors in the decoded pictures and informs the
3577	   encoder side that a particular picture that has been decoded
3578	   relatively earlier is correct and still present in the decoded
3579	   picture buffer, and requests the encoder to use that correct
3580	   picture for reference when encoding the next picture, so as to
3581	   stop further temporal error propagation.  For this approach, the
3582	   decoder side should use the RPSI feedback message.

3584	   Encoders can encode some long-term reference pictures as
3585	   specified in H.264 or HEVC for purposes described in the previous
3586	   paragraph without the need of a huge decoded picture buffer.  As
3587	   shown in [Wang05], with a flexible reference picture management
3588	   scheme as in H.264 and HEVC, even a decoded picture buffer size
3589	   of two would work for the approach described in the previous
3590	   paragraph.

3592	   The field "Native RPSI bit string defined per codec" is a base16
3593	   [RFC4648] representation of the 8 bits consisting of 2 most
3594	   significant bits equal to 0 and 6 bits of nuh_layer_id, as
3595	   defined in [HEVC], followed by the 32 bits representing the value
3596	   of the PicOrderCntVal (in network byte order), as defined in
3597	   [HEVC], for the picture that is requested to be used for
3598	   reference when encoding the next picture.
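
   As a hedged illustration, the field described above could be
   built as in the following Python sketch.  It assumes that
   PicOrderCntVal is carried as a signed 32-bit two's complement
   value; the function name is hypothetical.

```python
import base64
import struct


def rpsi_native_bits(nuh_layer_id, pic_order_cnt_val):
    """Return the base16 string for the native RPSI bit string."""
    if not 0 <= nuh_layer_id < 64:
        raise ValueError("nuh_layer_id is a 6-bit value")
    # One byte (2 zero bits + 6 bits of nuh_layer_id), then the 32-bit
    # PicOrderCntVal in network byte order.
    raw = struct.pack("!Bi", nuh_layer_id, pic_order_cnt_val)
    return base64.b16encode(raw).decode("ascii")


request = rpsi_native_bits(0, 5)
```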

3600	   The use of the RPSI feedback message as positive acknowledgement
3601	   with HEVC is deprecated.  In other words, the RPSI feedback
3602	   message MUST only be used as a reference picture selection
3603	   request, such that it can also be used in multicast.

3605	8.4 Full Intra Request (FIR)

3607	   The purpose of the FIR message is to force an encoder to send an
3608	   independent decoder refresh point as soon as possible (observing,
3609	   for example, the congestion control related constraints set out
3610	   in RFC 5104).

3612	   Upon reception of a FIR, a sender MUST send an IDR picture.
3613	   Parameter sets MUST also be sent, except when there is a priori
3614	   knowledge that the parameter sets have been correctly
3615	   established.  A typical example for that is an understanding
3616	   between sender and receiver, established by means outside this
3617	   document, that parameter sets are exclusively sent out of band.

3619	9 Security Considerations

3621	   RTP packets using the payload format defined in this
3622	   specification are subject to the security considerations
3623	   discussed in the RTP specification [RFC3550], and in any
3624	   applicable RTP profile such as RTP/AVP [RFC3551], RTP/AVPF
3625	   [RFC4585], RTP/SAVP [RFC3711], or RTP/SAVPF [RFC5124].  However,
3626	   as RFC 7202 [RFC7202] discusses, it is not an RTP payload format's
3627	   responsibility to discuss or mandate what solutions are used to
3628	   meet the basic security goals like confidentiality, integrity,
3629	   and source authenticity for RTP in general.  This responsibility
3630	   lies with anyone using RTP in an application.  They can find
3631	   guidance on available security mechanisms and important
3632	   considerations as discussed in RFC 7201 [RFC7201].

3634	   The rest of this section discusses the security-impacting
3635	   properties of the payload format itself.

3637	   Because the data compression used with this payload format is
3638	   applied end-to-end, any encryption needs to be performed after
3639	   compression.  A potential denial-of-service threat exists for
3640	   data encodings using compression techniques that have non-uniform
3641	   receiver-end computational load.  The attacker can inject
3642	   pathological datagrams into the bitstream that are complex to
3643	   decode and that cause the receiver to be overloaded.  H.265 is
3644	   particularly vulnerable to such attacks, as it is extremely
3645	   simple to generate datagrams containing NAL units that affect the
3646	   decoding process of many future NAL units.  Therefore, the usage
3647	   of data origin authentication and data integrity protection of at
3648	   least the RTP packet is RECOMMENDED, for example, with SRTP
3649	   [RFC3711].

3651	   Note that the appropriate mechanism to ensure confidentiality and
3652	   integrity of RTP packets and their payloads is very dependent on
3653	   the application and on the transport and signaling protocols
3654	   employed.  Thus, although SRTP is given as an example above,
3655	   other possible choices exist.

3657	   Decoders MUST exercise caution with respect to the handling of
3658	   user data SEI messages, particularly if they contain active
3659	   elements, and MUST restrict their domain of applicability to the
3660	   presentation containing the bitstream.

3662	   End-to-end security with authentication, integrity, or
3663	   confidentiality protection will prevent a MANE from performing
3664	   media-aware operations other than discarding complete packets.
3665	   In the case of confidentiality protection, it will even be
3666	   prevented from discarding packets in a media-aware way.  To be
3667	   allowed to perform such operations, a MANE is required to be a
3668	   trusted entity that is included in the security context
3669	   establishment.

3671	10 Congestion Control

3673	   Congestion control for RTP SHALL be used in accordance with RTP
3674	   [RFC3550] and with any applicable RTP profile, e.g. AVP
3675	   [RFC3551].  If best-effort service is being used, an additional
3676	   requirement is that users of this payload format MUST monitor
3677	   packet loss to ensure that the packet loss rate is within an
3678	   acceptable range.  Packet loss is considered acceptable if a TCP
3679	   flow across the same network path, and experiencing the same
3680	   network conditions, would achieve an average throughput, measured
3681	   on a reasonable timescale, that is not less than what all RTP
3682	   streams combined are achieving.  This condition can be satisfied by
3683	   implementing congestion control mechanisms to adapt the
3684	   transmission rate, the number of layers subscribed for a layered
3685	   multicast session, or by arranging for a receiver to leave the
3686	   session if the loss rate is unacceptably high.

3688	   The bitrate adaptation necessary for obeying the congestion
3689	   control principle is easily achievable when real-time encoding is
3690	   used, for example by adequately tuning the quantization
3691	   parameter.

3693	   However, when pre-encoded content is being transmitted, bandwidth
3694	   adaptation requires the pre-coded bitstream to be tailored for
3695	   such adaptivity.  The key mechanism available in HEVC is temporal
3696	   scalability.  A media sender can remove NAL units belonging to
3697	   higher temporal sub-layers (i.e. those NAL units with a high
3698	   value of TID) until the sending bitrate drops to an acceptable
3699	   range.  HEVC contains mechanisms that allow the lightweight
3700	   identification of switching points in temporal enhancement
3701	   layers, as discussed in Section 1.1.2 of this memo.  An HEVC
3702	   media sender can send packets belonging to NAL units of temporal
3703	   enhancement layers starting from these switching points to probe
3704	   for available bandwidth and to utilize bandwidth that has been
3705	   shown to be available.
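
	   The sub-layer thinning described above can be sketched as
	   follows.  This is a non-normative illustration: the function
	   names and the per-TID bitrate table are hypothetical, and the
	   sketch assumes that the TID field occupies the three least
	   significant bits of the second byte of the NAL unit header.

```python
from typing import Dict, Iterable, List


def tid_of(nal_unit: bytes) -> int:
    """Return the TID field of a NAL unit (assumed header layout)."""
    if len(nal_unit) < 2:
        raise ValueError("NAL unit shorter than its 2-byte header")
    return nal_unit[1] & 0x07


def max_tid_within_budget(bitrate_by_tid: Dict[int, float],
                          budget_bps: float) -> int:
    """Pick the highest TID whose cumulative bitrate fits the budget.

    The lowest TID present is always kept, even if it alone exceeds
    the budget, because it cannot be removed by temporal thinning.
    """
    tids = sorted(bitrate_by_tid)
    chosen = tids[0]
    total = 0.0
    for tid in tids:
        total += bitrate_by_tid[tid]
        if total > budget_bps and tid != tids[0]:
            break
        chosen = tid
    return chosen


def thin(nal_units: Iterable[bytes], max_tid: int) -> List[bytes]:
    """Drop every NAL unit whose TID is above max_tid."""
    return [n for n in nal_units if tid_of(n) <= max_tid]
```

	   A sender using such a sketch would lower max_tid when the
	   measured loss rate is too high, and would raise it again only at
	   one of the switching points mentioned above.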

3707	   The above mechanisms generally work within a defined profile and
3708	   level and, therefore, no renegotiation of the channel is
3709	   required.  Only when non-downgradable parameters (such as
3710	   profile) are required to be changed does it become necessary to
3711	   terminate and restart the RTP stream(s).  This may be
3712	   accomplished by using different RTP payload types.

3714	   MANEs MAY remove certain unusable packets from the RTP stream
3715	   when that RTP stream was damaged due to previous packet losses.
3716	   This can help reduce the network load in certain special cases.
3717	   For example, MANEs can remove those FUs where the leading FUs
3718	   belonging to the same NAL unit have been lost or those dependent
3719	   slice segments when the leading slice segments belonging to the
3720	   same slice have been lost, because the trailing FUs or dependent
3721	   slice segments are meaningless to most decoders.  MANEs can also
3722	   remove higher temporal scalable layers if the outbound
3723	   transmission (from the MANE's viewpoint) experiences congestion.
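
	   The FU case above can be sketched as follows.  This is a
	   non-normative illustration with hypothetical names.  It assumes
	   that packets of one RTP stream arrive in sequence-number order,
	   that all FUs of a NAL unit use consecutive sequence numbers, and
	   that each FU exposes its start (S) and end (E) bits.  Reordering
	   is treated as loss.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Packet:
    seq: int             # RTP sequence number (16 bits)
    is_fu: bool          # True if the payload is a fragmentation unit
    start: bool = False  # S bit of the FU header
    end: bool = False    # E bit of the FU header


def drop_orphaned_fus(packets: Iterable[Packet]) -> List[Packet]:
    """Forward packets, dropping FUs whose leading FUs were lost."""
    forwarded: List[Packet] = []
    prev_seq: Optional[int] = None
    chain_intact = False  # inside an FU chain with no loss so far
    for p in packets:
        gap = prev_seq is not None and p.seq != (prev_seq + 1) & 0xFFFF
        prev_seq = p.seq
        if not p.is_fu:
            chain_intact = False
            forwarded.append(p)
            continue
        if p.start:
            chain_intact = True    # a new NAL unit begins here
        elif gap:
            chain_intact = False   # an earlier FU of this unit is missing
        if chain_intact:
            forwarded.append(p)
        if p.end:
            chain_intact = False
    return forwarded
```

	   With this sketch, FUs already forwarded before a loss are not
	   recalled; only the trailing FUs of the damaged NAL unit are
	   removed.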

3725	11 IANA Considerations

3727	   A new media type, as specified in Section 7.1 of this memo,
3728	   should be registered with IANA.

3730	12 Acknowledgements

3732	   Muhammed Coban and Marta Karczewicz are thanked for discussions
3733	   on the specification of the use with feedback messages and other
3734	   aspects in this memo.  Jonathan Lennox and Jill Boyce are thanked
3735	   for their contributions to the PACI design included in this memo.
3736	   Rickard Sjoberg, Arild Fuldseth, Bo Burman, Magnus Westerlund,
3737	   and Tom Kristensen are thanked for their contributions to
3738	   parallel processing related signalling.  Magnus Westerlund,
3739	   Jonathan Lennox, Bernard Aboba, Jonatan Samuelsson, Roni Even,
3740	   Rickard Sjoberg, Sachin Deshpande, Woo Johnman, Mo Zanaty, Ross
3741	   Finlayson, and Danny Hong made valuable reviewing comments that
3742	   led to improvements.

3744	   This document was prepared using 2-Word-v2.0.template.dot.

3746	13 References

3748	13.1 Normative References

3750	   [HEVC]    ITU-T Recommendation H.265, "High efficiency video
3751	             coding", April 2013.

3753	   [H.264]   ITU-T Recommendation H.264, "Advanced video coding for
3754	             generic audiovisual services", April 2013.

3756	   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
3757	             Requirement Levels", BCP 14, RFC 2119, March 1997.

3759	   [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
3760	             Model with Session Description Protocol (SDP)", RFC
3761	             3264, June 2002.

3763	   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and
3764	             Jacobson, V., "RTP: A Transport Protocol for Real-Time
3765	             Applications", STD 64, RFC 3550, July 2003.

3767	   [RFC3551] Schulzrinne, H. and Casner, S., "RTP Profile for Audio
3768	             and Video Conferences with Minimal Control", STD 65,
3769	             RFC 3551, July 2003.

3771	   [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and
3772	             Norrman, K., "The Secure Real-time Transport Protocol
3773	             (SRTP)", RFC 3711, March 2004.

3775	   [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP:
3776	             Session Description Protocol", RFC 4566, July 2006.

3778	   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and Rey,
3779	             J., "Extended RTP Profile for Real-time Transport
3780	             Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC
3781	             4585, July 2006.

3783	   [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data
3784	             Encodings", RFC 4648, October 2006.

3786	   [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and Burman,
3787	             B., "Codec Control Messages in the RTP Audio-Visual
3788	             Profile with Feedback (AVPF)", RFC 5104, February 2008.

3790	   [RFC5124] Ott, J. and Carrara, E., "Extended Secure RTP Profile
3791	             for Real-time Transport Control Protocol (RTCP)-Based
3792	             Feedback (RTP/SAVPF)", RFC 5124, February 2008.

3794	   [RFC5234] Crocker, D. and Overell, P., "Augmented BNF for Syntax
3795	             Specifications: ABNF", RFC 5234, January 2008.

3797	   [RFC5576] Lennox, J., Ott, J., and Schierl, T., "Source-Specific
3798	             Media Attributes in the Session Description Protocol",
3799	             RFC 5576, June 2009.

3801	   [RFC5583] Schierl, T. and Wenger, S., "Signaling Media Decoding
3802	             Dependency in the Session Description Protocol (SDP)",
3803	             RFC 5583, July 2009.

3805	   [RFC6184] Wang, Y.-K., Even, R., Kristensen, T., and R. Jesup,
3806	             "RTP Payload Format for H.264 Video", RFC 6184, May
3807	             2011.

3809	   [RFC6190] Wenger, S., Wang, Y.-K., Schierl, T., and A.
3810	             Eleftheriadis, "RTP Payload Format for Scalable Video
3811	             Coding", RFC 6190, May 2011.

3813	13.2 Informative References

3815	   [3GPDASH] 3GPP TS 26.247, "Transparent end-to-end Packet-switched
3816	             Streaming Service (PSS); Progressive Download and
3817	             Dynamic Adaptive Streaming over HTTP (3GP-DASH)",
3818	             v12.1.0, December 2013.

3820	   [3GPPFF]  3GPP TS 26.244, "Transparent end-to-end packet switched
3821	             streaming service (PSS); 3GPP file format (3GP)",
3822	             v12.20, December 2013.

3824	   [Girod99] Girod, B. and Faerber, F., "Feedback-based error
3825	             control for mobile video transmission", Proceedings
3826	             IEEE, Vol. 87, No. 10, pp. 1707-1723, October 1999.

3828	   [HEVC draft v2]
3829	             Draft version 2 of HEVC, "High Efficiency Video Coding
3830	             (HEVC) Range Extensions text specification: Draft 7",
3831	             JCT-VC document JCTVC-Q1005, 17th JCT-VC meeting, 27
3832	             March - 4 April 2014, Valencia, Spain.

3834	   [I-D.ietf-avtcore-rtp-multi-stream]
3835	             Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
3836	             "Sending Multiple Media Streams in a Single RTP
3837	             Session", draft-ietf-avtcore-rtp-multi-stream-05 (work
3838	             in progress), July 2014.

3840	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
3841	             Holmberg, C., Alvestrand, H., and C. Jennings,
3842	             "Multiplexing Negotiation Using Session Description
3843	             Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
3844	             bundle-negotiation-07 (work in progress), April 2014.

3846	   [I-D.ietf-avtext-rtp-grouping-taxonomy]
3847	             Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G.,
3848	             and Burman, B., "A Taxonomy of Grouping Semantics and
3849	             Mechanisms for Real-Time Transport", draft-ietf-avtext-
3850	             rtp-grouping-taxonomy-02 (work in progress), June 2014.

3852	   [ISOBMFF] ISO/IEC 14496-12 | 15444-12: "Information technology -
3853	             Coding of audio-visual objects - Part 12: ISO base
3854	             media file format" | "Information technology - JPEG
3855	             2000 image coding system - Part 12: ISO base media file
3856	             format", 2012.

3858	   [JCTVC-J0107]
3859	             Wang, Y.-K., Chen, Y., Joshi, R., and Ramasubramonian,
3860	             K., "AHG9: On RAP pictures", JCT-VC document JCTVC-
3861	             J0107, 10th JCT-VC meeting, July 2012, Stockholm,
3862	             Sweden.

3864	   [MPEG2S]  ISO/IEC 13818-1, "Information technology - Generic
3865	             coding of moving pictures and associated audio
3866	             information: Systems", 2013.

3868	   [MPEGDASH] ISO/IEC 23009-1, "Information technology - Dynamic
3869	             adaptive streaming over HTTP (DASH) - Part 1: Media
3870	             presentation description and segment formats", 2012.

3872	   [RFC2326] Schulzrinne, H., Rao, A., and Lanphier R., "Real Time
3873	             Streaming Protocol (RTSP)", RFC 2326, April 1998.

3875	   [RFC2974] Handley, M., Perkins C., and Whelan E., "Session
3876	             Announcement Protocol", RFC 2974, October 2000.

3878	   [RFC5117] Westerlund, M. and Wenger, S., "RTP Topologies", RFC
3879	             5117, January 2008.

3881	   [RFC7201] Westerlund, M. and Perkins, C., "Options for Securing
3882	             RTP Sessions", RFC 7201, April 2014.

3884	   [RFC7202] Perkins, C. and Westerlund, M., "Securing the RTP
3885	             Framework: Why RTP Does Not Mandate a Single Media
3886	             Security Solution", RFC 7202, April 2014.

3888	   [Wang05]  Wang, Y.-K., Zhu, C., and Li, H., "Error resilient
3889	             video coding using flexible reference frames", Visual
3890	             Communications and Image Processing 2005 (VCIP 2005),
3891	             July 2005, Beijing, China.

3893	14 Authors' Addresses

3895	   Ye-Kui Wang
3896	   Qualcomm Incorporated
3897	   5775 Morehouse Drive
3898	   San Diego, CA 92121, USA
3899	   Phone: +1-858-651-8345
3900	   EMail: yekuiw@qti.qualcomm.com

3902	   Yago Sanchez
3903	   Fraunhofer HHI
3904	   Einsteinufer 37
3905	   D-10587 Berlin, Germany
3906	   Phone: +49-30-31002-227
3907	   EMail: yago.sanchez@hhi.fraunhofer.de

3909	   Thomas Schierl
3910	   Fraunhofer HHI
3911	   Einsteinufer 37
3912	   D-10587 Berlin, Germany
3913	   Phone: +49-30-31002-227
3914	   EMail: ts@thomas-schierl.de

3916	   Stephan Wenger
3917	   Vidyo, Inc.
3918	   433 Hackensack Ave., 7th floor
3919	   Hackensack, N.J. 07601, USA
3920	   Phone: +1-415-713-5473
3921	   EMail: stewe@stewe.org

3923	   Miska M. Hannuksela
3924	   Nokia Corporation
3925	   P.O. Box 1000
3926	   33721 Tampere, Finland
3927	   Phone: +358-7180-08000
3928	   EMail: miska.hannuksela@nokia.com