idnits 2.17.1 

draft-ietf-avt-rtp-rfc3984bis-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 25 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  -- The abstract seems to indicate that this document obsoletes RFC3984, but
     the header doesn't have an 'Obsoletes:' line to match this.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not
     defined in RFC 2119.  If it is intended as a requirements expression, it
     should be rewritten using one of the combinations defined in RFC 2119;
     otherwise it should not be all-uppercase.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The answerer MAY or MAY not include "sprop-parameter-sets", i.e.,
     the answerer MAY use either out-of-band or in-band transport of parameter
     sets for the stream it is sending, regardless of whether out-of-band
     parameter sets transport has been used in the offerer-to-answerer
     direction.  When the offer includes "in-band-parameter-sets" equal to 1,
     the answerer MUST not include "sprop-parameter-sets" and MUST transmit
     parameter sets in-band.  All parameter sets included in the
     "sprop-parameter-sets", when present, for the accepted payload type in an
     answer MUST be associated with the accepted level, as indicated by the
     profile-level-id in the answer for the accepted payload type.

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 22, 2009) is 5483 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '4' is defined on line 4120, but no explicit reference
     was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  ** Obsolete normative reference: RFC 4566 (ref. '6') (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 3548 (ref. '7') (Obsoleted by RFC 4648)

  -- Obsolete informational reference (is this intentional?): RFC 2429 (ref.
     '11') (Obsoleted by RFC 4629)

  -- Obsolete informational reference (is this intentional?): RFC 2326 (ref.
     '27') (Obsoleted by RFC 7826)

  -- Obsolete informational reference (is this intentional?): RFC 5117 (ref.
     '29') (Obsoleted by RFC 7667)


     Summary: 4 errors (**), 0 flaws (~~), 5 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Audio/Video Transport WG                                     Y.-K. Wang
2	Internet Draft                                      Huawei Technologies
3	Intended status: Standards track                                R. Even
4	Expires: October 2009                                     Self-employed
5	                                                          T. Kristensen
6	                                                               Tandberg
7	                                                         April 22, 2009

9	                    RTP Payload Format for H.264 Video
10	                   draft-ietf-avt-rtp-rfc3984bis-05.txt

12	Status of this Memo

14	   This Internet-Draft is submitted to IETF in full conformance with the
15	   provisions of BCP 78 and BCP 79.  This document may contain material
16	   from IETF Documents or IETF Contributions published or made publicly
17	   available before November 10, 2008.  The person(s) controlling the
18	   copyright in some of this material may not have granted the IETF
19	   Trust the right to allow modifications of such material outside the
20	   IETF Standards Process.  Without obtaining an adequate license from
21	   the person(s) controlling the copyright in such materials, this
22	   document may not be modified outside the IETF Standards Process, and
23	   derivative works of it may not be created outside the IETF Standards
24	   Process, except to format it for publication as an RFC or to
25	   translate it into languages other than English.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF), its areas, and its working groups.  Note that
29	   other groups may also distribute working documents as Internet-Drafts.

31	   Internet-Drafts are draft documents valid for a maximum of six months
32	   and may be updated, replaced, or obsoleted by other documents at any
33	   time.  It is inappropriate to use Internet-Drafts as reference
34	   material or to cite them other than as "work in progress."

36	   The list of current Internet-Drafts can be accessed at
37	   http://www.ietf.org/ietf/1id-abstracts.txt.

39	   The list of Internet-Draft Shadow Directories can be accessed at
40	   http://www.ietf.org/shadow.html.

42	   This Internet-Draft will expire on October 22, 2009.

44	Copyright Notice

46	   Copyright (c) 2009 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents in effect on the date of
51	   publication of this document (http://trustee.ietf.org/license-info).
52	   Please review these documents carefully, as they describe your rights
53	   and restrictions with respect to this document.

55	Abstract

57	   This memo describes an RTP Payload format for the ITU-T
58	   Recommendation H.264 video codec and the technically identical
59	   ISO/IEC International Standard 14496-10 video codec, excluding the
60	   Scalable Video Coding (SVC) extension and the Multiview Video Coding
61	   extension, for which the RTP payload formats are defined elsewhere.
62	   The RTP payload format allows for packetization of one or more
63	   Network Abstraction Layer Units (NALUs), produced by an H.264 video
64	   encoder, in each RTP payload.  The payload format has wide
65	   applicability, as it supports applications from simple low bit-rate
66	   conversational usage, to Internet video streaming with interleaved
67	   transmission, to high bit-rate video-on-demand.

69	   This memo obsoletes RFC 3984.  Changes from RFC 3984 are summarized
70	   in section 18.  Issues on backward compatibility to RFC 3984 are
71	   discussed in section 17.

73	Table of Contents

75	   1. Introduction...................................................4
76	      1.1. The H.264 Codec...........................................4
77	      1.2. Parameter Set Concept.....................................5
78	      1.3. Network Abstraction Layer Unit Types......................6
79	   2. Conventions....................................................7
80	   3. Scope..........................................................7
81	   4. Definitions and Abbreviations..................................8
82	      4.1. Definitions...............................................8
83	      4.2. Abbreviations............................................10
84	   5. RTP Payload Format............................................10
85	      5.1. RTP Header Usage.........................................10
86	      5.2. Payload Structures.......................................13
87	      5.3. NAL Unit Header Usage....................................14
88	      5.4. Packetization Modes......................................17
89	      5.5. Decoding Order Number (DON)..............................18
90	      5.6. Single NAL Unit Packet...................................20
91	      5.7. Aggregation Packets......................................21
92	         5.7.1. Single-Time Aggregation Packet......................23
93	         5.7.2. Multi-Time Aggregation Packets (MTAPs)..............25
94	         5.7.3. Fragmentation Units (FUs)...........................29
95	   6. Packetization Rules...........................................33
96	      6.1. Common Packetization Rules...............................33
97	      6.2. Single NAL Unit Mode.....................................34
98	      6.3. Non-Interleaved Mode.....................................34
99	      6.4. Interleaved Mode.........................................34
100	   7. De-Packetization Process......................................35
101	      7.1. Single NAL Unit and Non-Interleaved Mode.................35
102	      7.2. Interleaved Mode.........................................35
103	         7.2.1. Size of the De-interleaving Buffer..................36
104	         7.2.2. De-interleaving Process.............................36
105	      7.3. Additional De-Packetization Guidelines...................38
106	   8. Payload Format Parameters.....................................39
107	      8.1. Media Type Registration..................................39
108	      8.2. SDP Parameters...........................................56
109	         8.2.1. Mapping of Payload Type Parameters to SDP...........56
110	         8.2.2. Usage with the SDP Offer/Answer Model...............57
111	         8.2.3. Usage in Declarative Session Descriptions...........64
112	      8.3. Examples.................................................65
113	      8.4. Parameter Set Considerations.............................72
114	      8.5. Decoder Refresh Point Procedure using In-Band Transport of
115	      Parameter Sets (Informative)..................................74
116	         8.5.1. IDR Procedure to Respond to a Request for a Decoder
117	         Refresh Point..............................................75
118	         8.5.2. Gradual Recovery Procedure to Respond to a Request for a
119	         Decoder Refresh Point......................................75
120	   9. Security Considerations.......................................76
121	   10. Congestion Control...........................................77
122	   11. IANA Consideration...........................................77
123	   12. Informative Appendix: Application Examples...................78
124	      12.1. Video Telephony according to ITU-T Recommendation H.241
125	      Annex A.......................................................78
126	      12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
127	      Aggregation...................................................78
128	      12.3. Video Telephony, Interleaved Packetization Using NAL Unit
129	      Aggregation...................................................79
130	      12.4. Video Telephony with Data Partitioning..................79
131	      12.5. Video Telephony or Streaming with FUs and Forward Error
132	      Correction....................................................80
133	      12.6. Low Bit-Rate Streaming..................................82
134	      12.7. Robust Packet Scheduling in Video Streaming.............83

136	   13. Informative Appendix: Rationale for Decoding Order Number....84
137	      13.1. Introduction............................................84
138	      13.2. Example of Multi-Picture Slice Interleaving.............84
139	      13.3. Example of Robust Packet Scheduling.....................86
140	      13.4. Robust Transmission Scheduling of Redundant Coded Slices.89
141	      13.5. Remarks on Other Design Possibilities...................90
142	   14. Acknowledgements.............................................91
143	   15. References...................................................91
144	      15.1. Normative References....................................91
145	      15.2. Informative References..................................92
146	   16. Authors' Addresses...........................................94
147	   17. Backward Compatibility to RFC 3984...........................94
148	   18. Changes from RFC 3984........................................96

150	1. Introduction

152	   This memo specifies an RTP payload format for the video coding
153	   standard known as ITU-T Recommendation H.264 [1] and ISO/IEC
154	   International Standard 14496 Part 10 [2] (both also known as Advanced
155	   Video Coding, or AVC).  In this memo the name H.264 is used for the
156	   codec and the standard, but the memo is equally applicable to the
157	   ISO/IEC counterpart of the coding standard.

159	   This memo obsoletes RFC 3984.  Changes from RFC 3984 are summarized
160	   in section 18.  Issues on backward compatibility to RFC 3984 are
161	   discussed in section 17.

163	1.1. The H.264 Codec

165	   The H.264 video codec has a very broad application range that covers
166	   all forms of digital compressed video, from low bit-rate Internet
167	   streaming applications to HDTV broadcast and Digital Cinema
168	   applications with nearly lossless coding.  Compared to the current
169	   state of technology, the overall performance of H.264 is such that
170	   bit rate savings of 50% or more are reported.  Digital Satellite TV
171	   quality, for example, was reported to be achievable at 1.5 Mbit/s,
172	   compared to the current operation point of MPEG 2 video at around 3.5
173	   Mbit/s [10].

175	   The codec specification [1] itself distinguishes conceptually between
176	   a video coding layer (VCL) and a network abstraction layer (NAL).
177	   The VCL contains the signal processing functionality of the codec;
178	   mechanisms such as transform, quantization, and motion compensated
179	   prediction; and a loop filter.  It follows the general concept of
180	   most of today's video codecs, a macroblock-based coder that uses
181	   inter picture prediction with motion compensation and transform
182	   coding of the residual signal.  The VCL encoder outputs slices: a bit
183	   string that contains the macroblock data of an integer number of
184	   macroblocks, and the information of the slice header (containing the
185	   spatial address of the first macroblock in the slice, the initial
186	   quantization parameter, and similar information).  Macroblocks in
187	   slices are arranged in scan order unless a different macroblock
188	   allocation is specified, by using the so-called Flexible Macroblock
189	   Ordering syntax.  In-picture prediction is used only within a slice.
190	   More information is provided in [10].

192	   The Network Abstraction Layer (NAL) encoder encapsulates the slice
193	   output of the VCL encoder into Network Abstraction Layer Units (NAL
194	   units), which are suitable for transmission over packet networks or
195	   use in packet oriented multiplex environments.  Annex B of H.264
196	   defines an encapsulation process to transmit such NAL units over
197	   byte-stream oriented networks.  In the scope of this memo, Annex B is
198	   not relevant.

200	   Internally, the NAL uses NAL units.  A NAL unit consists of a one-
201	   byte header and the payload byte string.  The header indicates the
202	   type of the NAL unit, the (potential) presence of bit errors or
203	   syntax violations in the NAL unit payload, and information regarding
204	   the relative importance of the NAL unit for the decoding process.
205	   This RTP payload specification is designed to be unaware of the bit
206	   string in the NAL unit payload.

208	   One of the main properties of H.264 is the complete decoupling of the
209	   transmission time, the decoding time, and the sampling or
210	   presentation time of slices and pictures.  The decoding process
211	   specified in H.264 is unaware of time, and the H.264 syntax does not
212	   carry information such as the number of skipped frames (as is common
213	   in the form of the Temporal Reference in earlier video compression
214	   standards).  Also, there are NAL units that affect many pictures and
215	   that are, therefore, inherently timeless.  For this reason, the
216	   handling of the RTP timestamp requires some special considerations
217	   for NAL units for which the sampling or presentation time is not
218	   defined or, at transmission time, unknown.

220	1.2. Parameter Set Concept

222	   One very fundamental design concept of H.264 is to generate self-
223	   contained packets, to make mechanisms such as the header duplication
224	   of RFC 2429 [11] or MPEG-4 Visual's Header Extension Code (HEC) [12]
225	   unnecessary.  This was achieved by decoupling information relevant to
226	   more than one slice from the media stream.  This higher layer meta
227	   information should be sent reliably, asynchronously, and in advance
228	   from the RTP packet stream that contains the slice packets.
229	   (Provisions for sending this information in-band are also available
230	   for applications that do not have an out-of-band transport channel
231	   appropriate for the purpose.)  The combination of the higher-level
232	   parameters is called a parameter set.  The H.264 specification
233	   includes two types of parameter sets: sequence parameter set and
234	   picture parameter set.  An active sequence parameter set remains
235	   unchanged throughout a coded video sequence, and an active picture
236	   parameter set remains unchanged within a coded picture.  The sequence
237	   and picture parameter set structures contain information such as
238	   picture size, optional coding modes employed, and macroblock to slice
239	   group map.

241	   To be able to change picture parameters (such as the picture size)
242	   without having to transmit parameter set updates synchronously to the
243	   slice packet stream, the encoder and decoder can maintain a list of
244	   more than one sequence and picture parameter set.  Each slice header
245	   contains a codeword that indicates the sequence and picture parameter
246	   set to be used.

248	   This mechanism allows the decoupling of the transmission of parameter
249	   sets from the packet stream, and the transmission of them by external
250	   means (e.g., as a side effect of the capability exchange), or through
251	   a (reliable or unreliable) control protocol.  It may even be possible
252	   that they are never transmitted but are fixed by an application
253	   design specification.
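
   The indirection described above can be pictured as two lookup
   tables.  The following is a minimal sketch, not part of the payload
   format: the class names and the width/height fields are hypothetical
   simplifications, and it assumes, as in H.264, that a slice header
   names a picture parameter set, which in turn names its sequence
   parameter set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceParameterSet:
    sps_id: int
    width: int   # illustrative only; a real SPS codes size differently
    height: int


@dataclass(frozen=True)
class PictureParameterSet:
    pps_id: int
    sps_id: int  # each PPS names the SPS it depends on


class ParameterSetStore:
    """Holds every parameter set received so far, keyed by identifier."""

    def __init__(self):
        self.sps = {}
        self.pps = {}

    def add_sps(self, sps):
        self.sps[sps.sps_id] = sps

    def add_pps(self, pps):
        self.pps[pps.pps_id] = pps

    def activate(self, pps_id):
        """Resolve the parameter sets referenced by a slice header."""
        pps = self.pps[pps_id]
        return self.sps[pps.sps_id], pps


# Parameter sets may arrive out-of-band, well before any slice data.
store = ParameterSetStore()
store.add_sps(SequenceParameterSet(sps_id=0, width=352, height=288))
store.add_sps(SequenceParameterSet(sps_id=1, width=176, height=144))
store.add_pps(PictureParameterSet(pps_id=0, sps_id=0))
store.add_pps(PictureParameterSet(pps_id=1, sps_id=1))

# A slice that carries pps_id=1 selects the smaller picture size
# without any parameter set travelling alongside it.
active_sps, active_pps = store.activate(1)
```

   Because the slice carries only an identifier, the stores on both
   sides can be filled at any earlier time and over any channel.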

255	1.3. Network Abstraction Layer Unit Types

257	   Tutorial information on the NAL design can be found in [13], [14],
258	   and [15].

260	   All NAL units begin with a single NAL unit type octet, which also
261	   serves as the payload header of this RTP payload format.  The payload
262	   of a NAL unit follows immediately.

264	   The syntax and semantics of the NAL unit type octet are specified in
265	   [1], but the essential properties of the NAL unit type octet are
266	   summarized below.  The NAL unit type octet has the following format:

268	      +---------------+
269	      |0|1|2|3|4|5|6|7|
270	      +-+-+-+-+-+-+-+-+
271	      |F|NRI|  Type   |
272	      +---------------+

274	   The semantics of the components of the NAL unit type octet, as
275	   specified in the H.264 specification, are described briefly below.

277	   F: 1 bit
278	      forbidden_zero_bit.  The H.264 specification declares a value of
279	      1 as a syntax violation.

281	   NRI: 2 bits
282	      nal_ref_idc.  A value of 00 indicates that the content of the NAL
283	      unit is not used to reconstruct reference pictures for inter
284	      picture prediction.  Such NAL units can be discarded without
285	      risking the integrity of the reference pictures.  Values greater
286	      than 00 indicate that the decoding of the NAL unit is required to
287	      maintain the integrity of the reference pictures.

289	   Type: 5 bits
290	      nal_unit_type.  This component specifies the NAL unit payload
291	      type as defined in Table 7-1 of [1], and later within this memo.
292	      For a reference of all currently defined NAL unit types and their
293	      semantics, please refer to section 7.4.1 in [1].

295	   This memo introduces new NAL unit types, which are presented in
296	   section 5.2.  The NAL unit types defined in this memo are marked as
297	   unspecified in [1].  Moreover, this specification extends the
298	   semantics of F and NRI as described in section 5.3.
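
   As an illustration of the octet layout shown above, the following
   sketch splits a NAL unit type octet into F, NRI, and Type and
   rebuilds it.  The function and class names are hypothetical; the
   sketch assumes that bit 0 in the diagram is the most significant
   bit of the octet, as is usual in network bit order.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit)
    nri: int       # nal_ref_idc (2 bits)
    nal_type: int  # nal_unit_type (5 bits)


def parse_nal_header(octet: int) -> NalHeader:
    """Split the NAL unit type octet into its three fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("NAL unit header must be a single octet")
    return NalHeader(
        f=octet >> 7,
        nri=(octet >> 5) & 0x03,
        nal_type=octet & 0x1F,
    )


def build_nal_header(f: int, nri: int, nal_type: int) -> int:
    """Pack the three fields back into one octet."""
    return ((f & 0x01) << 7) | ((nri & 0x03) << 5) | (nal_type & 0x1F)


# 0x65 is 0 11 00101 in binary: F=0, NRI=3, Type=5.
header = parse_nal_header(0x65)
```

   A receiver could, for example, treat units with nri equal to 0 as
   candidates for discarding, in line with the NRI semantics above.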

300	2. Conventions

302	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
303	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
304	   document are to be interpreted as described in RFC 2119 [3].

306	   This specification uses the notion of setting and clearing a bit when
307	   bit fields are handled.  Setting a bit is the same as assigning that
308	   bit the value of 1 (On).  Clearing a bit is the same as assigning
309	   that bit the value of 0 (Off).
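
   In code, setting and clearing a bit amount to a bitwise OR and a
   bitwise AND with a mask.  The helpers below are a hypothetical
   sketch; they count bit positions from the least significant bit,
   which is the opposite of the left-to-right numbering used in the
   diagrams of this memo.

```python
def set_bit(value: int, position: int) -> int:
    """Return value with the bit at position set to 1."""
    return value | (1 << position)


def clear_bit(value: int, position: int) -> int:
    """Return value with the bit at position cleared to 0."""
    return value & ~(1 << position)
```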

311	3. Scope

313	   This payload specification can only be used to carry the "naked"
314	   H.264 NAL unit stream over RTP, and not the bitstream format
315	   discussed in Annex B of H.264.  Likely, the first applications of
316	   this specification will be in the conversational multimedia field,
317	   video telephony or video conferencing, but the payload format also
318	   covers other applications, such as Internet streaming and TV over IP.

320	4. Definitions and Abbreviations

322	4.1. Definitions

324	   This document uses the definitions of [1].  The following terms,
325	   defined in [1], are summed up for convenience:

327	      access unit: A set of NAL units always containing a primary coded
328	      picture.  In addition to the primary coded picture, an access
329	      unit may also contain one or more redundant coded pictures or
330	      other NAL units not containing slices or slice data partitions of
331	      a coded picture.  The decoding of an access unit always results
332	      in a decoded picture.

334	      coded video sequence: A sequence of access units that consists,
335	      in decoding order, of an instantaneous decoding refresh (IDR)
336	      access unit followed by zero or more non-IDR access units
337	      including all subsequent access units up to but not including any
338	      subsequent IDR access unit.

340	      IDR access unit: An access unit in which the primary coded
341	      picture is an IDR picture.

343	      IDR picture: A coded picture containing only slices with I or SI
344	      slice types that causes a "reset" in the decoding process.  After
345	      the decoding of an IDR picture, all following coded pictures in
346	      decoding order can be decoded without inter prediction from any
347	      picture decoded prior to the IDR picture.

349	      primary coded picture: The coded representation of a picture to
350	      be used by the decoding process for a bitstream conforming to
351	      H.264.  The primary coded picture contains all macroblocks of the
352	      picture.

354	      redundant coded picture: A coded representation of a picture or a
355	      part of a picture.  The content of a redundant coded picture
356	      shall not be used by the decoding process for a bitstream
357	      conforming to H.264.  The content of a redundant coded picture
358	      may be used by the decoding process for a bitstream that contains
359	      errors or losses.

361	      VCL NAL unit: A collective term used to refer to coded slice and
362	      coded data partition NAL units.

364	   In addition, the following definitions apply:

366	      decoding order number (DON): A field in the payload structure or
367	      a derived variable indicating NAL unit decoding order.  Values of
368	      DON are in the range of 0 to 65535, inclusive.  After reaching
369	      the maximum value, the value of DON wraps around to 0.

371	      NAL unit decoding order: A NAL unit order that conforms to the
372	      constraints on NAL unit order given in section 7.4.1.2 in [1].

374	      NALU-time: The value that the RTP timestamp would have if the NAL
375	      unit were transported in its own RTP packet.

377	      transmission order: The order of packets in ascending RTP
378	      sequence number order (in modulo arithmetic).  Within an
379	      aggregation packet, the NAL unit transmission order is the same
380	      as the order of appearance of NAL units in the packet.

382	      media aware network element (MANE): A network element, such as a
383	      middlebox or application layer gateway, that is capable of parsing
384	      certain aspects of the RTP payload headers or the RTP payload and
385	      reacting to the contents.

387	         Informative note: The concept of a MANE goes beyond normal
388	         routers or gateways in that a MANE has to be aware of the
389	         signaling (e.g., to learn about the payload type mappings of
390	         the media streams), and in that it has to be trusted when
391	         working with SRTP.  The advantage of using MANEs is that they
392	         allow packets to be dropped according to the needs of the
393	         media coding.  For example, if a MANE has to drop packets due
394	         to congestion on a certain link, it can identify and remove
395	         those packets whose elimination produces the least adverse
396	         effect on the user experience.

398	      static macroblock: A certain number of macroblocks in the video
399	      stream can be defined as static, as defined in section 8.3.2.8 in
400	      [4].  Static macroblocks free up additional processing cycles for
401	      the handling of non-static macroblocks.  Based on a given amount
402	      of video processing resources and a given resolution, a higher
403	      number of static macroblocks enables a correspondingly higher
404	      frame rate.

406	      default sub-profile: The subset of coding tools, which may be all
407	      coding tools of one profile or the common subset of coding tools
408	      of more than one profile, indicated by the profile-level-id
409	      parameter.

411	      default level: The level indicated by the profile-level-id
412	      parameter, which consists of three octets, profile_idc, profile-
413	      iop, and level_idc.  The default level is indicated by level_idc
414	      in most cases, and, in some cases, additionally by profile-iop.
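
   Two of the definitions above lend themselves to a short sketch: the
   wraparound of DON values, and the split of profile-level-id into
   its three octets.  The function names are hypothetical, and the
   sketch assumes that profile-level-id is conveyed as a string of six
   hexadecimal digits, which is not stated in the definitions
   themselves.

```python
DON_MODULUS = 65536  # DON values range from 0 to 65535, inclusive


def next_don(don: int) -> int:
    """Return the DON that follows don, wrapping from 65535 to 0."""
    return (don + 1) % DON_MODULUS


def split_profile_level_id(value: str):
    """Split a six-hex-digit profile-level-id into its three octets.

    Returns (profile_idc, profile_iop, level_idc) as integers.
    """
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must encode three octets")
    profile_idc, profile_iop, level_idc = raw
    return profile_idc, profile_iop, level_idc
```

   For example, split_profile_level_id("42e01f") yields the octets
   0x42, 0xe0, and 0x1f.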

416	4.2. Abbreviations

418	      DON:        Decoding Order Number
419	      DONB:       Decoding Order Number Base
420	      DOND:       Decoding Order Number Difference
421	      FEC:        Forward Error Correction
422	      FU:         Fragmentation Unit
423	      IDR:        Instantaneous Decoding Refresh
424	      IEC:        International Electrotechnical Commission
425	      ISO:        International Organization for Standardization
426	      ITU-T:      International Telecommunication Union,
427	                  Telecommunication Standardization Sector
428	      MANE:       Media Aware Network Element
429	      MTAP:       Multi-Time Aggregation Packet
430	      MTAP16:     MTAP with 16-bit timestamp offset
431	      MTAP24:     MTAP with 24-bit timestamp offset
432	      NAL:        Network Abstraction Layer
433	      NALU:       NAL Unit
434	      SAR:        Sample Aspect Ratio
435	      SEI:        Supplemental Enhancement Information
436	      STAP:       Single-Time Aggregation Packet
437	      STAP-A:     STAP type A
438	      STAP-B:     STAP type B
439	      TS:         Timestamp
440	      VCL:        Video Coding Layer
441	      VUI:        Video Usability Information

443	5. RTP Payload Format

445	5.1. RTP Header Usage

447	   The format of the RTP header is specified in RFC 3550 [5] and
448	   reprinted in Figure 1 for convenience.  This payload format uses the
449	   fields of the header in a manner consistent with that specification.

451	   When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
452	   payload format is specified in section 5.6.  The RTP payload (and the
453	   settings for some RTP header bits) for aggregation packets and
454	   fragmentation units are specified in sections 5.7 and 5.8,
455	   respectively.

457	    0                   1                   2                   3
458	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
459	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
460	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
461	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
462	   |                           timestamp                           |
463	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
464	   |           synchronization source (SSRC) identifier            |
465	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
466	   |            contributing source (CSRC) identifiers             |
467	   |                             ....                              |
468	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

470	                 Figure 1 RTP header according to RFC 3550
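
   The layout in Figure 1 can be read with a few shifts and masks.
   The following sketch, with hypothetical names, parses the fixed
   header and the CSRC list from a received packet.  It does not
   process the header extension that the X bit may announce.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header and any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two octets, 16-bit SN, 32-bit TS, 32-bit SSRC.
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrc = struct.unpack("!%dI" % csrc_count, packet[12:end])
    return RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

   The RTP payload begins at offset 12 + 4 * CC when no header
   extension is present.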

472	   The RTP header information to be set according to this RTP payload
473	   format is set as follows:

475	   Marker bit (M): 1 bit
476	      Set for the very last packet of the access unit indicated by the
477	      RTP timestamp, in line with the normal use of the M bit in video
478	      formats, to allow efficient playout buffer handling.  For
479	      aggregation packets (STAP and MTAP), the marker bit in the RTP
480	      header MUST be set to the value that the marker bit of the last
481	      NAL unit of the aggregation packet would have been if it were
482	      transported in its own RTP packet.  Decoders MAY use this bit as
483	      an early indication of the last packet of an access unit, but
484	      MUST NOT rely on this property.

486	         Informative note: Only one M bit is associated with an
487	         aggregation packet carrying multiple NAL units.  Thus, if a
488	         gateway has re-packetized an aggregation packet into several
489	         packets, it cannot reliably set the M bit of those packets.

491	   Payload type (PT): 7 bits
492	      The assignment of an RTP payload type for this new packet format
493	      is outside the scope of this document and will not be specified
494	      here.  The assignment of a payload type has to be performed
495	      either through the profile used or in a dynamic way.

497	   Sequence number (SN): 16 bits
498	      Set and used in accordance with RFC 3550.  For the single NALU
499	      and non-interleaved packetization mode, the sequence number is
500	      used to determine decoding order for the NALU.

502	   Timestamp: 32 bits
503	      The RTP timestamp is set to the sampling timestamp of the content.
504	      A 90 kHz clock rate MUST be used.

506	      If the NAL unit has no timing properties of its own (e.g.,
507	      parameter set and SEI NAL units), the RTP timestamp is set to the
508	      RTP timestamp of the primary coded picture of the access unit in
509	      which the NAL unit is included, according to section 7.4.1.2 of
510	      [1].

512	      The setting of the RTP Timestamp for MTAPs is defined in section
513	      5.7.2.

515	      Receivers SHOULD ignore any picture timing SEI messages included
516	      in access units that have only one display timestamp.  Instead,
517	      receivers SHOULD use the RTP timestamp for synchronizing the
518	      display process.

520	      RTP senders SHOULD NOT transmit picture timing SEI messages for
521	      pictures that are not supposed to be displayed as multiple fields.

523	      If one access unit has more than one display timestamp carried in
524	      a picture timing SEI message, then the information in the SEI
525	      message SHOULD be treated as relative to the RTP timestamp, with
526	      the earliest event occurring at the time given by the RTP
527	      timestamp, and subsequent events later, as given by the
528	      difference in SEI message picture timing values.  Let tSEI1,
529	      tSEI2, ..., tSEIn be the display timestamps carried in the SEI
530	      message of an access unit, where tSEI1 is the earliest of all
531	      such timestamps.  Let tmadjst() be a function that adjusts the
532	      SEI message time scale to a 90-kHz time scale.  Let TS be the
533	      RTP timestamp.  Then, the display time for the event associated
534	      with tSEI1 is TS.  The display time for the event with tSEIx,
535	      where x is in the range 2..n, is TS + tmadjst(tSEIx - tSEI1).
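
      Informative note: The display-time rule above can be sketched
      in Python as follows.  This is a non-normative illustration.
      It assumes that the SEI display timestamps have already been
      reduced to integer ticks of a single SEI time scale; the
      parameter sei_time_scale (ticks per second) and the helper
      display_times() are hypothetical names that do not appear in
      this memo.

```python
def tmadjst(delta, sei_time_scale):
    """Rescale a tick difference from the SEI time scale to 90 kHz."""
    return delta * 90000 // sei_time_scale


def display_times(ts, sei_timestamps, sei_time_scale):
    """Return one display time (in 90 kHz RTP ticks) per SEI timestamp.

    ts is the RTP timestamp of the access unit.  The earliest SEI
    timestamp maps to ts; later ones are offset by the rescaled
    difference.  Results wrap at 32 bits like the RTP timestamp.
    """
    earliest = min(sei_timestamps)
    return [(ts + tmadjst(t - earliest, sei_time_scale)) % (1 << 32)
            for t in sei_timestamps]
```

      For instance, with TS = 1000 and SEI timestamps 0, 500, and
      1000 on a 30000 Hz time scale, the sketch yields 1000, 2500,
      and 4000.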

537	         Informative note: Displaying coded frames as fields is needed
538	         commonly in an operation known as 3:2 pulldown, in which film
539	         content that consists of coded frames is displayed on a
540	         display using interlaced scanning.  The picture timing SEI
541	         message enables carriage of multiple timestamps for the same
542	         coded picture, and therefore the 3:2 pulldown process is
543	         perfectly controlled.  The picture timing SEI message
544	         mechanism is necessary because only one timestamp per coded
545	         frame can be conveyed in the RTP timestamp.

547	         Informative note: Because H.264 allows the decoding order to
548	         be different from the display order, values of RTP timestamps
549	         may not be monotonically non-decreasing as a function of RTP
550	         sequence numbers.  Furthermore, the value for inter-arrival
551	         jitter reported in the RTCP reports may not be a trustworthy
552	         indication of the network performance, as the calculation
553	         rules for inter-arrival jitter (section 6.4.1 of RFC 3550)
554	         assume that the RTP timestamp of a packet is directly
555	         proportional to its transmission time.

557	5.2. Payload Structures

559	   The payload format defines three different basic payload structures.
560	   A receiver can identify the payload structure by the first byte of
561	   the RTP packet payload, which co-serves as the RTP payload header and,
562	   in some cases, as the first byte of the payload.  This byte is always
563	   structured as a NAL unit header.  The NAL unit type field indicates
564	   which structure is present.  The possible structures are as follows:

566	   Single NAL Unit Packet: Contains only a single NAL unit in the
567	   payload.  The NAL header type field will be equal to the original NAL
568	   unit type; i.e., in the range of 1 to 23, inclusive.  Specified in
569	   section 5.6.

571	   Aggregation Packet: Packet type used to aggregate multiple NAL units
572	   into a single RTP payload.  This packet exists in four versions, the
573	   Single-Time Aggregation Packet type A (STAP-A), the Single-Time
574	   Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet
575	   (MTAP) with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet
576	   (MTAP) with 24-bit offset (MTAP24).  The NAL unit type numbers
577	   assigned for STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and
578	   27, respectively.  Specified in section 5.7.

580	   Fragmentation Unit: Used to fragment a single NAL unit over multiple
581	   RTP packets.  Exists in two versions, FU-A and FU-B, identified
582	   with the NAL unit type numbers 28 and 29, respectively.  Specified in
583	   section 5.8.

585	      Informative note: This specification does not limit the size of
586	      NAL units encapsulated in single NAL unit packets and
587	      fragmentation units.  The maximum size of a NAL unit encapsulated
588	      in any aggregation packet is 65535 bytes.

590	   Table 1 summarizes NAL unit types and the corresponding RTP packet
591	   types when each of these NAL units is directly used as a packet
592	   payload, and where the types are described in this memo.

594	     Table 1.  Summary of NAL unit types and the corresponding packet
595	                                   types

597	      NAL Unit  Packet    Packet Type Name               Section
598	      Type      Type
599	      ---------------------------------------------------------
600	      0        reserved                                     -
601	      1-23     NAL unit  Single NAL unit packet             5.6
602	      24       STAP-A    Single-time aggregation packet     5.7.1
603	      25       STAP-B    Single-time aggregation packet     5.7.1
604	      26       MTAP16    Multi-time aggregation packet      5.7.2
605	      27       MTAP24    Multi-time aggregation packet      5.7.2
606	      28       FU-A      Fragmentation unit                 5.8
607	      29       FU-B      Fragmentation unit                 5.8
608	      30-31    reserved                                     -
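
   Informative note: A receiver's lookup of the payload structure
   from the first payload byte, as summarized in Table 1, might be
   sketched in Python as follows.  The function names are
   illustrative only and are not defined by this memo; the bit
   layout assumed is the NAL unit header layout (F, NRI, Type) with
   F in the most significant bit.

```python
# Packet type names for the NAL unit type values assigned in Table 1.
PACKET_TYPES = {
    24: "STAP-A",
    25: "STAP-B",
    26: "MTAP16",
    27: "MTAP24",
    28: "FU-A",
    29: "FU-B",
}


def parse_nal_header(first_byte):
    """Split a NAL unit header octet into (F, NRI, Type)."""
    f = (first_byte >> 7) & 0x01
    nri = (first_byte >> 5) & 0x03
    nal_type = first_byte & 0x1F
    return f, nri, nal_type


def packet_type(payload):
    """Return the Table 1 packet type name for an RTP payload."""
    if not payload:
        raise ValueError("empty RTP payload")
    _, _, nal_type = parse_nal_header(payload[0])
    if 1 <= nal_type <= 23:
        return "NAL unit"
    # Types 0, 30, and 31 are reserved.
    return PACKET_TYPES.get(nal_type, "reserved")
```

   For example, a payload starting with the octet 0x78 (Type 24) is
   reported as "STAP-A", and one starting with 0x67 (Type 7) as
   "NAL unit".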

610	5.3. NAL Unit Header Usage

612	   The structure and semantics of the NAL unit header were introduced in
613	   section 1.3.  For convenience, the format of the NAL unit header is
614	   reprinted below:

616	      +---------------+
617	      |0|1|2|3|4|5|6|7|
618	      +-+-+-+-+-+-+-+-+
619	      |F|NRI|  Type   |
620	      +---------------+

622	   This section specifies the semantics of F and NRI according to this
623	   specification.

625	   F: 1 bit
626	      forbidden_zero_bit.  A value of 0 indicates that the NAL unit
627	      type octet and payload should not contain bit errors or other
628	      syntax violations.  A value of 1 indicates that the NAL unit type
629	      octet and payload may contain bit errors or other syntax
630	      violations.

632	      MANEs SHOULD set the F bit to indicate detected bit errors in the
633	      NAL unit.  The H.264 specification requires that the F bit is
634	      equal to 0.  When the F bit is set, the decoder is advised that
635	      bit errors or any other syntax violations may be present in the
636	      payload or in the NAL unit type octet.  The simplest decoder
637	      reaction to a NAL unit in which the F bit is equal to 1 is to
638	      discard such a NAL unit and to conceal the lost data in the
639	      discarded NAL unit.

641	   NRI: 2 bits
642	      nal_ref_idc.  The semantics of value 00 and a non-zero value
643	      remain unchanged from the H.264 specification.  In other words, a
644	      value of 00 indicates that the content of the NAL unit is not
645	      used to reconstruct reference pictures for inter picture
646	      prediction. Such NAL units can be discarded without risking the
647	      integrity of the reference pictures.  Values greater than 00
648	      indicate that the decoding of the NAL unit is required to
649	      maintain the integrity of the reference pictures.

651	      In addition to the specification above, according to this RTP
652	      payload specification, values of NRI indicate the relative
653	      transport priority, as determined by the encoder.  MANEs can use
654	      this information to protect more important NAL units better than
655	      they do less important NAL units.  The highest transport priority
656	      is 11, followed by 10, and then by 01; finally, 00 is the lowest.

658	         Informative note: Any non-zero value of NRI is handled
659	         identically in H.264 decoders.  Therefore, receivers need not
660	         manipulate the value of NRI when passing NAL units to the
661	         decoder.

663	      An H.264 encoder MUST set the value of NRI according to the H.264
664	      specification (subclause 7.4.1) when the value of nal_unit_type
665	      is in the range of 1 to 12, inclusive.  In particular, the H.264
666	      specification requires that the value of NRI SHALL be equal to 0
667	      for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or
668	      12.

670	      For NAL units having nal_unit_type equal to 7 or 8 (indicating a
671	      sequence parameter set or a picture parameter set, respectively),
672	      an H.264 encoder SHOULD set the value of NRI to 11 (in binary
673	      format).  For coded slice NAL units of a primary coded picture
674	      having nal_unit_type equal to 5 (indicating a coded slice
675	      belonging to an IDR picture), an H.264 encoder SHOULD set the
676	      value of NRI to 11 (in binary format).

678	      For a mapping of the remaining nal_unit_types to NRI values, the
679	      following example MAY be used and has been shown to be efficient
680	      in a certain environment [14].  Other mappings MAY also be
681	      desirable, depending on the application and the H.264/AVC Annex A
682	      profile in use.

684	         Informative note: Data Partitioning is not available in
685	         certain profiles; e.g., in the Main or Baseline profiles.
686	         Consequently, the NAL unit types 2, 3, and 4 can occur only if
687	         the video bitstream conforms to a profile in which data
688	         partitioning is allowed and not in streams that conform to the
689	         Main or Baseline profiles.

691	   Table 2.  Example of NRI values for coded slices and coded slice data
692	              partitions of primary coded reference pictures

694	      NAL Unit Type     Content of NAL unit              NRI (binary)
695	      ----------------------------------------------------------------
696	       1              non-IDR coded slice                         10
697	       2              Coded slice data partition A                10
698	       3              Coded slice data partition B                01
699	       4              Coded slice data partition C                01

701	         Informative note: As mentioned before, the NRI value of non-
702	         reference pictures is 00 as mandated by H.264/AVC.

704	      An H.264 encoder SHOULD set the value of NRI for coded slice and
705	      coded slice data partition NAL units of redundant coded reference
706	      pictures equal to 01 (in binary format).
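
         Informative note: The NRI guidance above, combined with
         the example mapping of Table 2, might be expressed in
         Python as follows.  This is only a sketch of one possible
         encoder policy.  The function name and the is_reference
         and is_redundant flags are hypothetical; an encoder is
         assumed to know these properties of the picture that a
         slice belongs to.

```python
# Example mapping of Table 2 (primary coded reference pictures).
TABLE_2_NRI = {1: 0b10, 2: 0b10, 3: 0b01, 4: 0b01}


def suggested_nri(nal_unit_type, is_reference=True, is_redundant=False):
    """Return a suggested NRI value, or None if none is given."""
    if nal_unit_type in (6, 9, 10, 11, 12):
        return 0b00          # required to be 0 by H.264
    if nal_unit_type in (7, 8):
        return 0b11          # sequence / picture parameter sets
    if 1 <= nal_unit_type <= 5:
        if not is_reference:
            return 0b00      # non-reference pictures
        if is_redundant:
            return 0b01      # redundant coded reference pictures
        if nal_unit_type == 5:
            return 0b11      # IDR slice of a primary coded picture
        return TABLE_2_NRI[nal_unit_type]
    return None              # no recommendation in this section
```

         Other mappings remain possible; the values for types 1 to
         4 are taken from the example in Table 2 rather than from
         a requirement.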

708	      Definitions of the values for NRI for NAL unit types 24 to 29,
709	      inclusive, are given in sections 5.7 and 5.8 of this memo.

711	      No recommendation for the value of NRI is given for NAL units
712	      having nal_unit_type in the range of 13 to 23, inclusive, because
713	      these values are reserved for ITU-T and ISO/IEC.  No
714	      recommendation for the value of NRI is given for NAL units having
715	      nal_unit_type equal to 0 or in the range of 30 to 31, inclusive,
716	      as the semantics of these values are not specified in this memo.

718	5.4. Packetization Modes

720	   This memo specifies three cases of packetization modes:

722	   o  Single NAL unit mode

724	   o  Non-interleaved mode

726	   o  Interleaved mode

728	   The single NAL unit mode is targeted for conversational systems that
729	   comply with ITU-T Recommendation H.241 [3]  (see section 12.1).  The
730	   non-interleaved mode is targeted for conversational systems that may
731	   not comply with ITU-T Recommendation H.241.  In the non-interleaved
732	   mode, NAL units are transmitted in NAL unit decoding order.  The
733	   interleaved mode is targeted for systems that do not require very low
734	   end-to-end latency.  The interleaved mode allows transmission of NAL
735	   units out of NAL unit decoding order.

737	   The packetization mode in use MAY be signaled by the value of the
738	   OPTIONAL packetization-mode media type parameter.  The used
739	   packetization mode governs which NAL unit types are allowed in RTP
740	   payloads.  Table 3 summarizes the allowed packet payload types for
741	   each packetization mode.  Packetization modes are explained in more
742	   detail in section 6.

744	    Table 3.  Summary of allowed NAL unit types for each packetization
745	            mode (yes = allowed, no = disallowed, ig = ignore)

747	      Payload Packet    Single NAL    Non-Interleaved    Interleaved
748	      Type    Type      Unit Mode           Mode             Mode
749	      -------------------------------------------------------------
750	      0      reserved      ig               ig               ig
751	      1-23   NAL unit     yes              yes               no
752	      24     STAP-A        no              yes               no
753	      25     STAP-B        no               no              yes
754	      26     MTAP16        no               no              yes
755	      27     MTAP24        no               no              yes
756	      28     FU-A          no              yes              yes
757	      29     FU-B          no               no              yes
758	      30-31  reserved      ig               ig               ig

760	   Some NAL unit or payload type values (indicated as reserved in
761	   Table 3) are reserved for future extensions.  NAL units of those
762	   types SHOULD NOT be sent by a sender (directly as packet payloads, or
763	   as aggregation units in aggregation packets, or as fragmented units
764	   in FU packets) and MUST be ignored by a receiver.  For example, the
765	   payload types 1-23, with the associated packet type "NAL unit", are
766	   allowed in "Single NAL Unit Mode" and in "Non-Interleaved Mode", but
767	   disallowed in "Interleaved Mode".  However, NAL units of NAL unit
768	   types 1-23 can be used in "Interleaved Mode" as aggregation units in
769	   STAP-B, MTAP16 and MTAP24 packets as well as fragmented units in FU-A
770	   and FU-B packets.  Similarly, NAL units of NAL unit types 1-23 can
771	   also be used in the "Non-Interleaved Mode" as aggregation units in
772	   STAP-A packets or fragmented units in FU-A packets, in addition to
773	   being directly used as packet payloads.
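
   Informative note: Table 3 can be read as a simple lookup, as in
   the following Python sketch.  The mode labels and the function
   name are illustrative choices made for this example.  The lookup
   covers only the type of the RTP payload itself, not the types of
   NAL units carried inside aggregation packets or fragmentation
   units.

```python
MODES = ("single", "non-interleaved", "interleaved")

# One row per payload type of Table 3; columns follow MODES.
_TABLE_3 = {
    24: ("no", "yes", "no"),    # STAP-A
    25: ("no", "no", "yes"),    # STAP-B
    26: ("no", "no", "yes"),    # MTAP16
    27: ("no", "no", "yes"),    # MTAP24
    28: ("no", "yes", "yes"),   # FU-A
    29: ("no", "no", "yes"),    # FU-B
}


def payload_allowed(payload_type, mode):
    """Return "yes", "no", or "ig" (ignore) as listed in Table 3."""
    if not 0 <= payload_type <= 31:
        raise ValueError("payload type is a 5-bit value")
    column = MODES.index(mode)
    if 1 <= payload_type <= 23:
        return ("yes", "yes", "no")[column]
    if payload_type in _TABLE_3:
        return _TABLE_3[payload_type][column]
    return "ig"                 # 0, 30, and 31 are reserved
```

   A receiver could, for example, drop a packet for which the
   lookup returns "no" or "ig" in the negotiated mode.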

775	5.5. Decoding Order Number (DON)

777	   In the interleaved packetization mode, the transmission order of NAL
778	   units is allowed to differ from the decoding order of the NAL units.
779	   Decoding order number (DON) is a field in the payload structure or a
780	   derived variable that indicates the NAL unit decoding order.
781	   Rationale and examples of use cases for transmission out of decoding
782	   order and for the use of DON are given in section 13.

784	   The coupling of transmission and decoding order is controlled by the
785	   OPTIONAL sprop-interleaving-depth media type parameter as follows.
786	   When the value of the OPTIONAL sprop-interleaving-depth media type
787	   parameter is equal to 0 (explicitly or per default), the transmission
788	   order of NAL units MUST conform to the NAL unit decoding order.  When
789	   the value of the OPTIONAL sprop-interleaving-depth media type
790	   parameter is greater than 0,

792	   o  the order of NAL units in an MTAP16 and an MTAP24 is NOT REQUIRED
793	      to be the NAL unit decoding order, and

795	   o  the order of NAL units generated by de-packetizing STAP-Bs, MTAPs,
796	      and FUs in two consecutive packets is NOT REQUIRED to be the NAL
797	      unit decoding order.

799	   The RTP payload structures for a single NAL unit packet, an STAP-A,
800	   and an FU-A do not include DON.  STAP-B and FU-B structures include
801	   DON, and the structure of MTAPs enables derivation of DON as
802	   specified in section 5.7.2.

804	      Informative note: When an FU-A occurs in interleaved mode, it
805	      always follows an FU-B, which sets its DON.

807	      Informative note: If a transmitter wants to encapsulate a single
808	      NAL unit per packet and transmit packets out of their decoding
809	      order, STAP-B packet type can be used.

811	   In the single NAL unit packetization mode, the transmission order of
812	   NAL units, determined by the RTP sequence number, MUST be the same as
813	   their NAL unit decoding order.  In the non-interleaved packetization
814	   mode, the transmission order of NAL units in single NAL unit packets,
815	   STAP-As, and FU-As MUST be the same as their NAL unit decoding order.
816	   The NAL units within an STAP MUST appear in the NAL unit decoding
817	   order.  Thus, the decoding order is first provided through the
818	   implicit order within a STAP, and second provided through the RTP
819	   sequence number for the order between STAPs, FUs, and single NAL unit
820	   packets.

822	   Signaling of the value of DON for NAL units carried in STAP-B, MTAP,
823	   and a series of fragmentation units starting with an FU-B is
824	   specified in sections 5.7.1, 5.7.2, and 5.8, respectively.  The DON
825	   value of the first NAL unit in transmission order MAY be set to any
826	   value.  Values of DON are in the range of 0 to 65535, inclusive.
827	   After reaching the maximum value, the value of DON wraps around to 0.

829	   The decoding order of two NAL units contained in any STAP-B, MTAP, or
830	   a series of fragmentation units starting with an FU-B is determined
831	   as follows.  Let DON(i) be the decoding order number of the NAL unit
832	   having index i in the transmission order.  Function don_diff(m,n) is
833	   specified as follows:

835	         If DON(m) == DON(n), don_diff(m,n) = 0

837	         If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
838	         don_diff(m,n) = DON(n) - DON(m)

840	         If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
841	         don_diff(m,n) = 65536 - DON(m) + DON(n)

843	         If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
844	         don_diff(m,n) = - (DON(m) + 65536 - DON(n))

846	         If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
847	         don_diff(m,n) = - (DON(m) - DON(n))

849	   A positive value of don_diff(m,n) indicates that the NAL unit having
850	   transmission order index n follows, in decoding order, the NAL unit
851	   having transmission order index m.  When don_diff(m,n) is equal to 0,
852	   then the NAL unit decoding order of the two NAL units can be in
853	   either order.  A negative value of don_diff(m,n) indicates that the
854	   NAL unit having transmission order index n precedes, in decoding
855	   order, the NAL unit having transmission order index m.
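
   Informative note: The don_diff() function can be transcribed
   into Python almost literally, as sketched below.  For brevity,
   this version takes the two DON values directly instead of the
   transmission order indices m and n; that calling convention is
   a choice made for this example.

```python
def don_diff(don_m, don_n):
    """Signed distance from DON(m) to DON(n) with 16-bit wraparound.

    A positive result means that the NAL unit with DON(n) follows
    the NAL unit with DON(m) in decoding order.
    """
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)
    # Remaining case: don_m > don_n and don_m - don_n < 32768.
    return -(don_m - don_n)
```

   For example, don_diff(65535, 1) is 2, because DON wraps around
   from 65535 to 0, whereas don_diff(1, 65535) is -2.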

857	   Values of DON related fields (DON, DONB, and DOND; see section 5.7)
858	   MUST be such that the decoding order determined by the values of DON,
859	   as specified above, conforms to the NAL unit decoding order.  If the
860	   order of two NAL units in NAL unit decoding order is switched and the
861	   new order does not conform to the NAL unit decoding order, the NAL
862	   units MUST NOT have the same value of DON.  If the order of two
863	   consecutive NAL units in the NAL unit stream is switched and the new
864	   order still conforms to the NAL unit decoding order, the NAL units
865	   MAY have the same value of DON.  For example, when arbitrary slice
866	   order is allowed by the video coding profile in use, all the coded
867	   slice NAL units of a coded picture are allowed to have the same value
868	   of DON.  Consequently, NAL units having the same value of DON can be
869	   decoded in any order, and two NAL units having a different value of
870	   DON should be passed to the decoder in the order specified above.
871	   When two consecutive NAL units in the NAL unit decoding order have a
872	   different value of DON, the value of DON for the second NAL unit in
873	   decoding order SHOULD be the value of DON for the first, incremented
874	   by one.

876	   An example of the de-packetization process to recover the NAL unit
877	   decoding order is given in section 7.

879	      Informative note: Receivers should not expect that the absolute
880	      difference of values of DON for two consecutive NAL units in the
881	      NAL unit decoding order will be equal to one, even in error-free
882	      transmission.  An increment by one is not required, as at the
883	      time of associating values of DON to NAL units, it may not be
884	      known whether all NAL units are delivered to the receiver.  For
885	      example, a gateway may not forward coded slice NAL units of non-
886	      reference pictures or SEI NAL units when there is a shortage of
887	      bit rate in the network to which the packets are forwarded.  In
888	      another example, a live broadcast is interrupted by pre-encoded
889	      content, such as commercials, from time to time.  The first intra
890	      picture of a pre-encoded clip is transmitted in advance to ensure
891	      that it is readily available in the receiver.  When transmitting
892	      the first intra picture, the originator does not exactly know how
893	      many NAL units will be encoded before the first intra picture of
894	      the pre-encoded clip follows in decoding order.  Thus, the values
895	      of DON for the NAL units of the first intra picture of the pre-
896	      encoded clip have to be estimated when they are transmitted, and
897	      gaps in values of DON may occur.

899	5.6. Single NAL Unit Packet

901	   The single NAL unit packet defined here MUST contain only one NAL
902	   unit, of the types defined in [1].  This means that neither an
903	   aggregation packet nor a fragmentation unit can be used within a
904	   single NAL unit packet.  A NAL unit stream composed by de-packetizing
905	   single NAL unit packets in RTP sequence number order MUST conform to
906	   the NAL unit decoding order.  The structure of the single NAL unit
907	   packet is shown in Figure 2.

909	      Informative note: The first byte of a NAL unit co-serves as the
910	      RTP payload header.

912	    0                   1                   2                   3
913	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
914	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
915	   |F|NRI|  Type   |                                               |
916	   +-+-+-+-+-+-+-+-+                                               |
917	   |                                                               |
918	   |               Bytes 2..n of a Single NAL unit                 |
919	   |                                                               |
920	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
921	   |                               :...OPTIONAL RTP padding        |
922	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

924	          Figure 2 RTP payload format for single NAL unit packet

926	5.7. Aggregation Packets

928	   Aggregation packets are the NAL unit aggregation scheme of this
929	   payload specification.  The scheme is introduced to reflect the
930	   dramatically different MTU sizes of two key target networks: wireline
931	   IP networks (with an MTU size that is often limited by the Ethernet
932	   MTU size; roughly 1500 bytes), and IP or non-IP (e.g., ITU-T H.324/M)
933	   based wireless communication systems with preferred transmission unit
934	   sizes of 254 bytes or less.  To prevent media transcoding between the
935	   two worlds, and to avoid undesirable packetization overhead, a NAL
936	   unit aggregation scheme is introduced.

938	   Two types of aggregation packets are defined by this specification:

940	   o  Single-time aggregation packet (STAP): aggregates NAL units with
941	      identical NALU-time.  Two types of STAPs are defined, one without
942	      DON (STAP-A) and another including DON (STAP-B).

944	   o  Multi-time aggregation packet (MTAP): aggregates NAL units with
945	      potentially differing NALU-time.  Two different MTAPs are defined,
946	      differing in the length of the NAL unit timestamp offset.

948	   Each NAL unit to be carried in an aggregation packet is encapsulated
949	   in an aggregation unit.  Please see below for the four different
950	   aggregation units and their characteristics.

952	   The structure of the RTP payload format for aggregation packets is
953	   presented in Figure 3.

955	    0                   1                   2                   3
956	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
957	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
958	   |F|NRI|  Type   |                                               |
959	   +-+-+-+-+-+-+-+-+                                               |
960	   |                                                               |
961	   |             one or more aggregation units                     |
962	   |                                                               |
963	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
964	   |                               :...OPTIONAL RTP padding        |
965	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

967	            Figure 3 RTP payload format for aggregation packets

969	   MTAPs and STAPs share the following packetization rules:  The RTP
970	   timestamp MUST be set to the earliest of the NALU-times of all the
971	   NAL units to be aggregated.  The type field of the NAL unit type
972	   octet MUST be set to the appropriate value, as indicated in Table 4.
973	   The F bit MUST be cleared if all F bits of the aggregated NAL units
974	   are zero; otherwise, it MUST be set.  The value of NRI MUST be the
975	   maximum of all the NAL units carried in the aggregation packet.

977	                 Table 4.  Type field for STAPs and MTAPs

979	      Type   Packet    Timestamp offset   DON related fields
980	                       field length       (DON, DONB, DOND)
981	                       (in bits)          present
982	      --------------------------------------------------------
983	      24     STAP-A       0                 no
984	      25     STAP-B       0                 yes
985	      26     MTAP16      16                 yes
986	      27     MTAP24      24                 yes
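
   Informative note: The rules for the F bit, the NRI field, and
   the type field of an aggregation packet can be sketched in
   Python as follows.  The function name is hypothetical, and each
   entry of nal_units is assumed to be a complete NAL unit whose
   first byte is its NAL unit header.

```python
def aggregation_header(packet_type_value, nal_units):
    """Build the NAL unit type octet of an aggregation packet.

    packet_type_value is 24, 25, 26, or 27 (see Table 4).
    """
    if packet_type_value not in (24, 25, 26, 27):
        raise ValueError("not an aggregation packet type")
    if not nal_units:
        raise ValueError("at least one NAL unit is needed")
    f = 0
    nri = 0
    for nalu in nal_units:
        f |= (nalu[0] >> 7) & 0x01             # set if any F bit is set
        nri = max(nri, (nalu[0] >> 5) & 0x03)  # maximum NRI carried
    return (f << 7) | (nri << 5) | packet_type_value
```

   For instance, aggregating NAL units whose header octets are 0x67
   (NRI 11) and 0x41 (NRI 10) into an STAP-A yields the octet 0x78.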

988	   The marker bit in the RTP header is set to the value that the marker
989	   bit of the last NAL unit of the aggregation packet would have if it
990	   were transported in its own RTP packet.

992	   The payload of an aggregation packet consists of one or more
993	   aggregation units.  See sections 5.7.1 and 5.7.2 for the four
994	   different types of aggregation units.  An aggregation packet can
995	   carry as many aggregation units as necessary; however, the total
996	   amount of data in an aggregation packet obviously MUST fit into an IP
997	   packet, and the size SHOULD be chosen so that the resulting IP packet
998	   is smaller than the MTU size.  An aggregation packet MUST NOT contain
999	   fragmentation units specified in section 5.8.  Aggregation packets
1000	   MUST NOT be nested; i.e., an aggregation packet MUST NOT contain
1001	   another aggregation packet.

1003	5.7.1. Single-Time Aggregation Packet

1005	   A single-time aggregation packet (STAP) SHOULD be used whenever NAL
1006	   units are aggregated that all share the same NALU-time.  The payload
1007	   of an STAP-A does not include DON and consists of at least one
1008	   single-time aggregation unit, as presented in Figure 4.  The payload
1009	   of an STAP-B consists of a 16-bit unsigned decoding order number (DON)
1010	   (in network byte order) followed by at least one single-time
1011	   aggregation unit, as presented in Figure 5.

1013	    0                   1                   2                   3
1014	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1015	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1016	                   :                                               |
1017	   +-+-+-+-+-+-+-+-+                                               |
1018	   |                                                               |
1019	   |                single-time aggregation units                  |
1020	   |                                                               |
1021	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1022	   |                               :
1023	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1025	                    Figure 4 Payload format for STAP-A

1027	    0                   1                   2                   3
1028	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1029	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1030	                   :  decoding order number (DON)  |               |
1031	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1032	   |                                                               |
1033	   |                single-time aggregation units                  |
1034	   |                                                               |
1035	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1036	   |                               :
1037	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1039	                    Figure 5 Payload format for STAP-B

1041	   The DON field specifies the value of DON for the first NAL unit in an
1042	   STAP-B in transmission order.  For each successive NAL unit in
1043	   appearance order in an STAP-B, the value of DON is equal to (the
1044	   value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
1045	   which '%' stands for the modulo operation.
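
   Informative note: The derivation of DON for each NAL unit in an
   STAP-B can be sketched in Python as follows.  The function name
   and its parameters are illustrative only.

```python
def stap_b_dons(first_don, count):
    """Return the DON of each NAL unit in an STAP-B, in appearance order.

    first_don is the value of the DON field; count is the number of
    single-time aggregation units in the packet.
    """
    return [(first_don + i) % 65536 for i in range(count)]
```

   For example, an STAP-B with a DON field of 65534 that carries
   four NAL units assigns them the values 65534, 65535, 0, and 1.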

1047	   A single-time aggregation unit consists of 16-bit unsigned size
1048	   information (in network byte order) that indicates the size of the
1049	   following NAL unit in bytes (excluding these two octets, but
1050	   including the NAL unit type octet of the NAL unit), followed by the
1051	   NAL unit itself, including its NAL unit type byte.  A single-time
1052	   aggregation unit is byte aligned within the RTP payload, but it may
1053	   not be aligned on a 32-bit word boundary.  Figure 6 presents the
1054	   structure of the single-time aggregation unit.

1056	    0                   1                   2                   3
1057	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1058	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1059	                   :        NAL unit size          |               |
1060	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1061	   |                                                               |
1062	   |                           NAL unit                            |
1063	   |                                                               |
1064	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1065	   |                               :
1066	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1068	            Figure 6 Structure for single-time aggregation unit

1070	   Figure 7 presents an example of an RTP packet that contains an
1071	   STAP-A.  The STAP contains two single-time aggregation units, labeled
1072	   as 1 and 2 in the figure.

1074	    0                   1                   2                   3
1075	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1076	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1077	   |                          RTP Header                           |
1078	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1079	   |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
1080	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1081	   |                         NALU 1 Data                           |
1082	   :                                                               :
1083	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1084	   |               | NALU 2 Size                   | NALU 2 HDR    |
1085	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1086	   |                         NALU 2 Data                           |
1087	   :                                                               :
1088	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1089	   |                               :...OPTIONAL RTP padding        |
1090	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1092	    Figure 7 An example of an RTP packet including an STAP-A containing
1093	                     two single-time aggregation units

1095	   Figure 8 presents an example of an RTP packet that contains an
1096	   STAP-B.  The STAP contains two single-time aggregation units, labeled
1097	   as 1 and 2 in the figure.

1099	    0                   1                   2                   3
1100	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1101	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1102	   |                          RTP Header                           |
1103	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1104	   |STAP-B NAL HDR | DON                           | NALU 1 Size   |
1105	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1106	   | NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
1107	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1108	   :                                                               :
1109	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1110	   |               | NALU 2 Size                   | NALU 2 HDR    |
1111	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1112	   |                       NALU 2 Data                             |
1113	   :                                                               :
1114	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1115	   |                               :...OPTIONAL RTP padding        |
1116	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1118	    Figure 8 An example of an RTP packet including an STAP-B containing
1119	                     two single-time aggregation units
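
      Informative note: The following Python sketch is not part of the
      payload format; it only illustrates how a receiver might split an
      STAP-A or STAP-B payload into NAL units and derive the DON of
      each NAL unit in an STAP-B.  The name parse_stap, the type values
      24 and 25 (taken from the packet type assignments elsewhere in
      this memo), and the assumption that any RTP padding has already
      been removed are choices of this sketch.

```python
import struct

STAP_A, STAP_B = 24, 25  # assumed type values, see lead-in


def parse_stap(payload: bytes):
    """Split an STAP payload into (don, nal_unit) pairs.

    don is None for STAP-A, which carries no DON field.
    """
    if not payload:
        raise ValueError("empty payload")
    nal_type = payload[0] & 0x1F
    pos = 1  # skip the STAP NAL unit header octet
    if nal_type == STAP_B:
        if len(payload) < 3:
            raise ValueError("truncated DON field")
        (don,) = struct.unpack_from("!H", payload, pos)
        pos += 2
    elif nal_type == STAP_A:
        don = None
    else:
        raise ValueError("not an STAP")

    units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size")
        (size,) = struct.unpack_from("!H", payload, pos)  # network order
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("bad NAL unit size")
        units.append((don, payload[pos:pos + size]))
        pos += size
        if don is not None:
            don = (don + 1) % 65536  # DON of the next NAL unit
    return units
```

      For an STAP-B whose DON field is 65535, the second NAL unit is
      assigned DON 0, which shows the modulo wraparound.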

1121	5.7.2. Multi-Time Aggregation Packets (MTAPs)

1123	   The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
1124	   order number base (DONB) (in network byte order) and one or more
1125	   multi-time aggregation units, as presented in Figure 9.  DONB MUST
1126	   contain the value of DON for the first NAL unit in the NAL unit
1127	   decoding order among the NAL units of the MTAP.

1129	      Informative note: The first NAL unit in the NAL unit decoding
1130	      order is not necessarily the first NAL unit in the order in which
1131	      the NAL units are encapsulated in an MTAP.

1133	    0                   1                   2                   3
1134	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1135	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1136	                   :  decoding order number base   |               |
1137	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1138	   |                                                               |
1139	   |                 multi-time aggregation units                  |
1140	   |                                                               |
1141	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1142	   |                               :
1143	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1145	                Figure 9 NAL unit payload format for MTAPs

1147	   Two different multi-time aggregation units are defined in this
1148	   specification.  Both of them consist of 16-bit unsigned size
1149	   information of the following NAL unit (in network byte order), an
1150	   8-bit unsigned decoding order number difference (DOND), and n bits
1151	   (in network byte order) of timestamp offset (TS offset) for this NAL
1152	   unit, whereby n can be 16 or 24.  The choice between the different
1153	   MTAP types (MTAP16 and MTAP24) is application dependent: the larger
1154	   the timestamp offset is, the higher the flexibility of the MTAP, but
1155	   the overhead is also higher.

1157	   The structures of the multi-time aggregation units for MTAP16 and
1158	   MTAP24 are presented in Figures 10 and 11, respectively.  The
1159	   starting or ending position of an aggregation unit within a packet is
1160	   not required to be on a 32-bit word boundary.  The DON of the NAL
1161	   unit contained in a multi-time aggregation unit is equal to (DONB +
1162	   DOND) % 65536, in which % denotes the modulo operation.  This memo
1163	   does not specify how the NAL units within an MTAP are ordered, but,
1164	   in most cases, NAL unit decoding order SHOULD be used.

1166	   The timestamp offset field MUST be set to a value equal to the value
1167	   of the following formula: If the NALU-time is larger than or equal to
1168	   the RTP timestamp of the packet, then the timestamp offset equals
1169	   (the NALU-time of the NAL unit - the RTP timestamp of the packet).
1170	   If the NALU-time is smaller than the RTP timestamp of the packet,
1171	   then the timestamp offset is equal to the NALU-time + (2^32 - the RTP
1172	   timestamp of the packet).
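
      Informative note: As a non-normative illustration, the two
      formulas above can be written in Python as follows.  The function
      names and the fits_in_field helper are hypothetical; the helper
      only checks that an offset can be carried in the 16-bit or 24-bit
      field of an MTAP16 or MTAP24.

```python
def timestamp_offset(nalu_time: int, rtp_timestamp: int) -> int:
    """Timestamp offset of one NAL unit relative to the MTAP's RTP timestamp."""
    if nalu_time >= rtp_timestamp:
        return nalu_time - rtp_timestamp
    # The 32-bit RTP timestamp wrapped between the packet and this NAL unit.
    return nalu_time + (2**32 - rtp_timestamp)


def mtap_don(donb: int, dond: int) -> int:
    """DON of a NAL unit carried in a multi-time aggregation unit."""
    return (donb + dond) % 65536


def fits_in_field(offset: int, n_bits: int) -> bool:
    """True if the offset is representable in an n-bit TS offset field."""
    return 0 <= offset < (1 << n_bits)
```

      Both branches of timestamp_offset yield the same value as
      (NALU-time - RTP timestamp) modulo 2^32.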

1174	    0                   1                   2                   3
1175	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1176	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1177	   :        NAL unit size          |      DOND     |  TS offset    |
1178	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1179	   |  TS offset    |                                               |
1180	   +-+-+-+-+-+-+-+-+              NAL unit                         |
1181	   |                                                               |
1182	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1183	   |                               :
1184	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1186	             Figure 10  Multi-time aggregation unit for MTAP16

1188	    0                   1                   2                   3
1189	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1190	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1191	   :        NAL unit size          |      DOND     |  TS offset    |
1192	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1193	   |         TS offset             |                               |
1194	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1195	   |                              NAL unit                         |
1196	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1197	   |                               :
1198	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1200	             Figure 11  Multi-time aggregation unit for MTAP24

1202	   For the "earliest" multi-time aggregation unit in an MTAP, the
1203	   timestamp offset MUST be zero.  Hence, the RTP timestamp of the MTAP
1204	   itself is identical to the earliest NALU-time.

1206	      Informative note: The "earliest" multi-time aggregation unit is
1207	      the one that would have the smallest extended RTP timestamp among
1208	      all the aggregation units of an MTAP if the NAL units contained
1209	      in the aggregation units were encapsulated in single NAL unit
1210	      packets.  An extended timestamp is a timestamp that has more than
1211	      32 bits and is capable of counting the wraparound of the
1212	      timestamp field, thus enabling one to determine the smallest
1213	      value if the timestamp wraps.  Such an "earliest" aggregation
1214	      unit may not be the first one in the order in which the
1215	      aggregation units are encapsulated in an MTAP.  The "earliest"
1216	      NAL unit need not be the same as the first NAL unit in the NAL
1217	      unit decoding order either.

1219	   Figure 12 presents an example of an RTP packet that contains a multi-
1220	   time aggregation packet of type MTAP16 that contains two multi-time
1221	   aggregation units, labeled as 1 and 2 in the figure.

1223	    0                   1                   2                   3
1224	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1225	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1226	   |                          RTP Header                           |
1227	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1228	   |MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
1229	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1230	   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
1231	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1232	   |  NALU 1 HDR   |  NALU 1 DATA                                  |
1233	   +-+-+-+-+-+-+-+-+                                               +
1234	   :                                                               :
1235	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1236	   |               | NALU 2 SIZE                   |  NALU 2 DOND  |
1237	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1238	   |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
1239	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1240	   :                                                               :
1241	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1242	   |                               :...OPTIONAL RTP padding        |
1243	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1245	   Figure 12  An RTP packet including a multi-time aggregation packet of
1246	          type MTAP16 containing two multi-time aggregation units

1248	   Figure 13 presents an example of an RTP packet that contains a multi-
1249	   time aggregation packet of type MTAP24 that contains two multi-time
1250	   aggregation units, labeled as 1 and 2 in the figure.

1252	    0                   1                   2                   3
1253	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1254	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1255	   |                          RTP Header                           |
1256	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1257	   |MTAP24 NAL HDR |  decoding order number base   | NALU 1 Size   |
1258	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1259	   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offs          |
1260	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1261	   |NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
1262	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1263	   :                                                               :
1264	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1265	   |               | NALU 2 SIZE                   |  NALU 2 DOND  |
1266	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1267	   |       NALU 2 TS offset                        |  NALU 2 HDR   |
1268	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1269	   |  NALU 2 DATA                                                  |
1270	   :                                                               :
1271	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1272	   |                               :...OPTIONAL RTP padding        |
1273	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1275	   Figure 13  An RTP packet including a multi-time aggregation packet of
1276	          type MTAP24 containing two multi-time aggregation units
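
      Informative note: The following Python sketch shows one way a
      receiver might walk an MTAP16 or MTAP24 payload and recover the
      DON and NALU-time of each NAL unit.  It is an illustration only.
      The name parse_mtap and the type values 26 and 27 (taken from the
      packet type assignments elsewhere in this memo) are assumptions
      of the sketch, as is the reading that the NAL unit size field
      counts only the octets of the NAL unit and not the DOND and TS
      offset fields.  RTP padding is assumed to be already removed.

```python
import struct

MTAP16, MTAP24 = 26, 27  # assumed type values, see lead-in


def parse_mtap(payload: bytes, rtp_timestamp: int):
    """Return a list of (don, nalu_time, nal_unit) tuples for one MTAP."""
    if len(payload) < 3:
        raise ValueError("truncated MTAP")
    nal_type = payload[0] & 0x1F
    if nal_type == MTAP16:
        ts_len = 2
    elif nal_type == MTAP24:
        ts_len = 3
    else:
        raise ValueError("not an MTAP")
    (donb,) = struct.unpack_from("!H", payload, 1)  # DONB, network order
    pos = 3
    units = []
    while pos < len(payload):
        unit_hdr = 2 + 1 + ts_len  # size + DOND + TS offset
        if pos + unit_hdr > len(payload):
            raise ValueError("truncated aggregation unit header")
        (size,) = struct.unpack_from("!H", payload, pos)
        dond = payload[pos + 2]
        ts_offset = int.from_bytes(payload[pos + 3:pos + 3 + ts_len], "big")
        pos += unit_hdr
        if size == 0 or pos + size > len(payload):
            raise ValueError("bad NAL unit size")
        don = (donb + dond) % 65536
        nalu_time = (rtp_timestamp + ts_offset) % 2**32
        units.append((don, nalu_time, payload[pos:pos + size]))
        pos += size
    return units
```

      The units are returned in the order in which they are
      encapsulated, which need not be the NAL unit decoding order.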

1278	5.7.3. Fragmentation Units (FUs)

1280	   This payload type allows fragmenting a NAL unit into several RTP
1281	   packets.  Doing so on the application layer instead of relying on
1282	   lower layer fragmentation (e.g., by IP) has the following advantages:

1284	   o  The payload format is capable of transporting, over an IPv4
1285	      network, NAL units bigger than 64 kbytes, which may be present in
1286	      pre-recorded video, particularly in High Definition formats (a
1287	      limit on the number of slices per picture limits the number of
1288	      NAL units per picture, which may result in big NAL units).

1290	   o  The fragmentation mechanism allows fragmenting a single NAL unit
1291	      and applying generic forward error correction as described in
1292	      section 12.5.

1294	   Fragmentation is defined only for a single NAL unit and not for any
1295	   aggregation packets.  A fragment of a NAL unit consists of an integer
1296	   number of consecutive octets of that NAL unit.  Each octet of the NAL
1297	   unit MUST be part of exactly one fragment of that NAL unit.
1298	   Fragments of the same NAL unit MUST be sent in consecutive order with
1299	   ascending RTP sequence numbers (with no other RTP packets within the
1300	   same RTP packet stream being sent between the first and last
1301	   fragment).  Similarly, a NAL unit MUST be reassembled in RTP sequence
1302	   number order.

1304	   When a NAL unit is fragmented and conveyed within fragmentation units
1305	   (FUs), it is referred to as a fragmented NAL unit.  STAPs and MTAPs
1306	   MUST NOT be fragmented.  FUs MUST NOT be nested; i.e., an FU MUST NOT
1307	   contain another FU.

1309	   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
1310	   time of the fragmented NAL unit.

1312	   Figure 14 presents the RTP payload format for FU-As.  An FU-A
1313	   consists of a fragmentation unit indicator of one octet, a
1314	   fragmentation unit header of one octet, and a fragmentation unit
1315	   payload.

1317	    0                   1                   2                   3
1318	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1319	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1320	   | FU indicator  |   FU header   |                               |
1321	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1322	   |                                                               |
1323	   |                         FU payload                            |
1324	   |                                                               |
1325	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1326	   |                               :...OPTIONAL RTP padding        |
1327	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1329	                  Figure 14  RTP payload format for FU-A

1331	   Figure 15 presents the RTP payload format for FU-Bs.  An FU-B
1332	   consists of a fragmentation unit indicator of one octet, a
1333	   fragmentation unit header of one octet, a decoding order number (DON)
1334	   (in network byte order), and a fragmentation unit payload.  In other
1335	   words, the structure of FU-B is the same as the structure of FU-A,
1336	   except for the additional DON field.

1338	    0                   1                   2                   3
1339	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1340	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1341	   | FU indicator  |   FU header   |               DON             |
1342	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1343	   |                                                               |
1344	   |                         FU payload                            |
1345	   |                                                               |
1346	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1347	   |                               :...OPTIONAL RTP padding        |
1348	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1350	                  Figure 15  RTP payload format for FU-B

1352	   NAL unit type FU-B MUST be used in the interleaved packetization mode
1353	   for the first fragmentation unit of a fragmented NAL unit.  NAL unit
1354	   type FU-B MUST NOT be used in any other case.  In other words, in the
1355	   interleaved packetization mode, each NALU that is fragmented has an
1356	   FU-B as the first fragment, followed by one or more FU-A fragments.

1358	   The FU indicator octet has the following format:

1360	      +---------------+
1361	      |0|1|2|3|4|5|6|7|
1362	      +-+-+-+-+-+-+-+-+
1363	      |F|NRI|  Type   |
1364	      +---------------+

1366	   Values equal to 28 and 29 in the Type field of the FU indicator octet
1367	   identify an FU-A and an FU-B, respectively.  The use of the F bit is
1368	   described in section 5.3.  The value of the NRI field MUST be set
1369	   according to the value of the NRI field in the fragmented NAL unit.

1371	   The FU header has the following format:

1373	      +---------------+
1374	      |0|1|2|3|4|5|6|7|
1375	      +-+-+-+-+-+-+-+-+
1376	      |S|E|R|  Type   |
1377	      +---------------+

1379	   S: 1 bit
1380	      When set to one, the Start bit indicates the start of a
1381	      fragmented NAL unit.  When the following FU payload is not the
1382	      start of a fragmented NAL unit payload, the Start bit is set to
1383	      zero.

1385	   E: 1 bit
1386	      When set to one, the End bit indicates the end of a fragmented
1387	      NAL unit, i.e., the last byte of the payload is also the last
1388	      byte of the fragmented NAL unit.  When the following FU payload
1389	      is not the last fragment of a fragmented NAL unit, the End bit is
1390	      set to zero.

1392	   R: 1 bit
1393	      The Reserved bit MUST be equal to 0 and MUST be ignored by the
1394	      receiver.

1396	   Type: 5 bits
1397	      The NAL unit payload type as defined in Table 7-1 of [1].

1399	   The value of DON in FU-Bs is selected as described in section 5.5.

1401	      Informative note: The DON field in FU-Bs allows gateways to
1402	      fragment NAL units to FU-Bs without organizing the incoming NAL
1403	      units to the NAL unit decoding order.

1405	   A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the
1406	   Start bit and End bit MUST NOT both be set to one in the same FU
1407	   header.

1409	   The FU payload consists of fragments of the payload of the fragmented
1410	   NAL unit so that if the fragmentation unit payloads of consecutive
1411	   FUs are sequentially concatenated, the payload of the fragmented NAL
1412	   unit can be reconstructed.  The NAL unit type octet of the fragmented
1413	   NAL unit is not included as such in the fragmentation unit payload,
1414	   but rather the information of the NAL unit type octet of the
1415	   fragmented NAL unit is conveyed in F and NRI fields of the FU
1416	   indicator octet of the fragmentation unit and in the type field of
1417	   the FU header.  An FU payload MAY have any number of octets and MAY
1418	   be empty.

1420	      Informative note: Empty FUs are allowed to reduce the latency of
1421	      a certain class of senders in nearly lossless environments.
1422	      These senders can be characterized in that they packetize NALU
1423	      fragments before the NALU is completely generated and, hence,
1424	      before the NALU size is known.  If zero-length NALU fragments
1425	      were not allowed, the sender would have to generate at least one
1426	      bit of data of the following fragment before the current fragment
1427	      could be sent.  Due to the characteristics of H.264, where
1428	      sometimes several macroblocks occupy zero bits, this is
1429	      undesirable and can add delay.  However, the (potential) use of
1430	      zero-length NALU fragments should be carefully weighed against
1431	      the increased risk of the loss of at least a part of the NALU
1432	      because of the additional packets employed for its transmission.
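
      Informative note: The following Python sketch illustrates how a
      sender might build FU-A payloads from one NAL unit, using the FU
      indicator and FU header layouts given above.  It is not a
      normative algorithm.  The name fragment_fu_a and the max_payload
      parameter (the largest RTP payload the sender wants to emit, in
      octets) are hypothetical.  The sketch refuses NAL units that
      would fit in a single FU, since the Start and End bits must not
      both be set in one FU header; such NAL units would be sent in
      another packet type.

```python
FU_A = 28  # Type value of the FU indicator for an FU-A


def fragment_fu_a(nal: bytes, max_payload: int):
    """Split one NAL unit into a list of FU-A RTP payloads."""
    if not nal:
        raise ValueError("empty NAL unit")
    if max_payload < 3:
        raise ValueError("need room for two FU octets and one data octet")

    # F and NRI are copied from the NAL unit header into the FU indicator;
    # the original NAL unit type travels in the FU header.
    fu_indicator = (nal[0] & 0xE0) | FU_A
    nal_type = nal[0] & 0x1F

    body = nal[1:]  # the NAL unit header octet itself is not sent
    step = max_payload - 2
    chunks = [body[i:i + step] for i in range(0, len(body), step)]
    if len(chunks) < 2:
        raise ValueError("NAL unit fits in one packet; do not fragment")

    payloads = []
    for i, chunk in enumerate(chunks):
        start = 0x80 if i == 0 else 0
        end = 0x40 if i == len(chunks) - 1 else 0
        fu_header = start | end | nal_type  # R bit stays 0
        payloads.append(bytes([fu_indicator, fu_header]) + chunk)
    return payloads
```

      Concatenating the octets after the first two of each returned
      payload, in order, gives back the NAL unit payload.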

1434	   If a fragmentation unit is lost, the receiver SHOULD discard all
1435	   following fragmentation units in transmission order corresponding to
1436	   the same fragmented NAL unit.

1438	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1439	   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
1440	   n of that NAL unit is not received.  In this case, the
1441	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1442	   syntax violation.
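
      Informative note: The loss handling described above might be
      sketched in Python as follows.  This is an illustration under
      stated assumptions, not a required receiver design.  The name
      reassemble_fu_a is hypothetical, and the caller is assumed to
      pass the FU-A payloads of one fragmented NAL unit in RTP sequence
      number order, with None standing in for a lost packet.

```python
def reassemble_fu_a(fragments):
    """Rebuild a NAL unit from FU-A payloads; None marks a lost packet.

    Returns None when the first fragment is missing.  When a later
    fragment is missing, the fragments before the loss are aggregated
    and forbidden_zero_bit is set to one.
    """
    if not fragments or fragments[0] is None:
        return None
    first = fragments[0]
    if len(first) < 2 or not (first[1] & 0x80):
        return None  # first payload does not carry the Start bit

    # NAL unit header: F and NRI from the FU indicator, Type from FU header.
    header = (first[0] & 0xE0) | (first[1] & 0x1F)
    body = bytearray()
    complete = False
    for frag in fragments:
        if frag is None:
            break  # discard all following fragments of this NAL unit
        body += frag[2:]
        if frag[1] & 0x40:  # End bit
            complete = True
            break
    if not complete:
        header |= 0x80  # forbidden_zero_bit signals the syntax violation
    return bytes([header]) + bytes(body)
```

      A receiver that prefers to drop incomplete NAL units can discard
      any result whose first octet has the most significant bit set.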

1444	6. Packetization Rules

1446	   The packetization modes are introduced in section 5.2.  The
1447	   packetization rules common to more than one of the packetization
1448	   modes are specified in section 6.1.  The packetization rules for the
1449	   single NAL unit mode, the non-interleaved mode, and the interleaved
1450	   mode are specified in sections 6.2, 6.3, and 6.4, respectively.

1452	6.1. Common Packetization Rules

1454	   All senders MUST enforce the following packetization rules regardless
1455	   of the packetization mode in use:

1457	   o  Coded slice NAL units or coded slice data partition NAL units
1458	      belonging to the same coded picture (and thus sharing the same RTP
1459	      timestamp value) MAY be sent in any order; however, for delay-
1460	      critical systems, they SHOULD be sent in their original decoding
1461	      order to minimize the delay.  Note that the decoding order is the
1462	      order of the NAL units in the bitstream.

1464	   o  Parameter sets are handled in accordance with the rules and
1465	      recommendations given in section 8.4.

1467	   o  MANEs MUST NOT duplicate any NAL unit except for sequence or
1468	      picture parameter set NAL units, as neither this memo nor the
1469	      H.264 specification provides means to identify duplicated NAL
1470	      units.  Sequence and picture parameter set NAL units MAY be
1471	      duplicated to make their correct reception more probable, but any
1472	      such duplication MUST NOT affect the contents of any active
1473	      sequence or picture parameter set.  Duplication SHOULD be
1474	      performed on the application layer and not by duplicating RTP
1475	      packets (with identical sequence numbers).

1477	   Senders using the non-interleaved mode and the interleaved mode MUST
1478	   enforce the following packetization rule:

1480	   o  MANEs MAY convert single NAL unit packets into one aggregation
1481	      packet, convert an aggregation packet into several single NAL unit
1482	      packets, or mix both concepts, in an RTP translator.  The RTP
1483	      translator SHOULD take into account at least the following
1484	      parameters: path MTU size, unequal protection mechanisms (e.g.,
1485	      through packet-based FEC according to RFC 2733 [18], especially
1486	      for sequence and picture parameter set NAL units and coded slice
1487	      data partition A NAL units), bearable latency of the system, and
1488	      buffering capabilities of the receiver.

1490	         Informative note: An RTP translator is required to handle RTCP
1491	         as per RFC 3550.

1493	6.2. Single NAL Unit Mode

1495	   This mode is in use when the value of the OPTIONAL packetization-mode
1496	   media type parameter is equal to 0 or when the parameter is not
1497	   present.  All receivers MUST support this mode.  It is primarily
1498	   intended for low-delay applications that are compatible with systems
1499	   using ITU-T Recommendation H.241 [3] (see section 12.1).  Only single
1500	   NAL unit packets MAY be used in this mode.  STAPs, MTAPs, and FUs
1501	   MUST NOT be used.  The transmission order of single NAL unit packets
1502	   MUST comply with the NAL unit decoding order.

1504	6.3. Non-Interleaved Mode

1506	   This mode is in use when the value of the OPTIONAL packetization-mode
1507	   media type parameter is equal to 1.  This mode SHOULD be supported.
1508	   It is primarily intended for low-delay applications.  Only single NAL
1509	   unit packets, STAP-As, and FU-As MAY be used in this mode.  STAP-Bs,
1510	   MTAPs, and FU-Bs MUST NOT be used.  The transmission order of NAL
1511	   units MUST comply with the NAL unit decoding order.

1513	6.4. Interleaved Mode

1515	   This mode is in use when the value of the OPTIONAL packetization-mode
1516	   media type parameter is equal to 2.  Some receivers MAY support this
1517	   mode.  STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used.  STAP-As and
1518	   single NAL unit packets MUST NOT be used.  The transmission order of
1519	   packets and NAL units is constrained as specified in section 5.5.
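
      Informative note: The packet types permitted in sections 6.2,
      6.3, and 6.4 can be summarized as a lookup table, sketched here
      in Python.  The names ALLOWED_TYPES and packet_type_allowed are
      hypothetical.  Apart from 28 (FU-A) and 29 (FU-B), the type
      numbers (1-23 for single NAL unit packets, 24 STAP-A, 25 STAP-B,
      26 MTAP16, 27 MTAP24) are taken from the packet type assignments
      elsewhere in this memo and should be checked against them.

```python
SINGLE_NAL = set(range(1, 24))  # single NAL unit packets, types 1-23

# Key: value of the packetization-mode media type parameter.
ALLOWED_TYPES = {
    0: SINGLE_NAL,                   # single NAL unit mode
    1: SINGLE_NAL | {24, 28},        # plus STAP-A and FU-A
    2: {25, 26, 27, 28, 29},         # STAP-B, MTAP16, MTAP24, FU-A, FU-B
}


def packet_type_allowed(packetization_mode: int, payload: bytes) -> bool:
    """True if the payload's packet type may be used in the given mode."""
    if not payload:
        return False
    nal_type = payload[0] & 0x1F  # Type field of the first payload octet
    return nal_type in ALLOWED_TYPES[packetization_mode]
```

      A receiver could use such a check to detect packets that violate
      the negotiated packetization mode.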

1521	7. De-Packetization Process

1523	   The de-packetization process is implementation dependent.  Therefore,
1524	   the following description should be seen as an example of a suitable
1525	   implementation.  Other schemes may be used as well, as long as the
1526	   output for the same input is the same as the process described below.
1527	   The same output means that the resulting NAL units, and their order,
1528	   are identical.  Optimizations relative to the described algorithms
1529	   are likely possible.  Section 7.1 presents the de-packetization
1530	   process for the single NAL unit and non-interleaved packetization
1531	   modes, whereas section 7.2 describes the process for the interleaved
1532	   mode.  Section 7.3 includes additional de-packetization guidelines
1533	   for intelligent receivers.

1535	   All normal RTP mechanisms related to buffer management apply.  In
1536	   particular, duplicated or outdated RTP packets (as indicated by the
1537	   RTP sequence number and the RTP timestamp) are removed.  To determine
1538	   the exact time for decoding, factors such as a possible intentional
1539	   delay to allow for proper inter-stream synchronization must be
1540	   taken into account.

1542	7.1. Single NAL Unit and Non-Interleaved Mode

1544	   The receiver includes a receiver buffer to compensate for
1545	   transmission delay jitter.  The receiver stores incoming packets in
1546	   reception order into the receiver buffer.  Packets are de-packetized
1547	   in RTP sequence number order.  If a de-packetized packet is a single
1548	   NAL unit packet, the NAL unit contained in the packet is passed
1549	   directly to the decoder.  If a de-packetized packet is an STAP-A, the
1550	   NAL units contained in the packet are passed to the decoder in the
1551	   order in which they are encapsulated in the packet.  For all the FU-A
1552	   packets containing fragments of a single NAL unit, the de-packetized
1553	   fragments are concatenated in their sending order to recover the NAL
1554	   unit, which is then passed to the decoder.

1556	      Informative note: If the decoder supports Arbitrary Slice Order,
1557	      coded slices of a picture can be passed to the decoder in any
1558	      order regardless of their reception and transmission order.
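
      Informative note: The process described in this section might be
      sketched in Python as follows.  The sketch is one possible
      arrangement, not the required one.  The name depacketize is
      hypothetical, and the input is assumed to be a complete set of
      (sequence number, payload) pairs with no losses and with sequence
      numbers already extended so that they do not wrap around.  The
      type values 1-23 and 24 are taken from the packet type
      assignments elsewhere in this memo.

```python
import struct


def depacketize(packets):
    """Return NAL units in the order they would be passed to the decoder.

    packets: iterable of (sequence_number, rtp_payload) in reception order.
    """
    nal_units = []
    fu_buf = None  # NAL unit currently being rebuilt from FU-As
    for _seq, p in sorted(packets, key=lambda item: item[0]):
        nal_type = p[0] & 0x1F
        if 1 <= nal_type <= 23:          # single NAL unit packet
            nal_units.append(p)
        elif nal_type == 24:             # STAP-A: keep encapsulation order
            pos = 1
            while pos + 2 <= len(p):
                (size,) = struct.unpack_from("!H", p, pos)
                nal_units.append(p[pos + 2:pos + 2 + size])
                pos += 2 + size
        elif nal_type == 28:             # FU-A
            if p[1] & 0x80:              # Start bit: rebuild the header
                fu_buf = bytearray([(p[0] & 0xE0) | (p[1] & 0x1F)])
            if fu_buf is not None:
                fu_buf += p[2:]          # concatenate in sending order
                if p[1] & 0x40:          # End bit: NAL unit is complete
                    nal_units.append(bytes(fu_buf))
                    fu_buf = None
    return nal_units
```

      Packets that arrive out of order are sorted by sequence number
      before their NAL units are extracted.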

1560	7.2. Interleaved Mode

1562	   The general concept behind these de-packetization rules is to reorder
1563	   NAL units from transmission order to the NAL unit decoding order.

1565	   The receiver includes a receiver buffer, which is used to compensate
1566	   for transmission delay jitter and to reorder NAL units from
1567	   transmission order to the NAL unit decoding order.  In this section,
1568	   the receiver operation is described under the assumption that there
1569	   is no transmission delay jitter.  To distinguish it from a
1570	   practical receiver buffer that is also used for compensation of
1571	   transmission delay jitter, the receiver buffer is hereafter called
1572	   the de-interleaving buffer in this section.  Receivers SHOULD also
1573	   prepare for transmission delay jitter; i.e., either reserve separate
1574	   buffers for transmission delay jitter buffering and de-interleaving
1575	   buffering or use a receiver buffer for both transmission delay jitter
1576	   and de-interleaving.  Moreover, receivers SHOULD take transmission
1577	   delay jitter into account in the buffering operation; e.g., by
1578	   additional initial buffering before starting decoding and playback.

1580	   This section is organized as follows: subsection 7.2.1 presents how
1581	   to calculate the size of the de-interleaving buffer.  Subsection
1582	   7.2.2 specifies the receiver process for organizing received NAL
1583	   units into the NAL unit decoding order.

1585	7.2.1. Size of the De-interleaving Buffer

1587	   In either Offer/Answer or declarative SDP usage, the sprop-deint-buf-
1588	   req media type parameter signals the requirement for the de-
1589	   interleaving buffer size.  It is therefore RECOMMENDED to set the de-
1590	   interleaving buffer size, in bytes, equal to or greater than the
1591	   value of the sprop-deint-buf-req media type parameter.

1593	   When the SDP Offer/Answer model or any other capability exchange
1594	   procedure is used in session setup, the properties of the received
1595	   stream SHOULD be such that the receiver capabilities are not
1596	   exceeded.  In the SDP Offer/Answer model, the receiver can indicate
1597	   its capabilities to allocate a de-interleaving buffer with the
1598	   deint-buf-cap media type parameter.  See section 8.1 for further
1599	   information on the deint-buf-cap and sprop-deint-buf-req media type
1600	   parameters and section 8.2.2 for further information on their use
1601	   in the SDP Offer/Answer model.

1603	7.2.2. De-interleaving Process

1605	   There are two buffering states in the receiver: initial buffering and
1606	   buffering while playing.  Initial buffering occurs when the RTP
1607	   session is initialized.  After initial buffering, decoding and
1608	   playback are started, and the buffering-while-playing mode is used.

1610	   Regardless of the buffering state, the receiver stores incoming NAL
1611	   units, in reception order, in the de-interleaving buffer as follows.
1612	   NAL units of aggregation packets are stored in the de-interleaving
1613	   buffer individually.  The value of DON is calculated and stored for
1614	   each NAL unit.

1616	   The receiver operation is described below with the help of the
1617	   following functions and constants:

1619	   o  Function AbsDON is specified in section 8.1.

1621	   o  Function don_diff is specified in section 5.5.

1623	   o  Constant N is the value of the OPTIONAL sprop-interleaving-depth
1624	      media type parameter (see section 8.1) incremented by 1.

1626	   Initial buffering lasts until one of the following conditions is
1627	   fulfilled:

1629	   o  There are N or more VCL NAL units in the de-interleaving buffer.

1631	   o  If sprop-max-don-diff is present, don_diff(m,n) is greater than
1632	      the value of sprop-max-don-diff, in which n corresponds to the NAL
1633	      unit having the greatest value of AbsDON among the received NAL
1634	      units and m corresponds to the NAL unit having the smallest value
1635	      of AbsDON among the received NAL units.

1637	   o  Initial buffering has lasted for the duration equal to or greater
1638	      than the value of the OPTIONAL sprop-init-buf-time media type
1639	      parameter.

1641	   The NAL units to be removed from the de-interleaving buffer are
1642	   determined as follows:

1644	   o  If the de-interleaving buffer contains at least N VCL NAL units,
1645	      NAL units are removed from the de-interleaving buffer and passed
1646	      to the decoder in the order specified below until the buffer
1647	      contains N-1 VCL NAL units.

1649	   o  If sprop-max-don-diff is present, all NAL units m for which
1650	      don_diff(m,n) is greater than sprop-max-don-diff are removed from
1651	      the de-interleaving buffer and passed to the decoder in the order
1652	      specified below.  Herein, n corresponds to the NAL unit having the
1653	      greatest value of AbsDON among the NAL units in the de-
1654	      interleaving buffer.

1656	   The order in which NAL units are passed to the decoder is specified
1657	   as follows:

1659	   o  Let PDON be a variable that is initialized to 0 at the beginning
1660	      of the RTP session.

1662	   o  For each NAL unit associated with a value of DON, a DON distance
1663	      is calculated as follows.  If the value of DON of the NAL unit is
1664	      larger than the value of PDON, the DON distance is equal to DON -
1665	      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
1666	      + 1.

1668	   o  NAL units are delivered to the decoder in ascending order of DON
1669	      distance.  If several NAL units share the same value of DON
1670	      distance, they can be passed to the decoder in any order.

1672	   o  When a desired number of NAL units have been passed to the decoder,
1673	      the value of PDON is set to the value of DON for the last NAL unit
1674	      passed to the decoder.
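
	   As an informal illustration (not part of the specification), the
	   DON distance rule and the resulting delivery order can be sketched
	   in Python.  The names don_distance and delivery_order are
	   hypothetical, and each NAL unit is represented only by its DON
	   value.

```python
def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON (DON wraps at 65535)."""
    if don > pdon:
        return don - pdon
    return 65535 - pdon + don + 1


def delivery_order(dons, pdon=0):
    """Return DON values in ascending order of DON distance from pdon.

    NAL units with equal DON distance may be delivered in any order;
    sorted() simply keeps their original relative order.
    """
    return sorted(dons, key=lambda don: don_distance(don, pdon))
```

	   With PDON equal to 65533, this sketch orders the DON values 3,
	   65534, 1, 0 as 65534, 0, 1, 3, which shows the wraparound at 65535.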

1676	7.3. Additional De-Packetization Guidelines

1678	   The following additional de-packetization rules may be used to
1679	   implement an operational H.264 de-packetizer:

1681	   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
1682	      coded slice data partitions A (DPAs).  If a lost DPA is detected,
1683	      after taking into account possible retransmission and FEC, a
1684	      gateway may decide not to send the corresponding coded slice data
1685	      partitions B and C, as their information is meaningless for H.264
1686	      decoders.  In this way a MANE can reduce network load by
1687	      discarding useless packets without parsing a complex bitstream.

1689	   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
1690	      FUs.  If a lost FU is found, a gateway may decide not to send the
1691	      following FUs of the same fragmented NAL unit, as their
1692	      information is meaningless for H.264 decoders.  In this way a MANE
1693	      can reduce network load by discarding useless packets without
1694	      parsing a complex bitstream.

1696	   o  Intelligent receivers having to discard packets or NALUs should
1697	      first discard all packets/NALUs in which the value of the NRI
1698	      field of the NAL unit type octet is equal to 0.  This will
1699	      minimize the impact on user experience and keep the reference
1700	      pictures intact.  If more packets have to be discarded, then
1701	      packets with a numerically lower NRI value should be discarded
1702	      before packets with a numerically higher NRI value.  However,
1703	      discarding any packet with an NRI greater than 0 very likely leads
1704	      to decoder drift and SHOULD be avoided.
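
	   The discard preference in the last bullet can be sketched as
	   follows.  This is only an illustration: it assumes the NAL unit
	   header layout defined earlier in this document (1-bit F, 2-bit NRI,
	   5-bit Type in the first octet), and the names nri and discard_order
	   are hypothetical.

```python
def nri(nal_header_octet: int) -> int:
    """Extract the 2-bit NRI field from the first octet of a NAL unit."""
    return (nal_header_octet >> 5) & 0x03


def discard_order(nal_units):
    """Sort NAL units so that the best discard candidates come first.

    Units with NRI equal to 0 are listed first, then units with
    numerically increasing NRI values.
    """
    return sorted(nal_units, key=lambda nalu: nri(nalu[0]))
```

	   A receiver following this sketch would stop discarding as early as
	   possible in the returned list, because dropping any unit with a
	   non-zero NRI risks decoder drift.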

1706	8. Payload Format Parameters

1708	   This section specifies the parameters that MAY be used to select
1709	   optional features of the payload format and certain features of the
1710	   bitstream.  The parameters are specified here as part of the media
1711	   subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec.  A
1712	   mapping of the parameters into the Session Description Protocol (SDP)
1713	   [6] is also provided for applications that use SDP.  Equivalent
1714	   parameters could be defined elsewhere for use with control protocols
1715	   that do not use SDP.

1717	   Some parameters provide a receiver with the properties of the stream
1718	   that will be sent.  The names of all these parameters start with
1719	   "sprop" for stream properties.  Some of these "sprop" parameters are
1720	   limited by other payload or codec configuration parameters.  For
1721	   example, the sprop-parameter-sets parameter is constrained by the
1722	   profile-level-id parameter.  The media sender selects all "sprop"
1723	   parameters rather than the receiver.  This uncommon characteristic of
1724	   the "sprop" parameters may not be compatible with some signaling
1725	   protocol concepts, in which case the use of these parameters SHOULD
1726	   be avoided.

1728	8.1. Media Type Registration

1730	   The media subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is
1731	   allocated from the IETF tree.

1733	   The receiver MUST ignore any unspecified parameter.

1735	   Media Type name:     video

1737	   Media subtype name:  H264

1739	   Required parameters: none

1741	   OPTIONAL parameters:

1743	      profile-level-id:
1744	         A base16 [7] (hexadecimal) representation of the following
1745	         three bytes in the sequence parameter set NAL unit specified
1746	         in [1]: 1) profile_idc, 2) a byte herein referred to as
1747	         profile-iop, composed of the values of constraint_set0_flag,
1748	         constraint_set1_flag, constraint_set2_flag,
1749	         constraint_set3_flag, and reserved_zero_4bits in bit-
1750	         significance order, starting from the most significant bit,
1751	         and 3) level_idc.  Note that reserved_zero_4bits is required
1752	         to be equal to 0 in [1], but other values for it may be
1753	         specified in the future by ITU-T or ISO/IEC.

1755	         The profile-level-id parameter indicates the default sub-
1756	         profile, i.e. the subset of coding tools that may have been
1757	         used to generate the stream or that the receiver supports, and
1758	         the default level of the stream or that the receiver supports.

1760	         The default sub-profile is indicated collectively by the
1761	         profile_idc byte and some fields in the profile-iop byte.
1762	         Depending on the values of the fields in the profile-iop byte,
1763	         the default sub-profile may be the set of coding tools
1764	         supported by one profile, or a common subset of coding tools
1765	         of multiple profiles, as specified in subsection 7.4.2.1.1 of
1766	         [1].  The default level is indicated by the level_idc byte,
1767	         and, when profile_idc is equal to 66, 77 or 88 (the Baseline,
1768	         Main, or Extended profile) and level_idc is equal to 11,
1769	         additionally by bit 4 (constraint_set3_flag) of the profile-
1770	         iop byte.  When profile_idc is equal to 66, 77 or 88 (the
1771	         Baseline, Main, or Extended profile) and level_idc is equal to
1772	         11, and bit 4 (constraint_set3_flag) of the profile-iop byte
1773	         is equal to 1, the default level is level 1b.

1775	         Table 5 lists all profiles defined in Annex A of [1] and, for
1776	         each of the profiles, the possible combinations of profile_idc
1777	         and profile-iop that represent the same sub-profile.

1779	            Table 5.  Combinations of profile_idc and profile-iop
1780	            representing the same sub-profile corresponding to the full
1781	            set of coding tools supported by one profile.  In the
1782	            following, x may be either 0 or 1, while the profile names
1783	            are indicated as follows. CB: Constrained Baseline profile,
1784	            B: Baseline profile, M: Main profile, E: Extended profile,
1785	            H: High profile, H10: High 10 profile, H42: High 4:2:2
1786	            profile, H44: High 4:4:4 Predictive profile, H10I: High 10
1787	            Intra profile, H42I: High 4:2:2 Intra profile, H44I: High
1788	            4:4:4 Intra profile, and C44I: CAVLC 4:4:4 Intra profile.

1790	              Profile     profile_idc             profile-iop
1791	                          (hexadecimal)           (binary)

1793	              CB          42 (B)                  x1xx0000
1794	                 same as: 4D (M)                  1xxx0000
1795	                 same as: 58 (E)                  11xx0000
1796	                 same as: 64 (H), 6E (H10),       1xx00000
1797	                          7A (H42), or F4 (H44)
1798	              B           42 (B)                  x0xx0000
1799	                 same as: 58 (E)                  10xx0000
1800	              M           4D (M)                  0x0x0000
1801	                 same as: 64 (H), 6E (H10),       01000000
1802	                          7A (H42), or F4 (H44)
1803	              E           58                      00xx0000
1804	              H           64                      00000000
1805	              H10         6E                      00000000
1806	              H42         7A                      00000000
1807	              H44         F4                      00000000
1808	              H10I        6E                      00010000
1809	              H42I        7A                      00010000
1810	              H44I        F4                      00010000
1811	              C44I        2C                      00010000

1813	         For example, in the table above, profile_idc equal to 58
1814	         (Extended) with profile-iop equal to 11xx0000 indicates the
1815	         same sub-profile corresponding to profile_idc equal to 42
1816	         (Baseline) with profile-iop equal to x1xx0000.  Note that
1817	         other combinations of profile_idc and profile-iop (not listed
1818	         in Table 5) may represent a sub-profile equivalent to the
1819	         common subset of coding tools for more than one profile.  Note
1820	         also that a decoder conforming to a certain profile may be
1821	         able to decode bitstreams conforming to other profiles.  For
1822	         example, a decoder conforming to the High 4:4:4 profile at a
1823	         certain level must be able to decode bitstreams conforming to
1824	         the Constrained Baseline, Main, High, High 10 or High 4:2:2
1825	         profile at the same or a lower level.

1827	         If the profile-level-id parameter is used to indicate
1828	         properties of a NAL unit stream, it indicates that, to decode
1829	         the stream, the minimum subset of coding tools a decoder has
1830	         to support is the default sub-profile, and the lowest level
1831	         the decoder has to support is the default level.

1833	         If the profile-level-id parameter is used for capability
1834	         exchange or session setup procedure, it indicates the subset
1835	         of coding tools, which is equal to the default sub-profile,
1836	         and the highest level, which is equal to the default level,
1837	         that the codec supports.  All levels lower than the default
1838	         level are also supported by the codec.

1840	            Informative note: Capability exchange and session setup
1841	            procedures should provide means to list the capabilities
1842	            for each supported sub-profile separately.  For example,
1843	            the one-of-N codec selection procedure of the SDP
1844	            Offer/Answer model can be used (section 10.2 of [8]).  The
1845	            one-of-N codec selection procedure may also be used to
1846	            provide different combinations of profile_idc and profile-
1847	            iop that represent the same sub-profile.  When there are
1848	            many different combinations of profile_idc and profile-iop
1849	            that represent the same sub-profile, using the one-of-N
1850	            codec selection procedure may result in a fairly large
1851	            SDP message.  Therefore, a receiver should understand the
1852	            different equivalent combinations of profile_idc and
1853	            profile-iop that represent the same sub-profile, and be
1854	            ready to accept an offer using any of the equivalent
1855	            combinations.

1857	         If no profile-level-id is present, the Baseline Profile
1858	         without additional constraints at Level 1 MUST be inferred.
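
	         As a non-normative sketch, the three bytes carried in
	         profile-level-id can be unpacked as shown below, including the
	         level 1b rule described above.  The function name and the
	         dictionary keys are hypothetical; the sketch does not map the
	         result onto the rows of Table 5.

```python
def parse_profile_level_id(value: str) -> dict:
    """Split a base16 profile-level-id into its three bytes."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must encode exactly three bytes")
    profile_idc, profile_iop, level_idc = raw

    # constraint_set0_flag is the most significant bit of profile-iop,
    # so constraint_set3_flag is bit 4.
    constraint_set3_flag = (profile_iop >> 4) & 1

    # Level 1b: Baseline, Main, or Extended profile, level_idc equal
    # to 11, and constraint_set3_flag equal to 1.
    is_level_1b = (
        profile_idc in (66, 77, 88)
        and level_idc == 11
        and constraint_set3_flag == 1
    )
    return {
        "profile_idc": profile_idc,
        "profile_iop": profile_iop,
        "level_idc": level_idc,
        "level_1b": is_level_1b,
    }
```

	         For instance, the value 42f00b yields profile_idc 66,
	         level_idc 11, and constraint_set3_flag 1, which this sketch
	         reports as level 1b.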

1860	      max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
1861	         These parameters MAY be used to signal the capabilities of a
1862	         receiver implementation. These parameters MUST NOT be used for
1863	         any other purpose.  The profile-level-id parameter MUST be
1864	         present in the same receiver capability description that
1865	         contains any of these parameters.  The level conveyed in the
1866	         value of the profile-level-id parameter MUST be one that the
1867	         receiver is fully capable of supporting.  max-mbps, max-smbps,
1868	         max-fs, max-cpb, max-dpb, and max-br MAY be used to indicate
1869	         capabilities of the receiver that extend the required
1870	         capabilities of the signaled level, as specified below.

1872	         When more than one parameter from the set (max-mbps, max-
1873	         smbps, max-fs, max-cpb, max-dpb, max-br) is present, the
1874	         receiver MUST support all signaled capabilities simultaneously.
1875	         For example, if both max-mbps and max-br are present, the
1876	         signaled level with the extension of both the frame rate and
1877	         bit rate is supported.  That is, the receiver is able to
1878	         decode NAL unit streams in which the macroblock processing
1879	         rate is up to max-mbps (inclusive), the bit rate is up to max-
1880	         br (inclusive), the coded picture buffer size is derived as
1881	         specified in the semantics of the max-br parameter below, and
1882	         other properties comply with the level specified in the value
1883	         of the profile-level-id parameter.

1885	         If a receiver can support all the properties of level A, the
1886	         level specified in the value of the profile-level-id MUST be
1887	         level A (i.e. MUST NOT be lower than level A).  In other words,
1888	         a sender or receiver MUST NOT signal values of max-mbps, max-
1889	         fs, max-cpb, max-dpb, and max-br that taken together meet the
1890	         requirements of a higher level compared to the level specified
1891	         in the value of the profile-level-id parameter.

1893	            Informative note: When the OPTIONAL media type parameters
1894	            are used to signal the properties of a NAL unit stream,
1895	            max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
1896	            are not present, and the value of profile-level-id must
1897	            always be such that the NAL unit stream complies fully with
1898	            the specified profile and level.

1900	      max-mbps: The value of max-mbps is an integer indicating the
1901	         maximum macroblock processing rate in units of macroblocks per
1902	         second.  The max-mbps parameter signals that the receiver is
1903	         capable of decoding video at a higher rate than is required by
1904	         the signaled level conveyed in the value of the profile-level-
1905	         id parameter.  When max-mbps is signaled, the receiver MUST be
1906	         able to decode NAL unit streams that conform to the signaled
1907	         level, with the exception that the MaxMBPS value in Table A-1
1908	         of [1] for the signaled level is replaced with the value of
1909	         max-mbps.  The value of max-mbps MUST be greater than or equal
1910	         to the value of MaxMBPS for the level given in Table A-1 of
1911	         [1].  Senders MAY use this knowledge to send pictures of a
1912	         given size at a higher picture rate than is indicated in the
1913	         signaled level.

1915	      max-smbps: The value of max-smbps is an integer indicating the
1916	         maximum static macroblock processing rate in units of static
1917	         macroblocks per second, under the hypothetical assumption that
1918	         all macroblocks are static macroblocks.  When max-smbps is
1919	         signaled, the MaxMBPS value in Table A-1 of [1] should be
1920	         replaced with the result of the following computation:

1922	         o If the parameter max-mbps is signaled, set a variable
1923	            MaxMacroblocksPerSecond to the value of max-mbps.
1924	            Otherwise, set MaxMacroblocksPerSecond equal to the value
1925	            of MaxMBPS for the level in Table A-1 of [1].

1927	         o Set a variable P_non-static to the proportion of non-static
1928	            macroblocks in picture n.

1930	         o Set a variable P_static to the proportion of static
1931	            macroblocks in picture n.

1933	         o The value of MaxMBPS in Table A-1 of [1] should be
1934	            considered by the encoder to be equal to:

1936	            MaxMacroblocksPerSecond * max-smbps / ( P_non-static * max-
1937	            smbps + P_static * MaxMacroblocksPerSecond)

1939	         The encoder should recompute this value for each picture. The
1940	         value of max-smbps MUST be greater than the value of MaxMBPS
1941	         for the level given in Table A-1 of [1].  Senders MAY use this
1942	         knowledge to send pictures of a given size at a higher picture
1943	         rate than is indicated in the signaled level.
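
	         The per-picture computation above can be sketched as follows.
	         This is an illustration only: the function name is
	         hypothetical, and the sketch assumes that P_static and
	         P_non-static are fractions of the same picture that sum to 1.

```python
def effective_max_mbps(max_macroblocks_per_second: float,
                       max_smbps: float,
                       p_static: float) -> float:
    """Value an encoder would use in place of MaxMBPS for one picture.

    max_macroblocks_per_second is max-mbps if signaled, otherwise the
    MaxMBPS of the signaled level.  p_static is the proportion of
    static macroblocks in the picture (assumed to lie in [0, 1]).
    """
    p_non_static = 1.0 - p_static  # assumption: the proportions sum to 1
    return (max_macroblocks_per_second * max_smbps
            / (p_non_static * max_smbps
               + p_static * max_macroblocks_per_second))
```

	         With no static macroblocks the result equals
	         MaxMacroblocksPerSecond, and with only static macroblocks it
	         equals max-smbps; intermediate pictures fall between the two.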

1945	      max-fs: The value of max-fs is an integer indicating the maximum
1946	         frame size in units of macroblocks.  The max-fs parameter
1947	         signals that the receiver is capable of decoding larger
1948	         picture sizes than are required by the signaled level conveyed
1949	         in the value of the profile-level-id parameter.  When max-fs
1950	         is signaled, the receiver MUST be able to decode NAL unit
1951	         streams that conform to the signaled level, with the exception
1952	         that the MaxFS value in Table A-1 of [1] for the signaled
1953	         level is replaced with the value of max-fs.  The value of max-
1954	         fs MUST be greater than or equal to the value of MaxFS for the
1955	         level given in Table A-1 of [1].  Senders MAY use this
1956	         knowledge to send larger pictures at a proportionally lower
1957	         frame rate than is indicated in the signaled level.

1959	      max-cpb: The value of max-cpb is an integer indicating the
1960	         maximum coded picture buffer size in units of 1000 bits for
1961	         the VCL HRD parameters (see A.3.1 item i of [1]) and in units
1962	         of 1200 bits for the NAL HRD parameters (see A.3.1 item j of
1963	         [1]).  The max-cpb parameter signals that the receiver has
1964	         more memory than the minimum amount of coded picture buffer
1965	         memory required by the signaled level conveyed in the value of
1966	         the profile-level-id parameter.  When max-cpb is signaled, the
1967	         receiver MUST be able to decode NAL unit streams that conform
1968	         to the signaled level, with the exception that the MaxCPB
1969	         value in Table A-1 of [1] for the signaled level is replaced
1970	         with the value of max-cpb.  The value of max-cpb MUST be
1971	         greater than or equal to the value of MaxCPB for the level
1972	         given in Table A-1 of [1].  Senders MAY use this knowledge to
1973	         construct coded video streams with greater variation of bit
1974	         rate than can be achieved with the MaxCPB value in Table A-1
1975	         of [1].

1977	            Informative note: The coded picture buffer is used in the
1978	            hypothetical reference decoder (Annex C) of H.264.  The use
1979	            of the hypothetical reference decoder is recommended in
1980	            H.264 encoders to verify that the produced bitstream
1981	            conforms to the standard and to control the output bitrate.
1982	            Thus, the coded picture buffer is conceptually independent
1983	            of any other potential buffers in the receiver, including
1984	            de-interleaving and de-jitter buffers.  The coded picture
1985	            buffer need not be implemented in decoders as specified in
1986	            Annex C of H.264, but rather standard-compliant decoders
1987	            can have any buffering arrangements provided that they can
1988	            decode standard-compliant bitstreams.  Thus, in practice,
1989	            the input buffer for the video decoder can be integrated
1990	            with de-interleaving and de-jitter buffers of the receiver.

1992	      max-dpb: The value of max-dpb is an integer indicating the
1993	         maximum decoded picture buffer size in units of 1024 bytes.
1994	         The max-dpb parameter signals that the receiver has more
1995	         memory than the minimum amount of decoded picture buffer
1996	         memory required by the signaled level conveyed in the value of
1997	         the profile-level-id parameter.  When max-dpb is signaled, the
1998	         receiver MUST be able to decode NAL unit streams that conform
1999	         to the signaled level, with the exception that the MaxDPB
2000	         value in Table A-1 of [1] for the signaled level is replaced
2001	         with the value of max-dpb.  Consequently, a receiver that
2002	         signals max-dpb MUST be capable of storing the following
2003	         number of decoded frames, complementary field pairs, and non-
2004	         paired fields in its decoded picture buffer:

2006	            Min(1024 * max-dpb / ( PicWidthInMbs * FrameHeightInMbs *
2007	            256 * ChromaFormatFactor ), 16)

2009	         PicWidthInMbs, FrameHeightInMbs, and ChromaFormatFactor are
2010	         defined in [1].
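
	         The formula above can be evaluated as in the following
	         sketch.  The function name is hypothetical, rounding down to
	         a whole number of frames is an assumption of this sketch, and
	         the value of ChromaFormatFactor has to be taken from [1].

```python
def max_dpb_frames(max_dpb: int,
                   pic_width_in_mbs: int,
                   frame_height_in_mbs: int,
                   chroma_format_factor: float) -> int:
    """Number of frames a receiver signaling max-dpb must be able to store.

    max_dpb is in units of 1024 bytes.  The result is rounded down to
    a whole frame (an assumption of this sketch) and capped at 16.
    """
    dpb_bytes = 1024 * max_dpb
    frame_bytes = (pic_width_in_mbs * frame_height_in_mbs
                   * 256 * chroma_format_factor)
    return min(int(dpb_bytes // frame_bytes), 16)
```

	         For a picture of 22 x 18 macroblocks and an assumed
	         ChromaFormatFactor of 1.5, max-dpb equal to 1000 gives 6
	         frames, while max-dpb equal to 3000 is capped at 16 frames.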

2012	         The value of max-dpb MUST be greater than or equal to the
2013	         value of MaxDPB for the level given in Table A-1 of [1].
2014	         Senders MAY use this knowledge to construct coded video
2015	         streams with improved compression.

2017	            Informative note: This parameter was added primarily to
2018	            complement a similar codepoint in the ITU-T Recommendation
2019	            H.245, so as to facilitate signaling gateway designs.  The
2020	            decoded picture buffer stores reconstructed samples.  There
2021	            is no relationship between the size of the decoded picture
2022	            buffer and the buffers used in RTP, especially de-
2023	            interleaving and de-jitter buffers.

2025	      max-br: The value of max-br is an integer indicating the maximum
2026	         video bit rate in units of 1000 bits per second for the VCL
2027	         HRD parameters (see A.3.1 item i of [1]) and in units of 1200
2028	         bits per second for the NAL HRD parameters (see A.3.1 item j
2029	         of [1]).

2031	         The max-br parameter signals that the video decoder of the
2032	         receiver is capable of decoding video at a higher bit rate
2033	         than is required by the signaled level conveyed in the value
2034	         of the profile-level-id parameter.

2036	         When max-br is signaled, the video codec of the receiver MUST
2037	         be able to decode NAL unit streams that conform to the
2038	         signaled level, conveyed in the profile-level-id parameter,
2039	         with the following exceptions in the limits specified by the
2040	         level:

2042	         o The value of max-br replaces the MaxBR value of the signaled
2043	            level (in Table A-1 of [1]).

2045	         o When the max-cpb parameter is not present, the result of the
2046	            following formula replaces the value of MaxCPB in Table A-1
2047	            of [1]: (MaxCPB of the signaled level) * max-br / (MaxBR of
2048	            the signaled level).

2050	         For example, if a receiver signals capability for Level 1.2
2051	         with max-br equal to 1550, this indicates a maximum video
2052	         bitrate of 1550 kbits/sec for VCL HRD parameters, a maximum
2053	         video bitrate of 1860 kbits/sec for NAL HRD parameters, and a
2054	         CPB size of 4036458 bits (1550000 / 384000 * 1000 * 1000).
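
	         The arithmetic of this example can be reproduced with the
	         sketch below.  The function name is hypothetical; the values
	         384 and 1000 are the MaxBR and MaxCPB figures used in the
	         example above, and the CPB size is reported in bits for the
	         VCL HRD parameters.

```python
def max_br_limits(max_br: int, level_max_br: int, level_max_cpb: int,
                  max_cpb=None):
    """Return (VCL bit rate, NAL bit rate, VCL CPB size) in bits.

    max_br, level_max_br: units of 1000 bits/s (VCL) or 1200 bits/s (NAL).
    level_max_cpb, max_cpb: units of 1000 bits (VCL).
    """
    vcl_bits_per_second = max_br * 1000
    nal_bits_per_second = max_br * 1200
    if max_cpb is None:
        # MaxCPB is scaled by max-br / MaxBR when max-cpb is absent.
        cpb_bits = level_max_cpb * max_br * 1000 // level_max_br
    else:
        cpb_bits = max_cpb * 1000
    return vcl_bits_per_second, nal_bits_per_second, cpb_bits


print(max_br_limits(1550, 384, 1000))  # (1550000, 1860000, 4036458)
```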

2056	         The value of max-br MUST be greater than or equal to the value
2057	         MaxBR for the signaled level given in Table A-1 of [1].

2059	         Senders MAY use this knowledge to send higher bitrate video as
2060	         allowed in the level definition of Annex A of H.264, to
2061	         achieve improved video quality.

2063	            Informative note: This parameter was added primarily to
2064	            complement a similar codepoint in the ITU-T Recommendation
2065	            H.245, so as to facilitate signaling gateway designs.  No
2066	            assumption can be made from the value of this parameter
2067	            that the network is capable of handling such bit rates at
2068	            any given time.  In particular, no conclusion can be drawn
2069	            that the signaled bit rate is possible under congestion
2070	            control constraints.

2072	      redundant-pic-cap:
2073	         This parameter signals the capabilities of a receiver
2074	         implementation.  When equal to 0, the parameter indicates that
2075	         the receiver makes no attempt to use redundant coded pictures
2076	         to correct incorrectly decoded primary coded pictures.  When
2077	         equal to 0, the receiver is not capable of using redundant
2078	         slices; therefore, a sender SHOULD avoid sending redundant
2079	         slices to save bandwidth.  When equal to 1, the receiver is
2080	         capable of decoding any such redundant slice that covers a
2081	         corrupted area in a primary decoded picture (at least partly),
2082	         and therefore a sender MAY send redundant slices.  When the
2083	         parameter is not present, then a value of 0 MUST be used for
2084	         redundant-pic-cap.  When present, the value of redundant-pic-
2085	         cap MUST be either 0 or 1.

2087	         When the profile-level-id parameter is present in the same
2088	         signaling as the redundant-pic-cap parameter, and the profile
2089	         indicated in profile-level-id is such that it disallows the
2090	         use of redundant coded pictures (e.g., Main Profile), the
2091	         value of redundant-pic-cap MUST be equal to 0.  When a
2092	         receiver indicates redundant-pic-cap equal to 0, the received
2093	         stream SHOULD NOT contain redundant coded pictures.

2095	            Informative note: Even if redundant-pic-cap is equal to 0,
2096	            the decoder is able to ignore redundant coded pictures
2097	            provided that the decoder supports such a profile (Baseline,
2098	            Extended) in which redundant coded pictures are allowed.

2100	            Informative note: Even if redundant-pic-cap is equal to 1,
2101	            the receiver may also choose other error concealment
2102	            strategies to replace or complement decoding of redundant
2103	            slices.

2105	      sprop-parameter-sets:
2106	         This parameter MAY be used to convey any sequence and picture
2107	         parameter set NAL units (herein referred to as the initial
2108	         parameter set NAL units) that can be placed in the NAL unit
2109	         stream to precede any other NAL units in decoding order.  The
2110	         parameter MUST NOT be used to indicate codec capability in any
2111	         capability exchange procedure.  The value of the parameter is
2112	         a comma (',') separated list of base64 [7] representations of
2113	         parameter set NAL units as specified in sections 7.3.2.1 and
2114	         7.3.2.2 of [1].  Note that the number of bytes in a parameter
2115	         set NAL unit is typically less than 10, but a picture
2116	         parameter set NAL unit can contain several hundred bytes.

2118	            Informative note: When several payload types are offered in
2119	            the SDP Offer/Answer model, each with its own sprop-
2120	            parameter-sets parameter, then the receiver cannot assume
2121	            that those parameter sets do not use conflicting storage
2122	            locations (i.e., identical values of parameter set
2123	            identifiers).  Therefore, a receiver should buffer all
2124	            sprop-parameter-sets and make them available to the decoder
2125	            instance that decodes a certain payload type.

2127	         The "sprop-parameter-sets" parameter MUST only contain
2128	         parameter sets that are conforming to the profile-level-id,
2129	         i.e., the subset of coding tools indicated by any of the
2130	         parameter sets MUST be equal to the default sub-profile, and
2131	         the level indicated by any of the parameter sets MUST be equal
2132	         to the default level.

2134	      sprop-level-parameter-sets:
2135	         This parameter MAY be used to convey any sequence and picture
2136	         parameter set NAL units (herein referred to as the initial
2137	         parameter set NAL units) that can be placed in the NAL unit
2138	         stream to precede any other NAL units in decoding order and
2139	         that are associated with one or more levels lower than the
2140	         default level.  The parameter MUST NOT be used to indicate
2141	         codec capability in any capability exchange procedure.

2143	         The sprop-level-parameter-sets parameter contains parameter
2144	         sets for one or more levels which are lower than the default
2145	         level.  All parameter sets associated with one level are
2146	         clustered and prefixed with a three-byte field which has the
2147	         same syntax as profile-level-id.  This enables the receiver to
2148	         install the parameter sets for one level and discard the rest.
2149	         The three-byte field is named PLId, and all parameter sets
2150	         associated with one level are named PSL, which has the same
2151	         syntax as sprop-parameter-sets.  Parameter sets for each level
2152	         are represented in the form of PLId:PSL, i.e., PLId followed
2153	         by a colon (':') and the base64 [7] representation of the
2154	         initial parameter set NAL units for the level.  Each pair of
2155	         PLId:PSL is also separated by a colon.  Note that a PSL can
2156	         contain multiple parameter sets for that level, separated with
2157	         commas (',').
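
	         A receiver-side parser for the PLId:PSL syntax described
	         above might look like the following sketch.  The function
	         name is hypothetical, and the sketch assumes that PLId is
	         written in base16 like profile-level-id and that the colon
	         never appears inside a base64 string.

```python
import base64


def parse_sprop_level_parameter_sets(value: str):
    """Split the value into (PLId, [parameter set NAL units]) pairs."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating PLId and PSL fields")
    result = []
    for plid, psl in zip(fields[0::2], fields[1::2]):
        if len(bytes.fromhex(plid)) != 3:
            raise ValueError("PLId must encode exactly three bytes")
        # A PSL is a comma-separated list of base64 parameter sets.
        nal_units = [base64.b64decode(item) for item in psl.split(",")]
        result.append((plid.lower(), nal_units))
    return result
```

	         The receiver can then pick the pair whose PLId matches the
	         level it accepted and discard the rest.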

2159	         The subset of coding tools indicated by each PLId field MUST
2160	         be equal to the default sub-profile, and the level indicated
2161	         by each PLId field MUST be lower than the default level.  All
2162	         sequence parameter sets contained in each PSL MUST have the
2163	         three bytes from profile_idc to level_idc, inclusive, equal to
2164	         the preceding PLId.

2166	            Informative note: This parameter allows for efficient level
2167	            downgrade in SDP Offer/Answer and out-of-band transport of
2168	            parameter sets, simultaneously.
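
	            Informative example: The PLId:PSL layout described above
	            can be split with a few lines of Python.  This is only a
	            sketch; the function name is hypothetical, and it assumes
	            that PLId is written as six hexadecimal digits, as is
	            done for profile-level-id.

```python
import base64


def parse_sprop_level_parameter_sets(value):
    """Split 'PLId:PSL:PLId:PSL...' into {PLId: [NAL unit bytes, ...]}."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected PLId:PSL pairs separated by colons")
    result = {}
    for plid, psl in zip(fields[0::2], fields[1::2]):
        # Assumption: PLId is three bytes written as six hex digits.
        if len(plid) != 6 or len(bytes.fromhex(plid)) != 3:
            raise ValueError("PLId must be six hexadecimal digits")
        # A PSL may carry several parameter sets separated by commas.
        result[plid.upper()] = [base64.b64decode(item) for item in psl.split(",")]
    return result
```

	            A receiver that accepts one of the listed levels would
	            keep only the entry whose PLId matches that level and
	            discard the others.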

2170	      use-level-src-parameter-sets:
2171	         This parameter MAY be used to indicate a receiver capability.
2172	         The value MAY be equal to either 0 or 1.  When the parameter
2173	         is not present, the value MUST be inferred to be equal to 0.
2174	         The value 0 indicates that the receiver does not understand
2175	         the sprop-level-parameter-sets parameter, and does not
2176	         understand the "fmtp" source attribute as specified in section
2177	         6.3 of [9], and will ignore sprop-level-parameter-sets when
2178	         present, and will ignore sprop-parameter-sets when conveyed
2179	         using the "fmtp" source attribute.  The value 1 indicates that
2180	         the receiver understands the sprop-level-parameter-sets
2181	         parameter, and understands the "fmtp" source attribute as
2182	         specified in section 6.3 of [9], and is capable of using
2183	         parameter sets contained in the sprop-level-parameter-sets or
2184	         contained in the sprop-parameter-sets that is conveyed using
2185	         the "fmtp" source attribute.

2187	            Informative note: An RFC 3984 receiver does not understand
2188	            sprop-level-parameter-sets, use-level-src-parameter-sets,
2189	            or the "fmtp" source attribute as specified in section 6.3
2190	            of [9].  Therefore, during SDP Offer/Answer, an RFC 3984
2191	            receiver as the answerer will simply ignore sprop-level-
2192	            parameter-sets, when present in an offer, and sprop-
2193	            parameter-sets, when conveyed using the "fmtp" source
2194	            attribute as specified in section 6.3 of [9].  Assume that
2195	            the offered payload type was accepted at a level lower than
2196	            the default level.  If the offered payload type included
2197	            sprop-level-parameter-sets or included sprop-parameter-sets
2198	            conveyed using the "fmtp" source attribute, and the offerer
2199	            sees that the answerer has not included use-level-src-
2200	            parameter-sets equal to 1 in the answer, the offerer gets
2201	            to know that in-band transport of parameter sets is needed.

2203	      in-band-parameter-sets:
2204	         This parameter MAY be used to indicate a receiver capability.
2205	         The value MAY be equal to either 0 or 1.  The value 1
2206	         indicates that the receiver discards out-of-band parameter
2207	         sets in sprop-parameter-sets and sprop-level-parameter-sets;
2208	         therefore, the sender MUST transmit all parameter sets
2209	         in-band.  The value 0 indicates that the receiver utilizes
2210	         out-of-band parameter sets included in sprop-parameter-sets
2211	         and sprop-level-parameter-sets.  However, in this case, the
2212	         sender MAY still choose to send parameter sets in-band.  When
2213	         in-band-parameter-sets is equal to 1,
2214	         use-level-src-parameter-sets MUST NOT be present or MUST be
2215	         equal to 0.  When the parameter is not present, this receiver
2216	         capability is not specified, and therefore the sender MAY send
2217	         out-of-band parameter sets only, or it MAY send in-band
2218	         parameter sets only, or it MAY send both.

2220	      packetization-mode:
2221	         This parameter signals the properties of an RTP payload type
2222	         or the capabilities of a receiver implementation.  Only a
2223	         single configuration point can be indicated; thus, when
2224	         capabilities to support more than one packetization-mode are
2225	         declared, multiple configuration points (RTP payload types)
2226	         must be used.

2228	         When the value of packetization-mode is equal to 0 or
2229	         packetization-mode is not present, the single NAL mode, as
2230	         defined in section 6.2 of RFC 3984, MUST be used.  This mode
2231	         is in use in standards using ITU-T Recommendation H.241 [3]
2232	         (see section 12.1).  When the value of packetization-mode is
2233	         equal to 1, the non-interleaved mode, as defined in section
2234	         6.3 of RFC 3984, MUST be used.  When the value of
2235	         packetization-mode is equal to 2, the interleaved mode, as
2236	         defined in section 6.4 of RFC 3984, MUST be used.  The value
2237	         of packetization-mode MUST be an integer in the range of 0 to
2238	         2, inclusive.

2240	      sprop-interleaving-depth:
2241	         This parameter MUST NOT be present when packetization-mode is
2242	         not present or the value of packetization-mode is equal to 0
2243	         or 1.  This parameter MUST be present when the value of
2244	         packetization-mode is equal to 2.

2246	         This parameter signals the properties of an RTP packet stream.
2247	         It specifies the maximum number of VCL NAL units that precede
2248	         any VCL NAL unit in the RTP packet stream in transmission
2249	         order and follow the VCL NAL unit in decoding order.
2250	         Consequently, it is guaranteed that receivers can reconstruct
2251	         NAL unit decoding order when the buffer size for NAL unit
2252	         decoding order recovery is at least the value of sprop-
2253	         interleaving-depth + 1 in terms of VCL NAL units.

2255	         The value of sprop-interleaving-depth MUST be an integer in
2256	         the range of 0 to 32767, inclusive.
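
	            Informative example: The definition above can be checked
	            against a short transmission sequence.  The sketch below
	            assumes that each VCL NAL unit is identified by its
	            position in decoding order; the function name and the
	            sample sequence are hypothetical.

```python
def interleaving_depth(decoding_order):
    """decoding_order[k] is the decoding-order position of the k-th VCL NAL
    unit in transmission order.  Returns the maximum, over all units, of the
    number of units that are sent earlier but decoded later."""
    depth = 0
    for k, current in enumerate(decoding_order):
        sent_before_decoded_after = sum(
            1 for earlier in decoding_order[:k] if earlier > current
        )
        depth = max(depth, sent_before_decoded_after)
    return depth


# Hypothetical stream: units sent in the order 0 2 1 3 5 4.
print(interleaving_depth([0, 2, 1, 3, 5, 4]))
```

	            For this sequence the function returns 1, so a buffer
	            holding two VCL NAL units is enough to restore decoding
	            order.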

2258	      sprop-deint-buf-req:
2259	         This parameter MUST NOT be present when packetization-mode is
2260	         not present or the value of packetization-mode is equal to 0
2261	         or 1.  It MUST be present when the value of packetization-mode
2262	         is equal to 2.

2264	         sprop-deint-buf-req signals the required size of the de-
2265	         interleaving buffer for the RTP packet stream.  The value of
2266	         the parameter MUST be greater than or equal to the maximum
2267	         buffer occupancy (in units of bytes) required in such a de-
2268	         interleaving buffer that is specified in section 7.2 of RFC
2269	         3984.  It is guaranteed that receivers can perform the de-
2270	         interleaving of interleaved NAL units into NAL unit decoding
2271	         order, when the de-interleaving buffer size is at least the
2272	         value of sprop-deint-buf-req in terms of bytes.

2274	         The value of sprop-deint-buf-req MUST be an integer in the
2275	         range of 0 to 4294967295, inclusive.

2277	            Informative note: sprop-deint-buf-req indicates the
2278	            required size of the de-interleaving buffer only.  When
2279	            network jitter can occur, an appropriately sized jitter
2280	            buffer has to be provisioned for as well.

2282	      deint-buf-cap:
2283	         This parameter signals the capabilities of a receiver
2284	         implementation and indicates the amount of de-interleaving
2285	         buffer space in units of bytes that the receiver has available
2286	         for reconstructing the NAL unit decoding order.  A receiver is
2287	         able to handle any stream for which the value of the sprop-
2288	         deint-buf-req parameter is smaller than or equal to this
2289	         parameter.

2291	         If the parameter is not present, then a value of 0 MUST be
2292	         used for deint-buf-cap.  The value of deint-buf-cap MUST be an
2293	         integer in the range of 0 to 4294967295, inclusive.

2295	            Informative note: deint-buf-cap indicates the maximum
2296	            possible size of the de-interleaving buffer of the receiver
2297	            only.  When network jitter can occur, an appropriately
2298	            sized jitter buffer has to be provisioned for as well.
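
	            Informative example: The comparison between a stream's
	            sprop-deint-buf-req and a receiver's deint-buf-cap can be
	            sketched as follows.  The function name is hypothetical;
	            an absent deint-buf-cap is passed as None and treated as
	            0, as required above.

```python
MAX_UINT32 = 4294967295


def receiver_can_handle(sprop_deint_buf_req, deint_buf_cap=None):
    """True if the stream's de-interleaving requirement (bytes) fits the
    receiver's declared de-interleaving capability (bytes)."""
    # An absent deint-buf-cap is treated as 0.
    capability = 0 if deint_buf_cap is None else deint_buf_cap
    for value in (sprop_deint_buf_req, capability):
        if not 0 <= value <= MAX_UINT32:
            raise ValueError("value out of range: %d" % value)
    return sprop_deint_buf_req <= capability
```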

2300	      sprop-init-buf-time:
2301	         This parameter MAY be used to signal the properties of an RTP
2302	         packet stream.  The parameter MUST NOT be present, if the
2303	         value of packetization-mode is equal to 0 or 1.

2305	         The parameter signals the initial buffering time that a
2306	         receiver MUST wait before starting decoding to recover the NAL
2307	         unit decoding order from the transmission order.  The
2308	         parameter is the maximum value of (transmission time of a NAL
2309	         unit - decoding time of the NAL unit), assuming reliable and
2310	         instantaneous transmission, the same timeline for transmission
2311	         and decoding, and that decoding starts when the first packet
2312	         arrives.

2314	         An example of specifying the value of sprop-init-buf-time
2315	         follows.  A NAL unit stream is sent in the following
2316	         interleaved order, in which the value corresponds to the
2317	         decoding time and the transmission order is from left to right:

2319	            0  2  1  3  5  4  6  8  7 ...

2321	         Assuming a steady transmission rate of NAL units, the
2322	         transmission times are:

2324	            0  1  2  3  4  5  6  7  8 ...

2326	         Subtracting the decoding time from the transmission time
2327	         column-wise results in the following series:

2329	            0 -1  1  0 -1  1  0 -1  1 ...

2331	         Thus, in terms of intervals of NAL unit transmission times,
2332	         the value of sprop-init-buf-time in this example is 1.  The
2333	         parameter is coded as a non-negative base10 integer
2334	         representation in clock ticks of a 90-kHz clock.  If the
2335	         parameter is not present, then no initial buffering time value
2336	         is defined.  Otherwise the value of sprop-init-buf-time MUST
2337	         be an integer in the range of 0 to 4294967295, inclusive.
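
	            Informative example: The column-wise subtraction in the
	            example above (transmission time minus decoding time)
	            can be reproduced as follows.  The function names and
	            the 40 ms NAL unit interval used for the 90-kHz
	            conversion are assumptions made only for illustration.

```python
def init_buf_time_intervals(decoding_times):
    """decoding_times lists the decoding time of each NAL unit in
    transmission order; unit k is assumed to be sent at time k."""
    return max(tx - dec for tx, dec in enumerate(decoding_times))


def to_90khz_ticks(intervals, interval_seconds):
    """Convert a number of transmission intervals to 90-kHz clock ticks."""
    return round(intervals * interval_seconds * 90000)


intervals = init_buf_time_intervals([0, 2, 1, 3, 5, 4, 6, 8, 7])
print(intervals, to_90khz_ticks(intervals, 0.04))
```

	            With the sequence from the example the result is one
	            interval; at the assumed 40 ms per NAL unit this
	            corresponds to 3600 ticks.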

2339	         In addition to the signaled sprop-init-buf-time, receivers
2340	         SHOULD take into account the transmission delay jitter
2341	         buffering, including buffering for the delay jitter caused by
2342	         mixers, translators, gateways, proxies, traffic-shapers, and
2343	         other network elements.

2345	      sprop-max-don-diff:
2346	         This parameter MAY be used to signal the properties of an RTP
2347	         packet stream.  It MUST NOT be used to signal transmitter or
2348	         receiver or codec capabilities.  The parameter MUST NOT be
2349	         present if the value of packetization-mode is equal to 0 or 1.
2350	         sprop-max-don-diff is an integer in the range of 0 to 32767,
2351	         inclusive.  If sprop-max-don-diff is not present, the value of
2352	         the parameter is unspecified.  sprop-max-don-diff is
2353	         calculated as follows:

2355	            sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
2356	            for any i and any j>i,

2358	         where i and j indicate the index of the NAL unit in the
2359	         transmission order and AbsDON denotes a decoding order number
2360	         of the NAL unit that does not wrap around to 0 after 65535.
2361	         In other words, AbsDON is calculated as follows: Let m and n
2362	         be consecutive NAL units in transmission order.  For the very
2363	         first NAL unit in transmission order (whose index is 0),
2364	         AbsDON(0) = DON(0).  For other NAL units, AbsDON is calculated
2365	         as follows:

2367	            If DON(m) == DON(n), AbsDON(n) = AbsDON(m)

2369	            If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
2370	              AbsDON(n) = AbsDON(m) + DON(n) - DON(m)

2372	            If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
2373	              AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)

2375	            If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
2376	              AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))

2378	            If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
2379	              AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))

2381	         where DON(i) is the decoding order number of the NAL unit
2382	         having index i in the transmission order.  The decoding order
2383	         number is specified in section 5.5 of RFC 3984.

2385	            Informative note: Receivers may use sprop-max-don-diff to
2386	            trigger which NAL units in the receiver buffer can be
2387	            passed to the decoder.
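
	            Informative example: The AbsDON rules and the
	            sprop-max-don-diff formula above can be written out as
	            the following sketch.  The function names are
	            hypothetical, and the input is assumed to be the list of
	            16-bit DON values in transmission order.

```python
def abs_dons(dons):
    """Unwrap 16-bit DON values (transmission order) into AbsDON values."""
    result = []
    for n, don in enumerate(dons):
        if n == 0:
            result.append(don)  # AbsDON(0) = DON(0)
            continue
        prev_don, prev_abs = dons[n - 1], result[-1]
        if prev_don == don:
            result.append(prev_abs)
        elif prev_don < don and don - prev_don < 32768:
            result.append(prev_abs + don - prev_don)
        elif prev_don > don and prev_don - don >= 32768:
            result.append(prev_abs + 65536 - prev_don + don)
        elif prev_don < don and don - prev_don >= 32768:
            result.append(prev_abs - (prev_don + 65536 - don))
        else:  # prev_don > don and prev_don - don < 32768
            result.append(prev_abs - (prev_don - don))
    return result


def max_don_diff(dons):
    """max{AbsDON(i) - AbsDON(j)} for any i and any j > i.
    Returns None when fewer than two NAL units are given."""
    best = None
    highest_so_far = None
    for value in abs_dons(dons):
        if highest_so_far is not None:
            diff = highest_so_far - value
            best = diff if best is None else max(best, diff)
        if highest_so_far is None or value > highest_so_far:
            highest_so_far = value
    return best
```

	            The running maximum avoids comparing every pair of NAL
	            units; the result is the same as evaluating the formula
	            directly.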

2389	      max-rcmd-nalu-size:
2390	         This parameter MAY be used to signal the capabilities of a
2391	         receiver.  The parameter MUST NOT be used for any other
2392	         purposes.  The value of the parameter indicates the largest
2393	         NALU size in bytes that the receiver can handle efficiently.
2394	         The parameter value is a recommendation, not a strict upper
2395	         boundary.  The sender MAY create larger NALUs but must be
2396	         aware that the handling of these may come at a higher cost
2397	         than NALUs conforming to the limitation.

2399	         The value of max-rcmd-nalu-size MUST be an integer in the
2400	         range of 0 to 4294967295, inclusive.  If this parameter is not
2401	         specified, no known limitation to the NALU size exists.
2402	         Senders still have to consider the MTU size available between
2403	         the sender and the receiver and SHOULD run MTU discovery for
2404	         this purpose.

2406	         This parameter is motivated by, for example, an IP to H.223
2407	         video telephony gateway, where NALUs smaller than the H.223
2408	         transport data unit will be more efficient.  A gateway may
2409	         terminate IP; thus, MTU discovery will normally not work
2410	         beyond the gateway.

2412	            Informative note: Setting this parameter to a lower than
2413	            necessary value may have a negative impact.

2415	      sar-understood:
2416	         This parameter MAY be used to indicate a receiver capability
2417	         and not anything else.  The parameter indicates the maximum
2418	         value of aspect_ratio_idc (specified in [1]) smaller than 255
2419	         that the receiver understands.  Table E-1 of [1] specifies
2420	         aspect_ratio_idc equal to 0 as "unspecified", 1 to 16,
2421	         inclusive, as specific Sample Aspect Ratios (SARs), 17 to 254,
2422	         inclusive, as "reserved", and 255 as the Extended SAR, for
2423	         which SAR width and SAR height are explicitly signaled.
2424	         Therefore, a receiver with a decoder according to [1]
2425	         understands aspect_ratio_idc in the range of 1 to 16,
2426	         inclusive and aspect_ratio_idc equal to 255, in the sense that
2427	         the receiver knows what exactly the SAR is.  For such a
2428	         receiver, the value of sar-understood is 16.  If in the future
2429	         Table E-1 of [1] is extended, e.g., such that the SAR for
2430	         aspect_ratio_idc equal to 17 is specified, then for a receiver
2431	         with a decoder that understands the extension, the value of
2432	         sar-understood is 17.  For a receiver with a decoder according
2433	         to the 2003 version of [1], the value of sar-understood is 13,
2434	         as the minimum reserved aspect_ratio_idc therein is 14.

2436	         When sar-understood is not present, the value MUST be inferred
2437	         to be equal to 13.

2439	      sar-supported:
2440	         This parameter MAY be used to indicate a receiver capability
2441	         and not anything else.  The value of this parameter is an
2442	         integer in the range of 1 to sar-understood, inclusive, or
2443	         equal to 255.  The value of sar-supported equal to N smaller
2444	         than 255 indicates that the receiver supports all the SARs
2445	         corresponding to H.264 aspect_ratio_idc values (see Table E-1
2446	         of [1]) in the range from 1 to N, inclusive, without geometric
2447	         distortion.  The value of sar-supported equal to 255 indicates
2448	         that the receiver supports all sample aspect ratios which are
2449	         expressible using two 16-bit integer values as the numerator
2450	         and denominator, i.e., those that are expressible using the
2451	         H.264 aspect_ratio_idc value of 255 (Extended_SAR, see Table
2452	         E-1 of [1]), without geometric distortion.

2454	         H.264 compliant encoders SHOULD NOT send an aspect_ratio_idc
2455	         equal to 0, or an aspect_ratio_idc larger than sar-understood
2456	         and smaller than 255.  H.264 compliant encoders SHOULD send an
2457	         aspect_ratio_idc that the receiver is able to display without
2458	         geometrical distortion.  However, H.264 compliant encoders MAY
2459	         choose to send pictures using any SAR.

2461	         Note that the actual sample aspect ratio or extended sample
2462	         aspect ratio, when present, of the stream is conveyed in the
2463	         Video Usability Information (VUI) part of the sequence
2464	         parameter set.

2466	      Encoding considerations:
2467	         This type is only defined for transfer via RTP (RFC 3550).

2469	      Security considerations:
2470	         See section 9 of RFC xxxx.

2472	      Public specification:
2473	         Please refer to RFC xxxx and its section 15.

2475	      Additional information:
2476	         None

2478	      File extensions:     none

2480	      Macintosh file type code: none

2482	      Object identifier or OID: none
2483	      Person & email address to contact for further information:
2484	         Ye-Kui Wang, yekuiwang@huawei.com

2486	      Intended usage:      COMMON

2488	      Author:
2489	         Ye-Kui Wang, yekuiwang@huawei.com

2491	      Change controller:
2492	         IETF Audio/Video Transport working group delegated from the
2493	         IESG.

2495	8.2. SDP Parameters

2497	8.2.1. Mapping of Payload Type Parameters to SDP

2499	   The media type video/H264 string is mapped to fields in the Session
2500	   Description Protocol (SDP) [6] as follows:

2502	   o  The media name in the "m=" line of SDP MUST be video.

2504	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
2505	      media subtype).

2507	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2509	   o  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-
2510	      smbps", "max-fs", "max-cpb", "max-dpb", "max-br", "redundant-pic-
2511	      cap", "use-level-src-parameter-sets", "in-band-parameter-sets",
2512	      "packetization-mode", "sprop-interleaving-depth", "sprop-deint-
2513	      buf-req", "deint-buf-cap", "sprop-init-buf-time", "sprop-max-don-
2514	      diff", "max-rcmd-nalu-size", "sar-understood", and "sar-supported",
2515	      when present, MUST be included in the "a=fmtp" line of SDP.  These
2516	      parameters are expressed as a media type string, in the form of a
2517	      semicolon separated list of parameter=value pairs.

2519	   o  The OPTIONAL parameters "sprop-parameter-sets" and "sprop-level-
2520	      parameter-sets", when present, MUST be included in the "a=fmtp"
2521	      line of SDP or conveyed using the "fmtp" source attribute as
2522	      specified in section 6.3 of [9].  For a particular media format
2523	      (i.e., RTP payload type), a "sprop-parameter-sets" or "sprop-
2524	      level-parameter-sets" MUST NOT be both included in the "a=fmtp"
2525	      line of SDP and conveyed using the "fmtp" source attribute.  When
2526	      included in the "a=fmtp" line of SDP, these parameters are
2527	      expressed as a media type string, in the form of a semicolon
2528	      separated list of parameter=value pairs.  When conveyed using the
2529	      "fmtp" source attribute, these parameters are only associated with
2530	      the given source and payload type as parts of the "fmtp" source
2531	      attribute.

2533	         Informative note: Conveyance of "sprop-parameter-sets" and
2534	         "sprop-level-parameter-sets" using the "fmtp" source attribute
2535	         allows for out-of-band transport of parameter sets in
2536	         topologies like Topo-Video-switch-MCU [29].

2538	   An example of media representation in SDP is as follows (Baseline
2539	   Profile, Level 3.0, some of the constraints of the Main profile may
2540	   not be obeyed):

2542	      m=video 49170 RTP/AVP 98
2543	      a=rtpmap:98 H264/90000
2544	      a=fmtp:98 profile-level-id=42A01E;
2545	                packetization-mode=1;
2546	                sprop-parameter-sets=<parameter sets data>
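
	   Informative example: The "a=fmtp" line shown above can be read back
	   into its parameters with the following sketch.  The function names
	   are hypothetical.  The sketch assumes that the attribute may be
	   folded over several lines for display, as in the example, and that
	   profile-iop carries constraint_set0_flag in its most significant
	   bit.

```python
def parse_fmtp(lines):
    """Parse an 'a=fmtp' attribute, given as one or more display lines,
    into (payload_type, {parameter: value})."""
    text = " ".join(line.strip() for line in lines)
    if not text.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp attribute")
    payload_type, _, params = text[len("a=fmtp:"):].partition(" ")
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only; base64 values may end in '='.
        name, _, value = item.partition("=")
        result[name.strip()] = value.strip()
    return int(payload_type), result


def split_profile_level_id(value):
    """Return profile_idc, profile_iop and level_idc from six hex digits."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be three bytes")
    return {"profile_idc": raw[0], "profile_iop": raw[1], "level_idc": raw[2]}


pt, params = parse_fmtp([
    "a=fmtp:98 profile-level-id=42A01E;",
    "          packetization-mode=1;",
    "          sprop-parameter-sets=<parameter sets data>",
])
fields = split_profile_level_id(params["profile-level-id"])
print(pt, fields, bool(fields["profile_iop"] & 0x40))
```

	   For 42A01E this yields profile_idc 66 (Baseline), level_idc 30
	   (Level 3.0), and constraint_set1_flag equal to 0, which is why some
	   constraints of the Main profile may not be obeyed.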

2548	8.2.2. Usage with the SDP Offer/Answer Model

2550	   When H.264 is offered over RTP using SDP in an Offer/Answer model [8]
2551	   for negotiation for unicast usage, the following limitations and
2552	   rules apply:

2554	   o  The parameters identifying a media format configuration for H.264
2555	      are "profile-level-id" and "packetization-mode", when present.
2556	      These media format configuration parameters (except for the level
2557	      part of "profile-level-id") MUST be used symmetrically; i.e., the
2558	      answerer MUST either maintain all configuration parameters or
2559	      remove the media format (payload type) completely, if one or more
2560	      of the parameter values are not supported.  Note that the level
2561	      part of "profile-level-id" includes level_idc, and, for indication
2562	      of level 1b when profile_idc is equal to 66, 77 or 88, bit 4
2563	      (constraint_set3_flag) of profile-iop.  The level part of
2564	      "profile-level-id" is downgradable, i.e. the answerer MUST
2565	      maintain the same or a lower level or remove the media format
2566	      (payload type) completely.

2568	         Informative note: The requirement for symmetric use applies
2569	         only for the above media format configuration parameters
2570	         excluding the level part of "profile-level-id", and not for
2571	         the other stream properties and capability parameters.

2573	         Informative note: In H.264 [1], all the levels except for
2574	         level 1b are equal to the value of level_idc divided by 10.
2575	         Level 1b is a level higher than level 1.0 but lower than level
2576	         1.1, and is signaled in an ad-hoc manner because the
2577	         level was specified after level 1.0 and level 1.1.  For the
2578	         Baseline, Main and Extended profiles (with profile_idc equal
2579	         to 66, 77 and 88, respectively), level 1b is indicated by
2580	         level_idc equal to 11 (i.e. same as level 1.1) and
2581	         constraint_set3_flag equal to 1.  For other profiles, level 1b
2582	         is indicated by level_idc equal to 9 (but note that level 1b
2583	         for these profiles is still higher than level 1, which has
2584	         level_idc equal to 10, and lower than level 1.1).  In SDP
2585	         Offer/Answer, an answer to an offer may indicate a level equal
2586	         to or lower than the level indicated in the offer.  Due to the
2587	         ad-hoc indication of level 1b, offerers and answerers must
2588	         check the value of bit 4 (constraint_set3_flag) of the middle
2589	         octet of the parameter "profile-level-id", when profile_idc is
2590	         equal to 66, 77 or 88 and level_idc is equal to 11.

2592	      To simplify handling and matching of these configurations, the
2593	      same RTP payload type number used in the offer SHOULD also be
2594	      used in the answer, as specified in [8].  An answer MUST NOT
2595	      contain a payload type number used in the offer unless the
2596	      configuration is exactly the same as in the offer or the
2597	      configuration in the answer only differs from that in the offer
2598	      with a level lower than the default level offered.

2600	         Informative note: When an offerer receives an answer, it has
2601	         to compare payload types not declared in the offer based on
2602	         the media type (i.e., video/H264) and the above media
2603	         configuration parameters with any payload types it has already
2604	         declared.  This will enable it to determine whether the
2605	         configuration in question is new or if it is equivalent to
2606	         a configuration already offered, since a different payload type
2607	         number may be used in the answer.

2609	   o  The parameters "sprop-deint-buf-req", "sprop-interleaving-depth",
2610	      "sprop-max-don-diff", and "sprop-init-buf-time" describe the
2611	      properties of the RTP packet stream that the offerer or answerer
2612	      is sending for the media format configuration.  This differs from
2613	      the normal usage of the Offer/Answer parameters: normally such
2614	      parameters declare the properties of the stream that the offerer
2615	      or the answerer is able to receive.  When dealing with H.264, the
2616	      offerer assumes that the answerer will be able to receive media
2617	      encoded using the configuration being offered.

2619	         Informative note: The above parameters apply for any stream
2620	         sent by the declaring entity with the same configuration; i.e.,
2621	         they are dependent on their source.  Rather than being bound
2622	         to the payload type, the values may have to be applied to
2623	         another payload type when being sent, as they apply for the
2624	         configuration.

2626	   o  The capability parameters ("max-mbps", "max-smbps", "max-fs",
2627	      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "max-rcmd-
2628	      nalu-size", "sar-understood", "sar-supported") MAY be used to
2629	      declare further capabilities of the offerer or answerer for
2630	      receiving.  These parameters can only be present when the
2631	      direction attribute is sendrecv or recvonly, and the parameters
2632	      describe the limitations of what the offerer or answerer accepts
2633	      for receiving streams.

2635	   o  An offerer has to include the size of the de-interleaving buffer,
2636	      "sprop-deint-buf-req", in the offer for an interleaved H.264
2637	      stream.  To enable the offerer and answerer to inform each other
2638	      about their capabilities for de-interleaving buffering in
2639	      receiving streams, both parties are RECOMMENDED to include "deint-
2640	      buf-cap".  For interleaved streams, it is also RECOMMENDED to
2641	      consider offering multiple payload types with different buffering
2642	      requirements when the capabilities of the receiver are unknown.

2644	   o  The "sprop-parameter-sets" or "sprop-level-parameter-sets"
2645	      parameter, when present (included in the "a=fmtp" line of SDP or
2646	      conveyed using the "fmtp" source attribute as specified in section
2647	      6.3 of [9]), is used for out-of-band transport of parameter sets.
2648	      However, when out-of-band transport of parameter sets is used,
2649	      parameter sets MAY still be additionally transported in-band.  If
2650	      neither "sprop-parameter-sets" nor "sprop-level-parameter-sets" is
2651	      present, then only in-band transport of parameter sets is used.

2653	      An offer MAY include either or both of "sprop-parameter-sets" and
2654	      "sprop-level-parameter-sets".  An answer MAY include "sprop-
2655	      parameter-sets", and MUST NOT include "sprop-level-parameter-
2656	      sets".

2658	      If the answer includes "in-band-parameter-sets" equal to 1, then
2659	      the sender MUST transmit parameter sets in-band.

2661	      Otherwise, the following applies.

2663	        o When an offered payload type is accepted without level
2664	           downgrade, i.e. the default level is accepted, the following
2665	           applies.

2667	             o When there is a "sprop-parameter-sets" included in the
2668	                "a=fmtp" line of SDP, the answerer MUST be prepared to
2669	                use the parameter sets included in "sprop-parameter-
2670	                sets" for decoding the incoming NAL unit stream.

2672	             o When there is a "sprop-parameter-sets" conveyed using
2673	                the "fmtp" source attribute as specified in section 6.3
2674	                of [9], and the answerer understands the "fmtp" source
2675	                attribute, it MUST be prepared to use the parameter
2676	                sets included in "sprop-parameter-sets" for decoding
2677	                the incoming NAL unit stream, and it MUST include
2678	                either "use-level-src-parameter-sets" equal to 1 or the
2679	                "fmtp" source attribute in the answer.

2681	             o When there is a "sprop-parameter-sets" conveyed using
2682	                the "fmtp" source attribute as specified in section 6.3
2683	                of [9], and the answerer does not understand the "fmtp"
2684	                source attribute, the sender MUST transmit parameter
2685	                sets in-band, and the answerer MUST NOT include "use-
2686	                level-src-parameter-sets" equal to 1 or the "fmtp"
2687	                source attribute in the answer.

2689	             o When "sprop-parameter-sets" is not present, the sender
2690	                MUST transmit parameter sets in-band.

2692	             o The answerer MUST ignore "sprop-level-parameter-sets",
2693	                when present (either included in the "a=fmtp" line of
2694	                SDP or conveyed using the "fmtp" source attribute).

2696	        o When level downgrade is in use, i.e., a level lower than the
2697	           default level offered is accepted, the following applies.

2699	             o The answerer MUST ignore "sprop-parameter-sets", when
2700	                present (either included in the "a=fmtp" line of SDP or
2701	                conveyed using the "fmtp" source attribute).

2703	             o When "use-level-src-parameter-sets" equal to 1 and the
2704	                "fmtp" source attribute are not present in the answer
2705	                for the accepted payload type, the answerer MUST ignore
2706	                "sprop-level-parameter-sets", when present, and the
2707	                sender MUST transmit parameter sets in-band.

2709	             o When "use-level-src-parameter-sets" equal to 1 or the
2710	                "fmtp" source attribute is present in the answer for
2711	                the accepted payload type, the answerer MUST be
2712	                prepared to use the parameter sets that are included in
2713	                "sprop-level-parameter-sets" for the accepted level,
2714	                when present, for decoding the incoming NAL unit stream,
2715	                and ignore all other parameter sets included in "sprop-
2716	                level-parameter-sets".

2718	             o When no parameter sets for the accepted level are
2719	                present in the "sprop-level-parameter-sets", the sender
2720	                MUST transmit parameter sets in-band.

2722	      The answerer MAY include or omit "sprop-parameter-sets", i.e.,
2723	      the answerer MAY use either out-of-band or in-band transport of
2724	      parameter sets for the stream it is sending, regardless of
2725	      whether out-of-band parameter sets transport has been used in the
2726	      offerer-to-answerer direction.  When the offer includes "in-band-
2727	      parameter-sets" equal to 1, the answerer MUST NOT include "sprop-
2728	      parameter-sets" and MUST transmit parameter sets in-band.  All
2729	      parameter sets included in the "sprop-parameter-sets", when
2730	      present, for the accepted payload type in an answer MUST be
2731	      associated with the accepted level, as indicated by the profile-
2732	      level-id in the answer for the accepted payload type.

2734	      Parameter sets included in "sprop-parameter-sets" in an answer
2735	      are independent of those parameter sets included in the offer, as
2736	      they are used for decoding two different video streams, one from
2737	      the answerer to the offerer, and the other in the opposite
2738	      direction.  The offerer MUST be prepared to use the parameter
2739	      sets included in the answer's "sprop-parameter-sets", when
2740	      present, for decoding the incoming NAL unit stream.

2742	      When "sprop-parameter-sets" or "sprop-level-parameter-sets" is
2743	      conveyed using the "fmtp" source attribute, as specified in
2744	      section 6.3 of [9], the receiver of the parameters MUST store the
2745	      parameter sets included in the "sprop-parameter-sets" or "sprop-
2746	      level-parameter-sets" for the accepted level and associate them
2747	      with the source given as a part of the "fmtp" source attribute.
2748	      Parameter sets associated with one source MUST only be used to
2749	      decode NAL units conveyed in RTP packets from the same source.
2750	      When this mechanism is in use, SSRC collision detection and
2751	      resolution MUST be performed as specified in [9].

2753	         Informative note: Conveyance of "sprop-parameter-sets" and
2754	         "sprop-level-parameter-sets" using the "fmtp" source attribute
2755	         may be used in topologies like Topo-Video-switch-MCU [29] to
2756	         enable out-of-band transport of parameter sets.
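
   The "in-band-parameter-sets" rule above can be checked mechanically.
   The following is a minimal sketch, assuming the "a=fmtp" parameters
   of the offer and the answer have already been parsed into
   dictionaries; the function name and the dictionary representation
   are illustrative assumptions, not part of this memo.

```python
def check_answer_against_offer(offer_fmtp, answer_fmtp):
    """Return a list of rule violations found in the answer.

    Both arguments are hypothetical dicts mapping media type parameter
    names to their string values for the accepted payload type.
    """
    problems = []
    # If the offer carries in-band-parameter-sets=1, the answer MUST NOT
    # include sprop-parameter-sets (parameter sets go in-band instead).
    if (offer_fmtp.get("in-band-parameter-sets") == "1"
            and "sprop-parameter-sets" in answer_fmtp):
        problems.append('answer MUST NOT include "sprop-parameter-sets"')
    return problems
```

   Only this single rule is covered; the other rules of this section
   would need their own checks.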

2758	   For streams being delivered over multicast, the following rules apply:

2760	   o  The media format configuration is identified by the same
2761	      parameters as above for unicast (i.e. "profile-level-id" and
2762	      "packetization-mode", when present).  These media format
2763	      configuration parameters (including the level part of "profile-
2764	      level-id") MUST be used symmetrically; i.e., the answerer MUST
2765	      either maintain all configuration parameters or remove the media
2766	      format (payload type) completely.  Note that this implies that the
2767	      level part of "profile-level-id" for Offer/Answer in multicast is
2768	      not downgradable.

2770	      To simplify handling and matching of these configurations, the
2771	      same RTP payload type number used in the offer SHOULD also be
2772	      used in the answer, as specified in [8].  An answer MUST NOT
2773	      contain a payload type number used in the offer unless the
2774	      configuration is the same as in the offer.

2776	   o  Parameter sets received MUST be associated with the originating
2777	      source, and MUST be only used in decoding the incoming NAL unit
2778	      stream from the same source.

2780	   o  The rules for other parameters are the same as above for unicast.

2782	   Table 6 lists the interpretation of all 21 media type parameters
2783	   that MUST be used for the different direction attributes.

2785	       Table 6. Interpretation of parameters for different direction
2786	                                attributes.

2788	                                              sendonly --+
2789	                                           recvonly --+  |
2790	                                        sendrecv --+  |  |
2791	                                                   |  |  |
2792	                profile-level-id                   C  C  P
2793	                packetization-mode                 C  C  P
2794	                sprop-deint-buf-req                P  -  P
2795	                sprop-interleaving-depth           P  -  P
2796	                sprop-max-don-diff                 P  -  P
2797	                sprop-init-buf-time                P  -  P
2798	                max-mbps                           R  R  -
2799	                max-smbps                          R  R  -
2800	                max-fs                             R  R  -
2801	                max-cpb                            R  R  -
2802	                max-dpb                            R  R  -
2803	                max-br                             R  R  -
2804	                redundant-pic-cap                  R  R  -
2805	                deint-buf-cap                      R  R  -
2806	                max-rcmd-nalu-size                 R  R  -
2807	                sar-understood                     R  R  -
2808	                sar-supported                      R  R  -
2809	                in-band-parameter-sets             R  R  -
2810	                use-level-src-parameter-sets       R  R  -
2811	                sprop-parameter-sets               S  -  S
2812	                sprop-level-parameter-sets         S  -  S

2814	             Legend:

2816	             C: configuration for sending and receiving streams
2817	             P: properties of the stream to be sent
2818	             R: receiver capabilities
2819	             S: out-of-band parameter sets
2820	             -: not usable, when present SHOULD be ignored
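
   As an illustration, Table 6 can be transcribed into a lookup
   structure.  The sketch below is one possible encoding; the names
   TABLE_6, interpretation, and usable_params are assumptions made for
   this example, and passing unknown parameters through unchanged is a
   choice of the sketch rather than a rule of this memo.

```python
# Table 6, one string per parameter: columns are sendrecv, recvonly,
# sendonly.  C = configuration, P = stream property, R = receiver
# capability, S = out-of-band parameter sets, "-" = not usable.
TABLE_6 = {
    "profile-level-id": "CCP",
    "packetization-mode": "CCP",
    "sprop-deint-buf-req": "P-P",
    "sprop-interleaving-depth": "P-P",
    "sprop-max-don-diff": "P-P",
    "sprop-init-buf-time": "P-P",
    "max-mbps": "RR-",
    "max-smbps": "RR-",
    "max-fs": "RR-",
    "max-cpb": "RR-",
    "max-dpb": "RR-",
    "max-br": "RR-",
    "redundant-pic-cap": "RR-",
    "deint-buf-cap": "RR-",
    "max-rcmd-nalu-size": "RR-",
    "sar-understood": "RR-",
    "sar-supported": "RR-",
    "in-band-parameter-sets": "RR-",
    "use-level-src-parameter-sets": "RR-",
    "sprop-parameter-sets": "S-S",
    "sprop-level-parameter-sets": "S-S",
}
DIRECTIONS = ("sendrecv", "recvonly", "sendonly")


def interpretation(param, direction):
    """Return the Table 6 legend letter for a parameter and direction."""
    return TABLE_6[param][DIRECTIONS.index(direction)]


def usable_params(fmtp, direction):
    """Drop parameters that Table 6 marks '-' for this direction."""
    return {
        name: value
        for name, value in fmtp.items()
        if name not in TABLE_6 or interpretation(name, direction) != "-"
    }
```

   For example, a "sendonly" description that carries "max-mbps" would
   have that parameter dropped, while "profile-level-id" is kept as a
   stream property.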

2822	   Parameters used for declaring receiver capabilities are in general
2823	   downgradable; i.e., they express the upper limit for a sender's
2824	   possible behavior.  Thus, a sender MAY configure its encoder using
2825	   only values that are lower than or equal to the declared ones.

2827	   Parameters declaring a configuration point are not downgradable, with
2828	   the exception of the level part of the "profile-level-id" parameter
2829	   for unicast usage.  Such parameters express values that a receiver
2830	   expects to be used, and the sender must use them verbatim.

2832	   When a sender's capabilities are declared, and non-downgradable
2833	   parameters are used in this declaration, then these parameters
2834	   express a configuration that is acceptable for the sender to receive
2835	   streams.  In order to achieve high interoperability levels, it is
2836	   often advisable to offer multiple alternative configurations; e.g.,
2837	   for the packetization mode.  It is impossible to offer multiple
2838	   configurations in a single payload type.  Thus, when multiple
2839	   configuration offers are made, each offer requires its own RTP
2840	   payload type associated with the offer.

2842	   A receiver SHOULD understand all media type parameters, even if it
2843	   only supports a subset of the payload format's functionality.  This
2844	   ensures that a receiver is capable of understanding when an offer to
2845	   receive media can be downgraded to what is supported by the receiver
2846	   of the offer.

2848	   An answerer MAY extend the offer with additional media format
2849	   configurations.  However, to enable their usage, in most cases a
2850	   second offer is required from the offerer to provide the stream
2851	   property parameters that the media sender will use.  This also has
2852	   the effect that the offerer has to be able to receive this media
2853	   format configuration, not only to send it.

2855	   If an offerer wishes to have non-symmetric capabilities between
2856	   sending and receiving, the offerer should offer different RTP
2857	   sessions; i.e., different media lines declared as "recvonly" and
2858	   "sendonly", respectively.  This may have further implications on the
2859	   system.

2861	8.2.3. Usage in Declarative Session Descriptions

2863	   When H.264 over RTP is offered with SDP in a declarative style, as in
2864	   RTSP [27] or SAP [28], the following considerations are necessary.

2866	   o  All parameters capable of indicating both stream properties and
2867	      receiver capabilities are used to indicate only stream properties.
2868	      For example, in this case, the parameter "profile-level-id"
2869	      declares only the values used by the stream, not the capabilities
2870	      for receiving streams.  As a result, the following
2871	      interpretation of the parameters MUST be used:

2873	      Declaring actual configuration or stream properties:

2875	         - profile-level-id
2876	         - packetization-mode
2877	         - sprop-interleaving-depth
2878	         - sprop-deint-buf-req
2879	         - sprop-max-don-diff
2880	         - sprop-init-buf-time

2882	      Out-of-band transporting of parameter sets:

2884	         - sprop-parameter-sets
2885	         - sprop-level-parameter-sets

2887	      Not usable (when present, they SHOULD be ignored):

2889	         - max-mbps
2890	         - max-smbps
2891	         - max-fs
2892	         - max-cpb
2893	         - max-dpb
2894	         - max-br
2895	         - redundant-pic-cap
2896	         - max-rcmd-nalu-size
2897	         - deint-buf-cap
2898	         - sar-understood
2899	         - sar-supported
2900	         - in-band-parameter-sets
2901	         - use-level-src-parameter-sets

2903	   o  A receiver of the SDP is required to support all parameters and
2904	      values of the parameters provided; otherwise, the receiver MUST
2905	      reject (RTSP) or not participate in (SAP) the session.  It falls
2906	      on the creator of the session to use values that are expected to
2907	      be supported by the receiving application.

2909	8.3. Examples

2911	   An SDP Offer/Answer exchange wherein both parties are expected to
2912	   both send and receive could look like the following.  Only the media
2913	   codec specific parts of the SDP are shown.  Some lines are wrapped
2914	   due to text constraints.

2916	      Offerer -> Answerer SDP message:

2918	      m=video 49170 RTP/AVP 100 99 98
2919	      a=rtpmap:98 H264/90000
2920	      a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
2921	        sprop-parameter-sets=<parameter sets data#0>
2922	      a=rtpmap:99 H264/90000
2923	      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
2924	        sprop-parameter-sets=<parameter sets data#1>
2925	      a=rtpmap:100 H264/90000
2926	      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
2927	        sprop-parameter-sets=<parameter sets data#2>;
2928	        sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
2929	        sprop-init-buf-time=102478; deint-buf-cap=128000

2931	   The above offer presents the same codec configuration in three
2932	   different packetization formats.  PT 98 represents single NALU mode,
2933	   PT 99 represents non-interleaved mode, and PT 100 indicates the
2934	   interleaved mode.  In the interleaved mode case, the interleaving
2935	   parameters that the offerer would use if the answer indicates support
2936	   for PT 100 are also included.  In all three cases the parameter
2937	   "sprop-parameter-sets" conveys the initial parameter sets that are
2938	   required by the answerer when receiving a stream from the offerer
2939	   when this configuration is accepted.  Note that the value for "sprop-
2940	   parameter-sets" could be different for each payload type.
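
   To work with such an offer programmatically, the "a=fmtp" line has
   to be split into its parameters.  The following is a minimal sketch,
   assuming the wrapped lines have already been joined into one line;
   the function name parse_fmtp is hypothetical.  Values are split at
   the first "=" only, because base64-encoded parameter sets can end in
   "=" padding characters.

```python
def parse_fmtp(line):
    """Parse 'a=fmtp:<pt> name=value; name=value' into (pt, dict)."""
    prefix, _, rest = line.partition(":")
    if prefix != "a=fmtp":
        raise ValueError("not an fmtp attribute line")
    payload_type, _, params = rest.strip().partition(" ")
    result = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        # Split at the first '=' only; the value may contain '='.
        name, _, value = item.partition("=")
        result[name.strip()] = value.strip()
    return int(payload_type), result
```

   Applied to the PT 100 line of the offer above (with the parameter
   set placeholder left out), this yields payload type 100 and a
   dictionary with entries such as "packetization-mode" mapped to "2".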

2942	      Answerer -> Offerer SDP message:

2944	      m=video 49170 RTP/AVP 100 99 97
2945	      a=rtpmap:97 H264/90000
2946	      a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
2947	        sprop-parameter-sets=<parameter sets data#3>
2948	      a=rtpmap:99 H264/90000
2949	      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
2950	        sprop-parameter-sets=<parameter sets data#4>;
2951	        max-rcmd-nalu-size=3980
2952	      a=rtpmap:100 H264/90000
2953	      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
2954	        sprop-parameter-sets=<parameter sets data#5>;
2955	        sprop-interleaving-depth=60;
2956	        sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
2957	        deint-buf-cap=128000; max-rcmd-nalu-size=3980

2959	   As the Offer/Answer negotiation covers both sending and receiving
2960	   streams, an offer indicates the exact parameters for what the offerer
2961	   is willing to receive, whereas the answer indicates the same for what
2962	   the answerer is willing to receive.  In this case, the offerer
2963	   declared that it is willing to receive payload type 98.  The answerer
2964	   accepts this by declaring an equivalent payload type 97; i.e., it has
2965	   identical values for the two parameters "profile-level-id" and
2966	   "packetization-mode" (since "packetization-mode" is equal to 0,
2967	   "sprop-deint-buf-req" is not present).  As the offered payload type
2968	   98 is accepted, the answerer needs to store parameter sets included
2969	   in sprop-parameter-sets=<parameter sets data#0> in case the offerer
2970	   finally decides to use this configuration.  In the answer, the
2971	   answerer includes the parameter sets in sprop-parameter-
2972	   sets=<parameter sets data#3> that the answerer would use in the
2973	   stream sent from the answerer if this configuration is finally used.

2975	   The answerer also accepts the reception of the two configurations
2976	   that payload types 99 and 100 represent.  Again, the answerer needs
2977	   to store parameter sets included in sprop-parameter-sets=<parameter
2978	   sets data#1> and sprop-parameter-sets=<parameter sets data#2> in case
2979	   the offerer finally decides to use either of these configurations.
2980	   The answerer provides the initial parameter sets for the answerer-to-
2981	   offerer direction, i.e., the parameter sets in sprop-parameter-
2982	   sets=<parameter sets data#4> and sprop-parameter-sets=<parameter sets
2983	   data#5>, for payload types 99 and 100, respectively, that it will use
2984	   when sending those payload types.  The answerer also provides the
2985	   offerer with its memory limit for de-interleaving operations by
2986	   providing a "deint-buf-cap" parameter.  This is only useful if the
2987	   offerer decides on making a second offer, where it can take the new
2988	   value into account.  The "max-rcmd-nalu-size" indicates that the
2989	   answerer can efficiently process NALUs up to the size of 3980 bytes.
2990	   However, there is no guarantee that the network supports this size.

2992	   In the following example, the offer is accepted without level
2993	   downgrading (i.e. the default level, 3.0, is accepted), and both
2994	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
2995	   in the offer.  The answerer must ignore sprop-level-parameter-
2996	   sets=<parameter sets data#1> and store parameter sets in sprop-
2997	   parameter-sets=<parameter sets data#0> for decoding the incoming NAL
2998	   unit stream.  The offerer must store the parameter sets in sprop-
2999	   parameter-sets=<parameter sets data#2> in the answer for decoding the
3000	   incoming NAL unit stream.  Note that in this example, parameter sets
3001	   in sprop-parameter-sets=<parameter sets data#2> must be associated
3002	   with level 3.0.

3004	      Offer SDP:

3006	      m=video 49170 RTP/AVP 98
3007	      a=rtpmap:98 H264/90000
3008	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3009	        packetization-mode=1;
3010	        sprop-parameter-sets=<parameter sets data#0>;
3011	        sprop-level-parameter-sets=<parameter sets data#1>

3013	      Answer SDP:

3015	      m=video 49170 RTP/AVP 98
3016	      a=rtpmap:98 H264/90000
3017	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3018	        packetization-mode=1;
3019	        sprop-parameter-sets=<parameter sets data#2>

3021	   In the following example, the offer (Baseline profile, level 1.1) is
3022	   accepted with level downgrading (the accepted level is 1b), and both
3023	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
3024	   in the offer.  The answerer must ignore sprop-parameter-
3025	   sets=<parameter sets data#0> and all parameter sets not for the
3026	   accepted level (level 1b) in sprop-level-parameter-sets=<parameter
3027	   sets data#1>, and must store parameter sets for the accepted level
3028	   (level 1b) in sprop-level-parameter-sets=<parameter sets data#1> for
3029	   decoding the incoming NAL unit stream.  The offerer must store the
3030	   parameter sets in sprop-parameter-sets=<parameter sets data#2> in the
3031	   answer for decoding the incoming NAL unit stream.  Note that in this
3032	   example, parameter sets in sprop-parameter-sets=<parameter sets
3033	   data#2> must be associated with level 1b.

3035	      Offer SDP:

3037	      m=video 49170 RTP/AVP 98
3038	      a=rtpmap:98 H264/90000
3039	      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
3040	        packetization-mode=1;
3041	        sprop-parameter-sets=<parameter sets data#0>;
3042	        sprop-level-parameter-sets=<parameter sets data#1>

3044	      Answer SDP:

3046	      m=video 49170 RTP/AVP 98
3047	      a=rtpmap:98 H264/90000
3048	      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
3049	        packetization-mode=1;
3050	        sprop-parameter-sets=<parameter sets data#2>;
3051	        use-level-src-parameter-sets=1
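
   The "profile-level-id" values in these examples can be decoded as
   three bytes: profile_idc, the profile-iop byte holding the
   constraint_set flags, and level_idc.  The sketch below reflects one
   reading of the H.264 level signaling, in which Level 1b is indicated
   by level_idc 11 together with constraint_set3_flag equal to 1 for
   the Baseline, Main, and Extended profiles, and by level_idc 9 for
   other profiles.  The function name is hypothetical.

```python
def decode_profile_level_id(value):
    """Return (profile_idc, level string) for a 6-hex-digit value."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be 3 bytes (6 hex digits)")
    profile_idc, profile_iop, level_idc = raw
    # constraint_set0_flag is the most significant bit of profile-iop,
    # so constraint_set3_flag is bit 4 (value 0x10).
    constraint_set3_flag = (profile_iop >> 4) & 1
    if profile_idc in (66, 77, 88):  # Baseline, Main, Extended
        if level_idc == 11 and constraint_set3_flag:
            return profile_idc, "1b"
    elif level_idc == 9:
        return profile_idc, "1b"
    return profile_idc, f"{level_idc // 10}.{level_idc % 10}"
```

   With this reading, 42A00B decodes to profile_idc 66 at Level 1.1 and
   42B00B to profile_idc 66 at Level 1b, matching the comments in the
   SDP above.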

3053	   In the following example, the offer (Baseline profile, level 1.1) is
3054	   accepted with level downgrading (the accepted level is 1b), and both
3055	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
3056	   in the offer.  However, the answerer is a legacy RFC 3984
3057	   implementation and does not understand "sprop-level-parameter-sets",
3058	   hence it does not include "use-level-src-parameter-sets" (which the
3059	   answerer does not understand, either) in the answer.  Therefore, the
3060	   answerer must ignore both sprop-parameter-sets=<parameter sets
3061	   data#0> and sprop-level-parameter-sets=<parameter sets data#1>, and
3062	   the offerer must transport parameter sets in-band.

3064	      Offer SDP:

3066	      m=video 49170 RTP/AVP 98
3067	      a=rtpmap:98 H264/90000
3068	      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
3069	        packetization-mode=1;
3070	        sprop-parameter-sets=<parameter sets data#0>;
3071	        sprop-level-parameter-sets=<parameter sets data#1>

3073	      Answer SDP:

3075	      m=video 49170 RTP/AVP 98
3076	      a=rtpmap:98 H264/90000
3077	      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
3078	        packetization-mode=1
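
   The three preceding examples differ only in which of the offered
   parameter sets the answerer keeps.  That choice can be sketched as
   follows, assuming the caller has parsed the "a=fmtp" parameters into
   dictionaries and already knows whether the level was downgraded.
   The function name and arguments are hypothetical, and the sketch
   does not inspect whether "sprop-level-parameter-sets" actually
   contains parameter sets for the accepted level.

```python
def offered_parameter_sets_to_use(offer, answer, downgraded,
                                  answer_has_fmtp_source=False):
    """Return the name of the offered parameter that the answerer
    stores for decoding, or None if the offerer must send parameter
    sets in-band."""
    if not downgraded:
        # No downgrade: sprop-level-parameter-sets is ignored.
        if "sprop-parameter-sets" in offer:
            return "sprop-parameter-sets"
        return None
    # Downgrade: sprop-parameter-sets is ignored; the level-specific
    # sets are usable only if the answer signals that it uses them.
    signaled = (answer.get("use-level-src-parameter-sets") == "1"
                or answer_has_fmtp_source)
    if signaled and "sprop-level-parameter-sets" in offer:
        return "sprop-level-parameter-sets"
    return None
```

   A legacy answerer never sets "use-level-src-parameter-sets", so for
   a downgraded level the function returns None and parameter sets
   travel in-band, as in the last example.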

3080	   In the following example, the offer is accepted without level
3081	   downgrading, and "sprop-parameter-sets" is present in the offer.
3082	   Parameter sets in sprop-parameter-sets=<parameter sets data#0> must
3083	   be stored and used by the encoder of the offerer and the decoder
3084	   of the answerer, and parameter sets in sprop-parameter-
3085	   sets=<parameter sets data#1> must be used by the encoder of the
3086	   answerer and the decoder of the offerer.  Note that sprop-parameter-
3087	   sets=<parameter sets data#0> is basically independent of sprop-
3088	   parameter-sets=<parameter sets data#1>.

3090	      Offer SDP:

3092	      m=video 49170 RTP/AVP 98
3093	      a=rtpmap:98 H264/90000
3094	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3095	        packetization-mode=1;
3096	        sprop-parameter-sets=<parameter sets data#0>

3098	      Answer SDP:

3100	      m=video 49170 RTP/AVP 98
3101	      a=rtpmap:98 H264/90000
3102	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3103	        packetization-mode=1;
3104	        sprop-parameter-sets=<parameter sets data#1>

3106	   In the following example, the offer is accepted without level
3107	   downgrading, and neither "sprop-parameter-sets" nor "sprop-level-
3108	   parameter-sets" is present in the offer, meaning that there is no
3109	   out-of-band transmission of parameter sets, which then have to be
3110	   transported in-band.

3112	      Offer SDP:

3114	      m=video 49170 RTP/AVP 98
3115	      a=rtpmap:98 H264/90000
3116	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3117	        packetization-mode=1

3119	      Answer SDP:

3121	      m=video 49170 RTP/AVP 98
3122	      a=rtpmap:98 H264/90000
3123	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3124	        packetization-mode=1

3126	   In the following example, the offer is accepted with level
3127	   downgrading and "sprop-parameter-sets" is present in the offer.  As
3128	   Because sprop-parameter-sets=<parameter sets data#0> contains
3129	   level_idc indicating Level 3.0 while the answerer wants Level 2.0,
3130	   it cannot be used and must be ignored by the answerer, and in-band
3131	   parameter sets must be used.

3133	      Offer SDP:

3135	      m=video 49170 RTP/AVP 98
3136	      a=rtpmap:98 H264/90000
3137	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3138	        packetization-mode=1;
3139	        sprop-parameter-sets=<parameter sets data#0>

3141	      Answer SDP:

3143	      m=video 49170 RTP/AVP 98
3144	      a=rtpmap:98 H264/90000
3145	      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
3146	        packetization-mode=1

3148	   In the following example, the offer is also accepted with level
3149	   downgrading, and neither "sprop-parameter-sets" nor "sprop-level-
3150	   parameter-sets" is present in the offer, meaning that there is no
3151	   out-of-band transmission of parameter sets, which then have to be
3152	   transported in-band.

3154	      Offer SDP:

3156	      m=video 49170 RTP/AVP 98
3157	      a=rtpmap:98 H264/90000
3158	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3159	        packetization-mode=1

3161	      Answer SDP:

3163	      m=video 49170 RTP/AVP 98
3164	      a=rtpmap:98 H264/90000
3165	      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
3166	        packetization-mode=1

3168	   In the following example, the offerer is a Multipoint Control Unit
3169	   (MCU) in a Topo-Video-switch-MCU like topology [29], offering
3170	   parameter sets received (using out-of-band transport) from three
3171	   other participants B, C, and D, and receiving parameter sets from the
3172	   participant A, which is the answerer.  The participants are
3173	   identified by their values of CNAME, which are mapped to different
3174	   SSRC values.  The same codec configuration is used by all four
3175	   participants.  Participant A stores the parameter sets included in
3176	   <parameter sets data#B>, <parameter sets data#C>, and <parameter
3177	   sets data#D>, associates them with participants B, C, and D,
3178	   respectively, and uses <parameter sets data#B> for decoding NAL
3179	   units carried in RTP packets originating from participant B only,
3180	   <parameter sets data#C> for those originating from participant C
3181	   only, and <parameter sets data#D> for those originating from
3182	   participant D only.

3184	      Offer SDP:

3186	      m=video 49170 RTP/AVP 98
3187	      a=ssrc:SSRC-B cname:CNAME-B
3188	      a=ssrc:SSRC-C cname:CNAME-C
3189	      a=ssrc:SSRC-D cname:CNAME-D
3190	      a=ssrc:SSRC-B fmtp:98
3191	        sprop-parameter-sets=<parameter sets data#B>
3192	      a=ssrc:SSRC-C fmtp:98
3193	        sprop-parameter-sets=<parameter sets data#C>
3194	      a=ssrc:SSRC-D fmtp:98
3195	        sprop-parameter-sets=<parameter sets data#D>
3196	      a=rtpmap:98 H264/90000
3197	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3198	        packetization-mode=1

3200	      Answer SDP:

3202	      m=video 49170 RTP/AVP 98
3203	      a=ssrc:SSRC-A cname:CNAME-A
3204	      a=ssrc:SSRC-A fmtp:98
3205	        sprop-parameter-sets=<parameter sets data#A>
3206	      a=rtpmap:98 H264/90000
3207	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3208	        packetization-mode=1
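
   The per-source bookkeeping that participant A performs in this
   example can be sketched as a store keyed by SSRC.  The class name,
   method names, and SSRC values below are hypothetical; the parameter
   set data is treated as an opaque value.

```python
class PerSourceParameterSets:
    """Keeps out-of-band parameter sets separate for each source."""

    def __init__(self):
        self._by_ssrc = {}

    def store(self, ssrc, parameter_sets):
        # Parameter sets are associated with the source named in the
        # "fmtp" source attribute and with no other source.
        self._by_ssrc[ssrc] = parameter_sets

    def for_packet(self, ssrc):
        # Only the sets stored for the packet's own source may be used
        # to decode its NAL units; None means nothing is stored yet.
        return self._by_ssrc.get(ssrc)


# Illustrative SSRC values standing in for SSRC-B, SSRC-C, and SSRC-D.
SSRC_B, SSRC_C, SSRC_D = 0x0B, 0x0C, 0x0D
store = PerSourceParameterSets()
store.store(SSRC_B, "data#B")
store.store(SSRC_C, "data#C")
store.store(SSRC_D, "data#D")
```

   Because lookups are keyed by SSRC, identical parameter set
   identifiers used by B, C, and D do not overwrite one another.  An
   SSRC change after collision resolution would require the entry to
   be moved, which this sketch does not handle.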

3210	8.4. Parameter Set Considerations

3212	   The H.264 parameter sets are a fundamental part of the video codec
3213	   and vital to its operation; see section 1.2.  Due to their
3214	   characteristics and their importance for the decoding process, lost
3215	   or erroneously transmitted parameter sets can hardly be concealed
3216	   locally at the receiver.  A reference to a corrupt parameter set
3217	   is normally fatal to the decoding process.  Corruption could
3218	   occur, for example, due to the erroneous transmission or loss of a
3219	   parameter set NAL unit, but also due to the untimely transmission of
3220	   a parameter set update.  A parameter set update refers to a change of
3221	   at least one parameter in a picture parameter set or sequence
3222	   parameter set for which the picture parameter set or sequence
3223	   parameter set identifier remains unchanged.  Therefore, the following
3224	   recommendations are provided as a guideline for the implementer of
3225	   the RTP sender.

3227	   Parameter set NALUs can be transported using three different
3228	   principles:

3230	   A. Using a session control protocol (out-of-band) prior to the actual
3231	     RTP session.

3233	   B. Using a session control protocol (out-of-band) during an ongoing
3234	     RTP session.

3236	   C. Within the RTP packet stream in the payload (in-band) during an
3237	     ongoing RTP session.

3239	   It is recommended to implement principles A and B within a session
3240	   control protocol.  SIP and SDP can be used as described in the SDP
3241	   Offer/Answer model and in the previous sections of this memo.
3242	   Section 8.2.2 includes a detailed discussion on transport of
3243	   parameter sets in-band or out-of-band in SDP Offer/Answer using media
3244	   type parameters "sprop-parameter-sets", "sprop-level-parameter-sets",
3245	   "use-level-src-parameter-sets" and "in-band-parameter-sets".  This
3246	   section contains guidelines on how principles A and B should be
3247	   implemented within session control protocols.  It is independent of
3248	   the particular protocol used.  Principle C is supported by the RTP
3249	   payload format defined in this specification.  There are topologies
3250	   like Topo-Video-switch-MCU [29] for which the use of principle C may
3251	   be desirable.

3253	   If in-band signaling of parameter sets is used, the picture and
3254	   sequence parameter set NALUs SHOULD be transmitted in the RTP payload
3255	   using a reliable method of delivering RTP (see below), as a loss
3256	   of a parameter set of either type will likely prevent decoding of a
3257	   considerable portion of the corresponding RTP packet stream.

3259	   If in-band signaling of parameter sets is used, the sender SHOULD
3260	   take the error characteristics into account and use mechanisms to
3261	   provide a high probability for delivering the parameter sets
3262	   correctly.  Mechanisms that increase the probability for a correct
3263	   reception include packet repetition, FEC, and retransmission.  The
3264	   use of an unreliable, out-of-band control protocol has disadvantages
3265	   similar to those of in-band signaling (possible loss) and, in
3266	   addition, may also lead to difficulties in the synchronization (see
3267	   below).  Therefore, it is NOT RECOMMENDED.

3269	   Parameter sets MAY be added or updated during the lifetime of a
3270	   session using principles B and C.  It is required that parameter sets
3271	   are present at the decoder prior to the NAL units that refer to them.
3272	   Updating or adding of parameter sets can result in further problems,
3273	   and therefore the following recommendations should be considered.

3275	   - When parameter sets are added or updated, care SHOULD be taken to
3276	     ensure that any parameter set is delivered prior to its usage.
3277	     When new parameter sets are added, previously unused parameter set
3278	     identifiers are used.  It is common that no synchronization is
3279	     present between out-of-band signaling and in-band traffic.  If
3280	     out-of-band signaling is used, it is RECOMMENDED that a sender
3281	     does not start sending NALUs requiring the added or updated
3282	     parameter sets prior to acknowledgement of delivery from the
3283	     signaling protocol.

3285	   - When parameter sets are updated, the following synchronization
3286	     issue should be taken into account.  When overwriting a parameter
3287	     set at the receiver, the sender has to ensure that the parameter
3288	     set in question is not needed by any NALU present in the network
3289	     or receiver buffers.  Otherwise, decoding with a wrong parameter
3290	     set may occur.  To lessen this problem, it is RECOMMENDED either
3291	     to overwrite only those parameter sets that have not been used for
3292	     a sufficiently long time (to ensure that all related NALUs have
3293	     been consumed), or to add a new parameter set instead (which may
3294	     have negative consequences for the efficiency of the video coding).

3296	         Informative note: In some topologies like Topo-Video-switch-
3297	         MCU [29] the origin of the whole set of parameter sets may
3298	         come from multiple sources that may use non-unique parameter
3299	         sets identifiers.  In this case an offer may overwrite an
3300	         existing parameter set if no other mechanism that enables
3301	         uniqueness of the parameter sets in the out-of-band channel
3302	         exists.

3304	   - In a multiparty session, one participant MUST associate parameter
3305	     sets coming from different sources with the source identification
3306	     whenever possible, e.g. by conveying out-of-band transported
3307	     parameter sets, as different sources typically use independent
3308	     parameter set identifier value spaces.

3310	   - Adding or modifying parameter sets by using both principles B and
3311	     C in the same RTP session may lead to inconsistencies of the
3312	     parameter sets because of the lack of synchronization between the
3313	     control and the RTP channel.  Therefore, principles B and C MUST
3314	     NOT both be used in the same session unless sufficient
3315	     synchronization can be provided.

3317	   In some scenarios (e.g., when only the subset of this payload format
3318	   specification corresponding to H.241 is used) or topologies, it is
3319	   not possible to employ out-of-band parameter set transmission.  In
3320	   this case, parameter sets have to be transmitted in-band.  Here, the
3321	   synchronization with the non-parameter-set-data in the bitstream is
3322	   implicit, but the possibility of a loss has to be taken into account.
3323	   The loss probability should be reduced using the mechanisms discussed
3324	   above.  In case a loss of a parameter set is detected, recovery may
3325	   be achieved by using a Decoder Refresh Point procedure, for example,
3326	   using RTCP feedback Full Intra Request (FIR) [30].  Two example
3327	   Decoder Refresh Point procedures are provided in the informative
3328	   Section 8.5.

3330	   - When parameter sets are initially provided using principle A and
3331	     then later added or updated in-band (principle C), there is a risk
3332	     associated with updating the parameter sets delivered out-of-band.
3333	     If receivers miss some in-band updates (for example, because of a
3334	     loss or a late tune-in), those receivers attempt to decode the
3335	     bitstream using outdated parameters.  It is therefore RECOMMENDED
3336	     that parameter set IDs be partitioned between the out-of-band and
3337	     in-band parameter sets.
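
   One way to partition parameter set IDs is to reserve the lowest IDs
   for out-of-band delivery and leave the rest for in-band use.  The
   sketch below assumes the H.264 identifier ranges of 0 to 31 for
   sequence parameter sets and 0 to 255 for picture parameter sets; the
   split point and the function name are choices of this example, as
   this memo does not prescribe how the partition is made.

```python
MAX_SPS_ID = 31   # seq_parameter_set_id range is 0..31
MAX_PPS_ID = 255  # pic_parameter_set_id range is 0..255


def partition_ids(max_id, out_of_band_count):
    """Split 0..max_id into (out-of-band IDs, in-band IDs)."""
    if not 0 <= out_of_band_count <= max_id + 1:
        raise ValueError("out_of_band_count outside the identifier range")
    out_of_band = range(0, out_of_band_count)
    in_band = range(out_of_band_count, max_id + 1)
    return out_of_band, in_band


# Example split: 4 SPS IDs and 16 PPS IDs reserved for out-of-band use.
sps_oob, sps_inband = partition_ids(MAX_SPS_ID, 4)
pps_oob, pps_inband = partition_ids(MAX_PPS_ID, 16)
```

   With disjoint ranges, an in-band update can never overwrite a
   parameter set that was delivered out-of-band, so a receiver that
   misses in-band updates still holds valid out-of-band sets.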

3339	8.5. Decoder Refresh Point Procedure using In-Band Transport of
3340	   Parameter Sets (Informative)

3342	   When a sender with a video encoder according to [1] receives a
3343	   request for a decoder refresh point, the encoder shall enter the fast
3344	   update mode by using one of the procedures specified in Section 8.5.1
3345	   or 8.5.2 below.  The procedure in 8.5.1 is the preferred response in
3346	   a lossless transmission environment.  Both procedures satisfy the
3347	   requirement to enter the fast update mode for H.264 video encoding.

3349	8.5.1. IDR Procedure to Respond to a Request for a Decoder Refresh Point

3351	   This section gives one possible way to respond to a request for a
3352	   decoder refresh point.

3354	   The encoder shall, in the order presented here:

3356	   1) Immediately prepare to send an IDR picture.

3358	   2) Send a sequence parameter set to be used by the IDR picture to be
3359	     sent. The encoder may optionally also send other sequence
3360	     parameter sets.

3362	   3) Send a picture parameter set to be used by the IDR picture to be
3363	     sent. The encoder may optionally also send other picture parameter
3364	     sets.

3366	   4) Send the IDR picture.

3368	   5) From this point forward in time, send any other sequence or
3369	     picture parameter sets that have not yet been sent in this
3370	     procedure, prior to their reference by any NAL unit, regardless of
3371	     whether such parameter sets were previously sent prior to
3372	     receiving the request for a decoder refresh point.  As needed,
3373	     such parameter sets may be sent in a batch, one at a time, or in
3374	     any combination of these two methods.  Parameter sets may be re-
3375	     sent at any time for redundancy.  Caution should be taken when
3376	     parameter set updates are present, as described above in Section
3377	     8.4.
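
	   The ordering in steps 2) to 4) can be checked with a small
	   sketch.  It assumes the NAL unit stream is given as a list of
	   nal_unit_type values (7 for a sequence parameter set, 8 for a
	   picture parameter set, 5 for a coded slice of an IDR picture).
	   The function name is hypothetical, and the check does not verify
	   that the parameter set IDs match those used by the IDR picture.

```python
SPS, PPS, IDR_SLICE = 7, 8, 5  # H.264 nal_unit_type values


def follows_idr_procedure(nal_types):
    """True if an SPS and then a PPS precede the first IDR slice."""
    nal_types = list(nal_types)
    try:
        first_idr = nal_types.index(IDR_SLICE)
    except ValueError:
        return False  # no IDR picture was sent at all
    head = nal_types[:first_idr]
    if SPS not in head or PPS not in head:
        return False
    # Step 2 (SPS) comes before step 3 (PPS).
    return head.index(SPS) < head.index(PPS)
```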

3379	8.5.2. Gradual Recovery Procedure to Respond to a Request for a Decoder
3380	   Refresh Point

3382	   This section gives another possible way to respond to a request for a
3383	   decoder refresh point.

3385	   The encoder shall, in the order presented here:

3387	   1) Send a recovery point SEI message (see Sections D.1.7 and D.2.7 of
3388	     [1]).

3390	   2) Repeat any sequence and picture parameter sets that were sent
3391	     before the recovery point SEI message, prior to their reference by
3392	     a NAL unit.

3394	   The encoder shall ensure that the decoder has access to all reference
3395	   pictures for inter prediction of pictures at or after the recovery
3396	   point, which is indicated by the recovery point SEI message, in
3397	   output order, assuming that the transmission from now on is error-
3398	   free.

3400	   The value of the recovery_frame_cnt syntax element in the recovery
3401	   point SEI message should be small enough to ensure a fast recovery.

3403	   As needed, such parameter sets may be re-sent in a batch, one at a
3404	   time, or in any combination of these two methods.  Parameter sets may
3405	   be re-sent at any time for redundancy.  Caution should be taken when
3406	   parameter set updates are present, as described above in Section 8.4.

3408	9. Security Considerations

3410	   RTP packets using the payload format defined in this specification
3411	   are subject to the security considerations discussed in the RTP
3412	   specification [5], and in any appropriate RTP profile (for example,
3413	   [16]).  This implies that confidentiality of the media streams is
3414	   achieved by encryption; for example, through the application of SRTP
3415	   [26].  Because the data compression used with this payload format is
3416	   applied end-to-end, any encryption needs to be performed after
3417	   compression.  A potential denial-of-service threat exists for data
3418	   encodings using compression techniques that have non-uniform
3419	   receiver-end computational load.  The attacker can inject
3420	   pathological datagrams into the stream that are complex to decode and
3421	   that cause the receiver to be overloaded.  H.264 is particularly
3422	   vulnerable to such attacks, as it is extremely simple to generate
3423	   datagrams containing NAL units that affect the decoding process of
3424	   many future NAL units.  Therefore, the usage of data origin
3425	   authentication and data integrity protection of at least the RTP
3426	   packet is RECOMMENDED; for example, with SRTP [26].

3428	   Note that the appropriate mechanism to ensure confidentiality and
3429	   integrity of RTP packets and their payloads is very dependent on the
3430	   application and on the transport and signaling protocols employed.
3431	   Thus, although SRTP is given as an example above, other possible
3432	   choices exist.

3434	   Decoders MUST exercise caution with respect to the handling of user
3435	   data SEI messages, particularly if they contain active elements, and
3436	   MUST restrict their domain of applicability to the presentation
3437	   containing the stream.

3439	   End-to-end security with authentication, integrity, or
3440	   confidentiality protection will prevent a MANE from performing media-
3441	   aware operations other than discarding complete packets.  In the
3442	   case of confidentiality protection, the MANE will even be prevented
3443	   from discarding packets in a media-aware way.  To perform its
3444	   operations, a MANE must therefore be a trusted entity that is
3445	   included in the security context establishment.

3447	10. Congestion Control

3449	   Congestion control for RTP SHALL be used in accordance with RFC 3550
3450	   [5], and with any applicable RTP profile; e.g., RFC 3551 [16].  An
3451	   additional requirement if best-effort service is being used is: users
3452	   of this payload format MUST monitor packet loss to ensure that the
3453	   packet loss rate is within acceptable parameters.  Packet loss is
3454	   considered acceptable if a TCP flow across the same network path, and
3455	   experiencing the same network conditions, would achieve an average
3456	   throughput, measured on a reasonable timescale, that is not less than
3457	   the throughput the RTP flow is achieving.  This condition can be
3458	   satisfied by implementing congestion control mechanisms to adapt the
3459	   transmission rate (or the number of layers subscribed for a layered
3460	   multicast session), or by arranging for a receiver to leave the
3461	   session if the loss rate is unacceptably high.

3463	   The bit rate adaptation necessary for obeying the congestion control
3464	   principle is easily achievable when real-time encoding is used.
3465	   However, when pre-encoded content is being transmitted, bandwidth
3466	   adaptation requires the availability of more than one coded
3467	   representation of the same content, at different bit rates, or the
3468	   existence of non-reference pictures or sub-sequences [22] in the
3469	   bitstream.  The switching between the different representations can
3470	   normally be performed in the same RTP session; e.g., by employing a
3471	   concept known as SI/SP slices of the Extended Profile, or by
3472	   switching streams at IDR picture boundaries.  Only when non-
3473	   downgradable parameters (such as the profile part of the
3474	   profile/level ID) are required to be changed does it become necessary
3475	   to terminate and re-start the media stream.  This may be accomplished
3476	   by using a different RTP payload type.

3478	   MANEs MAY follow the suggestions outlined in section 7.3 and remove
3479	   certain unusable packets from the packet stream when that stream was
3480	   damaged due to previous packet losses.  This can help reduce the
3481	   network load in certain special cases.

3483	11. IANA Considerations

3485	   The H264 media subtype name specified by RFC 3984 should be updated
3486	   as defined in section 8.1 of this memo.

3488	12. Informative Appendix: Application Examples

3490	   This payload specification is very flexible in its use, in order to
3491	   cover the extremely wide application space anticipated for H.264.
3492	   However, this great flexibility also makes it difficult for an
3493	   implementer to decide on a reasonable packetization scheme.  Some
3494	   information on how to apply this specification to real-world
3495	   scenarios is likely to appear in the form of academic publications
3496	   and a test model software and description in the near future.
3497	   However, some preliminary usage scenarios are described here as well.

3499	12.1. Video Telephony according to ITU-T Recommendation H.241 Annex A

3501	   H.323-based video telephony systems that use H.264 as an optional
3502	   video compression scheme are required to support H.241 Annex A [3] as
3503	   a packetization scheme.  The packetization mechanism defined in this
3504	   Annex is technically identical with a small subset of this
3505	   specification.

3507	   When a system operates according to H.241 Annex A, parameter set NAL
3508	   units are sent in-band.  Only Single NAL unit packets are used.  Many
3509	   such systems are not sending IDR pictures regularly, but only when
3510	   required by user interaction or by control protocol means; e.g., when
3511	   switching between video channels in a Multipoint Control Unit or for
3512	   error recovery requested by feedback.

3514	12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
3515	   Aggregation

3517	   The RTP part of this scheme is implemented and tested (though not the
3518	   control-protocol part; see below).

3520	   In most real-world video telephony applications, picture parameters
3521	   such as picture size or optional modes never change during the
3522	   lifetime of a connection.  Therefore, all necessary parameter sets
3523	   (usually only one) are sent as a side effect of the capability
3524	   exchange/announcement process, e.g., according to the SDP syntax
3525	   specified in section 8.2 of this document.  As all necessary
3526	   parameter set information is established before the RTP session
3527	   starts, there is no need for sending any parameter set NAL units.
3528	   Slice data partitioning is not used, either.  Thus, the RTP packet
3529	   stream basically consists of NAL units that carry single coded slices.

3531	   The encoder chooses the size of coded slice NAL units so that they
3532	   offer the best performance.  Often, this is done by adapting the
3533	   coded slice size to the MTU size of the IP network.  For small
3534	   picture sizes, this may result in a one-picture-per-one-packet
3535	   strategy.  Intra refresh algorithms clean up the loss of packets and
3536	   the resulting drift-related artifacts.

3538	12.3. Video Telephony, Interleaved Packetization Using NAL Unit
3539	   Aggregation

3541	   This scheme allows better error concealment and is used in
3542	   H.263-based designs using RFC 2429 packetization [11].  It has been
3543	   implemented, and good results were reported [13].

3545	   The VCL encoder codes the source picture so that all macroblocks (MBs)
3546	   of one MB line are assigned to one slice.  All slices with even MB
3547	   row addresses are combined into one STAP, and all slices with odd MB
3548	   row addresses into another.  Those STAPs are transmitted as RTP
3549	   packets.  The establishment of the parameter sets is performed as
3550	   discussed above.
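
	   The even/odd grouping described above can be sketched as follows.
	   The sketch works on macroblock row indices only (one slice per
	   row, 18 rows for a CIF picture); the function names are
	   hypothetical and no STAP header is built.

```python
def group_rows_into_staps(num_mb_rows):
    """Split MB row indices into an even-row and an odd-row group."""
    even = [r for r in range(num_mb_rows) if r % 2 == 0]
    odd = [r for r in range(num_mb_rows) if r % 2 == 1]
    return even, odd


def has_received_neighbour(lost_row, received_rows):
    """True if a vertically adjacent row survived the packet loss."""
    received = set(received_rows)
    return (lost_row - 1) in received or (lost_row + 1) in received
```

	   If the STAP carrying the even rows is lost, every lost row still
	   has at least one received neighbour from the odd-row STAP, which
	   is what makes spatial concealment effective in this scheme.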

3552	   Note that the use of STAPs is essential here, as the high number of
3553	   individual slices (18 for a CIF picture) would lead to unacceptably
3554	   high IP/UDP/RTP header overhead (unless the source coding tool FMO is
3555	   used, which is not assumed in this scenario).  Furthermore, some
3556	   wireless video transmission systems, such as H.324M and the IP-based
3557	   video telephony specified in 3GPP, are likely to use relatively small
3558	   transport packet sizes.  For example, a typical MTU size of an H.223
3559	   AL3 SDU is around 100 bytes [17].  Coding individual slices
3560	   according to this packetization scheme provides a further advantage
3561	   in communication between wired and wireless networks, as individual
3562	   slices are likely to be smaller than the preferred maximum packet
3563	   size of wireless systems.  Consequently, a gateway can convert the
3564	   STAPs used in a wired network into several RTP packets with only one
3565	   NAL unit, which are preferred in a wireless network, and vice versa.

3567	12.4. Video Telephony with Data Partitioning

3569	   This scheme has been implemented and has been shown to offer good
3570	   performance, especially at higher packet loss rates [13].

3572	   Data Partitioning is known to be useful only when some form of
3573	   unequal error protection is available.  Normally, in single-session
3574	   RTP environments, even error characteristics are assumed; i.e., the
3575	   packet loss probability of all packets of the session is the same
3576	   statistically.  However, there are means to reduce the packet loss
3577	   probability of individual packets in an RTP session.  A FEC packet
3578	   according to RFC 5109 [18], for example, specifies which media
3579	   packets are associated with the FEC packet.

3581	   In all cases, the incurred overhead is substantial but is of the same
3582	   order of magnitude as the number of bits that would otherwise be
3583	   spent on intra information.  However, this mechanism does not add
3584	   any delay to the system.

3586	   Again, the complete parameter set establishment is performed through
3587	   control protocol means.

3589	12.5. Video Telephony or Streaming with FUs and Forward Error Correction

3591	   This scheme has been implemented and has been shown to provide good
3592	   performance, especially at higher packet loss rates [19].

3594	   The most efficient means to combat packet losses for scenarios where
3595	   retransmissions are not applicable is forward error correction (FEC).
3596	   Although application layer, end-to-end use of FEC is often less
3597	   efficient than an FEC-based protection of individual links
3598	   (especially when links of different characteristics are in the
3599	   transmission path), application layer, end-to-end FEC is unavoidable
3600	   in some scenarios.  RFC 5109 [18] provides means to use generic,
3601	   application layer, end-to-end FEC in packet-loss environments.  A
3602	   binary forward error correcting code is generated by applying the XOR
3603	   operation to the bits at the same bit position in different packets.
3604	   The binary code can be specified by the parameters (n,k) in which k
3605	   is the number of information packets used in the connection and n is
3606	   the total number of packets generated for k information packets; i.e.,
3607	   n-k parity packets are generated for k information packets.
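
	   As an illustration of the XOR construction, the following sketch
	   builds one parity packet over k payloads, i.e., an (n=k+1, k)
	   code, and recovers a single lost payload.  It is not the RFC 5109
	   packet format: headers are omitted, shorter payloads are
	   implicitly zero-padded, and the length of the lost payload is
	   assumed to be known to the caller.  All names are hypothetical.

```python
def xor_parity(packets):
    """XOR the bytes at the same position across all packets."""
    parity = bytearray(max(len(p) for p in packets))
    for packet in packets:
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received, parity, missing_length):
    """Rebuild the one missing packet from the others plus parity."""
    rebuilt = xor_parity(list(received) + [parity])
    return rebuilt[:missing_length]
```

	   For example, with payloads b"abc", b"de", and b"fghi", losing
	   the second payload and calling recover_missing() with the other
	   two, the parity packet, and a length of 2 yields b"de" again.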

3609	   When a code is used with parameters (n,k) within the RFC 5109
3610	   framework, the following properties are well known:

3612	   a) If applied over one RTP packet, RFC 5109 provides only packet
3613	     repetition.

3615	   b) RFC 5109 is most bit rate efficient if XOR-connected packets have
3616	     equal length.

3618	   c) At the same packet loss probability p and for a fixed k, the
3619	     greater the value of n is, the smaller the residual error
3620	     probability becomes.  For example, for a packet loss probability
3621	     of 10%, k=1, and n=2, the residual error probability is about 1%,
3622	     whereas for n=3, the residual error probability is about 0.1%.

3624	   d) At the same packet loss probability p and for a fixed code rate
3625	     k/n, the greater the value of n is, the smaller the residual error
3626	     probability becomes.  For example, at a packet loss probability of
3627	     p=10%, k=1 and n=2, the residual error rate is about 1%, whereas
3628	     for an extended Golay code with k=12 and n=24, the residual error
3629	     rate is about 0.01%.
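
	   The figures in property c) follow from simple arithmetic.  With
	   k=1 the code is a repetition code, so the information packet is
	   lost only if all n copies are lost.  The sketch below assumes
	   independent packet losses; the function name is hypothetical.

```python
def residual_loss_repetition(p, n):
    """Residual loss of an (n, k=1) code under independent losses."""
    return p ** n
```

	   For p = 0.1 this gives 0.1^2 = 1% for n=2 and 0.1^3 = 0.1% for
	   n=3, matching the values quoted in c).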

3631	   For applying RFC 5109 in combination with H.264 baseline coded video
3632	   without using FUs, several options might be considered:

3634	   1) The video encoder produces NAL units for which each video frame is
3635	     coded in a single slice.  Applying FEC, one could use a simple
3636	     code; e.g., (n=2, k=1).  That is, each NAL unit would basically
3637	     just be repeated.  The disadvantage is obviously the bad code
3638	     performance according to d), above, and the low flexibility, as
3639	     only (n, k=1) codes can be used.

3641	   2) The video encoder produces NAL units for which each video frame is
3642	     encoded in one or more consecutive slices.  Applying FEC, one
3643	     could use a better code, e.g., (n=24, k=12), over a sequence of
3644	     NAL units.  Depending on the number of RTP packets per frame, a
3645	     loss may introduce a significant delay, which is reduced when more
3646	     RTP packets are used per frame.  Packets of completely different
3647	     length might also be connected, which decreases bit rate
3648	     efficiency according to b), above.  However, with some care and
3649	     for slices of 1kb or larger, similar length (100-200 bytes
3650	     difference) may be produced, which will not lower the bit
3651	     efficiency catastrophically.

3653	   3) The video encoder produces NAL units, for which a certain frame
3654	     contains k slices of possibly almost equal length.  Then, applying
3655	     FEC, a better code, e.g., (n=24, k=12), can be used over the
3656	     sequence of NAL units for each frame.  The delay compared to that
3657	     of 2), above, may be reduced, but several disadvantages are
3658	     obvious.  First, the coding efficiency of the encoded video is
3659	     lowered significantly, as slice-structured coding reduces intra-
3660	     frame prediction and additional slice overhead is necessary.
3661	     Second, pre-encoded content, or video passing through a gateway,
3662	     is usually not coded with k slices in a way that allows FEC to be
3663	     applied.  Finally, encoding video into k slices of equal length
3664	     is not straightforward and might require more than one encoding
3665	     pass.

3667	   Many of the mentioned disadvantages can be avoided by applying FUs in
3668	   combination with FEC.  Each NAL unit can be split into any number of
3669	   FUs of basically equal length; therefore, FEC with a reasonable k and
3670	   n can be applied, even if the encoder made no effort to produce
3671	   slices of equal length.  For example, a coded slice NAL unit
3672	   containing an entire frame can be split to k FUs, and a parity check
3673	   code (n=k+1, k) can be applied.  However, this has the disadvantage
3674	   that unless all created fragments can be recovered, the whole slice
3675	   will be lost.  Thus a larger section is lost than would be if the
3676	   frame had been split into several slices.
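
	   A minimal sketch of the splitting step is given below.  It
	   divides a NAL unit payload into k fragments whose lengths differ
	   by at most one byte, which is the property that makes the XOR-
	   based FEC bit rate efficient.  The FU indicator and FU header
	   octets are deliberately omitted, and the function name is
	   hypothetical.

```python
def split_equal(payload: bytes, k: int):
    """Split payload into k fragments of nearly equal length."""
    if k < 1 or k > len(payload):
        raise ValueError("k must be between 1 and len(payload)")
    base, extra = divmod(len(payload), k)
    fragments, pos = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first 'extra' get +1
        fragments.append(payload[pos:pos + size])
        pos += size
    return fragments
```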

3678	   The presented technique makes it possible to achieve good
3679	   transmission error tolerance, even if no additional source coding
3680	   layer redundancy (such as periodic intra frames) is present.
3681	   Consequently, the same coded video sequence can be used to achieve
3682	   the maximum compression efficiency and quality over error-free
3683	   transmission and for transmission over error-prone networks.
3684	   Furthermore, the technique allows the application of FEC to pre-
3685	   encoded sequences without adding delay.  In this case, pre-encoded
3686	   sequences that are not encoded for error-prone networks can still be
3687	   transmitted almost reliably without adding extensive delays.  In
3688	   addition, FUs of equal length result in a bit rate efficient use of
3689	   RFC 5109.

3691	   If the error probability depends on the length of the transmitted
3692	   packet (e.g., in case of mobile transmission [15]), the benefits of
3693	   applying FUs with FEC are even more obvious.  Basically, the
3694	   flexibility of the size of FUs allows appropriate FEC to be applied
3695	   for each NAL unit and unequal error protection of NAL units.

3697	   When FUs and FEC are used, the incurred overhead is substantial but
3698	   is of the same order of magnitude as the number of bits that have to
3699	   be spent for intra-coded macroblocks if no FEC is applied.  In [19],
3700	   it was shown that the FEC-based approach improved quality at the
3701	   same error rate and the same overall bit rate, including the
3702	   overhead.

3704	12.6. Low Bit-Rate Streaming

3706	   This scheme has been implemented with H.263 and non-standard RTP
3707	   packetization and has given good results [20].  There is no technical
3708	   reason why similarly good results could not be achievable with H.264.

3710	   In today's Internet streaming, some of the offered bit rates are
3711	   relatively low in order to allow terminals with dial-up modems to
3712	   access the content.  In wired IP networks, relatively large packets,
3713	   say 500 - 1500 bytes, are preferred to smaller and more frequently
3714	   occurring packets in order to reduce network congestion.  Moreover,
3715	   use of large packets decreases the amount of RTP/UDP/IP header
3716	   overhead.  For low bit-rate video, the use of large packets means
3717	   that sometimes up to a few pictures should be encapsulated in one
3718	   packet.

3720	   However, loss of a packet including many coded pictures would have
3721	   drastic consequences for visual quality, as there is practically no
3722	   other way to conceal a loss of an entire picture than to repeat the
3723	   previous one.  One way to construct relatively large packets and
3724	   maintain possibilities for successful loss concealment is to
3725	   construct MTAPs that contain interleaved slices from several pictures.
3726	   An MTAP should not contain spatially adjacent slices from the same
3727	   picture or spatially overlapping slices from any picture.  If a
3728	   packet is lost, it is likely that a lost slice is surrounded by
3729	   spatially adjacent slices of the same picture and spatially
3730	   corresponding slices of the temporally previous and succeeding
3731	   pictures.  Consequently, concealment of the lost slice is likely to
3732	   be relatively successful.

3734	12.7. Robust Packet Scheduling in Video Streaming

3736	   Robust packet scheduling has been implemented with MPEG-4 Part 2 and
3737	   simulated in a wireless streaming environment [21].  There is no
3738	   technical reason why similar or better results could not be
3739	   achievable with H.264.

3741	   Streaming clients typically have a receiver buffer that is capable of
3742	   storing a relatively large amount of data.  Initially, when a
3743	   streaming session is established, a client does not start playing the
3744	   stream back immediately.  Rather, it typically buffers the incoming
3745	   data for a few seconds.  This buffering helps maintain continuous
3746	   playback, as, in case of occasional increased transmission delays or
3747	   network throughput drops, the client can decode and play buffered
3748	   data.  Otherwise, without initial buffering, the client has to freeze
3749	   the display, stop decoding, and wait for incoming data.  The
3750	   buffering is also necessary for either automatic or selective
3751	   retransmission at any protocol level.  If any part of a picture is
3752	   lost, a retransmission mechanism may be used to resend the lost data.
3753	   If the retransmitted data is received before its scheduled decoding
3754	   or playback time, the loss is recovered perfectly.  Coded pictures
3755	   can be ranked according to their importance in the subjective quality
3756	   of the decoded sequence.  For example, non-reference pictures, such
3757	   as conventional B pictures, are subjectively least important, as
3758	   their absence does not affect decoding of any other pictures.  In
3759	   addition to non-reference pictures, the ITU-T H.264 | ISO/IEC 14496-
3760	   10 standard includes a temporal scalability method called sub-
3761	   sequences [22].  Subjective ranking can also be made on coded slice
3762	   data partition or slice group basis.  Coded slices and coded slice
3763	   data partitions that are subjectively the most important can be sent
3764	   earlier than their decoding order indicates, whereas coded slices and
3765	   coded slice data partitions that are subjectively the least important
3766	   can be sent later than their natural coding order indicates.
3767	   Consequently, any retransmitted parts of the most important slices
3768	   and coded slice data partitions are more likely to be received before
3769	   their scheduled decoding or playback time compared to the least
3770	   important slices and slice data partitions.

3772	13. Informative Appendix: Rationale for Decoding Order Number

3774	13.1. Introduction

3776	   The Decoding Order Number (DON) concept was introduced mainly to
3777	   enable efficient multi-picture slice interleaving (see section 12.6)
3778	   and robust packet scheduling (see section 12.7).  In both of these
3779	   applications, NAL units are transmitted out of decoding order.  DON
3780	   indicates the decoding order of NAL units and should be used in the
3781	   receiver to recover the decoding order.  Example use cases for
3782	   efficient multi-picture slice interleaving and for robust packet
3783	   scheduling are given in sections 13.2 and 13.3, respectively.
3784	   Section 13.4 describes the benefits of the DON concept in error
3785	   resiliency achieved by redundant coded pictures.  Section 13.5
3786	   summarizes the alternatives to DON that were considered and
3787	   explains why DON was chosen for this RTP payload specification.

3789	13.2. Example of Multi-Picture Slice Interleaving

3791	   An example of multi-picture slice interleaving follows.  A subset of
3792	   a coded video sequence is depicted below in output order.  R denotes
3793	   a reference picture, N denotes a non-reference picture, and the
3794	   number indicates a relative output time.

3796	      ... R1 N2 R3 N4 R5 ...

3798	   The decoding order of these pictures from left to right is as follows:

3800	      ... R1 R3 N2 R5 N4 ...

3802	   The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
3803	   DON equal to 1, 2, 3, 4, and 5, respectively.

3805	   Each reference picture consists of three slice groups that are
3806	   scattered as follows (a number denotes the slice group number for
3807	   each macroblock in a QCIF frame):

3809	      0 1 2 0 1 2 0 1 2 0 1
3810	      2 0 1 2 0 1 2 0 1 2 0
3811	      1 2 0 1 2 0 1 2 0 1 2
3812	      0 1 2 0 1 2 0 1 2 0 1
3813	      2 0 1 2 0 1 2 0 1 2 0
3814	      1 2 0 1 2 0 1 2 0 1 2
3815	      0 1 2 0 1 2 0 1 2 0 1
3816	      2 0 1 2 0 1 2 0 1 2 0
3817	      1 2 0 1 2 0 1 2 0 1 2
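
	   The pattern shown above equals the macroblock address in raster
	   scan order, taken modulo 3, for a frame of 11 x 9 macroblocks.
	   The sketch below reproduces it; the function name is
	   hypothetical, and no claim is made that this corresponds to a
	   particular slice group map type of H.264.

```python
def slice_group_map(width=11, height=9, groups=3):
    """Slice group of each macroblock: raster address modulo groups."""
    return [[(row * width + col) % groups for col in range(width)]
            for row in range(height)]
```

	   Because 11 is not a multiple of 3, both horizontally and
	   vertically adjacent macroblocks always fall into different slice
	   groups, so a lost slice group is surrounded by received
	   macroblocks.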

3819	   For the sake of simplicity, we assume that all the macroblocks of a
3820	   slice group are included in one slice.  Three MTAPs are constructed
3821	   from three consecutive reference pictures so that each MTAP contains
3822	   three aggregation units, each of which contains all the macroblocks
3823	   from one slice group.  The first MTAP contains slice group 0 of
3824	   picture R1, slice group 1 of picture R3, and slice group 2 of picture
3825	   R5.  The second MTAP contains slice group 1 of picture R1, slice
3826	   group 2 of picture R3, and slice group 0 of picture R5.  The third
3827	   MTAP contains slice group 2 of picture R1, slice group 0 of picture
3828	   R3, and slice group 1 of picture R5.  Each non-reference picture is
3829	   encapsulated into an STAP-B.

3831	   Consequently, the transmission order of NAL units is the following:

3833	      R1, slice group 0, DON 1, carried in MTAP, RTP SN: N
3834	      R3, slice group 1, DON 2, carried in MTAP, RTP SN: N
3835	      R5, slice group 2, DON 4, carried in MTAP, RTP SN: N
3836	      R1, slice group 1, DON 1, carried in MTAP, RTP SN: N+1
3837	      R3, slice group 2, DON 2, carried in MTAP, RTP SN: N+1
3838	      R5, slice group 0, DON 4, carried in MTAP, RTP SN: N+1
3839	      R1, slice group 2, DON 1, carried in MTAP, RTP SN: N+2
3840	      R3, slice group 0, DON 2, carried in MTAP, RTP SN: N+2
3841	      R5, slice group 1, DON 4, carried in MTAP, RTP SN: N+2
3842	      N2, DON 3, carried in STAP-B, RTP SN: N+3
3843	      N4, DON 5, carried in STAP-B, RTP SN: N+4

3845	   The receiver is able to organize the NAL units back in decoding order
3846	   based on the value of DON associated with each NAL unit.
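
	   The reordering step can be sketched as a stable sort on DON.  The
	   list below follows the MTAP contents described in the prose of
	   this section; each entry is (picture, slice group, DON), with
	   None as the slice group of the non-reference pictures.  DON
	   wraparound is ignored here for simplicity, and all names are
	   hypothetical.

```python
TRANSMISSION_ORDER = [
    ("R1", 0, 1), ("R3", 1, 2), ("R5", 2, 4),   # MTAP, RTP SN N
    ("R1", 1, 1), ("R3", 2, 2), ("R5", 0, 4),   # MTAP, RTP SN N+1
    ("R1", 2, 1), ("R3", 0, 2), ("R5", 1, 4),   # MTAP, RTP SN N+2
    ("N2", None, 3),                            # STAP-B, RTP SN N+3
    ("N4", None, 5),                            # STAP-B, RTP SN N+4
]


def to_decoding_order(nal_units):
    """Stable sort by DON; equal DONs keep their reception order."""
    return sorted(nal_units, key=lambda unit: unit[2])
```

	   Sorting the list above yields all NAL units of R1 first, followed
	   by those of R3, N2, R5, and N4, which is the decoding order
	   stated earlier in this section.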

3848	   If one of the MTAPs is lost, the spatially adjacent and temporally
3849	   co-located macroblocks are received and can be used to conceal the
3850	   loss efficiently.  If one of the STAPs is lost, the effect of the
3851	   loss does not propagate temporally.

3853	13.3. Example of Robust Packet Scheduling

3855	   An example of robust packet scheduling follows.  The communication
3856	   system used in the example consists of the following components in
3857	   the order that the video is processed from source to sink:

3859	      o camera and capturing
3860	      o pre-encoding buffer
3861	      o encoder
3862	      o encoded picture buffer
3863	      o transmitter
3864	      o transmission channel
3865	      o receiver
3866	      o receiver buffer
3867	      o decoder
3868	      o decoded picture buffer
3869	      o display

3871	   The video communication system used in the example operates as
3872	   follows.  Note that processing of the video stream happens gradually
3873	   and at the same time in all components of the system.  The source
3874	   video sequence is shot and captured to a pre-encoding buffer.  The
3875	   pre-encoding buffer can be used to order pictures from sampling order
3876	   to encoding order or to analyze multiple uncompressed frames for bit
3877	   rate control purposes, for example.  In some cases, the pre-encoding
3878	   buffer may not exist; instead, the sampled pictures are encoded right
3879	   away.  The encoder encodes pictures from the pre-encoding buffer and
3880	   stores the output; i.e., coded pictures, to the encoded picture
3881	   buffer.  The transmitter encapsulates the coded pictures from the
3882	   encoded picture buffer to transmission packets and sends them to a
3883	   receiver through a transmission channel.  The receiver stores the
3884	   received packets to the receiver buffer.  The receiver buffering
3885	   process typically includes buffering for transmission delay jitter.
3886	   The receiver buffer can also be used to recover correct decoding
3887	   order of coded data.  The decoder reads coded data from the receiver
3888	   buffer and produces decoded pictures as output into the decoded
3889	   picture buffer.  The decoded picture buffer is used to recover the
3890	   output (or display) order of pictures.  Finally, pictures are
3891	   displayed.

3893	   In the following example figures, I denotes an IDR picture, R denotes
3894	   a reference picture, N denotes a non-reference picture, and the
3895	   number after I, R, or N indicates the sampling time relative to the
3896	   previous IDR picture in decoding order.  Values below the sequence of
3897	   pictures indicate scaled system clock timestamps.  The system clock
3898	   is initialized arbitrarily in this example, and time runs from left
3899	   to right.  Each I, R, and N picture is mapped into the same timeline
3900	   compared to the previous processing step, if any, assuming that
3901	   encoding, transmission, and decoding take no time.  Thus, events
3902	   happening at the same time are located in the same column throughout
3903	   all example figures.

3905	   A subset of a sequence of coded pictures is depicted below in
3906	   sampling order.

3908	       ...  N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
3909	       ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
3910	       ...  58  59  60  61  62  63  64  65  66  ... 128 129 130 131 ...

3912	             Figure 16  Sequence of pictures in sampling order

3914	   The sampled pictures are buffered in the pre-encoding buffer to
3915	   arrange them in encoding order.  In this example, we assume that the
3916	   non-reference pictures are predicted from both the previous and the
3917	   next reference picture in output order, except for the non-reference
3918	   pictures immediately preceding an IDR picture, which are predicted
3919	   only from the previous reference picture in output order.  Thus, the
3920	   pre-encoding buffer has to contain at least two pictures, and the
3921	   buffering causes a delay of two picture intervals.  The output of the
3922	   pre-encoding buffering process and the encoding (and decoding) order
3923	   of the pictures are as follows:

3925	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3926	       ... -|---|---|---|---|---|---|---|---|- ...
3927	       ... 60  61  62  63  64  65  66  67  68  ...

3929	         Figure 17  Re-ordered pictures in the pre-encoding buffer

3931	   The encoder or the transmitter can set the value of DON for each
3932	   picture to the value of DON for the previous picture in decoding
3933	   order plus one.
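The reordering of Figure 17 and the DON assignment just described can be sketched as follows. This is a minimal illustration under this example's assumptions only (one slice per picture, non-reference pictures predicted from the surrounding reference pictures). The function names encoding_order and assign_don are hypothetical, and the modulo-65536 wrap assumes the 16-bit DON value range.

```python
def encoding_order(sampling_order):
    """Reorder picture labels from sampling order to encoding order.

    Non-reference (N) pictures wait until the next reference (R) picture
    has been emitted; N pictures right before an IDR (I) picture depend
    only on the previous reference picture and are emitted before it.
    """
    out, pending = [], []
    for pic in sampling_order:
        kind = pic[0]
        if kind == "N":
            pending.append(pic)   # hold until the next I or R picture
        elif kind == "I":
            out.extend(pending)   # these N pictures do not need the IDR
            pending = []
            out.append(pic)
        else:                     # kind == "R"
            out.append(pic)       # reference picture first ...
            out.extend(pending)   # ... then the N pictures predicted from it
            pending = []
    out.extend(pending)
    return out


def assign_don(pictures, first_don=0):
    """Give each picture the DON of its predecessor plus one (16-bit wrap)."""
    return [((first_don + i) % 65536, pic) for i, pic in enumerate(pictures)]


sampling = "N58 N59 I00 N01 N02 R03 N04 N05 R06".split()
for don, pic in assign_don(encoding_order(sampling)):
    print(don, pic)
```

With the pictures of Figure 16 as input, the sketch yields the order shown in Figure 17, and I00 receives the third DON value in the sequence.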

3935	   For the sake of simplicity, let us assume that:

3937	   o  the frame rate of the sequence is constant,
3938	   o  each picture consists of only one slice,
3939	   o  each slice is encapsulated in a single NAL unit packet,
3940	   o  there is no transmission delay, and
3941	   o  pictures are transmitted at constant intervals (that is, 1 /
3942	      (frame rate)).

3944	   When pictures are transmitted in decoding order, they are received as
3945	   follows:

3947	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3948	       ... -|---|---|---|---|---|---|---|---|- ...
3949	       ... 60  61  62  63  64  65  66  67  68  ...

3951	              Figure 18  Received pictures in decoding order

3953	   The OPTIONAL sprop-interleaving-depth media type parameter is set to
3954	   0, as the transmission (or reception) order is identical to the
3955	   decoding order.

3957	   The decoder has to buffer for one picture interval initially in its
3958	   decoded picture buffer to organize pictures from decoding order to
3959	   output order as depicted below:

3961	        ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
3962	        ... -|---|---|---|---|---|---|---|---|- ...
3963	        ... 61  62  63  64  65  66  67  68  69  ...

3965	                          Figure 19  Output order

3967	   The amount of required initial buffering in the decoded picture
3968	   buffer can be signaled in the buffering period SEI message or with
3969	   the num_reorder_frames syntax element of H.264 video usability
3970	   information.  num_reorder_frames indicates the maximum number of
3971	   frames, complementary field pairs, or non-paired fields that precede
3972	   any frame, complementary field pair, or non-paired field in the
3973	   sequence in decoding order and that follow it in output order.  For
3974	   the sake of simplicity, we assume that num_reorder_frames is used to
3975	   indicate the initial buffer in the decoded picture buffer.  In this
3976	   example, num_reorder_frames is equal to 1.

3978	   It can be observed that if the IDR picture I00 is lost during
3979	   transmission and a retransmission request is issued when the value of
3980	   the system clock is 62, there is one picture interval of time (until
3981	   the system clock reaches timestamp 63) to receive the retransmitted
3982	   IDR picture I00.

3984	   Let us then assume that IDR pictures are transmitted two frame
3985	   intervals earlier than their decoding position; i.e., the pictures
3986	   are transmitted as follows:

3988	        ...  I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
3989	        ... --|---|---|---|---|---|---|---|---|- ...
3990	        ...  62  63  64  65  66  67  68  69  70  ...

3992	       Figure 20  Interleaving: Early IDR pictures in sending order

3994	   The OPTIONAL sprop-interleaving-depth media type parameter is set
3995	   equal to 1 according to its definition.  (The value of sprop-
3996	   interleaving-depth in this example can be derived as follows: Picture
3997	   I00 is the only picture preceding picture N58 or N59 in transmission
3998	   order and following it in decoding order.  Except for pictures I00,
3999	   N58, and N59, the transmission order is the same as the decoding
4000	   order of pictures.  As a coded picture is encapsulated into exactly
4001	   one NAL unit, the value of sprop-interleaving-depth is equal to the
4002	   maximum number of pictures preceding any picture in transmission
4003	   order and following the picture in decoding order.)
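The derivation above, and the earlier num_reorder_frames value, both follow the same counting pattern: for each picture, count the pictures that precede it in one order and follow it in another, then take the maximum. The sketch below illustrates that pattern for this example only, where each picture is exactly one NAL unit and picture labels are unique. The function name max_reorder_count is hypothetical.

```python
def max_reorder_count(first_order, second_order):
    """Maximum, over all pictures, of the number of pictures that precede
    a picture in first_order and follow it in second_order."""
    second_pos = {pic: i for i, pic in enumerate(second_order)}
    worst = 0
    for i, pic in enumerate(first_order):
        count = sum(
            1
            for earlier in first_order[:i]
            if second_pos[earlier] > second_pos[pic]
        )
        worst = max(worst, count)
    return worst


transmission = "I00 N58 N59 R03 N01 N02 R06 N04 N05".split()  # Figure 20
decoding = "N58 N59 I00 R03 N01 N02 R06 N04 N05".split()      # Figure 17
output = "N58 N59 I00 N01 N02 R03 N04 N05 R06".split()        # Figure 19

# sprop-interleaving-depth for this example: transmission vs. decoding order
print(max_reorder_count(transmission, decoding))  # 1
# num_reorder_frames for this example: decoding vs. output order
print(max_reorder_count(decoding, output))        # 1
```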

4005	   The receiver buffering process contains two pictures at a time
4006	   according to the value of the sprop-interleaving-depth parameter and
4007	   orders pictures from the reception order to the correct decoding
4008	   order based on the value of DON associated with each picture.  The
4009	   output of the receiver buffering process is as follows:

4011	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
4012	       ... -|---|---|---|---|---|---|---|---|- ...
4013	       ... 63  64  65  66  67  68  69  70  71  ...

4015	                 Figure 21  Interleaving: Receiver buffer
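The receiver-side reordering shown in Figure 21 can be sketched with a small priority queue keyed on DON. This is a simplified illustration and not the normative de-interleaving process: it assumes one NAL unit per picture, ignores DON wraparound, and releases the NAL unit with the smallest DON whenever the buffer holds more than sprop-interleaving-depth entries. The function name deinterleave is hypothetical.

```python
import heapq


def deinterleave(received, interleaving_depth):
    """Reorder (don, label) pairs from reception order to decoding order."""
    heap, decoding_order = [], []
    for don, label in received:
        heapq.heappush(heap, (don, label))
        # Once more than interleaving_depth units are buffered, the unit
        # with the smallest DON can be passed on to the decoder.
        if len(heap) > interleaving_depth:
            decoding_order.append(heapq.heappop(heap)[1])
    while heap:                       # flush at the end of the stream
        decoding_order.append(heapq.heappop(heap)[1])
    return decoding_order


# Reception order of Figure 20; DON values follow the decoding order.
received = [(2, "I00"), (0, "N58"), (1, "N59"), (3, "R03"), (4, "N01"),
            (5, "N02"), (6, "R06"), (7, "N04"), (8, "N05")]
print(" ".join(deinterleave(received, 1)))
```

With an interleaving depth of 1, the buffer briefly holds two pictures each time a packet arrives, which matches the description above.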

4017	   Again, an initial buffering delay of one picture interval is needed
4018	   to organize pictures from decoding order to output order, as depicted
4019	   below:

4021	        ... N58 N59 I00 N01 N02 R03 N04 N05 ...
4022	        ... -|---|---|---|---|---|---|---|- ...
4023	        ... 64  65  66  67  68  69  70  71  ...

4025	         Figure 22  Interleaving: Receiver buffer after reordering

4027	   Note that the maximum delay that IDR pictures can undergo during
4028	   transmission, including possible application, transport, or link
4029	   layer retransmission, is equal to three picture intervals.  Thus, the
4030	   loss resiliency of IDR pictures is improved in systems supporting
4031	   retransmission compared to the case in which pictures were
4032	   transmitted in their decoding order.

4034	13.4. Robust Transmission Scheduling of Redundant Coded Slices

4036	   A redundant coded picture is a coded representation of a picture or a
4037	   part of a picture that is not used in the decoding process if the
4038	   corresponding primary coded picture is correctly decoded.  There
4039	   should be no noticeable difference between any area of the decoded
4040	   primary picture and a corresponding area that would result from
4041	   application of the H.264 decoding process for any redundant picture
4042	   in the same access unit.  A redundant coded slice is a coded slice
4043	   that is a part of a redundant coded picture.

4045	   Redundant coded pictures can be used to provide unequal error
4046	   protection in error-prone video transmission.  If a primary coded
4047	   representation of a picture is decoded incorrectly, a corresponding
4048	   redundant coded picture can be decoded.  Examples of applications and
4049	   coding techniques using the redundant coded picture feature include
4050	   the video redundancy coding [23] and the protection of "key pictures"
4051	   in multicast streaming [24].

4053	   One property of many error-prone video communications systems is that
4054	   transmission errors are often bursty.  Therefore, they may affect
4055	   multiple consecutive transmission packets in transmission order.
4056	   In low bit-rate video communication, it is relatively common that an
4057	   entire coded picture can be encapsulated into one transmission packet.
4058	   Consequently, a primary coded picture and the corresponding redundant
4059	   coded pictures may be transmitted in consecutive packets in
4060	   transmission order.  To make the transmission scheme more tolerant of
4061	   bursty transmission errors, it is beneficial to transmit the primary
4062	   coded picture and redundant coded picture separated by more than a
4063	   single packet.  The DON concept enables this.
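One possible way to separate primary and redundant coded pictures in transmission order is sketched below. It is an illustration under stated assumptions rather than a scheduling method defined by this memo: each primary picture (P) and its redundant picture (R) fit in one packet each, DON values are assigned in decoding order with the primary picture first, and each redundant picture is delayed by a hypothetical lag parameter counted in pictures. The function name schedule is hypothetical.

```python
def schedule(num_pictures, lag=1):
    """Return (don, label) pairs in transmission order.

    The redundant picture R<i> is sent after the primary picture
    P<i + lag>, so that P<i> and R<i> are not in adjacent packets.
    """
    don = {}
    for i in range(num_pictures):
        don[f"P{i}"] = 2 * i        # primary picture first in decoding order
        don[f"R{i}"] = 2 * i + 1    # its redundant picture right after it
    order = []
    for i in range(num_pictures):
        order.append(f"P{i}")
        if i - lag >= 0:
            order.append(f"R{i - lag}")
    # Flush the redundant pictures still waiting at the end of the sequence.
    for i in range(max(num_pictures - lag, 0), num_pictures):
        order.append(f"R{i}")
    return [(don[label], label) for label in order]


for don, label in schedule(4):
    print(don, label)
```

A receiver can restore the decoding order from the DON values, so a burst that hits two consecutive packets no longer removes both representations of the same picture in this sketch.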

4065	13.5. Remarks on Other Design Possibilities

4067	   The slice header syntax structure of the H.264 coding standard
4068	   contains the frame_num syntax element that can indicate the decoding
4069	   order of coded frames.  However, the usage of the frame_num syntax
4070	   element is not feasible or desirable to recover the decoding order,
4071	   due to the following reasons:

4073	   o  The receiver is required to parse at least one slice header per
4074	      coded picture (before passing the coded data to the decoder).

4076	   o  Coded slices from multiple coded video sequences cannot be
4077	      interleaved, as the frame number syntax element is reset to 0 in
4078	      each IDR picture.

4080	   o  The coded fields of a complementary field pair share the same
4081	      value of the frame_num syntax element.  Thus, the decoding order
4082	      of the coded fields of a complementary field pair cannot be
4083	      recovered based on the frame_num syntax element or any other
4084	      syntax element of the H.264 coding syntax.

4086	   The RTP payload format for transport of MPEG-4 elementary streams [25]
4087	   enables interleaving of access units and transmission of multiple
4088	   access units in the same RTP packet.  An access unit is specified in
4089	   the H.264 coding standard to comprise all NAL units associated with a
4090	   primary coded picture according to subclause 7.4.1.2 of [1].
4091	   Consequently, slices of different pictures cannot be interleaved, and
4092	   the multi-picture slice interleaving technique (see section 12.6) for
4093	   improved error resilience cannot be used.

4095	14. Acknowledgements

4097	   Stephan Wenger, Miska Hannuksela, Thomas Stockhammer, Magnus
4098	   Westerlund, and David Singer are thanked as the authors of RFC 3984.
4099	   Dave Lindbergh, Philippe Gentric, Gonzalo Camarillo, Gary Sullivan,
4100	   Joerg Ott, and Colin Perkins are thanked for careful review during
4101	   the development of RFC 3984. Randell Jesup, Stephen Botzko, Magnus
4102	   Westerlund, Alex Eleftheriadis, Thomas Schierl, and Tom Taylor are
4103	   thanked for their valuable comments and inputs during the development
4104	   of this memo.

4106	   This document was prepared using 2-Word-v2.0.template.dot.

4108	15. References

4110	15.1. Normative References

4112	   [1]   ITU-T Recommendation H.264, "Advanced video coding for generic
4113	         audiovisual services", November 2007.

4115	   [2]   ISO/IEC International Standard 14496-10:2008.

4117	   [3]   ITU-T Recommendation H.241, "Extended video procedures and
4118	         control signals for H.300 series terminals", May 2006.

4120	   [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
4121	         Levels", BCP 14, RFC 2119, March 1997.

4123	   [5]   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
4124	         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
4125	         RFC 3550, July 2003.

4127	   [6]   Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
4128	         Description Protocol", RFC 4566, July 2006.

4130	   [7]   Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
4131	         RFC 3548, July 2003.

4133	   [8]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
4134	         Session Description Protocol (SDP)", RFC 3264, June 2002.

4136	   [9]   Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
4137	         Attributes in the Session Description Protocol", draft-ietf-
4138	         mmusic-sdp-source-attributes-02 (work in progress), October
4139	         2008.

4141	15.2. Informative References

4143	   [10]  Luthra, A., Sullivan, G.J., and T. Wiegand (eds.), Special
4144	         Issue on H.264/AVC. IEEE Transactions on Circuits and Systems
4145	         for Video Technology, July 2003.

4147	   [11]  Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
4148	         Newell, D., Ott, J., Sullivan, G., Wenger, S., and C. Zhu, "RTP
4149	         Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
4150	         (H.263+)", RFC 2429, October 1998.

4152	   [12]  ISO/IEC IS 14496-2.

4154	   [13]  Wenger, S., "H.26L over IP", IEEE Transactions on Circuits and
4155	         Systems for Video Technology, Vol. 13, No. 7, July 2003.

4157	   [14]  Wenger, S., "H.26L over IP: The IP Network Adaptation Layer",
4158	         Proceedings Packet Video Workshop 02, April 2002.

4160	   [15]  Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT
4161	         Coding Network Abstraction Layer and IP-based Transport" in
4162	         Proc. ICIP 2002, Rochester, NY, September 2002.

4164	   [16]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
4165	         Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

4167	   [17]  ITU-T Recommendation H.223, "Multiplexing protocol for low bit
4168	         rate multimedia communication", July 2001.

4170	   [18]  Li, A., "RTP Payload Format for Generic Forward Error
4171	         Correction", RFC 5109, December 2007.

4173	   [19]  Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier,
4174	         "Video Coding and Transport Layer Techniques for H.264/AVC-
4175	         Based Transmission over Packet-Lossy Networks", IEEE
4176	         International Conference on Image Processing (ICIP 2003),
4177	         Barcelona, Spain, September 2003.

4179	   [20]  Varsa, V. and M. Karczewicz, "Slice interleaving in compressed
4180	         video packetization", Packet Video Workshop 2000.

4182	   [21]  Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for
4183	         wireless video streaming", International Packet Video Workshop
4184	         2002.

4186	   [22]  Hannuksela, M.M., "Enhanced concept of GOP", JVT-B042,
4187	         available http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT-
4188	         B042.doc, January 2002.

4190	   [23]  Wenger, S., "Video Redundancy Coding in H.263+", 1997
4191	         International Workshop on Audio-Visual Services over Packet
4192	         Networks, September 1997.

4194	   [24]  Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient
4195	         Video Coding Using Unequally Protected Key Pictures", in Proc.
4196	         International Workshop VLBV03, September 2003.

4198	   [25]  van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
4199	         P. Gentric, "RTP Payload Format for Transport of MPEG-4
4200	         Elementary Streams", RFC 3640, November 2003.

4202	   [26]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
4203	         Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
4204	         3711, March 2004.

4206	   [27]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
4207	         Protocol (RTSP)", RFC 2326, April 1998.

4209	   [28]  Handley, M., Perkins, C., and E. Whelan, "Session Announcement
4210	         Protocol", RFC 2974, October 2000.

4212	   [29]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
4213	         January 2008.

4215	   [30]  Wenger, S., Chandra, U., and M. Westerlund, "Codec Control
4216	         Messages in the RTP Audio-Visual Profile with Feedback (AVPF)",
4217	         RFC 5104, February 2008.

4219	16. Authors' Addresses

4221	   Ye-Kui Wang
4222	   Huawei Technologies
4223	   400 Somerset Corporate Blvd
4224	   Bridgewater, NJ 08807
4225	   USA

4227	   Phone: +1-908-393-4758
4228	   EMail: yekuiwang@huawei.com

4230	   Roni Even
4231	   14 David Hamelech
4232	   Tel Aviv 64953
4233	   Israel

4235	   Phone: +972-545481099
4236	   Email: ron.even.tlv@gmail.com

4238	   Tom Kristensen
4239	   TANDBERG
4240	   Philip Pedersens vei 22
4241	   N-1366 Lysaker
4242	   Norway

4244	   Phone: +47 67125125
4245	   Email: tom.kristensen@tandberg.com, tomkri@ifi.uio.no

4247	17. Backward Compatibility to RFC 3984

4249	   The current document is a revision of RFC 3984 and intends to
4250	   obsolete it.  This section addresses the backward compatibility
4251	   issues.

4253	   The technical changes are listed in section 18.

4255	   Items 1), 2), 3), 7), 9), 10), 12), and 13) are bug-fix type changes
4256	   and do not incur any backward compatibility issues.

4258	   Item 4), addition of six new media type parameters, does not incur
4259	   any backward compatibility issues for SDP Offer/Answer based
4260	   applications, as legacy RFC 3984 receivers ignore these parameters,
4261	   and it is fine for legacy RFC 3984 senders not to use these
4262	   parameters as they are optional.  However, there is a backward
4263	   compatibility issue for SDP declarative usage based applications, e.g.
4264	   those using RTSP and SAP, because the SDP receiver per RFC 3984
4265	   cannot accept a session for which the SDP includes an unrecognized
4266	   parameter.  Therefore, the RTSP or SAP server may have to prepare two
4267	   sets of streams, one for legacy RFC 3984 receivers and one for
4268	   receivers according to this memo.

4270	   Items 5), 6) and 11) are related to out-of-band transport of
4271	   parameter sets.  The following backward compatibility issues exist:

4273	   1) When a legacy sender per RFC 3984 includes in sprop-parameter-sets
4274	     parameter sets for a level different from the default level
4275	     indicated by profile-level-id, the value of sprop-parameter-sets is
4276	     invalid to the receiver per this memo, and therefore the session
4277	     may be rejected.

4279	   2) In SDP Offer/Answer between a legacy offerer per RFC 3984 and an
4280	     answerer per this memo, when the answerer includes in the answer
4281	     parameter sets that are not a superset of the parameter sets
4282	     included in the offer, the parameter value of sprop-parameter-sets
4283	     is invalid to the offerer and the session may not be initiated
4284	     properly (related to change item 11)).

4286	   3) When one endpoint A per this memo includes in-band-parameter-sets
4287	     equal to 1, the other side B per RFC 3984 does not understand that
4288	     it must transmit parameter sets in-band and B may still exclude
4289	     parameter sets in the in-band stream it is sending.  Consequently,
4290	     endpoint A cannot decode the stream it receives.

4292	   Item 7), allowance of conveying sprop-parameter-sets and sprop-level-
4293	   parameter-sets using the "fmtp" source attribute as specified in
4294	   section 6.3 of [9], is similar to item 4).  It does not incur any
4295	   backward compatibility issues for SDP Offer/Answer based applications,
4296	   as legacy RFC 3984 receivers ignore the "fmtp" source attribute, and
4297	   it is fine for legacy RFC 3984 senders not to use the "fmtp" source
4298	   attribute as it is optional.  However, there is a backward
4299	   compatibility issue for SDP declarative usage based applications, e.g.
4300	   those using RTSP and SAP, because the SDP receiver per RFC 3984
4301	   cannot accept a session for which the SDP includes an unrecognized
4302	   parameter (i.e., the "fmtp" source attribute).  Therefore, the RTSP
4303	   or SAP server may have to prepare two sets of streams, one for legacy
4304	   RFC 3984 receivers and one for receivers according to this memo.

4306	   Item 14) removed the recommendation to use out-of-band transport of
4307	   parameter sets.  As out-of-band transport of parameter sets is still
4308	   allowed, this change does not incur any backward compatibility issues.

4310	   Item 15) does not incur any backward compatibility issues as the
4311	   added subsection 8.5 is informative.

4313	18. Changes from RFC 3984

4315	   The following is the list of technical changes (including bug fixes)
4316	   from RFC 3984.  Besides this list of technical changes, numerous
4317	   editorial changes have been made, but not documented in this memo.

4319	   1) In subsections 5.4, 5.5, 6.2, 6.3 and 6.4, removed that the
4320	     packetization mode in use may be signaled by external means.

4322	   2) In subsection 7.2.2, changed the sentence

4324	      There are N VCL NAL units in the deinterleaving buffer.

4326	      to

4328	      There are N or more VCL NAL units in the de-interleaving buffer.

4330	   3) In subsection 8.1, the semantics of sprop-init-buf-time, paragraph
4331	     2, changed the sentence

4333	      The parameter is the maximum value of (transmission time of a NAL
4334	      unit - decoding time of the NAL unit), assuming reliable and
4335	      instantaneous transmission, the same timeline for transmission
4336	      and decoding, and that decoding starts when the first packet
4337	      arrives.

4339	      to

4341	      The parameter is the maximum value of (decoding time of the NAL
4342	      unit - transmission time of a NAL unit), assuming reliable and
4343	      instantaneous transmission, the same timeline for transmission
4344	      and decoding, and that decoding starts when the first packet
4345	      arrives.

4347	   4) Added six new media type parameters, namely max-smbps, sprop-
4348	     level-parameter-sets, use-level-src-parameter-sets, in-band-
4349	     parameter-sets, sar-understood and sar-supported.

4351	   5) In subsection 8.1, removed the specification of parameter-add.
4352	     Other descriptions of parameter-add (in subsections 8.2 and 8.4)
4353	     are also removed.

4355	   6) In subsection 8.1, added a constraint to sprop-parameter-sets such
4356	     that it can only contain parameter sets for the same profile and
4357	     level as indicated by profile-level-id.

4359	   7) In subsection 8.2.1, added that sprop-parameter-sets and sprop-
4360	     level-parameter-sets may be either included in the "a=fmtp" line
4361	     of SDP or conveyed using the "fmtp" source attribute as specified
4362	     in section 6.3 of [9].

4364	   8) In subsection 8.2.2, removed sprop-deint-buf-req from being part
4365	     of the media format configuration in usage with the SDP
4366	     Offer/Answer model.

4368	   9) In subsection 8.2.2, made it clear that level is downgradable in
4369	     the SDP Offer/Answer model, i.e. the use of the level part of
4370	     "profile-level-id" does not need to be symmetric (the level
4371	     included in the answer can be lower than or equal to the level
4372	     included in the offer).

4374	   10)In subsection 8.2.2, removed that the capability parameters may be
4375	     used to declare encoding capabilities.

4377	   11)In subsection 8.2.2, added rules on how to use sprop-parameter-
4378	     sets and sprop-level-parameter-sets for out-of-band transport of
4379	     parameter sets, with or without level downgrading.

4381	   12)In subsection 8.2.2, clarified the rules of using the media type
4382	     parameters with SDP Offer/Answer for multicast.

4384	   13)In subsection 8.2.2, completed and corrected the list of how
4385	     different media type parameters shall be interpreted in the
4386	     different combinations of offer or answer and direction attribute.

4388	   14)In subsection 8.4, changed the text such that both out-of-band and
4389	     in-band transport of parameter sets are allowed and neither is
4390	     recommended or required.

4392	   15)Added subsection 8.5 (informative) providing example methods for
4393	     decoder refresh to handle parameter set losses.