idnits 2.17.1 

draft-ietf-avt-rtp-rfc3984bis-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 4253.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 4230.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 4237.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 4243.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not
     defined in RFC 2119.  If it is intended as a requirements expression, it
     should be rewritten using one of the combinations defined in RFC 2119;
     otherwise it should not be all-uppercase.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 16, 2008) is 5609 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '4' is defined on line 4094, but no explicit reference
     was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)

  ** Obsolete normative reference: RFC 3548 (ref. '7') (Obsoleted by RFC 4648)

  -- Obsolete informational reference (is this intentional?): RFC 2429 (ref.
     '11') (Obsoleted by RFC 4629)

  -- Obsolete informational reference (is this intentional?): RFC 2326 (ref.
     '27') (Obsoleted by RFC 7826)

  -- Obsolete informational reference (is this intentional?): RFC 5117 (ref.
     '29') (Obsoleted by RFC 7667)


     Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 14 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Audio/Video Transport WG                                     Y.-K. Wang
2	Internet Draft                                                    Nokia
3	Intended status: Standards track                                R. Even
4	Expires: June 2009                                        Self-employed
5	                                                          T. Kristensen
6	                                                               Tandberg
7	                                                      December 16, 2008

9	                    RTP Payload Format for H.264 Video
10	                   draft-ietf-avt-rtp-rfc3984bis-02.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html

35	   This Internet-Draft will expire on June 16, 2009.

37	Copyright Notice

39	   Copyright (C) The IETF Trust (2008).

41	Abstract

43	   This memo describes an RTP Payload format for the ITU-T
44	   Recommendation H.264 video codec and the technically identical
45	   ISO/IEC International Standard 14496-10 video codec, excluding the
46	   Scalable Video Coding (SVC) extension and the Multiview Video Coding
47	   extension, for which the RTP payload formats are defined elsewhere.
48	   The RTP payload format allows for packetization of one or more
49	   Network Abstraction Layer Units (NALUs), produced by an H.264 video
50	   encoder, in each RTP payload.  The payload format has wide
51	   applicability, as it supports applications from simple low bit-rate
52	   conversational usage, to Internet video streaming with interleaved
53	   transmission, to high bit-rate video-on-demand.

55	   This memo intends to obsolete RFC 3984.  Changes from RFC 3984 are
56	   summarized in section 17.  Issues on backward compatibility to RFC
57	   3984 are discussed in section 16.

59	Table of Contents

61	   1. Introduction
62	      1.1. The H.264 Codec
63	      1.2. Parameter Set Concept
64	      1.3. Network Abstraction Layer Unit Types
65	   2. Conventions
66	   3. Scope
67	   4. Definitions and Abbreviations
68	      4.1. Definitions
69	      4.2. Abbreviations
70	   5. RTP Payload Format
71	      5.1. RTP Header Usage
72	      5.2. Payload Structures
73	      5.3. NAL Unit Header Usage
74	      5.4. Packetization Modes
75	      5.5. Decoding Order Number (DON)
76	      5.6. Single NAL Unit Packet
77	      5.7. Aggregation Packets
78	         5.7.1. Single-Time Aggregation Packet
79	         5.7.2. Multi-Time Aggregation Packets (MTAPs)
80	         5.7.3. Fragmentation Units (FUs)
81	   6. Packetization Rules
82	      6.1. Common Packetization Rules
83	      6.2. Single NAL Unit Mode
84	      6.3. Non-Interleaved Mode
85	      6.4. Interleaved Mode
86	   7. De-Packetization Process
87	      7.1. Single NAL Unit and Non-Interleaved Mode
88	      7.2. Interleaved Mode
89	         7.2.1. Size of the De-interleaving Buffer
90	         7.2.2. De-interleaving Process
91	      7.3. Additional De-Packetization Guidelines
92	   8. Payload Format Parameters
93	      8.1. Media Type Registration
94	      8.2. SDP Parameters
95	         8.2.1. Mapping of Payload Type Parameters to SDP
96	         8.2.2. Usage with the SDP Offer/Answer Model
97	         8.2.3. Usage in Declarative Session Descriptions
98	      8.3. Examples
99	      8.4. Parameter Set Considerations
100	      8.5. Decoder Refresh Point Procedure using In-Band Transport of
101	      Parameter Sets (Informative)
102	         8.5.1. IDR Procedure to Respond to a Request for a Decoder
103	         Refresh Point
104	         8.5.2. Gradual Recovery Procedure to Respond to a Request for a
105	         Decoder Refresh Point
106	   9. Security Considerations
107	   10. Congestion Control
108	   11. IANA Consideration
109	   12. Informative Appendix: Application Examples
110	      12.1. Video Telephony according to ITU-T Recommendation H.241
111	      Annex A
112	      12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
113	      Aggregation
114	      12.3. Video Telephony, Interleaved Packetization Using NAL Unit
115	      Aggregation
116	      12.4. Video Telephony with Data Partitioning
117	      12.5. Video Telephony or Streaming with FUs and Forward Error
118	      Correction
119	      12.6. Low Bit-Rate Streaming
120	      12.7. Robust Packet Scheduling in Video Streaming
121	   13. Informative Appendix: Rationale for Decoding Order Number
122	      13.1. Introduction
123	      13.2. Example of Multi-Picture Slice Interleaving
124	      13.3. Example of Robust Packet Scheduling
125	      13.4. Robust Transmission Scheduling of Redundant Coded Slices
126	      13.5. Remarks on Other Design Possibilities
127	   14. Acknowledgements
128	   15. References
129	      15.1. Normative References
130	      15.2. Informative References
131	   Authors' Addresses
132	   Intellectual Property Statement
133	   Disclaimer of Validity
134	   Acknowledgement
135	   16. Backward Compatibility to RFC 3984
136	   17. Changes from RFC 3984

138	1. Introduction

140	   This memo intends to obsolete RFC 3984.  Changes from RFC 3984 are
141	   summarized in section 17.   Issues on backward compatibility to RFC
142	   3984 are discussed in section 16.

144	1.1. The H.264 Codec

146	   This memo specifies an RTP payload specification for the video coding
147	   standard known as ITU-T Recommendation H.264 [1] and ISO/IEC
148	   International Standard 14496 Part 10 [2] (both also known as Advanced
149	   Video Coding, or AVC).  In this memo the H.264 acronym is used for
150	   the codec and the standard, but the memo is equally applicable to the
151	   ISO/IEC counterpart of the coding standard.

153	   The H.264 video codec has a very broad application range that covers
154	   all forms of digital compressed video, from low bit-rate Internet
155	   streaming applications to HDTV broadcast and Digital Cinema
156	   applications with nearly lossless coding.  Compared to the current
157	   state of technology, the overall performance of H.264 is such that
158	   bit rate savings of 50% or more are reported.  Digital Satellite TV
159	   quality, for example, was reported to be achievable at 1.5 Mbit/s,
160	   compared to the current operation point of MPEG 2 video at around 3.5
161	   Mbit/s [10].

163	   The codec specification [1] itself distinguishes conceptually between
164	   a video coding layer (VCL) and a network abstraction layer (NAL).
165	   The VCL contains the signal processing functionality of the codec;
166	   mechanisms such as transform, quantization, and motion compensated
167	   prediction; and a loop filter.  It follows the general concept of
168	   most of today's video codecs, a macroblock-based coder that uses
169	   inter picture prediction with motion compensation and transform
170	   coding of the residual signal.  The VCL encoder outputs slices: a bit
171	   string that contains the macroblock data of an integer number of
172	   macroblocks, and the information of the slice header (containing the
173	   spatial address of the first macroblock in the slice, the initial
174	   quantization parameter, and similar information).  Macroblocks in
175	   slices are arranged in scan order unless a different macroblock
176	   allocation is specified, by using the so-called Flexible Macroblock
177	   Ordering syntax.  In-picture prediction is used only within a slice.
178	   More information is provided in [10].

180	   The Network Abstraction Layer (NAL) encoder encapsulates the slice
181	   output of the VCL encoder into Network Abstraction Layer Units (NAL
182	   units), which are suitable for transmission over packet networks or
183	   use in packet oriented multiplex environments.  Annex B of H.264
184	   defines an encapsulation process to transmit such NAL units over
185	   byte-stream oriented networks.  In the scope of this memo, Annex B is
186	   not relevant.

188	   Internally, the NAL uses NAL units.  A NAL unit consists of a one-
189	   byte header and the payload byte string.  The header indicates the
190	   type of the NAL unit, the (potential) presence of bit errors or
191	   syntax violations in the NAL unit payload, and information regarding
192	   the relative importance of the NAL unit for the decoding process.
193	   This RTP payload specification is designed to be unaware of the bit
194	   string in the NAL unit payload.

196	   One of the main properties of H.264 is the complete decoupling of the
197	   transmission time, the decoding time, and the sampling or
198	   presentation time of slices and pictures.  The decoding process
199	   specified in H.264 is unaware of time, and the H.264 syntax does not
200	   carry information such as the number of skipped frames (as is common
201	   in the form of the Temporal Reference in earlier video compression
202	   standards).  Also, there are NAL units that affect many pictures and
203	   that are, therefore, inherently timeless.  For this reason, the
204	   handling of the RTP timestamp requires some special considerations
205	   for NAL units for which the sampling or presentation time is not
206	   defined or, at transmission time, unknown.

208	1.2. Parameter Set Concept

210	   One very fundamental design concept of H.264 is to generate self-
211	   contained packets, to make mechanisms such as the header duplication
212	   of RFC 2429 [11] or MPEG-4's Header Extension Code (HEC) [12]
213	   unnecessary.  This was achieved by decoupling information relevant to
214	   more than one slice from the media stream.  This higher layer meta
215	   information should be sent reliably, asynchronously, and in advance
216	   from the RTP packet stream that contains the slice packets.
217	   (Provisions for sending this information in-band are also available
218	   for applications that do not have an out-of-band transport channel
219	   appropriate for the purpose.)  The combination of the higher-level
220	   parameters is called a parameter set.  The H.264 specification
221	   includes two types of parameter sets: sequence parameter set and
222	   picture parameter set.  An active sequence parameter set remains
223	   unchanged throughout a coded video sequence, and an active picture
224	   parameter set remains unchanged within a coded picture.  The sequence
225	   and picture parameter set structures contain information such as
226	   picture size, optional coding modes employed, and macroblock to slice
227	   group map.

229	   To be able to change picture parameters (such as the picture size)
230	   without having to transmit parameter set updates synchronously to the
231	   slice packet stream, the encoder and decoder can maintain a list of
232	   more than one sequence and picture parameter set.  Each slice header
233	   contains a codeword that indicates the sequence and picture parameter
234	   set to be used.

236	   This mechanism allows the decoupling of the transmission of parameter
237	   sets from the packet stream, and the transmission of them by external
238	   means (e.g., as a side effect of the capability exchange), or through
239	   a (reliable or unreliable) control protocol.  It may even be possible
240	   that they are never transmitted but are fixed by an application
241	   design specification.

243	1.3. Network Abstraction Layer Unit Types

245	   Tutorial information on the NAL design can be found in [13], [14],
246	   and [15].

248	   All NAL units begin with a single NAL unit type octet, which also
249	   serves as the payload header of this RTP payload format.  The payload
250	   of a NAL unit follows immediately.

252	   The syntax and semantics of the NAL unit type octet are specified in
253	   [1], but the essential properties of the NAL unit type octet are
254	   summarized below.  The NAL unit type octet has the following format:

256	      +---------------+
257	      |0|1|2|3|4|5|6|7|
258	      +-+-+-+-+-+-+-+-+
259	      |F|NRI|  Type   |
260	      +---------------+

262	   The semantics of the components of the NAL unit type octet, as
263	   specified in the H.264 specification, are described briefly below.

265	   F: 1 bit
266	      forbidden_zero_bit.  The H.264 specification declares a value of
267	      1 as a syntax violation.

269	   NRI: 2 bits
270	      nal_ref_idc.  A value of 00 indicates that the content of the NAL
271	      unit is not used to reconstruct reference pictures for inter
272	      picture prediction.  Such NAL units can be discarded without
273	      risking the integrity of the reference pictures.  Values greater
274	      than 00 indicate that the decoding of the NAL unit is required to
275	      maintain the integrity of the reference pictures.

277	   Type: 5 bits
278	      nal_unit_type.  This component specifies the NAL unit payload
279	      type as defined in Table 7-1 of [1], and later within this memo.
280	      For a reference of all currently defined NAL unit types and their
281	      semantics, please refer to section 7.4.1 in [1].
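
   The bit layout above can be illustrated with a short parsing sketch.
   This is a non-normative example; the names NalHeader,
   parse_nal_header, and is_discardable are hypothetical and do not
   appear in this memo or in [1].

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit, 1 bit
    nri: int       # nal_ref_idc, 2 bits
    nal_type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split the one-octet NAL unit header into F, NRI, and Type."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("NAL unit header must be a single octet")
    return NalHeader(
        f=(octet >> 7) & 0x01,    # most significant bit
        nri=(octet >> 5) & 0x03,  # next two bits
        nal_type=octet & 0x1F,    # five least significant bits
    )


def is_discardable(header: NalHeader) -> bool:
    """NRI of 00 means the unit is not used to build reference pictures."""
    return header.nri == 0
```

   For example, the octet 0x65 splits into F=0, NRI=3, and Type=5.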

283	   This memo introduces new NAL unit types, which are presented in
284	   section 5.2.  The NAL unit types defined in this memo are marked as
285	   unspecified in [1].  Moreover, this specification extends the
286	   semantics of F and NRI as described in section 5.3.

288	2. Conventions

290	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
291	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
292	   document are to be interpreted as described in RFC 2119 [4].

294	   This specification uses the notion of setting and clearing a bit when
295	   bit fields are handled.  Setting a bit is the same as assigning that
296	   bit the value of 1 (On).  Clearing a bit is the same as assigning
297	   that bit the value of 0 (Off).

299	3. Scope

301	   This payload specification can only be used to carry the "naked"
302	   H.264 NAL unit stream over RTP, and not the bitstream format
303	   discussed in Annex B of H.264.  Likely, the first applications of
304	   this specification will be in the conversational multimedia field,
305	   video telephony or video conferencing, but the payload format also
306	   covers other applications, such as Internet streaming and TV over IP.

308	4. Definitions and Abbreviations

310	4.1. Definitions

312	   This document uses the definitions of [1].  The following terms,
313	   defined in [1], are summed up for convenience:

315	      access unit: A set of NAL units always containing a primary coded
316	      picture.  In addition to the primary coded picture, an access
317	      unit may also contain one or more redundant coded pictures or
318	      other NAL units not containing slices or slice data partitions of
319	      a coded picture.  The decoding of an access unit always results
320	      in a decoded picture.

322	      coded video sequence: A sequence of access units that consists,
323	      in decoding order, of an instantaneous decoding refresh (IDR)
324	      access unit followed by zero or more non-IDR access units
325	      including all subsequent access units up to but not including any
326	      subsequent IDR access unit.

328	      IDR access unit: An access unit in which the primary coded
329	      picture is an IDR picture.

331	      IDR picture: A coded picture containing only slices with I or SI
332	      slice types that causes a "reset" in the decoding process.  After
333	      the decoding of an IDR picture, all following coded pictures in
334	      decoding order can be decoded without inter prediction from any
335	      picture decoded prior to the IDR picture.

337	      primary coded picture: The coded representation of a picture to
338	      be used by the decoding process for a bitstream conforming to
339	      H.264.  The primary coded picture contains all macroblocks of the
340	      picture.

342	      redundant coded picture: A coded representation of a picture or a
343	      part of a picture.  The content of a redundant coded picture
344	      shall not be used by the decoding process for a bitstream
345	      conforming to H.264.  The content of a redundant coded picture
346	      may be used by the decoding process for a bitstream that contains
347	      errors or losses.

349	      VCL NAL unit: A collective term used to refer to coded slice and
350	      coded data partition NAL units.

352	   In addition, the following definitions apply:

354	      decoding order number (DON): A field in the payload structure or
355	      a derived variable indicating NAL unit decoding order.  Values of
356	      DON are in the range of 0 to 65535, inclusive.  After reaching
357	      the maximum value, the value of DON wraps around to 0.

359	      NAL unit decoding order: A NAL unit order that conforms to the
360	      constraints on NAL unit order given in section 7.4.1.2 in [1].

362	      NALU-time: The value that the RTP timestamp would have if the NAL
363	      unit were transported in its own RTP packet.

365	      transmission order: The order of packets in ascending RTP
366	      sequence number order (in modulo arithmetic).  Within an
367	      aggregation packet, the NAL unit transmission order is the same
368	      as the order of appearance of NAL units in the packet.

370	      media aware network element (MANE): A network element, such as a
371	      middlebox or application layer gateway, that is capable of parsing
372	      certain aspects of the RTP payload headers or the RTP payload and
373	      reacting to the contents.

375	         Informative note: The concept of a MANE goes beyond normal
376	         routers or gateways in that a MANE has to be aware of the
377	         signaling (e.g., to learn about the payload type mappings of
378	         the media streams), and in that it has to be trusted when
379	         working with SRTP.  The advantage of using MANEs is that they
380	         allow packets to be dropped according to the needs of the
381	         media coding.  For example, if a MANE has to drop packets due
382	         to congestion on a certain link, it can identify and remove
383	         those packets whose elimination produces the least adverse
384	         effect on the user experience.

386	      static macroblock: A certain number of macroblocks in the video
387	      stream can be defined as static, as defined in section 8.3.2.8 in
388	      [3].  Static macroblocks free up additional processing cycles for
389	      the handling of non-static macroblocks.  Based on a given amount
390	      of video processing resources and a given resolution, a higher
391	      number of static macroblocks enables a correspondingly higher
392	      frame rate.

394	      default sub-profile: The subset of coding tools, which may be all
395	      coding tools of one profile or the common subset of coding tools
396	      of more than one profile, indicated by the profile-level-id
397	      parameter.  In SDP Offer/Answer, the default sub-profile must be
398	      used in a symmetric manner, i.e., the answer must either use the
399	      same sub-profile as the offer or reject the offer.

401	      default level: The level indicated by the profile-level-id
402	      parameter.  In SDP Offer/Answer, level is downgradable, i.e., the
403	      answer may either use the default level or a lower level.
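
   The wrap-around of DON values and the modulo ordering of RTP
   sequence numbers mentioned in the definitions above can be sketched
   as follows.  This is a non-normative illustration.  The function
   names are hypothetical, and the half-range comparison in
   seq_precedes is a common convention assumed here, not a rule taken
   from the definitions above.

```python
MODULUS = 1 << 16  # DON and RTP sequence numbers both span 0..65535


def next_don(don: int) -> int:
    """Return the DON that follows `don`, wrapping from 65535 to 0."""
    return (don + 1) % MODULUS


def seq_precedes(a: int, b: int) -> bool:
    """True if 16-bit value `a` comes before `b` in modulo arithmetic.

    Assumption: the two values are less than half the number space
    (32768) apart, so the shorter way around the circle gives the order.
    """
    return a != b and ((b - a) % MODULUS) < (MODULUS // 2)
```

   With this convention, sequence number 65535 precedes sequence number
   0, which matches the order in which a sender would emit them.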

405	4.2. Abbreviations

407	      DON:        Decoding Order Number
408	      DONB:       Decoding Order Number Base
409	      DOND:       Decoding Order Number Difference
410	      FEC:        Forward Error Correction
411	      FU:         Fragmentation Unit
412	      IDR:        Instantaneous Decoding Refresh
413	      IEC:        International Electrotechnical Commission
414	      ISO:        International Organization for Standardization
415	      ITU-T:      International Telecommunication Union,
416	                  Telecommunication Standardization Sector
417	      MANE:       Media Aware Network Element
418	      MTAP:       Multi-Time Aggregation Packet
419	      MTAP16:     MTAP with 16-bit timestamp offset
420	      MTAP24:     MTAP with 24-bit timestamp offset
421	      NAL:        Network Abstraction Layer
422	      NALU:       NAL Unit
423	      SAR:        Sample Aspect Ratio
424	      SEI:        Supplemental Enhancement Information
425	      STAP:       Single-Time Aggregation Packet
426	      STAP-A:     STAP type A
427	      STAP-B:     STAP type B
428	      TS:         Timestamp
429	      VCL:        Video Coding Layer
430	      VUI:        Video Usability Information

432	5. RTP Payload Format

434	5.1. RTP Header Usage

436	   The format of the RTP header is specified in RFC 3550 [5] and
437	   reprinted in Figure 1 for convenience.  This payload format uses the
438	   fields of the header in a manner consistent with that specification.

440	   When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
441	   payload format is specified in section 5.6.  The RTP payload (and the
442	   settings for some RTP header bits) for aggregation packets and
443	   fragmentation units are specified in sections 5.7 and 5.8,
444	   respectively.

446	    0                   1                   2                   3
447	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
448	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
449	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
450	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
451	   |                           timestamp                           |
452	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
453	   |           synchronization source (SSRC) identifier            |
454	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
455	   |            contributing source (CSRC) identifiers             |
456	   |                             ....                              |
457	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

459	                 Figure 1 RTP header according to RFC 3550
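
   As a non-normative illustration of the layout in Figure 1, the
   following sketch unpacks the fixed header and the CSRC list from a
   received packet.  The names RtpHeader and parse_rtp_header are
   hypothetical; header extensions and padding are not processed here.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-octet fixed RTP header and any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-octet fixed header")
    # Network byte order: two octets, 16-bit SN, 32-bit TS, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError("unsupported RTP version")
    cc = b0 & 0x0F  # CSRC count
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```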

461	   The RTP header information to be set according to this RTP payload
462	   format is set as follows:

464	   Marker bit (M): 1 bit
465	      Set for the very last packet of the access unit indicated by the
466	      RTP timestamp, in line with the normal use of the M bit in video
467	      formats, to allow efficient playout buffer handling.  For
468	      aggregation packets (STAP and MTAP), the marker bit in the RTP
469	      header MUST be set to the value that the marker bit of the last
470	      NAL unit of the aggregation packet would have been if it were
471	      transported in its own RTP packet.  Decoders MAY use this bit as
472	      an early indication of the last packet of an access unit, but
473	      MUST NOT rely on this property.

475	         Informative note: Only one M bit is associated with an
476	         aggregation packet carrying multiple NAL units.  Thus, if a
477	         gateway has re-packetized an aggregation packet into several
478	         packets, it cannot reliably set the M bit of those packets.

480	   Payload type (PT): 7 bits
481	      The assignment of an RTP payload type for this new packet format
482	      is outside the scope of this document and will not be specified
483	      here.  The assignment of a payload type has to be performed
484	      either through the profile used or in a dynamic way.

486	   Sequence number (SN): 16 bits
487	      Set and used in accordance with RFC 3550.  For the single NALU
488	      and non-interleaved packetization mode, the sequence number is
489	      used to determine decoding order for the NALU.

491	   Timestamp: 32 bits
492	      The RTP timestamp is set to the sampling timestamp of the
493	      content.  A 90 kHz clock rate MUST be used.

495	      If the NAL unit has no timing properties of its own (e.g.,
496	      parameter set and SEI NAL units), the RTP timestamp is set to the
497	      RTP timestamp of the primary coded picture of the access unit in
498	      which the NAL unit is included, according to section 7.4.1.2 of
499	      [1].

501	      The setting of the RTP Timestamp for MTAPs is defined in section
502	      5.7.2.

504	      Receivers SHOULD ignore any picture timing SEI messages included
505	      in access units that have only one display timestamp.  Instead,
506	      receivers SHOULD use the RTP timestamp for synchronizing the
507	      display process.

509	      RTP senders SHOULD NOT transmit picture timing SEI messages for
510	      pictures that are not supposed to be displayed as multiple
511	      fields.

513	      If one access unit has more than one display timestamp carried in
514	      a picture timing SEI message, then the information in the SEI
515	      message SHOULD be treated as relative to the RTP timestamp, with
516	      the earliest event occurring at the time given by the RTP
517	      timestamp, and subsequent events later, as given by the
518	      difference in SEI message picture timing values.  Let tSEI1,
519	      tSEI2, ..., tSEIn be the display timestamps carried in the SEI
520	      message of an access unit, where tSEI1 is the earliest of all
521	      such timestamps.  Let tmadjst() be a function that adjusts the
522	      SEI message's time scale to a 90-kHz time scale.  Let TS be the
523	      RTP timestamp.  Then, the display time for the event associated
524	      with tSEI1 is TS.  The display time for the event with tSEIx,
525	      where x is in [2..n], is TS + tmadjst(tSEIx - tSEI1).
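
   Informative example (not part of the specification): the display
   time computation described above can be sketched as follows.  The
   function names, the sei_time_scale argument (SEI clock ticks per
   second), and the wrap-around modulo 2**32 are assumptions made for
   illustration only.

```python
RTP_CLOCK_HZ = 90000  # the 90 kHz clock rate required by this payload format


def tmadjst(sei_ticks, sei_time_scale):
    """Convert a duration in SEI clock ticks to 90 kHz ticks (assumed form)."""
    return (sei_ticks * RTP_CLOCK_HZ) // sei_time_scale


def display_times(ts, t_sei, sei_time_scale):
    """Return the display time of each event in 90 kHz units.

    ts is the RTP timestamp; t_sei lists tSEI1..tSEIn with tSEI1 earliest.
    Results are reduced modulo 2**32 (an assumption of this sketch).
    """
    first = t_sei[0]
    return [(ts + tmadjst(t - first, sei_time_scale)) % (1 << 32)
            for t in t_sei]
```

   For instance, with a hypothetical SEI time scale of 60 ticks per
   second, consecutive SEI timestamps are 1500 RTP ticks apart.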

527	         Informative note: Displaying coded frames as fields is commonly
528	         needed in an operation known as 3:2 pulldown, in which film
529	         content that consists of coded frames is displayed on a
530	         display using interlaced scanning.  The picture timing SEI
531	         message enables carriage of multiple timestamps for the same
532	         coded picture, and therefore the 3:2 pulldown process is
533	         perfectly controlled.  The picture timing SEI message
534	         mechanism is necessary because only one timestamp per coded
535	         frame can be conveyed in the RTP timestamp.

537	         Informative note: Because H.264 allows the decoding order to
538	         be different from the display order, values of RTP timestamps
539	         may not be monotonically non-decreasing as a function of RTP
540	         sequence numbers.  Furthermore, the value for inter-arrival
541	         jitter reported in the RTCP reports may not be a trustworthy
542	         indication of the network performance, as the calculation
543	         rules for inter-arrival jitter (section 6.4.1 of RFC 3550)
544	         assume that the RTP timestamp of a packet is directly
545	         proportional to its transmission time.

547	5.2. Payload Structures

549	   The payload format defines three different basic payload structures.
550	   A receiver can identify the payload structure by the first byte of
551	   the RTP packet payload, which co-serves as the RTP payload header
552	   and, in some cases, as the first byte of the payload.  This byte is
553	   always structured as a NAL unit header.  The NAL unit type field
554	   indicates which structure is present.  The possible structures are as
555	   follows:

557	   Single NAL Unit Packet: Contains only a single NAL unit in the
558	   payload.  The NAL header type field will be equal to the original NAL
559	   unit type; i.e., in the range of 1 to 23, inclusive.  Specified in
560	   section 5.6.

562	   Aggregation Packet: Packet type used to aggregate multiple NAL units
563	   into a single RTP payload.  This packet exists in four versions, the
564	   Single-Time Aggregation Packet type A (STAP-A), the Single-Time
565	   Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet
566	   (MTAP) with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet
567	   (MTAP) with 24-bit offset (MTAP24).  The NAL unit type numbers
568	   assigned for STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and
569	   27, respectively.  Specified in section 5.7.

571	   Fragmentation Unit: Used to fragment a single NAL unit over multiple
572	   RTP packets.  Exists with two versions, FU-A and FU-B, identified
573	   with the NAL unit type numbers 28 and 29, respectively.  Specified in
574	   section 5.8.

576	      Informative note: This specification does not limit the size of
577	      NAL units encapsulated in single NAL unit packets and
578	      fragmentation units.  The maximum size of a NAL unit encapsulated
579	      in any aggregation packet is 65535 bytes.

581	   Table 1 summarizes NAL unit types and the corresponding RTP packet
582	   types when each of these NAL units is directly used as a packet
583	   payload, and where the types are described in this memo.

585	     Table 1.  Summary of NAL unit types and the corresponding packet
586	                                   types

588	      NAL Unit  Packet    Packet Type Name               Section
589	      Type      Type
590	      ---------------------------------------------------------
591	      0        reserved                                     -
592	      1-23     NAL unit  Single NAL unit packet             5.6
593	      24       STAP-A    Single-time aggregation packet     5.7.1
594	      25       STAP-B    Single-time aggregation packet     5.7.1
595	      26       MTAP16    Multi-time aggregation packet      5.7.2
596	      27       MTAP24    Multi-time aggregation packet      5.7.2
597	      28       FU-A      Fragmentation unit                 5.8
598	      29       FU-B      Fragmentation unit                 5.8
599	      30-31    reserved                                     -
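
   Informative example (not part of the specification): a receiver
   could map the Type field of the first payload byte to the packet
   types of Table 1 along the following lines.  The function names are
   hypothetical.

```python
def packet_type(nal_unit_type):
    """Map a 5-bit NAL unit type value to the packet type name of Table 1."""
    if not 0 <= nal_unit_type <= 31:
        raise ValueError("NAL unit type is a 5-bit field")
    if 1 <= nal_unit_type <= 23:
        return "NAL unit"
    named = {24: "STAP-A", 25: "STAP-B", 26: "MTAP16",
             27: "MTAP24", 28: "FU-A", 29: "FU-B"}
    return named.get(nal_unit_type, "reserved")


def payload_structure(payload):
    """Classify an RTP payload (bytes) by the Type field of its first byte."""
    return packet_type(payload[0] & 0x1F)
```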

601	5.3. NAL Unit Header Usage

603	   The structure and semantics of the NAL unit header were introduced in
604	   section 1.3.  For convenience, the format of the NAL unit header is
605	   reprinted below:

607	      +---------------+
608	      |0|1|2|3|4|5|6|7|
609	      +-+-+-+-+-+-+-+-+
610	      |F|NRI|  Type   |
611	      +---------------+

613	   This section specifies the semantics of F and NRI according to this
614	   specification.

616	   F: 1 bit
617	      forbidden_zero_bit.  A value of 0 indicates that the NAL unit
618	      type octet and payload should not contain bit errors or other
619	      syntax violations.  A value of 1 indicates that the NAL unit type
620	      octet and payload may contain bit errors or other syntax
621	      violations.

623	      MANEs SHOULD set the F bit to indicate detected bit errors in the
624	      NAL unit.  The H.264 specification requires that the F bit is
625	      equal to 0.  When the F bit is set, the decoder is advised that
626	      bit errors or any other syntax violations may be present in the
627	      payload or in the NAL unit type octet.  The simplest decoder
628	      reaction to a NAL unit in which the F bit is equal to 1 is to
629	      discard such a NAL unit and to conceal the lost data in the
630	      discarded NAL unit.

632	   NRI: 2 bits
633	      nal_ref_idc.  The semantics of value 00 and a non-zero value
634	      remain unchanged from the H.264 specification.  In other words, a
635	      value of 00 indicates that the content of the NAL unit is not
636	      used to reconstruct reference pictures for inter picture
637	      prediction. Such NAL units can be discarded without risking the
638	      integrity of the reference pictures.  Values greater than 00
639	      indicate that the decoding of the NAL unit is required to
640	      maintain the integrity of the reference pictures.

642	      In addition to the specification above, according to this RTP
643	      payload specification, values of NRI indicate the relative
644	      transport priority, as determined by the encoder.  MANEs can use
645	      this information to protect more important NAL units better than
646	      they do less important NAL units.  The highest transport priority
647	      is 11, followed by 10, and then by 01; finally, 00 is the lowest.

649	         Informative note: Any non-zero value of NRI is handled
650	         identically in H.264 decoders.  Therefore, receivers need not
651	         manipulate the value of NRI when passing NAL units to the
652	         decoder.
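
   Informative example (not part of the specification): the header
   fields can be extracted with simple shifts and masks, and the NRI
   value can then drive a priority ordering such as a MANE might use.
   Both function names are hypothetical, and the sort is only one
   possible way to act on the transport priority.

```python
def parse_nal_header(octet):
    """Split a NAL unit header octet into (F, NRI, Type)."""
    f = (octet >> 7) & 0x01
    nri = (octet >> 5) & 0x03
    nal_type = octet & 0x1F
    return f, nri, nal_type


def by_transport_priority(nal_units):
    """Order NAL units (bytes objects) from highest to lowest NRI."""
    return sorted(nal_units,
                  key=lambda n: parse_nal_header(n[0])[1],
                  reverse=True)
```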

654	      An H.264 encoder MUST set the value of NRI according to the H.264
655	      specification (subclause 7.4.1) when the value of nal_unit_type
656	      is in the range of 1 to 12, inclusive.  In particular, the H.264
657	      specification requires that the value of NRI SHALL be equal to 0
658	      for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or
659	      12.

661	      For NAL units having nal_unit_type equal to 7 or 8 (indicating a
662	      sequence parameter set or a picture parameter set, respectively),
663	      an H.264 encoder SHOULD set the value of NRI to 11 (in binary
664	      format).  For coded slice NAL units of a primary coded picture
665	      having nal_unit_type equal to 5 (indicating a coded slice
666	      belonging to an IDR picture), an H.264 encoder SHOULD set the
667	      value of NRI to 11 (in binary format).

669	      For a mapping of the remaining nal_unit_types to NRI values, the
670	      following example MAY be used and has been shown to be efficient
671	      in a certain environment [14].  Other mappings MAY also be
672	      desirable, depending on the application and the H.264/AVC Annex A
673	      profile in use.

675	         Informative note: Data Partitioning is not available in
676	         certain profiles; e.g., in the Main or Baseline profiles.
677	         Consequently, the NAL unit types 2, 3, and 4 can occur only if
678	         the video bitstream conforms to a profile in which data
679	         partitioning is allowed and not in streams that conform to the
680	         Main or Baseline profiles.

682	   Table 2.  Example of NRI values for coded slices and coded slice data
683	              partitions of primary coded reference pictures

685	      NAL Unit Type     Content of NAL unit              NRI (binary)
686	      ----------------------------------------------------------------
687	       1              non-IDR coded slice                         10
688	       2              Coded slice data partition A                10
689	       3              Coded slice data partition B                01
690	       4              Coded slice data partition C                01

692	         Informative note: As mentioned before, the NRI value of non-
693	         reference pictures is 00 as mandated by H.264/AVC.

695	      An H.264 encoder SHOULD set the value of NRI for coded slice and
696	      coded slice data partition NAL units of redundant coded reference
697	      pictures equal to 01 (in binary format).
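
   Informative example (not part of the specification): the NRI
   guidance given so far in this section, including the example mapping
   of Table 2, can be collected into one helper.  The function name and
   the reference/redundant flags are hypothetical, and other mappings
   may suit other applications better.

```python
def example_nri(nal_unit_type, reference=True, redundant=False):
    """Return an NRI value (0..3) following the guidance in this section.

    Returns None for types not covered by this sketch.
    """
    if nal_unit_type in (6, 9, 10, 11, 12):
        return 0b00  # required by the H.264 specification
    if nal_unit_type in (7, 8):
        return 0b11  # sequence and picture parameter sets
    if nal_unit_type == 5:
        return 0b01 if redundant else 0b11  # IDR slices
    if nal_unit_type in (1, 2, 3, 4):
        if not reference:
            return 0b00  # non-reference pictures
        if redundant:
            return 0b01  # redundant coded reference pictures
        return {1: 0b10, 2: 0b10, 3: 0b01, 4: 0b01}[nal_unit_type]  # Table 2
    return None
```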

699	      Definitions of the values for NRI for NAL unit types 24 to 29,
700	      inclusive, are given in sections 5.7 and 5.8 of this memo.

702	      No recommendation for the value of NRI is given for NAL units
703	      having nal_unit_type in the range of 13 to 23, inclusive, because
704	      these values are reserved for ITU-T and ISO/IEC.  No
705	      recommendation for the value of NRI is given for NAL units having
706	      nal_unit_type equal to 0 or in the range of 30 to 31, inclusive,
707	      as the semantics of these values are not specified in this memo.

709	5.4. Packetization Modes

711	   This memo specifies three cases of packetization modes:

713	   o  Single NAL unit mode

715	   o  Non-interleaved mode

717	   o  Interleaved mode

719	   The single NAL unit mode is targeted for conversational systems that
720	   comply with ITU-T Recommendation H.241 [3]  (see section 12.1).  The
721	   non-interleaved mode is targeted for conversational systems that may
722	   not comply with ITU-T Recommendation H.241.  In the non-interleaved
723	   mode, NAL units are transmitted in NAL unit decoding order.  The
724	   interleaved mode is targeted for systems that do not require very low
725	   end-to-end latency.  The interleaved mode allows transmission of NAL
726	   units out of NAL unit decoding order.

728	   The packetization mode in use MAY be signaled by the value of the
729	   OPTIONAL packetization-mode media type parameter.  The selected
730	   packetization mode governs which NAL unit types are allowed in RTP
731	   payloads.  Table 3 summarizes the allowed packet payload types for
732	   each packetization mode.  Packetization modes are explained in more
733	   detail in section 6.

735	    Table 3.  Summary of allowed NAL unit types for each packetization
736	            mode (yes = allowed, no = disallowed, ig = ignore)

738	      Payload Packet    Single NAL    Non-Interleaved    Interleaved
739	      Type    Type      Unit Mode           Mode             Mode
740	      -------------------------------------------------------------
741	      0      reserved      ig               ig               ig
742	      1-23   NAL unit     yes              yes               no
743	      24     STAP-A        no              yes               no
744	      25     STAP-B        no               no              yes
745	      26     MTAP16        no               no              yes
746	      27     MTAP24        no               no              yes
747	      28     FU-A          no              yes              yes
748	      29     FU-B          no               no              yes
749	      30-31  reserved      ig               ig               ig

751	   Some NAL unit or payload type values (indicated as reserved in
752	   Table 3) are reserved for future extensions.  NAL units of those
753	   types SHOULD NOT be sent by a sender (direct as packet payloads, or
754	   as aggregation units in aggregation packets, or as fragmented units
755	   in FU packets) and MUST be ignored by a receiver.  For example, the
756	   payload types 1-23, with the associated packet type "NAL unit", are
757	   allowed in "Single NAL Unit Mode" and in "Non-Interleaved Mode", but
758	   disallowed in "Interleaved Mode".  However, NAL units of NAL unit
759	   types 1-23 can be used in "Interleaved Mode" as aggregation units in
760	   STAP-B, MTAP16 and MTAP24 packets as well as fragmented units in FU-A
761	   and FU-B packets.  Similarly, NAL units of NAL unit types 1-23 can
762	   also be used in the "Non-Interleaved Mode" as aggregation units in
763	   STAP-A packets or fragmented units in FU-A packets, in addition to
764	   being directly used as packet payloads.
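
   Informative example (not part of the specification): Table 3 can be
   expressed as a lookup that a receiver consults before processing a
   payload.  The mode constants and function name are hypothetical
   labels chosen for this sketch.  The check applies only to the type
   of the RTP payload itself, not to NAL units carried inside
   aggregation packets or fragmentation units.

```python
SINGLE_NAL, NON_INTERLEAVED, INTERLEAVED = 0, 1, 2  # labels for this sketch

# Payload types permitted as direct RTP payloads in each mode (Table 3).
ALLOWED = {
    SINGLE_NAL: set(range(1, 24)),
    NON_INTERLEAVED: set(range(1, 24)) | {24, 28},
    INTERLEAVED: {25, 26, 27, 28, 29},
}


def check_payload_type(mode, payload_type):
    """Return "yes", "no", or "ig" (ignore) as in Table 3."""
    if payload_type in (0, 30, 31):
        return "ig"
    return "yes" if payload_type in ALLOWED[mode] else "no"
```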

766	5.5. Decoding Order Number (DON)

768	   In the interleaved packetization mode, the transmission order of NAL
769	   units is allowed to differ from the decoding order of the NAL units.
770	   Decoding order number (DON) is a field in the payload structure or a
771	   derived variable that indicates the NAL unit decoding order.

773	   Rationale and examples of use cases for transmission out of decoding
774	   order and for the use of DON are given in section 13.

776	   The coupling of transmission and decoding order is controlled by the
777	   OPTIONAL sprop-interleaving-depth media type parameter as follows.
778	   When the value of the OPTIONAL sprop-interleaving-depth media type
779	   parameter is equal to 0 (explicitly or per default), the transmission
780	   order of NAL units MUST conform to the NAL unit decoding order.  When
781	   the value of the OPTIONAL sprop-interleaving-depth media type
782	   parameter is greater than 0,

784	   o  the order of NAL units in an MTAP16 and an MTAP24 is not required
785	      to be the NAL unit decoding order, and

787	   o  the order of NAL units generated by de-packetizing STAP-Bs, MTAPs,
788	      and FUs in two consecutive packets is not required to be the NAL
789	      unit decoding order.

791	   The RTP payload structures for a single NAL unit packet, an STAP-A,
792	   and an FU-A do not include DON.  STAP-B and FU-B structures include
793	   DON, and the structure of MTAPs enables derivation of DON as
794	   specified in section 5.7.2.

796	      Informative note: When an FU-A occurs in interleaved mode, it
797	      always follows an FU-B, which sets its DON.

799	      Informative note: If a transmitter wants to encapsulate a single
800	      NAL unit per packet and transmit packets out of their decoding
801	      order, STAP-B packet type can be used.

803	   In the single NAL unit packetization mode, the transmission order of
804	   NAL units, determined by the RTP sequence number, MUST be the same as
805	   their NAL unit decoding order.  In the non-interleaved packetization
806	   mode, the transmission order of NAL units in single NAL unit packets,
807	   STAP-As, and FU-As MUST be the same as their NAL unit decoding order.
808	   The NAL units within an STAP MUST appear in the NAL unit decoding
809	   order.  Thus, the decoding order is first provided through the
810	   implicit order within a STAP, and second provided through the RTP
811	   sequence number for the order between STAPs, FUs, and single NAL unit
812	   packets.

814	   Signaling of the value of DON for NAL units carried in STAP-B, MTAP,
815	   and a series of fragmentation units starting with an FU-B is
816	   specified in sections 5.7.1, 5.7.2, and 5.8, respectively.  The DON
817	   value of the first NAL unit in transmission order MAY be set to any
818	   value.  Values of DON are in the range of 0 to 65535, inclusive.
819	   After reaching the maximum value, the value of DON wraps around to 0.

821	   The decoding order of two NAL units contained in any STAP-B, MTAP, or
822	   a series of fragmentation units starting with an FU-B is determined
823	   as follows.  Let DON(i) be the decoding order number of the NAL unit
824	   having index i in the transmission order.  Function don_diff(m,n) is
825	   specified as follows:

827	         If DON(m) == DON(n), don_diff(m,n) = 0

829	         If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
830	         don_diff(m,n) = DON(n) - DON(m)

832	         If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
833	         don_diff(m,n) = 65536 - DON(m) + DON(n)

835	         If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
836	         don_diff(m,n) = - (DON(m) + 65536 - DON(n))

838	         If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
839	         don_diff(m,n) = - (DON(m) - DON(n))

841	   A positive value of don_diff(m,n) indicates that the NAL unit having
842	   transmission order index n follows, in decoding order, the NAL unit
843	   having transmission order index m.  When don_diff(m,n) is equal to 0,
844	   then the NAL unit decoding order of the two NAL units can be in
845	   either order.  A negative value of don_diff(m,n) indicates that the
846	   NAL unit having transmission order index n precedes, in decoding
847	   order, the NAL unit having transmission order index m.
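
   Informative example (not part of the specification): the five cases
   of don_diff(m,n) translate directly into code.  For brevity, this
   sketch takes the two DON values themselves as arguments rather than
   transmission order indices; the argument names are hypothetical.

```python
def don_diff(don_m, don_n):
    """Signed decoding-order distance from DON(m) to DON(n)."""
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)
    # Remaining case: don_m > don_n and don_m - don_n < 32768.
    return -(don_m - don_n)
```

   A wrap from 65535 to 0 therefore yields a difference of +1, so the
   NAL unit with DON 0 follows the NAL unit with DON 65535 in decoding
   order.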

849	   Values of DON related fields (DON, DONB, and DOND; see section 5.7)
850	   MUST be such that the decoding order determined by the values of DON,
851	   as specified above, conforms to the NAL unit decoding order.  If the
852	   order of two NAL units in NAL unit decoding order is switched and the
853	   new order does not conform to the NAL unit decoding order, the NAL
854	   units MUST NOT have the same value of DON.  If the order of two
855	   consecutive NAL units in the NAL unit stream is switched and the new
856	   order still conforms to the NAL unit decoding order, the NAL units
857	   MAY have the same value of DON.  For example, when arbitrary slice
858	   order is allowed by the video coding profile in use, all the coded
859	   slice NAL units of a coded picture are allowed to have the same value
860	   of DON.  Consequently, NAL units having the same value of DON can be
861	   decoded in any order, and two NAL units having a different value of
862	   DON should be passed to the decoder in the order specified above.
863	   When two consecutive NAL units in the NAL unit decoding order have a
864	   different value of DON, the value of DON for the second NAL unit in
865	   decoding order SHOULD be the value of DON for the first, incremented
866	   by one.

868	   An example of the de-packetization process to recover the NAL unit
869	   decoding order is given in section 7.

871	      Informative note: Receivers should not expect that the absolute
872	      difference of values of DON for two consecutive NAL units in the
873	      NAL unit decoding order will be equal to one, even in error-free
874	      transmission.  An increment by one is not required, as at the
875	      time of associating values of DON to NAL units, it may not be
876	      known whether all NAL units are delivered to the receiver.  For
877	      example, a gateway may not forward coded slice NAL units of non-
878	      reference pictures or SEI NAL units when there is a shortage of
879	      bit rate in the network to which the packets are forwarded.  In
880	      another example, a live broadcast is interrupted by pre-encoded
881	      content, such as commercials, from time to time.  The first intra
882	      picture of a pre-encoded clip is transmitted in advance to ensure
883	      that it is readily available in the receiver.  When transmitting
884	      the first intra picture, the originator does not exactly know how
885	      many NAL units will be encoded before the first intra picture of
886	      the pre-encoded clip follows in decoding order.  Thus, the values
887	      of DON for the NAL units of the first intra picture of the pre-
888	      encoded clip have to be estimated when they are transmitted, and
889	      gaps in values of DON may occur.

891	5.6. Single NAL Unit Packet

893	   The single NAL unit packet defined here MUST contain only one NAL
894	   unit, of the types defined in [1].  This means that neither an
895	   aggregation packet nor a fragmentation unit can be used within a
896	   single NAL unit packet.  A NAL unit stream composed by de-packetizing
897	   single NAL unit packets in RTP sequence number order MUST conform to
898	   the NAL unit decoding order.  The structure of the single NAL unit
899	   packet is shown in Figure 2.

901	      Informative note: The first byte of a NAL unit co-serves as the
902	      RTP payload header.

904	    0                   1                   2                   3
905	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
906	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
907	   |F|NRI|  Type   |                                               |
908	   +-+-+-+-+-+-+-+-+                                               |
909	   |                                                               |
910	   |               Bytes 2..n of a Single NAL unit                 |
911	   |                                                               |
912	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
913	   |                               :...OPTIONAL RTP padding        |
914	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

916	          Figure 2 RTP payload format for single NAL unit packet

918	5.7. Aggregation Packets

920	   Aggregation packets are the NAL unit aggregation scheme of this
921	   payload specification.  The scheme is introduced to reflect the
922	   dramatically different MTU sizes of two key target networks: wireline
923	   IP networks (with an MTU size that is often limited by the Ethernet
924	   MTU size; roughly 1500 bytes), and IP or non-IP (e.g., ITU-T H.324/M)
925	   based wireless communication systems with preferred transmission unit
926	   sizes of 254 bytes or less.  To prevent media transcoding between the
927	   two worlds, and to avoid undesirable packetization overhead, a NAL
928	   unit aggregation scheme is introduced.

930	   Two types of aggregation packets are defined by this specification:

932	   o  Single-time aggregation packet (STAP): aggregates NAL units with
933	      identical NALU-time.  Two types of STAPs are defined, one without
934	      DON (STAP-A) and another including DON (STAP-B).

936	   o  Multi-time aggregation packet (MTAP): aggregates NAL units with
937	      potentially differing NALU-time.  Two different MTAPs are defined,
938	      differing in the length of the NAL unit timestamp offset.

940	   Each NAL unit to be carried in an aggregation packet is encapsulated
941	   in an aggregation unit.  Please see below for the four different
942	   aggregation units and their characteristics.

944	   The structure of the RTP payload format for aggregation packets is
945	   presented in Figure 3.

947	    0                   1                   2                   3
948	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
949	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
950	   |F|NRI|  Type   |                                               |
951	   +-+-+-+-+-+-+-+-+                                               |
952	   |                                                               |
953	   |             one or more aggregation units                     |
954	   |                                                               |
955	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
956	   |                               :...OPTIONAL RTP padding        |
957	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

959	            Figure 3 RTP payload format for aggregation packets

961	   MTAPs and STAPs share the following packetization rules:  The RTP
962	   timestamp MUST be set to the earliest of the NALU-times of all the
963	   NAL units to be aggregated.  The type field of the NAL unit type
964	   octet MUST be set to the appropriate value, as indicated in Table 4.
965	   The F bit MUST be cleared if all F bits of the aggregated NAL units
966	   are zero; otherwise, it MUST be set.  The value of NRI MUST be
967	   the maximum NRI of all NAL units carried in the aggregation packet.

969	                 Table 4.  Type field for STAPs and MTAPs

971	      Type   Packet    Timestamp offset   DON related fields
972	                       field length       (DON, DONB, DOND)
973	                       (in bits)          present
974	      --------------------------------------------------------
975	      24     STAP-A       0                 no
976	      25     STAP-B       0                 yes
977	      26     MTAP16      16                 yes
978	      27     MTAP24      24                 yes
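
   Informative example (not part of the specification): the rules for
   the F bit, NRI, and Type field of an aggregation packet can be
   sketched as follows.  The function name and the dictionary of type
   names are hypothetical; NAL units are assumed to be bytes objects
   that start with their NAL unit header octet.

```python
AGGREGATION_TYPES = {"STAP-A": 24, "STAP-B": 25, "MTAP16": 26, "MTAP24": 27}


def aggregation_header(kind, nal_units):
    """Build the NAL unit type octet of an aggregation packet.

    F is set if any aggregated NAL unit has F set; NRI is the maximum NRI.
    """
    if not nal_units:
        raise ValueError("an aggregation packet carries at least one NAL unit")
    f = max((n[0] >> 7) & 0x01 for n in nal_units)
    nri = max((n[0] >> 5) & 0x03 for n in nal_units)
    return (f << 7) | (nri << 5) | AGGREGATION_TYPES[kind]
```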

980	   The marker bit in the RTP header is set to the value that the marker
981	   bit of the last NAL unit of the aggregation packet would have if it
982	   were transported in its own RTP packet.

984	   The payload of an aggregation packet consists of one or more
985	   aggregation units.  See sections 5.7.1 and 5.7.2 for the four
986	   different types of aggregation units.  An aggregation packet can
987	   carry as many aggregation units as necessary; however, the total
988	   amount of data in an aggregation packet obviously MUST fit into an IP
989	   packet, and the size SHOULD be chosen so that the resulting IP packet
990	   is smaller than the MTU size.  An aggregation packet MUST NOT contain
991	   fragmentation units specified in section 5.8.  Aggregation packets
992	   MUST NOT be nested; i.e., an aggregation packet MUST NOT contain
993	   another aggregation packet.

995	5.7.1. Single-Time Aggregation Packet

997	   Single-time aggregation packet (STAP) SHOULD be used whenever NAL
998	   units are aggregated that all share the same NALU-time.  The payload
999	   of an STAP-A does not include DON and consists of at least one
1000	   single-time aggregation unit, as presented in Figure 4.  The payload
1001	   of an STAP-B consists of a 16-bit unsigned decoding order number
1002	   (DON) (in network byte order) followed by at least one single-time
1003	   aggregation unit, as presented in Figure 5.

1005	    0                   1                   2                   3
1006	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1007	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1008	                   :                                               |
1009	   +-+-+-+-+-+-+-+-+                                               |
1010	   |                                                               |
1011	   |                single-time aggregation units                  |
1012	   |                                                               |
1013	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1014	   |                               :
1015	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1017	                    Figure 4 Payload format for STAP-A

1019	    0                   1                   2                   3
1020	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1021	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1022	                   :  decoding order number (DON)  |               |
1023	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1024	   |                                                               |
1025	   |                single-time aggregation units                  |
1026	   |                                                               |
1027	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1028	   |                               :
1029	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1031	                    Figure 5 Payload format for STAP-B

1033	   The DON field specifies the value of DON for the first NAL unit in an
1034	   STAP-B in transmission order.  For each successive NAL unit in
1035	   appearance order in an STAP-B, the value of DON is equal to (the
1036	   value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
1037	   which '%' stands for the modulo operation.

1039	   A single-time aggregation unit consists of 16-bit unsigned size
1040	   information (in network byte order) that indicates the size of the
1041	   following NAL unit in bytes (excluding these two octets, but
1042	   including the NAL unit type octet of the NAL unit), followed by the
1043	   NAL unit itself, including its NAL unit type byte.  A single-time
1044	   aggregation unit is byte aligned within the RTP payload, but it may
1045	   not be aligned on a 32-bit word boundary.  Figure 6 presents the
1046	   structure of the single-time aggregation unit.

1048	    0                   1                   2                   3
1049	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1050	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1051	                   :        NAL unit size          |               |
1052	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1053	   |                                                               |
1054	   |                           NAL unit                            |
1055	   |                                                               |
1056	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1057	   |                               :
1058	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1060	            Figure 6 Structure for single-time aggregation unit
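
   Informative example (not part of the specification): a receiver
   could split a run of single-time aggregation units into NAL units as
   sketched here.  The function name is hypothetical, and rejecting a
   size of zero is a choice of this sketch, made because every NAL unit
   includes at least its type octet.

```python
import struct


def parse_aggregation_units(data):
    """Split a run of single-time aggregation units into NAL units.

    data holds the bytes after the STAP-A header octet (or after the DON
    field of an STAP-B), with any RTP padding already removed.
    """
    nal_units = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", data, pos)  # network byte order
        pos += 2
        if size == 0 or pos + size > len(data):
            raise ValueError("invalid NAL unit size")
        nal_units.append(data[pos:pos + size])
        pos += size
    return nal_units
```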

1062	   Figure 7 presents an example of an RTP packet that contains an STAP-
1063	   A.  The STAP contains two single-time aggregation units, labeled as 1
1064	   and 2 in the figure.

1066	    0                   1                   2                   3
1067	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1068	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1069	   |                          RTP Header                           |
1070	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1071	   |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
1072	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1073	   |                         NALU 1 Data                           |
1074	   :                                                               :
1075	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1076	   |               | NALU 2 Size                   | NALU 2 HDR    |
1077	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1078	   |                         NALU 2 Data                           |
1079	   :                                                               :
1080	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1081	   |                               :...OPTIONAL RTP padding        |
1082	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1084	    Figure 7 An example of an RTP packet including an STAP-A containing
1085	                     two single-time aggregation units

1087	   Figure 8 presents an example of an RTP packet that contains an STAP-
1088	   B.  The STAP contains two single-time aggregation units, labeled as 1
1089	   and 2 in the figure.

1091	    0                   1                   2                   3
1092	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1093	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1094	   |                          RTP Header                           |
1095	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1096	   |STAP-B NAL HDR | DON                           | NALU 1 Size   |
1097	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1098	   | NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
1099	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1100	   :                                                               :
1101	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1102	   |               | NALU 2 Size                   | NALU 2 HDR    |
1103	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1104	   |                       NALU 2 Data                             |
1105	   :                                                               :
1106	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1107	   |                               :...OPTIONAL RTP padding        |
1108	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1110	    Figure 8 An example of an RTP packet including an STAP-B containing
1111	                     two single-time aggregation units
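
   Informative sketch: the STAP layouts in Figures 7 and 8 can be
   walked with a simple loop over the 16-bit size fields.  The Python
   fragment below is an illustration only, not part of the payload
   format.  The function name parse_stap is hypothetical, the type
   values 24 (STAP-A) and 25 (STAP-B) are assumed from the payload
   format's NAL unit type table, and the caller is assumed to have
   already removed the RTP header and any RTP padding.

```python
import struct

STAP_A = 24  # assumed NAL unit type value for STAP-A
STAP_B = 25  # assumed NAL unit type value for STAP-B


def parse_stap(payload: bytes):
    """Split an STAP-A or STAP-B payload into (don, [nal_units]).

    `don` is the 16-bit DON carried by an STAP-B, or None for an STAP-A.
    """
    if not payload:
        raise ValueError("empty payload")
    nal_type = payload[0] & 0x1F  # low 5 bits of the STAP NAL header
    pos = 1
    don = None
    if nal_type == STAP_B:
        if len(payload) < 3:
            raise ValueError("truncated DON")
        (don,) = struct.unpack_from("!H", payload, pos)  # network byte order
        pos += 2
    elif nal_type != STAP_A:
        raise ValueError("not an STAP")

    nal_units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if pos + size > len(payload):
            raise ValueError("NAL unit size exceeds payload")
        # The NAL unit includes its own one-octet NAL unit header.
        nal_units.append(payload[pos:pos + size])
        pos += size
    return don, nal_units
```

   A receiver would typically reject the whole packet when any size
   field runs past the end of the payload, as the sketch does.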

1113	5.7.2. Multi-Time Aggregation Packets (MTAPs)

1115	   The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
1116	   order number base (DONB) (in network byte order) and one or more
1117	   multi-time aggregation units, as presented in Figure 9.  DONB MUST
1118	   contain the value of DON for the first NAL unit in the NAL unit
1119	   decoding order among the NAL units of the MTAP.

1121	      Informative note: The first NAL unit in the NAL unit decoding
1122	      order is not necessarily the first NAL unit in the order in which
1123	      the NAL units are encapsulated in an MTAP.

1125	    0                   1                   2                   3
1126	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1127	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1128	                   :  decoding order number base   |               |
1129	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1130	   |                                                               |
1131	   |                 multi-time aggregation units                  |
1132	   |                                                               |
1133	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1134	   |                               :
1135	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1137	                Figure 9 NAL unit payload format for MTAPs

1139	   Two different multi-time aggregation units are defined in this
1140	   specification.  Both of them consist of 16-bit unsigned size
1141	   information of the following NAL unit (in network byte order), an 8-
1142	   bit unsigned decoding order number difference (DOND), and n bits (in
1143	   network byte order) of timestamp offset (TS offset) for this NAL
1144	   unit, whereby n can be 16 or 24.  The choice between the different
1145	   MTAP types (MTAP16 and MTAP24) is application dependent: the larger
1146	   the timestamp offset is, the higher the flexibility of the MTAP, but
1147	   the overhead is also higher.

1149	   The structures of the multi-time aggregation units for MTAP16 and
1150	   MTAP24 are presented in Figures 10 and 11, respectively.  The
1151	   starting or ending position of an aggregation unit within a packet is
1152	   NOT REQUIRED to be on a 32-bit word boundary.  The DON of the NAL
1153	   unit contained in a multi-time aggregation unit is equal to (DONB +
1154	   DOND) % 65536, in which % denotes the modulo operation.  This memo
1155	   does not specify how the NAL units within an MTAP are ordered, but,
1156	   in most cases, NAL unit decoding order SHOULD be used.

1158	   The timestamp offset field MUST be set to a value equal to the value
1159	   of the following formula: If the NALU-time is larger than or equal to
1160	   the RTP timestamp of the packet, then the timestamp offset equals
1161	   (the NALU-time of the NAL unit - the RTP timestamp of the packet).
1162	   If the NALU-time is smaller than the RTP timestamp of the packet,
1163	   then the timestamp offset is equal to the NALU-time + (2^32 - the RTP
1164	   timestamp of the packet).
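
   Informative sketch: the two cases of the formula above amount to a
   subtraction modulo 2^32.  The Python fragment below shows both
   forms side by side.  The function names are hypothetical, and the
   inputs are assumed to be 32-bit RTP timestamp values held as
   non-negative integers.

```python
def ts_offset(nalu_time: int, rtp_timestamp: int) -> int:
    """Timestamp offset following the two-case rule in the text."""
    if nalu_time >= rtp_timestamp:
        return nalu_time - rtp_timestamp
    # NALU-time has wrapped around relative to the packet timestamp.
    return nalu_time + (2**32 - rtp_timestamp)


def ts_offset_mod(nalu_time: int, rtp_timestamp: int) -> int:
    """The same value written as one modulo-2^32 subtraction."""
    return (nalu_time - rtp_timestamp) % 2**32


def fits_field(offset: int, n_bits: int) -> bool:
    """True if the offset can be carried in an n-bit TS offset field."""
    return 0 <= offset < 2**n_bits
```

   A sender could use a check such as fits_field to decide whether a
   16-bit or a 24-bit TS offset field is sufficient for a given NAL
   unit.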

1166	    0                   1                   2                   3
1167	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1168	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1169	   :        NAL unit size          |      DOND     |  TS offset    |
1170	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1171	   |  TS offset    |                                               |
1172	   +-+-+-+-+-+-+-+-+              NAL unit                         |
1173	   |                                                               |
1174	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1175	   |                               :
1176	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1178	             Figure 10  Multi-time aggregation unit for MTAP16

1180	    0                   1                   2                   3
1181	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1182	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1183	   :        NAL unit size          |      DOND     |  TS offset    |
1184	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1185	   |         TS offset             |                               |
1186	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1187	   |                              NAL unit                         |
1188	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1189	   |                               :
1190	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1192	             Figure 11  Multi-time aggregation unit for MTAP24
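
   Informative sketch: the Python fragment below walks the
   aggregation units of an MTAP16 or MTAP24 and derives the DON and
   NALU-time of each NAL unit.  It is an illustration only.  The
   function name parse_mtap is hypothetical, and the type values 26
   (MTAP16) and 27 (MTAP24) are assumed from the payload format's NAL
   unit type table.  The sketch further assumes that the size field
   counts only the octets of the NAL unit itself, not the DOND and TS
   offset fields; implementations should verify this reading before
   relying on it.

```python
import struct

MTAP16 = 26  # assumed NAL unit type value for MTAP16
MTAP24 = 27  # assumed NAL unit type value for MTAP24


def parse_mtap(payload: bytes, rtp_timestamp: int):
    """Return a list of (don, nalu_time, nal_unit) tuples for one MTAP."""
    if len(payload) < 3:
        raise ValueError("truncated MTAP")
    nal_type = payload[0] & 0x1F
    if nal_type == MTAP16:
        ts_len = 2  # 16-bit TS offset
    elif nal_type == MTAP24:
        ts_len = 3  # 24-bit TS offset
    else:
        raise ValueError("not an MTAP")

    (donb,) = struct.unpack_from("!H", payload, 1)
    pos = 3
    units = []
    while pos < len(payload):
        unit_hdr_len = 2 + 1 + ts_len  # size + DOND + TS offset
        if pos + unit_hdr_len > len(payload):
            raise ValueError("truncated aggregation unit header")
        (size,) = struct.unpack_from("!H", payload, pos)
        dond = payload[pos + 2]
        ts_off = int.from_bytes(payload[pos + 3:pos + 3 + ts_len], "big")
        pos += unit_hdr_len
        # Assumption: `size` covers the NAL unit octets only.
        if pos + size > len(payload):
            raise ValueError("NAL unit size exceeds payload")
        don = (donb + dond) % 65536
        nalu_time = (rtp_timestamp + ts_off) % 2**32
        units.append((don, nalu_time, payload[pos:pos + size]))
        pos += size
    return units
```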

1194	   For the "earliest" multi-time aggregation unit in an MTAP the
1195	   timestamp offset MUST be zero.  Hence, the RTP timestamp of the MTAP
1196	   itself is identical to the earliest NALU-time.

1198	      Informative note: The "earliest" multi-time aggregation unit is
1199	      the one that would have the smallest extended RTP timestamp among
1200	      all the aggregation units of an MTAP if the NAL units contained
1201	      in the aggregation units were encapsulated in single NAL unit
1202	      packets.  An extended timestamp is a timestamp that has more than
1203	      32 bits and is capable of counting the wraparound of the
1204	      timestamp field, thus enabling one to determine the smallest
1205	      value if the timestamp wraps.  Such an "earliest" aggregation
1206	      unit may not be the first one in the order in which the
1207	      aggregation units are encapsulated in an MTAP.  The "earliest"
1208	      NAL unit need not be the same as the first NAL unit in the NAL
1209	      unit decoding order either.

1211	   Figure 12 presents an example of an RTP packet that contains a multi-
1212	   time aggregation packet of type MTAP16 that contains two multi-time
1213	   aggregation units, labeled as 1 and 2 in the figure.

1215	    0                   1                   2                   3
1216	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1217	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1218	   |                          RTP Header                           |
1219	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1220	   |MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
1221	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1222	   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
1223	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1224	   |  NALU 1 HDR   |  NALU 1 DATA                                  |
1225	   +-+-+-+-+-+-+-+-+                                               +
1226	   :                                                               :
1227	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1228	   |               | NALU 2 SIZE                   |  NALU 2 DOND  |
1229	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1230	   |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
1231	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1232	   :                                                               :
1233	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1234	   |                               :...OPTIONAL RTP padding        |
1235	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1237	   Figure 12  An RTP packet including a multi-time aggregation packet of
1238	          type MTAP16 containing two multi-time aggregation units

1240	   Figure 13 presents an example of an RTP packet that contains a multi-
1241	   time aggregation packet of type MTAP24 that contains two multi-time
1242	   aggregation units, labeled as 1 and 2 in the figure.

1244	    0                   1                   2                   3
1245	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1246	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1247	   |                          RTP Header                           |
1248	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1249	   |MTAP24 NAL HDR |  decoding order number base   | NALU 1 Size   |
1250	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1251	   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offs          |
1252	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1253	   |NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
1254	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1255	   :                                                               :
1256	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1257	   |               | NALU 2 SIZE                   |  NALU 2 DOND  |
1258	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1259	   |       NALU 2 TS offset                        |  NALU 2 HDR   |
1260	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1261	   |  NALU 2 DATA                                                  |
1262	   :                                                               :
1263	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1264	   |                               :...OPTIONAL RTP padding        |
1265	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1267	   Figure 13  An RTP packet including a multi-time aggregation packet of
1268	          type MTAP24 containing two multi-time aggregation units

1270	5.7.3. Fragmentation Units (FUs)

1272	   This payload type allows fragmenting a NAL unit into several RTP
1273	   packets.  Doing so on the application layer instead of relying on
1274	   lower layer fragmentation (e.g., by IP) has the following advantages:

1276	   o  The payload format is capable of transporting NAL units bigger
1277	      than 64 kbytes over an IPv4 network.  Such NAL units may be
1278	      present in pre-recorded video, particularly in High Definition
1279	      formats (there is a limit on the number of slices per picture,
1280	      which limits the number of NAL units per picture, which in turn
1281	      may result in big NAL units).

1283	   o  The fragmentation mechanism allows fragmenting a single NAL unit
1284	      and applying generic forward error correction as described in
1285	      section 12.5.

1287	   Fragmentation is defined only for a single NAL unit and not for any
1288	   aggregation packets.  A fragment of a NAL unit consists of an integer
1289	   number of consecutive octets of that NAL unit.  Each octet of the NAL
1290	   unit MUST be part of exactly one fragment of that NAL unit.

1292	   Fragments of the same NAL unit MUST be sent in consecutive order with
1293	   ascending RTP sequence numbers (with no other RTP packets within the
1294	   same RTP packet stream being sent between the first and last
1295	   fragment).  Similarly, a NAL unit MUST be reassembled in RTP sequence
1296	   number order.

1298	   When a NAL unit is fragmented and conveyed within fragmentation units
1299	   (FUs), it is referred to as a fragmented NAL unit.  STAPs and MTAPs
1300	   MUST NOT be fragmented.  FUs MUST NOT be nested; i.e., an FU MUST NOT
1301	   contain another FU.

1303	   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
1304	   time of the fragmented NAL unit.

1306	   Figure 14 presents the RTP payload format for FU-As.  An FU-A
1307	   consists of a fragmentation unit indicator of one octet, a
1308	   fragmentation unit header of one octet, and a fragmentation unit
1309	   payload.

1311	    0                   1                   2                   3
1312	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1313	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1314	   | FU indicator  |   FU header   |                               |
1315	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1316	   |                                                               |
1317	   |                         FU payload                            |
1318	   |                                                               |
1319	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1320	   |                               :...OPTIONAL RTP padding        |
1321	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1323	                  Figure 14  RTP payload format for FU-A

1325	   Figure 15 presents the RTP payload format for FU-Bs.  An FU-B
1326	   consists of a fragmentation unit indicator of one octet, a
1327	   fragmentation unit header of one octet, a decoding order number (DON)
1328	   (in network byte order), and a fragmentation unit payload.  In other
1329	   words, the structure of FU-B is the same as the structure of FU-A,
1330	   except for the additional DON field.

1332	    0                   1                   2                   3
1333	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1334	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1335	   | FU indicator  |   FU header   |               DON             |
1336	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1337	   |                                                               |
1338	   |                         FU payload                            |
1339	   |                                                               |
1340	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1341	   |                               :...OPTIONAL RTP padding        |
1342	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1344	                  Figure 15  RTP payload format for FU-B

1346	   NAL unit type FU-B MUST be used in the interleaved packetization mode
1347	   for the first fragmentation unit of a fragmented NAL unit.  NAL unit
1348	   type FU-B MUST NOT be used in any other case.  In other words, in the
1349	   interleaved packetization mode, each NALU that is fragmented has an
1350	   FU-B as the first fragment, followed by one or more FU-A fragments.

1352	   The FU indicator octet has the following format:

1354	      +---------------+
1355	      |0|1|2|3|4|5|6|7|
1356	      +-+-+-+-+-+-+-+-+
1357	      |F|NRI|  Type   |
1358	      +---------------+

1360	   Values equal to 28 and 29 in the Type field of the FU indicator octet
1361	   identify an FU-A and an FU-B, respectively.  The use of the F bit is
1362	   described in section 5.3.  The value of the NRI field MUST be set
1363	   according to the value of the NRI field in the fragmented NAL unit.

1365	   The FU header has the following format:

1367	      +---------------+
1368	      |0|1|2|3|4|5|6|7|
1369	      +-+-+-+-+-+-+-+-+
1370	      |S|E|R|  Type   |
1371	      +---------------+

1373	   S: 1 bit
1374	      When set to one, the Start bit indicates the start of a
1375	      fragmented NAL unit.  When the following FU payload is not the
1376	      start of a fragmented NAL unit payload, the Start bit is set to
1377	      zero.

1379	   E: 1 bit
1380	      When set to one, the End bit indicates the end of a fragmented
1381	      NAL unit, i.e., the last byte of the payload is also the last
1382	      byte of the fragmented NAL unit.  When the following FU payload
1383	      is not the last fragment of a fragmented NAL unit, the End bit is
1384	      set to zero.

1386	   R: 1 bit
1387	      The Reserved bit MUST be equal to 0 and MUST be ignored by the
1388	      receiver.

1390	   Type: 5 bits
1391	      The NAL unit payload type as defined in Table 7-1 of [1].
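
   Informative sketch: with bit 0 drawn as the most significant bit,
   the fields of the FU indicator and FU header octets can be
   extracted with shifts and masks.  The Python fragment below is an
   illustration only; the names FuInfo and parse_fu_octets are
   hypothetical.

```python
from typing import NamedTuple

FU_A = 28
FU_B = 29


class FuInfo(NamedTuple):
    f: int          # F bit of the FU indicator
    nri: int        # NRI field of the FU indicator
    fu_type: int    # 28 (FU-A) or 29 (FU-B)
    start: bool     # S bit of the FU header
    end: bool       # E bit of the FU header
    nal_type: int   # Type field of the FU header


def parse_fu_octets(indicator: int, header: int) -> FuInfo:
    """Decode the FU indicator octet and the FU header octet."""
    fu_type = indicator & 0x1F
    if fu_type not in (FU_A, FU_B):
        raise ValueError("not a fragmentation unit")
    return FuInfo(
        f=(indicator >> 7) & 0x01,
        nri=(indicator >> 5) & 0x03,
        fu_type=fu_type,
        start=bool(header & 0x80),
        end=bool(header & 0x40),
        # The R bit (mask 0x20) is ignored, as required of receivers.
        nal_type=header & 0x1F,
    )
```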

1393	   The value of DON in FU-Bs is selected as described in section 5.5.

1395	      Informative note: The DON field in FU-Bs allows gateways to
1396	      fragment NAL units to FU-Bs without organizing the incoming NAL
1397	      units to the NAL unit decoding order.

1399	   A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the
1400	   Start bit and End bit MUST NOT both be set to one in the same FU
1401	   header.

1403	   The FU payload consists of fragments of the payload of the fragmented
1404	   NAL unit so that if the fragmentation unit payloads of consecutive
1405	   FUs are sequentially concatenated, the payload of the fragmented NAL
1406	   unit can be reconstructed.  The NAL unit type octet of the fragmented
1407	   NAL unit is not included as such in the fragmentation unit payload,
1408	   but rather the information of the NAL unit type octet of the
1409	   fragmented NAL unit is conveyed in F and NRI fields of the FU
1410	   indicator octet of the fragmentation unit and in the type field of
1411	   the FU header.  An FU payload MAY have any number of octets and MAY
1412	   be empty.
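
   Informative sketch: the Python fragment below splits one NAL unit
   into FU-A payloads in the manner described above.  It is an
   illustration only.  The function name fragment_fu_a and the
   parameter max_fragment (a cap on FU payload octets per packet,
   chosen by the sender, for example from the path MTU) are
   hypothetical.  The sketch refuses NAL units that would fit into a
   single FU, because a fragmented NAL unit is not transmitted in one
   FU.

```python
def fragment_fu_a(nal_unit: bytes, max_fragment: int):
    """Return the list of FU-A RTP payloads for one NAL unit."""
    if max_fragment < 1:
        raise ValueError("max_fragment must be positive")
    nal_header = nal_unit[0]
    body = nal_unit[1:]  # the NAL unit header octet is not copied as such
    chunks = [body[i:i + max_fragment]
              for i in range(0, len(body), max_fragment)]
    if len(chunks) < 2:
        raise ValueError("NAL unit fits in one packet; do not fragment")

    # F and NRI are taken from the NAL unit; Type is 28 (FU-A).
    indicator = (nal_header & 0xE0) | 28
    nal_type = nal_header & 0x1F

    payloads = []
    for i, chunk in enumerate(chunks):
        s_bit = 0x80 if i == 0 else 0x00
        e_bit = 0x40 if i == len(chunks) - 1 else 0x00
        fu_header = s_bit | e_bit | nal_type  # R bit stays 0
        payloads.append(bytes([indicator, fu_header]) + chunk)
    return payloads
```

   Concatenating the octets after the first two of each returned
   payload, in order, yields the original NAL unit without its header
   octet.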

1414	      Informative note: Empty FUs are allowed to reduce the latency of
1415	      a certain class of senders in nearly lossless environments.
1416	      These senders can be characterized in that they packetize NALU
1417	      fragments before the NALU is completely generated and, hence,
1418	      before the NALU size is known.  If zero-length NALU fragments
1419	      were not allowed, the sender would have to generate at least one
1420	      bit of data of the following fragment before the current fragment
1421	      could be sent.  Due to the characteristics of H.264, where
1422	      sometimes several macroblocks occupy zero bits, this is
1423	      undesirable and can add delay.  However, the (potential) use of
1424	      zero-length NALU fragments should be carefully weighed against
1425	      the increased risk of the loss of at least a part of the NALU
1426	      because of the additional packets employed for its transmission.

1428	   If a fragmentation unit is lost, the receiver SHOULD discard all
1429	   following fragmentation units in transmission order corresponding to
1430	   the same fragmented NAL unit.

1432	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1433	   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
1434	   n of that NAL unit is not received.  In this case, the
1435	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1436	   syntax violation.
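
   Informative sketch: the Python fragment below reassembles the
   FU-As of one fragmented NAL unit and applies the two loss rules
   above.  It is an illustration only.  The function name
   reassemble_fu_a is hypothetical, and the input is assumed to be a
   list of (RTP sequence number, RTP payload) pairs for a single
   fragmented NAL unit, already sorted into sequence number order.

```python
def reassemble_fu_a(packets):
    """Rebuild a NAL unit from FU-A payloads.

    Returns the NAL unit, or None if the start fragment is missing.
    An incomplete NAL unit is returned with forbidden_zero_bit set.
    """
    nal_header = None
    body = bytearray()
    expected_seq = None
    complete = False

    for seq, payload in packets:
        fu_indicator, fu_header = payload[0], payload[1]
        if nal_header is None:
            if not fu_header & 0x80:
                return None  # first fragment lost: nothing to aggregate
            # Rebuild the NAL unit header: F and NRI from the FU
            # indicator, Type from the FU header.
            nal_header = (fu_indicator & 0xE0) | (fu_header & 0x1F)
        elif seq != expected_seq:
            break  # a fragment was lost: discard all following ones
        body += payload[2:]
        expected_seq = (seq + 1) & 0xFFFF  # sequence numbers wrap
        if fu_header & 0x40:
            complete = True
            break

    if nal_header is None:
        return None
    if not complete:
        nal_header |= 0x80  # forbidden_zero_bit = 1: syntax violation
    return bytes([nal_header]) + bytes(body)
```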

1438	6. Packetization Rules

1440	   The packetization modes are introduced in section 5.2.  The
1441	   packetization rules common to more than one of the packetization
1442	   modes are specified in section 6.1.  The packetization rules for the
1443	   single NAL unit mode, the non-interleaved mode, and the interleaved
1444	   mode are specified in sections 6.2, 6.3, and 6.4, respectively.

1446	6.1. Common Packetization Rules

1448	   All senders MUST enforce the following packetization rules regardless
1449	   of the packetization mode in use:

1451	   o  Coded slice NAL units or coded slice data partition NAL units
1452	      belonging to the same coded picture (and thus sharing the same RTP
1453	      timestamp value) MAY be sent in any order; however, for delay-
1454	      critical systems, they SHOULD be sent in their original decoding
1455	      order to minimize the delay.  Note that the decoding order is the
1456	      order of the NAL units in the bitstream.

1458	   o  Parameter sets are handled in accordance with the rules and
1459	      recommendations given in section 8.4.

1461	   o  MANEs MUST NOT duplicate any NAL unit except for sequence or
1462	      picture parameter set NAL units, as neither this memo nor the
1463	      H.264 specification provides means to identify duplicated NAL
1464	      units.  Sequence and picture parameter set NAL units MAY be
1465	      duplicated to make their correct reception more probable, but any
1466	      such duplication MUST NOT affect the contents of any active
1467	      sequence or picture parameter set.  Duplication SHOULD be
1468	      performed on the application layer and not by duplicating RTP
1469	      packets (with identical sequence numbers).

1471	   Senders using the non-interleaved mode and the interleaved mode MUST
1472	   enforce the following packetization rule:

1474	   o  MANEs MAY convert single NAL unit packets into one aggregation
1475	      packet, convert an aggregation packet into several single NAL unit
1476	      packets, or mix both concepts, in an RTP translator.  The RTP
1477	      translator SHOULD take into account at least the following
1478	      parameters: path MTU size, unequal protection mechanisms (e.g.,
1479	      through packet-based FEC according to RFC 2733 [18], especially
1480	      for sequence and picture parameter set NAL units and coded slice
1481	      data partition A NAL units), bearable latency of the system, and
1482	      buffering capabilities of the receiver.

1484	         Informative note: An RTP translator is required to handle RTCP
1485	         as per RFC 3550.

1487	6.2. Single NAL Unit Mode

1489	   This mode is in use when the value of the OPTIONAL packetization-mode
1490	   media type parameter is equal to 0 or the packetization-mode is not
1491	   present.  All receivers MUST support this mode.  It is primarily
1492	   intended for low-delay applications that are compatible with systems
1493	   using ITU-T Recommendation H.241 [3] (see section 12.1).  Only single
1494	   NAL unit packets MAY be used in this mode.  STAPs, MTAPs, and FUs
1495	   MUST NOT be used.  The transmission order of single NAL unit packets
1496	   MUST comply with the NAL unit decoding order.

1498	6.3. Non-Interleaved Mode

1500	   This mode is in use when the value of the OPTIONAL packetization-mode
1501	   media type parameter is equal to 1.  This mode SHOULD be supported.
1502	   It is primarily intended for low-delay applications.  Only single NAL
1503	   unit packets, STAP-As, and FU-As MAY be used in this mode.  STAP-Bs,
1504	   MTAPs, and FU-Bs MUST NOT be used.  The transmission order of NAL
1505	   units MUST comply with the NAL unit decoding order.

1507	6.4. Interleaved Mode

1509	   This mode is in use when the value of the OPTIONAL packetization-mode
1510	   media type parameter is equal to 2.  Some receivers MAY support this
1511	   mode.  STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used.  STAP-As and
1512	   single NAL unit packets MUST NOT be used.  The transmission order of
1513	   packets and NAL units is constrained as specified in section 5.5.
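
   Informative sketch: the packet types permitted in each
   packetization mode (sections 6.2 to 6.4) can be summarized as a
   lookup table.  The Python fragment below is an illustration only.
   The names ALLOWED_TYPES and packet_allowed are hypothetical, and
   the type values 1-23 (single NAL unit packets), 24 (STAP-A), 25
   (STAP-B), 26 (MTAP16), and 27 (MTAP24) are assumed from the
   payload format's NAL unit type table; 28 and 29 are FU-A and FU-B.

```python
SINGLE_NAL_TYPES = set(range(1, 24))  # assumed range for single NAL units

ALLOWED_TYPES = {
    # packetization-mode 0 (or parameter absent): single NAL unit mode
    0: SINGLE_NAL_TYPES,
    # packetization-mode 1: single NAL unit packets, STAP-A, FU-A
    1: SINGLE_NAL_TYPES | {24, 28},
    # packetization-mode 2: STAP-B, MTAP16, MTAP24, FU-A, FU-B
    2: {25, 26, 27, 28, 29},
}


def packet_allowed(packetization_mode: int, payload: bytes) -> bool:
    """True if the payload's packet type may be used in the given mode."""
    nal_type = payload[0] & 0x1F
    return nal_type in ALLOWED_TYPES[packetization_mode]
```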

1515	7. De-Packetization Process

1517	   The de-packetization process is implementation dependent.  Therefore,
1518	   the following description should be seen as an example of a suitable
1519	   implementation.  Other schemes may be used as well, as long as the
1520	   output for the same input is the same as for the process described
1521	   below.  Here, "the same" means that the number of NAL units and
1522	   their order are both identical.  Optimizations relative to the
1523	   described algorithms are likely possible.  Section 7.1 presents the
1524	   de-packetization process for the single NAL unit and non-interleaved
1525	   packetization modes, whereas section 7.2 describes the process for
1526	   the interleaved mode.  Section 7.3 includes additional de-
1527	   packetization guidelines for intelligent receivers.

1529	   All normal RTP mechanisms related to buffer management apply.  In
1530	   particular, duplicated or outdated RTP packets (as indicated by the
1531	   RTP sequence number and the RTP timestamp) are removed.  To
1532	   determine the exact time for decoding, factors such as a possible
1533	   intentional delay to allow for proper inter-stream synchronization
1534	   must be taken into account.

1536	7.1. Single NAL Unit and Non-Interleaved Mode

1538	   The receiver includes a receiver buffer to compensate for
1539	   transmission delay jitter.  The receiver stores incoming packets in
1540	   reception order into the receiver buffer.  Packets are de-packetized
1541	   in RTP sequence number order.  If a de-packetized packet is a single
1542	   NAL unit packet, the NAL unit contained in the packet is passed
1543	   directly to the decoder.  If a de-packetized packet is an STAP-A, the
1544	   NAL units contained in the packet are passed to the decoder in the
1545	   order in which they are encapsulated in the packet.  For all the FU-A
1546	   packets containing fragments of a single NAL unit, the de-packetized
1547	   fragments are concatenated in their sending order to recover the NAL
1548	   unit, which is then passed to the decoder.

1550	      Informative note: If the decoder supports Arbitrary Slice Order,
1551	      coded slices of a picture can be passed to the decoder in any
1552	      order regardless of their reception and transmission order.

1554	7.2. Interleaved Mode

1556	   The general concept behind these de-packetization rules is to reorder
1557	   NAL units from transmission order to the NAL unit decoding order.

1559	   The receiver includes a receiver buffer, which is used to compensate
1560	   for transmission delay jitter and to reorder NAL units from
1561	   transmission order to the NAL unit decoding order.  In this section,
1562	   the receiver operation is described under the assumption that there
1563	   is no transmission delay jitter.  To distinguish it from a
1564	   practical receiver buffer that is also used for compensation of
1565	   transmission delay jitter, the receiver buffer is hereafter called
1566	   the de-interleaving buffer in this section.  Receivers SHOULD also
1567	   prepare for transmission delay jitter; i.e., either reserve separate
1568	   buffers for transmission delay jitter buffering and de-interleaving
1569	   buffering or use a receiver buffer for both transmission delay jitter
1570	   and de-interleaving.  Moreover, receivers SHOULD take transmission
1571	   delay jitter into account in the buffering operation; e.g., by
1572	   additional initial buffering before the start of decoding and
1573	   playback.

1575	   This section is organized as follows: subsection 7.2.1 presents how
1576	   to calculate the size of the de-interleaving buffer.  Subsection
1577	   7.2.2 specifies the receiver process for organizing received NAL
1578	   units into the NAL unit decoding order.

1580	7.2.1. Size of the De-interleaving Buffer

1582	   When the SDP Offer/Answer model or any other capability exchange
1583	   procedure is used in session setup, the properties of the received
1584	   stream SHOULD be such that the receiver capabilities are not
1585	   exceeded.  In the SDP Offer/Answer model, the receiver can indicate
1586	   its capabilities to allocate a de-interleaving buffer with the deint-
1587	   buf-cap media type parameter.  The sender indicates the requirement
1588	   for the de-interleaving buffer size with the sprop-deint-buf-req
1589	   media type parameter.  It is therefore RECOMMENDED to set the de-
1590	   interleaving buffer size, in terms of number of bytes, equal to or
1591	   greater than the value of sprop-deint-buf-req media type parameter.
1592	   See section 8.1 for further information on deint-buf-cap and sprop-
1593	   deint-buf-req media type parameters and section 8.2.2 for further
1594	   information on their use in the SDP Offer/Answer model.

1596	   When a declarative session description is used in session setup, the
1597	   sprop-deint-buf-req media type parameter signals the requirement for
1598	   the de-interleaving buffer size.  It is therefore RECOMMENDED to set
1599	   the de-interleaving buffer size, in terms of number of bytes, equal
1600	   to or greater than the value of sprop-deint-buf-req media type
1601	   parameter.

1603	7.2.2. De-interleaving Process

1605	   There are two buffering states in the receiver: initial buffering and
1606	   buffering while playing.  Initial buffering occurs when the RTP
1607	   session is initialized.  After initial buffering, decoding and
1608	   playback are started, and the buffering-while-playing mode is used.

1610	   Regardless of the buffering state, the receiver stores incoming NAL
1611	   units, in reception order, in the de-interleaving buffer as follows.
1612	   NAL units of aggregation packets are stored in the de-interleaving
1613	   buffer individually.  The value of DON is calculated and stored for
1614	   each NAL unit.

1616	   The receiver operation is described below with the help of the
1617	   following functions and constants:

1619	   o  Function AbsDON is specified in section 8.1.

1621	   o  Function don_diff is specified in section 5.5.

1623	   o  Constant N is the value of the OPTIONAL sprop-interleaving-depth
1624	      media type parameter (see section 8.1) incremented by 1.

1626	   Initial buffering lasts until one of the following conditions is
1627	   fulfilled:

1629	   o  There are N or more VCL NAL units in the de-interleaving buffer.

1631	   o  If sprop-max-don-diff is present, don_diff(m,n) is greater than
1632	      the value of sprop-max-don-diff, in which n corresponds to the NAL
1633	      unit having the greatest value of AbsDON among the received NAL
1634	      units and m corresponds to the NAL unit having the smallest value
1635	      of AbsDON among the received NAL units.

1637	   o  Initial buffering has lasted for the duration equal to or greater
1638	      than the value of the OPTIONAL sprop-init-buf-time media type
1639	      parameter.
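
   Informative sketch: the three conditions above combine with a
   logical OR.  The Python fragment below is an illustration only.
   The function name and all parameter names are hypothetical.  The
   caller is assumed to have computed don_diff(m,n) for the NAL units
   with the smallest and greatest AbsDON beforehand, and to express
   the elapsed buffering time in the same unit as the value of
   sprop-init-buf-time.  Parameters that were not signaled are passed
   as None.

```python
from typing import Optional


def initial_buffering_done(
    vcl_count: int,
    interleaving_depth: int,
    don_diff_mn: Optional[int] = None,
    max_don_diff: Optional[int] = None,
    elapsed: Optional[int] = None,
    init_buf_time: Optional[int] = None,
) -> bool:
    """True when any condition that ends initial buffering holds."""
    n = interleaving_depth + 1  # constant N from the text

    # Condition 1: N or more VCL NAL units are buffered.
    if vcl_count >= n:
        return True
    # Condition 2: applies only if sprop-max-don-diff is present.
    if (max_don_diff is not None and don_diff_mn is not None
            and don_diff_mn > max_don_diff):
        return True
    # Condition 3: applies only if sprop-init-buf-time is present.
    if (init_buf_time is not None and elapsed is not None
            and elapsed >= init_buf_time):
        return True
    return False
```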

1641	   The NAL units to be removed from the de-interleaving buffer are
1642	   determined as follows:

1644	   o  If the de-interleaving buffer contains at least N VCL NAL units,
1645	      NAL units are removed from the de-interleaving buffer and passed
1646	      to the decoder in the order specified below until the buffer
1647	      contains N-1 VCL NAL units.

1649	   o  If sprop-max-don-diff is present, all NAL units m for which
1650	      don_diff(m,n) is greater than sprop-max-don-diff are removed from
1651	      the de-interleaving buffer and passed to the decoder in the order
1652	      specified below.  Herein, n corresponds to the NAL unit having the
1653	      greatest value of AbsDON among the NAL units in the de-
1654	      interleaving buffer.

1656	   The order in which NAL units are passed to the decoder is specified
1657	   as follows:

1659	   o  Let PDON be a variable that is initialized to 0 at the beginning
1660	      of the RTP session.

1662	   o  For each NAL unit associated with a value of DON, a DON distance
1663	      is calculated as follows.  If the value of DON of the NAL unit is
1664	      larger than the value of PDON, the DON distance is equal to DON -
1665	      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
1666	      + 1.

1668	   o  NAL units are delivered to the decoder in ascending order of DON
1669	      distance.  If several NAL units share the same value of DON
1670	      distance, they can be passed to the decoder in any order.

1672	   o  When a desired number of NAL units have been passed to the
1673	      decoder, the value of PDON is set to the value of DON for the last
1674	      NAL unit passed to the decoder.
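
   The DON distance rule and the resulting delivery order can be
   sketched in Python as follows.  This is a minimal illustration, not
   a complete de-interleaver; the function names are hypothetical, and
   the buffer is modeled as a plain list of DON values.

```python
from typing import List


def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON, as defined above."""
    if don > pdon:
        return don - pdon
    # DON wrapped around (or equals PDON): count forward through 65535.
    return 65535 - pdon + don + 1


def delivery_order(dons: List[int], pdon: int) -> List[int]:
    """Sort DON values in ascending order of DON distance.

    sorted() is stable, so NAL units sharing a DON distance keep their
    reception order, which is one of the permitted orders.
    """
    return sorted(dons, key=lambda don: don_distance(don, pdon))
```

   After the desired number of NAL units has been passed to the
   decoder, the caller would set PDON to the DON of the last one
   delivered.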

1676	7.3. Additional De-Packetization Guidelines

1678	   The following additional de-packetization rules may be used to
1679	   implement an operational H.264 de-packetizer:

1681	   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
1682	      coded slice data partitions A (DPAs).  If a lost DPA is found, a
1683	      gateway may decide not to send the corresponding coded slice data
1684	      partitions B and C, as their information is meaningless for H.264
1685	      decoders.  In this way a MANE can reduce network load by
1686	      discarding useless packets without parsing a complex bitstream.

1688	   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
1689	      FUs.  If a lost FU is found, a gateway may decide not to send the
1690	      following FUs of the same fragmented NAL unit, as their
1691	      information is meaningless for H.264 decoders.  In this way a MANE
1692	      can reduce network load by discarding useless packets without
1693	      parsing a complex bitstream.

1695	   o  Intelligent receivers having to discard packets or NALUs should
1696	      first discard all packets/NALUs in which the value of the NRI
1697	      field of the NAL unit type octet is equal to 0.  This will
1698	      minimize the impact on user experience and keep the reference
1699	      pictures intact.  If more packets have to be discarded, then
1700	      packets with a numerically lower NRI value should be discarded
1701	      before packets with a numerically higher NRI value.  However,
1702	      discarding any packets with an NRI greater than 0 very likely leads
1703	      to decoder drift and SHOULD be avoided.
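
   The discard priority in the last rule can be sketched as follows.
   The sketch assumes that the NRI field occupies the two bits that
   follow the first bit of the NAL unit type octet; that layout is not
   restated in this section.  The function names are hypothetical.

```python
from typing import List


def nri(nal_unit: bytes) -> int:
    """Extract NRI from the first octet (assumed layout: F, NRI, Type)."""
    return (nal_unit[0] >> 5) & 0x03


def discard_order(nal_units: List[bytes]) -> List[bytes]:
    """Order candidates so that lower NRI values are discarded first."""
    return sorted(nal_units, key=nri)
```

   A receiver that has to drop k units would take the first k entries
   of the returned list, preferably stopping before it reaches any
   unit whose NRI is greater than 0.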

1705	8. Payload Format Parameters

1707	   This section specifies the parameters that MAY be used to select
1708	   optional features of the payload format and certain features of the
1709	   bitstream.  The parameters are specified here as part of the media
1710	   subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec.  A
1711	   mapping of the parameters into the Session Description Protocol (SDP)
1712	   [6] is also provided for applications that use SDP.  Equivalent
1713	   parameters could be defined elsewhere for use with control protocols
1714	   that do not use SDP.

1716	   Some parameters provide a receiver with the properties of the stream
1717	   that will be sent.  The names of all these parameters start with
1718	   "sprop" for stream properties.  Some of these "sprop" parameters are
1719	   limited by other payload or codec configuration parameters.  For
1720	   example, the sprop-parameter-sets parameter is constrained by the
1721	   profile-level-id parameter.  The media sender selects all "sprop"
1722	   parameters rather than the receiver.  This uncommon characteristic of
1723	   the "sprop" parameters may not be compatible with some signaling
1724	   protocol concepts, in which case the use of these parameters SHOULD
1725	   be avoided.

1727	8.1. Media Type Registration

1729	   The media subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is
1730	   allocated from the IETF tree.

1732	   The receiver MUST ignore any unspecified parameter.

1734	   Media Type name:     video

1736	   Media subtype name:  H264

1738	   Required parameters: none

1740	   OPTIONAL parameters:

1742	      profile-level-id:
1743	         A base16 [7] (hexadecimal) representation of the following
1744	         three bytes in the sequence parameter set NAL unit specified
1745	         in [1]: 1) profile_idc, 2) a byte herein referred to as
1746	         profile-iop, composed of the values of constraint_set0_flag,
1747	         constraint_set1_flag, constraint_set2_flag,
1748	         constraint_set3_flag, and reserved_zero_4bits in bit-
1749	         significance order, starting from the most significant bit,
1750	         and 3) level_idc.  Note that reserved_zero_4bits is required
1751	         to be equal to 0 in [1], but other values for it may be
1752	         specified in the future by ITU-T or ISO/IEC.

1754	         The profile-level-id parameter indicates the default sub-
1755	         profile, i.e., the subset of coding tools that may have been
1756	         used to generate the stream or that the receiver supports, and
1757	         the default level of the stream or that the receiver supports.

1759	         The default sub-profile is indicated collectively by the
1760	         profile_idc byte and some fields in the profile-iop byte.
1761	         Depending on the values of the fields in the profile-iop byte,
1762	         the default sub-profile may be the same set of coding tools
1763	         supported by one profile, or a common subset of coding tools
1764	         of multiple profiles, as specified in subsection 7.4.2.1.1 of
1765	         [1].  The default level is indicated by the level_idc byte,
1766	         and, when profile_idc is equal to 66, 77 or 88 (the Baseline,
1767	         Main, or Extended profile) and level_idc is equal to 11,
1768	         additionally by bit 4 (constraint_set3_flag) of the profile-
1769	         iop byte.  When profile_idc is equal to 66, 77 or 88 (the
1770	         Baseline, Main, or Extended profile) and level_idc is equal to
1771	         11, and bit 4 (constraint_set3_flag) of the profile-iop byte
1772	         is equal to 1, the default level is level 1b.
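
         As a non-normative illustration, the three bytes and the
         level 1b rule described above could be extracted as in the
         following Python sketch.  The class and function names are
         hypothetical.

```python
from typing import NamedTuple


class ProfileLevelId(NamedTuple):
    profile_idc: int
    profile_iop: int
    level_idc: int
    level_1b: bool


def parse_profile_level_id(value: str) -> ProfileLevelId:
    """Split a base16 profile-level-id into its three bytes."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must encode exactly three bytes")
    profile_idc, profile_iop, level_idc = raw
    # constraint_set3_flag is bit 4 of profile-iop (mask 0x10).
    constraint_set3_flag = (profile_iop >> 4) & 1
    # Level 1b: Baseline, Main, or Extended with level_idc 11 and the flag set.
    level_1b = (
        profile_idc in (66, 77, 88)
        and level_idc == 11
        and constraint_set3_flag == 1
    )
    return ProfileLevelId(profile_idc, profile_iop, level_idc, level_1b)
```

         For example, the value "42100B" yields profile_idc 66 and
         level_idc 11 with constraint_set3_flag set, so the default
         level is level 1b.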

1774	         Table 5 lists all profiles defined in Annex A of [1] and, for
1775	         each of the profiles, the possible combinations of profile_idc
1776	         and profile-iop that represent the same sub-profile.

1778	            Table 5.  Combinations of profile_idc and profile-iop
1779	            representing the same sub-profile corresponding to the full
1780	            set of coding tools supported by one profile.  In the
1781	            following, x may be either 0 or 1, while the profile names
1782	            are indicated as follows. CB: Constrained Baseline profile,
1783	            B: Baseline profile, M: Main profile, E: Extended profile,
1784	            H: High profile, H10: High 10 profile, H42: High 4:2:2
1785	            profile, H44: High 4:4:4 Predictive profile, H10I: High 10
1786	            Intra profile, H42I: High 4:2:2 Intra profile, H44I: High
1787	            4:4:4 Intra profile, and C44I: CAVLC 4:4:4 Intra profile.

1789	              Profile     profile_idc             profile-iop
1790	                          (hexadecimal)           (binary)

1792	              CB          42 (B)                  x1xx0000
1793	                 same as: 4D (M)                  1xxx0000
1794	                 same as: 58 (E)                  11xx0000
1795	                 same as: 64 (H), 6E (H10),       1xx00000
1796	                          7A (H42), or F4 (H44)
1797	              B           42 (B)                  x0xx0000
1798	                 same as: 58 (E)                  10xx0000
1799	              M           4D (M)                  0x0x0000
1800	                 same as: 64 (H), 6E (H10),       01000000
1801	                          7A (H42), or F4 (H44)
1802	              E           58                      00xx0000
1803	              H           64                      00000000
1804	              H10         6E                      00000000
1805	              H42         7A                      00000000
1806	              H44         F4                      00000000
1807	              H10I        6E                      00010000
1808	              H42I        7A                      00010000
1809	              H44I        F4                      00010000
1810	              C44I        2C                      00010000

1812	         For example, in the table above, profile_idc equal to 58
1813	         (Extended) with profile-iop equal to 11xx0000 indicates the
1814	         same sub-profile corresponding to profile_idc equal to 42
1815	         (Baseline) with profile-iop equal to x1xx0000.  Note that
1816	         other combinations of profile_idc and profile-iop (not listed
1817	         in Table 5) may represent a sub-profile equivalent to the
1818	         common subset of coding tools for more than one profile.  Note
1819	         also that a decoder conforming to a certain profile may be
1820	         able to decode bitstreams conforming to other profiles.  For
1821	         example, a decoder conforming to the High 4:4:4 profile at
1822	         a certain level must be able to decode bitstreams conforming to
1823	         the Constrained Baseline, Main, High, High 10 or High 4:2:2
1824	         profile at the same or a lower level.
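
         A receiver can recognize equivalent combinations by matching
         profile-iop against the bit patterns of Table 5, treating x
         as a wildcard.  The following Python sketch is illustrative
         only; it transcribes just the Constrained Baseline and
         Baseline rows, and its names are hypothetical.

```python
from typing import List

# (profile name, set of profile_idc values, profile-iop pattern).
# Only the CB and B rows of Table 5 are transcribed here.
TABLE_5_SUBSET = [
    ("CB", {0x42}, "x1xx0000"),
    ("CB", {0x4D}, "1xxx0000"),
    ("CB", {0x58}, "11xx0000"),
    ("CB", {0x64, 0x6E, 0x7A, 0xF4}, "1xx00000"),
    ("B", {0x42}, "x0xx0000"),
    ("B", {0x58}, "10xx0000"),
]


def iop_matches(profile_iop: int, pattern: str) -> bool:
    """Compare profile-iop bit by bit, most significant bit first."""
    bits = format(profile_iop, "08b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def sub_profiles(profile_idc: int, profile_iop: int) -> List[str]:
    """Return the names of the transcribed rows that match."""
    return [
        name
        for name, idcs, pattern in TABLE_5_SUBSET
        if profile_idc in idcs and iop_matches(profile_iop, pattern)
    ]
```

         With these rows, profile_idc 58 with profile-iop 11000000 and
         profile_idc 42 with profile-iop 01000000 both map to CB,
         which mirrors the example given above.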

1826	         If the profile-level-id parameter is used to indicate
1827	         properties of a NAL unit stream, it indicates that, to decode
1828	         the stream, the minimum subset of coding tools a decoder has
1829	         to support is the default sub-profile, and the lowest level
1830	         the decoder has to support is the default level.

1832	         If the profile-level-id parameter is used for capability
1833	         exchange or session setup procedure, it indicates the subset
1834	         of coding tools, which is equal to the default sub-profile,
1835	         and the highest level, which is equal to the default level,
1836	         that the codec supports.  All levels lower than the default
1837	         level are also supported by the codec.

1839	            Informative note: Capability exchange and session setup
1840	            procedures should provide means to list the capabilities
1841	            for each supported sub-profile separately.  For example,
1842	            the one-of-N codec selection procedure of the SDP
1843	            Offer/Answer model can be used (section 10.2 of [8]).  The
1844	            one-of-N codec selection procedure may also be used to
1845	            provide different combinations of profile_idc and profile-
1846	            iop that represent the same sub-profile.  When there are
1847	            many different combinations of profile_idc and profile-iop
1848	            that represent the same sub-profile, using the one-of-N
1849	            codec selection procedure may result in a fairly large
1850	            SDP message.  Therefore, a receiver should understand the
1851	            different equivalent combinations of profile_idc and
1852	            profile-iop that represent the same sub-profile, and be
1853	            ready to accept an offer using any of the equivalent
1854	            combinations.

1856	         If no profile-level-id is present, the Baseline Profile
1857	         without additional constraints at Level 1 MUST be implied.

1859	      max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
1860	         These parameters MAY be used to signal the capabilities of a
1861	         receiver implementation. These parameters MUST NOT be used for
1862	         any other purpose.  The profile-level-id parameter MUST be
1863	         present in the same receiver capability description that
1864	         contains any of these parameters.  The level conveyed in the
1865	         value of the profile-level-id parameter MUST be one that the
1866	         receiver is fully capable of supporting.  max-mbps, max-smbps,
1867	         max-fs, max-cpb, max-dpb, and max-br MAY be used to indicate
1868	         capabilities of the receiver that extend the required
1869	         capabilities of the signaled level, as specified below.

1871	         When more than one parameter from the set (max-mbps, max-smbps,
1872	         max-fs, max-cpb, max-dpb, max-br) is present, the receiver
1873	         MUST support all signaled capabilities simultaneously.  For
1874	         example, if both max-mbps and max-br are present, the signaled
1875	         level with the extension of both the frame rate and bit rate
1876	         is supported.  That is, the receiver is able to decode NAL
1877	         unit streams in which the macroblock processing rate is up to
1878	         max-mbps (inclusive), the bit rate is up to max-br
1879	         (inclusive), the coded picture buffer size is derived as
1880	         specified in the semantics of the max-br parameter below, and
1881	         other properties comply with the level specified in the value
1882	         of the profile-level-id parameter.

1884	         If a receiver can support all the properties of level A, the
1885	         level specified in the value of the profile-level-id MUST be
1886	         level A (i.e. MUST NOT be lower than level A).  In other
1887	         words, a sender or receiver MUST NOT signal values of max-
1888	         mbps, max-fs, max-cpb, max-dpb, and max-br that meet the
1889	         requirements of a higher level compared to the level specified
1890	         in the value of the profile-level-id parameter.

1892	            Informative note: When the OPTIONAL media type parameters
1893	            are used to signal the properties of a NAL unit stream,
1894	            max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
1895	            are not present, and the value of profile-level-id must
1896	            always be such that the NAL unit stream complies fully with
1897	            the specified profile and level.

1899	      max-mbps: The value of max-mbps is an integer indicating the
1900	         maximum macroblock processing rate in units of macroblocks per
1901	         second.  The max-mbps parameter signals that the receiver is
1902	         capable of decoding video at a higher rate than is required by
1903	         the signaled level conveyed in the value of the profile-level-
1904	         id parameter.  When max-mbps is signaled, the receiver MUST be
1905	         able to decode NAL unit streams that conform to the signaled
1906	         level, with the exception that the MaxMBPS value in Table A-1
1907	         of [1] for the signaled level is replaced with the value of
1908	         max-mbps.  The value of max-mbps MUST be greater than or equal
1909	         to the value of MaxMBPS for the level given in Table A-1 of
1910	         [1].  Senders MAY use this knowledge to send pictures of a
1911	         given size at a higher picture rate than is indicated in the
1912	         signaled level.

1914	      max-smbps: The value of max-smbps is an integer indicating the
1915	         maximum static macroblock processing rate in units of static
1916	         macroblocks per second, under the hypothetical assumption that
1917	         all macroblocks are static macroblocks.  When max-smbps is
1918	         signaled, the MaxMBPS value in Table A-1 of [1] should be
1919	         replaced with the result of the following computation:

1921	         o If the parameter max-mbps is signaled, set a variable
1922	            MaxMacroblocksPerSecond to the value of max-mbps.
1923	            Otherwise, set MaxMacroblocksPerSecond equal to the value
1924	            of MaxMBPS for the level in Table A-1 [1].

1926	         o Set a variable P_non-static to the proportion of non-static
1927	            macroblocks in picture n.

1929	         o Set a variable P_static to the proportion of static
1930	            macroblocks in picture n.

1932	         o The value of MaxMBPS in Table A-1 of [1] should be
1933	            considered by the encoder to be equal to:

1935	            MaxMacroblocksPerSecond * max-smbps / ( P_non-static * max-
1936	            smbps + P_static * MaxMacroblocksPerSecond)

1938	         The encoder should recompute this value for each picture. The
1939	         value of max-smbps MUST be greater than the value of MaxMBPS
1940	         for the level given in Table A-1 of [1].  Senders MAY use this
1941	         knowledge to send pictures of a given size at a higher picture
1942	         rate than is indicated in the signaled level.
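
         The per-picture computation above can be written as the
         following Python sketch.  It is illustrative only; the
         function and argument names are hypothetical, and the two
         proportions are supplied by the caller.

```python
def effective_max_mbps(
    max_macroblocks_per_second: float,
    max_smbps: float,
    p_non_static: float,
    p_static: float,
) -> float:
    """Value the encoder would use in place of MaxMBPS for one picture.

    max_macroblocks_per_second is max-mbps if signaled, otherwise the
    MaxMBPS value of the signaled level.
    """
    numerator = max_macroblocks_per_second * max_smbps
    denominator = (
        p_non_static * max_smbps + p_static * max_macroblocks_per_second
    )
    return numerator / denominator
```

         When a picture contains no static macroblocks, the result is
         MaxMacroblocksPerSecond; when every macroblock is static, it
         is max-smbps; any mix of the two lies in between.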

1944	      max-fs: The value of max-fs is an integer indicating the maximum
1945	         frame size in units of macroblocks.  The max-fs parameter
1946	         signals that the receiver is capable of decoding larger
1947	         picture sizes than are required by the signaled level conveyed
1948	         in the value of the profile-level-id parameter.  When max-fs
1949	         is signaled, the receiver MUST be able to decode NAL unit
1950	         streams that conform to the signaled level, with the exception
1951	         that the MaxFS value in Table A-1 of [1] for the signaled
1952	         level is replaced with the value of max-fs.  The value of max-
1953	         fs MUST be greater than or equal to the value of MaxFS for the
1954	         level given in Table A-1 of [1].  Senders MAY use this
1955	         knowledge to send larger pictures at a proportionally lower
1956	         frame rate than is indicated in the signaled level.

1958	      max-cpb: The value of max-cpb is an integer indicating the
1959	         maximum coded picture buffer size in units of 1000 bits for
1960	         the VCL HRD parameters (see A.3.1 item i of [1]) and in units
1961	         of 1200 bits for the NAL HRD parameters (see A.3.1 item j of
1962	         [1]).  The max-cpb parameter signals that the receiver has
1963	         more memory than the minimum amount of coded picture buffer
1964	         memory required by the signaled level conveyed in the value of
1965	         the profile-level-id parameter.  When max-cpb is signaled, the
1966	         receiver MUST be able to decode NAL unit streams that conform
1967	         to the signaled level, with the exception that the MaxCPB
1968	         value in Table A-1 of [1] for the signaled level is replaced
1969	         with the value of max-cpb.  The value of max-cpb MUST be
1970	         greater than or equal to the value of MaxCPB for the level
1971	         given in Table A-1 of [1].  Senders MAY use this knowledge to
1972	         construct coded video streams with greater variation of bit
1973	         rate than can be achieved with the MaxCPB value in Table A-1
1974	         of [1].

1976	            Informative note: The coded picture buffer is used in the
1977	            hypothetical reference decoder (Annex C) of H.264.  The use
1978	            of the hypothetical reference decoder is recommended in
1979	            H.264 encoders to verify that the produced bitstream
1980	            conforms to the standard and to control the output bitrate.
1981	            Thus, the coded picture buffer is conceptually independent
1982	            of any other potential buffers in the receiver, including
1983	            de-interleaving and de-jitter buffers.  The coded picture
1984	            buffer need not be implemented in decoders as specified in
1985	            Annex C of H.264, but rather standard-compliant decoders
1986	            can have any buffering arrangements provided that they can
1987	            decode standard-compliant bitstreams.  Thus, in practice,
1988	            the input buffer for a video decoder can be integrated with
1989	            de-interleaving and de-jitter buffers of the receiver.

1991	      max-dpb: The value of max-dpb is an integer indicating the
1992	         maximum decoded picture buffer size in units of 1024 bytes.
1993	         The max-dpb parameter signals that the receiver has more
1994	         memory than the minimum amount of decoded picture buffer
1995	         memory required by the signaled level conveyed in the value of
1996	         the profile-level-id parameter.  When max-dpb is signaled, the
1997	         receiver MUST be able to decode NAL unit streams that conform
1998	         to the signaled level, with the exception that the MaxDPB
1999	         value in Table A-1 of [1] for the signaled level is replaced
2000	         with the value of max-dpb.  Consequently, a receiver that
2001	         signals max-dpb MUST be capable of storing the following
2002	         number of decoded frames, complementary field pairs, and non-
2003	         paired fields in its decoded picture buffer:

2005	            Min(1024 * max-dpb / ( PicWidthInMbs * FrameHeightInMbs *
2006	            256 * ChromaFormatFactor ), 16)

2008	         PicWidthInMbs, FrameHeightInMbs, and ChromaFormatFactor are
2009	         defined in [1].
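
         The formula above is sketched below in Python.  Two
         assumptions are made that this section does not state: the
         quotient is truncated to a whole number of frames, and the
         usage example takes ChromaFormatFactor to be 1.5, as for
         4:2:0 video.  The function name is hypothetical.

```python
import math


def max_dpb_frames(
    max_dpb: int,
    pic_width_in_mbs: int,
    frame_height_in_mbs: int,
    chroma_format_factor: float,
) -> int:
    """Frames the decoded picture buffer must hold, capped at 16."""
    bytes_per_frame = (
        pic_width_in_mbs * frame_height_in_mbs * 256 * chroma_format_factor
    )
    # Assumption: round down to a whole number of frames.
    frames = math.floor(1024 * max_dpb / bytes_per_frame)
    return min(frames, 16)
```

         For instance, with max-dpb equal to 891 and a picture of
         22 x 18 macroblocks, max_dpb_frames(891, 22, 18, 1.5)
         evaluates to 6.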

2011	         The value of max-dpb MUST be greater than or equal to the
2012	         value of MaxDPB for the level given in Table A-1 of [1].
2013	         Senders MAY use this knowledge to construct coded video
2014	         streams with improved compression.

2016	            Informative note: This parameter was added primarily to
2017	            complement a similar codepoint in the ITU-T Recommendation
2018	            H.245, so as to facilitate signaling gateway designs.  The
2019	            decoded picture buffer stores reconstructed samples.  There
2020	            is no relationship between the size of the decoded picture
2021	            buffer and the buffers used in RTP, especially de-
2022	            interleaving and de-jitter buffers.

2024	      max-br: The value of max-br is an integer indicating the maximum
2025	         video bit rate in units of 1000 bits per second for the VCL
2026	         HRD parameters (see A.3.1 item i of [1]) and in units of 1200
2027	         bits per second for the NAL HRD parameters (see A.3.1 item j
2028	         of [1]).

2030	         The max-br parameter signals that the video decoder of the
2031	         receiver is capable of decoding video at a higher bit rate
2032	         than is required by the signaled level conveyed in the value
2033	         of the profile-level-id parameter.

2035	         When max-br is signaled, the video codec of the receiver MUST
2036	         be able to decode NAL unit streams that conform to the
2037	         signaled level, conveyed in the profile-level-id parameter,
2038	         with the following exceptions in the limits specified by the
2039	         level:

2041	         o The value of max-br replaces the MaxBR value of the signaled
2042	            level (in Table A-1 of [1]).

2044	         o When the max-cpb parameter is not present, the result of the
2045	            following formula replaces the value of MaxCPB in Table A-1
2046	            of [1]: (MaxCPB of the signaled level) * max-br / (MaxBR of
2047	            the signaled level).

2049	         For example, if a receiver signals capability for Level 1.2
2050	         with max-br equal to 1550, this indicates a maximum video
2051	         bitrate of 1550 kbits/sec for VCL HRD parameters, a maximum
2052	         video bitrate of 1860 kbits/sec for NAL HRD parameters, and a
2053	         CPB size of 4036458 bits (1550000 / 384000 * 1000 * 1000).
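
         The arithmetic of this example can be reproduced with the
         following Python sketch.  The Level 1.2 limits (MaxBR of 384
         and MaxCPB of 1000, in the units of Table A-1 of [1]) are
         taken from the arithmetic shown in the example; the function
         name is hypothetical, and the CPB figure applies only when
         max-cpb is not present.

```python
from typing import Dict


def max_br_limits(max_br: int, level_max_br: int, level_max_cpb: int) -> Dict[str, int]:
    """Derive bit rates and the VCL CPB size implied by max-br."""
    return {
        # max-br is in units of 1000 bits/s for the VCL HRD parameters
        "vcl_bits_per_second": max_br * 1000,
        # and in units of 1200 bits/s for the NAL HRD parameters.
        "nal_bits_per_second": max_br * 1200,
        # (MaxCPB of level) * max-br / (MaxBR of level), in bits, rounded down.
        "vcl_cpb_bits": 1000 * level_max_cpb * max_br // level_max_br,
    }


limits = max_br_limits(max_br=1550, level_max_br=384, level_max_cpb=1000)
```

         Here limits holds 1550000 and 1860000 bits per second for
         the VCL and NAL HRD parameters and a CPB size of 4036458
         bits, matching the figures in the text.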

2055	         The value of max-br MUST be greater than or equal to the value
2056	         of MaxBR for the signaled level given in Table A-1 of [1].

2058	         Senders MAY use this knowledge to send higher bitrate video as
2059	         allowed in the level definition of Annex A of H.264, to
2060	         achieve improved video quality.

2062	            Informative note: This parameter was added primarily to
2063	            complement a similar codepoint in the ITU-T Recommendation
2064	            H.245, so as to facilitate signaling gateway designs.  No
2065	            assumption can be made from the value of this parameter
2066	            that the network is capable of handling such bit rates at
2067	            any given time.  In particular, no conclusion can be drawn
2068	            that the signaled bit rate is possible under congestion
2069	            control constraints.

2071	      redundant-pic-cap:
2072	         This parameter signals the capabilities of a receiver
2073	         implementation.  When equal to 0, the parameter indicates that
2074	         the receiver makes no attempt to use redundant coded pictures
2075	         to correct incorrectly decoded primary coded pictures.  When
2076	         equal to 0, the receiver is not capable of using redundant
2077	         slices; therefore, a sender SHOULD avoid sending redundant
2078	         slices to save bandwidth.  When equal to 1, the receiver is
2079	         capable of decoding any such redundant slice that covers a
2080	         corrupted area in a primary decoded picture (at least partly),
2081	         and therefore a sender MAY send redundant slices.  When the
2082	         parameter is not present, then a value of 0 MUST be used for
2083	         redundant-pic-cap.  When present, the value of redundant-pic-
2084	         cap MUST be either 0 or 1.

2086	         When the profile-level-id parameter is present in the same
2087	         signaling as the redundant-pic-cap parameter, and the profile
2088	         indicated in profile-level-id is such that it disallows the
2089	         use of redundant coded pictures (e.g., Main Profile), the
2090	         value of redundant-pic-cap MUST be equal to 0.  When a
2091	         receiver indicates redundant-pic-cap equal to 0, the received
2092	         stream SHOULD NOT contain redundant coded pictures.

2094	            Informative note: Even if redundant-pic-cap is equal to 0,
2095	            the decoder is able to ignore redundant codec pictures
2096	            provided that the decoder supports such a profile
2097	            (Baseline, Extended) in which redundant coded pictures are
2098	            allowed.

2100	            Informative note: Even if redundant-pic-cap is equal to 1,
2101	            the receiver may also choose other error concealment
2102	            strategies to replace or complement decoding of redundant
2103	            slices.

2105	      sprop-parameter-sets:
2106	         This parameter MAY be used to convey any sequence and picture
2107	         parameter set NAL units (herein referred to as the initial
2108	         parameter set NAL units) that can be placed in the NAL unit
2109	         stream to precede any other NAL units in decoding order.  The
2110	         parameter MUST NOT be used to indicate codec capability in any
2111	         capability exchange procedure.  The value of the parameter is
2112	         a comma (',') separated list of base64 [7] representations of
2113	         parameter set NAL units as specified in sections 7.3.2.1 and
2114	         7.3.2.2 of [1].  Note that the number of bytes in a parameter
2115	         set NAL unit is typically less than 10, but a picture
2116	         parameter set NAL unit can contain several hundreds of bytes.
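
         Decoding the parameter value is straightforward, as the
         following Python sketch suggests.  The function name and the
         sample string in the usage line are illustrative and not
         taken from this section.

```python
import base64
from typing import List


def parse_sprop_parameter_sets(value: str) -> List[bytes]:
    """Decode a comma-separated list of base64 parameter set NAL units."""
    return [
        base64.b64decode(item, validate=True)
        for item in value.split(",")
    ]


# Illustrative value carrying two parameter set NAL units.
nal_units = parse_sprop_parameter_sets("Z0IACpZTBYmI,aMljiA==")
```

         Each element of nal_units is one complete parameter set NAL
         unit, beginning with its NAL unit type octet.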

2118	            Informative note: When several payload types are offered in
2119	            the SDP Offer/Answer model, each with its own sprop-
2120	            parameter-sets parameter, then the receiver cannot assume
2121	            that those parameter sets do not use conflicting storage
2122	            locations (i.e., identical values of parameter set
2123	            identifiers).  Therefore, a receiver should buffer all
2124	            sprop-parameter-sets and make them available to the decoder
2125	            instance that decodes a certain payload type.

2127	         The "sprop-parameter-sets" parameter MUST only contain
2128	         parameter sets that are conforming to the profile-level-id,
2129	         i.e., the subset of coding tools indicated by any of the
2130	         parameter sets MUST be equal to the default sub-profile, and
2131	         the level indicated by any of the parameter sets MUST be equal
2132	         to the default level.

2134	      sprop-level-parameter-sets:
2135	         This parameter MAY be used to convey any sequence and picture
2136	         parameter set NAL units (herein referred to as the initial
2137	         parameter set NAL units) that can be placed in the NAL unit
2138	         stream to precede any other NAL units in decoding order and
2139	         that are associated with one or more levels lower than the
2140	         default level.  The parameter MUST NOT be used to indicate
2141	         codec capability in any capability exchange procedure.

2143	         The sprop-level-parameter-sets parameter contains parameter
2144	         sets for one or more levels which are lower than the default
2145	         level.  All parameter sets associated with one level are
2146	         clustered and prefixed with a three-byte field which has the
2147	         same syntax as profile-level-id.  This enables the receiver to
2148	         install the parameter sets for one level and discard the rest.
2149	         The three-byte field is named PLId, and all parameter sets
2150	         associated with one level are named PSL, which has the same
2151	         syntax as sprop-parameter-sets.  Parameter sets for each level
2152	         are represented in the form of PLId:PSL, i.e., PLId followed
2153	         by a colon (':') and the base64 [7] representation of the
2154	         initial parameter set NAL units for the level.  Each pair of
2155	         PLId:PSL is also separated by a colon.  Note that a PSL can
2156	         contain multiple parameter sets for that level, separated with
2157	         commas (',').
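
         The PLId:PSL structure can be parsed as in the following
         Python sketch.  It assumes that PLId is written in base16
         like profile-level-id and relies on the fact that neither
         ':' nor ',' occurs in the base64 alphabet.  The function
         name is hypothetical.

```python
import base64
from typing import Dict, List


def parse_sprop_level_parameter_sets(value: str) -> Dict[str, List[bytes]]:
    """Map each PLId (uppercase hex) to its decoded parameter sets."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected a sequence of PLId:PSL pairs")
    result: Dict[str, List[bytes]] = {}
    # Even-indexed fields are PLIds, odd-indexed fields are PSLs.
    for plid, psl in zip(fields[0::2], fields[1::2]):
        if len(bytes.fromhex(plid)) != 3:
            raise ValueError("PLId must encode exactly three bytes")
        result[plid.upper()] = [
            base64.b64decode(item, validate=True)
            for item in psl.split(",")
        ]
    return result
```

         A receiver that settles on one level can then look up the
         matching PLId and discard the remaining entries.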

2159	         The subset of coding tools indicated by each PLId field MUST
2160	         be equal to the default sub-profile, and the level indicated
2161	         by each PLId field MUST be lower than the default level.  All
2162	         sequence parameter sets contained in each PSL MUST have the
2163	         three bytes from profile_idc to level_idc, inclusive, equal to
2164	         the preceding PLId.

2166	            Informative note: This parameter allows for efficient level
2167	            downgrade in SDP Offer/Answer and out-of-band transport of
2168	            parameter sets, simultaneously.

2170	      use-level-src-parameter-sets:
2171	         This parameter MAY be used to indicate a receiver capability.
2172	         The value MAY be equal to either 0 or 1.  When the parameter
2173	         is not present, the value MUST be inferred to be equal to 0.
2174	         The value 0 indicates that the receiver does not understand
2175	         the sprop-level-parameter-sets parameter, and does not
2176	         understand the "fmtp" source attribute as specified in section
2177	         6.3 of [9], and will ignore sprop-level-parameter-sets when
2178	         present, and will ignore sprop-parameter-sets when conveyed
2179	         using the "fmtp" source attribute.  The value 1 indicates that
2180	         the receiver understands the sprop-level-parameter-sets
2181	         parameter, and understands the "fmtp" source attribute as
2182	         specified in section 6.3 of [9], and is capable of using
2183	         parameter sets contained in the sprop-level-parameter-sets or
2184	         contained in the sprop-parameter-sets that is conveyed using
2185	         the "fmtp" source attribute.

2187	            Informative note: An RFC 3984 receiver does not understand
2188	            sprop-level-parameter-sets, use-level-src-parameter-sets,
2189	            or the "fmtp" source attribute as specified in section 6.3
2190	            of [9].  Therefore, during SDP Offer/Answer, an RFC 3984
2191	            receiver as the answerer will simply ignore sprop-level-
2192	            parameter-sets, when present in an offer, and sprop-
2193	            parameter-sets, when conveyed using the "fmtp" source
2194	            attribute as specified in section 6.3 of [9].  Assume that
2195	            the offered payload type was accepted at a level lower than
2196	            the default level.  If the offered payload type included
2197	            sprop-level-parameter-sets or included sprop-parameter-sets
2198	            conveyed using the "fmtp" source attribute, and the offerer
2199	            sees that the answerer has not included use-level-src-
2200	            parameter-sets equal to 1 in the answer, the offerer learns
2201	            that in-band transport of parameter sets is needed.

2203	      packetization-mode:
2204	         This parameter signals the properties of an RTP payload type
2205	         or the capabilities of a receiver implementation.  Only a
2206	         single configuration point can be indicated; thus, when
2207	         capabilities to support more than one packetization-mode are
2208	         declared, multiple configuration points (RTP payload types)
2209	         must be used.

2211	         When the value of packetization-mode is equal to 0 or
2212	         packetization-mode is not present, the single NAL mode, as
2213	         defined in section 6.2 of RFC 3984, MUST be used.  This mode
2214	         is in use in standards using ITU-T Recommendation H.241 [3]
2215	         (see section 12.1).  When the value of packetization-mode is
2216	         equal to 1, the non-interleaved mode, as defined in section
2217	         6.3 of RFC 3984, MUST be used.  When the value of
2218	         packetization-mode is equal to 2, the interleaved mode, as
2219	         defined in section 6.4 of RFC 3984, MUST be used.  The value
2220	         of packetization-mode MUST be an integer in the range of 0 to
2221	         2, inclusive.

2223	      sprop-interleaving-depth:
2224	         This parameter MUST NOT be present when packetization-mode is
2225	         not present or the value of packetization-mode is equal to 0
2226	         or 1.  This parameter MUST be present when the value of
2227	         packetization-mode is equal to 2.

2229	         This parameter signals the properties of an RTP packet stream.
2230	         It specifies the maximum number of VCL NAL units that precede
2231	         any VCL NAL unit in the RTP packet stream in transmission
2232	         order and follow the VCL NAL unit in decoding order.
2233	         Consequently, it is guaranteed that receivers can reconstruct
2234	         NAL unit decoding order when the buffer size for NAL unit
2235	         decoding order recovery is at least the value of sprop-
2236	         interleaving-depth + 1 in terms of VCL NAL units.

2238	         The value of sprop-interleaving-depth MUST be an integer in
2239	         the range of 0 to 32767, inclusive.
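
         Informative example: The definition above can be checked with
         a short script.  The following is a minimal sketch in Python;
         the function name and the sample sequence are illustrative
         assumptions and are not defined by this memo.  Each list entry
         is the decoding-order position of a VCL NAL unit, listed in
         transmission order.

```python
def interleaving_depth(decoding_order):
    """Return the largest number of VCL NAL units that precede a unit in
    transmission order while following it in decoding order.

    decoding_order[k] is the decoding-order position of the k-th VCL NAL
    unit in transmission order (an assumed input representation).
    """
    depth = 0
    for i, d_i in enumerate(decoding_order):
        # Units sent before unit i that must be decoded after it.
        late = sum(1 for d_j in decoding_order[:i] if d_j > d_i)
        depth = max(depth, late)
    return depth


# Assumed sample stream: every third unit is sent one position early.
order = [0, 2, 1, 3, 5, 4, 6, 8, 7]
depth = interleaving_depth(order)
# Buffer size, in VCL NAL units, that suffices to recover decoding order.
min_buffer_units = depth + 1
```

         For this sample sequence the computed depth is 1, so a buffer
         of 2 VCL NAL units is sufficient for decoding order recovery.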

2241	      sprop-deint-buf-req:
2242	         This parameter MUST NOT be present when packetization-mode is
2243	         not present or the value of packetization-mode is equal to 0
2244	         or 1.  It MUST be present when the value of packetization-mode
2245	         is equal to 2.

2247	         sprop-deint-buf-req signals the required size of the de-
2248	         interleaving buffer for the RTP packet stream.  The value of
2249	         the parameter MUST be greater than or equal to the maximum
2250	         buffer occupancy (in units of bytes) required in such a de-
2251	         interleaving buffer that is specified in section 7.2 of RFC
2252	         3984.  It is guaranteed that receivers can perform the de-
2253	         interleaving of interleaved NAL units into NAL unit decoding
2254	         order, when the de-interleaving buffer size is at least the
2255	         value of sprop-deint-buf-req in terms of bytes.

2257	         The value of sprop-deint-buf-req MUST be an integer in the
2258	         range of 0 to 4294967295, inclusive.

2260	            Informative note: sprop-deint-buf-req indicates the
2261	            required size of the de-interleaving buffer only.  When
2262	            network jitter can occur, an appropriately sized jitter
2263	            buffer has to be provisioned for as well.
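
         Informative example: The presence and range rules stated so
         far for sprop-interleaving-depth and sprop-deint-buf-req can
         be sketched as a simple check.  The function name, the
         dictionary representation of the parameters, and the wording
         of the returned messages are assumptions made for
         illustration only.

```python
def check_interleaving_params(fmtp):
    """Check presence and range rules for the two parameters above.

    fmtp maps parameter names to their string values (assumed input
    representation).  Returns a list of rule violations; an empty list
    means no violation was found.
    """
    # An absent packetization-mode is treated as 0.
    mode = int(fmtp.get("packetization-mode", "0"))
    problems = []
    if mode not in (0, 1, 2):
        problems.append("packetization-mode out of range")
    limits = {
        "sprop-interleaving-depth": 32767,
        "sprop-deint-buf-req": 4294967295,
    }
    for name, upper in limits.items():
        present = name in fmtp
        if mode == 2 and not present:
            problems.append(name + " missing")
        elif mode != 2 and present:
            problems.append(name + " not allowed")
        if present and not 0 <= int(fmtp[name]) <= upper:
            problems.append(name + " out of range")
    return problems
```

         A sender could run such a check before emitting an "a=fmtp"
         line, and a receiver could run it on a received one.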

2265	      deint-buf-cap:
2266	         This parameter signals the capabilities of a receiver
2267	         implementation and indicates the amount of de-interleaving
2268	         buffer space in units of bytes that the receiver has available
2269	         for reconstructing the NAL unit decoding order.  A receiver is
2270	         able to handle any stream for which the value of the sprop-
2271	         deint-buf-req parameter is smaller than or equal to this
2272	         parameter.

2274	         If the parameter is not present, then a value of 0 MUST be
2275	         used for deint-buf-cap.  The value of deint-buf-cap MUST be an
2276	         integer in the range of 0 to 4294967295, inclusive.

2278	            Informative note: deint-buf-cap indicates the maximum
2279	            possible size of the de-interleaving buffer of the receiver
2280	            only.  When network jitter can occur, an appropriately
2281	            sized jitter buffer has to be provisioned for as well.

2283	      sprop-init-buf-time:
2284	         This parameter MAY be used to signal the properties of an RTP
2285	         packet stream.  The parameter MUST NOT be present, if the
2286	         value of packetization-mode is equal to 0 or 1.

2288	         The parameter signals the initial buffering time that a
2289	         receiver MUST wait before starting decoding to recover the NAL
2290	         unit decoding order from the transmission order.  The
2291	         parameter is the maximum value of (transmission time of a NAL
2292	         unit - decoding time of the NAL unit), assuming reliable and
2293	         instantaneous transmission, the same timeline for transmission
2294	         and decoding, and that decoding starts when the first packet
2295	         arrives.

2297	         An example of specifying the value of sprop-init-buf-time
2298	         follows.  A NAL unit stream is sent in the following
2299	         interleaved order, in which the value corresponds to the
2300	         decoding time and the transmission order is from left to
2301	         right:

2303	            0  2  1  3  5  4  6  8  7 ...

2305	         Assuming a steady transmission rate of NAL units, the
2306	         transmission times are:

2308	            0  1  2  3  4  5  6  7  8 ...

2310	         Subtracting the decoding time from the transmission time
2311	         column-wise results in the following series:

2313	            0 -1  1  0 -1  1  0 -1  1 ...

2315	         Thus, in terms of intervals of NAL unit transmission times,
2316	         the value of sprop-init-buf-time in this example is 1.  The
2317	         parameter is coded as a non-negative base-10 integer
2318	         representation in clock ticks of a 90-kHz clock.  If the
2319	         parameter is not present, then no initial buffering time value
2320	         is defined.  Otherwise the value of sprop-init-buf-time MUST
2321	         be an integer in the range of 0 to 4294967295, inclusive.

2323	         In addition to the signaled sprop-init-buf-time, receivers
2324	         SHOULD take into account the transmission delay jitter
2325	         buffering, including buffering for the delay jitter caused by
2326	         mixers, translators, gateways, proxies, traffic-shapers, and
2327	         other network elements.
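
         Informative example: The worked example above can be
         reproduced with the following minimal Python sketch.  As in
         the example, the decoding time is subtracted from the
         transmission time for each NAL unit and the maximum is taken.
         The function name and the rate of 30 NAL units per second
         used for the conversion to 90-kHz clock ticks are assumptions
         made for illustration.

```python
def init_buf_time(decoding_times, transmission_times):
    """Maximum of (transmission time - decoding time) over all NAL units.

    Both lists are in transmission order and use the same time unit.
    """
    return max(t - d for d, t in zip(decoding_times, transmission_times))


# Decoding times from the example, listed in transmission order.
decoding = [0, 2, 1, 3, 5, 4, 6, 8, 7]
# Steady transmission rate: one NAL unit per interval.
transmission = list(range(len(decoding)))

intervals = init_buf_time(decoding, transmission)

# Hypothetical conversion: assume 30 NAL units are sent per second, so
# one transmission interval lasts 90000 / 30 ticks of the 90-kHz clock.
NAL_UNITS_PER_SECOND = 30
ticks = intervals * 90000 // NAL_UNITS_PER_SECOND
```

         Under the assumed rate, the one-interval result of the example
         corresponds to a signaled value of 3000.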

2329	      sprop-max-don-diff:
2330	         This parameter MAY be used to signal the properties of an RTP
2331	         packet stream.  It MUST NOT be used to signal transmitter or
2332	         receiver or codec capabilities.  The parameter MUST NOT be
2333	         present if the value of packetization-mode is equal to 0 or 1.
2334	         sprop-max-don-diff is an integer in the range of 0 to 32767,
2335	         inclusive.  If sprop-max-don-diff is not present, the value of
2336	         the parameter is unspecified.  sprop-max-don-diff is
2337	         calculated as follows:

2339	            sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
2340	            for any i and any j>i,

2342	         where i and j indicate the index of the NAL unit in the
2343	         transmission order and AbsDON denotes a decoding order number
2344	         of the NAL unit that does not wrap around to 0 after 65535.
2345	         In other words, AbsDON is calculated as follows: Let m and n
2346	         be consecutive NAL units in transmission order.  For the very
2347	         first NAL unit in transmission order (whose index is 0),
2348	         AbsDON(0) = DON(0).  For other NAL units, AbsDON is calculated
2349	         as follows:

2351	            If DON(m) == DON(n), AbsDON(n) = AbsDON(m)

2353	            If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
2354	              AbsDON(n) = AbsDON(m) + DON(n) - DON(m)

2356	            If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
2357	              AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)

2359	            If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
2360	              AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))

2362	            If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
2363	              AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))

2365	         where DON(i) is the decoding order number of the NAL unit
2366	         having index i in the transmission order.  The decoding order
2367	         number is specified in section 5.5 of RFC 3984.

2369	            Informative note: Receivers may use sprop-max-don-diff to
2370	            trigger which NAL units in the receiver buffer can be
2371	            passed to the decoder.
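
         Informative example: The AbsDON rules and the formula for
         sprop-max-don-diff can be sketched in Python as follows.  The
         function names are illustrative.  The formula itself yields a
         negative value when the stream is not interleaved at all; the
         sketch clamps the result to 0 so that it stays in the allowed
         range, which is an assumption of this example rather than a
         rule stated above.

```python
def abs_don_sequence(dons):
    """Map 16-bit DON values (transmission order) to non-wrapping AbsDON."""
    if not dons:
        return []
    abs_dons = [dons[0]]  # AbsDON(0) = DON(0)
    for m, n in zip(dons, dons[1:]):
        prev = abs_dons[-1]
        if m == n:
            cur = prev
        elif m < n and n - m < 32768:
            cur = prev + n - m
        elif m > n and m - n >= 32768:
            cur = prev + 65536 - m + n      # forward step across the wrap
        elif m < n and n - m >= 32768:
            cur = prev - (m + 65536 - n)    # backward step across the wrap
        else:  # m > n and m - n < 32768
            cur = prev - (m - n)
        abs_dons.append(cur)
    return abs_dons


def max_don_diff(dons):
    """max{AbsDON(i) - AbsDON(j)} for any i and any j > i, clamped at 0."""
    abs_dons = abs_don_sequence(dons)
    best = 0  # assumption: clamp to the lower end of the allowed range
    if not abs_dons:
        return best
    running_max = abs_dons[0]
    for value in abs_dons[1:]:
        best = max(best, running_max - value)
        running_max = max(running_max, value)
    return best
```

         For the DON sequence 65534, 0, 65535, 1 the sketch yields the
         AbsDON sequence 65534, 65536, 65535, 65537 and a difference
         of 1.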

2373	      max-rcmd-nalu-size:
2374	         This parameter MAY be used to signal the capabilities of a
2375	         receiver.  The parameter MUST NOT be used for any other
2376	         purposes.  The value of the parameter indicates the largest
2377	         NALU size in bytes that the receiver can handle efficiently.
2378	         The parameter value is a recommendation, not a strict upper
2379	         boundary.  The sender MAY create larger NALUs but must be
2380	         aware that the handling of these may come at a higher cost
2381	         than NALUs conforming to the limitation.

2383	         The value of max-rcmd-nalu-size MUST be an integer in the
2384	         range of 0 to 4294967295, inclusive.  If this parameter is not
2385	         specified, no known limitation to the NALU size exists.
2386	         Senders still have to consider the MTU size available between
2387	         the sender and the receiver and SHOULD run MTU discovery for
2388	         this purpose.

2390	         This parameter is motivated by, for example, an IP to H.223
2391	         video telephony gateway, where NALUs smaller than the H.223
2392	         transport data unit will be more efficient.  A gateway may
2393	         terminate IP; thus, MTU discovery will normally not work
2394	         beyond the gateway.

2396	            Informative note: Setting this parameter to a lower than
2397	            necessary value may have a negative impact.

2399	      sar-understood:
2400	         This parameter MAY be used to indicate a receiver capability
2401	         and not anything else.  The parameter indicates the maximum
2402	         value of aspect_ratio_idc (specified in [1]) smaller than 255
2403	         that the receiver understands.  Table E-1 of [1] specifies
2404	         aspect_ratio_idc equal to 0 as "unspecified", 1 to 16,
2405	         inclusive, as specific Sample Aspect Ratios (SARs), 17 to 254,
2406	         inclusive, as "reserved", and 255 as the Extended SAR, for
2407	         which SAR width and SAR height are explicitly signaled.
2408	         Therefore, a receiver with a decoder according to [1]
2409	         understands aspect_ratio_idc in the range of 1 to 16,
2410	         inclusive and aspect_ratio_idc equal to 255, in the sense that
2411	         the receiver knows what exactly the SAR is.  For such a
2412	         receiver, the value of sar-understood is 16.  If in the future
2413	         Table E-1 of [1] is extended, e.g., such that the SAR for
2414	         aspect_ratio_idc equal to 17 is specified, then for a receiver
2415	         with a decoder that understands the extension, the value of
2416	         sar-understood is 17.  For a receiver with a decoder according
2417	         to the 2003 version of [1], the value of sar-understood is 13,
2418	         as the minimum reserved aspect_ratio_idc therein is 14.

2420	         When sar-understood is not present, the value MUST be inferred
2421	         to be equal to 13.

2423	      sar-supported:
2424	         This parameter MAY be used to indicate a receiver capability
2425	         and not anything else.  The value of this parameter is an
2426	         integer in the range of 1 to sar-understood, inclusive, or
2427	         equal to 255.  The value of sar-supported equal to N smaller
2428	         than 255 indicates that the receiver supports all the SARs
2429	         corresponding to H.264 aspect_ratio_idc values (see Table E-1
2430	         of [1]) in the range from 1 to N, inclusive, without geometric
2431	         distortion.  The value of sar-supported equal to 255 indicates
2432	         that the receiver supports all sample aspect ratios which are
2433	         expressible using two 16-bit integer values as the numerator
2434	         and denominator, i.e., those that are expressible using the
2435	         H.264 aspect_ratio_idc value of 255 (Extended_SAR, see Table
2436	         E-1 of [1]), without geometric distortion.

2438	         H.264 compliant encoders SHOULD NOT send an aspect_ratio_idc
2439	         equal to 0, or an aspect_ratio_idc larger than sar-understood
2440	         and smaller than 255.  H.264 compliant encoders SHOULD send an
2441	         aspect_ratio_idc that the receiver is able to display without
2442	         geometrical distortion.  However, H.264 compliant encoders MAY
2443	         choose to send pictures using any SAR.

2445	         Note that the actual sample aspect ratio or extended sample
2446	         aspect ratio, when present, of the stream is conveyed in the
2447	         Video Usability Information (VUI) part of the sequence
2448	         parameter set.

2450	      Encoding considerations:
2451	         This type is only defined for transfer via RTP (RFC 3550).

2453	      Security considerations:
2454	         See section 9 of RFC xxxx.

2456	      Public specification:
2457	         Please refer to RFC xxxx and its section 15.

2459	      Additional information:
2460	         None

2462	      File extensions:     none

2464	      Macintosh file type code: none

2466	      Object identifier or OID: none

2468	      Person & email address to contact for further information:
2469	         Ye-Kui Wang, ye-kui.wang@nokia.com

2471	      Intended usage:      COMMON

2473	      Author:
2474	         Ye-Kui Wang, ye-kui.wang@nokia.com

2476	      Change controller:
2477	         IETF Audio/Video Transport working group delegated from the
2478	         IESG.

2480	8.2. SDP Parameters

2482	8.2.1. Mapping of Payload Type Parameters to SDP

2484	   The media type video/H264 string is mapped to fields in the Session
2485	   Description Protocol (SDP) [6] as follows:

2487	   o  The media name in the "m=" line of SDP MUST be video.

2489	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
2490	      media subtype).

2492	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2494	   o  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-
2495	      smbps", "max-fs", "max-cpb", "max-dpb", "max-br", "redundant-pic-
2496	      cap", "use-level-src-parameter-sets", "packetization-mode",
2497	      "sprop-interleaving-depth", "sprop-deint-buf-req", "deint-buf-
2498	      cap", "sprop-init-buf-time", "sprop-max-don-diff", "max-rcmd-nalu-
2499	      size", "sar-understood", and "sar-supported", when present, MUST
2500	      be included in the "a=fmtp" line of SDP.  These parameters are
2501	      expressed as a media type string, in the form of a semicolon
2502	      separated list of parameter=value pairs.

2504	   o  The OPTIONAL parameters "sprop-parameter-sets" and "sprop-level-
2505	      parameter-sets", when present, MUST be included in the "a=fmtp"
2506	      line of SDP or conveyed using the "fmtp" source attribute as
2507	      specified in section 6.3 of [9].  For a particular media format
2508	      (i.e., RTP payload type), a "sprop-parameter-sets" or "sprop-
2509	      level-parameter-sets" MUST NOT be both included in the "a=fmtp"
2510	      line of SDP and conveyed using the "fmtp" source attribute.  When
2511	      included in the "a=fmtp" line of SDP, these parameters are
2512	      expressed as a media type string, in the form of a semicolon
2513	      separated list of parameter=value pairs.  When conveyed using the
2514	      "fmtp" source attribute, these parameters are only associated with
2515	      the given source and payload type as parts of the "fmtp" source
2516	      attribute.

2518	         Informative note: Conveyance of "sprop-parameter-sets" and
2519	         "sprop-level-parameter-sets" using the "fmtp" source attribute
2520	         allows for out-of-band transport of parameter sets in
2521	         topologies like Topo-Video-switch-MCU [29].

2523	   An example of media representation in SDP is as follows (Baseline
2524	   Profile, Level 3.0, some of the constraints of the Main profile may
2525	   not be obeyed):

2527	      m=video 49170 RTP/AVP 98
2528	      a=rtpmap:98 H264/90000
2529	      a=fmtp:98 profile-level-id=42A01E;
2530	                packetization-mode=1;
2531	                sprop-parameter-sets=<parameter sets data>
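
   As an illustration, the "a=fmtp" line of the example can be split
   into its parameter=value pairs, and the three bytes of "profile-
   level-id" can be separated, with a few lines of Python.  This is a
   minimal sketch; the function names are assumptions, and the
   continuation lines of the example are joined into a single line
   before parsing.

```python
def parse_fmtp(line):
    """Split 'a=fmtp:<pt> name=value;name=value' into (pt, dict)."""
    head, _, params = line.partition(" ")
    payload_type = int(head.split(":", 1)[1])
    pairs = {}
    for item in params.split(";"):
        item = item.strip()
        if item:
            # Split on the first '=' only; values may contain '='.
            name, _, value = item.partition("=")
            pairs[name.strip()] = value.strip()
    return payload_type, pairs


def decode_profile_level_id(hex_string):
    """Return (profile_idc, profile_iop, level_idc) from six hex digits."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be exactly three bytes")
    return raw[0], raw[1], raw[2]


line = ("a=fmtp:98 profile-level-id=42A01E; packetization-mode=1; "
        "sprop-parameter-sets=<parameter sets data>")
payload_type, params = parse_fmtp(line)
profile_idc, profile_iop, level_idc = decode_profile_level_id(
    params["profile-level-id"])
# constraint_set1_flag is the second most significant bit of profile-iop.
constraint_set1_flag = bool(profile_iop & 0x40)
```

   For the example, profile_idc is 66 (Baseline), level_idc is 30
   (Level 3.0), and constraint_set1_flag is 0, which is why some of the
   constraints of the Main profile may not be obeyed.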

2533	8.2.2. Usage with the SDP Offer/Answer Model

2535	   When H.264 is offered over RTP using SDP in an Offer/Answer model [8]
2536	   for negotiation for unicast usage, the following limitations and
2537	   rules apply:

2539	   o  The parameters identifying a media format configuration for H.264
2540	      are "profile-level-id" and "packetization-mode", when present.
2541	      These media format configuration parameters (except for the level
2542	      part of "profile-level-id") MUST be used symmetrically; i.e., the
2543	      answerer MUST either maintain all configuration parameters or
2544	      remove the media format (payload type) completely, if one or more
2545	      of the parameter values are not supported.  Note that the level
2546	      part of "profile-level-id" includes level_idc, and, for indication
2547	      of level 1b when profile_idc is equal to 66, 77 or 88, bit 4
2548	      (constraint_set3_flag) of profile-iop.  The level part of
2549	      "profile-level-id" is downgradable, i.e. the answerer MUST
2550	      maintain the same or a lower level or remove the media format
2551	      (payload type) completely.

2553	         Informative note: The requirement for symmetric use applies
2554	         only for the above media format configuration parameters
2555	         excluding the level part of "profile-level-id", and not for
2556	         the other stream properties and capability parameters.

2558	         Informative note: In H.264 [1], all the levels except for
2559	         level 1b are equal to the value of level_idc divided by 10.
2560	         Level 1b is a level higher than level 1.0 but lower than level
2561	         1.1, and is signaled in an ad-hoc manner, because the
2562	         level was specified after level 1.0 and level 1.1.  For the
2563	         Baseline, Main and Extended profiles (with profile_idc equal
2564	         to 66, 77 and 88, respectively), level 1b is indicated by
2565	         level_idc equal to 11 (i.e. same as level 1.1) and
2566	         constraint_set3_flag equal to 1.  For other profiles, level 1b
2567	         is indicated by level_idc equal to 9 (but note that level 1b
2568	         for these profiles is still higher than level 1, which has
2569	         level_idc equal to 10, and lower than level 1.1).  In SDP
2570	         Offer/Answer, an answer to an offer may indicate a level equal
2571	         to or lower than the level indicated in the offer.  Due to the
2572	         ad-hoc indication of level 1b, offerers and answerers must
2573	         check the value of bit 4 (constraint_set3_flag) of the middle
2574	         octet of the parameter "profile-level-id", when profile_idc is
2575	         equal to 66, 77 or 88 and level_idc is equal to 11.

2577	      To simplify handling and matching of these configurations, the
2578	      same RTP payload type number used in the offer SHOULD also be
2579	      used in the answer, as specified in [8].  An answer MUST NOT
2580	      contain a payload type number used in the offer unless the
2581	      configuration is exactly the same as in the offer or the
2582	      configuration in the answer only differs from that in the offer
2583	      with a level lower than the default level offered.

2585	         Informative note: When an offerer receives an answer, it has
2586	         to compare payload types not declared in the offer based on
2587	         the media type (i.e., video/H264) and the above media
2588	         configuration parameters with any payload types it has already
2589	         declared.  This will enable it to determine whether the
2590	         configuration in question is new or if it is equivalent to
2591	         configuration already offered, since a different payload type
2592	         number may be used in the answer.

2594	   o  The parameters "sprop-deint-buf-req", "sprop-interleaving-depth",
2595	      "sprop-max-don-diff", and "sprop-init-buf-time" describe the
2596	      properties of the RTP packet stream that the offerer or answerer
2597	      is sending for the media format configuration.  This differs from
2598	      the normal usage of the Offer/Answer parameters: normally such
2599	      parameters declare the properties of the stream that the offerer
2600	      or the answerer is able to receive.  When dealing with H.264, the
2601	      offerer assumes that the answerer will be able to receive media
2602	      encoded using the configuration being offered.

2604	         Informative note: The above parameters apply for any stream
2605	         sent by the declaring entity with the same configuration;
2606	         i.e., they are dependent on their source.  Rather than being
2607	         bound to the payload type, the values may have to be applied
2608	         to another payload type when being sent, as they apply for the
2609	         configuration.

2611	   o  The capability parameters ("max-mbps", "max-smbps", "max-fs",
2612	      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "max-rcmd-
2613	      nalu-size", "sar-understood", "sar-supported") MAY be used to
2614	      declare further capabilities of the offerer or answerer for
2615	      receiving.  These parameters can only be present when the
2616	      direction attribute is sendrecv or recvonly, and the parameters
2617	      describe the limitations of what the offerer or answerer accepts
2618	      for receiving streams.

2620	   o  An offerer has to include the size of the de-interleaving buffer,
2621	      "sprop-deint-buf-req", in the offer for an interleaved H.264
2622	      stream.  To enable the offerer and answerer to inform each other
2623	      about their capabilities for de-interleaving buffering in
2624	      receiving streams, both parties are RECOMMENDED to include "deint-
2625	      buf-cap".  For interleaved streams, it is also RECOMMENDED to
2626	      consider offering multiple payload types with different buffering
2627	      requirements when the capabilities of the receiver are unknown.

2629	   o  The "sprop-parameter-sets" or "sprop-level-parameter-sets"
2630	      parameter, when present (included in the "a=fmtp" line of SDP or
2631	      conveyed using the "fmtp" source attribute as specified in section
2632	      6.3 of [9]), is used for out-of-band transport of parameter sets.
2633	      However, when out-of-band transport of parameter sets is used,
2634	      parameter sets MAY still be additionally transported in-band.  If
2635	      neither "sprop-parameter-sets" nor "sprop-level-parameter-sets" is
2636	      present, then only in-band transport of parameter sets is used.

2638	      An offer MAY include either or both of "sprop-parameter-sets" and
2639	      "sprop-level-parameter-sets".  An answer MAY include "sprop-
2640	      parameter-sets", and MUST NOT include "sprop-level-parameter-
2641	      sets".

2643	      When an offered payload type is accepted without level downgrade,
2644	      i.e. the default level is accepted, the following applies.

2646	        o When there is a "sprop-parameter-sets" included in the
2647	           "a=fmtp" line of SDP, the answerer MUST be prepared to use
2648	           the parameter sets included in "sprop-parameter-sets" for
2649	           decoding the incoming NAL unit stream.

2651	        o When there is a "sprop-parameter-sets" conveyed using the
2652	           "fmtp" source attribute as specified in section 6.3 of [9],
2653	           and the answerer understands the "fmtp" source attribute, it
2654	           MUST be prepared to use the parameter sets included in
2655	           "sprop-parameter-sets" for decoding the incoming NAL unit
2656	           stream, and it MUST include either "use-level-src-parameter-
2657	           sets" equal to 1 or the "fmtp" source attribute in the
2658	           answer.

2660	        o When there is a "sprop-parameter-sets" conveyed using the
2661	           "fmtp" source attribute as specified in section 6.3 of [9],
2662	           and the answerer does not understand the "fmtp" source
2663	           attribute, in-band transport of parameter sets MUST be used,
2664	           and the answerer MUST NOT include "use-level-src-parameter-
2665	           sets" equal to 1 or the "fmtp" source attribute in the
2666	           answer.

2668	        o When "sprop-parameter-sets" is not present, in-band
2669	           transport of parameter sets MUST be used, and the answer
2670	           MUST NOT include "use-level-src-parameter-sets" equal to 1.

2672	        o The answerer MUST ignore "sprop-level-parameter-sets", when
2673	           present (either included in the "a=fmtp" line of SDP or
2674	           conveyed using the "fmtp" source attribute).

2676	      When level downgrade is in use, i.e., a level lower than the
2677	      default level offered is accepted, the following applies.

2679	        o The answerer MUST ignore "sprop-parameter-sets", when
2680	           present (either included in the "a=fmtp" line of SDP or
2681	           conveyed using the "fmtp" source attribute).

2683	        o If neither "use-level-src-parameter-sets" equal to 1 nor the
2684	           "fmtp" source attribute is present in the answer for the
2685	           accepted payload type, the answerer MUST ignore "sprop-
2686	           level-parameter-sets", when present.

2688	        o Otherwise ("use-level-src-parameter-sets" equal to 1 or the
2689	           "fmtp" source attribute is present in the answer for the
2690	           accepted payload type), the answerer MUST be prepared to use
2691	           the parameter sets that are included in "sprop-level-
2692	           parameter-sets" for the accepted level, when present, for
2693	           decoding the incoming NAL unit stream, and ignore all other
2694	           parameter sets included in "sprop-level-parameter-sets".

2696	        o When no parameter sets for the accepted level are present in
2697	           the "sprop-level-parameter-sets", in-band transport of
2698	           parameter sets MUST be used.

2700	      The answerer MAY include "sprop-parameter-sets" or omit it, i.e.,
2701	      the answerer MAY use either out-of-band or in-band transport of
2702	      parameter sets for the stream it is sending, regardless of
2703	      whether out-of-band parameter sets transport has been used in the
2704	      offerer-to-answerer direction.  All parameter sets included in
2705	      the "sprop-parameter-sets", when present, for the accepted
2706	      payload type in an answer MUST be associated with the accepted
2707	      level, as indicated by the profile-level-id in the answer for the
2708	      accepted payload type.

2710	      Parameter sets included in "sprop-parameter-sets" in an answer
2711	      are independent of those parameter sets included in the offer, as
2712	      they are used for decoding two different video streams, one from
2713	      the answerer to the offerer, and the other in the opposite
2714	      direction.  The offerer MUST be prepared to use the parameter
2715	      sets included in the answer's "sprop-parameter-sets", when
2716	      present, for decoding the incoming NAL unit stream.

2718	      When "sprop-parameter-sets" or "sprop-level-parameter-sets" is
2719	      conveyed using the "fmtp" source attribute as specified in
2720	      section 6.3 of [9], the receiver of the parameters MUST store the
2721	      parameter sets included in the "sprop-parameter-sets" or "sprop-
2722	      level-parameter-sets" for the accepted level and associate them
2723	      to the source given as a part of the "fmtp" source attribute.
2724	      Parameter sets associated with one source MUST only be used to
2725	      decode NAL units conveyed in RTP packets from the same source.
2726	      When this mechanism is in use, SSRC collision detection and
2727	      resolution MUST be performed as specified in [9].

2729	         Informative note: Conveyance of "sprop-parameter-sets" and
2730	         "sprop-level-parameter-sets" using the "fmtp" source attribute
2731	         may be used in topologies like Topo-Video-switch-MCU [29] to
2732	         enable out-of-band transport of parameter sets.
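
   As an illustration of the unicast rule that the level part of
   "profile-level-id" may only stay the same or be lowered, including
   the ad-hoc signaling of level 1b, consider the following Python
   sketch.  The function names and the rank value 10.5 for level 1b
   are assumptions of this example.  Comparing the profile part
   byte-wise is a simplification: this sketch does not attempt to
   recognize different byte values that indicate the same sub-profile.

```python
def split_profile_level(profile_level_id):
    """Return (profile part, level rank) for a profile-level-id string."""
    profile_idc, profile_iop, level_idc = bytes.fromhex(profile_level_id)
    if profile_idc in (66, 77, 88):
        # Level 1b: level_idc 11 together with constraint_set3_flag
        # (bit 4 of profile-iop), which belongs to the level part here.
        is_1b = level_idc == 11 and bool(profile_iop & 0x10)
        profile_part = (profile_idc, profile_iop & 0xEF)
    else:
        # For other profiles, level 1b is indicated by level_idc 9.
        is_1b = level_idc == 9
        profile_part = (profile_idc, profile_iop)
    # Level 1b ranks above level 1 (10) and below level 1.1 (11).
    rank = 10.5 if is_1b else float(level_idc)
    return profile_part, rank


def answer_level_acceptable(offer_id, answer_id):
    """True if the answer keeps the profile part and does not raise
    the level relative to the offer."""
    offer_profile, offer_rank = split_profile_level(offer_id)
    answer_profile, answer_rank = split_profile_level(answer_id)
    return offer_profile == answer_profile and answer_rank <= offer_rank
```

   For instance, with an offer of 42E00B (Baseline, level 1.1), an
   answer of 42F00B (level 1b) is a downgrade and passes the check,
   whereas the reverse combination does not.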

2734	   For streams being delivered over multicast, the following rules
2735	   apply:

2737	   o  The media format configuration is identified by the same
2738	      parameters as above for unicast (i.e. "profile-level-id" and
2739	      "packetization-mode", when present).  These media format
2740	      configuration parameters (including the level part of "profile-
2741	      level-id") MUST be used symmetrically; i.e., the answerer MUST
2742	      either maintain all configuration parameters or remove the media
2743	      format (payload type) completely.  Note that this implies that the
2744	      level part of "profile-level-id" for Offer/Answer in multicast is
2745	      not downgradable.

2747	      To simplify handling and matching of these configurations, the
2748	      same RTP payload type number used in the offer SHOULD also be
2749	      used in the answer, as specified in [8].  An answer MUST NOT
2750	      contain a payload type number used in the offer unless the
2751	      configuration is the same as in the offer.

2753	   o  Parameter sets received MUST be associated with the originating
2754	      source, and MUST be only used in decoding the incoming NAL unit
2755	      stream from the same source.

2757	   o  The rules for other parameters are the same as above for unicast.

2759	   Table 6 lists the interpretation of all the 20 media type parameters
2760	   that MUST be used for the different direction attributes.

2762	       Table 6. Interpretation of parameters for different direction
2763	                                attributes.

2765	                                              sendonly --+
2766	                                           recvonly --+  |
2767	                                        sendrecv --+  |  |
2768	                                                   |  |  |
2769	                profile-level-id                   C  C  P
2770	                packetization-mode                 C  C  P
2771	                sprop-deint-buf-req                P  -  P
2772	                sprop-interleaving-depth           P  -  P
2773	                sprop-max-don-diff                 P  -  P
2774	                sprop-init-buf-time                P  -  P
2775	                max-mbps                           R  R  -
2776	                max-smbps                          R  R  -
2777	                max-fs                             R  R  -
2778	                max-cpb                            R  R  -
2779	                max-dpb                            R  R  -
2780	                max-br                             R  R  -
2781	                redundant-pic-cap                  R  R  -
2782	                deint-buf-cap                      R  R  -
2783	                max-rcmd-nalu-size                 R  R  -
2784	                sar-understood                     R  R  -
2785	                sar-supported                      R  R  -
2786	                use-level-src-parameter-sets       R  R  -
2787	                sprop-parameter-sets               S  -  S
2788	                sprop-level-parameter-sets         S  -  S

2790	             Legend:

2792	             C: configuration for sending and receiving streams
2793	             P: properties of the stream to be sent
2794	             R: receiver capabilities
2795	             S: out-of-band parameter sets
2796	             -: not usable, when present SHOULD be ignored
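
   As an illustration only, Table 6 can be encoded as a lookup table.
   The following Python sketch is not part of this specification; the
   names TABLE_6, COLUMN, and usable_parameters are hypothetical, and
   passing unknown parameters through unchanged is an assumption of the
   sketch.

```python
# Illustrative encoding of Table 6; all names here are hypothetical.
# Columns: (sendrecv, recvonly, sendonly), using the legend codes.
TABLE_6 = {
    "profile-level-id":             ("C", "C", "P"),
    "packetization-mode":           ("C", "C", "P"),
    "sprop-deint-buf-req":          ("P", "-", "P"),
    "sprop-interleaving-depth":     ("P", "-", "P"),
    "sprop-max-don-diff":           ("P", "-", "P"),
    "sprop-init-buf-time":          ("P", "-", "P"),
    "max-mbps":                     ("R", "R", "-"),
    "max-smbps":                    ("R", "R", "-"),
    "max-fs":                       ("R", "R", "-"),
    "max-cpb":                      ("R", "R", "-"),
    "max-dpb":                      ("R", "R", "-"),
    "max-br":                       ("R", "R", "-"),
    "redundant-pic-cap":            ("R", "R", "-"),
    "deint-buf-cap":                ("R", "R", "-"),
    "max-rcmd-nalu-size":           ("R", "R", "-"),
    "sar-understood":               ("R", "R", "-"),
    "sar-supported":                ("R", "R", "-"),
    "use-level-src-parameter-sets": ("R", "R", "-"),
    "sprop-parameter-sets":         ("S", "-", "S"),
    "sprop-level-parameter-sets":   ("S", "-", "S"),
}

COLUMN = {"sendrecv": 0, "recvonly": 1, "sendonly": 2}


def usable_parameters(fmtp, direction):
    """Drop parameters marked '-' in Table 6 for the given direction.

    Parameters not listed in Table 6 are passed through unchanged
    (an assumption of this sketch).
    """
    col = COLUMN[direction]
    return {
        name: value
        for name, value in fmtp.items()
        if name not in TABLE_6 or TABLE_6[name][col] != "-"
    }
```

   For example, a "max-fs" value found on a "sendonly" media line would
   be dropped by usable_parameters, while "profile-level-id" would be
   kept.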

2798	   Parameters used for declaring receiver capabilities are in general
2799	   downgradable; i.e., they express the upper limit for a sender's
2800	   possible behavior.  Thus, a sender MAY configure its encoder using
2801	   only values that are lower than or equal to the declared ones.

2803	   Parameters declaring a configuration point are not downgradable, with
2804	   the exception of the level part of the "profile-level-id" parameter
2805	   for unicast usage.  This expresses values a receiver expects to be
2806	   used and must be used verbatim on the sender side.

2808	   When a sender's capabilities are declared, and non-downgradable
2809	   parameters are used in this declaration, then these parameters
2810	   express a configuration that is acceptable for the sender to receive
2811	   streams.  In order to achieve high interoperability levels, it is
2812	   often advisable to offer multiple alternative configurations; e.g.,
2813	   for the packetization mode.  It is impossible to offer multiple
2814	   configurations in a single payload type.  Thus, when multiple
2815	   configuration offers are made, each offer requires its own RTP
2816	   payload type associated with the offer.

2818	   A receiver SHOULD understand all media type parameters, even if it
2819	   only supports a subset of the payload format's functionality.  This
2820	   ensures that a receiver is capable of understanding when an offer to
2821	   receive media can be downgraded to what is supported by the receiver
2822	   of the offer.

2824	   An answerer MAY extend the offer with additional media format
2825	   configurations.  However, to enable their usage, in most cases a
2826	   second offer is required from the offerer to provide the stream
2827	   property parameters that the media sender will use.  This also has
2828	   the effect that the offerer has to be able to receive this media
2829	   format configuration, not only to send it.

2831	   If an offerer wishes to have non-symmetric capabilities between
2832	   sending and receiving, the offerer should offer different RTP
2833	   sessions; i.e., different media lines declared as "recvonly" and
2834	   "sendonly", respectively.  This may have further implications on the
2835	   system.

2837	8.2.3. Usage in Declarative Session Descriptions

2839	   When H.264 over RTP is offered with SDP in a declarative style, as in
2840	   RTSP [27] or SAP [28], the following considerations are necessary.

2842	   o  All parameters capable of indicating both stream properties and
2843	      receiver capabilities are used to indicate only stream properties.
2844	      For example, in this case, the parameter "profile-level-id"
2845	      declares only the values used by the stream, not the capabilities
2846	      for receiving streams.  As a result, the following
2847	      interpretation of the parameters MUST be used:

2849	      Declaring actual configuration or stream properties:

2851	         - profile-level-id
2852	         - packetization-mode
2853	         - sprop-interleaving-depth
2854	         - sprop-deint-buf-req
2855	         - sprop-max-don-diff
2856	         - sprop-init-buf-time

2858	      Out-of-band transporting of parameter sets:

2860	         - sprop-parameter-sets
2861	         - sprop-level-parameter-sets

2863	      Not usable (when present, they SHOULD be ignored):

2865	         - max-mbps
2866	         - max-smbps
2867	         - max-fs
2868	         - max-cpb
2869	         - max-dpb
2870	         - max-br
2871	         - redundant-pic-cap
2872	         - max-rcmd-nalu-size
2873	         - deint-buf-cap
2874	         - sar-understood
2875	         - sar-supported
2876	         - use-level-src-parameter-sets

2878	   o  A receiver of the SDP is required to support all parameters and
2879	      values of the parameters provided; otherwise, the receiver MUST
2880	      reject (RTSP) or not participate in (SAP) the session.  It falls
2881	      on the creator of the session to use values that are expected to
2882	      be supported by the receiving application.
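
   The two considerations above can be sketched as follows.  This is a
   non-normative Python illustration; the set names, the function
   can_join, and the caller-supplied predicate "supports" are all
   hypothetical.

```python
# Classification of parameters in declarative session descriptions,
# as listed above. All names in this sketch are hypothetical.
STREAM_PROPERTIES = {
    "profile-level-id", "packetization-mode", "sprop-interleaving-depth",
    "sprop-deint-buf-req", "sprop-max-don-diff", "sprop-init-buf-time",
}
PARAMETER_SETS = {"sprop-parameter-sets", "sprop-level-parameter-sets"}
NOT_USABLE = {
    "max-mbps", "max-smbps", "max-fs", "max-cpb", "max-dpb", "max-br",
    "redundant-pic-cap", "max-rcmd-nalu-size", "deint-buf-cap",
    "sar-understood", "sar-supported", "use-level-src-parameter-sets",
}


def can_join(fmtp, supports):
    """Return True if the receiver may accept the described session.

    `supports(name, value)` is a caller-supplied predicate telling
    whether the receiver supports that parameter with that value.
    """
    for name, value in fmtp.items():
        if name in NOT_USABLE:
            continue  # not usable here; ignored when present
        if not supports(name, value):
            return False  # reject (RTSP) or do not participate (SAP)
    return True
```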

2884	8.3. Examples

2886	   An SDP Offer/Answer exchange wherein both parties are expected to
2887	   both send and receive could look like the following.  Only the media
2888	   codec specific parts of the SDP are shown.  Some lines are wrapped
2889	   due to text constraints.

2891	      Offerer -> Answerer SDP message:

2893	      m=video 49170 RTP/AVP 100 99 98
2894	      a=rtpmap:98 H264/90000
2895	      a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
2896	        sprop-parameter-sets=<parameter sets data#0>
2897	      a=rtpmap:99 H264/90000
2898	      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
2899	        sprop-parameter-sets=<parameter sets data#1>
2900	      a=rtpmap:100 H264/90000
2901	      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
2902	        sprop-parameter-sets=<parameter sets data#2>;
2903	        sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
2904	        sprop-init-buf-time=102478; deint-buf-cap=128000

2906	   The above offer presents the same codec configuration in three
2907	   different packetization formats.  PT 98 represents single NALU mode,
2908	   PT 99 represents non-interleaved mode, and PT 100 indicates the
2909	   interleaved mode.  In the interleaved mode case, the interleaving
2910	   parameters that the offerer would use if the answer indicates support
2911	   for PT 100 are also included.  In all three cases the parameter
2912	   "sprop-parameter-sets" conveys the initial parameter sets that are
2913	   required by the answerer when receiving a stream from the offerer
2914	   when this configuration is accepted.  Note that the value for "sprop-
2915	   parameter-sets" could be different for each payload type.

2917	      Answerer -> Offerer SDP message:

2919	      m=video 49170 RTP/AVP 100 99 97
2920	      a=rtpmap:97 H264/90000
2921	      a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
2922	        sprop-parameter-sets=<parameter sets data#3>
2923	      a=rtpmap:99 H264/90000
2924	      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
2925	        sprop-parameter-sets=<parameter sets data#4>;
2926	        max-rcmd-nalu-size=3980
2927	      a=rtpmap:100 H264/90000
2928	      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
2929	        sprop-parameter-sets=<parameter sets data#5>;
2930	        sprop-interleaving-depth=60;
2931	        sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
2932	        deint-buf-cap=128000; max-rcmd-nalu-size=3980

2934	   As the Offer/Answer negotiation covers both sending and receiving
2935	   streams, an offer indicates the exact parameters for what the offerer
2936	   is willing to receive, whereas the answer indicates the same for what
2937	   the answerer is willing to receive.  In this case the offerer declared
2938	   that it is willing to receive payload type 98.  The answerer accepts
2939	   this by declaring an equivalent payload type 97; i.e., it has
2940	   identical values for the two parameters "profile-level-id" and
2941	   "packetization-mode" (since "packetization-mode" is equal to 0,
2942	   "sprop-deint-buf-req" is not present).  As the offered payload type
2943	   98 is accepted, the answerer needs to store parameter sets included
2944	   in sprop-parameter-sets=<parameter sets data#0> in case the offerer
2945	   finally decides to use this configuration. In the answer, the
2946	   answerer includes the parameter sets in sprop-parameter-
2947	   sets=<parameter sets data#3> that the answerer would use in the
2948	   stream sent from the answerer if this configuration is finally used.

2950	   The answerer also accepts the reception of the two configurations
2951	   that payload types 99 and 100 represent.  Again, the answerer needs
2952	   to store parameter sets included in sprop-parameter-sets=<parameter
2953	   sets data#1> and sprop-parameter-sets=<parameter sets data#2> in case
2954	   the offerer finally decides to use either of these two configurations.
2955	   The answerer provides the initial parameter sets for the answerer-to-
2956	   offerer direction, i.e. the parameter sets in sprop-parameter-
2957	   sets=<parameter sets data#4> and sprop-parameter-sets=<parameter sets
2958	   data#5>, for payload types 99 and 100, respectively, that it will use
2959	   to send the payload types.  The answerer also provides the offerer
2960	   with its memory limit for de-interleaving operations by providing a
2961	   "deint-buf-cap" parameter.  This is only useful if the offerer
2962	   decides on making a second offer, where it can take the new value
2963	   into account.  The "max-rcmd-nalu-size" indicates that the answerer
2964	   can efficiently process NALUs up to the size of 3980 bytes.  However,
2965	   there is no guarantee that the network supports this size.
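
   The matching of payload types in the exchange above (98 to 97, 99 to
   99, 100 to 100) can be sketched in Python as follows.  This is an
   illustration under stated assumptions, not a complete implementation:
   the function names are hypothetical, the "fmtp" values are assumed to
   have been unwrapped onto one line, and the configuration values are
   compared verbatim, so level downgrading is not modeled.

```python
def parse_fmtp(value):
    """Parse 'name=value; name=value' into a dict (hypothetical helper)."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if item:
            name, _, val = item.partition("=")
            params[name.strip()] = val.strip()
    return params


def config_key(params):
    """Media format configuration: profile-level-id, packetization-mode.

    An absent packetization-mode is treated as 0 (single NAL unit mode).
    """
    return (params["profile-level-id"].upper(),
            params.get("packetization-mode", "0"))


def match_payload_types(offer, answer):
    """Map each offered payload type to the answered payload type that
    carries the same configuration, or None if there is none."""
    by_config = {config_key(p): pt for pt, p in answer.items()}
    return {pt: by_config.get(config_key(p)) for pt, p in offer.items()}


offer = {
    98: parse_fmtp("profile-level-id=42A01E; packetization-mode=0"),
    99: parse_fmtp("profile-level-id=42A01E; packetization-mode=1"),
    100: parse_fmtp("profile-level-id=42A01E; packetization-mode=2"),
}
answer = {
    97: parse_fmtp("profile-level-id=42A01E; packetization-mode=0"),
    99: parse_fmtp("profile-level-id=42A01E; packetization-mode=1"),
    100: parse_fmtp("profile-level-id=42A01E; packetization-mode=2"),
}
matches = match_payload_types(offer, answer)
```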

2967	   In the following example, the offer is accepted without level
2968	   downgrading (i.e. the default level, 3.0, is accepted), and both
2969	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
2970	   in the offer.  The answerer must ignore sprop-level-parameter-
2971	   sets=<parameter sets data#1> and store parameter sets in sprop-
2972	   parameter-sets=<parameter sets data#0> for decoding the incoming NAL
2973	   unit stream.  The offerer must store the parameter sets in sprop-
2974	   parameter-sets=<parameter sets data#2> in the answer for decoding the
2975	   incoming NAL unit stream.  Note that in this example, parameter sets
2976	   in sprop-parameter-sets=<parameter sets data#2> must be associated
2977	   with level 3.0.

2979	      Offer SDP:

2981	      m=video 49170 RTP/AVP 98
2982	      a=rtpmap:98 H264/90000
2983	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
2984	        packetization-mode=1;
2985	        sprop-parameter-sets=<parameter sets data#0>;
2986	        sprop-level-parameter-sets=<parameter sets data#1>

2988	      Answer SDP:

2990	      m=video 49170 RTP/AVP 98
2991	      a=rtpmap:98 H264/90000
2992	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
2993	        packetization-mode=1;
2994	        sprop-parameter-sets=<parameter sets data#2>

2996	   In the following example, the offer (Baseline profile, level 1.1) is
2997	   accepted with level downgrading (the accepted level is 1b), and both
2998	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
2999	   in the offer.  The answerer must ignore sprop-parameter-
3000	   sets=<parameter sets data#0> and all parameter sets not for the
3001	   accepted level (level 1b) in sprop-level-parameter-sets=<parameter
3002	   sets data#1>, and must store parameter sets for the accepted level
3003	   (level 1b) in sprop-level-parameter-sets=<parameter sets data#1> for
3004	   decoding the incoming NAL unit stream.  The offerer must store the
3005	   parameter sets in sprop-parameter-sets=<parameter sets data#2> in the
3006	   answer for decoding the incoming NAL unit stream.  Note that in this
3007	   example, parameter sets in sprop-parameter-sets=<parameter sets
3008	   data#2> must be associated with level 1b.

3010	      Offer SDP:

3012	      m=video 49170 RTP/AVP 98
3013	      a=rtpmap:98 H264/90000
3014	      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
3015	        packetization-mode=1;
3016	        sprop-parameter-sets=<parameter sets data#0>;
3017	        sprop-level-parameter-sets=<parameter sets data#1>

3019	      Answer SDP:

3021	      m=video 49170 RTP/AVP 98
3022	      a=rtpmap:98 H264/90000
3023	      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
3024	        packetization-mode=1;
3025	        sprop-parameter-sets=<parameter sets data#2>;
3026	        use-level-src-parameter-sets=1
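
   The "profile-level-id" values annotated in these examples can be
   decoded as sketched below.  The function name is hypothetical and the
   sketch is non-normative.  It only handles the Level 1b signaling used
   with the Baseline, Main, and Extended profiles (level_idc equal to 11
   with constraint_set3_flag set); other profiles are not modeled.

```python
def decode_profile_level_id(hex_value):
    """Split a 6-hex-digit profile-level-id into
    (profile_idc, profile-iop, level string). Hypothetical helper."""
    raw = bytes.fromhex(hex_value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be 3 bytes (6 hex digits)")
    profile_idc, profile_iop, level_idc = raw
    # constraint_set3_flag is the fourth most significant bit of profile-iop.
    constraint_set3 = bool(profile_iop & 0x10)
    if profile_idc in (66, 77, 88) and level_idc == 11 and constraint_set3:
        level = "1b"
    else:
        level = f"{level_idc // 10}.{level_idc % 10}"
    return profile_idc, profile_iop, level
```

   With this sketch, "42A00B" decodes to profile_idc 66 (Baseline) at
   Level 1.1, and "42B00B" decodes to the same profile at Level 1b.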

3028	   In the following example, the offer (Baseline profile, level 1.1) is
3029	   accepted with level downgrading (the accepted level is 1b), and both
3030	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
3031	   in the offer.  However, the answerer is a legacy RFC 3984
3032	   implementation and does not understand "sprop-level-parameter-sets",
3033	   hence it does not include "use-level-src-parameter-sets" (which the
3034	   answerer does not understand, either) in the answer.  Therefore, the
3035	   answerer must ignore both sprop-parameter-sets=<parameter sets
3036	   data#0> and sprop-level-parameter-sets=<parameter sets data#1>, and
3037	   the offerer must transport parameter sets in-band.

3039	      Offer SDP:

3041	      m=video 49170 RTP/AVP 98
3042	      a=rtpmap:98 H264/90000
3043	      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
3044	        packetization-mode=1;
3045	        sprop-parameter-sets=<parameter sets data#0>;
3046	        sprop-level-parameter-sets=<parameter sets data#1>

3048	      Answer SDP:

3050	      m=video 49170 RTP/AVP 98
3051	      a=rtpmap:98 H264/90000
3052	      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
3053	        packetization-mode=1

3055	   In the following example, the offer is accepted without level
3056	   downgrading, and "sprop-parameter-sets" is present in the offer.
3057	   Parameter sets in sprop-parameter-sets=<parameter sets data#0> must
3058	   be stored and used by the encoder of the offerer and the decoder
3059	   of the answerer, and parameter sets in sprop-parameter-
3060	   sets=<parameter sets data#1> must be used by the encoder of the
3061	   answerer and the decoder of the offerer.  Note that sprop-parameter-
3062	   sets=<parameter sets data#0> is basically independent of sprop-
3063	   parameter-sets=<parameter sets data#1>.

3065	      Offer SDP:

3067	      m=video 49170 RTP/AVP 98
3068	      a=rtpmap:98 H264/90000
3069	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3070	        packetization-mode=1;
3071	        sprop-parameter-sets=<parameter sets data#0>

3073	      Answer SDP:

3075	      m=video 49170 RTP/AVP 98
3076	      a=rtpmap:98 H264/90000
3077	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3078	        packetization-mode=1;
3079	        sprop-parameter-sets=<parameter sets data#1>

3081	   In the following example, the offer is accepted without level
3082	   downgrading, and neither "sprop-parameter-sets" nor "sprop-level-
3083	   parameter-sets" is present in the offer, meaning that there is no
3084	   out-of-band transmission of parameter sets, which then have to be
3085	   transported in-band.

3087	      Offer SDP:

3089	      m=video 49170 RTP/AVP 98
3090	      a=rtpmap:98 H264/90000
3091	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3092	        packetization-mode=1

3094	      Answer SDP:

3096	      m=video 49170 RTP/AVP 98
3097	      a=rtpmap:98 H264/90000
3098	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3099	        packetization-mode=1

3101	   In the following example, the offer is accepted with level
3102	   downgrading and "sprop-parameter-sets" is present in the offer.  As
3103	   sprop-parameter-sets=<parameter sets data#0> contains level_idc
3104	   indicating Level 3.0, it cannot be used, because the answerer wants
3105	   Level 2.0.  It must therefore be ignored by the answerer, and in-band
3106	   parameter sets must be used.

3108	      Offer SDP:

3110	      m=video 49170 RTP/AVP 98
3111	      a=rtpmap:98 H264/90000
3112	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3113	        packetization-mode=1;
3114	        sprop-parameter-sets=<parameter sets data#0>

3116	      Answer SDP:

3118	      m=video 49170 RTP/AVP 98
3119	      a=rtpmap:98 H264/90000
3120	      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
3121	        packetization-mode=1

3123	   In the following example, the offer is also accepted with level
3124	   downgrading, and neither "sprop-parameter-sets" nor "sprop-level-
3125	   parameter-sets" is present in the offer, meaning that there is no
3126	   out-of-band transmission of parameter sets, which then have to be
3127	   transported in-band.

3129	      Offer SDP:

3131	      m=video 49170 RTP/AVP 98
3132	      a=rtpmap:98 H264/90000
3133	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3134	        packetization-mode=1

3136	      Answer SDP:

3138	      m=video 49170 RTP/AVP 98
3139	      a=rtpmap:98 H264/90000
3140	      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
3141	        packetization-mode=1
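
   The unicast examples above follow a common pattern for where the
   answerer obtains the initial parameter sets for the offerer-to-
   answerer direction.  The following Python sketch summarizes that
   pattern only; the function and argument names are hypothetical, and
   the normative rules are those given in the preceding sections.

```python
def parameter_set_source(offer, level_downgraded, understands_level_sets):
    """Return which offered parameter the answerer takes its initial
    parameter sets from, or "in-band" if none can be used.

    offer: dict of the offered fmtp parameters.
    level_downgraded: True if the accepted level is lower than offered.
    understands_level_sets: False for a legacy RFC 3984 answerer.
    """
    has_sets = "sprop-parameter-sets" in offer
    has_level_sets = "sprop-level-parameter-sets" in offer
    if not level_downgraded:
        # sprop-level-parameter-sets, if present, is ignored.
        return "sprop-parameter-sets" if has_sets else "in-band"
    if has_level_sets and understands_level_sets:
        # Only the sets for the accepted level are stored; the answer
        # then carries use-level-src-parameter-sets=1.
        return "sprop-level-parameter-sets"
    # sprop-parameter-sets describes the offered (higher) level.
    return "in-band"
```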

3143	   In the following example, the offerer is a Multipoint Control Unit
3144	   (MCU) in a Topo-Video-switch-MCU like topology [29], offering
3145	   parameter sets received (using out-of-band transport) from three
3146	   other participants B, C, and D, and receiving parameter sets from the
3147	   participant A, which is the answerer.  The participants are
3148	   identified by their values of CNAME, which are mapped to different
3149	   SSRC values.  The same codec configuration is used by all the four
3150	   participants.  The participant A stores and associates the parameter
3151	   sets included in <parameter sets data#B>, <parameter sets data#C>,
3152	   and <parameter sets data#D> to participants B, C, and D,
3153	   respectively, and uses <parameter sets data#B> for decoding NAL units
3154	   carried in RTP packets originated from participant B only, uses
3155	   <parameter sets data#C> for decoding NAL units carried in RTP packets
3156	   originated from participant C only, and uses <parameter sets data#D>
3157	   for decoding NAL units carried in RTP packets originated from
3158	   participant D only.

3160	      Offer SDP:

3162	      m=video 49170 RTP/AVP 98
3163	      a=ssrc:SSRC-B cname:CNAME-B
3164	      a=ssrc:SSRC-C cname:CNAME-C
3165	      a=ssrc:SSRC-D cname:CNAME-D
3166	      a=ssrc:SSRC-B fmtp:98
3167	        sprop-parameter-sets=<parameter sets data#B>
3168	      a=ssrc:SSRC-C fmtp:98
3169	        sprop-parameter-sets=<parameter sets data#C>
3170	      a=ssrc:SSRC-D fmtp:98
3171	        sprop-parameter-sets=<parameter sets data#D>
3172	      a=rtpmap:98 H264/90000
3173	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3174	        packetization-mode=1

3176	      Answer SDP:

3178	      m=video 49170 RTP/AVP 98
3179	      a=ssrc:SSRC-A cname:CNAME-A
3180	      a=ssrc:SSRC-A fmtp:98
3181	        sprop-parameter-sets=<parameter sets data#A>
3182	      a=rtpmap:98 H264/90000
3183	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3184	        packetization-mode=1
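
   The per-source association performed by participant A in this
   example can be sketched as a store keyed by SSRC.  The class and
   method names below are hypothetical, and the sketch omits SSRC
   collision handling.

```python
class ParameterSetStore:
    """Keeps out-of-band parameter sets separate for each source."""

    def __init__(self):
        self._by_ssrc = {}

    def store(self, ssrc, parameter_sets):
        # Parameter sets are associated with the originating source only.
        self._by_ssrc[ssrc] = parameter_sets

    def for_packet(self, ssrc):
        # Only the sets of the packet's own source may be used for
        # decoding; None means no out-of-band sets are known for it.
        return self._by_ssrc.get(ssrc)


store = ParameterSetStore()
store.store("SSRC-B", "<parameter sets data#B>")
store.store("SSRC-C", "<parameter sets data#C>")
store.store("SSRC-D", "<parameter sets data#D>")
```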

3186	8.4. Parameter Set Considerations

3188	   The H.264 parameter sets are a fundamental part of the video codec
3189	   and vital to its operation; see section 1.2.  Due to their
3190	   characteristics and their importance for the decoding process, lost
3191	   or erroneously transmitted parameter sets can hardly be concealed
3192	   locally at the receiver.  A reference to a corrupt parameter set
3193	   normally has fatal results for the decoding process.  Corruption could
3194	   occur, for example, due to the erroneous transmission or loss of a
3195	   parameter set NAL unit, but also due to the untimely transmission of
3196	   a parameter set update.  A parameter set update refers to a change of
3197	   at least one parameter in a picture parameter set or sequence
3198	   parameter set for which the picture parameter set or sequence
3199	   parameter set identifier remains unchanged.  Therefore, the following
3200	   recommendations are provided as a guideline for the implementer of
3201	   the RTP sender.

3203	   Parameter set NALUs can be transported using three different
3204	   principles:

3206	   A. Using a session control protocol (out-of-band) prior to the actual
3207	     RTP session.

3209	   B. Using a session control protocol (out-of-band) during an ongoing
3210	     RTP session.

3212	   C. Within the RTP packet stream in the payload (in-band) during an
3213	     ongoing RTP session.

3215	   It is recommended to implement principles A and B within a session
3216	   control protocol.  SIP and SDP can be used as described in the SDP
3217	   Offer/Answer model and in the previous sections of this memo.  This
3218	   section contains guidelines on how principles A and B should be
3219	   implemented within session control protocols.  It is independent of
3220	   the particular protocol used.  Principle C is supported by the RTP
3221	   payload format defined in this specification.  There are topologies
3222	   like Topo-Video-switch-MCU [29] for which the use of principle C may
3223	   be desirable.

3225	   If in-band signaling of parameter sets is used, the picture and
3226	   sequence parameter set NALUs SHOULD be transmitted in the RTP payload
3227	   using a reliable method of delivering RTP (see below), as a loss
3228	   of a parameter set of either type will likely prevent decoding of a
3229	   considerable portion of the corresponding RTP packet stream.

3231	   If in-band signaling of parameter sets is used, the sender SHOULD
3232	   take the error characteristics into account and use mechanisms to
3233	   provide a high probability for delivering the parameter sets
3234	   correctly.  Mechanisms that increase the probability for a correct
3235	   reception include packet repetition, FEC, and retransmission.  The
3236	   use of an unreliable, out-of-band control protocol has disadvantages
3237	   similar to those of in-band signaling (possible loss) and, in
3238	   addition, may also lead to difficulties in the synchronization (see
3239	   below).  Therefore, it is NOT RECOMMENDED.

3241	   Parameter sets MAY be added or updated during the lifetime of a
3242	   session using principles B and C.  It is required that parameter sets
3243	   are present at the decoder prior to the NAL units that refer to them.
3244	   Updating or adding of parameter sets can result in further problems,
3245	   and therefore the following recommendations should be considered.

3247	   - When parameter sets are added or updated, care SHOULD be taken to
3248	     ensure that any parameter set is delivered prior to its usage.
3249	     When new parameter sets are added, previously unused parameter set
3250	     identifiers are used.  It is common that no synchronization is
3251	     present between out-of-band signaling and in-band traffic.  If
3252	     out-of-band signaling is used, it is RECOMMENDED that a sender
3253	     does not start sending NALUs requiring the added or updated
3254	     parameter sets prior to acknowledgement of delivery from the
3255	     signaling protocol.

3257	   - When parameter sets are updated, the following synchronization
3258	     issue should be taken into account.  When overwriting a parameter
3259	     set at the receiver, the sender has to ensure that the parameter
3260	     set in question is not needed by any NALU present in the network
3261	     or receiver buffers.  Otherwise, decoding with a wrong parameter
3262	     set may occur.  To lessen this problem, it is RECOMMENDED either
3263	     to overwrite only those parameter sets that have not been used for
3264	     a sufficiently long time (to ensure that all related NALUs have
3265	     been consumed), or to add a new parameter set instead (which may
3266	     have negative consequences for the efficiency of the video
3267	     coding).

3269	         Informative note: In some topologies like Topo-Video-switch-
3270	         MCU [29] the origin of the whole set of parameter sets may
3271	         come from multiple sources that may use non-unique parameter
3272	         sets identifiers.  In this case an offer may overwrite an
3273	         existing parameter set if no other mechanism that enables
3274	         uniqueness of the parameter sets in the out-of-band channel
3275	         exists.

3277	   - In a multiparty session, one participant MUST associate parameter
3278	     sets coming from different sources with the source identification
3279	     whenever possible, e.g. by conveying out-of-band transported
3280	     parameter sets, as different sources typically use independent
3281	     parameter set identifier value spaces.

3283	   - Adding or modifying parameter sets by using both principles B and
3284	     C in the same RTP session may lead to inconsistencies of the
3285	     parameter sets because of the lack of synchronization between the
3286	     control and the RTP channel.  Therefore, principles B and C MUST
3287	     NOT both be used in the same session unless sufficient
3288	     synchronization can be provided.
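
   The first recommendation above (not sending NALUs that require added
   or updated parameter sets before the signaling protocol has
   acknowledged delivery) can be sketched as a simple gate on the
   sender side.  The class name, method names, and the representation
   of a parameter set as a (kind, id) tuple are hypothetical.

```python
class OutOfBandGate:
    """Tracks parameter sets sent out-of-band but not yet acknowledged."""

    def __init__(self):
        self._pending = set()

    def announce(self, kind, ps_id):
        # Called when an added or updated parameter set is signaled.
        self._pending.add((kind, ps_id))

    def acknowledge(self, kind, ps_id):
        # Called when the signaling protocol confirms delivery.
        self._pending.discard((kind, ps_id))

    def may_send(self, referenced):
        # A NALU may be sent only if none of the parameter sets it
        # refers to is still awaiting acknowledgement.
        return self._pending.isdisjoint(referenced)


gate = OutOfBandGate()
gate.announce("pps", 3)
blocked = not gate.may_send({("sps", 0), ("pps", 3)})
gate.acknowledge("pps", 3)
released = gate.may_send({("sps", 0), ("pps", 3)})
```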

3290	   In some scenarios (e.g., when only the subset of this payload format
3291	   specification corresponding to H.241 is used) or topologies, it is
3292	   not possible to employ out-of-band parameter set transmission.  In
3293	   this case, parameter sets have to be transmitted in-band.  Here, the
3294	   synchronization with the non-parameter-set-data in the bitstream is
3295	   implicit, but the possibility of a loss has to be taken into account.
3296	   The loss probability should be reduced using the mechanisms discussed
3297	   above.  In case a loss of a parameter set is detected, recovery may
3298	   be achieved by using a Decoder Refresh Point procedure, for example,
3299	   using RTCP feedback Full Intra Request (FIR) [30].  Two example
3300	   Decoder Refresh Point procedures are provided in the informative
3301	   Section 8.5.

3303	   - When parameter sets are initially provided using principle A and
3304	     then later added or updated in-band (principle C), there is a risk
3305	     associated with updating the parameter sets delivered out-of-band.
3306	     If receivers miss some in-band updates (for example, because of a
3307	     loss or a late tune-in), those receivers attempt to decode the
3308	     bitstream using outdated parameters.  It is therefore RECOMMENDED
3309	     that parameter set IDs be partitioned between the out-of-band and
3310	     in-band parameter sets.
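
   One possible way to partition parameter set IDs is sketched below.
   The split point is an arbitrary choice made for illustration, and
   the function name is hypothetical.  The ID ranges are those of the
   H.264 syntax elements seq_parameter_set_id (0 to 31) and
   pic_parameter_set_id (0 to 255).

```python
SPS_IDS = range(32)    # seq_parameter_set_id values, 0..31
PPS_IDS = range(256)   # pic_parameter_set_id values, 0..255


def partition_ids(ids, out_of_band_count):
    """Reserve the first `out_of_band_count` IDs for parameter sets
    delivered out-of-band and the remainder for in-band updates."""
    ids = list(ids)
    if not 0 <= out_of_band_count <= len(ids):
        raise ValueError("out_of_band_count is outside the ID range")
    return ids[:out_of_band_count], ids[out_of_band_count:]


sps_out_of_band, sps_in_band = partition_ids(SPS_IDS, 8)
pps_out_of_band, pps_in_band = partition_ids(PPS_IDS, 64)
```

   With such a split, an in-band update never overwrites a parameter
   set that was delivered out-of-band.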

3312	8.5. Decoder Refresh Point Procedure using In-Band Transport of
3313	   Parameter Sets (Informative)

3315	   When a sender with a video encoder according to [1] receives a
3316	   request for a decoder refresh point, the encoder shall enter the fast
3317	   update mode by using one of the procedures specified in Section 8.5.1
3318	   or 8.5.2 below.  The procedure in 8.5.1 is the preferred response in
3319	   a lossless transmission environment.  Both procedures satisfy the
3320	   requirement to enter the fast update mode for H.264 video encoding.

3322	8.5.1. IDR Procedure to Respond to a Request for a Decoder Refresh Point

3324	   This section gives one possible way to respond to a request for a
3325	   decoder refresh point.

3327	   The encoder shall, in the order presented here:

3329	   1) Immediately prepare to send an IDR picture.

3331	   2) Send a sequence parameter set to be used by the IDR picture to be
3332	     sent. The encoder may optionally also send other sequence
3333	     parameter sets.

3335	   3) Send a picture parameter set to be used by the IDR picture to be
3336	     sent. The encoder may optionally also send other picture parameter
3337	     sets.

3339	   4) Send the IDR picture.

3341	   5) From this point forward in time, send any other sequence or
3342	     picture parameter sets that have not yet been sent in this
3343	     procedure, prior to their reference by any NAL unit, regardless of
3344	     whether such parameter sets were previously sent prior to
3345	     receiving the request for a decoder refresh point.  As needed,
3346	     such parameter sets may be sent in a batch, one at a time, or in
3347	     any combination of these two methods.  Parameter sets may be re-
3348	     sent at any time for redundancy.  Caution should be taken when
3349	     parameter set updates are present, as described above in Section
3350	     8.4.
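
   Steps 2 to 4 of this procedure can be sketched as follows.  The
   function name and the representation of NAL units as (type, payload)
   tuples are hypothetical; the NAL unit type values (5 for a coded
   slice of an IDR picture, 7 for a sequence parameter set, 8 for a
   picture parameter set) are those defined in [1].  Step 5 is not
   modeled.

```python
NAL_IDR_SLICE = 5
NAL_SPS = 7
NAL_PPS = 8


def idr_refresh_sequence(sps, pps, idr_slices, extra_sps=(), extra_pps=()):
    """Return NAL units in the sending order of steps 2 to 4."""
    out = [(NAL_SPS, sps)]                       # step 2
    out += [(NAL_SPS, s) for s in extra_sps]     # optional other SPSs
    out.append((NAL_PPS, pps))                   # step 3
    out += [(NAL_PPS, p) for p in extra_pps]     # optional other PPSs
    out += [(NAL_IDR_SLICE, s) for s in idr_slices]  # step 4
    return out


sequence = idr_refresh_sequence(b"sps0", b"pps0", [b"slice0", b"slice1"])
```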

3352	8.5.2. Gradual Recovery Procedure to Respond to a Request for a Decoder
3353	   Refresh Point

3355	   This section gives another possible way to respond to a request for a
3356	   decoder refresh point.

3358	   The encoder shall, in the order presented here:

3360	   1) Send a recovery point SEI message (see Sections D.1.7 and D.2.7 of
3361	     [1]).

3363	   2) Repeat any sequence and picture parameter sets that were sent
3364	     before the recovery point SEI message, prior to their reference by
3365	     a NAL unit.

3367	   The encoder shall ensure that the decoder has access to all reference
3368	   pictures for inter prediction of pictures at or after the recovery
3369	   point, which is indicated by the recovery point SEI message, in
3370	   output order, assuming that the transmission from now on is error-
3371	   free.

3373	   The value of the recovery_frame_cnt syntax element in the recovery
3374	   point SEI message should be small enough to ensure a fast recovery.

3376	   As needed, such parameter sets may be re-sent in a batch, one at a
3377	   time, or in any combination of these two methods.  Parameter sets may
3378	   be re-sent at any time for redundancy.  Caution should be taken when
3379	   parameter set updates are present, as described above in Section 8.4.

3381	9. Security Considerations

3383	   RTP packets using the payload format defined in this specification
3384	   are subject to the security considerations discussed in the RTP
3385	   specification [5], and in any appropriate RTP profile (for example,
3386	   [16]).  This implies that confidentiality of the media streams is
3387	   achieved by encryption; for example, through the application of SRTP
3388	   [26].  Because the data compression used with this payload format is
3389	   applied end-to-end, any encryption needs to be performed after
3390	   compression.  A potential denial-of-service threat exists for data
3391	   encodings using compression techniques that have non-uniform
3392	   receiver-end computational load.  The attacker can inject
3393	   pathological datagrams into the stream that are complex to decode and
3394	   that cause the receiver to be overloaded.  H.264 is particularly
3395	   vulnerable to such attacks, as it is extremely simple to generate
3396	   datagrams containing NAL units that affect the decoding process of
3397	   many future NAL units.  Therefore, the usage of data origin
3398	   authentication and data integrity protection of at least the RTP
3399	   packet is RECOMMENDED; for example, with SRTP [26].

3401	   Note that the appropriate mechanism to ensure confidentiality and
3402	   integrity of RTP packets and their payloads is very dependent on the
3403	   application and on the transport and signaling protocols employed.
3404	   Thus, although SRTP is given as an example above, other possible
3405	   choices exist.

3407	   Decoders MUST exercise caution with respect to the handling of user
3408	   data SEI messages, particularly if they contain active elements, and
3409	   MUST restrict their domain of applicability to the presentation
3410	   containing the stream.

3412	   End-to-end security with authentication, integrity, or
3413	   confidentiality protection will prevent a MANE from performing media-
3414	   aware operations other than discarding complete packets.  With
3415	   confidentiality protection, the MANE is additionally prevented from
3416	   discarding packets in a media-aware way.  For a MANE to perform its
3417	   operations, it must be a trusted entity that is included in the
3418	   security context establishment.

3420	10. Congestion Control

3422	   Congestion control for RTP SHALL be used in accordance with RFC 3550
3423	   [5], and with any applicable RTP profile; e.g., RFC 3551 [16].  An
3424	   additional requirement if best-effort service is being used is: users
3425	   of this payload format MUST monitor packet loss to ensure that the
3426	   packet loss rate is within acceptable parameters.  Packet loss is
3427	   considered acceptable if a TCP flow across the same network path, and
3428	   experiencing the same network conditions, would achieve an average
3429	   throughput, measured on a reasonable timescale, that is not less than
3430	   the RTP flow is achieving.  This condition can be satisfied by
3431	   implementing congestion control mechanisms to adapt the transmission
3432	   rate (or the number of layers subscribed for a layered multicast
3433	   session), or by arranging for a receiver to leave the session if the
3434	   loss rate is unacceptably high.
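
   As a rough illustration of the acceptability test above, the sketch
   below compares the rate of an RTP flow with an estimate of what a TCP
   flow might achieve at the same loss rate and round-trip time.  The
   simplified square-root throughput model and all function and
   parameter names are assumptions made for this example; this
   specification does not prescribe a particular TCP throughput model.

```python
import math


def tcp_throughput_estimate(mss_bytes, rtt_s, loss_rate):
    """Rough steady-state TCP throughput in bytes per second.

    Assumption: simplified square-root model,
    rate = (MSS / RTT) * sqrt(3 / (2 * p)).  It ignores timeouts and
    receiver window limits, so it is only a coarse estimate.
    """
    if loss_rate <= 0:
        return math.inf
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)


def loss_is_acceptable(rtp_rate_bytes_s, mss_bytes, rtt_s, loss_rate):
    """True if the estimated TCP rate is not less than the RTP rate."""
    estimate = tcp_throughput_estimate(mss_bytes, rtt_s, loss_rate)
    return estimate >= rtp_rate_bytes_s


# Hypothetical path: 1460-byte segments, 100 ms RTT, 1% packet loss.
print(loss_is_acceptable(100_000, 1460, 0.1, 0.01))
print(loss_is_acceptable(200_000, 1460, 0.1, 0.01))
```

   A sender that fails this check would reduce its transmission rate,
   or a receiver would leave the session, as described above.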

3436	   The bit rate adaptation necessary for obeying the congestion control
3437	   principle is easily achievable when real-time encoding is used.
3438	   However, when pre-encoded content is being transmitted, bandwidth
3439	   adaptation requires the availability of more than one coded
3440	   representation of the same content, at different bit rates, or the
3441	   existence of non-reference pictures or sub-sequences [22] in the
3442	   bitstream.  The switching between the different representations can
3443	   normally be performed in the same RTP session; e.g., by employing a
3444	   concept known as SI/SP slices of the Extended Profile, or by
3445	   switching streams at IDR picture boundaries.  Only when non-
3446	   downgradable parameters (such as the profile part of the
3447	   profile/level ID) are required to be changed does it become necessary
3448	   to terminate and re-start the media stream.  This may be accomplished
3449	   by using a different RTP payload type.

3451	   MANEs MAY follow the suggestions outlined in section 7.3 and remove
3452	   certain unusable packets from the packet stream when that stream was
3453	   damaged due to previous packet losses.  This can help reduce the
3454	   network load in certain special cases.

3456	11. IANA Considerations

3458	   IANA has registered one new media type; see section 8.1.

3460	12. Informative Appendix: Application Examples

3462	   This payload specification is very flexible in its use, in order to
3463	   cover the extremely wide application space anticipated for H.264.
3464	   However, this great flexibility also makes it difficult for an
3465	   implementer to decide on a reasonable packetization scheme.  Some
3466	   information on how to apply this specification to real-world
3467	   scenarios is likely to appear in the form of academic publications
3468	   and test model software with its description in the near future.
3469	   However, some preliminary usage scenarios are described here as well.

3471	12.1. Video Telephony according to ITU-T Recommendation H.241 Annex A

3473	   H.323-based video telephony systems that use H.264 as an optional
3474	   video compression scheme are required to support H.241 Annex A [3] as
3475	   a packetization scheme.  The packetization mechanism defined in this
3476	   Annex is technically identical with a small subset of this
3477	   specification.

3479	   When a system operates according to H.241 Annex A, parameter set NAL
3480	   units are sent in-band.  Only Single NAL unit packets are used.  Many
3481	   such systems are not sending IDR pictures regularly, but only when
3482	   required by user interaction or by control protocol means; e.g., when
3483	   switching between video channels in a Multipoint Control Unit or for
3484	   error recovery requested by feedback.

3486	12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
3487	   Aggregation

3489	   The RTP part of this scheme is implemented and tested (though not the
3490	   control-protocol part; see below).

3492	   In most real-world video telephony applications, picture parameters
3493	   such as picture size or optional modes never change during the
3494	   lifetime of a connection.  Therefore, all necessary parameter sets
3495	   (usually only one) are sent as a side effect of the capability
3496	   exchange/announcement process, e.g., according to the SDP syntax
3497	   specified in section 8.2 of this document.  As all necessary
3498	   parameter set information is established before the RTP session
3499	   starts, there is no need for sending any parameter set NAL units.
3500	   Slice data partitioning is not used, either.  Thus, the RTP packet
3501	   stream basically consists of NAL units that carry single coded
3502	   slices.

3504	   The encoder chooses the size of coded slice NAL units so that they
3505	   offer the best performance.  Often, this is done by adapting the
3506	   coded slice size to the MTU size of the IP network.  For small
3507	   picture sizes, this may result in a one-picture-per-one-packet
3508	   strategy.  Intra refresh algorithms clean up the loss of packets and
3509	   the resulting drift-related artifacts.

3511	12.3. Video Telephony, Interleaved Packetization Using NAL Unit
3512	   Aggregation

3514	   This scheme allows better error concealment and is used in
3515	   H.263-based designs using RFC 2429 packetization [11].  It has been
3516	   implemented, and good results were reported [13].

3518	   The VCL encoder codes the source picture so that all macroblocks
3519	   (MBs) of one MB line are assigned to one slice.  All slices with even
3520	   MB row addresses are combined into one STAP, and all slices with odd
3521	   MB row addresses into another.  Those STAPs are transmitted as RTP
3522	   packets.  The establishment of the parameter sets is performed as
3523	   discussed above.
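
   The even/odd grouping described above can be sketched as follows.
   The function name and the representation of a slice by its MB row
   address are assumptions made for this illustration; building the
   actual STAP payload is not shown.

```python
CIF_MB_ROWS = 288 // 16  # 18 macroblock rows in a 352x288 CIF picture


def group_rows_into_staps(num_mb_rows):
    """Return (even_rows, odd_rows), with one slice per MB row.

    Each list holds the MB row addresses whose slices would be
    aggregated into one STAP (hypothetical helper, not a real API).
    """
    even_rows = [row for row in range(num_mb_rows) if row % 2 == 0]
    odd_rows = [row for row in range(num_mb_rows) if row % 2 == 1]
    return even_rows, odd_rows


even_stap, odd_stap = group_rows_into_staps(CIF_MB_ROWS)
print(even_stap)  # MB rows carried in the first STAP
print(odd_stap)   # MB rows carried in the second STAP
```

   If one of the two STAPs is lost, every missing MB row still has at
   least one received neighboring row, which helps error concealment.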

3525	   Note that the use of STAPs is essential here, as the high number of
3526	   individual slices (18 for a CIF picture) would lead to unacceptably
3527	   high IP/UDP/RTP header overhead (unless the source coding tool FMO is
3528	   used, which is not assumed in this scenario).  Furthermore, some
3529	   wireless video transmission systems, such as H.324M and the IP-based
3530	   video telephony specified in 3GPP, are likely to use relatively small
3531	   transport packet sizes.  For example, a typical MTU size of H.223 AL3
3532	   SDU is around 100 bytes [17].  Coding individual slices according to
3533	   this packetization scheme provides further advantage in communication
3534	   between wired and wireless networks, as individual slices are likely
3535	   to be smaller than the preferred maximum packet size of wireless
3536	   systems.  Consequently, a gateway can convert the STAPs used in a
3537	   wired network into several RTP packets with only one NAL unit, which
3538	   are preferred in a wireless network, and vice versa.
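
   The header overhead argument can be made concrete with a small
   calculation.  The sketch assumes IPv4 without options, RTP without
   CSRC entries, and STAP-A aggregation (one STAP NAL unit header byte
   per packet and a two-byte NAL unit size field per aggregated NAL
   unit); the function names are hypothetical.

```python
# Assumption: IPv4 without options (20) + UDP (8) + RTP without CSRC (12).
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def overhead_single_nal_packets(num_slices):
    """Header bytes per picture when every slice has its own packet."""
    return num_slices * IP_UDP_RTP_HEADER_BYTES


def overhead_stap_a(num_slices, num_staps):
    """Header bytes per picture when slices are aggregated into STAP-As.

    Counts one STAP-A NAL header byte per packet and a 2-byte NAL unit
    size field per aggregated slice.
    """
    per_packet = IP_UDP_RTP_HEADER_BYTES + 1
    return num_staps * per_packet + num_slices * 2


print(overhead_single_nal_packets(18))  # 18 slices of a CIF picture
print(overhead_stap_a(18, 2))           # same slices in two STAPs
```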

3540	12.4. Video Telephony with Data Partitioning

3542	   This scheme has been implemented and has been shown to offer good
3543	   performance, especially at higher packet loss rates [13].

3545	   Data Partitioning is known to be useful only when some form of
3546	   unequal error protection is available.  Normally, in single-session
3547	   RTP environments, even error characteristics are assumed; i.e., the
3548	   packet loss probability of all packets of the session is the same
3549	   statistically.  However, there are means to reduce the packet loss
3550	   probability of individual packets in an RTP session.  An FEC packet
3551	   according to RFC 5109 [18], for example, specifies which media
3552	   packets are associated with the FEC packet.

3554	   In all cases, the incurred overhead is substantial but is in the same
3555	   order of magnitude as the number of bits that would otherwise be
3556	   spent for intra information.  However, this mechanism does not add
3557	   any delay to the system.

3559	   Again, the complete parameter set establishment is performed through
3560	   control protocol means.

3562	12.5. Video Telephony or Streaming with FUs and Forward Error Correction

3564	   This scheme has been implemented and has been shown to provide good
3565	   performance, especially at higher packet loss rates [19].

3567	   The most efficient means to combat packet losses for scenarios where
3568	   retransmissions are not applicable is forward error correction (FEC).
3569	   Although application layer, end-to-end use of FEC is often less
3570	   efficient than an FEC-based protection of individual links
3571	   (especially when links of different characteristics are in the
3572	   transmission path), application layer, end-to-end FEC is unavoidable
3573	   in some scenarios.  RFC 5109 [18] provides means to use generic,
3574	   application layer, end-to-end FEC in packet-loss environments.  A
3575	   binary forward error correcting code is generated by applying the XOR
3576	   operation to the bits at the same bit position in different packets.
3577	   The binary code can be specified by the parameters (n,k) in which k
3578	   is the number of information packets used in the connection and n is
3579	   the total number of packets generated for k information packets;
3580	   i.e., n-k parity packets are generated for k information packets.
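
   The XOR operation described above can be illustrated with a single
   parity packet, i.e., a code with n = k + 1.  This is a minimal
   sketch that assumes equal-length packets and models only the XOR
   itself; the FEC header and the length and header protection defined
   by RFC 5109 are not represented, and the function names are
   hypothetical.

```python
def xor_parity(packets):
    """XOR equal-length packets at each byte (and thus bit) position."""
    length = len(packets[0])
    if any(len(packet) != length for packet in packets):
        raise ValueError("this sketch assumes equal-length packets")
    parity = bytearray(length)
    for packet in packets:
        for index, value in enumerate(packet):
            parity[index] ^= value
    return bytes(parity)


def recover_missing(received, parity):
    """Rebuild the one missing packet from the k-1 received packets
    and the parity packet."""
    return xor_parity(list(received) + [parity])


media = [b"abcd", b"efgh", b"ijkl"]   # k = 3 information packets
parity = xor_parity(media)            # n - k = 1 parity packet
print(recover_missing([media[0], media[2]], parity) == media[1])
```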

3582	   When a code is used with parameters (n,k) within the RFC 5109
3583	   framework, the following properties are well known:

3585	   a) If applied over one RTP packet, RFC 5109 provides only packet
3586	     repetition.

3588	   b) RFC 5109 is most bit rate efficient if XOR-connected packets have
3589	     equal length.

3591	   c) At the same packet loss probability p and for a fixed k, the
3592	     greater the value of n is, the smaller the residual error
3593	     probability becomes.  For example, for a packet loss probability
3594	     of 10%, k=1, and n=2, the residual error probability is about 1%,
3595	     whereas for n=3, the residual error probability is about 0.1%.

3597	   d) At the same packet loss probability p and for a fixed code rate
3598	     k/n, the greater the value of n is, the smaller the residual error
3599	     probability becomes.  For example, at a packet loss probability of
3600	     p=10%, k=1 and n=2, the residual error rate is about 1%, whereas
3601	     for an extended Golay code with k=12 and n=24, the residual error
3602	     rate is about 0.01%.
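
   The figures for k=1 in items c) and d) can be checked with a short
   calculation.  The sketch assumes statistically independent packet
   losses, which is an assumption of this example rather than a
   statement of the text above; the single-parity function is added
   only for comparison, and both function names are hypothetical.

```python
def residual_loss_repetition(p, n):
    """k = 1: the packet is lost only if all n copies are lost."""
    return p ** n


def residual_loss_single_parity(p, k):
    """(k+1, k) parity code: an information packet is unrecoverable if
    it is lost and at least one of the other k packets is also lost."""
    return p * (1.0 - (1.0 - p) ** k)


print(residual_loss_repetition(0.10, 2))     # roughly 1%
print(residual_loss_repetition(0.10, 3))     # roughly 0.1%
print(residual_loss_single_parity(0.10, 1))  # same as n=2, k=1
```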

3604	   For applying RFC 5109 in combination with H.264 baseline coded video
3605	   without using FUs, several options might be considered:

3607	   1) The video encoder produces NAL units for which each video frame is
3608	     coded in a single slice.  Applying FEC, one could use a simple
3609	     code; e.g., (n=2, k=1).  That is, each NAL unit would basically
3610	     just be repeated.  The disadvantage is obviously the bad code
3611	     performance according to d), above, and the low flexibility, as
3612	     only (n, k=1) codes can be used.

3614	   2) The video encoder produces NAL units for which each video frame is
3615	     encoded in one or more consecutive slices.  Applying FEC, one
3616	     could use a better code, e.g., (n=24, k=12), over a sequence of
3617	     NAL units.  Depending on the number of RTP packets per frame, a
3618	     loss may introduce a significant delay, which is reduced when more
3619	     RTP packets are used per frame.  Packets of completely different
3620	     length might also be connected, which decreases bit rate
3621	     efficiency according to b), above.  However, with some care and
3622	     for slices of 1kb or larger, similar length (100-200 bytes
3623	     difference) may be produced, which will not lower the bit
3624	     efficiency catastrophically.

3626	   3) The video encoder produces NAL units, for which a certain frame
3627	     contains k slices of possibly almost equal length.  Then, applying
3628	     FEC, a better code, e.g., (n=24, k=12), can be used over the
3629	     sequence of NAL units for each frame.  The delay compared to that
3630	     of 2), above, may be reduced, but several disadvantages are
3631	     obvious.  First, the coding efficiency of the encoded video is
3632	     lowered significantly, as slice-structured coding reduces intra-
3633	     frame prediction and additional slice overhead is necessary.
3634	     Second, pre-encoded content, or video that passes through a
3635	     gateway, is usually not coded with the k slices that this use of
3636	     FEC requires.  Finally, the encoding of video producing k
3637	     slices of equal length is not straightforward and might require
3638	     more than one encoding pass.

3640	   Many of the mentioned disadvantages can be avoided by applying FUs in
3641	   combination with FEC.  Each NAL unit can be split into any number of
3642	   FUs of basically equal length; therefore, FEC with a reasonable k and
3643	   n can be applied, even if the encoder made no effort to produce
3644	   slices of equal length.  For example, a coded slice NAL unit
3645	   containing an entire frame can be split into k FUs, and a parity check
3646	   code (n=k+1, k) can be applied.  However, this has the disadvantage
3647	   that unless all created fragments can be recovered, the whole slice
3648	   will be lost.  Thus a larger section is lost than would be if the
3649	   frame had been split into several slices.
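
   Splitting a NAL unit into k fragments of basically equal length can
   be sketched as follows.  The sketch operates on an opaque byte
   string and does not build the FU indicator or FU header; the
   function name is hypothetical.

```python
def split_into_fragments(payload, k):
    """Split payload into k fragments whose lengths differ by at most
    one byte."""
    base, extra = divmod(len(payload), k)
    fragments = []
    position = 0
    for index in range(k):
        size = base + (1 if index < extra else 0)
        fragments.append(payload[position:position + size])
        position += size
    return fragments


nal_payload = bytes(range(10))        # stand-in for a coded slice
fragments = split_into_fragments(nal_payload, 3)
print([len(fragment) for fragment in fragments])
```

   Because the fragment lengths are nearly equal, little padding is
   needed before an XOR-based parity packet is computed over them.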

3651	   The presented technique makes it possible to achieve good
3652	   transmission error tolerance, even if no additional source coding
3653	   layer redundancy (such as periodic intra frames) is present.
3654	   Consequently, the same coded video sequence can be used to achieve
3655	   the maximum compression efficiency and quality over error-free
3656	   transmission and for transmission over error-prone networks.
3657	   Furthermore, the technique allows the application of FEC to pre-
3658	   encoded sequences without adding delay.  In this case, pre-encoded
3659	   sequences that are not encoded for error-prone networks can still be
3660	   transmitted almost reliably without adding extensive delays.  In
3661	   addition, FUs of equal length result in a bit rate efficient use of
3662	   RFC 5109.

3664	   If the error probability depends on the length of the transmitted
3665	   packet (e.g., in case of mobile transmission [15]), the benefits of
3666	   applying FUs with FEC are even more obvious.  Basically, the
3667	   flexibility of the size of FUs allows appropriate FEC to be applied
3668	   for each NAL unit and unequal error protection of NAL units.

3670	   When FUs and FEC are used, the incurred overhead is substantial but
3671	   is in the same order of magnitude as the number of bits that have to
3672	   be spent for intra-coded macroblocks if no FEC is applied.  In [19],
3673	   it was shown that the FEC-based approach improved overall quality at
3674	   the same error rate and the same overall bit rate, including the
3675	   overhead.

3677	12.6. Low Bit-Rate Streaming

3679	   This scheme has been implemented with H.263 and non-standard RTP
3680	   packetization and has given good results [20].  There is no technical
3681	   reason why similarly good results could not be achieved with H.264.

3683	   In today's Internet streaming, some of the offered bit rates are
3684	   relatively low in order to allow terminals with dial-up modems to
3685	   access the content.  In wired IP networks, relatively large packets,
3686	   say 500 - 1500 bytes, are preferred to smaller and more frequently
3687	   occurring packets in order to reduce network congestion.  Moreover,
3688	   use of large packets decreases the amount of RTP/UDP/IP header
3689	   overhead.  For low bit-rate video, the use of large packets means
3690	   that sometimes up to a few pictures should be encapsulated in one
3691	   packet.

3693	   However, loss of a packet including many coded pictures would have
3694	   drastic consequences for visual quality, as there is practically no
3695	   other way to conceal a loss of an entire picture than to repeat the
3696	   previous one.  One way to construct relatively large packets and
3697	   maintain possibilities for successful loss concealment is to
3698	   construct MTAPs that contain interleaved slices from several
3699	   pictures.  An MTAP should not contain spatially adjacent slices from
3700	   the same picture or spatially overlapping slices from any picture.
3701	   If a packet is lost, it is likely that a lost slice is surrounded by
3702	   spatially adjacent slices of the same picture and spatially
3703	   corresponding slices of the temporally previous and succeeding
3704	   pictures.  Consequently, concealment of the lost slice is likely to
3705	   be relatively successful.

3707	12.7. Robust Packet Scheduling in Video Streaming

3709	   Robust packet scheduling has been implemented with MPEG-4 Part 2 and
3710	   simulated in a wireless streaming environment [21].  There is no
3711	   technical reason why similar or better results could not be
3712	   achieved with H.264.

3714	   Streaming clients typically have a receiver buffer that is capable of
3715	   storing a relatively large amount of data.  Initially, when a
3716	   streaming session is established, a client does not start playing the
3717	   stream back immediately.  Rather, it typically buffers the incoming
3718	   data for a few seconds.  This buffering helps maintain continuous
3719	   playback, as, in case of occasional increased transmission delays or
3720	   network throughput drops, the client can decode and play buffered
3721	   data.  Otherwise, without initial buffering, the client has to freeze
3722	   the display, stop decoding, and wait for incoming data.  The
3723	   buffering is also necessary for either automatic or selective
3724	   retransmission at any protocol level.  If any part of a picture is
3725	   lost, a retransmission mechanism may be used to resend the lost data.
3726	   If the retransmitted data is received before its scheduled decoding
3727	   or playback time, the loss is recovered perfectly.  Coded pictures
3728	   can be ranked according to their importance in the subjective quality
3729	   of the decoded sequence.  For example, non-reference pictures, such
3730	   as conventional B pictures, are subjectively least important, as
3731	   their absence does not affect decoding of any other pictures.  In
3732	   addition to non-reference pictures, the ITU-T H.264 | ISO/IEC 14496-
3733	   10 standard includes a temporal scalability method called sub-
3734	   sequences [22].  Subjective ranking can also be made on coded slice
3735	   data partition or slice group basis.  Coded slices and coded slice
3736	   data partitions that are subjectively the most important can be sent
3737	   earlier than their decoding order indicates, whereas coded slices and
3738	   coded slice data partitions that are subjectively the least important
3739	   can be sent later than their natural coding order indicates.
3740	   Consequently, any retransmitted parts of the most important slices
3741	   and coded slice data partitions are more likely to be received before
3742	   their scheduled decoding or playback time compared to the least
3743	   important slices and slice data partitions.

3745	13. Informative Appendix: Rationale for Decoding Order Number

3747	13.1. Introduction

3749	   The Decoding Order Number (DON) concept was introduced mainly to
3750	   enable efficient multi-picture slice interleaving (see section 12.6)
3751	   and robust packet scheduling (see section 12.7).  In both of these
3752	   applications, NAL units are transmitted out of decoding order.  DON
3753	   indicates the decoding order of NAL units and should be used in the
3754	   receiver to recover the decoding order.  Example use cases for
3755	   efficient multi-picture slice interleaving and for robust packet
3756	   scheduling are given in sections 13.2 and 13.3, respectively.
3757	   Section 13.4 describes the benefits of the DON concept in error
3758	   resiliency achieved by redundant coded pictures.  Section 13.5
3759	   summarizes the alternatives to DON that were considered and explains
3760	   why DON was chosen for this RTP payload specification.

3762	13.2. Example of Multi-Picture Slice Interleaving

3764	   An example of multi-picture slice interleaving follows.  A subset of
3765	   a coded video sequence is depicted below in output order.  R denotes
3766	   a reference picture, N denotes a non-reference picture, and the
3767	   number indicates a relative output time.

3769	      ... R1 N2 R3 N4 R5 ...

3771	   The decoding order of these pictures from left to right is as
3772	   follows:

3774	      ... R1 R3 N2 R5 N4 ...

3776	   The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
3777	   DON equal to 1, 2, 3, 4, and 5, respectively.

3779	   Each reference picture consists of three slice groups that are
3780	   scattered as follows (a number denotes the slice group number for
3781	   each macroblock in a QCIF frame):

3783	      0 1 2 0 1 2 0 1 2 0 1
3784	      2 0 1 2 0 1 2 0 1 2 0
3785	      1 2 0 1 2 0 1 2 0 1 2
3786	      0 1 2 0 1 2 0 1 2 0 1
3787	      2 0 1 2 0 1 2 0 1 2 0
3788	      1 2 0 1 2 0 1 2 0 1 2
3789	      0 1 2 0 1 2 0 1 2 0 1
3790	      2 0 1 2 0 1 2 0 1 2 0
3791	      1 2 0 1 2 0 1 2 0 1 2
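
   The pattern shown above can be reproduced with a simple modular
   rule.  The rule (column + 2 * row) mod 3 is inferred from the
   figure for this illustration and is not claimed to be one of the
   slice group map types defined in [1]; the function name is
   hypothetical.

```python
def scattered_slice_group_map(width_mbs=11, height_mbs=9, num_groups=3):
    """Return the slice group number of every macroblock, row by row.

    Defaults describe a QCIF frame (11 x 9 macroblocks).  The modular
    rule is inferred from the figure, not taken from the H.264 text.
    """
    return [
        [(col + 2 * row) % num_groups for col in range(width_mbs)]
        for row in range(height_mbs)
    ]


for row in scattered_slice_group_map():
    print(" ".join(str(group) for group in row))
```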

3793	   For the sake of simplicity, we assume that all the macroblocks of a
3794	   slice group are included in one slice.  Three MTAPs are constructed
3795	   from three consecutive reference pictures so that each MTAP contains
3796	   three aggregation units, each of which contains all the macroblocks
3797	   from one slice group.  The first MTAP contains slice group 0 of
3798	   picture R1, slice group 1 of picture R3, and slice group 2 of picture
3799	   R5.  The second MTAP contains slice group 1 of picture R1, slice
3800	   group 2 of picture R3, and slice group 0 of picture R5.  The third
3801	   MTAP contains slice group 2 of picture R1, slice group 0 of picture
3802	   R3, and slice group 1 of picture R5.  Each non-reference picture is
3803	   encapsulated into an STAP-B.

3805	   Consequently, the transmission order of NAL units is the following:

3807	      R1, slice group 0, DON 1, carried in MTAP, RTP SN: N
3808	      R3, slice group 1, DON 2, carried in MTAP, RTP SN: N
3809	      R5, slice group 2, DON 4, carried in MTAP, RTP SN: N
3810	      R1, slice group 1, DON 1, carried in MTAP, RTP SN: N+1
3811	      R3, slice group 2, DON 2, carried in MTAP, RTP SN: N+1
3812	      R5, slice group 0, DON 4, carried in MTAP, RTP SN: N+1
3813	      R1, slice group 2, DON 1, carried in MTAP, RTP SN: N+2
3814	      R3, slice group 0, DON 2, carried in MTAP, RTP SN: N+2
3815	      R5, slice group 1, DON 4, carried in MTAP, RTP SN: N+2
3816	      N2, DON 3, carried in STAP-B, RTP SN: N+3
3817	      N4, DON 5, carried in STAP-B, RTP SN: N+4

3819	   The receiver is able to organize the NAL units back in decoding order
3820	   based on the value of DON associated with each NAL unit.
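
   The MTAP composition described in the text, and the recovery of the
   decoding order from DON, can be sketched as follows.  The tuple
   layout and function names are assumptions of this example, and the
   wrap-around of the DON value is ignored for simplicity.

```python
def build_transmission_order():
    """Return NAL units as (picture, slice_group, don, sn_offset)."""
    reference_pictures = [("R1", 1), ("R3", 2), ("R5", 4)]  # (name, DON)
    order = []
    for mtap_index in range(3):
        for pic_index, (name, don) in enumerate(reference_pictures):
            # Rotate slice groups so no MTAP repeats a group position.
            group = (mtap_index + pic_index) % 3
            order.append((name, group, don, mtap_index))
    order.append(("N2", None, 3, 3))  # non-reference picture in an STAP-B
    order.append(("N4", None, 5, 4))  # non-reference picture in an STAP-B
    return order


def recover_decoding_order(nal_units):
    """Stable sort by DON; DON wrap-around is not handled here."""
    return sorted(nal_units, key=lambda unit: unit[2])


for unit in recover_decoding_order(build_transmission_order()):
    print(unit)
```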

3822	   If one of the MTAPs is lost, the spatially adjacent and temporally
3823	   co-located macroblocks are received and can be used to conceal the
3824	   loss efficiently.  If one of the STAPs is lost, the effect of the
3825	   loss does not propagate temporally.

3827	13.3. Example of Robust Packet Scheduling

3829	   An example of robust packet scheduling follows.  The communication
3830	   system used in the example consists of the following components in
3831	   the order that the video is processed from source to sink:

3833	      o camera and capturing
3834	      o pre-encoding buffer
3835	      o encoder
3836	      o encoded picture buffer
3837	      o transmitter
3838	      o transmission channel
3839	      o receiver
3840	      o receiver buffer
3841	      o decoder
3842	      o decoded picture buffer
3843	      o display

3845	   The video communication system used in the example operates as
3846	   follows.  Note that processing of the video stream happens gradually
3847	   and at the same time in all components of the system.  The source
3848	   video sequence is shot and captured to a pre-encoding buffer.  The
3849	   pre-encoding buffer can be used to order pictures from sampling order
3850	   to encoding order or to analyze multiple uncompressed frames for bit
3851	   rate control purposes, for example.  In some cases, the pre-encoding
3852	   buffer may not exist; instead, the sampled pictures are encoded right
3853	   away.  The encoder encodes pictures from the pre-encoding buffer and
3854	   stores the output (i.e., coded pictures) in the encoded picture
3855	   buffer.  The transmitter encapsulates the coded pictures from the
3856	   encoded picture buffer to transmission packets and sends them to a
3857	   receiver through a transmission channel.  The receiver stores the
3858	   received packets to the receiver buffer.  The receiver buffering
3859	   process typically includes buffering for transmission delay jitter.
3860	   The receiver buffer can also be used to recover correct decoding
3861	   order of coded data.  The decoder reads coded data from the receiver
3862	   buffer and produces decoded pictures as output into the decoded
3863	   picture buffer.  The decoded picture buffer is used to recover the
3864	   output (or display) order of pictures.  Finally, pictures are
3865	   displayed.

3867	   In the following example figures, I denotes an IDR picture, R denotes
3868	   a reference picture, N denotes a non-reference picture, and the
3869	   number after I, R, or N indicates the sampling time relative to the
3870	   previous IDR picture in decoding order.  Values below the sequence of
3871	   pictures indicate scaled system clock timestamps.  The system clock
3872	   is initialized arbitrarily in this example, and time runs from left
3873	   to right.  Each I, R, and N picture is mapped into the same timeline
3874	   compared to the previous processing step, if any, assuming that
3875	   encoding, transmission, and decoding take no time.  Thus, events
3876	   happening at the same time are located in the same column throughout
3877	   all example figures.

3879	   A subset of a sequence of coded pictures is depicted below in
3880	   sampling order.

3882	       ...  N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
3883	       ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
3884	       ...  58  59  60  61  62  63  64  65  66  ... 118 119 120 121 ...

3886	             Figure 16  Sequence of pictures in sampling order

3888	   The sampled pictures are buffered in the pre-encoding buffer to
3889	   arrange them in encoding order.  In this example, we assume that the
3890	   non-reference pictures are predicted from both the previous and the
3891	   next reference picture in output order, except for the non-reference
3892	   pictures immediately preceding an IDR picture, which are predicted
3893	   only from the previous reference picture in output order.  Thus, the
3894	   pre-encoding buffer has to contain at least two pictures, and the
3895	   buffering causes a delay of two picture intervals.  The output of the
3896	   pre-encoding buffering process and the encoding (and decoding) order
3897	   of the pictures are as follows:

3899	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3900	       ... -|---|---|---|---|---|---|---|---|- ...
3901	       ... 60  61  62  63  64  65  66  67  68  ...

3903	         Figure 17  Re-ordered pictures in the pre-encoding buffer

3905	   The encoder or the transmitter can set the value of DON for each
3906	   picture to the value of DON for the previous picture in decoding
3907	   order plus one.

3909	   For the sake of simplicity, let us assume that:

3911	   o  the frame rate of the sequence is constant,
3912	   o  each picture consists of only one slice,
3913	   o  each slice is encapsulated in a single NAL unit packet,
3914	   o  there is no transmission delay, and
3915	   o  pictures are transmitted at constant intervals (that is, 1 /
3916	   (frame rate)).

3918	   When pictures are transmitted in decoding order, they are received as
3919	   follows:

3921	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3922	       ... -|---|---|---|---|---|---|---|---|- ...
3923	       ... 60  61  62  63  64  65  66  67  68  ...

3925	              Figure 18  Received pictures in decoding order

3927	   The OPTIONAL sprop-interleaving-depth media type parameter is set to
3928	   0, as the transmission (or reception) order is identical to the
3929	   decoding order.

3931	   The decoder has to buffer for one picture interval initially in its
3932	   decoded picture buffer to organize pictures from decoding order to
3933	   output order as depicted below:

3935	        ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
3936	        ... -|---|---|---|---|---|---|---|---|- ...
3937	        ... 61  62  63  64  65  66  67  68  69  ...

3939	                          Figure 19  Output order

3941	   The amount of required initial buffering in the decoded picture
3942	   buffer can be signaled in the buffering period SEI message or with
3943	   the num_reorder_frames syntax element of H.264 video usability
3944	   information.  num_reorder_frames indicates the maximum number of
3945	   frames, complementary field pairs, or non-paired fields that precede
3946	   any frame, complementary field pair, or non-paired field in the
3947	   sequence in decoding order and that follow it in output order.  For
3948	   the sake of simplicity, we assume that num_reorder_frames is used to
3949	   indicate the initial buffer in the decoded picture buffer.  In this
3950	   example, num_reorder_frames is equal to 1.

3952	   It can be observed that if the IDR picture I00 is lost during
3953	   transmission and a retransmission request is issued when the value of
3954	   the system clock is 62, there is one picture interval of time (until
3955	   the system clock reaches timestamp 63) to receive the retransmitted
3956	   IDR picture I00.

3958	   Let us then assume that IDR pictures are transmitted two frame
3959	   intervals earlier than their decoding position; i.e., the pictures
3960	   are transmitted as follows:

3962	        ...  I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
3963	        ... --|---|---|---|---|---|---|---|---|- ...
3964	        ...  62  63  64  65  66  67  68  69  70  ...

3966	       Figure 20  Interleaving: Early IDR pictures in sending order

3968	   The OPTIONAL sprop-interleaving-depth media type parameter is set
3969	   equal to 1 according to its definition.  (The value of sprop-
3970	   interleaving-depth in this example can be derived as follows: Picture
3971	   I00 is the only picture preceding picture N58 or N59 in transmission
3972	   order and following it in decoding order.  Except for pictures I00,
3973	   N58, and N59, the transmission order is the same as the decoding
3974	   order of pictures.  As a coded picture is encapsulated into exactly
3975	   one NAL unit, the value of sprop-interleaving-depth is equal to the
3976	   maximum number of pictures preceding any picture in transmission
3977	   order and following the picture in decoding order.)
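
   The derivation above can be checked mechanically.  The following is
   a minimal sketch, not part of the payload format; the function name
   and the list representation are assumptions made for illustration.
   It relies on the example's simplification that each coded picture is
   carried in exactly one NAL unit, and takes the two picture orders
   from Figures 20 and 21.

```python
def interleaving_depth(transmission, decoding):
    """Maximum number of pictures that precede any picture in
    transmission order and follow it in decoding order."""
    dec_pos = {pic: i for i, pic in enumerate(decoding)}
    depth = 0
    for i, pic in enumerate(transmission):
        # Count earlier-sent pictures that are decoded later than pic.
        count = sum(1 for earlier in transmission[:i]
                    if dec_pos[earlier] > dec_pos[pic])
        depth = max(depth, count)
    return depth


# Orders taken from Figure 20 (sending) and Figure 21 (decoding).
transmission = "I00 N58 N59 R03 N01 N02 R06 N04 N05".split()
decoding = "N58 N59 I00 R03 N01 N02 R06 N04 N05".split()

print(interleaving_depth(transmission, decoding))  # 1
print(interleaving_depth(decoding, decoding))      # 0 (no interleaving)
```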

3979	   The receiver buffering process contains two pictures at a time
3980	   according to the value of the sprop-interleaving-depth parameter and
3981	   orders pictures from the reception order to the correct decoding
3982	   order based on the value of DON associated with each picture.  The
3983	   output of the receiver buffering process is as follows:

3985	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3986	       ... -|---|---|---|---|---|---|---|---|- ...
3987	       ... 63  64  65  66  67  68  69  70  71  ...

3989	                 Figure 21  Interleaving: Receiver buffer
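
   The reordering shown in Figure 21 can be sketched as a small
   simulation.  This is a simplified illustration only: the DON values
   are hypothetical (assigned here in decoding order starting from 0),
   DON wrap-around is ignored, and the function name is an assumption.
   A picture is released once the buffer holds more than sprop-
   interleaving-depth pictures, so at most two pictures are buffered
   when the depth is 1.

```python
import heapq


def deinterleave(received, depth):
    """Reorder (don, name) pairs from reception order to decoding
    order using a buffer of at most depth + 1 entries."""
    heap, out = [], []
    for unit in received:
        heapq.heappush(heap, unit)
        if len(heap) > depth:
            # Release the buffered picture with the smallest DON.
            out.append(heapq.heappop(heap))
    while heap:  # flush the remaining pictures at end of stream
        out.append(heapq.heappop(heap))
    return out


decoding = "N58 N59 I00 R03 N01 N02 R06 N04 N05".split()
don = {name: i for i, name in enumerate(decoding)}  # hypothetical DONs
sent = "I00 N58 N59 R03 N01 N02 R06 N04 N05".split()  # Figure 20

output = deinterleave([(don[n], n) for n in sent], depth=1)
print(" ".join(name for _, name in output))
```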

3991	   Again, an initial buffering delay of one picture interval is needed
3992	   to organize pictures from decoding order to output order, as depicted
3993	   below:

3995	        ... N58 N59 I00 N01 N02 R03 N04 N05 ...
3996	        ... -|---|---|---|---|---|---|---|- ...
3997	        ... 64  65  66  67  68  69  70  71  ...

3999	         Figure 22  Interleaving: Receiver buffer after reordering

4001	   Note that the maximum delay that IDR pictures can undergo during
4002	   transmission, including possible application, transport, or link
4003	   layer retransmission, is equal to three picture intervals.  Thus, the
4004	   loss resiliency of IDR pictures is improved in systems supporting
4005	   retransmission compared to the case in which pictures were
4006	   transmitted in their decoding order.

4008	13.4. Robust Transmission Scheduling of Redundant Coded Slices

4010	   A redundant coded picture is a coded representation of a picture or a
4011	   part of a picture that is not used in the decoding process if the
4012	   corresponding primary coded picture is correctly decoded.  There
4013	   should be no noticeable difference between any area of the decoded
4014	   primary picture and a corresponding area that would result from
4015	   application of the H.264 decoding process for any redundant picture
4016	   in the same access unit.  A redundant coded slice is a coded slice
4017	   that is a part of a redundant coded picture.

4019	   Redundant coded pictures can be used to provide unequal error
4020	   protection in error-prone video transmission.  If a primary coded
4021	   representation of a picture is decoded incorrectly, a corresponding
4022	   redundant coded picture can be decoded.  Examples of applications and
4023	   coding techniques using the redundant coded picture feature include
4024	   the video redundancy coding [23] and the protection of "key pictures"
4025	   in multicast streaming [24].

4027	   One property of many error-prone video communications systems is that
4028	   transmission errors are often bursty.  Therefore, they may affect
4029	   several consecutive transmission packets in transmission order.
4030	   In low bit-rate video communication, it is relatively common that an
4031	   entire coded picture can be encapsulated into one transmission
4032	   packet.  Consequently, a primary coded picture and the corresponding
4033	   redundant coded pictures may be transmitted in consecutive packets in
4034	   transmission order.  To make the transmission scheme more tolerant of
4035	   bursty transmission errors, it is beneficial to transmit the primary
4036	   coded picture and redundant coded picture separated by more than a
4037	   single packet.  The DON concept enables this.
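
   One possible scheduling is sketched below.  It is an illustration
   under stated assumptions, not a scheme defined by this memo: each
   picture is assumed to have one primary and one redundant coded
   picture, each fitting in one packet; the names "P" and "S", the
   "gap" parameter, and the DON assignment are hypothetical; DON
   wrap-around is ignored.  The redundant copy is sent "gap" picture
   slots after its primary, and the receiver recovers decoding order
   by sorting on DON.

```python
def schedule(num_pictures, gap):
    """Return (name, don) pairs in transmission order, delaying each
    redundant coded picture by `gap` picture slots."""
    order = []
    for t in range(num_pictures + gap):
        if t < num_pictures:
            order.append((f"P{t}", 2 * t))        # primary of picture t
        if t - gap >= 0:
            k = t - gap
            order.append((f"S{k}", 2 * k + 1))    # redundant of picture k
    return order


sent = schedule(num_pictures=4, gap=2)
print([name for name, _ in sent])
# ['P0', 'P1', 'P2', 'S0', 'P3', 'S1', 'S2', 'S3']

# Sorting by DON restores the decoding order P0 S0 P1 S1 ...
decoded = [name for name, _ in sorted(sent, key=lambda u: u[1])]
print(decoded)
```

   With gap equal to 2, every primary coded picture and its redundant
   copy are at least three packets apart in this sketch, so a burst
   that hits two consecutive packets cannot remove both.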

4039	13.5. Remarks on Other Design Possibilities

4041	   The slice header syntax structure of the H.264 coding standard
4042	   contains the frame_num syntax element that can indicate the decoding
4043	   order of coded frames.  However, using the frame_num syntax element
4044	   to recover the decoding order is neither feasible nor desirable, for
4045	   the following reasons:

4047	   o  The receiver is required to parse at least one slice header per
4048	      coded picture (before passing the coded data to the decoder).

4050	   o  Coded slices from multiple coded video sequences cannot be
4051	      interleaved, as the frame number syntax element is reset to 0 in
4052	      each IDR picture.

4054	   o  The coded fields of a complementary field pair share the same
4055	      value of the frame_num syntax element.  Thus, the decoding order
4056	      of the coded fields of a complementary field pair cannot be
4057	      recovered based on the frame_num syntax element or any other
4058	      syntax element of the H.264 coding syntax.

4060	   The RTP payload format for transport of MPEG-4 elementary streams
4061	   [25] enables interleaving of access units and transmission of
4062	   multiple access units in the same RTP packet.  An access unit is
4063	   specified in the H.264 coding standard to comprise all NAL units
4064	   associated with a primary coded picture according to subclause
4065	   7.4.1.2 of [1].  Consequently, slices of different pictures cannot be
4066	   interleaved, and the multi-picture slice interleaving technique (see
4067	   section 12.6) for improved error resilience cannot be used.

4069	14. Acknowledgements

4071	   Stephan Wenger, Miska Hannuksela, Thomas Stockhammer, Magnus
4072	   Westerlund, and David Singer are thanked as the authors of RFC 3984.
4073	   Dave Lindbergh, Philippe Gentric, Gonzalo Camarillo, Gary Sullivan,
4074	   Joerg Ott, and Colin Perkins are thanked for careful review during
4075	   the development of RFC 3984. Randell Jesup, Stephen Botzko, Magnus
4076	   Westerlund, Alex Eleftheriadis, and Thomas Schierl are thanked for
4077	   their valuable comments and inputs during the development of this
4078	   memo.

4080	   This document was prepared using 2-Word-v2.0.template.dot.

4082	15. References

4084	15.1. Normative References

4086	   [1]   ITU-T Recommendation H.264, "Advanced video coding for generic
4087	         audiovisual services", November 2007.

4089	   [2]   ISO/IEC International Standard 14496-10:2008.

4091	   [3]   ITU-T Recommendation H.241, "Extended video procedures and
4092	         control signals for H.300 series terminals", May 2006.

4094	   [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
4095	         Levels", BCP 14, RFC 2119, March 1997.

4097	   [5]   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
4098	         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
4099	         RFC 3550, July 2003.

4101	   [6]   Handley, M. and V. Jacobson, "SDP: Session Description
4102	         Protocol", RFC 2327, April 1998.

4104	   [7]   Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
4105	         RFC 3548, July 2003.

4107	   [8]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
4108	         Session Description Protocol (SDP)", RFC 3264, June 2002.

4110	   [9]   Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
4111	         Attributes in the Session Description Protocol", draft-ietf-
4112	         mmusic-sdp-source-attributes-02 (work in progress), October
4113	         2008.

4115	15.2. Informative References

4117	   [10]  Luthra, A., Sullivan, G.J., and T. Wiegand (eds.), Special
4118	         Issue on H.264/AVC. IEEE Transactions on Circuits and Systems
4119	         for Video Technology, July 2003.

4121	   [11]  Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
4122	         Newell, D., Ott, J., Sullivan, G., Wenger, S., and C. Zhu, "RTP
4123	         Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
4124	         (H.263+)", RFC 2429, October 1998.

4126	   [12]  ISO/IEC IS 14496-2.

4128	   [13]  Wenger, S., "H.26L over IP", IEEE Transactions on Circuits and
4129	         Systems for Video Technology, Vol. 13, No. 7, July 2003.

4131	   [14]  Wenger, S., "H.26L over IP: The IP Network Adaptation Layer",
4132	         Proceedings Packet Video Workshop 02, April 2002.

4134	   [15]  Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT
4135	         Coding Network Abstraction Layer and IP-based Transport" in
4136	         Proc. ICIP 2002, Rochester, NY, September 2002.

4138	   [16]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
4139	         Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

4141	   [17]  ITU-T Recommendation H.223, "Multiplexing protocol for low bit
4142	         rate multimedia communication", July 2001.

4144	   [18]  Li, A., "RTP Payload Format for Generic Forward Error
4145	         Correction", RFC 5109, December 2007.

4147	   [19]  Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier,
4148	         "Video Coding and Transport Layer Techniques for H.264/AVC-
4149	         Based Transmission over Packet-Lossy Networks", IEEE
4150	         International Conference on Image Processing (ICIP 2003),
4151	         Barcelona, Spain, September 2003.

4153	   [20]  Varsa, V. and M. Karczewicz, "Slice interleaving in compressed
4154	         video packetization", Packet Video Workshop 2000.

4156	   [21]  Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for
4157	         wireless video streaming", International Packet Video Workshop
4158	         2002.

4160	   [22]  Hannuksela, M.M., "Enhanced concept of GOP", JVT-B042,
4161	         available http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT-
4162	         B042.doc, January 2002.

4164	   [23]  Wenger, S., "Video Redundancy Coding in H.263+", 1997
4165	         International Workshop on Audio-Visual Services over Packet
4166	         Networks, September 1997.

4168	   [24]  Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient
4169	         Video Coding Using Unequally Protected Key Pictures", in Proc.
4170	         International Workshop VLBV03, September 2003.

4172	   [25]  van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
4173	         P. Gentric, "RTP Payload Format for Transport of MPEG-4
4174	         Elementary Streams", RFC 3640, November 2003.

4176	   [26]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
4177	         Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
4178	         3711, March 2004.

4180	   [27]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
4181	         Protocol (RTSP)", RFC 2326, April 1998.

4183	   [28]  Handley, M., Perkins, C., and E. Whelan, "Session Announcement
4184	         Protocol", RFC 2974, October 2000.

4186	   [29]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
4187	         January 2008.

4189	   [30]  Wenger, S., Chandra, U., and M. Westerlund, "Codec Control
4190	         Messages in the RTP Audio-Visual Profile with Feedback (AVPF)",
4191	         RFC 5104, February 2008.

4193	Authors' Addresses

4195	   Ye-Kui Wang
4196	   Nokia Research Center
4197	   P.O. Box 1000
4198	   33721 Tampere
4199	   Finland

4201	   Phone: +358-50-466-7004
4202	   Email: ye-kui.wang@nokia.com

4204	   Roni Even
4205	   14 David Hamelech
4206	   Tel Aviv 64953
4207	   Israel

4209	   Phone: +972-545481099
4210	   Email: ron.even.tlv@gmail.com

4212	   Tom Kristensen
4213	   TANDBERG
4214	   Philip Pedersens vei 22
4215	   N-1366 Lysaker
4216	   Norway

4218	   Phone: +47 67125125
4219	   Email: tom.kristensen@tandberg.com, tomkri@ifi.uio.no

4221	Intellectual Property Statement

4223	   The IETF takes no position regarding the validity or scope of any
4224	   Intellectual Property Rights or other rights that might be claimed to
4225	   pertain to the implementation or use of the technology described in
4226	   this document or the extent to which any license under such rights
4227	   might or might not be available; nor does it represent that it has
4228	   made any independent effort to identify any such rights.  Information
4229	   on the procedures with respect to rights in RFC documents can be
4230	   found in BCP 78 and BCP 79.

4232	   Copies of IPR disclosures made to the IETF Secretariat and any
4233	   assurances of licenses to be made available, or the result of an
4234	   attempt made to obtain a general license or permission for the use of
4235	   such proprietary rights by implementers or users of this
4236	   specification can be obtained from the IETF on-line IPR repository at
4237	   http://www.ietf.org/ipr.

4239	   The IETF invites any interested party to bring to its attention any
4240	   copyrights, patents or patent applications, or other proprietary
4241	   rights that may cover technology that may be required to implement
4242	   this standard.  Please address the information to the IETF at
4243	   ietf-ipr@ietf.org.

4245	Disclaimer of Validity

4247	   This document and the information contained herein are provided on an
4248	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
4249	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
4250	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
4251	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
4252	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
4253	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

4255	Copyright Statement

4257	   Copyright (C) The IETF Trust (2008).

4259	   This document is subject to the rights, licenses and restrictions
4260	   contained in BCP 78, and except as set forth therein, the authors
4261	   retain all their rights.

4263	Acknowledgement

4265	   Funding for the RFC Editor function is currently provided by the
4266	   Internet Society.

4268	16. Backward Compatibility to RFC 3984

4270	   The current document is a revision of RFC 3984 and intends to
4271	   obsolete it.  This section addresses the backward compatibility
4272	   issues.

4274	   The technical changes are listed in section 17.

4276	   Items 1), 2), 3), 7), 9), 10), 12), and 13) are bug-fix type changes
4277	   and do not incur any backward compatibility issues.

4279	   Item 4), addition of five new media type parameters, does not incur
4280	   any backward compatibility issues for SDP Offer/Answer based
4281	   applications, as legacy RFC 3984 receivers ignore these parameters,
4282	   and it is fine for legacy RFC 3984 senders not to use these
4283	   parameters as they are optional.  However, there is a backward
4284	   compatibility issue for SDP declarative usage based applications,
4285	   e.g. those using RTSP and SAP, because the SDP receiver per RFC 3984
4286	   cannot accept a session for which the SDP includes an unrecognized
4287	   parameter.  Therefore, the RTSP or SAP server may have to prepare two
4288	   sets of streams, one for legacy RFC 3984 receivers and one for
4289	   receivers according to this memo.

4291	   Items 5), 6) and 11) are related to out-of-band transport of
4292	   parameter sets.  When a sender according to this memo is
4293	   communicating with a legacy receiver according to RFC 3984, there is
4294	   no backward compatibility issue.  When the legacy receiver sees an
4295	   SDP message with no parameter-add, it infers the value of
4296	   parameter-add to be equal to 1 (related to change item 5)).
4297	   As RFC 3984 allows inclusion of any parameter sets in
4298	   sprop-parameter-sets, the legacy receiver accepts a value that
4299	   includes parameter sets only for the default level
4300	   (related to change item 6)).  When new parameters, e.g.,
4301	   sprop-level-parameter-sets, are present, the legacy receiver simply
4302	   ignores them (related to change item 11)).  When a legacy sender
4303	   according to RFC 3984 is communicating with a receiver according to
4304	   this memo, there is one backward compatibility issue.  When the
4305	   legacy sender includes in sprop-parameter-sets parameter sets for a
4306	   level different from the default level indicated by profile-level-id,
4307	   the parameter value of sprop-parameter-sets is invalid to the
4308	   receiver and therefore the session may be rejected.  In SDP
4309	   Offer/Answer between a legacy offerer according to RFC 3984 and an
4310	   answerer according to this memo, when the answerer includes in the
4311	   answer parameter sets that are not a superset of the parameter sets
4312	   included in the offer, the parameter value of sprop-parameter-sets is
4313	   invalid to the offerer and the session may not be initiated properly
4314	   (related to change item 11)).
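
   A receiver according to this memo might detect the legacy-sender
   case described above roughly as follows.  This is a hedged sketch,
   not normative behavior: the function name is hypothetical, the
   sample parameter values are illustrative, and only profile_idc and
   level_idc are compared (the constraint flags in profile-iop, and
   hence Level 1b signaling, are not handled).  It assumes that the
   bytes following the one-byte NAL unit header of a sequence
   parameter set are profile_idc, the constraint flags, and level_idc,
   in that order.

```python
import base64


def sps_matches_profile_level(profile_level_id, sprop_parameter_sets):
    """Return True if every sequence parameter set in the
    sprop-parameter-sets value has the profile_idc and level_idc
    indicated by profile-level-id (six hexadecimal digits)."""
    wanted = bytes.fromhex(profile_level_id)
    if len(wanted) != 3:
        raise ValueError("profile-level-id must encode three bytes")
    for encoded in sprop_parameter_sets.split(","):
        nal = base64.b64decode(encoded)
        if not nal or (nal[0] & 0x1F) != 7:
            continue  # not a sequence parameter set (e.g., a PPS)
        if len(nal) < 4:
            return False  # too short to carry profile and level
        if nal[1] != wanted[0] or nal[3] != wanted[2]:
            return False
    return True


sample = "Z0IACpZTBYmI,aMljiA=="  # illustrative SPS and PPS
print(sps_matches_profile_level("42000A", sample))  # True
print(sps_matches_profile_level("42A01E", sample))  # False
```

   In the second call the sequence parameter set signals level_idc 10
   while profile-level-id indicates level_idc 30, which is the
   situation in which the session may be rejected.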

4316	   Item 7), allowance of conveying sprop-parameter-sets and sprop-level-
4317	   parameter-sets using the "fmtp" source attribute as specified in
4318	   section 6.3 of [9], is similar to item 4).  It does not incur any
4319	   backward compatibility issues for SDP Offer/Answer based
4320	   applications, as legacy RFC 3984 receivers ignore the "fmtp" source
4321	   attribute, and it is fine for legacy RFC 3984 senders not to use the
4322	   "fmtp" source attribute as it is optional.  However, there is a
4323	   backward compatibility issue for SDP declarative usage based
4324	   applications, e.g. those using RTSP and SAP, because the SDP receiver
4325	   per RFC 3984 cannot accept a session for which the SDP includes an
4326	   unrecognized parameter (i.e., the "fmtp" source attribute).
4327	   Therefore, the RTSP or SAP server may have to prepare two sets of
4328	   streams, one for legacy RFC 3984 receivers and one for receivers
4329	   according to this memo.

4331	   Item 14) removed the recommendation to use out-of-band transport of
4332	   parameter sets.  As out-of-band transport of parameter sets is still
4333	   allowed, this change does not incur any backward compatibility
4334	   issues.

4336	   Item 15) does not incur any backward compatibility issues as the
4337	   added subsection 8.5 is informative.

4339	17. Changes from RFC 3984

4341	   Following is the list of technical changes (including bug fixes) from
4342	   RFC 3984.  Besides this list of technical changes, numerous editorial
4343	   changes have been made, but not documented in this memo.

4345	   1) In subsections 5.4, 5.5, 6.2, 6.3 and 6.4, removed that the
4346	     packetization mode in use may be signaled by external means.

4348	   2) In subsection 7.2.2, changed the sentence

4350	      There are N VCL NAL units in the deinterleaving buffer.

4352	      to

4354	      There are N or more VCL NAL units in the de-interleaving buffer.

4356	   3) In subsection 8.1, the semantics of sprop-init-buf-time, paragraph
4357	     2, changed the sentence

4359	      The parameter is the maximum value of (transmission time of a NAL
4360	      unit - decoding time of the NAL unit), assuming reliable and
4361	      instantaneous transmission, the same timeline for transmission
4362	      and decoding, and that decoding starts when the first packet
4363	      arrives.

4365	      to

4367	      The parameter is the maximum value of (decoding time of the NAL
4368	      unit - transmission time of a NAL unit), assuming reliable and
4369	      instantaneous transmission, the same timeline for transmission
4370	      and decoding, and that decoding starts when the first packet
4371	      arrives.

4373	   4) Added five new media type parameters, namely max-smbps, sprop-
4374	     level-parameter-sets, use-level-src-parameter-sets, sar-understood
4375	     and sar-supported.

4377	   5) In subsection 8.1, removed the specification of parameter-add.
4378	     Other descriptions of parameter-add (in subsections 8.2 and 8.4)
4379	     are also removed.

4381	   6) In subsection 8.1, added a constraint to sprop-parameter-sets such
4382	     that it can only contain parameter sets for the same profile and
4383	     level as indicated by profile-level-id.

4385	   7) In subsection 8.2.1, added that sprop-parameter-sets and sprop-
4386	     level-parameter-sets may be either included in the "a=fmtp" line
4387	     of SDP or conveyed using the "fmtp" source attribute as specified
4388	     in section 6.3 of [9].

4390	   8) In subsection 8.2.2, removed sprop-deint-buf-req from being part
4391	     of the media format configuration in usage with the SDP
4392	     Offer/Answer model.

4394	   9) In subsection 8.2.2, made it clear that level is downgradable in
4395	     the SDP Offer/Answer model, i.e. the use of the level part of
4396	     "profile-level-id" does not need to be symmetric (the level
4397	     included in the answer can be lower than or equal to the level
4398	     included in the offer).

4400	   10)In subsection 8.2.2, removed that the capability parameters may be
4401	     used to declare encoding capabilities.

4403	   11)In subsection 8.2.2, added rules on how to use sprop-parameter-
4404	     sets and sprop-level-parameter-sets for out-of-band transport of
4405	     parameter sets, with or without level downgrading.

4407	   12)In subsection 8.2.2, clarified the rules of using the media type
4408	     parameters with SDP Offer/Answer for multicast.

4410	   13)In subsection 8.2.2, completed and corrected the list of how
4411	     different media type parameters shall be interpreted in the
4412	     different combinations of offer or answer and direction attribute.

4414	   14)In subsection 8.4, changed the text such that both out-of-band and
4415	     in-band transport of parameter sets are allowed and neither is
4416	     recommended or required.

4418	   15)Added subsection 8.5 (informative) providing example methods for
4419	     decoder refresh to handle parameter set losses.