idnits 2.17.1 

draft-ietf-avt-rtp-rfc3984bis-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

  -- It seems you're using the 'non-IETF stream' Licence Notice instead


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 24 instances of too long lines in the document, the longest
     one being 1 character in excess of 72.

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.

  -- The abstract seems to indicate that this document obsoletes RFC3984, but
     the header doesn't have an 'Obsoletes:' line to match this.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not
     defined in RFC 2119.  If it is intended as a requirements expression, it
     should be rewritten using one of the combinations defined in RFC 2119;
     otherwise it should not be all-uppercase.

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The answerer MAY or MAY not include "sprop-parameter-sets", i.e.,
     the answerer MAY use either out-of-band or in-band transport of parameter
     sets for the stream it is sending, regardless of whether out-of-band
     parameter sets transport has been used in the offerer-to-answerer
     direction.  When the offer includes "in-band-parameter-sets" equal to 1,
     the answerer MUST not include "sprop-parameter-sets" and MUST transmit
     parameter sets in-band.  All parameter sets included in the
     "sprop-parameter-sets", when present, for the accepted payload type in an
     answer MUST be associated with the accepted level, as indicated by the
     profile-level-id in the answer for the accepted payload type.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 6, 2009) is 5529 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '4' is defined on line 4116, but no explicit reference
     was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)

  ** Obsolete normative reference: RFC 3548 (ref. '7') (Obsoleted by RFC 4648)

  -- Obsolete informational reference (is this intentional?): RFC 2429 (ref.
     '11') (Obsoleted by RFC 4629)

  -- Obsolete informational reference (is this intentional?): RFC 2326 (ref.
     '27') (Obsoleted by RFC 7826)

  -- Obsolete informational reference (is this intentional?): RFC 5117 (ref.
     '29') (Obsoleted by RFC 7667)


     Summary: 4 errors (**), 0 flaws (~~), 5 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Audio/Video Transport WG                                     Y.-K. Wang
2	Internet Draft                                      Huawei Technologies
3	Intended status: Standards track                                R. Even
4	Expires: September 2009                                   Self-employed
5	                                                          T. Kristensen
6	                                                               Tandberg
7	                                                          March 6, 2009

9	                    RTP Payload Format for H.264 Video
10	                   draft-ietf-avt-rtp-rfc3984bis-04.txt

12	Status of this Memo

14	   This Internet-Draft is submitted to IETF in full conformance with the
15	   provisions of BCP 78 and BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet-Drafts as reference
24	   material or to cite them other than as "work in progress."

26	   The list of current Internet-Drafts can be accessed at
27	   http://www.ietf.org/ietf/1id-abstracts.txt.

29	   The list of Internet-Draft Shadow Directories can be accessed at
30	   http://www.ietf.org/shadow.html.

32	   This Internet-Draft will expire on September 6, 2009.

34	Copyright Notice

36	   Copyright (c) 2009 IETF Trust and the persons identified as the
37	   document authors.  All rights reserved.

39	   This document is subject to BCP 78 and the IETF Trust's Legal
40	   Provisions Relating to IETF Documents
41	   (http://trustee.ietf.org/license-info) in effect on the date of
42	   publication of this document.  Please review these documents
43	   carefully, as they describe your rights and restrictions with respect
44	   to this document.

46	Abstract

48	   This memo describes an RTP Payload format for the ITU-T
49	   Recommendation H.264 video codec and the technically identical
50	   ISO/IEC International Standard 14496-10 video codec, excluding the
51	   Scalable Video Coding (SVC) extension and the Multiview Video Coding
52	   extension, for which the RTP payload formats are defined elsewhere.
53	   The RTP payload format allows for packetization of one or more
54	   Network Abstraction Layer Units (NALUs), produced by an H.264 video
55	   encoder, in each RTP payload.  The payload format has wide
56	   applicability, as it supports applications from simple low bit-rate
57	   conversational usage, to Internet video streaming with interleaved
58	   transmission, to high bit-rate video-on-demand.

60	   This memo obsoletes RFC 3984.  Changes from RFC 3984 are summarized
61	   in section 18.  Issues on backward compatibility to RFC 3984 are
62	   discussed in section 17.

64	Table of Contents

66	   1. Introduction...................................................4
67	      1.1. The H.264 Codec...........................................4
68	      1.2. Parameter Set Concept.....................................5
69	      1.3. Network Abstraction Layer Unit Types......................6
70	   2. Conventions....................................................7
71	   3. Scope..........................................................7
72	   4. Definitions and Abbreviations..................................7
73	      4.1. Definitions...............................................7
74	      4.2. Abbreviations.............................................9
75	   5. RTP Payload Format............................................10
76	      5.1. RTP Header Usage.........................................10
77	      5.2. Payload Structures.......................................12
78	      5.3. NAL Unit Header Usage....................................14
79	      5.4. Packetization Modes......................................16
80	      5.5. Decoding Order Number (DON)..............................17
81	      5.6. Single NAL Unit Packet...................................20
82	      5.7. Aggregation Packets......................................21
83	         5.7.1. Single-Time Aggregation Packet......................23
84	         5.7.2. Multi-Time Aggregation Packets (MTAPs)..............25
85	         5.7.3. Fragmentation Units (FUs)...........................29
86	   6. Packetization Rules...........................................33
87	      6.1. Common Packetization Rules...............................33
88	      6.2. Single NAL Unit Mode.....................................34
89	      6.3. Non-Interleaved Mode.....................................34
90	      6.4. Interleaved Mode.........................................34
91	   7. De-Packetization Process......................................35
92	      7.1. Single NAL Unit and Non-Interleaved Mode.................35
93	      7.2. Interleaved Mode.........................................35
94	         7.2.1. Size of the De-interleaving Buffer..................36
95	         7.2.2. De-interleaving Process.............................36
96	      7.3. Additional De-Packetization Guidelines...................38
97	   8. Payload Format Parameters.....................................39
98	      8.1. Media Type Registration..................................39
99	      8.2. SDP Parameters...........................................56
100	         8.2.1. Mapping of Payload Type Parameters to SDP...........56
101	         8.2.2. Usage with the SDP Offer/Answer Model...............57
102	         8.2.3. Usage in Declarative Session Descriptions...........64
103	      8.3. Examples.................................................65
104	      8.4. Parameter Set Considerations.............................72
105	      8.5. Decoder Refresh Point Procedure using In-Band Transport of
106	      Parameter Sets (Informative)..................................74
107	         8.5.1. IDR Procedure to Respond to a Request for a Decoder
108	         Refresh Point..............................................75
109	         8.5.2. Gradual Recovery Procedure to Respond to a Request for a
110	         Decoder Refresh Point......................................75
111	   9. Security Considerations.......................................76
112	   10. Congestion Control...........................................77
113	   11. IANA Consideration...........................................77
114	   12. Informative Appendix: Application Examples...................78
115	      12.1. Video Telephony according to ITU-T Recommendation H.241
116	      Annex A.......................................................78
117	      12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
118	      Aggregation...................................................78
119	      12.3. Video Telephony, Interleaved Packetization Using NAL Unit
120	      Aggregation...................................................79
121	      12.4. Video Telephony with Data Partitioning..................79
122	      12.5. Video Telephony or Streaming with FUs and Forward Error
123	      Correction....................................................80
124	      12.6. Low Bit-Rate Streaming..................................82
125	      12.7. Robust Packet Scheduling in Video Streaming.............83
126	   13. Informative Appendix: Rationale for Decoding Order Number....84
127	      13.1. Introduction............................................84
128	      13.2. Example of Multi-Picture Slice Interleaving.............84
129	      13.3. Example of Robust Packet Scheduling.....................86
130	      13.4. Robust Transmission Scheduling of Redundant Coded Slices89
131	      13.5. Remarks on Other Design Possibilities...................90
132	   14. Acknowledgements.............................................91
133	   15. References...................................................91
134	      15.1. Normative References....................................91
135	      15.2. Informative References..................................92
136	   16. Authors' Addresses...........................................94
137	   17. Backward Compatibility to RFC 3984...........................94
138	   18. Changes from RFC 3984........................................96

140	1. Introduction

142	   This memo specifies an RTP payload format for the video coding
143	   standard known as ITU-T Recommendation H.264 [1] and ISO/IEC
144	   International Standard 14496 Part 10 [2] (both also known as Advanced
145	   Video Coding, or AVC).  In this memo the name H.264 is used for the
146	   codec and the standard, but the memo is equally applicable to the
147	   ISO/IEC counterpart of the coding standard.

149	   This memo obsoletes RFC 3984.  Changes from RFC 3984 are summarized
150	   in section 18.  Issues on backward compatibility to RFC 3984 are
151	   discussed in section 17.

153	1.1. The H.264 Codec

155	   The H.264 video codec has a very broad application range that covers
156	   all forms of digital compressed video, from low bit-rate Internet
157	   streaming applications to HDTV broadcast and Digital Cinema
158	   applications with nearly lossless coding.  Compared to the current
159	   state of technology, the overall performance of H.264 is such that
160	   bit rate savings of 50% or more are reported.  Digital Satellite TV
161	   quality, for example, was reported to be achievable at 1.5 Mbit/s,
162	   compared to the current operation point of MPEG 2 video at around 3.5
163	   Mbit/s [10].

165	   The codec specification [1] itself distinguishes conceptually between
166	   a video coding layer (VCL) and a network abstraction layer (NAL).
167	   The VCL contains the signal processing functionality of the codec;
168	   mechanisms such as transform, quantization, and motion compensated
169	   prediction; and a loop filter.  It follows the general concept of
170	   most of today's video codecs, a macroblock-based coder that uses
171	   inter picture prediction with motion compensation and transform
172	   coding of the residual signal.  The VCL encoder outputs slices: a bit
173	   string that contains the macroblock data of an integer number of
174	   macroblocks, and the information of the slice header (containing the
175	   spatial address of the first macroblock in the slice, the initial
176	   quantization parameter, and similar information).  Macroblocks in
177	   slices are arranged in scan order unless a different macroblock
178	   allocation is specified, by using the so-called Flexible Macroblock
179	   Ordering syntax.  In-picture prediction is used only within a slice.
180	   More information is provided in [10].

182	   The Network Abstraction Layer (NAL) encoder encapsulates the slice
183	   output of the VCL encoder into Network Abstraction Layer Units (NAL
184	   units), which are suitable for transmission over packet networks or
185	   use in packet oriented multiplex environments.  Annex B of H.264
186	   defines an encapsulation process to transmit such NAL units over
187	   byte-stream oriented networks.  In the scope of this memo, Annex B is
188	   not relevant.

190	   Internally, the NAL uses NAL units.  A NAL unit consists of a one-
191	   byte header and the payload byte string.  The header indicates the
192	   type of the NAL unit, the (potential) presence of bit errors or
193	   syntax violations in the NAL unit payload, and information regarding
194	   the relative importance of the NAL unit for the decoding process.
195	   This RTP payload specification is designed to be unaware of the bit
196	   string in the NAL unit payload.

198	   One of the main properties of H.264 is the complete decoupling of the
199	   transmission time, the decoding time, and the sampling or
200	   presentation time of slices and pictures.  The decoding process
201	   specified in H.264 is unaware of time, and the H.264 syntax does not
202	   carry information such as the number of skipped frames (as is common
203	   in the form of the Temporal Reference in earlier video compression
204	   standards).  Also, there are NAL units that affect many pictures and
205	   that are, therefore, inherently timeless.  For this reason, the
206	   handling of the RTP timestamp requires some special considerations
207	   for NAL units for which the sampling or presentation time is not
208	   defined or, at transmission time, unknown.

210	1.2. Parameter Set Concept

212	   One very fundamental design concept of H.264 is to generate self-
213	   contained packets, to make mechanisms such as the header duplication
214	   of RFC 2429 [11] or MPEG-4's Header Extension Code (HEC) [12]
215	   unnecessary.  This was achieved by decoupling information relevant to
216	   more than one slice from the media stream.  This higher layer meta
217	   information should be sent reliably, asynchronously, and in advance
218	   of the RTP packet stream that contains the slice packets.
219	   (Provisions for sending this information in-band are also available
220	   for applications that do not have an out-of-band transport channel
221	   appropriate for the purpose.)  The combination of the higher-level
222	   parameters is called a parameter set.  The H.264 specification
223	   includes two types of parameter sets: sequence parameter set and
224	   picture parameter set.  An active sequence parameter set remains
225	   unchanged throughout a coded video sequence, and an active picture
226	   parameter set remains unchanged within a coded picture.  The sequence
227	   and picture parameter set structures contain information such as
228	   picture size, optional coding modes employed, and macroblock to slice
229	   group map.

231	   To be able to change picture parameters (such as the picture size)
232	   without having to transmit parameter set updates synchronously to the
233	   slice packet stream, the encoder and decoder can maintain a list of
234	   more than one sequence and picture parameter set.  Each slice header
235	   contains a codeword that indicates the sequence and picture parameter
236	   set to be used.
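
   Informative sketch (not part of the payload format): the lookup
   described above can be modeled as two tables kept by the decoder.
   The class names and the width/height fields are hypothetical and
   chosen for illustration only.  The sketch assumes that the codeword
   in the slice header is a picture parameter set identifier and that
   each picture parameter set names its sequence parameter set.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class SequenceParameterSet:
    sps_id: int
    width: int   # picture width in pixels (illustrative field)
    height: int  # picture height in pixels (illustrative field)


@dataclass(frozen=True)
class PictureParameterSet:
    pps_id: int
    sps_id: int  # sequence parameter set this picture parameter set refers to


@dataclass
class ParameterSetStore:
    """Hypothetical decoder-side tables of the parameter sets received so far."""
    sps: Dict[int, SequenceParameterSet] = field(default_factory=dict)
    pps: Dict[int, PictureParameterSet] = field(default_factory=dict)

    def add_sps(self, s: SequenceParameterSet) -> None:
        self.sps[s.sps_id] = s

    def add_pps(self, p: PictureParameterSet) -> None:
        self.pps[p.pps_id] = p

    def resolve(self, pps_id: int) -> Tuple[SequenceParameterSet, PictureParameterSet]:
        """Return the (sequence, picture) parameter sets selected by a slice."""
        p = self.pps[pps_id]
        return self.sps[p.sps_id], p
```

   Because both picture sizes can be stored ahead of time, a slice can
   switch between them simply by carrying a different identifier; no
   parameter set update has to arrive in step with the slice packets.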

238	   This mechanism allows the decoupling of the transmission of parameter
239	   sets from the packet stream, and the transmission of them by external
240	   means (e.g., as a side effect of the capability exchange), or through
241	   a (reliable or unreliable) control protocol.  It may even be possible
242	   that they are never transmitted but are fixed by an application
243	   design specification.

245	1.3. Network Abstraction Layer Unit Types

247	   Tutorial information on the NAL design can be found in [13], [14],
248	   and [15].

250	   Every NAL unit begins with a single NAL unit type octet, which also
251	   serves as the payload header of this RTP payload format.  The payload
252	   of a NAL unit follows immediately.

254	   The syntax and semantics of the NAL unit type octet are specified in
255	   [1], but the essential properties of the NAL unit type octet are
256	   summarized below.  The NAL unit type octet has the following format:

258	      +---------------+
259	      |0|1|2|3|4|5|6|7|
260	      +-+-+-+-+-+-+-+-+
261	      |F|NRI|  Type   |
262	      +---------------+

264	   The semantics of the components of the NAL unit type octet, as
265	   specified in the H.264 specification, are described briefly below.

267	   F: 1 bit
268	      forbidden_zero_bit.  The H.264 specification declares a value of
269	      1 as a syntax violation.

271	   NRI: 2 bits
272	      nal_ref_idc.  A value of 00 indicates that the content of the NAL
273	      unit is not used to reconstruct reference pictures for inter
274	      picture prediction.  Such NAL units can be discarded without
275	      risking the integrity of the reference pictures.  Values greater
276	      than 00 indicate that the decoding of the NAL unit is required to
277	      maintain the integrity of the reference pictures.

279	   Type: 5 bits
280	      nal_unit_type.  This component specifies the NAL unit payload
281	      type as defined in Table 7-1 of [1], and later within this memo.
282	      For a reference of all currently defined NAL unit types and their
283	      semantics, please refer to section 7.4.1 in [1].
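
   Informative sketch: the three fields described above can be
   extracted from the NAL unit type octet with shifts and masks, taking
   bit 0 in the figure as the most significant bit.  The names
   NalHeader, parse_nal_header, and is_discardable are hypothetical and
   not defined by this memo or by [1].

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # F, 1 bit
    nal_ref_idc: int         # NRI, 2 bits
    nal_unit_type: int       # Type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split the NAL unit type octet into its F, NRI, and Type fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("NAL unit type octet must fit in one byte")
    return NalHeader(
        forbidden_zero_bit=(octet >> 7) & 0x01,
        nal_ref_idc=(octet >> 5) & 0x03,
        nal_unit_type=octet & 0x1F,
    )


def is_discardable(header: NalHeader) -> bool:
    """NRI equal to 0: not used to reconstruct reference pictures."""
    return header.nal_ref_idc == 0
```

   For example, the octet 0x65 yields F = 0, NRI = 3, and Type = 5.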

285	   This memo introduces new NAL unit types, which are presented in
286	   section 5.2.  The NAL unit types defined in this memo are marked as
287	   unspecified in [1].  Moreover, this specification extends the
288	   semantics of F and NRI as described in section 5.3.

290	2. Conventions

292	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
293	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
294	   document are to be interpreted as described in RFC 2119 [4].

296	   This specification uses the notion of setting and clearing a bit when
297	   bit fields are handled.  Setting a bit is the same as assigning that
298	   bit the value of 1 (On).  Clearing a bit is the same as assigning
299	   that bit the value of 0 (Off).

301	3. Scope

303	   This payload specification can only be used to carry the "naked"
304	   H.264 NAL unit stream over RTP, and not the bitstream format
305	   discussed in Annex B of H.264.  Likely, the first applications of
306	   this specification will be in the conversational multimedia field,
307	   video telephony or video conferencing, but the payload format also
308	   covers other applications, such as Internet streaming and TV over IP.

310	4. Definitions and Abbreviations

312	4.1. Definitions

314	   This document uses the definitions of [1].  The following terms,
315	   defined in [1], are summarized here for convenience:

317	      access unit: A set of NAL units always containing a primary coded
318	      picture.  In addition to the primary coded picture, an access
319	      unit may also contain one or more redundant coded pictures or
320	      other NAL units not containing slices or slice data partitions of
321	      a coded picture.  The decoding of an access unit always results
322	      in a decoded picture.

324	      coded video sequence: A sequence of access units that consists,
325	      in decoding order, of an instantaneous decoding refresh (IDR)
326	      access unit followed by zero or more non-IDR access units
327	      including all subsequent access units up to but not including any
328	      subsequent IDR access unit.

330	      IDR access unit: An access unit in which the primary coded
331	      picture is an IDR picture.

333	      IDR picture: A coded picture containing only slices with I or SI
334	      slice types that causes a "reset" in the decoding process.  After
335	      the decoding of an IDR picture, all following coded pictures in
336	      decoding order can be decoded without inter prediction from any
337	      picture decoded prior to the IDR picture.

339	      primary coded picture: The coded representation of a picture to
340	      be used by the decoding process for a bitstream conforming to
341	      H.264.  The primary coded picture contains all macroblocks of the
342	      picture.

344	      redundant coded picture: A coded representation of a picture or a
345	      part of a picture.  The content of a redundant coded picture
346	      shall not be used by the decoding process for a bitstream
347	      conforming to H.264.  The content of a redundant coded picture
348	      may be used by the decoding process for a bitstream that contains
349	      errors or losses.

351	      VCL NAL unit: A collective term used to refer to coded slice and
352	      coded data partition NAL units.

354	   In addition, the following definitions apply:

356	      decoding order number (DON): A field in the payload structure or
357	      a derived variable indicating NAL unit decoding order.  Values of
358	      DON are in the range of 0 to 65535, inclusive.  After reaching
359	      the maximum value, the value of DON wraps around to 0.

361	      NAL unit decoding order: A NAL unit order that conforms to the
362	      constraints on NAL unit order given in section 7.4.1.2 in [1].

364	      NALU-time: The value that the RTP timestamp would have if the NAL
365	      unit were transported in its own RTP packet.

367	      transmission order: The order of packets in ascending RTP
368	      sequence number order (in modulo arithmetic).  Within an
369	      aggregation packet, the NAL unit transmission order is the same
370	      as the order of appearance of NAL units in the packet.

372	      media aware network element (MANE): A network element, such as a
373	      middlebox or application layer gateway that is capable of parsing
374	      certain aspects of the RTP payload headers or the RTP payload and
375	      reacting to the contents.

377	         Informative note: The concept of a MANE goes beyond normal
378	         routers or gateways in that a MANE has to be aware of the
379	         signaling (e.g., to learn about the payload type mappings of
380	         the media streams), and in that it has to be trusted when
381	         working with SRTP.  The advantage of using MANEs is that they
382	         allow packets to be dropped according to the needs of the
383	         media coding.  For example, if a MANE has to drop packets due
384	         to congestion on a certain link, it can identify and remove
385	         those packets whose elimination produces the least adverse
386	         effect on the user experience.

388	      static macroblock: A certain number of macroblocks in the video
389	      stream can be defined as static, as defined in section 8.3.2.8 in
390	      [3].  Static macroblocks free up additional processing cycles for
391	      the handling of non-static macroblocks.  Based on a given amount
392	      of video processing resources and a given resolution, a higher
393	      number of static macroblocks enables a correspondingly higher
394	      frame rate.

396	      default sub-profile: The subset of coding tools, which may be all
397	      coding tools of one profile or the common subset of coding tools
398	      of more than one profile, indicated by the profile-level-id
399	      parameter.

401	      default level: The level indicated by the profile-level-id
402	      parameter, which consists of three octets, profile_idc, profile-
403	      iop, and level_idc.  The default level is indicated by level_idc
404	      in most cases, and, in some cases, additionally by profile-iop.
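
   Informative sketch: the definitions of decoding order number and
   transmission order above both rely on 16-bit values that wrap
   around.  The helpers below illustrate that arithmetic.  The function
   names are hypothetical, and the half-range test in follows() is a
   common convention for wrapping counters that is assumed here; it is
   not taken from the definitions above.

```python
MODULUS = 1 << 16  # DON and RTP sequence numbers are 16-bit values


def next_value(value: int) -> int:
    """Successor of a 16-bit counter; 65535 wraps around to 0."""
    return (value + 1) % MODULUS


def forward_distance(a: int, b: int) -> int:
    """Number of upward steps, with wrap-around, needed to reach b from a."""
    return (b - a) % MODULUS


def follows(a: int, b: int) -> bool:
    """True if b comes after a, assuming they are less than half the
    number space apart (an assumption of this sketch)."""
    return 0 < forward_distance(a, b) < MODULUS // 2
```

   With these helpers, a value of 0 is treated as following 65535, while
   65535 is not treated as following 0.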

406	4.2. Abbreviations

408	      DON:        Decoding Order Number
409	      DONB:       Decoding Order Number Base
410	      DOND:       Decoding Order Number Difference
411	      FEC:        Forward Error Correction
412	      FU:         Fragmentation Unit
413	      IDR:        Instantaneous Decoding Refresh
414	      IEC:        International Electrotechnical Commission
415	      ISO:        International Organization for Standardization
416	      ITU-T:      International Telecommunication Union,
417	                  Telecommunication Standardization Sector
418	      MANE:       Media Aware Network Element
419	      MTAP:       Multi-Time Aggregation Packet
420	      MTAP16:     MTAP with 16-bit timestamp offset
421	      MTAP24:     MTAP with 24-bit timestamp offset
422	      NAL:        Network Abstraction Layer
423	      NALU:       NAL Unit
424	      SAR:        Sample Aspect Ratio
425	      SEI:        Supplemental Enhancement Information
426	      STAP:       Single-Time Aggregation Packet
427	      STAP-A:     STAP type A
428	      STAP-B:     STAP type B
429	      TS:         Timestamp
430	      VCL:        Video Coding Layer
431	      VUI:        Video Usability Information

433	5. RTP Payload Format

435	5.1. RTP Header Usage

437	   The format of the RTP header is specified in RFC 3550 [5] and
438	   reprinted in Figure 1 for convenience.  This payload format uses the
439	   fields of the header in a manner consistent with that specification.

441	   When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
442	   payload format is specified in section 5.6.  The RTP payload (and the
443	   settings for some RTP header bits) for aggregation packets and
444	   fragmentation units are specified in sections 5.7 and 5.8,
445	   respectively.

447	    0                   1                   2                   3
448	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
449	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
450	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
451	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
452	   |                           timestamp                           |
453	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
454	   |           synchronization source (SSRC) identifier            |
455	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
456	   |            contributing source (CSRC) identifiers             |
457	   |                             ....                              |
458	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

460	                 Figure 1 RTP header according to RFC 3550
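
   Informative sketch: the header layout in Figure 1 can be read as
   shown below.  This is a minimal illustration that assumes no header
   extension processing is needed; the names RtpHeader and
   parse_rtp_header are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-octet fixed RTP header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("the fixed RTP header is 12 octets long")
    # Network byte order: two octets, 16-bit SN, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet is too short for its CSRC list")
    csrc = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=(b0 >> 5) & 0x01,
        extension=(b0 >> 4) & 0x01,
        csrc_count=cc,
        marker=b1 >> 7,
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

   The RTP payload, and with it the NAL unit type octet, starts at
   offset 12 + 4 * CC when the X bit is not set.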

462	   The RTP header information to be set according to this RTP payload
463	   format is set as follows:

465	   Marker bit (M): 1 bit
466	      Set for the very last packet of the access unit indicated by the
467	      RTP timestamp, in line with the normal use of the M bit in video
468	      formats, to allow efficient playout buffer handling.  For
469	      aggregation packets (STAP and MTAP), the marker bit in the RTP
470	      header MUST be set to the value that the marker bit of the last
471	      NAL unit of the aggregation packet would have been if it were
472	      transported in its own RTP packet.  Decoders MAY use this bit as
473	      an early indication of the last packet of an access unit, but
474	      MUST NOT rely on this property.

476	         Informative note: Only one M bit is associated with an
477	         aggregation packet carrying multiple NAL units.  Thus, if a
478	         gateway has re-packetized an aggregation packet into several
479	         packets, it cannot reliably set the M bit of those packets.

481	   Payload type (PT): 7 bits
482	      The assignment of an RTP payload type for this new packet format
483	      is outside the scope of this document and will not be specified
484	      here.  The assignment of a payload type has to be performed
485	      either through the profile used or in a dynamic way.

487	   Sequence number (SN): 16 bits
488	      Set and used in accordance with RFC 3550.  For the single NALU
489	      and non-interleaved packetization mode, the sequence number is
490	      used to determine decoding order for the NALU.

492	   Timestamp: 32 bits
493	      The RTP timestamp is set to the sampling timestamp of the content.
494	      A 90 kHz clock rate MUST be used.

496	      If the NAL unit has no timing properties of its own (e.g.,
497	      parameter set and SEI NAL units), the RTP timestamp is set to the
498	      RTP timestamp of the primary coded picture of the access unit in
499	      which the NAL unit is included, according to section 7.4.1.2 of
500	      [1].

502	      The setting of the RTP Timestamp for MTAPs is defined in section
503	      5.7.2.

505	      Receivers SHOULD ignore any picture timing SEI messages included
506	      in access units that have only one display timestamp.  Instead,
507	      receivers SHOULD use the RTP timestamp for synchronizing the
508	      display process.

510	      RTP senders SHOULD NOT transmit picture timing SEI messages for
511	      pictures that are not supposed to be displayed as multiple fields.

513	      If one access unit has more than one display timestamp carried in
514	      a picture timing SEI message, then the information in the SEI
515	      message SHOULD be treated as relative to the RTP timestamp, with
516	      the earliest event occurring at the time given by the RTP
517	      timestamp, and subsequent events later, as given by the
518	      difference in SEI message picture timing values.  Let tSEI1,
519	      tSEI2, ..., tSEIn be the display timestamps carried in the SEI
520	      message of an access unit, where tSEI1 is the earliest of all
521	      such timestamps.  Let tmadjst() be a function that adjusts the
522	      SEI message's time scale to a 90-kHz time scale.  Let TS be the
523	      RTP timestamp.  Then, the display time for the event associated
524	      with tSEI1 is TS.  The display time for the event with tSEIx,
525	      where x is in [2..n], is TS + tmadjst(tSEIx - tSEI1).
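
	      As a non-normative illustration, the computation above can
	      be sketched in Python.  The function name, the
	      sei_time_scale parameter (SEI clock ticks per second), the
	      use of floor division for rescaling, and the 32-bit
	      wrap-around of the result are assumptions of this sketch
	      and are not defined by this memo.

```python
def sei_display_times(ts, t_sei, sei_time_scale):
    """Return 90 kHz display times for the events of one access unit.

    ts             -- RTP timestamp of the access unit (90 kHz ticks)
    t_sei          -- display timestamps from the picture timing SEI
                      message, in ticks of the SEI time scale
    sei_time_scale -- SEI ticks per second (hypothetical parameter)
    """
    def tmadjst(delta):
        # Rescale a tick difference to the 90 kHz RTP clock.
        # Floor division is an assumption; rounding is not specified.
        return delta * 90000 // sei_time_scale

    earliest = min(t_sei)  # tSEI1, the earliest display timestamp
    # The earliest event is displayed at TS, later events at
    # TS + tmadjst(tSEIx - tSEI1).  Results wrap at 32 bits, like
    # the RTP timestamp field (an assumption of this sketch).
    return [(ts + tmadjst(t - earliest)) % 2**32 for t in t_sei]
```

	      For instance, with a 30 kHz SEI time scale, two field
	      timestamps that are 1001 ticks apart map to display times
	      that are 3003 ticks apart on the 90 kHz clock.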

527	         Informative note: Displaying coded frames as fields is needed
528	         commonly in an operation known as 3:2 pulldown, in which film
529	         content that consists of coded frames is displayed on a
530	         display using interlaced scanning.  The picture timing SEI
531	         message enables carriage of multiple timestamps for the same
532	         coded picture, and therefore the 3:2 pulldown process is
533	         perfectly controlled.  The picture timing SEI message
534	         mechanism is necessary because only one timestamp per coded
535	         frame can be conveyed in the RTP timestamp.

537	         Informative note: Because H.264 allows the decoding order to
538	         be different from the display order, values of RTP timestamps
539	         may not be monotonically non-decreasing as a function of RTP
540	         sequence numbers.  Furthermore, the value for inter-arrival
541	         jitter reported in the RTCP reports may not be a trustworthy
542	         indication of the network performance, as the calculation
543	         rules for inter-arrival jitter (section 6.4.1 of RFC 3550)
544	         assume that the RTP timestamp of a packet is directly
545	         proportional to its transmission time.

547	5.2. Payload Structures

549	   The payload format defines three different basic payload structures.
550	   A receiver can identify the payload structure by the first byte of
551	   the RTP packet payload, which co-serves as the RTP payload header and,
552	   in some cases, as the first byte of the payload.  This byte is always
553	   structured as a NAL unit header.  The NAL unit type field indicates
554	   which structure is present.  The possible structures are as follows:

556	   Single NAL Unit Packet: Contains only a single NAL unit in the
557	   payload.  The NAL header type field will be equal to the original NAL
558	   unit type; i.e., in the range of 1 to 23, inclusive.  Specified in
559	   section 5.6.

561	   Aggregation Packet: Packet type used to aggregate multiple NAL units
562	   into a single RTP payload.  This packet exists in four versions, the
563	   Single-Time Aggregation Packet type A (STAP-A), the Single-Time
564	   Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet
565	   (MTAP) with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet
566	   (MTAP) with 24-bit offset (MTAP24).  The NAL unit type numbers
567	   assigned for STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and
568	   27, respectively.  Specified in section 5.7.

570	   Fragmentation Unit: Used to fragment a single NAL unit over multiple
571	   RTP packets.  Exists in two versions, FU-A and FU-B, identified
572	   with the NAL unit type numbers 28 and 29, respectively.  Specified in
573	   section 5.8.

575	      Informative note: This specification does not limit the size of
576	      NAL units encapsulated in single NAL unit packets and
577	      fragmentation units.  The maximum size of a NAL unit encapsulated
578	      in any aggregation packet is 65535 bytes.

580	   Table 1 summarizes NAL unit types and the corresponding RTP packet
581	   types when each of these NAL units is directly used as a packet
582	   payload, and where the types are described in this memo.

584	     Table 1.  Summary of NAL unit types and the corresponding packet
585	                                   types

587	      NAL Unit  Packet    Packet Type Name               Section
588	      Type      Type
589	      ---------------------------------------------------------
590	      0        reserved                                     -
591	      1-23     NAL unit  Single NAL unit packet             5.6
592	      24       STAP-A    Single-time aggregation packet     5.7.1
593	      25       STAP-B    Single-time aggregation packet     5.7.1
594	      26       MTAP16    Multi-time aggregation packet      5.7.2
595	      27       MTAP24    Multi-time aggregation packet      5.7.2
596	      28       FU-A      Fragmentation unit                 5.8
597	      29       FU-B      Fragmentation unit                 5.8
598	      30-31    reserved                                     -
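
	   As a non-normative illustration, a receiver could map the first
	   byte of the RTP payload to the packet types of Table 1 roughly
	   as follows.  The function name and the returned strings are
	   choices made for this sketch only.

```python
# Packet type names for the aggregation and fragmentation types of
# Table 1.
_PACKET_TYPES = {
    24: "STAP-A",
    25: "STAP-B",
    26: "MTAP16",
    27: "MTAP24",
    28: "FU-A",
    29: "FU-B",
}


def packet_type(first_payload_byte):
    """Return the Table 1 packet type for the first payload byte."""
    nal_unit_type = first_payload_byte & 0x1F  # low five bits
    if 1 <= nal_unit_type <= 23:
        return "NAL unit"  # single NAL unit packet
    # Types 0, 30, and 31 are reserved.
    return _PACKET_TYPES.get(nal_unit_type, "reserved")
```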

600	5.3. NAL Unit Header Usage

602	   The structure and semantics of the NAL unit header were introduced in
603	   section 1.3.  For convenience, the format of the NAL unit header is
604	   reprinted below:

606	      +---------------+
607	      |0|1|2|3|4|5|6|7|
608	      +-+-+-+-+-+-+-+-+
609	      |F|NRI|  Type   |
610	      +---------------+
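
	   As a non-normative illustration, the three fields of the octet
	   shown above can be extracted as follows.  Bit 0 in the figure is
	   the most significant bit of the octet.  The names NalHeader and
	   parse_nal_header are hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit)
    nri: int       # nal_ref_idc (2 bits)
    nal_type: int  # nal_unit_type (5 bits)


def parse_nal_header(octet):
    """Split a NAL unit header octet into F, NRI, and Type."""
    return NalHeader(
        f=(octet >> 7) & 0x01,
        nri=(octet >> 5) & 0x03,
        nal_type=octet & 0x1F,
    )
```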

612	   This section specifies the semantics of F and NRI according to this
613	   specification.

615	   F: 1 bit
616	      forbidden_zero_bit.  A value of 0 indicates that the NAL unit
617	      type octet and payload should not contain bit errors or other
618	      syntax violations.  A value of 1 indicates that the NAL unit type
619	      octet and payload may contain bit errors or other syntax
620	      violations.

622	      MANEs SHOULD set the F bit to indicate detected bit errors in the
623	      NAL unit.  The H.264 specification requires that the F bit is
624	      equal to 0.  When the F bit is set, the decoder is advised that
625	      bit errors or any other syntax violations may be present in the
626	      payload or in the NAL unit type octet.  The simplest decoder
627	      reaction to a NAL unit in which the F bit is equal to 1 is to
628	      discard such a NAL unit and to conceal the lost data in the
629	      discarded NAL unit.

631	   NRI: 2 bits
632	      nal_ref_idc.  The semantics of value 00 and a non-zero value
633	      remain unchanged from the H.264 specification.  In other words, a
634	      value of 00 indicates that the content of the NAL unit is not
635	      used to reconstruct reference pictures for inter picture
636	      prediction. Such NAL units can be discarded without risking the
637	      integrity of the reference pictures.  Values greater than 00
638	      indicate that the decoding of the NAL unit is required to
639	      maintain the integrity of the reference pictures.

641	      In addition to the specification above, according to this RTP
642	      payload specification, values of NRI indicate the relative
643	      transport priority, as determined by the encoder.  MANEs can use
644	      this information to protect more important NAL units better than
645	      they do less important NAL units.  The highest transport priority
646	      is 11, followed by 10, and then by 01; finally, 00 is the lowest.

648	         Informative note: Any non-zero value of NRI is handled
649	         identically in H.264 decoders.  Therefore, receivers need not
650	         manipulate the value of NRI when passing NAL units to the
651	         decoder.

653	      An H.264 encoder MUST set the value of NRI according to the H.264
654	      specification (subclause 7.4.1) when the value of nal_unit_type
655	      is in the range of 1 to 12, inclusive.  In particular, the H.264
656	      specification requires that the value of NRI SHALL be equal to 0
657	      for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or
658	      12.

660	      For NAL units having nal_unit_type equal to 7 or 8 (indicating a
661	      sequence parameter set or a picture parameter set, respectively),
662	      an H.264 encoder SHOULD set the value of NRI to 11 (in binary
663	      format).  For coded slice NAL units of a primary coded picture
664	      having nal_unit_type equal to 5 (indicating a coded slice
665	      belonging to an IDR picture), an H.264 encoder SHOULD set the
666	      value of NRI to 11 (in binary format).

668	      For a mapping of the remaining nal_unit_types to NRI values, the
669	      following example MAY be used and has been shown to be efficient
670	      in a certain environment [14].  Other mappings MAY also be
671	      desirable, depending on the application and the H.264/AVC Annex A
672	      profile in use.

674	         Informative note: Data Partitioning is not available in
675	         certain profiles; e.g., in the Main or Baseline profiles.
676	         Consequently, the NAL unit types 2, 3, and 4 can occur only if
677	         the video bitstream conforms to a profile in which data
678	         partitioning is allowed and not in streams that conform to the
679	         Main or Baseline profiles.

681	   Table 2.  Example of NRI values for coded slices and coded slice data
682	              partitions of primary coded reference pictures

684	      NAL Unit Type     Content of NAL unit              NRI (binary)
685	      ----------------------------------------------------------------
686	       1              non-IDR coded slice                         10
687	       2              Coded slice data partition A                10
688	       3              Coded slice data partition B                01
689	       4              Coded slice data partition C                01

691	         Informative note: As mentioned before, the NRI value of non-
692	         reference pictures is 00 as mandated by H.264/AVC.

694	      An H.264 encoder SHOULD set the value of NRI for coded slice and
695	      coded slice data partition NAL units of redundant coded reference
696	      pictures equal to 01 (in binary format).
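
	      As a non-normative illustration, the recommendations above
	      and the example mapping of Table 2 can be combined as
	      follows.  The function name and the is_reference and
	      is_redundant parameters are hypothetical; a real encoder
	      may well prefer another mapping.

```python
def example_nri(nal_unit_type, is_reference=True, is_redundant=False):
    """Return an NRI value (0..3), or None if none is recommended."""
    if nal_unit_type in (6, 9, 10, 11, 12):
        return 0b00  # required by the H.264 specification
    if nal_unit_type in (7, 8):
        return 0b11  # sequence and picture parameter sets
    if nal_unit_type == 5:
        # IDR slices: 11 for primary coded pictures, 01 for
        # redundant coded reference pictures.
        return 0b01 if is_redundant else 0b11
    if nal_unit_type in (1, 2, 3, 4):
        if not is_reference:
            return 0b00  # non-reference pictures
        if is_redundant:
            return 0b01  # redundant coded reference pictures
        # Example values of Table 2 for primary coded pictures.
        return {1: 0b10, 2: 0b10, 3: 0b01, 4: 0b01}[nal_unit_type]
    return None  # no recommendation for other types
```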

698	      Definitions of the values for NRI for NAL unit types 24 to 29,
699	      inclusive, are given in sections 5.7 and 5.8 of this memo.

701	      No recommendation for the value of NRI is given for NAL units
702	      having nal_unit_type in the range of 13 to 23, inclusive, because
703	      these values are reserved for ITU-T and ISO/IEC.  No
704	      recommendation for the value of NRI is given for NAL units having
705	      nal_unit_type equal to 0 or in the range of 30 to 31, inclusive,
706	      as the semantics of these values are not specified in this memo.

708	5.4. Packetization Modes

710	   This memo specifies three packetization modes:

712	   o  Single NAL unit mode

714	   o  Non-interleaved mode

716	   o  Interleaved mode

718	   The single NAL unit mode is targeted for conversational systems that
719	   comply with ITU-T Recommendation H.241 [3]  (see section 12.1).  The
720	   non-interleaved mode is targeted for conversational systems that may
721	   not comply with ITU-T Recommendation H.241.  In the non-interleaved
722	   mode, NAL units are transmitted in NAL unit decoding order.  The
723	   interleaved mode is targeted for systems that do not require very low
724	   end-to-end latency.  The interleaved mode allows transmission of NAL
725	   units out of NAL unit decoding order.

727	   The packetization mode in use MAY be signaled by the value of the
728	   OPTIONAL packetization-mode media type parameter.  The used
729	   packetization mode governs which NAL unit types are allowed in RTP
730	   payloads.  Table 3 summarizes the allowed packet payload types for
731	   each packetization mode.  Packetization modes are explained in more
732	   detail in section 6.

734	    Table 3.  Summary of allowed NAL unit types for each packetization
735	            mode (yes = allowed, no = disallowed, ig = ignore)

737	      Payload Packet    Single NAL    Non-Interleaved    Interleaved
738	      Type    Type      Unit Mode           Mode             Mode
739	      -------------------------------------------------------------
740	      0      reserved      ig               ig               ig
741	      1-23   NAL unit     yes              yes               no
742	      24     STAP-A        no              yes               no
743	      25     STAP-B        no               no              yes
744	      26     MTAP16        no               no              yes
745	      27     MTAP24        no               no              yes
746	      28     FU-A          no              yes              yes
747	      29     FU-B          no               no              yes
748	      30-31  reserved      ig               ig               ig
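
	   As a non-normative illustration, Table 3 can be transcribed
	   into a lookup such as the following.  The mode strings and the
	   function name are choices made for this sketch only.

```python
# Packet payload types allowed in each packetization mode,
# transcribed from Table 3.
_ALLOWED = {
    "single": set(range(1, 24)),
    "non-interleaved": set(range(1, 24)) | {24, 28},
    "interleaved": {25, 26, 27, 28, 29},
}


def packet_type_status(payload_type, mode):
    """Return "yes", "no", or "ig" as listed in Table 3."""
    if mode not in _ALLOWED:
        raise ValueError("unknown packetization mode: %r" % (mode,))
    if not 0 <= payload_type <= 31:
        raise ValueError("payload type must be in the range 0..31")
    if payload_type in (0, 30, 31):
        return "ig"  # reserved values are ignored by receivers
    return "yes" if payload_type in _ALLOWED[mode] else "no"
```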

750	   Some NAL unit or payload type values (indicated as reserved in
751	   Table 3) are reserved for future extensions.  NAL units of those
752	   types SHOULD NOT be sent by a sender (direct as packet payloads, or
753	   as aggregation units in aggregation packets, or as fragmented units
754	   in FU packets) and MUST be ignored by a receiver.  For example, the
755	   payload types 1-23, with the associated packet type "NAL unit", are
756	   allowed in "Single NAL Unit Mode" and in "Non-Interleaved Mode", but
757	   disallowed in "Interleaved Mode".  However, NAL units of NAL unit
758	   types 1-23 can be used in "Interleaved Mode" as aggregation units in
759	   STAP-B, MTAP16 and MTAP24 packets as well as fragmented units in FU-A
760	   and FU-B packets.  Similarly, NAL units of NAL unit types 1-23 can
761	   also be used in the "Non-Interleaved Mode" as aggregation units in
762	   STAP-A packets or fragmented units in FU-A packets, in addition to
763	   being directly used as packet payloads.

765	5.5. Decoding Order Number (DON)

767	   In the interleaved packetization mode, the transmission order of NAL
768	   units is allowed to differ from the decoding order of the NAL units.
769	   Decoding order number (DON) is a field in the payload structure or a
770	   derived variable that indicates the NAL unit decoding order.

772	   Rationale and examples of use cases for transmission out of decoding
773	   order and for the use of DON are given in section 13.

775	   The coupling of transmission and decoding order is controlled by the
776	   OPTIONAL sprop-interleaving-depth media type parameter as follows.
777	   When the value of the OPTIONAL sprop-interleaving-depth media type
778	   parameter is equal to 0 (explicitly or per default), the transmission
779	   order of NAL units MUST conform to the NAL unit decoding order.  When
780	   the value of the OPTIONAL sprop-interleaving-depth media type
781	   parameter is greater than 0,

783	   o  the order of NAL units in an MTAP16 and an MTAP24 is not required
784	      to be the NAL unit decoding order, and

786	   o  the order of NAL units generated by de-packetizing STAP-Bs, MTAPs,
787	      and FUs in two consecutive packets is not required to be the NAL
788	      unit decoding order.

790	   The RTP payload structures for a single NAL unit packet, an STAP-A,
791	   and an FU-A do not include DON.  STAP-B and FU-B structures include
792	   DON, and the structure of MTAPs enables derivation of DON as
793	   specified in section 5.7.2.

795	      Informative note: When an FU-A occurs in interleaved mode, it
796	      always follows an FU-B, which sets its DON.

798	      Informative note: If a transmitter wants to encapsulate a single
799	      NAL unit per packet and transmit packets out of their decoding
800	      order, the STAP-B packet type can be used.

802	   In the single NAL unit packetization mode, the transmission order of
803	   NAL units, determined by the RTP sequence number, MUST be the same as
804	   their NAL unit decoding order.  In the non-interleaved packetization
805	   mode, the transmission order of NAL units in single NAL unit packets,
806	   STAP-As, and FU-As MUST be the same as their NAL unit decoding order.
807	   The NAL units within an STAP MUST appear in the NAL unit decoding
808	   order.  Thus, the decoding order is first provided through the
809	   implicit order within a STAP, and second provided through the RTP
810	   sequence number for the order between STAPs, FUs, and single NAL unit
811	   packets.

813	   Signaling of the value of DON for NAL units carried in STAP-B, MTAP,
814	   and a series of fragmentation units starting with an FU-B is
815	   specified in sections 5.7.1, 5.7.2, and 5.8, respectively.  The DON
816	   value of the first NAL unit in transmission order MAY be set to any
817	   value.  Values of DON are in the range of 0 to 65535, inclusive.
818	   After reaching the maximum value, the value of DON wraps around to 0.

820	   The decoding order of two NAL units contained in any STAP-B, MTAP, or
821	   a series of fragmentation units starting with an FU-B is determined
822	   as follows.  Let DON(i) be the decoding order number of the NAL unit
823	   having index i in the transmission order.  Function don_diff(m,n) is
824	   specified as follows:

826	         If DON(m) == DON(n), don_diff(m,n) = 0

828	         If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
829	         don_diff(m,n) = DON(n) - DON(m)

831	         If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
832	         don_diff(m,n) = 65536 - DON(m) + DON(n)

834	         If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
835	         don_diff(m,n) = - (DON(m) + 65536 - DON(n))

837	         If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
838	         don_diff(m,n) = - (DON(m) - DON(n))

840	   A positive value of don_diff(m,n) indicates that the NAL unit having
841	   transmission order index n follows, in decoding order, the NAL unit
842	   having transmission order index m.  When don_diff(m,n) is equal to 0,
843	   then the NAL unit decoding order of the two NAL units can be in
844	   either order.  A negative value of don_diff(m,n) indicates that the
845	   NAL unit having transmission order index n precedes, in decoding
846	   order, the NAL unit having transmission order index m.
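
	   As a non-normative illustration, don_diff can be written in
	   Python as follows.  Unlike the definition above, this sketch
	   takes the two DON values directly instead of the transmission
	   order indices m and n.

```python
def don_diff(don_m, don_n):
    """Signed distance from DON(m) to DON(n) with 16-bit wrap-around."""
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)
    # Remaining case: don_m > don_n and don_m - don_n < 32768.
    return -(don_m - don_n)
```

	   For example, don_diff(65535, 0) is 1: a NAL unit with DON 0
	   follows, in decoding order, a NAL unit with DON 65535.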

848	   Values of DON related fields (DON, DONB, and DOND; see section 5.7)
849	   MUST be such that the decoding order determined by the values of DON,
850	   as specified above, conforms to the NAL unit decoding order.  If the
851	   order of two NAL units in NAL unit decoding order is switched and the
852	   new order does not conform to the NAL unit decoding order, the NAL
853	   units MUST NOT have the same value of DON.  If the order of two
854	   consecutive NAL units in the NAL unit stream is switched and the new
855	   order still conforms to the NAL unit decoding order, the NAL units
856	   MAY have the same value of DON.  For example, when arbitrary slice
857	   order is allowed by the video coding profile in use, all the coded
858	   slice NAL units of a coded picture are allowed to have the same value
859	   of DON.  Consequently, NAL units having the same value of DON can be
860	   decoded in any order, and two NAL units having a different value of
861	   DON should be passed to the decoder in the order specified above.
862	   When two consecutive NAL units in the NAL unit decoding order have a
863	   different value of DON, the value of DON for the second NAL unit in
864	   decoding order SHOULD be the value of DON for the first, incremented
865	   by one.

867	   An example of the de-packetization process to recover the NAL unit
868	   decoding order is given in section 7.

870	      Informative note: Receivers should not expect that the absolute
871	      difference of values of DON for two consecutive NAL units in the
872	      NAL unit decoding order will be equal to one, even in error-free
873	      transmission.  An increment by one is not required, as at the
874	      time of associating values of DON to NAL units, it may not be
875	      known whether all NAL units are delivered to the receiver.  For
876	      example, a gateway may not forward coded slice NAL units of non-
877	      reference pictures or SEI NAL units when there is a shortage of
878	      bit rate in the network to which the packets are forwarded.  In
879	      another example, a live broadcast is interrupted by pre-encoded
880	      content, such as commercials, from time to time.  The first intra
881	      picture of a pre-encoded clip is transmitted in advance to ensure
882	      that it is readily available in the receiver.  When transmitting
883	      the first intra picture, the originator does not exactly know how
884	      many NAL units will be encoded before the first intra picture of
885	      the pre-encoded clip follows in decoding order.  Thus, the values
886	      of DON for the NAL units of the first intra picture of the pre-
887	      encoded clip have to be estimated when they are transmitted, and
888	      gaps in values of DON may occur.

890	5.6. Single NAL Unit Packet

892	   The single NAL unit packet defined here MUST contain only one NAL
893	   unit, of the types defined in [1].  This means that neither an
894	   aggregation packet nor a fragmentation unit can be used within a
895	   single NAL unit packet.  A NAL unit stream composed by de-packetizing
896	   single NAL unit packets in RTP sequence number order MUST conform to
897	   the NAL unit decoding order.  The structure of the single NAL unit
898	   packet is shown in Figure 2.

900	      Informative note: The first byte of a NAL unit co-serves as the
901	      RTP payload header.

903	    0                   1                   2                   3
904	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
905	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
906	   |F|NRI|  Type   |                                               |
907	   +-+-+-+-+-+-+-+-+                                               |
908	   |                                                               |
909	   |               Bytes 2..n of a Single NAL unit                 |
910	   |                                                               |
911	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
912	   |                               :...OPTIONAL RTP padding        |
913	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

915	          Figure 2 RTP payload format for single NAL unit packet

917	5.7. Aggregation Packets

919	   Aggregation packets are the NAL unit aggregation scheme of this
920	   payload specification.  The scheme is introduced to reflect the
921	   dramatically different MTU sizes of two key target networks: wireline
922	   IP networks (with an MTU size that is often limited by the Ethernet
923	   MTU size; roughly 1500 bytes), and IP or non-IP (e.g., ITU-T H.324/M)
924	   based wireless communication systems with preferred transmission unit
925	   sizes of 254 bytes or less.  To prevent media transcoding between the
926	   two worlds, and to avoid undesirable packetization overhead, a NAL
927	   unit aggregation scheme is introduced.

929	   Two types of aggregation packets are defined by this specification:

931	   o  Single-time aggregation packet (STAP): aggregates NAL units with
932	      identical NALU-time.  Two types of STAPs are defined, one without
933	      DON (STAP-A) and another including DON (STAP-B).

935	   o  Multi-time aggregation packet (MTAP): aggregates NAL units with
936	      potentially differing NALU-time.  Two different MTAPs are defined,
937	      differing in the length of the NAL unit timestamp offset.

939	   Each NAL unit to be carried in an aggregation packet is encapsulated
940	   in an aggregation unit.  Please see below for the four different
941	   aggregation units and their characteristics.

943	   The structure of the RTP payload format for aggregation packets is
944	   presented in Figure 3.

946	    0                   1                   2                   3
947	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
948	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
949	   |F|NRI|  Type   |                                               |
950	   +-+-+-+-+-+-+-+-+                                               |
951	   |                                                               |
952	   |             one or more aggregation units                     |
953	   |                                                               |
954	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
955	   |                               :...OPTIONAL RTP padding        |
956	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

958	            Figure 3 RTP payload format for aggregation packets

960	   MTAPs and STAPs share the following packetization rules:  The RTP
961	   timestamp MUST be set to the earliest of the NALU-times of all the
962	   NAL units to be aggregated.  The type field of the NAL unit type
963	   octet MUST be set to the appropriate value, as indicated in Table 4.
964	   The F bit MUST be cleared if all F bits of the aggregated NAL units
965	   are zero; otherwise, it MUST be set.  The value of NRI MUST be the
966	   maximum of all the NAL units carried in the aggregation packet.
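
	   As a non-normative illustration, the F and NRI rules above can
	   be applied as follows when building the first octet of an
	   aggregation packet.  The function name and the representation
	   of NAL units as byte strings are assumptions of this sketch.

```python
def aggregation_header(nal_units, agg_type):
    """Build the first payload octet of an aggregation packet.

    nal_units -- non-empty list of bytes objects, each one a complete
                 NAL unit starting with its NAL unit header octet
    agg_type  -- 24 (STAP-A), 25 (STAP-B), 26 (MTAP16), or 27 (MTAP24)
    """
    if agg_type not in (24, 25, 26, 27):
        raise ValueError("not an aggregation packet type")
    if not nal_units:
        raise ValueError("at least one NAL unit is needed")
    headers = [nalu[0] for nalu in nal_units]
    # F is cleared only if every aggregated NAL unit has F equal to 0.
    f = 1 if any(h & 0x80 for h in headers) else 0
    # NRI is the maximum NRI of the aggregated NAL units.
    nri = max((h >> 5) & 0x03 for h in headers)
    return (f << 7) | (nri << 5) | agg_type
```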

968	                 Table 4.  Type field for STAPs and MTAPs

970	      Type   Packet    Timestamp offset   DON related fields
971	                       field length       (DON, DONB, DOND)
972	                       (in bits)          present
973	      --------------------------------------------------------
974	      24     STAP-A       0                 no
975	      25     STAP-B       0                 yes
976	      26     MTAP16      16                 yes
977	      27     MTAP24      24                 yes

979	   The marker bit in the RTP header is set to the value that the marker
980	   bit of the last NAL unit of the aggregation packet would have if it
981	   were transported in its own RTP packet.

983	   The payload of an aggregation packet consists of one or more
984	   aggregation units.  See sections 5.7.1 and 5.7.2 for the four
985	   different types of aggregation units.  An aggregation packet can
986	   carry as many aggregation units as necessary; however, the total
987	   amount of data in an aggregation packet obviously MUST fit into an IP
988	   packet, and the size SHOULD be chosen so that the resulting IP packet
989	   is smaller than the MTU size.  An aggregation packet MUST NOT contain
990	   fragmentation units specified in section 5.8.  Aggregation packets
991	   MUST NOT be nested; i.e., an aggregation packet MUST NOT contain
992	   another aggregation packet.

994	5.7.1. Single-Time Aggregation Packet

996	   Single-time aggregation packet (STAP) SHOULD be used whenever NAL
997	   units are aggregated that all share the same NALU-time.  The payload
998	   of an STAP-A does not include DON and consists of at least one
999	   single-time aggregation unit, as presented in Figure 4.  The payload
1000	   of an STAP-B consists of a 16-bit unsigned decoding order number (DON)
1001	   (in network byte order) followed by at least one single-time
1002	   aggregation unit, as presented in Figure 5.

1004	    0                   1                   2                   3
1005	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1006	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1007	                   :                                               |
1008	   +-+-+-+-+-+-+-+-+                                               |
1009	   |                                                               |
1010	   |                single-time aggregation units                  |
1011	   |                                                               |
1012	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1013	   |                               :
1014	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1016	                    Figure 4 Payload format for STAP-A

1018	    0                   1                   2                   3
1019	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1020	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1021	                   :  decoding order number (DON)  |               |
1022	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1023	   |                                                               |
1024	   |                single-time aggregation units                  |
1025	   |                                                               |
1026	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1027	   |                               :
1028	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1030	                    Figure 5 Payload format for STAP-B

1032	   The DON field specifies the value of DON for the first NAL unit in an
1033	   STAP-B in transmission order.  For each successive NAL unit in
1034	   appearance order in an STAP-B, the value of DON is equal to (the
1035	   value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
1036	   which '%' stands for the modulo operation.
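
	   As a non-normative illustration, the DON values of the NAL
	   units in an STAP-B can be derived from the DON field as
	   follows.  The function name is hypothetical.

```python
def stap_b_dons(first_don, count):
    """Return the DON of each NAL unit in an STAP-B, in appearance order.

    first_don -- value of the DON field of the STAP-B
    count     -- number of NAL units carried in the STAP-B
    """
    # Each successive NAL unit has the previous DON plus one, modulo
    # 65536.
    return [(first_don + i) % 65536 for i in range(count)]
```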

1038	   A single-time aggregation unit consists of 16-bit unsigned size
1039	   information (in network byte order) that indicates the size of the
1040	   following NAL unit in bytes (excluding these two octets, but
1041	   including the NAL unit type octet of the NAL unit), followed by the
1042	   NAL unit itself, including its NAL unit type byte.  A single-time
1043	   aggregation unit is byte aligned within the RTP payload, but it may
1044	   not be aligned on a 32-bit word boundary.  Figure 6 presents the
1045	   structure of the single-time aggregation unit.

1047	    0                   1                   2                   3
1048	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1049	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1050	                   :        NAL unit size          |               |
1051	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1052	   |                                                               |
1053	   |                           NAL unit                            |
1054	   |                                                               |
1055	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1056	   |                               :
1057	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1059	            Figure 6 Structure for single-time aggregation unit
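
	   As a non-normative illustration, a sequence of single-time
	   aggregation units can be split into NAL units as follows.  The
	   input is assumed to start right after the STAP-A NAL unit type
	   octet (or after the DON field of an STAP-B) and to have any RTP
	   padding already removed.  The function name and the choice to
	   treat a size of zero as malformed are assumptions of this
	   sketch.

```python
import struct


def split_single_time_aggregation_units(data):
    """Return the NAL units carried in single-time aggregation units."""
    nal_units = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated NAL unit size field")
        # 16-bit unsigned NAL unit size in network byte order.
        (size,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if size == 0 or offset + size > len(data):
            raise ValueError("invalid NAL unit size")
        # The NAL unit includes its own NAL unit type octet.
        nal_units.append(data[offset:offset + size])
        offset += size
    return nal_units
```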

1061	   Figure 7 presents an example of an RTP packet that contains an STAP-A.
1062	   The STAP contains two single-time aggregation units, labeled as 1 and
1063	   2 in the figure.

1065	    0                   1                   2                   3
1066	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1067	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1068	   |                          RTP Header                           |
1069	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1070	   |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
1071	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1072	   |                         NALU 1 Data                           |
1073	   :                                                               :
1074	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1075	   |               | NALU 2 Size                   | NALU 2 HDR    |
1076	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1077	   |                         NALU 2 Data                           |
1078	   :                                                               :
1079	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1080	   |                               :...OPTIONAL RTP padding        |
1081	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1083	    Figure 7 An example of an RTP packet including an STAP-A containing
1084	                     two single-time aggregation units
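
   Informative example: the following Python sketch shows one way a
   receiver might split an STAP-A payload into its NAL units, following
   Figures 6 and 7.  The function name parse_stap_a is hypothetical,
   the type value 24 for STAP-A is assumed from the payload type table
   elsewhere in this memo, and the payload is assumed to have had the
   RTP header and any RTP padding removed already.

```python
import struct
from typing import List

STAP_A_TYPE = 24  # assumption: STAP-A type value from the memo's type table


def parse_stap_a(payload: bytes) -> List[bytes]:
    """Split an STAP-A payload (RTP header and padding removed) into NAL units."""
    if not payload or payload[0] & 0x1F != STAP_A_TYPE:
        raise ValueError("not an STAP-A payload")
    nal_units = []
    pos = 1  # skip the one-octet STAP-A NAL unit header
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        # 16-bit unsigned size in network byte order; it excludes these
        # two octets but includes the NAL unit type octet.
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if pos + size > len(payload):
            raise ValueError("NAL unit size exceeds remaining payload")
        nal_units.append(payload[pos:pos + size])
        pos += size
    return nal_units
```

   Because the aggregation units are only byte aligned, the sketch
   advances by exact octet counts and never rounds to a 32-bit
   boundary.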

1086	   Figure 8 presents an example of an RTP packet that contains an STAP-B.
1087	   The STAP contains two single-time aggregation units, labeled as 1 and
1088	   2 in the figure.

1090	    0                   1                   2                   3
1091	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1092	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1093	   |                          RTP Header                           |
1094	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1095	   |STAP-B NAL HDR | DON                           | NALU 1 Size   |
1096	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1097	   | NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
1098	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1099	   :                                                               :
1100	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1101	   |               | NALU 2 Size                   | NALU 2 HDR    |
1102	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1103	   |                       NALU 2 Data                             |
1104	   :                                                               :
1105	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1106	   |                               :...OPTIONAL RTP padding        |
1107	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1109	    Figure 8 An example of an RTP packet including an STAP-B containing
1110	                     two single-time aggregation units
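
   Informative example: a minimal Python sketch of STAP-B parsing,
   following Figure 8.  It assumes that the DON field carries the DON
   of the first NAL unit in the packet and derives the DON of each
   subsequent NAL unit as (previous DON + 1) % 65536.  The name
   parse_stap_b is hypothetical, the type value 25 is assumed from the
   payload type table elsewhere in this memo, and RTP padding is
   assumed to have been removed.

```python
import struct
from typing import List, Tuple

STAP_B_TYPE = 25  # assumption: STAP-B type value from the memo's type table


def parse_stap_b(payload: bytes) -> List[Tuple[int, bytes]]:
    """Return (DON, NAL unit) pairs in appearance order for an STAP-B payload."""
    if len(payload) < 3 or payload[0] & 0x1F != STAP_B_TYPE:
        raise ValueError("not an STAP-B payload")
    # 16-bit DON in network byte order follows the STAP-B NAL unit header.
    (don,) = struct.unpack_from("!H", payload, 1)
    pos = 3
    units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if pos + size > len(payload):
            raise ValueError("NAL unit size exceeds remaining payload")
        units.append((don, payload[pos:pos + size]))
        pos += size
        don = (don + 1) % 65536  # DON of the next NAL unit, with wraparound
    return units
```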

1112	5.7.2. Multi-Time Aggregation Packets (MTAPs)

1114	   The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
1115	   order number base (DONB) (in network byte order) and one or more
1116	   multi-time aggregation units, as presented in Figure 9.  DONB MUST
1117	   contain the value of DON for the first NAL unit in the NAL unit
1118	   decoding order among the NAL units of the MTAP.

1120	      Informative note: The first NAL unit in the NAL unit decoding
1121	      order is not necessarily the first NAL unit in the order in which
1122	      the NAL units are encapsulated in an MTAP.

1124	    0                   1                   2                   3
1125	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1126	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1127	                   :  decoding order number base   |               |
1128	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1129	   |                                                               |
1130	   |                 multi-time aggregation units                  |
1131	   |                                                               |
1132	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1133	   |                               :
1134	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1136	                Figure 9 NAL unit payload format for MTAPs

1138	   Two different multi-time aggregation units are defined in this
1139	   specification.  Both of them consist of 16-bit unsigned size
1140	   information of the following NAL unit (in network byte order), an 8-
1141	   bit unsigned decoding order number difference (DOND), and n bits (in
1142	   network byte order) of timestamp offset (TS offset) for this NAL unit,
1143	   whereby n can be 16 or 24.  The choice between the different MTAP
1144	   types (MTAP16 and MTAP24) is application dependent: the larger the
1145	   timestamp offset is, the higher the flexibility of the MTAP, but the
1146	   overhead is also higher.

1148	   The structures of the multi-time aggregation units for MTAP16 and
1149	   MTAP24 are presented in Figures 10 and 11, respectively.  The
1150	   starting or ending position of an aggregation unit within a packet is
1151	   NOT REQUIRED to be on a 32-bit word boundary.  The DON of the NAL
1152	   unit contained in a multi-time aggregation unit is equal to (DONB +
1153	   DOND) % 65536, in which % denotes the modulo operation.  This memo
1154	   does not specify how the NAL units within an MTAP are ordered, but,
1155	   in most cases, NAL unit decoding order SHOULD be used.

1157	   The timestamp offset field MUST be set to a value equal to the value
1158	   of the following formula: If the NALU-time is larger than or equal to
1159	   the RTP timestamp of the packet, then the timestamp offset equals
1160	   (the NALU-time of the NAL unit - the RTP timestamp of the packet).
1161	   If the NALU-time is smaller than the RTP timestamp of the packet,
1162	   then the timestamp offset is equal to the NALU-time + (2^32 - the RTP
1163	   timestamp of the packet).
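
   Informative example: the two cases of the formula above amount to a
   subtraction modulo 2^32.  The Python sketch below spells out both
   cases; the names ts_offset and fits_mtap are hypothetical, and both
   inputs are assumed to be 32-bit unsigned RTP timestamp values.

```python
def ts_offset(nalu_time: int, rtp_timestamp: int) -> int:
    """Timestamp offset of a NAL unit relative to the MTAP's RTP timestamp."""
    if nalu_time >= rtp_timestamp:
        return nalu_time - rtp_timestamp
    # The NALU-time has wrapped around relative to the RTP timestamp.
    return nalu_time + (2**32 - rtp_timestamp)


def fits_mtap(offset: int, n_bits: int) -> bool:
    """True if the offset is representable in an n-bit TS offset field."""
    return 0 <= offset < (1 << n_bits)
```

   For instance, ts_offset(5, 2**32 - 10) evaluates to 15, which fits
   both the 16-bit field of MTAP16 and the 24-bit field of MTAP24.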

1165	    0                   1                   2                   3
1166	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1167	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1168	   :        NAL unit size          |      DOND     |  TS offset    |
1169	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1170	   |  TS offset    |                                               |
1171	   +-+-+-+-+-+-+-+-+              NAL unit                         |
1172	   |                                                               |
1173	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1174	   |                               :
1175	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1177	             Figure 10  Multi-time aggregation unit for MTAP16

1179	    0                   1                   2                   3
1180	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1181	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1182	   :        NAL unit size          |      DOND     |  TS offset    |
1183	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1184	   |         TS offset             |                               |
1185	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1186	   |                              NAL unit                         |
1187	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1188	   |                               :
1189	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1191	             Figure 11  Multi-time aggregation unit for MTAP24

1193	   For the "earliest" multi-time aggregation unit in an MTAP the
1194	   timestamp offset MUST be zero.  Hence, the RTP timestamp of the MTAP
1195	   itself is identical to the earliest NALU-time.

1197	      Informative note: The "earliest" multi-time aggregation unit is
1198	      the one that would have the smallest extended RTP timestamp among
1199	      all the aggregation units of an MTAP if the NAL units contained
1200	      in the aggregation units were encapsulated in single NAL unit
1201	      packets.  An extended timestamp is a timestamp that has more than
1202	      32 bits and is capable of counting the wraparound of the
1203	      timestamp field, thus enabling one to determine the smallest
1204	      value if the timestamp wraps.  Such an "earliest" aggregation
1205	      unit may not be the first one in the order in which the
1206	      aggregation units are encapsulated in an MTAP.  The "earliest"
1207	      NAL unit need not be the same as the first NAL unit in the NAL
1208	      unit decoding order either.

1210	   Figure 12 presents an example of an RTP packet that contains a multi-
1211	   time aggregation packet of type MTAP16 that contains two multi-time
1212	   aggregation units, labeled as 1 and 2 in the figure.

1214	    0                   1                   2                   3
1215	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1216	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1217	   |                          RTP Header                           |
1218	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1219	   |MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
1220	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1221	   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
1222	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1223	   |  NALU 1 HDR   |  NALU 1 DATA                                  |
1224	   +-+-+-+-+-+-+-+-+                                               +
1225	   :                                                               :
1226	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1227	   |               | NALU 2 SIZE                   |  NALU 2 DOND  |
1228	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1229	   |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
1230	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1231	   :                                                               :
1232	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1233	   |                               :...OPTIONAL RTP padding        |
1234	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1236	   Figure 12  An RTP packet including a multi-time aggregation packet of
1237	          type MTAP16 containing two multi-time aggregation units
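
   Informative example: a Python sketch of MTAP parsing that follows
   Figures 9 through 12.  For each NAL unit it recovers the DON as
   (DONB + DOND) % 65536 and the NALU-time as the RTP timestamp plus
   the TS offset, modulo 2^32 (the inverse of the offset formula).
   The name parse_mtap is hypothetical, the type values 26 (MTAP16)
   and 27 (MTAP24) are assumed from the payload type table elsewhere
   in this memo, and RTP padding is assumed to have been removed.

```python
import struct
from typing import List, Tuple

MTAP16_TYPE, MTAP24_TYPE = 26, 27  # assumption: from the memo's type table


def parse_mtap(payload: bytes, rtp_timestamp: int) -> List[Tuple[int, int, bytes]]:
    """Return (DON, NALU-time, NAL unit) triples in encapsulation order."""
    if len(payload) < 3:
        raise ValueError("payload too short for an MTAP")
    nal_type = payload[0] & 0x1F
    if nal_type == MTAP16_TYPE:
        ts_len = 2  # 16-bit TS offset
    elif nal_type == MTAP24_TYPE:
        ts_len = 3  # 24-bit TS offset
    else:
        raise ValueError("not an MTAP payload")
    (donb,) = struct.unpack_from("!H", payload, 1)  # decoding order number base
    pos = 3
    units = []
    while pos < len(payload):
        header_len = 2 + 1 + ts_len  # size + DOND + TS offset
        if pos + header_len > len(payload):
            raise ValueError("truncated multi-time aggregation unit header")
        (size,) = struct.unpack_from("!H", payload, pos)
        dond = payload[pos + 2]
        ts_off = int.from_bytes(payload[pos + 3:pos + 3 + ts_len], "big")
        pos += header_len
        if pos + size > len(payload):
            raise ValueError("NAL unit size exceeds remaining payload")
        don = (donb + dond) % 65536
        nalu_time = (rtp_timestamp + ts_off) % 2**32
        units.append((don, nalu_time, payload[pos:pos + size]))
        pos += size
    return units
```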

1239	   Figure 13 presents an example of an RTP packet that contains a multi-
1240	   time aggregation packet of type MTAP24 that contains two multi-time
1241	   aggregation units, labeled as 1 and 2 in the figure.

1243	    0                   1                   2                   3
1244	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1245	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1246	   |                          RTP Header                           |
1247	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1248	   |MTAP24 NAL HDR |  decoding order number base   | NALU 1 Size   |
1249	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1250	   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offs          |
1251	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1252	   |NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
1253	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1254	   :                                                               :
1255	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1256	   |               | NALU 2 SIZE                   |  NALU 2 DOND  |
1257	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1258	   |       NALU 2 TS offset                        |  NALU 2 HDR   |
1259	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1260	   |  NALU 2 DATA                                                  |
1261	   :                                                               :
1262	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1263	   |                               :...OPTIONAL RTP padding        |
1264	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1266	   Figure 13  An RTP packet including a multi-time aggregation packet of
1267	          type MTAP24 containing two multi-time aggregation units

1269	5.7.3. Fragmentation Units (FUs)

1271	   This payload type allows fragmenting a NAL unit into several RTP
1272	   packets.  Doing so on the application layer instead of relying on
1273	   lower layer fragmentation (e.g., by IP) has the following advantages:

1275	   o  The payload format is capable of transporting NAL units bigger
1276	      than 64 kbytes over an IPv4 network that may be present in pre-
1277	      recorded video, particularly in High Definition formats (there is
1278	      a limit on the number of slices per picture, which results in a
1279	      limit of NAL units per picture, which may result in big NAL units).

1281	   o  The fragmentation mechanism allows fragmenting a single NAL unit
1282	      and applying generic forward error correction as described in
1283	      section 12.5.

1285	   Fragmentation is defined only for a single NAL unit and not for any
1286	   aggregation packets.  A fragment of a NAL unit consists of an integer
1287	   number of consecutive octets of that NAL unit.  Each octet of the NAL
1288	   unit MUST be part of exactly one fragment of that NAL unit.
1289	   Fragments of the same NAL unit MUST be sent in consecutive order with
1290	   ascending RTP sequence numbers (with no other RTP packets within the
1291	   same RTP packet stream being sent between the first and last
1292	   fragment).  Similarly, a NAL unit MUST be reassembled in RTP sequence
1293	   number order.

1295	   When a NAL unit is fragmented and conveyed within fragmentation units
1296	   (FUs), it is referred to as a fragmented NAL unit.  STAPs and MTAPs
1297	   MUST NOT be fragmented.  FUs MUST NOT be nested; i.e., an FU MUST NOT
1298	   contain another FU.

1300	   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
1301	   time of the fragmented NAL unit.

1303	   Figure 14 presents the RTP payload format for FU-As.  An FU-A
1304	   consists of a fragmentation unit indicator of one octet, a
1305	   fragmentation unit header of one octet, and a fragmentation unit
1306	   payload.

1308	    0                   1                   2                   3
1309	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1310	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1311	   | FU indicator  |   FU header   |                               |
1312	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1313	   |                                                               |
1314	   |                         FU payload                            |
1315	   |                                                               |
1316	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1317	   |                               :...OPTIONAL RTP padding        |
1318	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1320	                  Figure 14  RTP payload format for FU-A

1322	   Figure 15 presents the RTP payload format for FU-Bs.  An FU-B
1323	   consists of a fragmentation unit indicator of one octet, a
1324	   fragmentation unit header of one octet, a decoding order number (DON)
1325	   (in network byte order), and a fragmentation unit payload.  In other
1326	   words, the structure of FU-B is the same as the structure of FU-A,
1327	   except for the additional DON field.

1329	    0                   1                   2                   3
1330	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1331	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1332	   | FU indicator  |   FU header   |               DON             |
1333	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1334	   |                                                               |
1335	   |                         FU payload                            |
1336	   |                                                               |
1337	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1338	   |                               :...OPTIONAL RTP padding        |
1339	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1341	                  Figure 15  RTP payload format for FU-B

1343	   NAL unit type FU-B MUST be used in the interleaved packetization mode
1344	   for the first fragmentation unit of a fragmented NAL unit.  NAL unit
1345	   type FU-B MUST NOT be used in any other case.  In other words, in the
1346	   interleaved packetization mode, each NALU that is fragmented has an
1347	   FU-B as the first fragment, followed by one or more FU-A fragments.

1349	   The FU indicator octet has the following format:

1351	      +---------------+
1352	      |0|1|2|3|4|5|6|7|
1353	      +-+-+-+-+-+-+-+-+
1354	      |F|NRI|  Type   |
1355	      +---------------+

1357	   Values equal to 28 and 29 in the Type field of the FU indicator octet
1358	   identify an FU-A and an FU-B, respectively.  The use of the F bit is
1359	   described in section 5.3.  The value of the NRI field MUST be set
1360	   according to the value of the NRI field in the fragmented NAL unit.

1362	   The FU header has the following format:

1364	      +---------------+
1365	      |0|1|2|3|4|5|6|7|
1366	      +-+-+-+-+-+-+-+-+
1367	      |S|E|R|  Type   |
1368	      +---------------+

1370	   S: 1 bit
1371	      When set to one, the Start bit indicates the start of a
1372	      fragmented NAL unit.  When the following FU payload is not the
1373	      start of a fragmented NAL unit payload, the Start bit is set to
1374	      zero.

1376	   E: 1 bit
1377	      When set to one, the End bit indicates the end of a fragmented
1378	      NAL unit, i.e., the last byte of the payload is also the last
1379	      byte of the fragmented NAL unit.  When the following FU payload
1380	      is not the last fragment of a fragmented NAL unit, the End bit is
1381	      set to zero.

1383	   R: 1 bit
1384	      The Reserved bit MUST be equal to 0 and MUST be ignored by the
1385	      receiver.

1387	   Type: 5 bits
1388	      The NAL unit payload type as defined in Table 7-1 of [1].

1390	   The value of DON in FU-Bs is selected as described in section 5.5.

1392	      Informative note: The DON field in FU-Bs allows gateways to
1393	      fragment NAL units to FU-Bs without organizing the incoming NAL
1394	      units to the NAL unit decoding order.

1396	   A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the
1397	   Start bit and End bit MUST NOT both be set to one in the same FU
1398	   header.

1400	   The FU payload consists of fragments of the payload of the fragmented
1401	   NAL unit so that if the fragmentation unit payloads of consecutive
1402	   FUs are sequentially concatenated, the payload of the fragmented NAL
1403	   unit can be reconstructed.  The NAL unit type octet of the fragmented
1404	   NAL unit is not included as such in the fragmentation unit payload,
1405	   but rather the information of the NAL unit type octet of the
1406	   fragmented NAL unit is conveyed in F and NRI fields of the FU
1407	   indicator octet of the fragmentation unit and in the type field of
1408	   the FU header.  An FU payload MAY have any number of octets and MAY
1409	   be empty.

1411	      Informative note: Empty FUs are allowed to reduce the latency of
1412	      a certain class of senders in nearly lossless environments.
1413	      These senders can be characterized in that they packetize NALU
1414	      fragments before the NALU is completely generated and, hence,
1415	      before the NALU size is known.  If zero-length NALU fragments
1416	      were not allowed, the sender would have to generate at least one
1417	      bit of data of the following fragment before the current fragment
1418	      could be sent.  Due to the characteristics of H.264, where
1419	      sometimes several macroblocks occupy zero bits, this is
1420	      undesirable and can add delay.  However, the (potential) use of
1421	      zero-length NALU fragments should be carefully weighed against
1422	      the increased risk of the loss of at least a part of the NALU
1423	      because of the additional packets employed for its transmission.
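
   Informative example: a Python sketch of splitting one NAL unit into
   FU-A payloads as described above.  The F and NRI bits of the NAL
   unit type octet are copied into the FU indicator, the original
   type goes into the FU header, and the type octet itself is not
   repeated in the FU payload.  The name fragment_fu_a and the
   max_fragment parameter (largest FU payload in octets) are
   hypothetical.  The sketch refuses to produce a single FU, because
   the Start and End bits must not both be set in one FU header.

```python
from typing import List

FU_A_TYPE = 28  # Type value identifying an FU-A in the FU indicator


def fragment_fu_a(nal_unit: bytes, max_fragment: int) -> List[bytes]:
    """Split a NAL unit into FU-A payloads of at most max_fragment payload octets."""
    if not nal_unit:
        raise ValueError("empty NAL unit")
    if max_fragment < 1:
        raise ValueError("max_fragment must be at least 1")
    nal_header = nal_unit[0]
    body = nal_unit[1:]  # the NAL unit type octet is not part of the FU payload
    chunks = [body[i:i + max_fragment] for i in range(0, len(body), max_fragment)]
    if len(chunks) < 2:
        raise ValueError("NAL unit fits in one packet; do not fragment it")
    fu_indicator = (nal_header & 0xE0) | FU_A_TYPE  # F and NRI copied, Type = 28
    nal_type = nal_header & 0x1F
    fus = []
    for i, chunk in enumerate(chunks):
        start = 0x80 if i == 0 else 0              # S bit on the first fragment
        end = 0x40 if i == len(chunks) - 1 else 0  # E bit on the last fragment
        fus.append(bytes([fu_indicator, start | end | nal_type]) + chunk)
    return fus
```

   The returned payloads are meant to be sent in consecutive RTP
   packets with ascending sequence numbers and the same RTP timestamp.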

1425	   If a fragmentation unit is lost, the receiver SHOULD discard all
1426	   following fragmentation units in transmission order corresponding to
1427	   the same fragmented NAL unit.

1429	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1430	   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
1431	   n of that NAL unit is not received.  In this case, the
1432	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1433	   syntax violation.
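
   Informative example: a Python sketch of FU-A reassembly, including
   the optional case in which only a leading run of fragments is
   available.  The name reassemble_fu_a is hypothetical.  The caller
   is assumed to pass the FU-A payloads of one NAL unit in RTP
   sequence number order, starting with the fragment that has the
   Start bit set, and to have dropped every fragment that follows a
   loss.

```python
from typing import List

FU_A_TYPE = 28  # Type value identifying an FU-A in the FU indicator


def reassemble_fu_a(fragments: List[bytes]) -> bytes:
    """Rebuild a NAL unit from consecutive FU-A payloads of one NAL unit."""
    if not fragments:
        raise ValueError("no fragments")
    for fu in fragments:
        if len(fu) < 2 or fu[0] & 0x1F != FU_A_TYPE:
            raise ValueError("not an FU-A payload")
    first_header = fragments[0][1]
    if not first_header & 0x80:
        raise ValueError("first fragment does not have the Start bit set")
    complete = bool(fragments[-1][1] & 0x40)  # End bit on the last fragment
    # Rebuild the NAL unit type octet: F and NRI from the FU indicator,
    # Type from the FU header.
    nal_header = (fragments[0][0] & 0xE0) | (first_header & 0x1F)
    if not complete:
        nal_header |= 0x80  # forbidden_zero_bit = 1 marks the syntax violation
    return bytes([nal_header]) + b"".join(fu[2:] for fu in fragments)
```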

1435	6. Packetization Rules

1437	   The packetization modes are introduced in section 5.2.  The
1438	   packetization rules common to more than one of the packetization
1439	   modes are specified in section 6.1.  The packetization rules for the
1440	   single NAL unit mode, the non-interleaved mode, and the interleaved
1441	   mode are specified in sections 6.2, 6.3, and 6.4, respectively.

1443	6.1. Common Packetization Rules

1445	   All senders MUST enforce the following packetization rules regardless
1446	   of the packetization mode in use:

1448	   o  Coded slice NAL units or coded slice data partition NAL units
1449	      belonging to the same coded picture (and thus sharing the same RTP
1450	      timestamp value) MAY be sent in any order; however, for delay-
1451	      critical systems, they SHOULD be sent in their original decoding
1452	      order to minimize the delay.  Note that the decoding order is the
1453	      order of the NAL units in the bitstream.

1455	   o  Parameter sets are handled in accordance with the rules and
1456	      recommendations given in section 8.4.

1458	   o  MANEs MUST NOT duplicate any NAL unit except for sequence or
1459	      picture parameter set NAL units, as neither this memo nor the
1460	      H.264 specification provides means to identify duplicated NAL
1461	      units.  Sequence and picture parameter set NAL units MAY be
1462	      duplicated to make their correct reception more probable, but any
1463	      such duplication MUST NOT affect the contents of any active
1464	      sequence or picture parameter set.  Duplication SHOULD be
1465	      performed on the application layer and not by duplicating RTP
1466	      packets (with identical sequence numbers).

1468	   Senders using the non-interleaved mode and the interleaved mode MUST
1469	   enforce the following packetization rule:

1471	   o  MANEs MAY convert single NAL unit packets into one aggregation
1472	      packet, convert an aggregation packet into several single NAL unit
1473	      packets, or mix both concepts, in an RTP translator.  The RTP
1474	      translator SHOULD take into account at least the following
1475	      parameters: path MTU size, unequal protection mechanisms (e.g.,
1476	      through packet-based FEC according to RFC 2733 [18], especially
1477	      for sequence and picture parameter set NAL units and coded slice
1478	      data partition A NAL units), bearable latency of the system, and
1479	      buffering capabilities of the receiver.

1481	         Informative note: An RTP translator is required to handle RTCP
1482	         as per RFC 3550.

1484	6.2. Single NAL Unit Mode

1486	   This mode is in use when the value of the OPTIONAL packetization-mode
1487	   media type parameter is equal to 0 or the packetization-mode is not
1488	   present.  All receivers MUST support this mode.  It is primarily
1489	   intended for low-delay applications that are compatible with systems
1490	   using ITU-T Recommendation H.241 [3] (see section 12.1).  Only single
1491	   NAL unit packets MAY be used in this mode.  STAPs, MTAPs, and FUs
1492	   MUST NOT be used.  The transmission order of single NAL unit packets
1493	   MUST comply with the NAL unit decoding order.

1495	6.3. Non-Interleaved Mode

1497	   This mode is in use when the value of the OPTIONAL packetization-mode
1498	   media type parameter is equal to 1.  This mode SHOULD be supported.
1499	   It is primarily intended for low-delay applications.  Only single NAL
1500	   unit packets, STAP-As, and FU-As MAY be used in this mode.  STAP-Bs,
1501	   MTAPs, and FU-Bs MUST NOT be used.  The transmission order of NAL
1502	   units MUST comply with the NAL unit decoding order.

1504	6.4. Interleaved Mode

1506	   This mode is in use when the value of the OPTIONAL packetization-mode
1507	   media type parameter is equal to 2.  Some receivers MAY support this
1508	   mode.  STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used.  STAP-As and
1509	   single NAL unit packets MUST NOT be used.  The transmission order of
1510	   packets and NAL units is constrained as specified in section 5.5.
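
   Informative example: the packet types permitted in each mode
   (sections 6.2 through 6.4) can be captured in a small lookup
   table.  In the Python sketch below, the names ALLOWED and
   packet_allowed are hypothetical, and the type values 1-23 (single
   NAL unit packets) and 24-27 (STAP-A, STAP-B, MTAP16, MTAP24) are
   assumed from the payload type table elsewhere in this memo.

```python
from typing import Optional

STAP_A, STAP_B, MTAP16, MTAP24, FU_A, FU_B = 24, 25, 26, 27, 28, 29
SINGLE_NAL = set(range(1, 24))  # assumption: types 1-23 are single NAL unit packets

ALLOWED = {
    0: SINGLE_NAL,                            # single NAL unit mode
    1: SINGLE_NAL | {STAP_A, FU_A},           # non-interleaved mode
    2: {STAP_B, MTAP16, MTAP24, FU_A, FU_B},  # interleaved mode
}


def packet_allowed(packetization_mode: Optional[int], first_octet: int) -> bool:
    """Check the Type field of a payload's first octet against the mode."""
    # An absent packetization-mode parameter selects single NAL unit mode.
    mode = 0 if packetization_mode is None else packetization_mode
    return (first_octet & 0x1F) in ALLOWED[mode]
```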

1512	7. De-Packetization Process

1514	   The de-packetization process is implementation dependent.  Therefore,
1515	   the following description should be seen as an example of a suitable
1516	   implementation.  Other schemes may be used as well, as long as the
1517	   output for the same input is the same as the process described below.
1518	   The same output means that the resulting NAL units, and their order,
1519	   are identical.  Optimizations relative to the described algorithms
1520	   are likely possible.  Section 7.1 presents the de-packetization
1521	   process for the single NAL unit and non-interleaved packetization
1522	   modes, whereas section 7.2 describes the process for the interleaved
1523	   mode.  Section 7.3 includes additional de-packetization guidelines
1524	   for intelligent receivers.

1526	   All normal RTP mechanisms related to buffer management apply.  In
1527	   particular, duplicated or outdated RTP packets (as indicated by the
1528	   RTP sequence number and the RTP timestamp) are removed.  To
1529	   determine the exact time for decoding, factors such as a possible
1530	   intentional delay to allow for proper inter-stream synchronization
1531	   must be taken into account.

1533	7.1. Single NAL Unit and Non-Interleaved Mode

1535	   The receiver includes a receiver buffer to compensate for
1536	   transmission delay jitter.  The receiver stores incoming packets in
1537	   reception order into the receiver buffer.  Packets are de-packetized
1538	   in RTP sequence number order.  If a de-packetized packet is a single
1539	   NAL unit packet, the NAL unit contained in the packet is passed
1540	   directly to the decoder.  If a de-packetized packet is an STAP-A, the
1541	   NAL units contained in the packet are passed to the decoder in the
1542	   order in which they are encapsulated in the packet.  For all the FU-A
1543	   packets containing fragments of a single NAL unit, the de-packetized
1544	   fragments are concatenated in their sending order to recover the NAL
1545	   unit, which is then passed to the decoder.

1547	      Informative note: If the decoder supports Arbitrary Slice Order,
1548	      coded slices of a picture can be passed to the decoder in any
1549	      order regardless of their reception and transmission order.

1551	7.2. Interleaved Mode

1553	   The general concept behind these de-packetization rules is to reorder
1554	   NAL units from transmission order to the NAL unit decoding order.

1556	   The receiver includes a receiver buffer, which is used to compensate
1557	   for transmission delay jitter and to reorder NAL units from
1558	   transmission order to the NAL unit decoding order.  In this section,
1559	   the receiver operation is described under the assumption that there
1560	   is no transmission delay jitter.  To distinguish it from a
1561	   practical receiver buffer that is also used for compensation of
1562	   transmission delay jitter, the receiver buffer is hereafter called
1563	   the de-interleaving buffer in this section.  Receivers SHOULD also
1564	   prepare for transmission delay jitter; i.e., either reserve separate
1565	   buffers for transmission delay jitter buffering and de-interleaving
1566	   buffering or use a receiver buffer for both transmission delay jitter
1567	   and de-interleaving.  Moreover, receivers SHOULD take transmission
1568	   delay jitter into account in the buffering operation; e.g., by
1569	   additional initial buffering before starting decoding and playback.

1571	   This section is organized as follows: subsection 7.2.1 presents how
1572	   to calculate the size of the de-interleaving buffer.  Subsection
1573	   7.2.2 specifies the receiver process for organizing received NAL
1574	   units into the NAL unit decoding order.

1576	7.2.1. Size of the De-interleaving Buffer

1578	   When the SDP Offer/Answer model or any other capability exchange
1579	   procedure is used in session setup, the properties of the received
1580	   stream SHOULD be such that the receiver capabilities are not exceeded.
1581	   In the SDP Offer/Answer model, the receiver can indicate its
1582	   capabilities to allocate a de-interleaving buffer with the deint-buf-
1583	   cap media type parameter.  The sender indicates the requirement for
1584	   the de-interleaving buffer size with the sprop-deint-buf-req media
1585	   type parameter.  It is therefore RECOMMENDED to set the de-
1586	   interleaving buffer size, in terms of number of bytes, equal to or
1587	   greater than the value of sprop-deint-buf-req media type parameter.
1588	   See section 8.1 for further information on deint-buf-cap and sprop-
1589	   deint-buf-req media type parameters and section 8.2.2 for further
1590	   information on their use in the SDP Offer/Answer model.

1592	   When a declarative session description is used in session setup, the
1593	   sprop-deint-buf-req media type parameter signals the requirement for
1594	   the de-interleaving buffer size.  It is therefore RECOMMENDED to set
1595	   the de-interleaving buffer size, in terms of number of bytes, equal
1596	   to or greater than the value of sprop-deint-buf-req media type
1597	   parameter.

1599	7.2.2. De-interleaving Process

1601	   There are two buffering states in the receiver: initial buffering and
1602	   buffering while playing.  Initial buffering occurs when the RTP
1603	   session is initialized.  After initial buffering, decoding and
1604	   playback are started, and the buffering-while-playing mode is used.

1606	   Regardless of the buffering state, the receiver stores incoming NAL
1607	   units, in reception order, in the de-interleaving buffer as follows.
1608	   NAL units of aggregation packets are stored in the de-interleaving
1609	   buffer individually.  The value of DON is calculated and stored for
1610	   each NAL unit.

1612	   The receiver operation is described below with the help of the
1613	   following functions and constants:

1615	   o  Function AbsDON is specified in section 8.1.

1617	   o  Function don_diff is specified in section 5.5.

1619	   o  Constant N is the value of the OPTIONAL sprop-interleaving-depth
1620	      media type parameter (see section 8.1) incremented by 1.

1622	   Initial buffering lasts until one of the following conditions is
1623	   fulfilled:

1625	   o  There are N or more VCL NAL units in the de-interleaving buffer.

1627	   o  If sprop-max-don-diff is present, don_diff(m,n) is greater than
1628	      the value of sprop-max-don-diff, in which n corresponds to the NAL
1629	      unit having the greatest value of AbsDON among the received NAL
1630	      units and m corresponds to the NAL unit having the smallest value
1631	      of AbsDON among the received NAL units.

1633	   o  Initial buffering has lasted for the duration equal to or greater
1634	      than the value of the OPTIONAL sprop-init-buf-time media type
1635	      parameter.

1637	   The NAL units to be removed from the de-interleaving buffer are
1638	   determined as follows:

1640	   o  If the de-interleaving buffer contains at least N VCL NAL units,
1641	      NAL units are removed from the de-interleaving buffer and passed
1642	      to the decoder in the order specified below until the buffer
1643	      contains N-1 VCL NAL units.

1645	   o  If sprop-max-don-diff is present, all NAL units m for which
1646	      don_diff(m,n) is greater than sprop-max-don-diff are removed from
1647	      the de-interleaving buffer and passed to the decoder in the order
1648	      specified below.  Herein, n corresponds to the NAL unit having the
1649	      greatest value of AbsDON among the NAL units in the de-
1650	      interleaving buffer.

1652	   The order in which NAL units are passed to the decoder is specified
1653	   as follows:

1655	   o  Let PDON be a variable that is initialized to 0 at the beginning
1656	      of the RTP session.

1658	   o  For each NAL unit associated with a value of DON, a DON distance
1659	      is calculated as follows.  If the value of DON of the NAL unit is
1660	      larger than the value of PDON, the DON distance is equal to DON -
1661	      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
1662	      + 1.

1664	   o  NAL units are delivered to the decoder in ascending order of DON
1665	      distance.  If several NAL units share the same value of DON
1666	      distance, they can be passed to the decoder in any order.

1668	   o  When a desired number of NAL units have been passed to the decoder,
1669	      the value of PDON is set to the value of DON for the last NAL unit
1670	      passed to the decoder.
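
   The DON-distance ordering described above can be sketched as
   follows.  This is a minimal, non-normative illustration; the names
   don_distance and delivery_order are hypothetical, and the
   de-interleaving buffer is modeled simply as a list of DON values.

```python
def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON (16-bit wraparound)."""
    if don > pdon:
        return don - pdon
    # DON is not larger than PDON: the counter is taken to have wrapped.
    return 65535 - pdon + don + 1


def delivery_order(dons: list[int], pdon: int) -> list[int]:
    """Return the DON values sorted by ascending DON distance from PDON.

    Units with equal DON distance keep their input order, which is
    permitted because they may be passed to the decoder in any order.
    """
    return sorted(dons, key=lambda don: don_distance(don, pdon))
```

   After the selected NAL units have been passed to the decoder, the
   caller would set PDON to the DON of the last delivered unit, i.e.,
   the last element of the returned list.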

1672	7.3. Additional De-Packetization Guidelines

1674	   The following additional de-packetization rules may be used to
1675	   implement an operational H.264 de-packetizer:

1677	   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
1678	      coded slice data partitions A (DPAs).  If a lost DPA is found,
1679	      after taking into account possible retransmission and FEC, a
1680	      gateway may decide not to send the corresponding coded slice data
1681	      partitions B and C, as their information is meaningless for H.264
1682	      decoders.  In this way a MANE can reduce network load by
1683	      discarding useless packets without parsing a complex bitstream.

1685	   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
1686	      FUs.  If a lost FU is found, a gateway may decide not to send the
1687	      following FUs of the same fragmented NAL unit, as their
1688	      information is meaningless for H.264 decoders.  In this way a MANE
1689	      can reduce network load by discarding useless packets without
1690	      parsing a complex bitstream.

1692	   o  Intelligent receivers having to discard packets or NALUs should
1693	      first discard all packets/NALUs in which the value of the NRI
1694	      field of the NAL unit type octet is equal to 0.  This will
1695	      minimize the impact on user experience and keep the reference
1696	      pictures intact.  If more packets have to be discarded, then
1697	      packets with a numerically lower NRI value should be discarded
1698	      before packets with a numerically higher NRI value.  However,
1699	      discarding any packets with an NRI greater than 0 very likely
1700	      leads to decoder drift and SHOULD be avoided.
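
   The NRI-based discard priority in the last rule can be sketched as
   follows.  This is a non-normative illustration that assumes each
   item is a single NAL unit whose first octet is the NAL unit header
   (one forbidden bit, two NRI bits, five type bits); the function
   names are hypothetical.

```python
def nri(nal_header_octet: int) -> int:
    """Extract the 2-bit NRI field from the NAL unit header octet."""
    return (nal_header_octet >> 5) & 0x03


def discard_candidates(nal_units: list[bytes], count: int) -> list[bytes]:
    """Choose `count` NAL units to drop, lowest NRI value first.

    The sort is stable, so units with equal NRI keep their input order.
    """
    ranked = sorted(nal_units, key=lambda nal_unit: nri(nal_unit[0]))
    return ranked[:count]
```

   A receiver following the guideline would stop discarding, where
   possible, before reaching units whose NRI is greater than 0.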

1702	8. Payload Format Parameters

1704	   This section specifies the parameters that MAY be used to select
1705	   optional features of the payload format and certain features of the
1706	   bitstream.  The parameters are specified here as part of the media
1707	   subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec.  A
1708	   mapping of the parameters into the Session Description Protocol (SDP)
1709	   [6] is also provided for applications that use SDP.  Equivalent
1710	   parameters could be defined elsewhere for use with control protocols
1711	   that do not use SDP.

1713	   Some parameters provide a receiver with the properties of the stream
1714	   that will be sent.  The names of all these parameters start with
1715	   "sprop" for stream properties.  Some of these "sprop" parameters are
1716	   limited by other payload or codec configuration parameters.  For
1717	   example, the sprop-parameter-sets parameter is constrained by the
1718	   profile-level-id parameter.  The media sender, rather than the
1719	   receiver, selects all "sprop" parameters.  This uncommon
1720	   characteristic of the "sprop" parameters may not be compatible with
1721	   some signaling protocol concepts, in which case the use of these
1722	   parameters SHOULD be avoided.

1724	8.1. Media Type Registration

1726	   The media subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is
1727	   allocated from the IETF tree.

1729	   The receiver MUST ignore any unspecified parameter.

1731	   Media Type name:     video

1733	   Media subtype name:  H264

1735	   Required parameters: none

1737	   OPTIONAL parameters:

1739	      profile-level-id:
1740	         A base16 [7] (hexadecimal) representation of the following
1741	         three bytes in the sequence parameter set NAL unit specified
1742	         in [1]: 1) profile_idc, 2) a byte herein referred to as
1743	         profile-iop, composed of the values of constraint_set0_flag,
1744	         constraint_set1_flag, constraint_set2_flag,
1745	         constraint_set3_flag, and reserved_zero_4bits in bit-
1746	         significance order, starting from the most significant bit,
1747	         and 3) level_idc.  Note that reserved_zero_4bits is required
1748	         to be equal to 0 in [1], but other values for it may be
1749	         specified in the future by ITU-T or ISO/IEC.

1751	         The profile-level-id parameter indicates the default sub-
1752	         profile, i.e., the subset of coding tools that may have been
1753	         used to generate the stream or that the receiver supports, and
1754	         the default level of the stream or that the receiver supports.

1756	         The default sub-profile is indicated collectively by the
1757	         profile_idc byte and some fields in the profile-iop byte.
1758	         Depending on the values of the fields in the profile-iop byte,
1759	         the default sub-profile may be the same set of coding tools
1760	         supported by one profile, or a common subset of coding tools
1761	         of multiple profiles, as specified in subsection 7.4.2.1.1 of
1762	         [1].  The default level is indicated by the level_idc byte,
1763	         and, when profile_idc is equal to 66, 77 or 88 (the Baseline,
1764	         Main, or Extended profile) and level_idc is equal to 11,
1765	         additionally by bit 4 (constraint_set3_flag) of the profile-
1766	         iop byte.  When profile_idc is equal to 66, 77 or 88 (the
1767	         Baseline, Main, or Extended profile) and level_idc is equal to
1768	         11, and bit 4 (constraint_set3_flag) of the profile-iop byte
1769	         is equal to 1, the default level is level 1b.
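
         As a non-normative illustration, the three bytes of
         profile-level-id and the level 1b rule above could be
         decoded as sketched below.  The type and function names are
         hypothetical; bit 4 is counted with the least significant
         bit as bit 0.

```python
from typing import NamedTuple


class ProfileLevelId(NamedTuple):
    profile_idc: int
    profile_iop: int
    level_idc: int


def parse_profile_level_id(value: str) -> ProfileLevelId:
    """Decode the base16 profile-level-id string into its three bytes."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must encode exactly three bytes")
    return ProfileLevelId(raw[0], raw[1], raw[2])


def is_level_1b(plid: ProfileLevelId) -> bool:
    """Apply the level 1b rule for Baseline, Main, and Extended."""
    constraint_set3_flag = (plid.profile_iop >> 4) & 1
    return (
        plid.profile_idc in (66, 77, 88)
        and plid.level_idc == 11
        and constraint_set3_flag == 1
    )
```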

1771	         Table 5 lists all profiles defined in Annex A of [1] and, for
1772	         each of the profiles, the possible combinations of profile_idc
1773	         and profile-iop that represent the same sub-profile.

1775	            Table 5.  Combinations of profile_idc and profile-iop
1776	            representing the same sub-profile corresponding to the full
1777	            set of coding tools supported by one profile.  In the
1778	            following, x may be either 0 or 1, while the profile names
1779	            are indicated as follows. CB: Constrained Baseline profile,
1780	            B: Baseline profile, M: Main profile, E: Extended profile,
1781	            H: High profile, H10: High 10 profile, H42: High 4:2:2
1782	            profile, H44: High 4:4:4 Predictive profile, H10I: High 10
1783	            Intra profile, H42I: High 4:2:2 Intra profile, H44I: High
1784	            4:4:4 Intra profile, and C44I: CAVLC 4:4:4 Intra profile.

1786	              Profile     profile_idc             profile-iop
1787	                          (hexadecimal)           (binary)

1789	              CB          42 (B)                  x1xx0000
1790	                 same as: 4D (M)                  1xxx0000
1791	                 same as: 58 (E)                  11xx0000
1792	                 same as: 64 (H), 6E (H10),       1xx00000
1793	                          7A (H42), or F4 (H44)
1794	              B           42 (B)                  x0xx0000
1795	                 same as: 58 (E)                  10xx0000
1796	              M           4D (M)                  0x0x0000
1797	                 same as: 64 (H), 6E (H10),       01000000
1798	                          7A (H42), or F4 (H44)
1799	              E           58                      00xx0000
1800	              H           64                      00000000
1801	              H10         6E                      00000000
1802	              H42         7A                      00000000
1803	              H44         F4                      00000000
1804	              H10I        6E                      00010000
1805	              H42I        7A                      00010000
1806	              H44I        F4                      00010000
1807	              C44I        2C                      00010000

1809	         For example, in the table above, profile_idc equal to 58
1810	         (Extended) with profile-iop equal to 11xx0000 indicates the
1811	         same sub-profile corresponding to profile_idc equal to 42
1812	         (Baseline) with profile-iop equal to x1xx0000.  Note that
1813	         other combinations of profile_idc and profile-iop (not listed
1814	         in Table 5) may represent a sub-profile equivalent to the
1815	         common subset of coding tools for more than one profile.  Note
1816	         also that a decoder conforming to a certain profile may be
1817	         able to decode bitstreams conforming to other profiles.  For
1818	         example, a decoder conforming to the High 4:4:4 profile at a
1819	         certain level must be able to decode bitstreams conforming to
1820	         the Constrained Baseline, Main, High, High 10 or High 4:2:2
1821	         profile at the same or a lower level.
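
         A receiver that wants to recognize equivalent combinations
         can treat each profile-iop entry of Table 5 as a bit pattern
         in which x matches either bit value.  The following
         non-normative sketch does this for the Constrained Baseline
         rows only; the names are hypothetical, and the pattern list
         is copied from the table above.

```python
def iop_matches(profile_iop: int, pattern: str) -> bool:
    """Match an 8-bit profile-iop value against a pattern such as 'x1xx0000'."""
    bits = format(profile_iop, "08b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


# (profile_idc, profile-iop pattern) rows listed for CB in Table 5.
CONSTRAINED_BASELINE_ROWS = [
    (0x42, "x1xx0000"),
    (0x4D, "1xxx0000"),
    (0x58, "11xx0000"),
    (0x64, "1xx00000"),
    (0x6E, "1xx00000"),
    (0x7A, "1xx00000"),
    (0xF4, "1xx00000"),
]


def is_constrained_baseline(profile_idc: int, profile_iop: int) -> bool:
    """True if the combination is one of the CB rows of Table 5."""
    return any(
        idc == profile_idc and iop_matches(profile_iop, pattern)
        for idc, pattern in CONSTRAINED_BASELINE_ROWS
    )
```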

1823	         If the profile-level-id parameter is used to indicate
1824	         properties of a NAL unit stream, it indicates that, to decode
1825	         the stream, the minimum subset of coding tools a decoder has
1826	         to support is the default sub-profile, and the lowest level
1827	         the decoder has to support is the default level.

1829	         If the profile-level-id parameter is used for capability
1830	         exchange or session setup procedure, it indicates the subset
1831	         of coding tools, which is equal to the default sub-profile,
1832	         and the highest level, which is equal to the default level,
1833	         that the codec supports.  All levels lower than the default
1834	         level are also supported by the codec.

1836	            Informative note: Capability exchange and session setup
1837	            procedures should provide means to list the capabilities
1838	            for each supported sub-profile separately.  For example,
1839	            the one-of-N codec selection procedure of the SDP
1840	            Offer/Answer model can be used (section 10.2 of [8]).  The
1841	            one-of-N codec selection procedure may also be used to
1842	            provide different combinations of profile_idc and profile-
1843	            iop that represent the same sub-profile.  When there are
1844	            many different combinations of profile_idc and profile-iop
1845	            that represent the same sub-profile, using the one-of-N
1846	            codec selection procedure may result in a fairly large
1847	            SDP message.  Therefore, a receiver should understand the
1848	            different equivalent combinations of profile_idc and
1849	            profile-iop that represent the same sub-profile, and be
1850	            ready to accept an offer using any of the equivalent
1851	            combinations.

1853	         If no profile-level-id is present, the Baseline Profile
1854	         without additional constraints at Level 1 MUST be implied.

1856	      max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
1857	         These parameters MAY be used to signal the capabilities of a
1858	         receiver implementation. These parameters MUST NOT be used for
1859	         any other purpose.  The profile-level-id parameter MUST be
1860	         present in the same receiver capability description that
1861	         contains any of these parameters.  The level conveyed in the
1862	         value of the profile-level-id parameter MUST be one that the
1863	         receiver is fully capable of supporting.  max-mbps, max-smbps,
1864	         max-fs, max-cpb, max-dpb, and max-br MAY be used to indicate
1865	         capabilities of the receiver that extend the required
1866	         capabilities of the signaled level, as specified below.

1868	         When more than one parameter from the set (max-mbps,
1869	         max-smbps, max-fs, max-cpb, max-dpb, max-br) is present, the
1870	         receiver MUST support all signaled capabilities simultaneously.
1871	         For example, if both max-mbps and max-br are present, the
1872	         signaled level with the extension of both the frame rate and
1873	         bit rate is supported.  That is, the receiver is able to
1874	         decode NAL unit streams in which the macroblock processing
1875	         rate is up to max-mbps (inclusive), the bit rate is up to max-
1876	         br (inclusive), the coded picture buffer size is derived as
1877	         specified in the semantics of the max-br parameter below, and
1878	         other properties comply with the level specified in the value
1879	         of the profile-level-id parameter.

1881	         If a receiver can support all the properties of level A, the
1882	         level specified in the value of the profile-level-id MUST be
1883	         level A (i.e. MUST NOT be lower than level A).  In other words,
1884	         a sender or receiver MUST NOT signal values of max-mbps, max-
1885	         fs, max-cpb, max-dpb, and max-br that meet the requirements of
1886	         a higher level compared to the level specified in the value of
1887	         the profile-level-id parameter.

1889	            Informative note: When the OPTIONAL media type parameters
1890	            are used to signal the properties of a NAL unit stream,
1891	            max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
1892	            are not present, and the value of profile-level-id must
1893	            always be such that the NAL unit stream complies fully with
1894	            the specified profile and level.

1896	      max-mbps: The value of max-mbps is an integer indicating the
1897	         maximum macroblock processing rate in units of macroblocks per
1898	         second.  The max-mbps parameter signals that the receiver is
1899	         capable of decoding video at a higher rate than is required by
1900	         the signaled level conveyed in the value of the profile-level-
1901	         id parameter.  When max-mbps is signaled, the receiver MUST be
1902	         able to decode NAL unit streams that conform to the signaled
1903	         level, with the exception that the MaxMBPS value in Table A-1
1904	         of [1] for the signaled level is replaced with the value of
1905	         max-mbps.  The value of max-mbps MUST be greater than or equal
1906	         to the value of MaxMBPS for the level given in Table A-1 of
1907	         [1].  Senders MAY use this knowledge to send pictures of a
1908	         given size at a higher picture rate than is indicated in the
1909	         signaled level.

1911	      max-smbps: The value of max-smbps is an integer indicating the
1912	         maximum static macroblock processing rate in units of static
1913	         macroblocks per second, under the hypothetical assumption that
1914	         all macroblocks are static macroblocks.  When max-smbps is
1915	         signaled, the MaxMBPS value in Table A-1 of [1] should be
1916	         replaced with the result of the following computation:

1918	         o If the parameter max-mbps is signaled, set a variable
1919	            MaxMacroblocksPerSecond to the value of max-mbps.
1920	            Otherwise, set MaxMacroblocksPerSecond equal to the value
1921	            of MaxMBPS for the level in Table A-1 [1].

1923	         o Set a variable P_non-static to the proportion of non-static
1924	            macroblocks in picture n.

1926	         o Set a variable P_static to the proportion of static
1927	            macroblocks in picture n.

1929	         o The value of MaxMBPS in Table A-1 of [1] should be
1930	            considered by the encoder to be equal to:

1932	            MaxMacroblocksPerSecond * max-smbps / ( P_non-static * max-
1933	            smbps + P_static * MaxMacroblocksPerSecond)

1935	         The encoder should recompute this value for each picture. The
1936	         value of max-smbps MUST be greater than the value of MaxMBPS
1937	         for the level given in Table A-1 of [1].  Senders MAY use this
1938	         knowledge to send pictures of a given size at a higher picture
1939	         rate than is indicated in the signaled level.
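
         The per-picture computation above can be sketched as
         follows.  This is a non-normative illustration; the function
         and argument names are hypothetical, and it assumes that the
         static and non-static proportions of a picture sum to 1.

```python
from typing import Optional


def effective_max_mbps(
    level_max_mbps: float,
    max_smbps: float,
    p_static: float,
    max_mbps: Optional[float] = None,
) -> float:
    """MaxMBPS value an encoder may assume for one picture.

    level_max_mbps is MaxMBPS of the signaled level; max_mbps is the
    value of the max-mbps parameter if it was signaled, else None.
    """
    max_macroblocks_per_second = (
        max_mbps if max_mbps is not None else level_max_mbps
    )
    p_non_static = 1.0 - p_static  # assumption: proportions sum to 1
    return (
        max_macroblocks_per_second
        * max_smbps
        / (p_non_static * max_smbps + p_static * max_macroblocks_per_second)
    )
```

         With no static macroblocks (p_static equal to 0) the result
         is MaxMacroblocksPerSecond; with only static macroblocks
         (p_static equal to 1) it is max-smbps.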

1941	      max-fs: The value of max-fs is an integer indicating the maximum
1942	         frame size in units of macroblocks.  The max-fs parameter
1943	         signals that the receiver is capable of decoding larger
1944	         picture sizes than are required by the signaled level conveyed
1945	         in the value of the profile-level-id parameter.  When max-fs
1946	         is signaled, the receiver MUST be able to decode NAL unit
1947	         streams that conform to the signaled level, with the exception
1948	         that the MaxFS value in Table A-1 of [1] for the signaled
1949	         level is replaced with the value of max-fs.  The value of max-
1950	         fs MUST be greater than or equal to the value of MaxFS for the
1951	         level given in Table A-1 of [1].  Senders MAY use this
1952	         knowledge to send larger pictures at a proportionally lower
1953	         frame rate than is indicated in the signaled level.

1955	      max-cpb: The value of max-cpb is an integer indicating the
1956	         maximum coded picture buffer size in units of 1000 bits for
1957	         the VCL HRD parameters (see A.3.1 item i of [1]) and in units
1958	         of 1200 bits for the NAL HRD parameters (see A.3.1 item j of
1959	         [1]).  The max-cpb parameter signals that the receiver has
1960	         more memory than the minimum amount of coded picture buffer
1961	         memory required by the signaled level conveyed in the value of
1962	         the profile-level-id parameter.  When max-cpb is signaled, the
1963	         receiver MUST be able to decode NAL unit streams that conform
1964	         to the signaled level, with the exception that the MaxCPB
1965	         value in Table A-1 of [1] for the signaled level is replaced
1966	         with the value of max-cpb.  The value of max-cpb MUST be
1967	         greater than or equal to the value of MaxCPB for the level
1968	         given in Table A-1 of [1].  Senders MAY use this knowledge to
1969	         construct coded video streams with greater variation of bit
1970	         rate than can be achieved with the MaxCPB value in Table A-1
1971	         of [1].

1973	            Informative note: The coded picture buffer is used in the
1974	            hypothetical reference decoder (Annex C) of H.264.  The use
1975	            of the hypothetical reference decoder is recommended in
1976	            H.264 encoders to verify that the produced bitstream
1977	            conforms to the standard and to control the output bitrate.
1978	            Thus, the coded picture buffer is conceptually independent
1979	            of any other potential buffers in the receiver, including
1980	            de-interleaving and de-jitter buffers.  The coded picture
1981	            buffer need not be implemented in decoders as specified in
1982	            Annex C of H.264, but rather standard-compliant decoders
1983	            can have any buffering arrangements provided that they can
1984	            decode standard-compliant bitstreams.  Thus, in practice,
1985	            the video decoder's input buffer can be integrated with
1986	            de-interleaving and de-jitter buffers of the receiver.

1988	      max-dpb: The value of max-dpb is an integer indicating the
1989	         maximum decoded picture buffer size in units of 1024 bytes.
1990	         The max-dpb parameter signals that the receiver has more
1991	         memory than the minimum amount of decoded picture buffer
1992	         memory required by the signaled level conveyed in the value of
1993	         the profile-level-id parameter.  When max-dpb is signaled, the
1994	         receiver MUST be able to decode NAL unit streams that conform
1995	         to the signaled level, with the exception that the MaxDPB
1996	         value in Table A-1 of [1] for the signaled level is replaced
1997	         with the value of max-dpb.  Consequently, a receiver that
1998	         signals max-dpb MUST be capable of storing the following
1999	         number of decoded frames, complementary field pairs, and non-
2000	         paired fields in its decoded picture buffer:

2002	            Min(1024 * max-dpb / ( PicWidthInMbs * FrameHeightInMbs *
2003	            256 * ChromaFormatFactor ), 16)

2005	         PicWidthInMbs, FrameHeightInMbs, and ChromaFormatFactor are
2006	         defined in [1].

2008	         The value of max-dpb MUST be greater than or equal to the
2009	         value of MaxDPB for the level given in Table A-1 of [1].
2010	         Senders MAY use this knowledge to construct coded video
2011	         streams with improved compression.
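
         The frame-count formula above can be evaluated as sketched
         below.  This is a non-normative illustration; the function
         name is hypothetical, truncation to a whole number of frames
         is an assumption not stated in the formula, and the value of
         ChromaFormatFactor must be taken from [1].

```python
def dpb_frame_capacity(
    max_dpb: int,
    pic_width_in_mbs: int,
    frame_height_in_mbs: int,
    chroma_format_factor: float,
) -> int:
    """Frames a receiver signaling max-dpb must be able to store."""
    frames = (1024 * max_dpb) / (
        pic_width_in_mbs * frame_height_in_mbs * 256 * chroma_format_factor
    )
    # Assumption: a fractional frame cannot be stored, so truncate.
    return min(int(frames), 16)
```

         For instance, with illustrative values of max-dpb equal to
         891, a picture of 22 x 18 macroblocks, and a
         ChromaFormatFactor of 1.5, the sketch yields 6 frames.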

2013	            Informative note: This parameter was added primarily to
2014	            complement a similar codepoint in the ITU-T Recommendation
2015	            H.245, so as to facilitate signaling gateway designs.  The
2016	            decoded picture buffer stores reconstructed samples.  There
2017	            is no relationship between the size of the decoded picture
2018	            buffer and the buffers used in RTP, especially de-
2019	            interleaving and de-jitter buffers.

2021	      max-br: The value of max-br is an integer indicating the maximum
2022	         video bit rate in units of 1000 bits per second for the VCL
2023	         HRD parameters (see A.3.1 item i of [1]) and in units of 1200
2024	         bits per second for the NAL HRD parameters (see A.3.1 item j
2025	         of [1]).

2027	         The max-br parameter signals that the video decoder of the
2028	         receiver is capable of decoding video at a higher bit rate
2029	         than is required by the signaled level conveyed in the value
2030	         of the profile-level-id parameter.

2032	         When max-br is signaled, the video codec of the receiver MUST
2033	         be able to decode NAL unit streams that conform to the
2034	         signaled level, conveyed in the profile-level-id parameter,
2035	         with the following exceptions in the limits specified by the
2036	         level:

2038	         o The value of max-br replaces the MaxBR value of the signaled
2039	            level (in Table A-1 of [1]).

2041	         o When the max-cpb parameter is not present, the result of the
2042	            following formula replaces the value of MaxCPB in Table A-1
2043	            of [1]: (MaxCPB of the signaled level) * max-br / (MaxBR of
2044	            the signaled level).

2046	         For example, if a receiver signals capability for Level 1.2
2047	         with max-br equal to 1550, this indicates a maximum video
2048	         bitrate of 1550 kbits/sec for VCL HRD parameters, a maximum
2049	         video bitrate of 1860 kbits/sec for NAL HRD parameters, and a
2050	         CPB size of 4036458 bits (1550000 / 384000 * 1000 * 1000).
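
         The arithmetic of this example can be reproduced with the
         non-normative sketch below.  The function names are
         hypothetical; the Level 1.2 values MaxCPB = 1000 and
         MaxBR = 384 are those implied by the calculation above, and
         the CPB size is computed in the VCL HRD unit of 1000 bits.

```python
def derived_cpb_bits(
    level_max_cpb: int,
    level_max_br: int,
    max_br: int,
    unit_bits: int = 1000,
) -> int:
    """CPB size in bits when max-br is signaled and max-cpb is absent.

    Implements (MaxCPB of the level) * max-br / (MaxBR of the level),
    scaled by the HRD unit and truncated to whole bits.
    """
    return level_max_cpb * max_br * unit_bits // level_max_br


def bit_rate_bps(max_br: int, unit_bits: int) -> int:
    """Bit rate in bits per second for a given HRD unit (1000 or 1200)."""
    return max_br * unit_bits
```

         Here derived_cpb_bits(1000, 384, 1550) gives 4036458, and
         bit_rate_bps(1550, 1200) gives 1860000, matching the figures
         in the example.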

2052	         The value of max-br MUST be greater than or equal to the value
2053	         of MaxBR for the signaled level given in Table A-1 of [1].

2055	         Senders MAY use this knowledge to send higher bitrate video as
2056	         allowed in the level definition of Annex A of H.264, to
2057	         achieve improved video quality.

2059	            Informative note: This parameter was added primarily to
2060	            complement a similar codepoint in the ITU-T Recommendation
2061	            H.245, so as to facilitate signaling gateway designs.  No
2062	            assumption can be made from the value of this parameter
2063	            that the network is capable of handling such bit rates at
2064	            any given time.  In particular, no conclusion can be drawn
2065	            that the signaled bit rate is possible under congestion
2066	            control constraints.

2068	      redundant-pic-cap:
2069	         This parameter signals the capabilities of a receiver
2070	         implementation.  When equal to 0, the parameter indicates that
2071	         the receiver makes no attempt to use redundant coded pictures
2072	         to correct incorrectly decoded primary coded pictures.  When
2073	         equal to 0, the receiver is not capable of using redundant
2074	         slices; therefore, a sender SHOULD avoid sending redundant
2075	         slices to save bandwidth.  When equal to 1, the receiver is
2076	         capable of decoding any such redundant slice that covers a
2077	         corrupted area in a primary decoded picture (at least partly),
2078	         and therefore a sender MAY send redundant slices.  When the
2079	         parameter is not present, then a value of 0 MUST be used for
2080	         redundant-pic-cap.  When present, the value of redundant-pic-
2081	         cap MUST be either 0 or 1.

2083	         When the profile-level-id parameter is present in the same
2084	         signaling as the redundant-pic-cap parameter, and the profile
2085	         indicated in profile-level-id is such that it disallows the
2086	         use of redundant coded pictures (e.g., Main Profile), the
2087	         value of redundant-pic-cap MUST be equal to 0.  When a
2088	         receiver indicates redundant-pic-cap equal to 0, the received
2089	         stream SHOULD NOT contain redundant coded pictures.

2091	            Informative note: Even if redundant-pic-cap is equal to 0,
2092	            the decoder is able to ignore redundant coded pictures
2093	            provided that the decoder supports such a profile (Baseline,
2094	            Extended) in which redundant coded pictures are allowed.

2096	            Informative note: Even if redundant-pic-cap is equal to 1,
2097	            the receiver may also choose other error concealment
2098	            strategies to replace or complement decoding of redundant
2099	            slices.

2101	      sprop-parameter-sets:
2102	         This parameter MAY be used to convey any sequence and picture
2103	         parameter set NAL units (herein referred to as the initial
2104	         parameter set NAL units) that can be placed in the NAL unit
2105	         stream to precede any other NAL units in decoding order.  The
2106	         parameter MUST NOT be used to indicate codec capability in any
2107	         capability exchange procedure.  The value of the parameter is
2108	         a comma (',') separated list of base64 [7] representations of
2109	         parameter set NAL units as specified in sections 7.3.2.1 and
2110	         7.3.2.2 of [1].  Note that the number of bytes in a parameter
2111	         set NAL unit is typically less than 10, but a picture
2112	         parameter set NAL unit can contain several hundred bytes.

2114	            Informative note: When several payload types are offered in
2115	            the SDP Offer/Answer model, each with its own sprop-
2116	            parameter-sets parameter, then the receiver cannot assume
2117	            that those parameter sets do not use conflicting storage
2118	            locations (i.e., identical values of parameter set
2119	            identifiers).  Therefore, a receiver should buffer all
2120	            sprop-parameter-sets and make them available to the decoder
2121	            instance that decodes a certain payload type.

2123	         The "sprop-parameter-sets" parameter MUST only contain
2124	         parameter sets that are conforming to the profile-level-id,
2125	         i.e., the subset of coding tools indicated by any of the
2126	         parameter sets MUST be equal to the default sub-profile, and
2127	         the level indicated by any of the parameter sets MUST be equal
2128	         to the default level.
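
         A receiver could decode the value of sprop-parameter-sets
         as sketched below.  This is a non-normative illustration;
         the function names are hypothetical, and the type check
         relies on the H.264 NAL unit types 7 (sequence parameter
         set) and 8 (picture parameter set).

```python
import base64


def parse_sprop_parameter_sets(value: str) -> list[bytes]:
    """Split the comma-separated list and base64-decode each NAL unit."""
    return [
        base64.b64decode(item, validate=True)
        for item in value.split(",")
        if item
    ]


def nal_unit_type(nal_unit: bytes) -> int:
    """Low five bits of the NAL unit header octet."""
    return nal_unit[0] & 0x1F


def is_parameter_set(nal_unit: bytes) -> bool:
    """True for sequence (7) and picture (8) parameter set NAL units."""
    return nal_unit_type(nal_unit) in (7, 8)
```

         For the illustrative value "Z0IACpZTBYmI,aMljiA==", the
         sketch returns two NAL units: a sequence parameter set
         followed by a picture parameter set.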

2130	      sprop-level-parameter-sets:
2131	         This parameter MAY be used to convey any sequence and picture
2132	         parameter set NAL units (herein referred to as the initial
2133	         parameter set NAL units) that can be placed in the NAL unit
2134	         stream to precede any other NAL units in decoding order and
2135	         that are associated with one or more levels lower than the
2136	         default level.  The parameter MUST NOT be used to indicate
2137	         codec capability in any capability exchange procedure.

2139	         The sprop-level-parameter-sets parameter contains parameter
2140	         sets for one or more levels which are lower than the default
2141	         level.  All parameter sets associated with one level are
2142	         clustered and prefixed with a three-byte field which has the
2143	         same syntax as profile-level-id.  This enables the receiver to
2144	         install the parameter sets for one level and discard the rest.
2145	         The three-byte field is named PLId, and all parameter sets
2146	         associated with one level are named PSL, which has the same
2147	         syntax as sprop-parameter-sets.  Parameter sets for each level
2148	         are represented in the form of PLId:PSL, i.e., PLId followed
2149	         by a colon (':') and the base64 [7] representation of the
2150	         initial parameter set NAL units for the level.  Each pair of
2151	         PLId:PSL is also separated by a colon.  Note that a PSL can
2152	         contain multiple parameter sets for that level, separated with
2153	         commas (',').

2155	         The subset of coding tools indicated by each PLId field MUST
2156	         be equal to the default sub-profile, and the level indicated
2157	         by each PLId field MUST be lower than the default level.  All
2158	         sequence parameter sets contained in each PSL MUST have the
2159	         three bytes from profile_idc to level_idc, inclusive, equal to
2160	         the preceding PLId.
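
         The PLId:PSL structure can be parsed as sketched below.
         This is a non-normative illustration with hypothetical
         function names.  It relies on the fact that neither the
         colon nor the comma is part of the base64 alphabet, so both
         can be used safely as separators.

```python
import base64


def parse_sprop_level_parameter_sets(value: str) -> dict[str, list[bytes]]:
    """Split 'PLId:PSL:PLId:PSL...' into {PLId: [parameter set NAL units]}."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected an even number of colon-separated fields")
    levels: dict[str, list[bytes]] = {}
    for plid, psl in zip(fields[0::2], fields[1::2]):
        if len(bytes.fromhex(plid)) != 3:
            raise ValueError("PLId must encode exactly three bytes")
        levels[plid.lower()] = [
            base64.b64decode(item, validate=True)
            for item in psl.split(",")
            if item
        ]
    return levels


def sps_matches_plid(nal_unit: bytes, plid: str) -> bool:
    """True if a sequence parameter set carries the same three bytes as PLId.

    Byte 0 is the NAL unit header; bytes 1 to 3 are profile_idc, the
    constraint-flag byte, and level_idc.
    """
    return nal_unit[1:4] == bytes.fromhex(plid)
```

         A receiver would keep only the entry whose PLId corresponds
         to the level it has selected and discard the others.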

2162	            Informative note: This parameter allows for efficient level
2163	            downgrade in SDP Offer/Answer and out-of-band transport of
2164	            parameter sets, simultaneously.

2166	      use-level-src-parameter-sets:
2167	         This parameter MAY be used to indicate a receiver capability.
2168	         The value MAY be equal to either 0 or 1.  When the parameter
2169	         is not present, the value MUST be inferred to be equal to 0.
2170	         The value 0 indicates that the receiver does not understand
2171	         the sprop-level-parameter-sets parameter, and does not
2172	         understand the "fmtp" source attribute as specified in section
2173	         6.3 of [9], and will ignore sprop-level-parameter-sets when
2174	         present, and will ignore sprop-parameter-sets when conveyed
2175	         using the "fmtp" source attribute.  The value 1 indicates that
2176	         the receiver understands the sprop-level-parameter-sets
2177	         parameter, and understands the "fmtp" source attribute as
2178	         specified in section 6.3 of [9], and is capable of using
2179	         parameter sets contained in the sprop-level-parameter-sets or
2180	         contained in the sprop-parameter-sets that is conveyed using
2181	         the "fmtp" source attribute.

2183	            Informative note: An RFC 3984 receiver does not understand
2184	            sprop-level-parameter-sets, use-level-src-parameter-sets,
2185	            or the "fmtp" source attribute as specified in section 6.3
2186	            of [9].  Therefore, during SDP Offer/Answer, an RFC 3984
2187	            receiver as the answerer will simply ignore sprop-level-
2188	            parameter-sets, when present in an offer, and sprop-
2189	            parameter-sets, when conveyed using the "fmtp" source
2190	            attribute as specified in section 6.3 of [9].  Assume that
2191	            the offered payload type was accepted at a level lower than
2192	            the default level.  If the offered payload type included
2193	            sprop-level-parameter-sets or included sprop-parameter-sets
2194	            conveyed using the "fmtp" source attribute, and the offerer
2195	            sees that the answerer has not included use-level-src-
2196	            parameter-sets equal to 1 in the answer, the offerer
2197	            knows that in-band transport of parameter sets is needed.

2199	      in-band-parameter-sets:
2200	         This parameter MAY be used to indicate a receiver capability.
2201	         The value MAY be equal to either 0 or 1.  The value 1
2202	         indicates that the receiver discards out-of-band parameter sets
2203	         in sprop-parameter-sets and sprop-level-parameter-sets; therefore,
2204	         the sender MUST transmit all parameter sets in-band.  The
2205	         value 0 indicates that the receiver utilizes out-of-band
2206	         parameter sets included in sprop-parameter-sets and sprop-
2207	         level-parameter-sets.  However, in this case, the sender MAY
2208	         still choose to send parameter sets in-band.  When in-band-
2209	         parameter-sets is equal to 1, use-level-src-parameter-sets
2210	         MUST NOT be present or MUST be equal to 0.  When the parameter
2211	         is not present, this receiver capability is not specified, and
2212	         therefore the sender MAY send out-of-band parameter sets only,
2213	         or it MAY send in-band parameter sets only, or it MAY send
2214	         both.

2216	      packetization-mode:
2217	         This parameter signals the properties of an RTP payload type
2218	         or the capabilities of a receiver implementation.  Only a
2219	         single configuration point can be indicated; thus, when
2220	         capabilities to support more than one packetization-mode are
2221	         declared, multiple configuration points (RTP payload types)
2222	         must be used.

2224	         When the value of packetization-mode is equal to 0 or
2225	         packetization-mode is not present, the single NAL mode, as
2226	         defined in section 6.2 of RFC 3984, MUST be used.  This mode
2227	         is in use in standards using ITU-T Recommendation H.241 [3]
2228	         (see section 12.1).  When the value of packetization-mode is
2229	         equal to 1, the non-interleaved mode, as defined in section
2230	         6.3 of RFC 3984, MUST be used.  When the value of
2231	         packetization-mode is equal to 2, the interleaved mode, as
2232	         defined in section 6.4 of RFC 3984, MUST be used.  The value
2233	         of packetization-mode MUST be an integer in the range of 0 to
2234	         2, inclusive.

2236	      sprop-interleaving-depth:
2237	         This parameter MUST NOT be present when packetization-mode is
2238	         not present or the value of packetization-mode is equal to 0
2239	         or 1.  This parameter MUST be present when the value of
2240	         packetization-mode is equal to 2.

2242	         This parameter signals the properties of an RTP packet stream.
2243	         It specifies the maximum number of VCL NAL units that precede
2244	         any VCL NAL unit in the RTP packet stream in transmission
2245	         order and follow the VCL NAL unit in decoding order.
2246	         Consequently, it is guaranteed that receivers can reconstruct
2247	         NAL unit decoding order when the buffer size for NAL unit
2248	         decoding order recovery is at least the value of sprop-
2249	         interleaving-depth + 1 in terms of VCL NAL units.

2251	         The value of sprop-interleaving-depth MUST be an integer in
2252	         the range of 0 to 32767, inclusive.

2254	      sprop-deint-buf-req:
2255	         This parameter MUST NOT be present when packetization-mode is
2256	         not present or the value of packetization-mode is equal to 0
2257	         or 1.  It MUST be present when the value of packetization-mode
2258	         is equal to 2.

2260	         sprop-deint-buf-req signals the required size of the de-
2261	         interleaving buffer for the RTP packet stream.  The value of
2262	         the parameter MUST be greater than or equal to the maximum
2263	         buffer occupancy (in units of bytes) required in such a de-
2264	         interleaving buffer that is specified in section 7.2 of RFC
2265	         3984.  It is guaranteed that receivers can perform the de-
2266	         interleaving of interleaved NAL units into NAL unit decoding
2267	         order, when the de-interleaving buffer size is at least the
2268	         value of sprop-deint-buf-req in terms of bytes.

2270	         The value of sprop-deint-buf-req MUST be an integer in the
2271	         range of 0 to 4294967295, inclusive.

2273	            Informative note: sprop-deint-buf-req indicates the
2274	            required size of the de-interleaving buffer only.  When
2275	            network jitter can occur, an appropriately sized jitter
2276	            buffer has to be provisioned for as well.
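
         The presence and range rules stated so far for
         packetization-mode, sprop-interleaving-depth, and
         sprop-deint-buf-req can be checked mechanically.  The Python
         sketch below is illustrative; the function name and the
         dictionary representation of the "a=fmtp" parameters are
         assumptions.

```python
def check_interleaving_params(fmtp):
    """Return a list of rule violations for a dict of fmtp parameters."""
    problems = []
    # An absent packetization-mode is treated the same as the value 0.
    mode = int(fmtp.get("packetization-mode", 0))
    if mode not in (0, 1, 2):
        problems.append("packetization-mode out of range")
    upper_bounds = {
        "sprop-interleaving-depth": 32767,
        "sprop-deint-buf-req": 4294967295,
    }
    for name, upper in upper_bounds.items():
        if mode == 2 and name not in fmtp:
            problems.append(name + " MUST be present")
        elif mode != 2 and name in fmtp:
            problems.append(name + " MUST NOT be present")
        elif name in fmtp and not 0 <= int(fmtp[name]) <= upper:
            problems.append(name + " out of range")
    return problems
```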

2278	      deint-buf-cap:
2279	         This parameter signals the capabilities of a receiver
2280	         implementation and indicates the amount of de-interleaving
2281	         buffer space in units of bytes that the receiver has available
2282	         for reconstructing the NAL unit decoding order.  A receiver is
2283	         able to handle any stream for which the value of the sprop-
2284	         deint-buf-req parameter is smaller than or equal to this
2285	         parameter.

2287	         If the parameter is not present, then a value of 0 MUST be
2288	         used for deint-buf-cap.  The value of deint-buf-cap MUST be an
2289	         integer in the range of 0 to 4294967295, inclusive.

2291	            Informative note: deint-buf-cap indicates the maximum
2292	            possible size of the de-interleaving buffer of the receiver
2293	            only.  When network jitter can occur, an appropriately
2294	            sized jitter buffer has to be provisioned for as well.

2296	      sprop-init-buf-time:
2297	         This parameter MAY be used to signal the properties of an RTP
2298	         packet stream.  The parameter MUST NOT be present, if the
2299	         value of packetization-mode is equal to 0 or 1.

2301	         The parameter signals the initial buffering time that a
2302	         receiver MUST wait before starting decoding to recover the NAL
2303	         unit decoding order from the transmission order.  The
2304	         parameter is the maximum value of (decoding time of the NAL
2305	         unit - transmission time of a NAL unit), assuming reliable and
2306	         instantaneous transmission, the same timeline for transmission
2307	         and decoding, and that decoding starts when the first packet
2308	         arrives.

2310	         An example of specifying the value of sprop-init-buf-time
2311	         follows.  A NAL unit stream is sent in the following
2312	         interleaved order, in which the value corresponds to the
2313	         decoding time and the transmission order is from left to right:

2315	            0  2  1  3  5  4  6  8  7 ...

2317	         Assuming a steady transmission rate of NAL units, the
2318	         transmission times are:

2320	            0  1  2  3  4  5  6  7  8 ...

2322	         Subtracting the decoding time from the transmission time
2323	         column-wise results in the following series:

2325	            0 -1  1  0 -1  1  0 -1  1 ...

2327	         Thus, in terms of intervals of NAL unit transmission times,
2328	         the value of sprop-init-buf-time in this example is 1.  The
2329	         parameter is coded as a non-negative base10 integer
2330	         representation in clock ticks of a 90-kHz clock.  If the
2331	         parameter is not present, then no initial buffering time value
2332	         is defined.  Otherwise the value of sprop-init-buf-time MUST
2333	         be an integer in the range of 0 to 4294967295, inclusive.
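
         The arithmetic of this example can be reproduced with the
         short Python sketch below.  It follows the subtraction order
         of the worked example (transmission time minus decoding
         time).  The rate of 30 NAL units per second used to convert
         the result into 90-kHz clock ticks is an assumption made only
         for illustration.

```python
def init_buf_intervals(decoding_times):
    """Return the column-wise differences and their maximum."""
    # Steady transmission: the k-th NAL unit is sent at time k.
    transmission_times = range(len(decoding_times))
    differences = [t - d for t, d in zip(transmission_times, decoding_times)]
    return differences, max(differences)


differences, intervals = init_buf_intervals([0, 2, 1, 3, 5, 4, 6, 8, 7])

# Hypothetical sending rate, used only to express the result in ticks.
NAL_UNITS_PER_SECOND = 30
sprop_init_buf_time = intervals * 90000 // NAL_UNITS_PER_SECOND
```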

2335	         In addition to the signaled sprop-init-buf-time, receivers
2336	         SHOULD take into account the transmission delay jitter
2337	         buffering, including buffering for the delay jitter caused by
2338	         mixers, translators, gateways, proxies, traffic-shapers, and
2339	         other network elements.

2341	      sprop-max-don-diff:
2342	         This parameter MAY be used to signal the properties of an RTP
2343	         packet stream.  It MUST NOT be used to signal transmitter or
2344	         receiver or codec capabilities.  The parameter MUST NOT be
2345	         present if the value of packetization-mode is equal to 0 or 1.
2346	         sprop-max-don-diff is an integer in the range of 0 to 32767,
2347	         inclusive.  If sprop-max-don-diff is not present, the value of
2348	         the parameter is unspecified.  sprop-max-don-diff is
2349	         calculated as follows:

2351	            sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
2352	            for any i and any j>i,

2354	         where i and j indicate the index of the NAL unit in the
2355	         transmission order and AbsDON denotes a decoding order number
2356	         of the NAL unit that does not wrap around to 0 after 65535.
2357	         In other words, AbsDON is calculated as follows: Let m and n
2358	         be consecutive NAL units in transmission order.  For the very
2359	         first NAL unit in transmission order (whose index is 0),
2360	         AbsDON(0) = DON(0).  For other NAL units, AbsDON is calculated
2361	         as follows:

2363	            If DON(m) == DON(n), AbsDON(n) = AbsDON(m)

2365	            If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
2366	              AbsDON(n) = AbsDON(m) + DON(n) - DON(m)

2368	            If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
2369	              AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)

2371	            If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
2372	              AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))

2374	            If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
2375	              AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))

2377	         where DON(i) is the decoding order number of the NAL unit
2378	         having index i in the transmission order.  The decoding order
2379	         number is specified in section 5.5 of RFC 3984.
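
         The AbsDON rules and the sprop-max-don-diff formula above can
         be sketched in Python as follows.  The function names are
         illustrative, and the input is assumed to be a non-empty list
         of 16-bit DON values in transmission order.

```python
def abs_don_sequence(dons):
    """Unwrap 16-bit DON values into AbsDON values."""
    abs_dons = [dons[0]]  # AbsDON(0) = DON(0)
    for m, n in zip(dons, dons[1:]):
        if m == n:
            step = 0
        elif m < n and n - m < 32768:
            step = n - m
        elif m > n and m - n >= 32768:
            step = 65536 - m + n       # DON wrapped around forwards
        elif m < n and n - m >= 32768:
            step = -(m + 65536 - n)    # DON wrapped around backwards
        else:                          # m > n and m - n < 32768
            step = -(m - n)
        abs_dons.append(abs_dons[-1] + step)
    return abs_dons


def sprop_max_don_diff(dons):
    """Return max{AbsDON(i) - AbsDON(j)} over all i and all j > i."""
    abs_dons = abs_don_sequence(dons)
    best = None
    highest_so_far = abs_dons[0]
    for value in abs_dons[1:]:
        diff = highest_so_far - value
        best = diff if best is None else max(best, diff)
        highest_so_far = max(highest_so_far, value)
    return best  # None when fewer than two NAL units are given
```

         Tracking the highest AbsDON seen so far avoids comparing every
         pair of NAL units while yielding the same maximum.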

2381	            Informative note: Receivers may use sprop-max-don-diff to
2382	            trigger which NAL units in the receiver buffer can be
2383	            passed to the decoder.

2385	      max-rcmd-nalu-size:
2386	         This parameter MAY be used to signal the capabilities of a
2387	         receiver.  The parameter MUST NOT be used for any other
2388	         purposes.  The value of the parameter indicates the largest
2389	         NALU size in bytes that the receiver can handle efficiently.
2390	         The parameter value is a recommendation, not a strict upper
2391	         boundary.  The sender MAY create larger NALUs but must be
2392	         aware that the handling of these may come at a higher cost
2393	         than NALUs conforming to the limitation.

2395	         The value of max-rcmd-nalu-size MUST be an integer in the
2396	         range of 0 to 4294967295, inclusive.  If this parameter is not
2397	         specified, no known limitation to the NALU size exists.
2398	         Senders still have to consider the MTU size available between
2399	         the sender and the receiver and SHOULD run MTU discovery for
2400	         this purpose.

2402	         This parameter is motivated by, for example, an IP to H.223
2403	         video telephony gateway, where NALUs smaller than the H.223
2404	         transport data unit will be more efficient.  A gateway may
2405	         terminate IP; thus, MTU discovery will normally not work
2406	         beyond the gateway.

2408	            Informative note: Setting this parameter to a lower than
2409	            necessary value may have a negative impact.

2411	      sar-understood:
2412	         This parameter MAY be used to indicate a receiver capability
2413	         and not anything else.  The parameter indicates the maximum
2414	         value of aspect_ratio_idc (specified in [1]) smaller than 255
2415	         that the receiver understands.  Table E-1 of [1] specifies
2416	         aspect_ratio_idc equal to 0 as "unspecified", 1 to 16,
2417	         inclusive, as specific Sample Aspect Ratios (SARs), 17 to 254,
2418	         inclusive, as "reserved", and 255 as the Extended SAR, for
2419	         which SAR width and SAR height are explicitly signaled.
2420	         Therefore, a receiver with a decoder according to [1]
2421	         understands aspect_ratio_idc in the range of 1 to 16,
2422	         inclusive, and aspect_ratio_idc equal to 255, in the sense
2423	         that the receiver knows what exactly the SAR is.  For such a
2424	         receiver, the value of sar-understood is 16.  If in the future
2425	         Table E-1 of [1] is extended, e.g., such that the SAR for
2426	         aspect_ratio_idc equal to 17 is specified, then for a receiver
2427	         with a decoder that understands the extension, the value of
2428	         sar-understood is 17.  For a receiver with a decoder according
2429	         to the 2003 version of [1], the value of sar-understood is 13,
2430	         as the minimum reserved aspect_ratio_idc therein is 14.

2432	         When sar-understood is not present, the value MUST be inferred
2433	         to be equal to 13.

2435	      sar-supported:
2436	         This parameter MAY be used to indicate a receiver capability
2437	         and not anything else.  The value of this parameter is an
2438	         integer in the range of 1 to sar-understood, inclusive, or
2439	         equal to 255.  The value of sar-supported equal to N smaller
2440	         than 255 indicates that the receiver supports all the SARs
2441	         corresponding to H.264 aspect_ratio_idc values (see Table E-1
2442	         of [1]) in the range from 1 to N, inclusive, without geometric
2443	         distortion.  The value of sar-supported equal to 255 indicates
2444	         that the receiver supports all sample aspect ratios which are
2445	         expressible using two 16-bit integer values as the numerator
2446	         and denominator, i.e., those that are expressible using the
2447	         H.264 aspect_ratio_idc value of 255 (Extended_SAR, see Table
2448	         E-1 of [1]), without geometric distortion.

2450	         H.264 compliant encoders SHOULD NOT send an aspect_ratio_idc
2451	         equal to 0, or an aspect_ratio_idc larger than sar-understood
2452	         and smaller than 255.  H.264 compliant encoders SHOULD send an
2453	         aspect_ratio_idc that the receiver is able to display without
2454	         geometrical distortion.  However, H.264 compliant encoders MAY
2455	         choose to send pictures using any SAR.

2457	         Note that the actual sample aspect ratio or extended sample
2458	         aspect ratio, when present, of the stream is conveyed in the
2459	         Video Usability Information (VUI) part of the sequence
2460	         parameter set.
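
         The encoder-side recommendation on aspect_ratio_idc values can
         be expressed as a small check.  This Python sketch is
         illustrative; the function name is hypothetical, and the
         default of 13 mirrors the value inferred when sar-understood
         is not present.

```python
def should_avoid_aspect_ratio_idc(aspect_ratio_idc, sar_understood=13):
    """True if an encoder SHOULD NOT send this aspect_ratio_idc."""
    # 0 means "unspecified"; values above sar-understood but below 255
    # are ones the receiver has not declared that it understands.
    return aspect_ratio_idc == 0 or sar_understood < aspect_ratio_idc < 255
```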

2462	      Encoding considerations:
2463	         This type is only defined for transfer via RTP (RFC 3550).

2465	      Security considerations:
2466	         See section 9 of RFC xxxx.

2468	      Public specification:
2469	         Please refer to RFC xxxx and its section 15.

2471	      Additional information:
2472	         None

2474	      File extensions:     none

2476	      Macintosh file type code: none

2478	      Object identifier or OID: none
2479	      Person & email address to contact for further information:
2480	         Ye-Kui Wang, yekuiwang@huawei.com

2482	      Intended usage:      COMMON

2484	      Author:
2485	         Ye-Kui Wang, yekuiwang@huawei.com

2487	      Change controller:
2488	         IETF Audio/Video Transport working group delegated from the
2489	         IESG.

2491	8.2. SDP Parameters

2493	8.2.1. Mapping of Payload Type Parameters to SDP

2495	   The media type video/H264 string is mapped to fields in the Session
2496	   Description Protocol (SDP) [6] as follows:

2498	   o  The media name in the "m=" line of SDP MUST be video.

2500	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
2501	      media subtype).

2503	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2505	   o  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-
2506	      smbps", "max-fs", "max-cpb", "max-dpb", "max-br", "redundant-pic-
2507	      cap", "use-level-src-parameter-sets", "in-band-parameter-sets",
2508	      "packetization-mode", "sprop-interleaving-depth", "sprop-deint-
2509	      buf-req", "deint-buf-cap", "sprop-init-buf-time", "sprop-max-don-
2510	      diff", "max-rcmd-nalu-size", "sar-understood", and "sar-supported",
2511	      when present, MUST be included in the "a=fmtp" line of SDP.  These
2512	      parameters are expressed as a media type string, in the form of a
2513	      semicolon separated list of parameter=value pairs.

2515	   o  The OPTIONAL parameters "sprop-parameter-sets" and "sprop-level-
2516	      parameter-sets", when present, MUST be included in the "a=fmtp"
2517	      line of SDP or conveyed using the "fmtp" source attribute as
2518	      specified in section 6.3 of [9].  For a particular media format
2519	      (i.e., RTP payload type), a "sprop-parameter-sets" or "sprop-
2520	      level-parameter-sets" MUST NOT be both included in the "a=fmtp"
2521	      line of SDP and conveyed using the "fmtp" source attribute.  When
2522	      included in the "a=fmtp" line of SDP, these parameters are
2523	      expressed as a media type string, in the form of a semicolon
2524	      separated list of parameter=value pairs.  When conveyed using the
2525	      "fmtp" source attribute, these parameters are only associated with
2526	      the given source and payload type as parts of the "fmtp" source
2527	      attribute.

2529	         Informative note: Conveyance of "sprop-parameter-sets" and
2530	         "sprop-level-parameter-sets" using the "fmtp" source attribute
2531	         allows for out-of-band transport of parameter sets in
2532	         topologies like Topo-Video-switch-MCU [29].

2534	   An example of media representation in SDP is as follows (Baseline
2535	   Profile, Level 3.0, some of the constraints of the Main profile may
2536	   not be obeyed):

2538	      m=video 49170 RTP/AVP 98
2539	      a=rtpmap:98 H264/90000
2540	      a=fmtp:98 profile-level-id=42A01E;
2541	                packetization-mode=1;
2542	                sprop-parameter-sets=<parameter sets data>
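
   As an illustration, the following Python sketch parses an "a=fmtp"
   line like the one above and splits profile-level-id into its three
   bytes.  It assumes the attribute arrives as a single unfolded line;
   the function names are hypothetical.

```python
def parse_fmtp(line):
    """Return (payload type, {parameter: value}) for one a=fmtp line."""
    header, _, body = line.strip().partition(" ")
    if not header.startswith("a=fmtp:"):
        raise ValueError("not an fmtp attribute")
    payload_type = int(header[len("a=fmtp:"):])
    params = {}
    for pair in body.split(";"):
        pair = pair.strip()
        if pair:
            # Split on the first '=' only: base64 values may end in '='.
            name, _, value = pair.partition("=")
            params[name] = value
    return payload_type, params


def split_profile_level_id(hex_text):
    """Decode the three bytes carried in profile-level-id."""
    raw = bytes.fromhex(hex_text)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be three bytes")
    return {"profile_idc": raw[0], "profile-iop": raw[1], "level_idc": raw[2]}
```

   For the example above, profile_idc is 66 (Baseline) and level_idc
   is 30 (Level 3.0).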

2544	8.2.2. Usage with the SDP Offer/Answer Model

2546	   When H.264 is offered over RTP using SDP in an Offer/Answer model [8]
2547	   for negotiation for unicast usage, the following limitations and
2548	   rules apply:

2550	   o  The parameters identifying a media format configuration for H.264
2551	      are "profile-level-id" and "packetization-mode", when present.
2552	      These media format configuration parameters (except for the level
2553	      part of "profile-level-id") MUST be used symmetrically; i.e., the
2554	      answerer MUST either maintain all configuration parameters or
2555	      remove the media format (payload type) completely, if one or more
2556	      of the parameter values are not supported.  Note that the level
2557	      part of "profile-level-id" includes level_idc, and, for indication
2558	      of level 1b when profile_idc is equal to 66, 77 or 88, bit 4
2559	      (constraint_set3_flag) of profile-iop.  The level part of
2560	      "profile-level-id" is downgradable, i.e. the answerer MUST
2561	      maintain the same or a lower level or remove the media format
2562	      (payload type) completely.

2564	         Informative note: The requirement for symmetric use applies
2565	         only for the above media format configuration parameters
2566	         excluding the level part of "profile-level-id", and not for
2567	         the other stream properties and capability parameters.

2569	         Informative note: In H.264 [1], all the levels except for
2570	         level 1b are equal to the value of level_idc divided by 10.
2571	         Level 1b is a level higher than level 1.0 but lower than level
2572	         1.1, and is signaled in an ad-hoc manner, because the
2573	         level was specified after level 1.0 and level 1.1.  For the
2574	         Baseline, Main and Extended profiles (with profile_idc equal
2575	         to 66, 77 and 88, respectively), level 1b is indicated by
2576	         level_idc equal to 11 (i.e. same as level 1.1) and
2577	         constraint_set3_flag equal to 1.  For other profiles, level 1b
2578	         is indicated by level_idc equal to 9 (but note that level 1b
2579	         for these profiles is still higher than level 1, which has
2580	         level_idc equal to 10, and lower than level 1.1).  In SDP
2581	         Offer/Answer, an answer to an offer may indicate a level equal
2582	         to or lower than the level indicated in the offer.  Due to the
2583	         ad-hoc indication of level 1b, offerers and answerers must
2584	         check the value of bit 4 (constraint_set3_flag) of the middle
2585	         octet of the parameter "profile-level-id", when profile_idc is
2586	         equal to 66, 77 or 88 and level_idc is equal to 11.

2588	      To simplify handling and matching of these configurations, the
2589	      same RTP payload type number used in the offer SHOULD also be
2590	      used in the answer, as specified in [8].  An answer MUST NOT
2591	      contain a payload type number used in the offer unless the
2592	      configuration is exactly the same as in the offer or the
2593	      configuration in the answer only differs from that in the offer
2594	      with a level lower than the default level offered.

2596	         Informative note: When an offerer receives an answer, it has
2597	         to compare payload types not declared in the offer based on
2598	         the media type (i.e., video/H264) and the above media
2599	         configuration parameters with any payload types it has already
2600	         declared.  This will enable it to determine whether the
2601	         configuration in question is new or if it is equivalent to
2602	         a configuration already offered, since a different payload type
2603	         number may be used in the answer.

2605	   o  The parameters "sprop-deint-buf-req", "sprop-interleaving-depth",
2606	      "sprop-max-don-diff", and "sprop-init-buf-time" describe the
2607	      properties of the RTP packet stream that the offerer or answerer
2608	      is sending for the media format configuration.  This differs from
2609	      the normal usage of the Offer/Answer parameters: normally such
2610	      parameters declare the properties of the stream that the offerer
2611	      or the answerer is able to receive.  When dealing with H.264, the
2612	      offerer assumes that the answerer will be able to receive media
2613	      encoded using the configuration being offered.

2615	         Informative note: The above parameters apply for any stream
2616	         sent by the declaring entity with the same configuration; i.e.,
2617	         they are dependent on their source.  Rather than being bound
2618	         to the payload type, the values may have to be applied to
2619	         another payload type when being sent, as they apply for the
2620	         configuration.

2622	   o  The capability parameters ("max-mbps", "max-smbps", "max-fs",
2623	      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "max-rcmd-
2624	      nalu-size", "sar-understood", "sar-supported") MAY be used to
2625	      declare further capabilities of the offerer or answerer for
2626	      receiving.  These parameters can only be present when the
2627	      direction attribute is sendrecv or recvonly, and the parameters
2628	      describe the limitations of what the offerer or answerer accepts
2629	      for receiving streams.

2631	   o  An offerer has to include the size of the de-interleaving buffer,
2632	      "sprop-deint-buf-req", in the offer for an interleaved H.264
2633	      stream.  To enable the offerer and answerer to inform each other
2634	      about their capabilities for de-interleaving buffering in
2635	      receiving streams, both parties are RECOMMENDED to include "deint-
2636	      buf-cap".  For interleaved streams, it is also RECOMMENDED to
2637	      consider offering multiple payload types with different buffering
2638	      requirements when the capabilities of the receiver are unknown.

2640	   o  The "sprop-parameter-sets" or "sprop-level-parameter-sets"
2641	      parameter, when present (included in the "a=fmtp" line of SDP or
2642	      conveyed using the "fmtp" source attribute as specified in section
2643	      6.3 of [9]), is used for out-of-band transport of parameter sets.
2644	      However, when out-of-band transport of parameter sets is used,
2645	      parameter sets MAY still be additionally transported in-band.  If
2646	      neither "sprop-parameter-sets" nor "sprop-level-parameter-sets" is
2647	      present, then only in-band transport of parameter sets is used.

2649	      An offer MAY include either or both of "sprop-parameter-sets" and
2650	      "sprop-level-parameter-sets".  An answer MAY include "sprop-
2651	      parameter-sets", and MUST NOT include "sprop-level-parameter-
2652	      sets".

2654	      If the answer includes "in-band-parameter-sets" equal to 1, then
2655	      the sender MUST transmit parameter sets in-band.

2657	      Otherwise, the following applies.

2659	        o When an offered payload type is accepted without level
2660	           downgrade, i.e. the default level is accepted, the following
2661	           applies.

2663	             o When there is a "sprop-parameter-sets" included in the
2664	                "a=fmtp" line of SDP, the answerer MUST be prepared to
2665	                use the parameter sets included in "sprop-parameter-
2666	                sets" for decoding the incoming NAL unit stream.

2668	             o When there is a "sprop-parameter-sets" conveyed using
2669	                the "fmtp" source attribute as specified in section 6.3
2670	                of [9], and the answerer understands the "fmtp" source
2671	                attribute, it MUST be prepared to use the parameter
2672	                sets included in "sprop-parameter-sets" for decoding
2673	                the incoming NAL unit stream, and it MUST include
2674	                either "use-level-src-parameter-sets" equal to 1 or the
2675	                "fmtp" source attribute in the answer.

2677	             o When there is a "sprop-parameter-sets" conveyed using
2678	                the "fmtp" source attribute as specified in section 6.3
2679	                of [9], and the answerer does not understand the "fmtp"
2680	                source attribute, the sender MUST transmit parameter
2681	                sets in-band, and the answerer MUST NOT include "use-
2682	                level-src-parameter-sets" equal to 1 or the "fmtp"
2683	                source attribute in the answer.

2685	             o When "sprop-parameter-sets" is not present, the sender
2686	                MUST transmit parameter sets in-band.

2688	             o The answerer MUST ignore "sprop-level-parameter-sets",
2689	                when present (either included in the "a=fmtp" line of
2690	                SDP or conveyed using the "fmtp" source attribute).

2692	        o When level downgrade is in use, i.e., a level lower than the
2693	           default level offered is accepted, the following applies.

2695	             o The answerer MUST ignore "sprop-parameter-sets", when
2696	                present (either included in the "a=fmtp" line of SDP or
2697	                conveyed using the "fmtp" source attribute).

2699	             o When "use-level-src-parameter-sets" equal to 1 and the
2700	                "fmtp" source attribute are not present in the answer
2701	                for the accepted payload type, the answerer MUST ignore
2702	                "sprop-level-parameter-sets", when present, and the
2703	                sender MUST transmit parameter sets in-band.

2705	             o When "use-level-src-parameter-sets" equal to 1 or the
2706	                "fmtp" source attribute is present in the answer for
2707	                the accepted payload type, the answerer MUST be
2708	                prepared to use the parameter sets that are included in
2709	                "sprop-level-parameter-sets" for the accepted level,
2710	                when present, for decoding the incoming NAL unit stream,
2711	                and ignore all other parameter sets included in "sprop-
2712	                level-parameter-sets".

2714	             o When no parameter sets for the accepted level are
2715	                present in the "sprop-level-parameter-sets", the sender
2716	                MUST transmit parameter sets in-band.

2718	      The answerer MAY include or omit "sprop-parameter-sets", i.e.,
2719	      the answerer MAY use either out-of-band or in-band transport of
2720	      parameter sets for the stream it is sending, regardless of
2721	      whether out-of-band parameter sets transport has been used in the
2722	      offerer-to-answerer direction.  When the offer includes "in-band-
2723	      parameter-sets" equal to 1, the answerer MUST NOT include "sprop-
2724	      parameter-sets" and MUST transmit parameter sets in-band.  All
2725	      parameter sets included in the "sprop-parameter-sets", when
2726	      present, for the accepted payload type in an answer MUST be
2727	      associated with the accepted level, as indicated by the profile-
2728	      level-id in the answer for the accepted payload type.

2730	      Parameter sets included in "sprop-parameter-sets" in an answer
2731	      are independent of those parameter sets included in the offer, as
2732	      they are used for decoding two different video streams, one from
2733	      the answerer to the offerer, and the other in the opposite
2734	      direction.  The offerer MUST be prepared to use the parameter
2735	      sets included in the answer's "sprop-parameter-sets", when
2736	      present, for decoding the incoming NAL unit stream.

2738	      When "sprop-parameter-sets" or "sprop-level-parameter-sets" is
2739	      conveyed using the "fmtp" source attribute as specified in
2740	      section 6.3 of [9], the receiver of the parameters MUST store the
2741	      parameter sets included in the "sprop-parameter-sets" or "sprop-
2742	      level-parameter-sets" for the accepted level and associate them
2743	      with the source given as a part of the "fmtp" source attribute.
2744	      Parameter sets associated with one source MUST only be used to
2745	      decode NAL units conveyed in RTP packets from the same source.
2746	      When this mechanism is in use, SSRC collision detection and
2747	      resolution MUST be performed as specified in [9].

2749	         Informative note: Conveyance of "sprop-parameter-sets" and
2750	         "sprop-level-parameter-sets" using the "fmtp" source attribute
2751	         may be used in topologies like Topo-Video-switch-MCU [29] to
2752	         enable out-of-band transport of parameter sets.

2754	   For streams delivered over multicast, the following rules apply:

2756	   o  The media format configuration is identified by the same
2757	      parameters as above for unicast (i.e. "profile-level-id" and
2758	      "packetization-mode", when present).  These media format
2759	      configuration parameters (including the level part of "profile-
2760	      level-id") MUST be used symmetrically; i.e., the answerer MUST
2761	      either maintain all configuration parameters or remove the media
2762	      format (payload type) completely.  Note that this implies that the
2763	      level part of "profile-level-id" for Offer/Answer in multicast is
2764	      not downgradable.

2766	      To simplify handling and matching of these configurations, the
2767	      same RTP payload type number used in the offer SHOULD also be
2768	      used in the answer, as specified in [8].  An answer MUST NOT
2769	      contain a payload type number used in the offer unless the
2770	      configuration is the same as in the offer.

2772	   o  Parameter sets received MUST be associated with the originating
2773	      source, and MUST only be used in decoding the incoming NAL unit
2774	      stream from the same source.

2776	   o  The rules for other parameters are the same as above for unicast.

2778	   Table 6 lists the interpretation of all 21 media type parameters
2779	   that MUST be used for the different direction attributes.

2781	       Table 6. Interpretation of parameters for different direction
2782	                                attributes.

2784	                                              sendonly --+
2785	                                           recvonly --+  |
2786	                                        sendrecv --+  |  |
2787	                                                   |  |  |
2788	                profile-level-id                   C  C  P
2789	                packetization-mode                 C  C  P
2790	                sprop-deint-buf-req                P  -  P
2791	                sprop-interleaving-depth           P  -  P
2792	                sprop-max-don-diff                 P  -  P
2793	                sprop-init-buf-time                P  -  P
2794	                max-mbps                           R  R  -
2795	                max-smbps                          R  R  -
2796	                max-fs                             R  R  -
2797	                max-cpb                            R  R  -
2798	                max-dpb                            R  R  -
2799	                max-br                             R  R  -
2800	                redundant-pic-cap                  R  R  -
2801	                deint-buf-cap                      R  R  -
2802	                max-rcmd-nalu-size                 R  R  -
2803	                sar-understood                     R  R  -
2804	                sar-supported                      R  R  -
2805	                in-band-parameter-sets             R  R  -
2806	                use-level-src-parameter-sets       R  R  -
2807	                sprop-parameter-sets               S  -  S
2808	                sprop-level-parameter-sets         S  -  S

2810	             Legend:

2812	             C: configuration for sending and receiving streams
2813	             P: properties of the stream to be sent
2814	             R: receiver capabilities
2815	             S: out-of-band parameter sets
2816	             -: not usable, when present SHOULD be ignored
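
Table 6 can be read as a lookup from parameter name and direction
attribute to an interpretation code.  The following Python sketch is
illustrative only: the names TABLE6, DIRECTIONS, and interpret are
hypothetical and not defined by this memo, and dropping parameters
that are absent from the table is an assumption of the sketch.

```python
# Illustrative encoding of Table 6. All names here are hypothetical.
# Each string holds the codes for (sendrecv, recvonly, sendonly).
DIRECTIONS = ("sendrecv", "recvonly", "sendonly")

TABLE6 = {
    "profile-level-id": "CCP",
    "packetization-mode": "CCP",
    "sprop-deint-buf-req": "P-P",
    "sprop-interleaving-depth": "P-P",
    "sprop-max-don-diff": "P-P",
    "sprop-init-buf-time": "P-P",
    "max-mbps": "RR-",
    "max-smbps": "RR-",
    "max-fs": "RR-",
    "max-cpb": "RR-",
    "max-dpb": "RR-",
    "max-br": "RR-",
    "redundant-pic-cap": "RR-",
    "deint-buf-cap": "RR-",
    "max-rcmd-nalu-size": "RR-",
    "sar-understood": "RR-",
    "sar-supported": "RR-",
    "in-band-parameter-sets": "RR-",
    "use-level-src-parameter-sets": "RR-",
    "sprop-parameter-sets": "S-S",
    "sprop-level-parameter-sets": "S-S",
}


def interpret(params, direction):
    """Return {name: (code, value)} for the parameters usable in `direction`.

    Parameters marked "-" for the direction are dropped, mirroring the
    "when present SHOULD be ignored" rule of the legend. Names missing
    from the table are dropped as well (an assumption of this sketch).
    """
    column = DIRECTIONS.index(direction)
    usable = {}
    for name, value in params.items():
        code = TABLE6.get(name, "---")[column]
        if code != "-":
            usable[name] = (code, value)
    return usable
```

For a "recvonly" media description, for instance, interpret() keeps
"max-mbps" as a receiver capability and drops "sprop-max-don-diff".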

2818	   Parameters used for declaring receiver capabilities are in general
2819	   downgradable; i.e., they express the upper limit for a sender's
2820	   possible behavior.  Thus a sender MAY configure its encoder using
2821	   values lower than or equal to the values of these parameters.

2823	   Parameters declaring a configuration point are not downgradable, with
2824	   the exception of the level part of the "profile-level-id" parameter
2825	   for unicast usage.  These parameters express values that a receiver
2826	   expects to be used, and they must be used verbatim by the sender.

2828	   When a sender's capabilities are declared, and non-downgradable
2829	   parameters are used in this declaration, then these parameters
2830	   express a configuration that is acceptable for the sender to receive
2831	   streams.  In order to achieve high interoperability levels, it is
2832	   often advisable to offer multiple alternative configurations; e.g.,
2833	   for the packetization mode.  It is impossible to offer multiple
2834	   configurations in a single payload type.  Thus, when multiple
2835	   configuration offers are made, each offer requires its own RTP
2836	   payload type associated with the offer.

2838	   A receiver SHOULD understand all media type parameters, even if it
2839	   only supports a subset of the payload format's functionality.  This
2840	   ensures that a receiver is capable of understanding when an offer to
2841	   receive media can be downgraded to what is supported by the receiver
2842	   of the offer.

2844	   An answerer MAY extend the offer with additional media format
2845	   configurations.  However, to enable their usage, in most cases a
2846	   second offer is required from the offerer to provide the stream
2847	   property parameters that the media sender will use.  This also has
2848	   the effect that the offerer has to be able to receive this media
2849	   format configuration, not only to send it.

2851	   If an offerer wishes to have non-symmetric capabilities between
2852	   sending and receiving, the offerer should offer different RTP
2853	   sessions; i.e., different media lines declared as "recvonly" and
2854	   "sendonly", respectively.  This may have further implications on the
2855	   system.

2857	8.2.3. Usage in Declarative Session Descriptions

2859	   When H.264 over RTP is offered with SDP in a declarative style, as in
2860	   RTSP [27] or SAP [28], the following considerations are necessary.

2862	   o  All parameters capable of indicating both stream properties and
2863	      receiver capabilities are used to indicate only stream properties.
2864	      For example, in this case, the parameter "profile-level-id"
2865	      declares only the values used by the stream, not the capabilities
2866	      for receiving streams.  As a result, the following
2867	      interpretation of the parameters MUST be used:

2869	      Declaring actual configuration or stream properties:

2871	         - profile-level-id
2872	         - packetization-mode
2873	         - sprop-interleaving-depth
2874	         - sprop-deint-buf-req
2875	         - sprop-max-don-diff
2876	         - sprop-init-buf-time

2878	      Out-of-band transporting of parameter sets:

2880	         - sprop-parameter-sets
2881	         - sprop-level-parameter-sets

2883	      Not usable (when present, they SHOULD be ignored):

2885	         - max-mbps
2886	         - max-smbps
2887	         - max-fs
2888	         - max-cpb
2889	         - max-dpb
2890	         - max-br
2891	         - redundant-pic-cap
2892	         - max-rcmd-nalu-size
2893	         - deint-buf-cap
2894	         - sar-understood
2895	         - sar-supported
2896	         - in-band-parameter-sets
2897	         - use-level-src-parameter-sets

2899	   o  A receiver of the SDP is required to support all parameters and
2900	      values of the parameters provided; otherwise, the receiver MUST
2901	      reject (RTSP) or not participate in (SAP) the session.  It falls
2902	      on the creator of the session to use values that are expected to
2903	      be supported by the receiving application.

2905	8.3. Examples

2907	   An SDP Offer/Answer exchange wherein both parties are expected to
2908	   both send and receive could look like the following.  Only the media
2909	   codec specific parts of the SDP are shown.  Some lines are wrapped
2910	   due to text constraints.

2912	      Offerer -> Answerer SDP message:

2914	      m=video 49170 RTP/AVP 100 99 98
2915	      a=rtpmap:98 H264/90000
2916	      a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
2917	        sprop-parameter-sets=<parameter sets data#0>
2918	      a=rtpmap:99 H264/90000
2919	      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
2920	        sprop-parameter-sets=<parameter sets data#1>
2921	      a=rtpmap:100 H264/90000
2922	      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
2923	        sprop-parameter-sets=<parameter sets data#2>;
2924	        sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
2925	        sprop-init-buf-time=102478; deint-buf-cap=128000

2927	   The above offer presents the same codec configuration in three
2928	   different packetization formats.  PT 98 represents single NALU mode,
2929	   PT 99 represents non-interleaved mode, and PT 100 indicates the
2930	   interleaved mode.  In the interleaved mode case, the interleaving
2931	   parameters that the offerer would use if the answer indicates support
2932	   for PT 100 are also included.  In all three cases the parameter
2933	   "sprop-parameter-sets" conveys the initial parameter sets that are
2934	   required by the answerer when receiving a stream from the offerer
2935	   when this configuration is accepted.  Note that the value for "sprop-
2936	   parameter-sets" could be different for each payload type.

2938	      Answerer -> Offerer SDP message:

2940	      m=video 49170 RTP/AVP 100 99 97
2941	      a=rtpmap:97 H264/90000
2942	      a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
2943	        sprop-parameter-sets=<parameter sets data#3>
2944	      a=rtpmap:99 H264/90000
2945	      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
2946	        sprop-parameter-sets=<parameter sets data#4>;
2947	        max-rcmd-nalu-size=3980
2948	      a=rtpmap:100 H264/90000
2949	      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
2950	        sprop-parameter-sets=<parameter sets data#5>;
2951	        sprop-interleaving-depth=60;
2952	        sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
2953	        deint-buf-cap=128000; max-rcmd-nalu-size=3980

2955	   As the Offer/Answer negotiation covers both sending and receiving
2956	   streams, an offer indicates the exact parameters for what the offerer
2957	   is willing to receive, whereas the answer indicates the same for what
2958	   the answerer agrees to receive.  In this case the offerer declared
2959	   that it is willing to receive payload type 98.  The answerer accepts
2960	   this by declaring an equivalent payload type 97; i.e., it has
2961	   identical values for the two parameters "profile-level-id" and
2962	   "packetization-mode" (since "packetization-mode" is equal to 0,
2963	   "sprop-deint-buf-req" is not present).  As the offered payload type
2964	   98 is accepted, the answerer needs to store parameter sets included
2965	   in sprop-parameter-sets=<parameter sets data#0> in case the offerer
2966	   finally decides to use this configuration. In the answer, the
2967	   answerer includes the parameter sets in sprop-parameter-
2968	   sets=<parameter sets data#3> that the answerer would use in the
2969	   stream sent from the answerer if this configuration is finally used.

2971	   The answerer also accepts the reception of the two configurations
2972	   that payload types 99 and 100 represent.  Again, the answerer needs
2973	   to store parameter sets included in sprop-parameter-sets=<parameter
2974	   sets data#1> and sprop-parameter-sets=<parameter sets data#2> in case
2975	   the offerer finally decides to use one of these two configurations.
2976	   The answerer provides the initial parameter sets for the answerer-to-
2977	   offerer direction, i.e. the parameter sets in sprop-parameter-
2978	   sets=<parameter sets data#4> and sprop-parameter-sets=<parameter sets
2979	   data#5>, for payload types 99 and 100, respectively, that it will use
2980	   to send the payload types.  The answerer also provides the offerer
2981	   with its memory limit for de-interleaving operations by providing a
2982	   "deint-buf-cap" parameter.  This is only useful if the offerer
2983	   decides on making a second offer, where it can take the new value
2984	   into account.  The "max-rcmd-nalu-size" indicates that the answerer
2985	   can efficiently process NALUs up to the size of 3980 bytes.  However,
2986	   there is no guarantee that the network supports this size.
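
The pairing of offered and answered payload types described above can
be sketched in Python as follows.  This is a simplified illustration,
not a normative procedure: the function names are hypothetical, the
wrapped "a=fmtp" lines are assumed to have been joined into single
lines, "profile-level-id" is assumed to be present, and the two
configuration parameters are compared by plain equality, which ignores
the level downgrading permitted for unicast.

```python
def parse_fmtp(line):
    """Split 'a=fmtp:<pt> k=v; k=v' into (payload type, parameter dict)."""
    head, _, rest = line.partition(" ")
    payload_type = int(head.split(":", 1)[1])
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if item:
            # Split on the first '=' only; values may contain '=' themselves.
            key, _, value = item.partition("=")
            params[key.strip()] = value.strip()
    return payload_type, params


def config_key(params):
    """Configuration identity: profile-level-id plus packetization-mode.

    An absent packetization-mode is treated as 0 (single NAL unit mode).
    """
    mode = int(params.get("packetization-mode", "0"))
    return params["profile-level-id"].upper(), mode


def match_payload_types(offer_lines, answer_lines):
    """Map each answered payload type to the offered one it is equivalent to."""
    offered = {config_key(p): pt for pt, p in map(parse_fmtp, offer_lines)}
    matches = {}
    for pt, params in map(parse_fmtp, answer_lines):
        key = config_key(params)
        if key in offered:
            matches[pt] = offered[key]
    return matches
```

Applied to the offer and answer above, the sketch pairs answer payload
type 97 with offered payload type 98, and payload types 99 and 100
with themselves.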

2988	   In the following example, the offer is accepted without level
2989	   downgrading (i.e. the default level, 3.0, is accepted), and both
2990	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
2991	   in the offer.  The answerer must ignore sprop-level-parameter-
2992	   sets=<parameter sets data#1> and store parameter sets in sprop-
2993	   parameter-sets=<parameter sets data#0> for decoding the incoming NAL
2994	   unit stream.  The offerer must store the parameter sets in sprop-
2995	   parameter-sets=<parameter sets data#2> in the answer for decoding the
2996	   incoming NAL unit stream.  Note that in this example, parameter sets
2997	   in sprop-parameter-sets=<parameter sets data#2> must be associated
2998	   with level 3.0.

3000	      Offer SDP:

3002	      m=video 49170 RTP/AVP 98
3003	      a=rtpmap:98 H264/90000
3004	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3005	        packetization-mode=1;
3006	        sprop-parameter-sets=<parameter sets data#0>;
3007	        sprop-level-parameter-sets=<parameter sets data#1>

3009	      Answer SDP:

3011	      m=video 49170 RTP/AVP 98
3012	      a=rtpmap:98 H264/90000
3013	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3014	        packetization-mode=1;
3015	        sprop-parameter-sets=<parameter sets data#2>

3017	   In the following example, the offer (Baseline profile, level 1.1) is
3018	   accepted with level downgrading (the accepted level is 1b), and both
3019	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
3020	   in the offer.  The answerer must ignore sprop-parameter-
3021	   sets=<parameter sets data#0> and all parameter sets not for the
3022	   accepted level (level 1b) in sprop-level-parameter-sets=<parameter
3023	   sets data#1>, and must store parameter sets for the accepted level
3024	   (level 1b) in sprop-level-parameter-sets=<parameter sets data#1> for
3025	   decoding the incoming NAL unit stream.  The offerer must store the
3026	   parameter sets in sprop-parameter-sets=<parameter sets data#2> in the
3027	   answer for decoding the incoming NAL unit stream.  Note that in this
3028	   example, parameter sets in sprop-parameter-sets=<parameter sets
3029	   data#2> must be associated with level 1b.

3031	      Offer SDP:

3033	      m=video 49170 RTP/AVP 98
3034	      a=rtpmap:98 H264/90000
3035	      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
3036	        packetization-mode=1;
3037	        sprop-parameter-sets=<parameter sets data#0>;
3038	        sprop-level-parameter-sets=<parameter sets data#1>

3040	      Answer SDP:

3042	      m=video 49170 RTP/AVP 98
3043	      a=rtpmap:98 H264/90000
3044	      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
3045	        packetization-mode=1;
3046	        sprop-parameter-sets=<parameter sets data#2>;
3047	        use-level-src-parameter-sets=1
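
The "profile-level-id" values used in these examples can be decoded
with a short Python sketch.  The function name is hypothetical, and
the Level 1b test shown here covers only the Baseline, Main, and
Extended profiles (level_idc equal to 11 with constraint_set3_flag
equal to 1); other profiles are outside the scope of the sketch.

```python
def decode_profile_level_id(value):
    """Decode a 6-hex-digit profile-level-id.

    Returns (profile_idc, profile_iop, level), where level is a string
    such as "3.0" or "1b".
    """
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be exactly 3 bytes")
    profile_idc, profile_iop, level_idc = raw
    # constraint_set3_flag is the fourth most significant bit of profile-iop.
    constraint_set3 = bool(profile_iop & 0x10)
    if profile_idc in (66, 77, 88) and level_idc == 11 and constraint_set3:
        level = "1b"
    else:
        level = f"{level_idc // 10}.{level_idc % 10}"
    return profile_idc, profile_iop, level
```

With this sketch, "42A00B" decodes to Baseline (profile_idc 66) at
Level 1.1, and "42B00B" decodes to Baseline at Level 1b.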

3049	   In the following example, the offer (Baseline profile, level 1.1) is
3050	   accepted with level downgrading (the accepted level is 1b), and both
3051	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
3052	   in the offer.  However, the answerer is a legacy RFC 3984
3053	   implementation and does not understand "sprop-level-parameter-sets",
3054	   hence it does not include "use-level-src-parameter-sets" (which the
3055	   answerer does not understand, either) in the answer.  Therefore, the
3056	   answerer must ignore both sprop-parameter-sets=<parameter sets
3057	   data#0> and sprop-level-parameter-sets=<parameter sets data#1>, and
3058	   the offerer must transport parameter sets in-band.

3060	      Offer SDP:

3062	      m=video 49170 RTP/AVP 98
3063	      a=rtpmap:98 H264/90000
3064	      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
3065	        packetization-mode=1;
3066	        sprop-parameter-sets=<parameter sets data#0>;
3067	        sprop-level-parameter-sets=<parameter sets data#1>

3069	      Answer SDP:

3071	      m=video 49170 RTP/AVP 98
3072	      a=rtpmap:98 H264/90000
3073	      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
3074	        packetization-mode=1

3076	   In the following example, the offer is accepted without level
3077	   downgrading, and "sprop-parameter-sets" is present in the offer.
3078	   Parameter sets in sprop-parameter-sets=<parameter sets data#0> must
3079	   be stored and used by the encoder of the offerer and the decoder
3080	   of the answerer, and parameter sets in sprop-parameter-
3081	   sets=<parameter sets data#1> must be used by the encoder of the
3082	   answerer and the decoder of the offerer.  Note that sprop-parameter-
3083	   sets=<parameter sets data#0> is basically independent of sprop-
3084	   parameter-sets=<parameter sets data#1>.

3086	      Offer SDP:

3088	      m=video 49170 RTP/AVP 98
3089	      a=rtpmap:98 H264/90000
3090	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3091	        packetization-mode=1;
3092	        sprop-parameter-sets=<parameter sets data#0>

3094	      Answer SDP:

3096	      m=video 49170 RTP/AVP 98
3097	      a=rtpmap:98 H264/90000
3098	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3099	        packetization-mode=1;
3100	        sprop-parameter-sets=<parameter sets data#1>

3102	   In the following example, the offer is accepted without level
3103	   downgrading, and neither "sprop-parameter-sets" nor "sprop-level-
3104	   parameter-sets" is present in the offer, meaning that there is no
3105	   out-of-band transmission of parameter sets, which then have to be
3106	   transported in-band.

3108	      Offer SDP:

3110	      m=video 49170 RTP/AVP 98
3111	      a=rtpmap:98 H264/90000
3112	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3113	        packetization-mode=1

3115	      Answer SDP:

3117	      m=video 49170 RTP/AVP 98
3118	      a=rtpmap:98 H264/90000
3119	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3120	        packetization-mode=1

3122	   In the following example, the offer is accepted with level
3123	   downgrading and "sprop-parameter-sets" is present in the offer.  As
3124	   sprop-parameter-sets=<parameter sets data#0> contains level_idc
3125	   indicating Level 3.0, it cannot be used because the answerer wants
3126	   Level 2.0; it must therefore be ignored by the answerer, and in-band
3127	   parameter sets must be used.

3129	      Offer SDP:

3131	      m=video 49170 RTP/AVP 98
3132	      a=rtpmap:98 H264/90000
3133	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3134	        packetization-mode=1;
3135	        sprop-parameter-sets=<parameter sets data#0>

3137	      Answer SDP:

3139	      m=video 49170 RTP/AVP 98
3140	      a=rtpmap:98 H264/90000
3141	      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
3142	        packetization-mode=1

3144	   In the following example, the offer is also accepted with level
3145	   downgrading, and neither "sprop-parameter-sets" nor "sprop-level-
3146	   parameter-sets" is present in the offer, meaning that there is no
3147	   out-of-band transmission of parameter sets, which then have to be
3148	   transported in-band.

3150	      Offer SDP:

3152	      m=video 49170 RTP/AVP 98
3153	      a=rtpmap:98 H264/90000
3154	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3155	        packetization-mode=1

3157	      Answer SDP:

3159	      m=video 49170 RTP/AVP 98
3160	      a=rtpmap:98 H264/90000
3161	      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
3162	        packetization-mode=1

3164	   In the following example, the offerer is a Multipoint Control Unit
3165	   (MCU) in a Topo-Video-switch-MCU like topology [29], offering
3166	   parameter sets received (using out-of-band transport) from three
3167	   other participants B, C, and D, and receiving parameter sets from the
3168	   participant A, which is the answerer.  The participants are
3169	   identified by their values of CNAME, which are mapped to different
3170	   SSRC values.  The same codec configuration is used by all the four
3171	   participants.  Participant A stores the parameter sets included in
3172	   <parameter sets data#B>, <parameter sets data#C>, and <parameter
3173	   sets data#D> and associates them with participants B, C, and D,
3174	   respectively.  It uses <parameter sets data#B> only for decoding
3175	   NAL units carried in RTP packets originating from participant B,
3176	   <parameter sets data#C> only for those originating from participant
3177	   C, and <parameter sets data#D> only for those originating from
3178	   participant D.

3180	      Offer SDP:

3182	      m=video 49170 RTP/AVP 98
3183	      a=ssrc:SSRC-B cname:CNAME-B
3184	      a=ssrc:SSRC-C cname:CNAME-C
3185	      a=ssrc:SSRC-D cname:CNAME-D
3186	      a=ssrc:SSRC-B fmtp:98
3187	        sprop-parameter-sets=<parameter sets data#B>
3188	      a=ssrc:SSRC-C fmtp:98
3189	        sprop-parameter-sets=<parameter sets data#C>
3190	      a=ssrc:SSRC-D fmtp:98
3191	        sprop-parameter-sets=<parameter sets data#D>
3192	      a=rtpmap:98 H264/90000
3193	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3194	        packetization-mode=1

3196	      Answer SDP:

3198	      m=video 49170 RTP/AVP 98
3199	      a=ssrc:SSRC-A cname:CNAME-A
3200	      a=ssrc:SSRC-A fmtp:98
3201	        sprop-parameter-sets=<parameter sets data#A>
3202	      a=rtpmap:98 H264/90000
3203	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3204	        packetization-mode=1

3206	8.4. Parameter Set Considerations

3208	   The H.264 parameter sets are a fundamental part of the video codec
3209	   and vital to its operation; see section 1.2.  Due to their
3210	   characteristics and their importance for the decoding process, lost
3211	   or erroneously transmitted parameter sets can hardly be concealed
3212	   locally at the receiver.  A reference to a corrupt parameter set
3213	   is normally fatal to the decoding process.  Corruption could
3214	   occur, for example, due to the erroneous transmission or loss of a
3215	   parameter set NAL unit, but also due to the untimely transmission of
3216	   a parameter set update.  A parameter set update refers to a change of
3217	   at least one parameter in a picture parameter set or sequence
3218	   parameter set for which the picture parameter set or sequence
3219	   parameter set identifier remains unchanged.  Therefore, the following
3220	   recommendations are provided as a guideline for the implementer of
3221	   the RTP sender.

3223	   Parameter set NALUs can be transported using three different
3224	   principles:

3226	   A. Using a session control protocol (out-of-band) prior to the actual
3227	     RTP session.

3229	   B. Using a session control protocol (out-of-band) during an ongoing
3230	     RTP session.

3232	   C. Within the RTP packet stream in the payload (in-band) during an
3233	     ongoing RTP session.

3235	   It is recommended to implement principles A and B within a session
3236	   control protocol.  SIP and SDP can be used as described in the SDP
3237	   Offer/Answer model and in the previous sections of this memo.
3238	   Section 8.2.2 includes a detailed discussion on transport of
3239	   parameter sets in-band or out-of-band in SDP Offer/Answer using media
3240	   type parameters "sprop-parameter-sets", "sprop-level-parameter-sets",
3241	   "use-level-src-parameter-sets" and "in-band-parameter-sets".  This
3242	   section contains guidelines on how principles A and B should be
3243	   implemented within session control protocols.  It is independent of
3244	   the particular protocol used.  Principle C is supported by the RTP
3245	   payload format defined in this specification.  There are topologies
3246	   like Topo-Video-switch-MCU [29] for which the use of principle C may
3247	   be desirable.

3249	   If in-band signaling of parameter sets is used, the picture and
3250	   sequence parameter set NALUs SHOULD be transmitted in the RTP payload
3251	   using a reliable method of RTP delivery (see below), as a loss
3252	   of a parameter set of either type will likely prevent decoding of a
3253	   considerable portion of the corresponding RTP packet stream.

3255	   If in-band signaling of parameter sets is used, the sender SHOULD
3256	   take the error characteristics into account and use mechanisms to
3257	   provide a high probability for delivering the parameter sets
3258	   correctly.  Mechanisms that increase the probability for a correct
3259	   reception include packet repetition, FEC, and retransmission.  The
3260	   use of an unreliable, out-of-band control protocol has disadvantages
3261	   similar to those of in-band signaling (possible loss) and, in
3262	   addition, may also lead to difficulties in synchronization (see
3263	   below).  Therefore, it is NOT RECOMMENDED.

3265	   Parameter sets MAY be added or updated during the lifetime of a
3266	   session using principles B and C.  It is required that parameter sets
3267	   are present at the decoder prior to the NAL units that refer to them.
3268	   Updating or adding of parameter sets can result in further problems,
3269	   and therefore the following recommendations should be considered.

3271	   - When parameter sets are added or updated, care SHOULD be taken to
3272	     ensure that any parameter set is delivered prior to its usage.
3273	     When new parameter sets are added, previously unused parameter set
3274	     identifiers are used.  It is common that no synchronization is
3275	     present between out-of-band signaling and in-band traffic.  If
3276	     out-of-band signaling is used, it is RECOMMENDED that a sender
3277	     does not start sending NALUs requiring the added or updated
3278	     parameter sets prior to acknowledgement of delivery from the
3279	     signaling protocol.

3281	   - When parameter sets are updated, the following synchronization
3282	     issue should be taken into account.  When overwriting a parameter
3283	     set at the receiver, the sender has to ensure that the parameter
3284	     set in question is not needed by any NALU present in the network
3285	     or receiver buffers.  Otherwise, decoding with a wrong parameter
3286	     set may occur.  To lessen this problem, it is RECOMMENDED either
3287	     to overwrite only those parameter sets that have not been used for
3288	     a sufficiently long time (to ensure that all related NALUs have
3289	     been consumed), or to add a new parameter set instead (which may
3290	     have negative consequences for the efficiency of the video coding).

3292	         Informative note: In some topologies like Topo-Video-switch-
3293	         MCU [29] the origin of the whole set of parameter sets may
3294	         come from multiple sources that may use non-unique parameter
3295	         set identifiers.  In this case an offer may overwrite an
3296	         existing parameter set if no other mechanism that enables
3297	         uniqueness of the parameter sets in the out-of-band channel
3298	         exists.

3300	   - In a multiparty session, one participant MUST associate parameter
3301	     sets coming from different sources with the source identification
3302	     whenever possible, e.g. by conveying out-of-band transported
3303	     parameter sets, as different sources typically use independent
3304	     parameter set identifier value spaces.

3306	   - Adding or modifying parameter sets by using both principles B and
3307	     C in the same RTP session may lead to inconsistencies of the
3308	     parameter sets because of the lack of synchronization between the
3309	     control and the RTP channel.  Therefore, principles B and C MUST
3310	     NOT both be used in the same session unless sufficient
3311	     synchronization can be provided.
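
The per-source association recommended above can be sketched as a
store keyed by the source as well as by the parameter set identifier.
The Python class below is a hypothetical illustration, not part of
this memo; it assumes the source is identified by its SSRC and does
not model SSRC collision handling or the timing of updates.

```python
class ParameterSetStore:
    """Keeps parameter sets separate per source (hypothetical sketch)."""

    def __init__(self):
        # (ssrc, kind, parameter set id) -> raw parameter set NAL unit
        self._sets = {}

    def put(self, ssrc, kind, ps_id, nal_unit):
        """Add or overwrite a parameter set; kind is "sps" or "pps"."""
        if kind not in ("sps", "pps"):
            raise ValueError("kind must be 'sps' or 'pps'")
        self._sets[(ssrc, kind, ps_id)] = bytes(nal_unit)

    def get(self, ssrc, kind, ps_id):
        """Return the parameter set for this source only, or None."""
        return self._sets.get((ssrc, kind, ps_id))
```

Because the source is part of the key, two participants that both use
sequence parameter set identifier 0 do not overwrite each other's
parameter sets.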

3313	   In some scenarios (e.g., when only the subset of this payload format
3314	   specification corresponding to H.241 is used) or topologies, it is
3315	   not possible to employ out-of-band parameter set transmission.  In
3316	   this case, parameter sets have to be transmitted in-band.  Here, the
3317	   synchronization with the non-parameter-set-data in the bitstream is
3318	   implicit, but the possibility of a loss has to be taken into account.
3319	   The loss probability should be reduced using the mechanisms discussed
3320	   above.  In case a loss of a parameter set is detected, recovery may
3321	   be achieved by using a Decoder Refresh Point procedure, for example,
3322	   using RTCP feedback Full Intra Request (FIR) [30].  Two example
3323	   Decoder Refresh Point procedures are provided in the informative
3324	   Section 8.5.

3326	   - When parameter sets are initially provided using principle A and
3327	     then later added or updated in-band (principle C), there is a risk
3328	     associated with updating the parameter sets delivered out-of-band.
3329	     If receivers miss some in-band updates (for example, because of a
3330	     loss or a late tune-in), those receivers attempt to decode the
3331	     bitstream using outdated parameters.  It is therefore RECOMMENDED
3332	     that parameter set IDs be partitioned between the out-of-band and
3333	     in-band parameter sets.

3335	8.5. Decoder Refresh Point Procedure using In-Band Transport of
3336	   Parameter Sets (Informative)

3338	   When a sender with a video encoder according to [1] receives a
3339	   request for a decoder refresh point, the encoder shall enter the fast
3340	   update mode by using one of the procedures specified in Section 8.5.1
3341	   or 8.5.2 below.  The procedure in 8.5.1 is the preferred response in
3342	   a lossless transmission environment.  Both procedures satisfy the
3343	   requirement to enter the fast update mode for H.264 video encoding.

3345	8.5.1. IDR Procedure to Respond to a Request for a Decoder Refresh Point

3347	   This section gives one possible way to respond to a request for a
3348	   decoder refresh point.

3350	   The encoder shall, in the order presented here:

3352	   1) Immediately prepare to send an IDR picture.

3354	   2) Send a sequence parameter set to be used by the IDR picture to be
3355	     sent. The encoder may optionally also send other sequence
3356	     parameter sets.

3358	   3) Send a picture parameter set to be used by the IDR picture to be
3359	     sent. The encoder may optionally also send other picture parameter
3360	     sets.

3362	   4) Send the IDR picture.

3364	   5) From this point forward in time, send any other sequence or
3365	     picture parameter sets that have not yet been sent in this
3366	     procedure, prior to their reference by any NAL unit, regardless of
3367	     whether such parameter sets were previously sent prior to
3368	     receiving the request for a decoder refresh point.  As needed,
3369	     such parameter sets may be sent in a batch, one at a time, or in
3370	     any combination of these two methods.  Parameter sets may be re-
3371	     sent at any time for redundancy.  Caution should be taken when
3372	     parameter set updates are present, as described above in Section
3373	     8.4.

3375	8.5.2. Gradual Recovery Procedure to Respond to a Request for a Decoder
3376	   Refresh Point

3378	   This section gives another possible way to respond to a request for a
3379	   decoder refresh point.

3381	   The encoder shall, in the order presented here:

3383	   1) Send a recovery point SEI message (see Sections D.1.7 and D.2.7 of
3384	     [1]).

3386	   2) Repeat any sequence and picture parameter sets that were sent
3387	     before the recovery point SEI message, prior to their reference by
3388	     a NAL unit.

3390	   The encoder shall ensure that the decoder has access to all reference
3391	   pictures for inter prediction of pictures at or after the recovery
3392	   point, which is indicated by the recovery point SEI message, in
3393	   output order, assuming that the transmission from now on is error-
3394	   free.

3396	   The value of the recovery_frame_cnt syntax element in the recovery
3397	   point SEI message should be small enough to ensure a fast recovery.

3399	   As needed, such parameter sets may be re-sent in a batch, one at a
3400	   time, or in any combination of these two methods.  Parameter sets may
3401	   be re-sent at any time for redundancy.  Caution should be taken when
3402	   parameter set updates are present, as described above in Section 8.4.

3404	9. Security Considerations

3406	   RTP packets using the payload format defined in this specification
3407	   are subject to the security considerations discussed in the RTP
3408	   specification [5], and in any appropriate RTP profile (for example,
3409	   [16]).  This implies that confidentiality of the media streams is
3410	   achieved by encryption; for example, through the application of SRTP
3411	   [26].  Because the data compression used with this payload format is
3412	   applied end-to-end, any encryption needs to be performed after
3413	   compression.  A potential denial-of-service threat exists for data
3414	   encodings using compression techniques that have non-uniform
3415	   receiver-end computational load.  The attacker can inject
3416	   pathological datagrams into the stream that are complex to decode and
3417	   that cause the receiver to be overloaded.  H.264 is particularly
3418	   vulnerable to such attacks, as it is extremely simple to generate
3419	   datagrams containing NAL units that affect the decoding process of
3420	   many future NAL units.  Therefore, the usage of data origin
3421	   authentication and data integrity protection of at least the RTP
3422	   packet is RECOMMENDED; for example, with SRTP [26].

3424	   Note that the appropriate mechanism to ensure confidentiality and
3425	   integrity of RTP packets and their payloads is very dependent on the
3426	   application and on the transport and signaling protocols employed.
3427	   Thus, although SRTP is given as an example above, other possible
3428	   choices exist.

3430	   Decoders MUST exercise caution with respect to the handling of user
3431	   data SEI messages, particularly if they contain active elements, and
3432	   MUST restrict their domain of applicability to the presentation
3433	   containing the stream.

3435	   End-to-end security with authentication, integrity, or
3436	   confidentiality protection will prevent a MANE from performing media-
3437	   aware operations other than discarding complete packets.  In the
3438	   case of confidentiality protection, the MANE will even be prevented
3439	   from discarding packets in a media-aware way.  To allow a MANE to
3440	   perform its operations, it needs to be a trusted entity that is
3441	   included in the security context establishment.

3443	10. Congestion Control

3445	   Congestion control for RTP SHALL be used in accordance with RFC 3550
3446	   [5], and with any applicable RTP profile; e.g., RFC 3551 [16].  An
3447	   additional requirement if best-effort service is being used is: users
3448	   of this payload format MUST monitor packet loss to ensure that the
3449	   packet loss rate is within acceptable parameters.  Packet loss is
3450	   considered acceptable if a TCP flow across the same network path, and
3451	   experiencing the same network conditions, would achieve an average
3452	   throughput, measured on a reasonable timescale, that is not less than
3453	   the RTP flow is achieving.  This condition can be satisfied by
3454	   implementing congestion control mechanisms to adapt the transmission
3455	   rate (or the number of layers subscribed for a layered multicast
3456	   session), or by arranging for a receiver to leave the session if the
3457	   loss rate is unacceptably high.

3459	   The bit rate adaptation necessary for obeying the congestion control
3460	   principle is easily achievable when real-time encoding is used.
3461	   However, when pre-encoded content is being transmitted, bandwidth
3462	   adaptation requires the availability of more than one coded
3463	   representation of the same content, at different bit rates, or the
3464	   existence of non-reference pictures or sub-sequences [22] in the
3465	   bitstream.  The switching between the different representations can
3466	   normally be performed in the same RTP session; e.g., by employing a
3467	   concept known as SI/SP slices of the Extended Profile, or by
3468	   switching streams at IDR picture boundaries.  Only when non-
3469	   downgradable parameters (such as the profile part of the
3470	   profile/level ID) are required to be changed does it become necessary
3471	   to terminate and re-start the media stream.  This may be accomplished
3472	   by using a different RTP payload type.

3474	   MANEs MAY follow the suggestions outlined in section 7.3 and remove
3475	   certain unusable packets from the packet stream when that stream was
3476	   damaged due to previous packet losses.  This can help reduce the
3477	   network load in certain special cases.

3479	11. IANA Considerations

3481	   The H264 media subtype name specified by RFC 3984 should be updated
3482	   as defined in section 8.1 of this memo.

3484	12. Informative Appendix: Application Examples

3486	   This payload specification is very flexible in its use, in order to
3487	   cover the extremely wide application space anticipated for H.264.
3488	   However, this great flexibility also makes it difficult for an
3489	   implementer to decide on a reasonable packetization scheme.  Some
3490	   information on how to apply this specification to real-world
3491	   scenarios is likely to appear in the form of academic publications
3492	   and a test model software and description in the near future.
3493	   However, some preliminary usage scenarios are described here as well.

3495	12.1. Video Telephony according to ITU-T Recommendation H.241 Annex A

3497	   H.323-based video telephony systems that use H.264 as an optional
3498	   video compression scheme are required to support H.241 Annex A [3] as
3499	   a packetization scheme.  The packetization mechanism defined in this
3500	   Annex is technically identical with a small subset of this
3501	   specification.

3503	   When a system operates according to H.241 Annex A, parameter set NAL
3504	   units are sent in-band.  Only Single NAL unit packets are used.  Many
3505	   such systems are not sending IDR pictures regularly, but only when
3506	   required by user interaction or by control protocol means; e.g., when
3507	   switching between video channels in a Multipoint Control Unit or for
3508	   error recovery requested by feedback.

3510	12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
3511	   Aggregation

3513	   The RTP part of this scheme is implemented and tested (though not the
3514	   control-protocol part; see below).

3516	   In most real-world video telephony applications, picture parameters
3517	   such as picture size or optional modes never change during the
3518	   lifetime of a connection.  Therefore, all necessary parameter sets
3519	   (usually only one) are sent as a side effect of the capability
3520	   exchange/announcement process, e.g., according to the SDP syntax
3521	   specified in section 8.2 of this document.  As all necessary
3522	   parameter set information is established before the RTP session
3523	   starts, there is no need for sending any parameter set NAL units.
3524	   Slice data partitioning is not used, either.  Thus, the RTP packet
3525	   stream basically consists of NAL units that carry single coded slices.

3527	   The encoder chooses the size of coded slice NAL units so that they
3528	   offer the best performance.  Often, this is done by adapting the
3529	   coded slice size to the MTU size of the IP network.  For small
3530	   picture sizes, this may result in a one-picture-per-one-packet
3531	   strategy.  Intra refresh algorithms clean up the loss of packets and
3532	   the resulting drift-related artifacts.

3534	12.3. Video Telephony, Interleaved Packetization Using NAL Unit
3535	   Aggregation

3537	   This scheme allows better error concealment and is used in
3538	   H.263-based designs using RFC 2429 packetization [11].  It has been
3539	   implemented, and good results were reported [13].

3541	   The VCL encoder codes the source picture so that all macroblocks (MBs)
3542	   of one MB line are assigned to one slice.  All slices with even MB
3543	   row addresses are combined into one STAP, and all slices with odd MB
3544	   row addresses into another.  Those STAPs are transmitted as RTP
3545	   packets.  The establishment of the parameter sets is performed as
3546	   discussed above.
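
   The slice-to-STAP assignment described above can be sketched as
   follows (Python).  This is an illustration only, assuming one slice
   per MB row and a CIF picture with 18 MB rows; the function name is
   hypothetical, and STAP headers and NAL unit sizes are not modelled.

```python
def split_rows_into_staps(num_mb_rows):
    """Group one-slice-per-MB-row slices into two aggregation packets.

    Returns (even_rows, odd_rows): the MB row addresses whose slices go
    into the first and the second STAP, respectively.
    """
    even_rows = [row for row in range(num_mb_rows) if row % 2 == 0]
    odd_rows = [row for row in range(num_mb_rows) if row % 2 == 1]
    return even_rows, odd_rows


# A CIF picture (352x288 luma samples) has 288 / 16 = 18 MB rows.
even_stap, odd_stap = split_rows_into_staps(18)
```

   With this split, the loss of either STAP leaves every missing MB row
   with at least one received neighbouring row, which is what makes
   concealment easier.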

3548	   Note that the use of STAPs is essential here, as the high number of
3549	   individual slices (18 for a CIF picture) would lead to unacceptably
3550	   high IP/UDP/RTP header overhead (unless the source coding tool FMO is
3551	   used, which is not assumed in this scenario).  Furthermore, some
3552	   wireless video transmission systems, such as H.324M and the IP-based
3553	   video telephony specified in 3GPP, are likely to use relatively small
3554	   transport packet size.  For example, a typical MTU size of H.223 AL3
3555	   SDU is around 100 bytes [17].  Coding individual slices according to
3556	   this packetization scheme provides further advantage in communication
3557	   between wired and wireless networks, as individual slices are likely
3558	   to be smaller than the preferred maximum packet size of wireless
3559	   systems.  Consequently, a gateway can convert the STAPs used in a
3560	   wired network into several RTP packets with only one NAL unit, which
3561	   are preferred in a wireless network, and vice versa.

3563	12.4. Video Telephony with Data Partitioning

3565	   This scheme has been implemented and has been shown to offer good
3566	   performance, especially at higher packet loss rates [13].

3568	   Data Partitioning is known to be useful only when some form of
3569	   unequal error protection is available.  Normally, in single-session
3570	   RTP environments, even error characteristics are assumed; i.e., the
3571	   packet loss probability of all packets of the session is the same
3572	   statistically.  However, there are means to reduce the packet loss
3573	   probability of individual packets in an RTP session.  A FEC packet
3574	   according to RFC 5109 [18], for example, specifies which media
3575	   packets are associated with the FEC packet.

3577	   In all cases, the incurred overhead is substantial but is in the same
3578	   order of magnitude as the number of bits that have otherwise been
3579	   spent for intra information.  However, this mechanism does not add
3580	   any delay to the system.

3582	   Again, the complete parameter set establishment is performed through
3583	   control protocol means.

3585	12.5. Video Telephony or Streaming with FUs and Forward Error Correction

3587	   This scheme has been implemented and has been shown to provide good
3588	   performance, especially at higher packet loss rates [19].

3590	   The most efficient means to combat packet losses for scenarios where
3591	   retransmissions are not applicable is forward error correction (FEC).
3592	   Although application layer, end-to-end use of FEC is often less
3593	   efficient than an FEC-based protection of individual links
3594	   (especially when links of different characteristics are in the
3595	   transmission path), application layer, end-to-end FEC is unavoidable
3596	   in some scenarios.  RFC 5109 [18] provides means to use generic,
3597	   application layer, end-to-end FEC in packet-loss environments.  A
3598	   binary forward error correcting code is generated by applying the XOR
3599	   operation to the bits at the same bit position in different packets.
3600	   The binary code can be specified by the parameters (n,k) in which k
3601	   is the number of information packets used in the connection and n is
3602	   the total number of packets generated for k information packets; i.e.,
3603	   n-k parity packets are generated for k information packets.

3605	   When a code is used with parameters (n,k) within the RFC 5109
3606	   framework, the following properties are well known:

3608	   a) If applied over one RTP packet, RFC 5109 provides only packet
3609	     repetition.

3611	   b) RFC 5109 is most bit rate efficient if XOR-connected packets have
3612	     equal length.

3614	   c) At the same packet loss probability p and for a fixed k, the
3615	     greater the value of n is, the smaller the residual error
3616	     probability becomes.  For example, for a packet loss probability
3617	     of 10%, k=1, and n=2, the residual error probability is about 1%,
3618	     whereas for n=3, the residual error probability is about 0.1%.

3620	   d) At the same packet loss probability p and for a fixed code rate
3621	     k/n, the greater the value of n is, the smaller the residual error
3622	     probability becomes.  For example, at a packet loss probability of
3623	     p=10%, k=1 and n=2, the residual error rate is about 1%, whereas
3624	     for an extended Golay code with k=12 and n=24, the residual error
3625	     rate is about 0.01%.
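
   The k=1 figures in c) can be checked with a short calculation
   (Python sketch).  It assumes independent packet losses, so that an
   information packet is unrecoverable only if all n transmitted copies
   are lost; the function name is hypothetical.  The Golay figure in d)
   depends on the decoding method and is not reproduced here.

```python
def residual_loss_repetition(p, n):
    """Residual loss probability of an (n, k=1) repetition code.

    Assumes independent losses with probability p per packet: the
    information packet is lost only if all n transmitted copies are lost.
    """
    return p ** n


for n in (1, 2, 3):
    print(f"n={n}: residual loss = {residual_loss_repetition(0.10, n):.4%}")
```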

3627	   For applying RFC 5109 in combination with H.264 baseline coded video
3628	   without using FUs, several options might be considered:

3630	   1) The video encoder produces NAL units for which each video frame is
3631	     coded in a single slice.  Applying FEC, one could use a simple
3632	     code; e.g., (n=2, k=1).  That is, each NAL unit would basically
3633	     just be repeated.  The disadvantage is obviously the bad code
3634	     performance according to d), above, and the low flexibility, as
3635	     only (n, k=1) codes can be used.

3637	   2) The video encoder produces NAL units for which each video frame is
3638	     encoded in one or more consecutive slices.  Applying FEC, one
3639	     could use a better code, e.g., (n=24, k=12), over a sequence of
3640	     NAL units.  Depending on the number of RTP packets per frame, a
3641	     loss may introduce a significant delay, which is reduced when more
3642	     RTP packets are used per frame.  Packets of completely different
3643	     length might also be connected, which decreases bit rate
3644	     efficiency according to b), above.  However, with some care and
3645	     for slices of 1kb or larger, slices of similar length (100-200
3646	     bytes difference) may be produced, which will not lower the bit
3647	     rate efficiency catastrophically.

3649	   3) The video encoder produces NAL units, for which a certain frame
3650	     contains k slices of possibly almost equal length.  Then, applying
3651	     FEC, a better code, e.g., (n=24, k=12), can be used over the
3652	     sequence of NAL units for each frame.  The delay compared to that
3653	     of 2), above, may be reduced, but several disadvantages are
3654	     obvious.  First, the coding efficiency of the encoded video is
3655	     lowered significantly, as slice-structured coding reduces intra-
3656	     frame prediction and additional slice overhead is necessary.
3657	     Second, pre-encoded content or video received through a gateway
3658	     is usually not coded with k slices in a way that allows FEC to
3659	     be applied.  Finally, the encoding of video producing k
3660	     slices of equal length is not straightforward and might require
3661	     more than one encoding pass.

3663	   Many of the mentioned disadvantages can be avoided by applying FUs in
3664	   combination with FEC.  Each NAL unit can be split into any number of
3665	   FUs of basically equal length; therefore, FEC with a reasonable k and
3666	   n can be applied, even if the encoder made no effort to produce
3667	   slices of equal length.  For example, a coded slice NAL unit
3668	   containing an entire frame can be split to k FUs, and a parity check
3669	   code (n=k+1, k) can be applied.  However, this has the disadvantage
3670	   that unless all created fragments can be recovered, the whole slice
3671	   will be lost.  Thus a larger section is lost than would be if the
3672	   frame had been split into several slices.
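
   The parity-check case can be illustrated with a minimal sketch
   (Python).  It splits a payload into k equal-length fragments,
   derives one XOR parity block, and rebuilds a single lost fragment.
   The function names are hypothetical, zero padding of the last
   fragment is a simplification of this sketch, and neither the FU
   headers of this payload format nor the RFC 5109 FEC header are
   modelled.

```python
def split_equal(payload, k):
    """Split payload into k fragments of equal length (zero-padded)."""
    frag_len = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(frag_len * k, b"\x00")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]


def xor_bytes(blocks):
    """Bitwise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def recover_single_loss(received, parity):
    """Rebuild the one missing fragment (marked None) from the parity."""
    missing = [i for i, frag in enumerate(received) if frag is None]
    if len(missing) != 1:
        raise ValueError("a single parity packet repairs exactly one loss")
    present = [frag for frag in received if frag is not None]
    repaired = list(received)
    repaired[missing[0]] = xor_bytes(present + [parity])
    return repaired


nal_payload = bytes(range(50))          # stand-in for a coded slice
fragments = split_equal(nal_payload, k=4)
parity = xor_bytes(fragments)           # the (n=k+1, k) parity packet

damaged = list(fragments)
damaged[2] = None                       # one fragment lost in transit
restored = recover_single_loss(damaged, parity)
```

   If two of the k+1 packets are lost, the sketch raises an error,
   mirroring the disadvantage noted above: the whole slice is then
   lost.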

3674	   The presented technique makes it possible to achieve good
3675	   transmission error tolerance, even if no additional source coding
3676	   layer redundancy (such as periodic intra frames) is present.
3677	   Consequently, the same coded video sequence can be used to achieve
3678	   the maximum compression efficiency and quality over error-free
3679	   transmission and for transmission over error-prone networks.
3680	   Furthermore, the technique allows the application of FEC to pre-
3681	   encoded sequences without adding delay.  In this case, pre-encoded
3682	   sequences that are not encoded for error-prone networks can still be
3683	   transmitted almost reliably without adding extensive delays.  In
3684	   addition, FUs of equal length result in a bit rate efficient use of
3685	   RFC 5109.

3687	   If the error probability depends on the length of the transmitted
3688	   packet (e.g., in case of mobile transmission [15]), the benefits of
3689	   applying FUs with FEC are even more obvious.  Basically, the
3690	   flexibility of the size of FUs allows appropriate FEC to be applied
3691	   for each NAL unit and unequal error protection of NAL units.

3693	   When FUs and FEC are used, the incurred overhead is substantial but
3694	   is in the same order of magnitude as the number of bits that have to
3695	   be spent for intra-coded macroblocks if no FEC is applied.  In [19],
3696	   it was shown that the FEC-based approach improved the overall
3697	   quality at the same error rate and the same overall bit rate,
3698	   including the overhead.

3700	12.6. Low Bit-Rate Streaming

3702	   This scheme has been implemented with H.263 and non-standard RTP
3703	   packetization and has given good results [20].  There is no technical
3704	   reason why similarly good results could not be achievable with H.264.

3706	   In today's Internet streaming, some of the offered bit rates are
3707	   relatively low in order to allow terminals with dial-up modems to
3708	   access the content.  In wired IP networks, relatively large packets,
3709	   say 500 - 1500 bytes, are preferred to smaller and more frequently
3710	   occurring packets in order to reduce network congestion.  Moreover,
3711	   use of large packets decreases the amount of RTP/UDP/IP header
3712	   overhead.  For low bit-rate video, the use of large packets means
3713	   that sometimes up to a few pictures should be encapsulated in one
3714	   packet.

3716	   However, loss of a packet including many coded pictures would have
3717	   drastic consequences for visual quality, as there is practically no
3718	   other way to conceal a loss of an entire picture than to repeat the
3719	   previous one.  One way to construct relatively large packets and
3720	   maintain possibilities for successful loss concealment is to
3721	   construct MTAPs that contain interleaved slices from several pictures.
3722	   An MTAP should not contain spatially adjacent slices from the same
3723	   picture or spatially overlapping slices from any picture.  If a
3724	   packet is lost, it is likely that a lost slice is surrounded by
3725	   spatially adjacent slices of the same picture and spatially
3726	   corresponding slices of the temporally previous and succeeding
3727	   pictures.  Consequently, concealment of the lost slice is likely to
3728	   be relatively successful.

3730	12.7. Robust Packet Scheduling in Video Streaming

3732	   Robust packet scheduling has been implemented with MPEG-4 Part 2 and
3733	   simulated in a wireless streaming environment [21].  There is no
3734	   technical reason why similar or better results could not be
3735	   achievable with H.264.

3737	   Streaming clients typically have a receiver buffer that is capable of
3738	   storing a relatively large amount of data.  Initially, when a
3739	   streaming session is established, a client does not start playing the
3740	   stream back immediately.  Rather, it typically buffers the incoming
3741	   data for a few seconds.  This buffering helps maintain continuous
3742	   playback, as, in case of occasional increased transmission delays or
3743	   network throughput drops, the client can decode and play buffered
3744	   data.  Otherwise, without initial buffering, the client has to freeze
3745	   the display, stop decoding, and wait for incoming data.  The
3746	   buffering is also necessary for either automatic or selective
3747	   retransmission at any protocol level.  If any part of a picture is
3748	   lost, a retransmission mechanism may be used to resend the lost data.
3749	   If the retransmitted data is received before its scheduled decoding
3750	   or playback time, the loss is recovered perfectly.  Coded pictures
3751	   can be ranked according to their importance in the subjective quality
3752	   of the decoded sequence.  For example, non-reference pictures, such
3753	   as conventional B pictures, are subjectively least important, as
3754	   their absence does not affect decoding of any other pictures.  In
3755	   addition to non-reference pictures, the ITU-T H.264 | ISO/IEC 14496-
3756	   10 standard includes a temporal scalability method called sub-
3757	   sequences [22].  Subjective ranking can also be made on coded slice
3758	   data partition or slice group basis.  Coded slices and coded slice
3759	   data partitions that are subjectively the most important can be sent
3760	   earlier than their decoding order indicates, whereas coded slices and
3761	   coded slice data partitions that are subjectively the least important
3762	   can be sent later than their natural coding order indicates.
3763	   Consequently, any retransmitted parts of the most important slices
3764	   and coded slice data partitions are more likely to be received before
3765	   their scheduled decoding or playback time compared to the least
3766	   important slices and slice data partitions.

3768	13. Informative Appendix: Rationale for Decoding Order Number

3770	13.1. Introduction

3772	   The Decoding Order Number (DON) concept was introduced mainly to
3773	   enable efficient multi-picture slice interleaving (see section 12.6)
3774	   and robust packet scheduling (see section 12.7).  In both of these
3775	   applications, NAL units are transmitted out of decoding order.  DON
3776	   indicates the decoding order of NAL units and should be used in the
3777	   receiver to recover the decoding order.  Example use cases for
3778	   efficient multi-picture slice interleaving and for robust packet
3779	   scheduling are given in sections 13.2 and 13.3, respectively.
3780	   Section 13.4 describes the benefits of the DON concept in error
3781	   resiliency achieved by redundant coded pictures.  Section 13.5
3782	   summarizes considered alternatives to DON and justifies why DON was
3783	   chosen for this RTP payload specification.

3785	13.2. Example of Multi-Picture Slice Interleaving

3787	   An example of multi-picture slice interleaving follows.  A subset of
3788	   a coded video sequence is depicted below in output order.  R denotes
3789	   a reference picture, N denotes a non-reference picture, and the
3790	   number indicates a relative output time.

3792	      ... R1 N2 R3 N4 R5 ...

3794	   The decoding order of these pictures from left to right is as follows:

3796	      ... R1 R3 N2 R5 N4 ...

3798	   The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
3799	   DON equal to 1, 2, 3, 4, and 5, respectively.

3801	   Each reference picture consists of three slice groups that are
3802	   scattered as follows (a number denotes the slice group number for
3803	   each macroblock in a QCIF frame):

3805	      0 1 2 0 1 2 0 1 2 0 1
3806	      2 0 1 2 0 1 2 0 1 2 0
3807	      1 2 0 1 2 0 1 2 0 1 2
3808	      0 1 2 0 1 2 0 1 2 0 1
3809	      2 0 1 2 0 1 2 0 1 2 0
3810	      1 2 0 1 2 0 1 2 0 1 2
3811	      0 1 2 0 1 2 0 1 2 0 1
3812	      2 0 1 2 0 1 2 0 1 2 0
3813	      1 2 0 1 2 0 1 2 0 1 2

3815	   For the sake of simplicity, we assume that all the macroblocks of a
3816	   slice group are included in one slice.  Three MTAPs are constructed
3817	   from three consecutive reference pictures so that each MTAP contains
3818	   three aggregation units, each of which contains all the macroblocks
3819	   from one slice group.  The first MTAP contains slice group 0 of
3820	   picture R1, slice group 1 of picture R3, and slice group 2 of picture
3821	   R5.  The second MTAP contains slice group 1 of picture R1, slice
3822	   group 2 of picture R3, and slice group 0 of picture R5.  The third
3823	   MTAP contains slice group 2 of picture R1, slice group 0 of picture
3824	   R3, and slice group 1 of picture R5.  Each non-reference picture is
3825	   encapsulated into an STAP-B.

3827	   Consequently, the transmission order of NAL units is the following:

3829	      R1, slice group 0, DON 1, carried in MTAP, RTP SN: N
3830	      R3, slice group 1, DON 2, carried in MTAP, RTP SN: N
3831	      R5, slice group 2, DON 4, carried in MTAP, RTP SN: N
3832	      R1, slice group 1, DON 1, carried in MTAP, RTP SN: N+1
3833	      R3, slice group 2, DON 2, carried in MTAP, RTP SN: N+1
3834	      R5, slice group 0, DON 4, carried in MTAP, RTP SN: N+1
3835	      R1, slice group 2, DON 1, carried in MTAP, RTP SN: N+2
3836	      R3, slice group 0, DON 2, carried in MTAP, RTP SN: N+2
3837	      R5, slice group 1, DON 4, carried in MTAP, RTP SN: N+2
3838	      N2, DON 3, carried in STAP-B, RTP SN: N+3
3839	      N4, DON 5, carried in STAP-B, RTP SN: N+4

3841	   The receiver is able to organize the NAL units back in decoding order
3842	   based on the value of DON associated with each NAL unit.
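
   The interleaving and the receiver-side reordering of this example
   can be sketched as follows (Python).  The sketch builds the three
   MTAPs as described in the text (MTAP m carries slice group
   (m + i) mod 3 of the i-th reference picture) and then restores
   decoding order with a stable sort on DON.  All names are
   hypothetical, and DON wrap-around and the deinterleaving buffer
   are deliberately omitted.

```python
# Picture name -> DON, as assigned in the example above.
DON = {"R1": 1, "R3": 2, "N2": 3, "R5": 4, "N4": 5}
REFERENCE_PICTURES = ["R1", "R3", "R5"]


def build_transmission_order():
    """Return (packet, picture, slice_group, don) tuples in send order."""
    sent = []
    for m in range(3):  # MTAP m carries slice group (m + i) mod 3 of picture i
        for i, pic in enumerate(REFERENCE_PICTURES):
            sent.append((f"MTAP SN N+{m}", pic, (m + i) % 3, DON[pic]))
    sent.append(("STAP-B SN N+3", "N2", None, DON["N2"]))
    sent.append(("STAP-B SN N+4", "N4", None, DON["N4"]))
    return sent


def recover_decoding_order(nal_units):
    """Stable sort by DON; NAL units sharing a DON keep arrival order."""
    return sorted(nal_units, key=lambda unit: unit[3])


received = build_transmission_order()
decoding_order = recover_decoding_order(received)

pictures_in_decoding_order = []
for _, pic, _, _ in decoding_order:
    if pic not in pictures_in_decoding_order:
        pictures_in_decoding_order.append(pic)
```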

3844	   If one of the MTAPs is lost, the spatially adjacent and temporally
3845	   co-located macroblocks are received and can be used to conceal the
3846	   loss efficiently.  If one of the STAPs is lost, the effect of the
3847	   loss does not propagate temporally.

3849	13.3. Example of Robust Packet Scheduling

3851	   An example of robust packet scheduling follows.  The communication
3852	   system used in the example consists of the following components in
3853	   the order that the video is processed from source to sink:

3855	      o camera and capturing
3856	      o pre-encoding buffer
3857	      o encoder
3858	      o encoded picture buffer
3859	      o transmitter
3860	      o transmission channel
3861	      o receiver
3862	      o receiver buffer
3863	      o decoder
3864	      o decoded picture buffer
3865	      o display

3867	   The video communication system used in the example operates as
3868	   follows.  Note that processing of the video stream happens gradually
3869	   and at the same time in all components of the system.  The source
3870	   video sequence is shot and captured to a pre-encoding buffer.  The
3871	   pre-encoding buffer can be used to order pictures from sampling order
3872	   to encoding order or to analyze multiple uncompressed frames for bit
3873	   rate control purposes, for example.  In some cases, the pre-encoding
3874	   buffer may not exist; instead, the sampled pictures are encoded right
3875	   away.  The encoder encodes pictures from the pre-encoding buffer and
3876	   stores the output; i.e., coded pictures, to the encoded picture
3877	   buffer.  The transmitter encapsulates the coded pictures from the
3878	   encoded picture buffer to transmission packets and sends them to a
3879	   receiver through a transmission channel.  The receiver stores the
3880	   received packets to the receiver buffer.  The receiver buffering
3881	   process typically includes buffering for transmission delay jitter.
3882	   The receiver buffer can also be used to recover correct decoding
3883	   order of coded data.  The decoder reads coded data from the receiver
3884	   buffer and produces decoded pictures as output into the decoded
3885	   picture buffer.  The decoded picture buffer is used to recover the
3886	   output (or display) order of pictures.  Finally, pictures are
3887	   displayed.

3889	   In the following example figures, I denotes an IDR picture, R denotes
3890	   a reference picture, N denotes a non-reference picture, and the
3891	   number after I, R, or N indicates the sampling time relative to the
3892	   previous IDR picture in decoding order.  Values below the sequence of
3893	   pictures indicate scaled system clock timestamps.  The system clock
3894	   is initialized arbitrarily in this example, and time runs from left
3895	   to right.  Each I, R, and N picture is mapped into the same timeline
3896	   compared to the previous processing step, if any, assuming that
3897	   encoding, transmission, and decoding take no time.  Thus, events
3898	   happening at the same time are located in the same column throughout
3899	   all example figures.

3901	   A subset of a sequence of coded pictures is depicted below in
3902	   sampling order.

3904	       ...  N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
3905	       ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
3906	       ...  58  59  60  61  62  63  64  65  66  ... 128 129 130 131 ...

3908	             Figure 16  Sequence of pictures in sampling order

3910	   The sampled pictures are buffered in the pre-encoding buffer to
3911	   arrange them in encoding order.  In this example, we assume that the
3912	   non-reference pictures are predicted from both the previous and the
3913	   next reference picture in output order, except for the non-reference
3914	   pictures immediately preceding an IDR picture, which are predicted
3915	   only from the previous reference picture in output order.  Thus, the
3916	   pre-encoding buffer has to contain at least two pictures, and the
3917	   buffering causes a delay of two picture intervals.  The output of the
3918	   pre-encoding buffering process and the encoding (and decoding) order
3919	   of the pictures are as follows:

3921	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3922	       ... -|---|---|---|---|---|---|---|---|- ...
3923	       ... 60  61  62  63  64  65  66  67  68  ...

3925	         Figure 17  Re-ordered pictures in the pre-encoding buffer
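
   The reordering from Figure 16 to Figure 17 can be sketched as
   follows.  This is only an illustration under the prediction
   structure assumed in this example; the function name and the
   use of text labels for pictures are assumptions made for the
   sketch and are not part of this memo.

```python
def to_encoding_order(sampling_order):
    """Reorder picture labels from sampling order to encoding order.

    Labels start with "I" (IDR), "R" (reference), or "N" (non-reference),
    as in the example figures.
    """
    encoding_order = []
    pending = []  # non-reference pictures waiting to be encoded
    for picture in sampling_order:
        kind = picture[0]
        if kind == "N":
            pending.append(picture)
        elif kind == "I":
            # N pictures immediately preceding an IDR picture use only the
            # previous reference picture, so they are encoded before it.
            encoding_order.extend(pending)
            pending.clear()
            encoding_order.append(picture)
        else:
            # kind == "R": the pending N pictures are predicted from this
            # reference picture, so it has to be encoded first.
            encoding_order.append(picture)
            encoding_order.extend(pending)
            pending.clear()
    # Pictures still pending here wait for a later reference picture.
    return encoding_order


sampled = ["N58", "N59", "I00", "N01", "N02", "R03", "N04", "N05", "R06"]
print(to_encoding_order(sampled))
```

   Holding back at most two non-reference pictures corresponds to the
   two-picture pre-encoding buffer described above.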

3927	   The encoder or the transmitter can set the value of DON for each
3928	   picture to the value of DON for the previous picture in decoding
3929	   order plus one.

3931	   For the sake of simplicity, let us assume that:

3933	   o  the frame rate of the sequence is constant,
3934	   o  each picture consists of only one slice,
3935	   o  each slice is encapsulated in a single NAL unit packet,
3936	   o  there is no transmission delay, and
3937	   o  pictures are transmitted at constant intervals (that is, 1 /
3938	      (frame rate)).

3940	   When pictures are transmitted in decoding order, they are received as
3941	   follows:

3943	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3944	       ... -|---|---|---|---|---|---|---|---|- ...
3945	       ... 60  61  62  63  64  65  66  67  68  ...

3947	              Figure 18  Received pictures in decoding order

3949	   The OPTIONAL sprop-interleaving-depth media type parameter is set to
3950	   0, as the transmission (or reception) order is identical to the
3951	   decoding order.

3953	   The decoder has to buffer for one picture interval initially in its
3954	   decoded picture buffer to organize pictures from decoding order to
3955	   output order as depicted below:

3957	        ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
3958	        ... -|---|---|---|---|---|---|---|---|- ...
3959	        ... 61  62  63  64  65  66  67  68  69  ...

3961	                          Figure 19  Output order

3963	   The amount of required initial buffering in the decoded picture
3964	   buffer can be signaled in the buffering period SEI message or with
3965	   the num_reorder_frames syntax element of H.264 video usability
3966	   information.  num_reorder_frames indicates the maximum number of
3967	   frames, complementary field pairs, or non-paired fields that precede
3968	   any frame, complementary field pair, or non-paired field in the
3969	   sequence in decoding order and that follow it in output order.  For
3970	   the sake of simplicity, we assume that num_reorder_frames is used to
3971	   indicate the initial buffer in the decoded picture buffer.  In this
3972	   example, num_reorder_frames is equal to 1.

3974	   It can be observed that if the IDR picture I00 is lost during
3975	   transmission and a retransmission request is issued when the value of
3976	   the system clock is 62, there is one picture interval of time (until
3977	   the system clock reaches timestamp 63) to receive the retransmitted
3978	   IDR picture I00.

3980	   Let us then assume that IDR pictures are transmitted two frame
3981	   intervals earlier than their decoding position; i.e., the pictures
3982	   are transmitted as follows:

3984	        ...  I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
3985	        ... --|---|---|---|---|---|---|---|---|- ...
3986	        ...  62  63  64  65  66  67  68  69  70  ...

3988	       Figure 20  Interleaving: Early IDR pictures in sending order

3990	   The OPTIONAL sprop-interleaving-depth media type parameter is set
3991	   equal to 1 according to its definition.  (The value of sprop-
3992	   interleaving-depth in this example can be derived as follows: Picture
3993	   I00 is the only picture preceding picture N58 or N59 in transmission
3994	   order and following it in decoding order.  Except for pictures I00,
3995	   N58, and N59, the transmission order is the same as the decoding
3996	   order of pictures.  As a coded picture is encapsulated into exactly
3997	   one NAL unit, the value of sprop-interleaving-depth is equal to the
3998	   maximum number of pictures preceding any picture in transmission
3999	   order and following the picture in decoding order.)
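
   The derivation in the parenthetical note above can be checked with
   a short sketch.  It assumes, as in this example, that each picture
   is carried in exactly one NAL unit and that picture labels are
   unique within the window examined; the function name is
   hypothetical.

```python
def interleaving_depth(transmission_order, decoding_order):
    """Maximum number of pictures that precede any picture in
    transmission order and follow it in decoding order."""
    position = {pic: i for i, pic in enumerate(decoding_order)}
    depth = 0
    for i, pic in enumerate(transmission_order):
        sent_earlier_decoded_later = sum(
            1 for prev in transmission_order[:i]
            if position[prev] > position[pic]
        )
        depth = max(depth, sent_earlier_decoded_later)
    return depth


decoding = ["N58", "N59", "I00", "R03", "N01", "N02", "R06", "N04", "N05"]
early_idr = ["I00", "N58", "N59", "R03", "N01", "N02", "R06", "N04", "N05"]

print(interleaving_depth(decoding, decoding))   # Figure 18 case
print(interleaving_depth(early_idr, decoding))  # Figure 20 case
```

   For the transmission order of Figure 18 the sketch yields 0, and
   for that of Figure 20 it yields 1, in line with the values stated
   in the text.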

4001	   The receiver buffering process contains two pictures at a time
4002	   according to the value of the sprop-interleaving-depth parameter and
4003	   orders pictures from the reception order to the correct decoding
4004	   order based on the value of DON associated with each picture.  The
4005	   output of the receiver buffering process is as follows:

4007	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
4008	       ... -|---|---|---|---|---|---|---|---|- ...
4009	       ... 63  64  65  66  67  68  69  70  71  ...

4011	                 Figure 21  Interleaving: Receiver buffer
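
   The receiver buffering that produces Figure 21 can be sketched as
   follows.  This is a simplified illustration, not the normative
   de-interleaving process: it assumes one NAL unit per picture,
   ignores DON wrap-around, and uses hypothetical DON values 0..8
   assigned in decoding order starting from N58.

```python
import heapq


def deinterleave(received, depth):
    """Return (output_time, label) pairs in DON order.

    received: (arrival_time, don, label) tuples in reception order.
    depth:    value of sprop-interleaving-depth.
    """
    buffer = []  # min-heap keyed by DON
    output = []
    for arrival_time, don, label in received:
        heapq.heappush(buffer, (don, label))
        # Once depth + 1 pictures are buffered, release the one with
        # the smallest DON.
        if len(buffer) > depth:
            _, out_label = heapq.heappop(buffer)
            output.append((arrival_time, out_label))
    # Pictures still in the buffer wait for further arrivals.
    return output


# Reception order and timestamps of Figure 20.
received = [
    (62, 2, "I00"), (63, 0, "N58"), (64, 1, "N59"), (65, 3, "R03"),
    (66, 4, "N01"), (67, 5, "N02"), (68, 6, "R06"), (69, 7, "N04"),
    (70, 8, "N05"),
]
for time, label in deinterleave(received, depth=1):
    print(time, label)
```

   In this sketch, I00 arrives at timestamp 62 and leaves the buffer at
   timestamp 65, which is the source of the three picture intervals of
   retransmission margin discussed later in this section.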

4013	   Again, an initial buffering delay of one picture interval is needed
4014	   to organize pictures from decoding order to output order, as depicted
4015	   below:

4017	        ... N58 N59 I00 N01 N02 R03 N04 N05 ...
4018	        ... -|---|---|---|---|---|---|---|- ...
4019	        ... 64  65  66  67  68  69  70  71  ...

4021	         Figure 22  Interleaving: Receiver buffer after reordering

4023	   Note that the maximum delay that IDR pictures can undergo during
4024	   transmission, including possible application, transport, or link
4025	   layer retransmission, is equal to three picture intervals.  Thus, the
4026	   loss resiliency of IDR pictures is improved in systems supporting
4027	   retransmission compared to the case in which pictures were
4028	   transmitted in their decoding order.

4030	13.4. Robust Transmission Scheduling of Redundant Coded Slices

4032	   A redundant coded picture is a coded representation of a picture or a
4033	   part of a picture that is not used in the decoding process if the
4034	   corresponding primary coded picture is correctly decoded.  There
4035	   should be no noticeable difference between any area of the decoded
4036	   primary picture and a corresponding area that would result from
4037	   application of the H.264 decoding process for any redundant picture
4038	   in the same access unit.  A redundant coded slice is a coded slice
4039	   that is a part of a redundant coded picture.

4041	   Redundant coded pictures can be used to provide unequal error
4042	   protection in error-prone video transmission.  If a primary coded
4043	   representation of a picture is decoded incorrectly, a corresponding
4044	   redundant coded picture can be decoded.  Examples of applications and
4045	   coding techniques using the redundant coded picture feature include
4046	   the video redundancy coding [23] and the protection of "key pictures"
4047	   in multicast streaming [24].

4049	   One property of many error-prone video communications systems is that
4050	   transmission errors are often bursty.  Therefore, they may affect
4051	   several consecutive transmission packets in transmission order.
4052	   In low bit-rate video communication, it is relatively common that an
4053	   entire coded picture can be encapsulated into one transmission packet.
4054	   Consequently, a primary coded picture and the corresponding redundant
4055	   coded pictures may be transmitted in consecutive packets in
4056	   transmission order.  To make the transmission scheme more tolerant of
4057	   bursty transmission errors, it is beneficial to transmit the primary
4058	   coded picture and redundant coded picture separated by more than a
4059	   single packet.  The DON concept enables this.

4061	13.5. Remarks on Other Design Possibilities

4063	   The slice header syntax structure of the H.264 coding standard
4064	   contains the frame_num syntax element that can indicate the decoding
4065	   order of coded frames.  However, the usage of the frame_num syntax
4066	   element is not feasible or desirable to recover the decoding order,
4067	   due to the following reasons:

4069	   o  The receiver is required to parse at least one slice header per
4070	      coded picture (before passing the coded data to the decoder).

4072	   o  Coded slices from multiple coded video sequences cannot be
4073	      interleaved, as the frame number syntax element is reset to 0 in
4074	      each IDR picture.

4076	   o  The coded fields of a complementary field pair share the same
4077	      value of the frame_num syntax element.  Thus, the decoding order
4078	      of the coded fields of a complementary field pair cannot be
4079	      recovered based on the frame_num syntax element or any other
4080	      syntax element of the H.264 coding syntax.

4082	   The RTP payload format for transport of MPEG-4 elementary streams [25]
4083	   enables interleaving of access units and transmission of multiple
4084	   access units in the same RTP packet.  An access unit is specified in
4085	   the H.264 coding standard to comprise all NAL units associated with a
4086	   primary coded picture according to subclause 7.4.1.2 of [1].
4087	   Consequently, slices of different pictures cannot be interleaved, and
4088	   the multi-picture slice interleaving technique (see section 12.6) for
4089	   improved error resilience cannot be used.

4091	14. Acknowledgements

4093	   Stephan Wenger, Miska Hannuksela, Thomas Stockhammer, Magnus
4094	   Westerlund, and David Singer are thanked as the authors of RFC 3984.
4095	   Dave Lindbergh, Philippe Gentric, Gonzalo Camarillo, Gary Sullivan,
4096	   Joerg Ott, and Colin Perkins are thanked for careful review during
4097	   the development of RFC 3984. Randell Jesup, Stephen Botzko, Magnus
4098	   Westerlund, Alex Eleftheriadis, and Thomas Schierl are thanked for
4099	   their valuable comments and inputs during the development of this
4100	   memo.

4102	   This document was prepared using 2-Word-v2.0.template.dot.

4104	15. References

4106	15.1. Normative References

4108	   [1]   ITU-T Recommendation H.264, "Advanced video coding for generic
4109	         audiovisual services", November 2007.

4111	   [2]   ISO/IEC International Standard 14496-10:2008.

4113	   [3]   ITU-T Recommendation H.241, "Extended video procedures and
4114	         control signals for H.300 series terminals", May 2006.

4116	   [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
4117	         Levels", BCP 14, RFC 2119, March 1997.

4119	   [5]   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
4120	         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
4121	         RFC 3550, July 2003.

4123	   [6]   Handley, M. and V. Jacobson, "SDP: Session Description
4124	         Protocol", RFC 2327, April 1998.

4126	   [7]   Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
4127	         RFC 3548, July 2003.

4129	   [8]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
4130	         Session Description Protocol (SDP)", RFC 3264, June 2002.

4132	   [9]   Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media
4133	         Attributes in the Session Description Protocol", draft-ietf-
4134	         mmusic-sdp-source-attributes-02 (work in progress), October
4135	         2008.

4137	15.2. Informative References

4139	   [10]  Luthra, A., Sullivan, G.J., and T. Wiegand (eds.), Special
4140	         Issue on H.264/AVC. IEEE Transactions on Circuits and Systems
4141	         for Video Technology, July 2003.

4143	   [11]  Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
4144	         Newell, D., Ott, J., Sullivan, G., Wenger, S., and C. Zhu, "RTP
4145	         Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
4146	         (H.263+)", RFC 2429, October 1998.

4148	   [12]  ISO/IEC IS 14496-2.

4150	   [13]  Wenger, S., "H.26L over IP", IEEE Transactions on Circuits and
4151	         Systems for Video Technology, Vol. 13, No. 7, July 2003.

4153	   [14]  Wenger, S., "H.26L over IP: The IP Network Adaptation Layer",
4154	         Proceedings Packet Video Workshop 02, April 2002.

4156	   [15]  Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT
4157	         Coding Network Abstraction Layer and IP-based Transport" in
4158	         Proc. ICIP 2002, Rochester, NY, September 2002.

4160	   [16]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
4161	         Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

4163	   [17]  ITU-T Recommendation H.223, "Multiplexing protocol for low bit
4164	         rate multimedia communication", July 2001.

4166	   [18]  Li, A., "RTP Payload Format for Generic Forward Error
4167	         Correction", RFC 5109, December 2007.

4169	   [19]  Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier,
4170	         "Video Coding and Transport Layer Techniques for H.264/AVC-
4171	         Based Transmission over Packet-Lossy Networks", IEEE
4172	         International Conference on Image Processing (ICIP 2003),
4173	         Barcelona, Spain, September 2003.

4175	   [20]  Varsa, V. and M. Karczewicz, "Slice interleaving in compressed
4176	         video packetization", Packet Video Workshop 2000.

4178	   [21]  Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for
4179	         wireless video streaming", International Packet Video Workshop
4180	         2002.

4182	   [22]  Hannuksela, M.M., "Enhanced concept of GOP", JVT-B042,
4183	         available http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT-
4184	         B042.doc, January 2002.

4186	   [23]  Wenger, S., "Video Redundancy Coding in H.263+", 1997
4187	         International Workshop on Audio-Visual Services over Packet
4188	         Networks, September 1997.

4190	   [24]  Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient
4191	         Video Coding Using Unequally Protected Key Pictures", in Proc.
4192	         International Workshop VLBV03, September 2003.

4194	   [25]  van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
4195	         P. Gentric, "RTP Payload Format for Transport of MPEG-4
4196	         Elementary Streams", RFC 3640, November 2003.

4198	   [26]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
4199	         Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
4200	         3711, March 2004.

4202	   [27]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
4203	         Protocol (RTSP)", RFC 2326, April 1998.

4205	   [28]  Handley, M., Perkins, C., and E. Whelan, "Session Announcement
4206	         Protocol", RFC 2974, October 2000.

4208	   [29]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
4209	         January 2008.

4211	   [30]  Wenger, S., Chandra, U., and M. Westerlund, "Codec Control
4212	         Messages in the RTP Audio-Visual Profile with Feedback (AVPF)",
4213	         RFC 5104, February 2008.

4215	16. Authors' Addresses

4217	   Ye-Kui Wang
4218	   Huawei Technologies
4219	   400 Somerset Corporate Blvd
4220	   Bridgewater, NJ 08807
4221	   USA

4223	   Phone: +1-908-393-4758
4224	   Email: yekuiwang@huawei.com

4226	   Roni Even
4227	   14 David Hamelech
4228	   Tel Aviv 64953
4229	   Israel

4231	   Phone: +972-545481099
4232	   Email: ron.even.tlv@gmail.com

4234	   Tom Kristensen
4235	   TANDBERG
4236	   Philip Pedersens vei 22
4237	   N-1366 Lysaker
4238	   Norway

4240	   Phone: +47 67125125
4241	   Email: tom.kristensen@tandberg.com, tomkri@ifi.uio.no

4243	17. Backward Compatibility to RFC 3984

4245	   The current document is a revision of RFC 3984 and intends to
4246	   obsolete it.  This section addresses the backward compatibility
4247	   issues.

4249	   The technical changes are listed in section 18.

4251	   Items 1), 2), 3), 7), 9), 10), 12), and 13) are bug-fix type changes
4252	   and do not incur any backward compatibility issues.

4254	   Item 4), addition of six new media type parameters, does not incur
4255	   any backward compatibility issues for SDP Offer/Answer based
4256	   applications, as legacy RFC 3984 receivers ignore these parameters,
4257	   and it is fine for legacy RFC 3984 senders not to use these
4258	   parameters as they are optional.  However, there is a backward
4259	   compatibility issue for SDP declarative usage based applications, e.g.
4260	   those using RTSP and SAP, because the SDP receiver per RFC 3984
4261	   cannot accept a session for which the SDP includes an unrecognized
4262	   parameter.  Therefore, the RTSP or SAP server may have to prepare two
4263	   sets of streams, one for legacy RFC 3984 receivers and one for
4264	   receivers according to this memo.

4266	   Items 5), 6) and 11) are related to out-of-band transport of
4267	   parameter sets.  The following backward compatibility issues exist:

4269	   1) When a legacy sender per RFC 3984 includes in sprop-parameter-
4270	     sets parameter sets for a level different from the default level
4271	     indicated by profile-level-id, the parameter value of sprop-
4272	     parameter-sets is invalid to the receiver per this memo, and
4273	     the session may therefore be rejected.

4275	   2) In SDP Offer/Answer between a legacy offerer per RFC 3984 and an
4276	     answerer per this memo, when the answerer includes in the answer
4277	     parameter sets that are not a superset of the parameter sets
4278	     included in the offer, the parameter value of sprop-parameter-sets
4279	     is invalid to the offerer and the session may not be initiated
4280	     properly (related to change item 11)).

4282	   3) When one endpoint A per this memo includes in-band-parameter-sets
4283	     equal to 1, the other side B per RFC 3984 does not understand that
4284	     it must transmit parameter sets in-band, and B may still omit
4285	     parameter sets from the stream it is sending.  Consequently,
4286	     endpoint A cannot decode the stream it receives.

4288	   Item 7), allowance of conveying sprop-parameter-sets and sprop-level-
4289	   parameter-sets using the "fmtp" source attribute as specified in
4290	   section 6.3 of [9], is similar to item 4).  It does not incur any
4291	   backward compatibility issues for SDP Offer/Answer based applications,
4292	   as legacy RFC 3984 receivers ignore the "fmtp" source attribute, and
4293	   it is fine for legacy RFC 3984 senders not to use the "fmtp" source
4294	   attribute as it is optional.  However, there is a backward
4295	   compatibility issue for SDP declarative usage based applications, e.g.
4296	   those using RTSP and SAP, because the SDP receiver per RFC 3984
4297	   cannot accept a session for which the SDP includes an unrecognized
4298	   parameter (i.e., the "fmtp" source attribute).  Therefore, the RTSP
4299	   or SAP server may have to prepare two sets of streams, one for legacy
4300	   RFC 3984 receivers and one for receivers according to this memo.

4302	   Item 14) removed the recommendation to use out-of-band transport of
4303	   parameter sets.  As out-of-band transport of parameter sets is still
4304	   allowed, this change does not incur any backward compatibility issues.

4306	   Item 15) does not incur any backward compatibility issues as the
4307	   added subsection 8.5 is informative.

4309	18. Changes from RFC 3984

4311	   The following is the list of technical changes (including bug fixes)
4312	   from RFC 3984.  Besides this list of technical changes, numerous
4313	   editorial changes have been made but are not documented in this memo.

4315	   1) In subsections 5.4, 5.5, 6.2, 6.3 and 6.4, removed that the
4316	     packetization mode in use may be signaled by external means.

4318	   2) In subsection 7.2.2, changed the sentence

4320	      There are N VCL NAL units in the deinterleaving buffer.

4322	      to

4324	      There are N or more VCL NAL units in the de-interleaving buffer.

4326	   3) In subsection 8.1, the semantics of sprop-init-buf-time, paragraph
4327	     2, changed the sentence

4329	      The parameter is the maximum value of (transmission time of a NAL
4330	      unit - decoding time of the NAL unit), assuming reliable and
4331	      instantaneous transmission, the same timeline for transmission
4332	      and decoding, and that decoding starts when the first packet
4333	      arrives.

4335	      to

4337	      The parameter is the maximum value of (decoding time of the NAL
4338	      unit - transmission time of a NAL unit), assuming reliable and
4339	      instantaneous transmission, the same timeline for transmission
4340	      and decoding, and that decoding starts when the first packet
4341	      arrives.

4343	   4) Added six new media type parameters, namely max-smbps, sprop-
4344	     level-parameter-sets, use-level-src-parameter-sets, in-band-
4345	     parameter-sets, sar-understood and sar-supported.

4347	   5) In subsection 8.1, removed the specification of parameter-add.
4348	     Other descriptions of parameter-add (in subsections 8.2 and 8.4)
4349	     are also removed.

4351	   6) In subsection 8.1, added a constraint to sprop-parameter-sets such
4352	     that it can only contain parameter sets for the same profile and
4353	     level as indicated by profile-level-id.

4355	   7) In subsection 8.2.1, added that sprop-parameter-sets and sprop-
4356	     level-parameter-sets may be either included in the "a=fmtp" line
4357	     of SDP or conveyed using the "fmtp" source attribute as specified
4358	     in section 6.3 of [9].

4360	   8) In subsection 8.2.2, removed sprop-deint-buf-req from being part
4361	     of the media format configuration in usage with the SDP
4362	     Offer/Answer model.

4364	   9) In subsection 8.2.2, made it clear that level is downgradable in
4365	     the SDP Offer/Answer model, i.e. the use of the level part of
4366	     "profile-level-id" does not need to be symmetric (the level
4367	     included in the answer can be lower than or equal to the level
4368	     included in the offer).

4370	   10)In subsection 8.2.2, removed that the capability parameters may be
4371	     used to declare encoding capabilities.

4373	   11)In subsection 8.2.2, added rules on how to use sprop-parameter-
4374	     sets and sprop-level-parameter-sets for out-of-band transport of
4375	     parameter sets, with or without level downgrading.

4377	   12)In subsection 8.2.2, clarified the rules of using the media type
4378	     parameters with SDP Offer/Answer for multicast.

4380	   13)In subsection 8.2.2, completed and corrected the list of how
4381	     different media type parameters shall be interpreted in the
4382	     different combinations of offer or answer and direction attribute.

4384	   14)In subsection 8.4, changed the text such that both out-of-band and
4385	     in-band transport of parameter sets are allowed and neither is
4386	     recommended or required.

4388	   15)Added subsection 8.5 (informative) providing example methods for
4389	     decoder refresh to handle parameter set losses.