idnits 2.17.1 

draft-ietf-avt-rtp-rfc3984bis-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 17.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 4235.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 4212.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 4219.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 4225.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == The document seems to use 'NOT RECOMMENDED' as an RFC 2119 keyword, but
     does not include the phrase in its RFC 2119 key words list.

  -- The exact meaning of the all-uppercase expression 'NOT REQUIRED' is not
     defined in RFC 2119.  If it is intended as a requirements expression, it
     should be rewritten using one of the combinations defined in RFC 2119;
     otherwise it should not be all-uppercase.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 3, 2008) is 5653 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '4' is defined on line 4081, but no explicit reference
     was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

  ** Obsolete normative reference: RFC 2327 (ref. '6') (Obsoleted by RFC 4566)

  ** Obsolete normative reference: RFC 3548 (ref. '7') (Obsoleted by RFC 4648)

  -- Obsolete informational reference (is this intentional?): RFC 2429 (ref.
     '10') (Obsoleted by RFC 4629)

  -- Obsolete informational reference (is this intentional?): RFC 2733 (ref.
     '17') (Obsoleted by RFC 5109)

  -- Obsolete informational reference (is this intentional?): RFC 2326 (ref.
     '26') (Obsoleted by RFC 7826)

  -- Obsolete informational reference (is this intentional?): RFC 5117 (ref.
     '28') (Obsoleted by RFC 7667)


     Summary: 3 errors (**), 0 flaws (~~), 4 warnings (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Audio/Video Transport WG                                     Y.-K. Wang
2	Internet Draft                                                    Nokia
3	Intended status: Standards track                                R. Even
4	Expires: May 2009                                         Self-employed
5	                                                          T. Kristensen
6	                                                               Tandberg
7	                                                       November 3, 2008

9	                    RTP Payload Format for H.264 Video
10	                   draft-ietf-avt-rtp-rfc3984bis-01.txt

12	Status of this Memo

14	   By submitting this Internet-Draft, each author represents that any
15	   applicable patent or other IPR claims of which he or she is aware
16	   have been or will be disclosed, and any of which he or she becomes
17	   aware will be disclosed, in accordance with Section 6 of BCP 79.

19	   Internet-Drafts are working documents of the Internet Engineering
20	   Task Force (IETF), its areas, and its working groups.  Note that
21	   other groups may also distribute working documents as Internet-
22	   Drafts.

24	   Internet-Drafts are draft documents valid for a maximum of six months
25	   and may be updated, replaced, or obsoleted by other documents at any
26	   time.  It is inappropriate to use Internet-Drafts as reference
27	   material or to cite them other than as "work in progress."

29	   The list of current Internet-Drafts can be accessed at
30	   http://www.ietf.org/ietf/1id-abstracts.txt

32	   The list of Internet-Draft Shadow Directories can be accessed at
33	   http://www.ietf.org/shadow.html

35	   This Internet-Draft will expire on May 3, 2009.

37	Copyright Notice

39	   Copyright (C) The IETF Trust (2008).

41	Abstract

43	   This memo describes an RTP Payload format for the ITU-T
44	   Recommendation H.264 video codec and the technically identical
45	   ISO/IEC International Standard 14496-10 video codec, excluding the
46	   Scalable Video Coding (SVC) extension and the Multiview Video Coding
47	   extension, for which the RTP payload formats are defined elsewhere.

49	   The RTP payload format allows for packetization of one or more
50	   Network Abstraction Layer Units (NALUs), produced by an H.264 video
51	   encoder, in each RTP payload.  The payload format has wide
52	   applicability, as it supports applications from simple low bit-rate
53	   conversational usage, to Internet video streaming with interleaved
54	   transmission, to high bit-rate video-on-demand.

56	   This memo intends to obsolete RFC 3984.  Changes from RFC 3984 are
57	   summarized in section 17.  Backward compatibility with RFC 3984 is
58	   discussed in section 16.

60	Table of Contents

62	   1. Introduction...................................................4
63	      1.1. The H.264 Codec...........................................4
64	      1.2. Parameter Set Concept.....................................5
65	      1.3. Network Abstraction Layer Unit Types......................6
66	   2. Conventions....................................................7
67	   3. Scope..........................................................7
68	   4. Definitions and Abbreviations..................................7
69	      4.1. Definitions...............................................7
70	      4.2. Abbreviations.............................................9
71	   5. RTP Payload Format............................................10
72	      5.1. RTP Header Usage.........................................10
73	      5.2. Payload Structures.......................................13
74	      5.3. NAL Unit Header Usage....................................14
75	      5.4. Packetization Modes......................................16
76	      5.5. Decoding Order Number (DON)..............................17
77	      5.6. Single NAL Unit Packet...................................20
78	      5.7. Aggregation Packets......................................21
79	         5.7.1. Single-Time Aggregation Packet......................23
80	         5.7.2. Multi-Time Aggregation Packets (MTAPs)..............25
81	         5.7.3. Fragmentation Units (FUs)...........................29
82	   6. Packetization Rules...........................................33
83	      6.1. Common Packetization Rules...............................33
84	      6.2. Single NAL Unit Mode.....................................34
85	      6.3. Non-Interleaved Mode.....................................34
86	      6.4. Interleaved Mode.........................................34
87	   7. De-Packetization Process......................................35
88	      7.1. Single NAL Unit and Non-Interleaved Mode.................35
89	      7.2. Interleaved Mode.........................................35
90	         7.2.1. Size of the De-interleaving Buffer..................36
91	         7.2.2. De-interleaving Process.............................36
92	      7.3. Additional De-Packetization Guidelines...................38
93	   8. Payload Format Parameters.....................................39
94	      8.1. Media Type Registration..................................39
95	      8.2. SDP Parameters...........................................55
96	         8.2.1. Mapping of Payload Type Parameters to SDP...........55
97	         8.2.2. Usage with the SDP Offer/Answer Model...............56
98	         8.2.3. Usage in Declarative Session Descriptions...........64
99	      8.3. Examples.................................................65
100	      8.4. Parameter Set Considerations.............................70
101	      8.5. Decoder Refresh Point Procedure using In-Band Transport of
102	      Parameter Sets (Informative)..................................73
103	         8.5.1. IDR Procedure to Respond to a Request for a Decoder
104	         Refresh Point..............................................73
105	         8.5.2. Gradual Recovery Procedure to Respond to a Request for a
106	         Decoder Refresh Point......................................74
107	   9. Security Considerations.......................................74
108	   10. Congestion Control...........................................75
109	   11. IANA Consideration...........................................76
110	   12. Informative Appendix: Application Examples...................76
111	      12.1. Video Telephony according to ITU-T Recommendation H.241
112	      Annex A.......................................................76
113	      12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
114	      Aggregation...................................................77
115	      12.3. Video Telephony, Interleaved Packetization Using NAL Unit
116	      Aggregation...................................................77
117	      12.4. Video Telephony with Data Partitioning..................78
118	      12.5. Video Telephony or Streaming with FUs and Forward Error
119	      Correction....................................................78
120	      12.6. Low Bit-Rate Streaming..................................81
121	      12.7. Robust Packet Scheduling in Video Streaming.............81
122	   13. Informative Appendix: Rationale for Decoding Order Number....82
123	      13.1. Introduction............................................82
124	      13.2. Example of Multi-Picture Slice Interleaving.............83
125	      13.3. Example of Robust Packet Scheduling.....................84
126	      13.4. Robust Transmission Scheduling of Redundant Coded Slices88
127	      13.5. Remarks on Other Design Possibilities...................89
128	   14. Acknowledgements.............................................89
129	   15. References...................................................90
130	      15.1. Normative References....................................90
131	      15.2. Informative References..................................90
132	   Authors' Addresses...............................................92
133	   Intellectual Property Statement..................................93
134	   Disclaimer of Validity...........................................93
135	   Acknowledgement..................................................93
136	   16. Backward Compatibility to RFC 3984...........................94
137	   17. Changes from RFC 3984........................................95
138	   18. Open issues..................................................96

140	1. Introduction

142	   This memo intends to obsolete RFC 3984.  Changes from RFC 3984 are
143	   summarized in section 17.  Backward compatibility with RFC 3984 is
144	   discussed in section 16.

146	1.1. The H.264 Codec

148	   This memo specifies an RTP payload specification for the video coding
149	   standard known as ITU-T Recommendation H.264 [1] and ISO/IEC
150	   International Standard 14496 Part 10 [2] (both also known as Advanced
151	   Video Coding, or AVC).  In this memo the H.264 acronym is used for
152	   the codec and the standard, but the memo is equally applicable to the
153	   ISO/IEC counterpart of the coding standard.

155	   The H.264 video codec has a very broad application range that covers
156	   all forms of digital compressed video, from low bit-rate Internet
157	   streaming applications to HDTV broadcast and Digital Cinema
158	   applications with nearly lossless coding.  Compared to the current
159	   state of technology, the overall performance of H.264 is such that
160	   bit rate savings of 50% or more are reported.  Digital Satellite TV
161	   quality, for example, was reported to be achievable at 1.5 Mbit/s,
162	   compared to the current operation point of MPEG-2 video at around 3.5
163	   Mbit/s [9].

165	   The codec specification [1] itself distinguishes conceptually between
166	   a video coding layer (VCL) and a network abstraction layer (NAL).
167	   The VCL contains the signal processing functionality of the codec:
168	   mechanisms such as transform, quantization, and motion-compensated
169	   prediction, as well as a loop filter.  It follows the general concept of
170	   most of today's video codecs, a macroblock-based coder that uses
171	   inter picture prediction with motion compensation and transform
172	   coding of the residual signal.  The VCL encoder outputs slices: a bit
173	   string that contains the macroblock data of an integer number of
174	   macroblocks, and the information of the slice header (containing the
175	   spatial address of the first macroblock in the slice, the initial
176	   quantization parameter, and similar information).  Macroblocks in
177	   slices are arranged in scan order unless a different macroblock
178	   allocation is specified, by using the so-called Flexible Macroblock
179	   Ordering syntax.  In-picture prediction is used only within a slice.
180	   More information is provided in [9].

182	   The Network Abstraction Layer (NAL) encoder encapsulates the slice
183	   output of the VCL encoder into Network Abstraction Layer Units (NAL
184	   units), which are suitable for transmission over packet networks or
185	   use in packet oriented multiplex environments.  Annex B of H.264
186	   defines an encapsulation process to transmit such NAL units over
187	   byte-stream oriented networks.  In the scope of this memo, Annex B is
188	   not relevant.

190	   Internally, the NAL uses NAL units.  A NAL unit consists of a one-
191	   byte header and the payload byte string.  The header indicates the
192	   type of the NAL unit, the (potential) presence of bit errors or
193	   syntax violations in the NAL unit payload, and information regarding
194	   the relative importance of the NAL unit for the decoding process.
195	   This RTP payload specification is designed to be unaware of the bit
196	   string in the NAL unit payload.

198	   One of the main properties of H.264 is the complete decoupling of the
199	   transmission time, the decoding time, and the sampling or
200	   presentation time of slices and pictures.  The decoding process
201	   specified in H.264 is unaware of time, and the H.264 syntax does not
202	   carry information such as the number of skipped frames (as is common
203	   in the form of the Temporal Reference in earlier video compression
204	   standards).  Also, there are NAL units that affect many pictures and
205	   that are, therefore, inherently timeless.  For this reason, the
206	   handling of the RTP timestamp requires some special considerations
207	   for NAL units for which the sampling or presentation time is not
208	   defined or, at transmission time, unknown.

210	1.2. Parameter Set Concept

212	   One very fundamental design concept of H.264 is to generate self-
213	   contained packets, to make mechanisms such as the header duplication
214	   of RFC 2429 [10] or MPEG-4's Header Extension Code (HEC) [11]
215	   unnecessary.  This was achieved by decoupling information relevant to
216	   more than one slice from the media stream.  This higher layer meta
217	   information should be sent reliably, asynchronously, and in advance
218	   of the RTP packet stream that contains the slice packets.
219	   (Provisions for sending this information in-band are also available
220	   for applications that do not have an out-of-band transport channel
221	   appropriate for the purpose.)  The combination of the higher-level
222	   parameters is called a parameter set.  The H.264 specification
223	   includes two types of parameter sets: sequence parameter set and
224	   picture parameter set.  An active sequence parameter set remains
225	   unchanged throughout a coded video sequence, and an active picture
226	   parameter set remains unchanged within a coded picture.  The sequence
227	   and picture parameter set structures contain information such as
228	   picture size, optional coding modes employed, and macroblock to slice
229	   group map.

231	   To be able to change picture parameters (such as the picture size)
232	   without having to transmit parameter set updates synchronously to the
233	   slice packet stream, the encoder and decoder can maintain a list of
234	   more than one sequence and picture parameter set.  Each slice header
235	   contains a codeword that indicates the sequence and picture parameter
236	   set to be used.
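
   As an illustrative sketch only (not part of H.264 or of this payload
   format), the list of stored parameter sets can be modeled as two
   tables keyed by identifier.  All class, method, and field names below
   are hypothetical.  The sketch assumes that the codeword in the slice
   header names a picture parameter set, which in turn names the
   sequence parameter set it depends on.

```python
from typing import Dict, NamedTuple, Tuple


class Sps(NamedTuple):
    """Hypothetical stand-in for a sequence parameter set."""
    sps_id: int
    width: int   # illustrative field only
    height: int  # illustrative field only


class Pps(NamedTuple):
    """Hypothetical stand-in for a picture parameter set."""
    pps_id: int
    sps_id: int  # the sequence parameter set this picture parameter set uses


class ParameterSetStore:
    """Keeps several parameter sets so a slice can select one by identifier."""

    def __init__(self) -> None:
        self._sps: Dict[int, Sps] = {}
        self._pps: Dict[int, Pps] = {}

    def put_sps(self, sps: Sps) -> None:
        # A later set with the same identifier replaces the stored one.
        self._sps[sps.sps_id] = sps

    def put_pps(self, pps: Pps) -> None:
        self._pps[pps.pps_id] = pps

    def resolve(self, pps_id: int) -> Tuple[Sps, Pps]:
        """Look up the parameter sets selected by a slice header codeword."""
        pps = self._pps[pps_id]       # KeyError if the set never arrived
        sps = self._sps[pps.sps_id]
        return sps, pps
```

   With two picture sizes stored in advance, a sender could switch
   between them merely by changing the identifier carried in the slice
   header, without sending a parameter set update at that moment.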

238	   This mechanism allows the decoupling of the transmission of parameter
239	   sets from the packet stream, and the transmission of them by external
240	   means (e.g., as a side effect of the capability exchange), or through
241	   a (reliable or unreliable) control protocol.  It may even be possible
242	   that they are never transmitted but are fixed by an application
243	   design specification.

245	1.3. Network Abstraction Layer Unit Types

247	   Tutorial information on the NAL design can be found in [12], [13],
248	   and [14].

250	   Every NAL unit begins with a single NAL unit type octet, which also
251	   serves as the payload header of this RTP payload format.  The payload
252	   of the NAL unit follows immediately.

254	   The syntax and semantics of the NAL unit type octet are specified in
255	   [1], but the essential properties of the NAL unit type octet are
256	   summarized below.  The NAL unit type octet has the following format:

258	      +---------------+
259	      |0|1|2|3|4|5|6|7|
260	      +-+-+-+-+-+-+-+-+
261	      |F|NRI|  Type   |
262	      +---------------+

264	   The semantics of the components of the NAL unit type octet, as
265	   specified in the H.264 specification, are described briefly below.

267	   F: 1 bit
268	      forbidden_zero_bit.  The H.264 specification declares a value of
269	      1 as a syntax violation.

271	   NRI: 2 bits
272	      nal_ref_idc.  A value of 00 indicates that the content of the NAL
273	      unit is not used to reconstruct reference pictures for inter
274	      picture prediction.  Such NAL units can be discarded without
275	      risking the integrity of the reference pictures.  Values greater
276	      than 00 indicate that the decoding of the NAL unit is required to
277	      maintain the integrity of the reference pictures.

279	   Type: 5 bits
280	      nal_unit_type.  This component specifies the NAL unit payload
281	      type as defined in Table 7-1 of [1], and later within this memo.
282	      For a reference of all currently defined NAL unit types and their
283	      semantics, please refer to section 7.4.1 in [1].

285	   This memo introduces new NAL unit types, which are presented in
286	   section 5.2.  The NAL unit types defined in this memo are marked as
287	   unspecified in [1].  Moreover, this specification extends the
288	   semantics of F and NRI as described in section 5.3.
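
   The bit layout of the NAL unit type octet shown above can be
   unpacked with a few shifts and masks.  The following is a minimal
   sketch; the names NalHeader and parse_nal_header are hypothetical and
   chosen only for this illustration.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit, 1 bit
    nri: int       # nal_ref_idc, 2 bits
    nal_type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split the NAL unit type octet into its F, NRI, and Type fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("the NAL unit type octet must fit in one byte")
    return NalHeader(
        f=(octet >> 7) & 0x01,    # bit 0 in the diagram (most significant)
        nri=(octet >> 5) & 0x03,  # bits 1-2
        nal_type=octet & 0x1F,    # bits 3-7
    )
```

   For example, the octet 0x67 splits into F = 0, NRI = 3, and
   Type = 7.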

290	2. Conventions

292	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
293	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
294	   document are to be interpreted as described in RFC 2119 [4].

296	   This specification uses the notion of setting and clearing a bit when
297	   bit fields are handled.  Setting a bit is the same as assigning that
298	   bit the value of 1 (On).  Clearing a bit is the same as assigning
299	   that bit the value of 0 (Off).

301	3. Scope

303	   This payload specification can only be used to carry the "naked"
304	   H.264 NAL unit stream over RTP, and not the bitstream format
305	   discussed in Annex B of H.264.  Likely, the first applications of
306	   this specification will be in the conversational multimedia field,
307	   video telephony or video conferencing, but the payload format also
308	   covers other applications, such as Internet streaming and TV over IP.

310	4. Definitions and Abbreviations

312	4.1. Definitions

314	   This document uses the definitions of [1].  The following terms,
315	   defined in [1], are summarized here for convenience:

317	      access unit: A set of NAL units always containing a primary coded
318	      picture.  In addition to the primary coded picture, an access
319	      unit may also contain one or more redundant coded pictures or
320	      other NAL units not containing slices or slice data partitions of
321	      a coded picture.  The decoding of an access unit always results
322	      in a decoded picture.

324	      coded video sequence: A sequence of access units that consists,
325	      in decoding order, of an instantaneous decoding refresh (IDR)
326	      access unit followed by zero or more non-IDR access units
327	      including all subsequent access units up to but not including any
328	      subsequent IDR access unit.

330	      IDR access unit: An access unit in which the primary coded
331	      picture is an IDR picture.

333	      IDR picture: A coded picture containing only slices with I or SI
334	      slice types that causes a "reset" in the decoding process.  After
335	      the decoding of an IDR picture, all following coded pictures in
336	      decoding order can be decoded without inter prediction from any
337	      picture decoded prior to the IDR picture.

339	      primary coded picture: The coded representation of a picture to
340	      be used by the decoding process for a bitstream conforming to
341	      H.264.  The primary coded picture contains all macroblocks of the
342	      picture.

344	      redundant coded picture: A coded representation of a picture or a
345	      part of a picture.  The content of a redundant coded picture
346	      shall not be used by the decoding process for a bitstream
347	      conforming to H.264.  The content of a redundant coded picture
348	      may be used by the decoding process for a bitstream that contains
349	      errors or losses.

351	      VCL NAL unit: A collective term used to refer to coded slice and
352	      coded data partition NAL units.

354	   In addition, the following definitions apply:

356	      decoding order number (DON): A field in the payload structure, or
357	      a derived variable indicating NAL unit decoding order.  Values of
358	      DON are in the range of 0 to 65535, inclusive.  After reaching
359	      the maximum value, the value of DON wraps around to 0.

361	      NAL unit decoding order: A NAL unit order that conforms to the
362	      constraints on NAL unit order given in section 7.4.1.2 in [1].

364	      NALU-time: The value that the RTP timestamp would have if the NAL
365	      unit were transported in its own RTP packet.

367	      transmission order: The order of packets in ascending RTP
368	      sequence number order (in modulo arithmetic).  Within an
369	      aggregation packet, the NAL unit transmission order is the same
370	      as the order of appearance of NAL units in the packet.

372	      media aware network element (MANE): A network element, such as a
373	      middlebox or application layer gateway, that is capable of parsing
374	      certain aspects of the RTP payload headers or the RTP payload and
375	      reacting to the contents.

377	         Informative note: The concept of a MANE goes beyond normal
378	         routers or gateways in that a MANE has to be aware of the
379	         signaling (e.g., to learn about the payload type mappings of
380	         the media streams), and in that it has to be trusted when
381	         working with SRTP.  The advantage of using MANEs is that they
382	         allow packets to be dropped according to the needs of the
383	         media coding.  For example, if a MANE has to drop packets due
384	         to congestion on a certain link, it can identify those packets
385	         whose dropping has the smallest negative impact on the user
386	         experience and remove them in order to remove the congestion
387	         and/or keep the delay low.

389	      static macroblock: A certain amount of macroblocks in the video
390	      stream can be defined as static, as defined in section 8.3.2.8 in
391	      [3].  Static macroblocks free up additional processing cycles for
392	      the handling of non-static macroblocks.  Based on a given amount
393	      of video processing resources and a given resolution, a higher
394	      number of static macroblocks enables a correspondingly higher
395	      frame rate.

397	      default sub-profile: The subset of coding tools, which may be all
398	      coding tools of one profile or the common subset of coding tools
399	      of more than one profile, indicated by the profile-level-id
400	      used in a symmetric manner, i.e., the answer must either use the
401	      used in a symmetric manner, i.e. the answer must either use the
402	      same sub-profile as the offer or reject the offer.

404	      default level: The level indicated by the profile-level-id
405	      parameter.  In SDP Offer/Answer, level is downgradable, i.e., the
406	      answer may either use the default level or a lower level.
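
   The wrap-around of DON values and the modulo ordering of RTP
   sequence numbers mentioned in the definitions above can be sketched
   as follows.  The function names are hypothetical.  The half-range
   comparison in seq_precedes() is a common convention assumed here for
   illustration; it is not specified by this section.

```python
DON_MODULUS = 1 << 16  # DON values lie in the range 0..65535, inclusive
SEQ_MODULUS = 1 << 16  # RTP sequence numbers are 16 bits wide


def next_don(don: int) -> int:
    """Return the DON that follows `don`, wrapping from 65535 back to 0."""
    if not 0 <= don < DON_MODULUS:
        raise ValueError("DON out of range")
    return (don + 1) % DON_MODULUS


def seq_precedes(a: int, b: int) -> bool:
    """True if sequence number `a` comes before `b` in modulo arithmetic.

    Assumption: the two numbers are less than half the number space
    (32768) apart, so the shorter way around the circle gives the order.
    """
    return a != b and (b - a) % SEQ_MODULUS < SEQ_MODULUS // 2
```

   Under this assumption, sequence number 65535 precedes sequence
   number 0, which matches the intuition that a packet sent just before
   the wrap-around is earlier in transmission order.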

408	4.2. Abbreviations

410	      DON:        Decoding Order Number
411	      DONB:       Decoding Order Number Base
412	      DOND:       Decoding Order Number Difference
413	      FEC:        Forward Error Correction
414	      FU:         Fragmentation Unit
415	      IDR:        Instantaneous Decoding Refresh
416	      IEC:        International Electrotechnical Commission
417	      ISO:        International Organization for Standardization
418	      ITU-T:      International Telecommunication Union,
419	                  Telecommunication Standardization Sector
420	      MANE:       Media Aware Network Element
421	      MTAP:       Multi-Time Aggregation Packet
422	      MTAP16:     MTAP with 16-bit timestamp offset
423	      MTAP24:     MTAP with 24-bit timestamp offset
424	      NAL:        Network Abstraction Layer
425	      NALU:       NAL Unit
426	      SAR:        Sample Aspect Ratio
427	      SEI:        Supplemental Enhancement Information
428	      STAP:       Single-Time Aggregation Packet
429	      STAP-A:     STAP type A
430	      STAP-B:     STAP type B
431	      TS:         Timestamp
432	      VCL:        Video Coding Layer
433	      VUI:        Video Usability Information

435	5. RTP Payload Format

437	5.1. RTP Header Usage

439	   The format of the RTP header is specified in RFC 3550 [5] and
440	   reprinted in Figure 1 for convenience.  This payload format uses the
441	   fields of the header in a manner consistent with that specification.

443	   When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
444	   payload format is specified in section 5.6.  The RTP payload (and the
445	   settings for some RTP header bits) for aggregation packets and
446	   fragmentation units are specified in sections 5.7 and 5.7.3,
447	   respectively.

449	    0                   1                   2                   3
450	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
451	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
452	   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
453	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
454	   |                           timestamp                           |
455	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
456	   |           synchronization source (SSRC) identifier            |
457	   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
458	   |            contributing source (CSRC) identifiers             |
459	   |                             ....                              |
460	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

462	                 Figure 1 RTP header according to RFC 3550
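
   As a rough illustration of the layout in Figure 1, the fixed RTP
   header and the CSRC list can be unpacked as sketched below.  The
   names RtpHeader and parse_rtp_header are hypothetical.  The sketch
   does not interpret header extensions or padding and is not a
   complete RTP parser.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: int
    extension: int
    csrc_count: int
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Unpack the 12-octet fixed header and any CSRC identifiers."""
    if len(packet) < 12:
        raise ValueError("the fixed RTP header needs at least 12 octets")
    # Network byte order: two octets, a 16-bit and two 32-bit fields.
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for the CSRC list")
    csrc = struct.unpack_from("!%dI" % cc, packet, 12) if cc else ()
    return RtpHeader(
        version=b0 >> 6,
        padding=(b0 >> 5) & 0x01,
        extension=(b0 >> 4) & 0x01,
        csrc_count=cc,
        marker=b1 >> 7,
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

   The RTP payload, whose structure is the subject of this memo, starts
   after the last CSRC identifier when no header extension is present.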

464	   The RTP header fields are set according to this RTP payload format
465	   as follows:

467	   Marker bit (M): 1 bit
468	      Set for the very last packet of the access unit indicated by the
469	      RTP timestamp, in line with the normal use of the M bit in video
470	      formats, to allow efficient playout buffer handling.  For
471	      aggregation packets (STAP and MTAP), the marker bit in the RTP
472	      header MUST be set to the value that the marker bit of the last
473	      NAL unit of the aggregation packet would have been if it were
474	      transported in its own RTP packet.  Decoders MAY use this bit as
475	      an early indication of the last packet of an access unit, but
476	      MUST NOT rely on this property.

478	         Informative note: Only one M bit is associated with an
479	         aggregation packet carrying multiple NAL units.  Thus, if a
480	         gateway has re-packetized an aggregation packet into several
481	         packets, it cannot reliably set the M bit of those packets.

483	   Payload type (PT): 7 bits
484	      The assignment of an RTP payload type for this new packet format
485	      is outside the scope of this document and will not be specified
486	      here.  The assignment of a payload type has to be performed
487	      either through the profile used or in a dynamic way.

489	   Sequence number (SN): 16 bits
490	      Set and used in accordance with RFC 3550.  For the single NALU
491	      and non-interleaved packetization mode, the sequence number is
492	      used to determine decoding order for the NALU.

494	   Timestamp: 32 bits
495	      The RTP timestamp is set to the sampling timestamp of the
496	      content.  A 90 kHz clock rate MUST be used.

498	      If the NAL unit has no timing properties of its own (e.g.,
499	      parameter set and SEI NAL units), the RTP timestamp is set to the
500	      RTP timestamp of the primary coded picture of the access unit in
501	      which the NAL unit is included, according to section 7.4.1.2 of
502	      [1].

504	      The setting of the RTP Timestamp for MTAPs is defined in section
505	      5.7.2.

507	      Receivers SHOULD ignore any picture timing SEI messages included
508	      in access units that have only one display timestamp.  Instead,
509	      receivers SHOULD use the RTP timestamp for synchronizing the
510	      display process.

512	      RTP senders SHOULD NOT transmit picture timing SEI messages for
513	      pictures that are not supposed to be displayed as multiple
514	      fields.

516	      If one access unit has more than one display timestamp carried in
517	      a picture timing SEI message, then the information in the SEI
518	      message SHOULD be treated as relative to the RTP timestamp, with
519	      the earliest event occurring at the time given by the RTP
520	      timestamp, and subsequent events later, as given by the
521	      difference in SEI message picture timing values.  Let tSEI1,
522	      tSEI2, ..., tSEIn be the display timestamps carried in the SEI
523	      message of an access unit, where tSEI1 is the earliest of all
524	      such timestamps.  Let tmadjst() be a function that converts a
525	      value on the SEI message time scale to the 90 kHz time scale.  Let
526	      TS be the RTP timestamp.  Then, the display time for the event
527	      associated with tSEI1 is TS.  The display time for the event with
528	      tSEIx, where x is in [2..n], is TS + tmadjst(tSEIx - tSEI1).
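
      The rule above can be sketched as follows.  This is a minimal
      illustration, not part of the payload format: the parameter
      sei_time_scale_hz and the function display_times() are
      hypothetical names, and the SEI timestamps are assumed to be
      plain integers that do not wrap.

```python
RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def tmadjst(sei_ticks, sei_time_scale_hz):
    """Convert a tick count on the SEI time scale to 90 kHz ticks."""
    return round(sei_ticks * RTP_CLOCK_HZ / sei_time_scale_hz)


def display_times(ts, sei_timestamps, sei_time_scale_hz):
    """Return the display time of each SEI event on the RTP clock.

    ts is the RTP timestamp of the access unit.  The earliest SEI
    timestamp (tSEI1) maps to ts; later ones are offset from it.
    """
    t_sei_1 = min(sei_timestamps)
    return [
        (ts + tmadjst(t - t_sei_1, sei_time_scale_hz)) % 2**32  # 32-bit field
        for t in sorted(sei_timestamps)
    ]
```

      For example, with a 30 kHz SEI time scale, two SEI timestamps
      that are 500 ticks apart yield display times that are 1500 RTP
      ticks apart.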

530	         Informative note: Displaying coded frames as fields is commonly
531	         needed in an operation known as 3:2 pulldown, in which film
532	         content that consists of coded frames is displayed on a
533	         display using interlaced scanning.  The picture timing SEI
534	         message enables carriage of multiple timestamps for the same
535	         coded picture, and therefore the 3:2 pulldown process is
536	         perfectly controlled.  The picture timing SEI message
537	         mechanism is necessary because only one timestamp per coded
538	         frame can be conveyed in the RTP timestamp.

540	         Informative note: Because H.264 allows the decoding order to
541	         be different from the display order, values of RTP timestamps
542	         may not be monotonically non-decreasing as a function of RTP
543	         sequence numbers.  Furthermore, the value for inter-arrival
544	         jitter reported in the RTCP reports may not be a trustworthy
545	         indication of the network performance, as the calculation
546	         rules for inter-arrival jitter (section 6.4.1 of RFC 3550)
547	         assume that the RTP timestamp of a packet is directly
548	         proportional to its transmission time.

550	5.2. Payload Structures

552	   The payload format defines three different basic payload structures.
553	   A receiver can identify the payload structure by the first byte of
554	   the RTP packet payload, which co-serves as the RTP payload header
555	   and, in some cases, as the first byte of the payload.  This byte is
556	   always structured as a NAL unit header.  The NAL unit type field
557	   indicates which structure is present.  The possible structures are as
558	   follows:

560	   Single NAL Unit Packet: Contains only a single NAL unit in the
561	   payload.  The NAL header type field will be equal to the original NAL
562	   unit type; i.e., in the range of 1 to 23, inclusive.  Specified in
563	   section 5.6.

565	   Aggregation Packet: Packet type used to aggregate multiple NAL units
566	   into a single RTP payload.  This packet exists in four versions, the
567	   Single-Time Aggregation Packet type A (STAP-A), the Single-Time
568	   Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet
569	   (MTAP) with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet
570	   (MTAP) with 24-bit offset (MTAP24).  The NAL unit type numbers
571	   assigned for STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and
572	   27, respectively.  Specified in section 5.7.

574	   Fragmentation Unit: Used to fragment a single NAL unit over multiple
575	   RTP packets.  Exists in two versions, FU-A and FU-B, identified
576	   with the NAL unit type numbers 28 and 29, respectively.  Specified in
577	   section 5.8.

579	      Informative note: This specification does not limit the size of
580	      NAL units encapsulated in single NAL unit packets and
581	      fragmentation units.  The maximum size of a NAL unit encapsulated
582	      in any aggregation packet is 65535 bytes.

584	   Table 1 summarizes the NAL unit types, the corresponding RTP packet
585	   types when a NAL unit of that type is used directly as a packet
586	   payload, and the sections of this memo that describe each type.

588	     Table 1.  Summary of NAL unit types and the corresponding packet
589	                                   types

591	      NAL Unit  Packet    Packet Type Name               Section
592	      Type      Type
593	      ---------------------------------------------------------
594	      0        reserved                                     -
595	      1-23     NAL unit  Single NAL unit packet             5.6
596	      24       STAP-A    Single-time aggregation packet     5.7.1
597	      25       STAP-B    Single-time aggregation packet     5.7.1
598	      26       MTAP16    Multi-time aggregation packet      5.7.2
599	      27       MTAP24    Multi-time aggregation packet      5.7.2
600	      28       FU-A      Fragmentation unit                 5.8
601	      29       FU-B      Fragmentation unit                 5.8
602	      30-31    reserved                                     -
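
   As an illustration of Table 1, a receiver could select the payload
   structure from the first payload byte roughly as sketched below.  The
   function name payload_structure() is hypothetical; only the type
   ranges are taken from the table.

```python
def payload_structure(payload: bytes) -> str:
    """Name the payload structure selected by the first payload byte."""
    if not payload:
        raise ValueError("empty RTP payload")
    nal_unit_type = payload[0] & 0x1F  # low five bits of the first byte
    if 1 <= nal_unit_type <= 23:
        return "single NAL unit packet"
    named = {
        24: "STAP-A", 25: "STAP-B",
        26: "MTAP16", 27: "MTAP24",
        28: "FU-A", 29: "FU-B",
    }
    # Types 0, 30, and 31 are reserved in Table 1.
    return named.get(nal_unit_type, "reserved")
```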

604	5.3. NAL Unit Header Usage

606	   The structure and semantics of the NAL unit header were introduced in
607	   section 1.3.  For convenience, the format of the NAL unit header is
608	   reprinted below:

610	      +---------------+
611	      |0|1|2|3|4|5|6|7|
612	      +-+-+-+-+-+-+-+-+
613	      |F|NRI|  Type   |
614	      +---------------+

616	   This section specifies the semantics of F and NRI according to this
617	   specification.
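
   A minimal sketch of splitting and assembling the header octet shown
   above, assuming the usual convention that bit 0 in the figure is the
   most significant bit.  The names NalHeader, parse_nal_header(), and
   build_nal_header() are illustrative only.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int     # forbidden_zero_bit (1 bit)
    nri: int   # nal_ref_idc (2 bits)
    type: int  # NAL unit type (5 bits)


def parse_nal_header(octet: int) -> NalHeader:
    """Split the NAL unit header octet into F, NRI, and Type."""
    return NalHeader(f=(octet >> 7) & 0x1,
                     nri=(octet >> 5) & 0x3,
                     type=octet & 0x1F)


def build_nal_header(f: int, nri: int, nal_type: int) -> int:
    """Assemble a NAL unit header octet from its three fields."""
    if f not in (0, 1) or not 0 <= nri <= 3 or not 0 <= nal_type <= 31:
        raise ValueError("field value out of range")
    return (f << 7) | (nri << 5) | nal_type
```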

619	   F: 1 bit
620	      forbidden_zero_bit.  A value of 0 indicates that the NAL unit
621	      type octet and payload should not contain bit errors or other
622	      syntax violations.  A value of 1 indicates that the NAL unit type
623	      octet and payload may contain bit errors or other syntax
624	      violations.

626	      MANEs SHOULD set the F bit to indicate detected bit errors in the
627	      NAL unit.  The H.264 specification requires that the F bit is
628	      equal to 0.  When the F bit is set, the decoder is advised that
629	      bit errors or any other syntax violations may be present in the
630	      payload or in the NAL unit type octet.  The simplest decoder
631	      reaction to a NAL unit in which the F bit is equal to 1 is to
632	      discard such a NAL unit and to conceal the lost data in the
633	      discarded NAL unit.

635	   NRI: 2 bits
636	      nal_ref_idc.  The semantics of value 00 and a non-zero value
637	      remain unchanged from the H.264 specification.  In other words, a
638	      value of 00 indicates that the content of the NAL unit is not
639	      used to reconstruct reference pictures for inter picture
640	      prediction. Such NAL units can be discarded without risking the
641	      integrity of the reference pictures.  Values greater than 00
642	      indicate that the decoding of the NAL unit is required to
643	      maintain the integrity of the reference pictures.

645	      In addition to the specification above, according to this RTP
646	      payload specification, values of NRI indicate the relative
647	      transport priority, as determined by the encoder.  MANEs can use
648	      this information to protect more important NAL units better than
649	      they do less important NAL units.  The highest transport priority
650	      is 11, followed by 10, and then by 01; finally, 00 is the lowest.

652	         Informative note: Any non-zero value of NRI is handled
653	         identically in H.264 decoders.  Therefore, receivers need not
654	         manipulate the value of NRI when passing NAL units to the
655	         decoder.

657	      An H.264 encoder MUST set the value of NRI according to the H.264
658	      specification (subclause 7.4.1) when the value of nal_unit_type
659	      is in the range of 1 to 12, inclusive.  In particular, the H.264
660	      specification requires that the value of NRI SHALL be equal to 0
661	      for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or
662	      12.

664	      For NAL units having nal_unit_type equal to 7 or 8 (indicating a
665	      sequence parameter set or a picture parameter set, respectively),
666	      an H.264 encoder SHOULD set the value of NRI to 11 (in binary
667	      format).  For coded slice NAL units of a primary coded picture
668	      having nal_unit_type equal to 5 (indicating a coded slice
669	      belonging to an IDR picture), an H.264 encoder SHOULD set the
670	      value of NRI to 11 (in binary format).

672	      For a mapping of the remaining nal_unit_types to NRI values, the
673	      following example MAY be used and has been shown to be efficient
674	      in a certain environment [13].  Other mappings MAY also be
675	      desirable, depending on the application and the H.264/AVC Annex A
676	      profile in use.

678	         Informative note: Data Partitioning is not available in
679	         certain profiles; e.g., in the Main or Baseline profiles.
680	         Consequently, the NAL unit types 2, 3, and 4 can occur only if
681	         the video bitstream conforms to a profile in which data
682	         partitioning is allowed and not in streams that conform to the
683	         Main or Baseline profiles.

685	   Table 2.  Example of NRI values for coded slices and coded slice data
686	              partitions of primary coded reference pictures

688	      NAL Unit Type     Content of NAL unit              NRI (binary)
689	      ----------------------------------------------------------------
690	       1              non-IDR coded slice                         10
691	       2              Coded slice data partition A                10
692	       3              Coded slice data partition B                01
693	       4              Coded slice data partition C                01

695	         Informative note: As mentioned before, the NRI value of non-
696	         reference pictures is 00 as mandated by H.264/AVC.

698	      An H.264 encoder SHOULD set the value of NRI for coded slice and
699	      coded slice data partition NAL units of redundant coded reference
700	      pictures equal to 01 (in binary format).

702	      Definitions of the values for NRI for NAL unit types 24 to 29,
703	      inclusive, are given in sections 5.7 and 5.8 of this memo.

705	      No recommendation for the value of NRI is given for NAL units
706	      having nal_unit_type in the range of 13 to 23, inclusive, because
707	      these values are reserved for ITU-T and ISO/IEC.  No
708	      recommendation for the value of NRI is given for NAL units having
709	      nal_unit_type equal to 0 or in the range of 30 to 31, inclusive,
710	      as the semantics of these values are not specified in this memo.
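
   The NRI guidance of this section, including the example mapping of
   Table 2, can be summarized in a small lookup.  This is a sketch
   only: the function suggested_nri() and its flags is_reference and
   is_redundant are hypothetical, and the values for types 1 to 4 are
   the example values of Table 2, not requirements.

```python
# Example values for primary coded reference pictures (Table 2), plus
# the recommended value 3 (binary 11) for IDR slices (type 5).
_PRIMARY_REFERENCE_NRI = {1: 2, 2: 2, 3: 1, 4: 1, 5: 3}


def suggested_nri(nal_unit_type, is_reference=True, is_redundant=False):
    """Return an NRI value for an encoder, or None if none is given."""
    if nal_unit_type in (6, 9, 10, 11, 12):
        return 0  # required by the H.264 specification
    if nal_unit_type in (7, 8):
        return 3  # sequence and picture parameter sets
    if 1 <= nal_unit_type <= 5:
        if nal_unit_type != 5 and not is_reference:
            return 0  # non-reference pictures
        if is_redundant:
            return 1  # redundant coded reference pictures
        return _PRIMARY_REFERENCE_NRI[nal_unit_type]
    return None  # types 0 and 13-31: no recommendation in this section
```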

712	5.4. Packetization Modes

714	   This memo specifies three packetization modes:

716	   o  Single NAL unit mode

718	   o  Non-interleaved mode

720	   o  Interleaved mode

722	   The single NAL unit mode is targeted for conversational systems that
723	   comply with ITU-T Recommendation H.241 [3]  (see section 12.1).  The
724	   non-interleaved mode is targeted for conversational systems that may
725	   not comply with ITU-T Recommendation H.241.  In the non-interleaved
726	   mode, NAL units are transmitted in NAL unit decoding order.  The
727	   interleaved mode is targeted for systems that do not require very low
728	   end-to-end latency.  The interleaved mode allows transmission of NAL
729	   units out of NAL unit decoding order.

731	   The packetization mode in use MAY be signaled by the value of the
732	   OPTIONAL packetization-mode media type parameter.  The used
733	   packetization mode governs which NAL unit types are allowed in RTP
734	   payloads.  Table 3 summarizes the allowed packet payload types for
735	   each packetization mode.  Packetization modes are explained in more
736	   detail in section 6.

738	    Table 3.  Summary of allowed NAL unit types for each packetization
739	            mode (yes = allowed, no = disallowed, ig = ignore)

741	      Payload Packet    Single NAL    Non-Interleaved    Interleaved
742	      Type    Type      Unit Mode           Mode             Mode
743	      -------------------------------------------------------------
744	      0      reserved      ig               ig               ig
745	      1-23   NAL unit     yes              yes               no
746	      24     STAP-A        no              yes               no
747	      25     STAP-B        no               no              yes
748	      26     MTAP16        no               no              yes
749	      27     MTAP24        no               no              yes
750	      28     FU-A          no              yes              yes
751	      29     FU-B          no               no              yes
752	      30-31  reserved      ig               ig               ig
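
   Table 3 translates directly into a lookup.  The sketch below is
   illustrative; the names Mode and payload_type_status() are
   assumptions, and the returned strings mirror the table entries.

```python
from enum import Enum


class Mode(Enum):
    SINGLE_NAL_UNIT = "single NAL unit mode"
    NON_INTERLEAVED = "non-interleaved mode"
    INTERLEAVED = "interleaved mode"


# Payload types marked "yes" in Table 3, per packetization mode.
_ALLOWED = {
    Mode.SINGLE_NAL_UNIT: set(range(1, 24)),
    Mode.NON_INTERLEAVED: set(range(1, 24)) | {24, 28},
    Mode.INTERLEAVED: {25, 26, 27, 28, 29},
}


def payload_type_status(mode: Mode, nal_unit_type: int) -> str:
    """Return "yes", "no", or "ig" for a payload type in a given mode."""
    if not 0 <= nal_unit_type <= 31:
        raise ValueError("NAL unit type must be in 0..31")
    if nal_unit_type in (0, 30, 31):
        return "ig"  # reserved values are ignored by receivers
    return "yes" if nal_unit_type in _ALLOWED[mode] else "no"
```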

754	   Some NAL unit or payload type values (indicated as reserved in
755	   Table 3) are reserved for future extensions.  NAL units of those
756	   types SHOULD NOT be sent by a sender (direct as packet payloads, or
757	   as aggregation units in aggregation packets, or as fragmented units
758	   in FU packets) and MUST be ignored by a receiver.  For example, the
759	   payload types 1-23, with the associated packet type "NAL unit", are
760	   allowed in "Single NAL Unit Mode" and in "Non-Interleaved Mode", but
761	   disallowed in "Interleaved Mode".  However, NAL units of NAL unit
762	   types 1-23 can be used in "Interleaved Mode" as aggregation units in
763	   STAP-B, MTAP16 and MTAP24 packets as well as fragmented units in FU-A
764	   and FU-B packets.  Similarly, NAL units of NAL unit types 1-23 can
765	   also be used in the "Non-Interleaved Mode" as aggregation units in
766	   STAP-A packets or fragmented units in FU-A packets, in addition to
767	   being directly used as packet payloads.

769	5.5. Decoding Order Number (DON)

771	   In the interleaved packetization mode, the transmission order of NAL
772	   units is allowed to differ from the decoding order of the NAL units.
773	   The decoding order number (DON) is a field in the payload structure
774	   or a derived variable that indicates the NAL unit decoding order.

776	   Rationale and examples of use cases for transmission out of decoding
777	   order and for the use of DON are given in section 13.

779	   The coupling of transmission and decoding order is controlled by the
780	   OPTIONAL sprop-interleaving-depth media type parameter as follows.
781	   When the value of the OPTIONAL sprop-interleaving-depth media type
782	   parameter is equal to 0 (explicitly or per default), the transmission
783	   order of NAL units MUST conform to the NAL unit decoding order.  When
784	   the value of the OPTIONAL sprop-interleaving-depth media type
785	   parameter is greater than 0,

787	   o  the order of NAL units in an MTAP16 and an MTAP24 is NOT REQUIRED
788	      to be the NAL unit decoding order, and

790	   o  the order of NAL units generated by de-packetizing STAP-Bs, MTAPs,
791	      and FUs in two consecutive packets is NOT REQUIRED to be the NAL
792	      unit decoding order.

794	   The RTP payload structures for a single NAL unit packet, an STAP-A,
795	   and an FU-A do not include DON.  STAP-B and FU-B structures include
796	   DON, and the structure of MTAPs enables derivation of DON as
797	   specified in section 5.7.2.

799	      Informative note: When an FU-A occurs in interleaved mode, it
800	      always follows an FU-B, which sets its DON.

802	      Informative note: If a transmitter wants to encapsulate a single
803	      NAL unit per packet and transmit packets out of their decoding
804	      order, the STAP-B packet type can be used.

806	   In the single NAL unit packetization mode, the transmission order of
807	   NAL units, determined by the RTP sequence number, MUST be the same as
808	   their NAL unit decoding order.  In the non-interleaved packetization
809	   mode, the transmission order of NAL units in single NAL unit packets,
810	   STAP-As, and FU-As MUST be the same as their NAL unit decoding order.
811	   The NAL units within an STAP MUST appear in the NAL unit decoding
812	   order.  Thus, the decoding order is first provided through the
813	   implicit order within a STAP, and second provided through the RTP
814	   sequence number for the order between STAPs, FUs, and single NAL unit
815	   packets.

817	   Signaling of the value of DON for NAL units carried in STAP-B, MTAP,
818	   and a series of fragmentation units starting with an FU-B is
819	   specified in sections 5.7.1, 5.7.2, and 5.8, respectively.  The DON
820	   value of the first NAL unit in transmission order MAY be set to any
821	   value.  Values of DON are in the range of 0 to 65535, inclusive.
822	   After reaching the maximum value, the value of DON wraps around to 0.

824	   The decoding order of two NAL units contained in any STAP-B, MTAP, or
825	   a series of fragmentation units starting with an FU-B is determined
826	   as follows.  Let DON(i) be the decoding order number of the NAL unit
827	   having index i in the transmission order.  Function don_diff(m,n) is
828	   specified as follows:

830	         If DON(m) == DON(n), don_diff(m,n) = 0

832	         If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
833	         don_diff(m,n) = DON(n) - DON(m)

835	         If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
836	         don_diff(m,n) = 65536 - DON(m) + DON(n)

838	         If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
839	         don_diff(m,n) = - (DON(m) + 65536 - DON(n))

841	         If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
842	         don_diff(m,n) = - (DON(m) - DON(n))

844	   A positive value of don_diff(m,n) indicates that the NAL unit having
845	   transmission order index n follows, in decoding order, the NAL unit
846	   having transmission order index m.  When don_diff(m,n) is equal to 0,
847	   then the NAL unit decoding order of the two NAL units can be in
848	   either order.  A negative value of don_diff(m,n) indicates that the
849	   NAL unit having transmission order index n precedes, in decoding
850	   order, the NAL unit having transmission order index m.
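
   The five cases of don_diff(m,n) can be transcribed literally, as in
   the sketch below.  For simplicity, the function here takes the two
   DON values directly rather than transmission order indices; that
   signature is a choice made for this example.

```python
def don_diff(don_m: int, don_n: int) -> int:
    """Signed decoding order distance from DON(m) to DON(n)."""
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n          # n follows m across the wrap
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)       # n precedes m across the wrap
    return -(don_m - don_n)                   # don_m > don_n, no wrap
```

   For instance, don_diff(65535, 1) is 2, so a NAL unit with DON 1
   follows a NAL unit with DON 65535 in decoding order.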

852	   Values of DON related fields (DON, DONB, and DOND; see section 5.7)
853	   MUST be such that the decoding order determined by the values of DON,
854	   as specified above, conforms to the NAL unit decoding order.  If the
855	   order of two NAL units in NAL unit decoding order is switched and the
856	   new order does not conform to the NAL unit decoding order, the NAL
857	   units MUST NOT have the same value of DON.  If the order of two
858	   consecutive NAL units in the NAL unit stream is switched and the new
859	   order still conforms to the NAL unit decoding order, the NAL units
860	   MAY have the same value of DON.  For example, when arbitrary slice
861	   order is allowed by the video coding profile in use, all the coded
862	   slice NAL units of a coded picture are allowed to have the same value
863	   of DON.  Consequently, NAL units having the same value of DON can be
864	   decoded in any order, and two NAL units having a different value of
865	   DON should be passed to the decoder in the order specified above.
866	   When two consecutive NAL units in the NAL unit decoding order have a
867	   different value of DON, the value of DON for the second NAL unit in
868	   decoding order SHOULD be the value of DON for the first, incremented
869	   by one.

871	   An example of the de-packetization process to recover the NAL unit
872	   decoding order is given in section 7.

874	      Informative note: Receivers should not expect that the absolute
875	      difference of values of DON for two consecutive NAL units in the
876	      NAL unit decoding order will be equal to one, even in error-free
877	      transmission.  An increment by one is not required, as at the
878	      time of associating values of DON to NAL units, it may not be
879	      known whether all NAL units are delivered to the receiver.  For
880	      example, a gateway may not forward coded slice NAL units of non-
881	      reference pictures or SEI NAL units when there is a shortage of
882	      bit rate in the network to which the packets are forwarded.  In
883	      another example, a live broadcast is interrupted by pre-encoded
884	      content, such as commercials, from time to time.  The first intra
885	      picture of a pre-encoded clip is transmitted in advance to ensure
886	      that it is readily available in the receiver.  When transmitting
887	      the first intra picture, the originator does not exactly know how
888	      many NAL units will be encoded before the first intra picture of
889	      the pre-encoded clip follows in decoding order.  Thus, the values
890	      of DON for the NAL units of the first intra picture of the pre-
891	      encoded clip have to be estimated when they are transmitted, and
892	      gaps in values of DON may occur.

894	5.6. Single NAL Unit Packet

896	   The single NAL unit packet defined here MUST contain only one NAL
897	   unit, of the types defined in [1].  This means that neither an
898	   aggregation packet nor a fragmentation unit can be used within a
899	   single NAL unit packet.  A NAL unit stream composed by de-packetizing
900	   single NAL unit packets in RTP sequence number order MUST conform to
901	   the NAL unit decoding order.  The structure of the single NAL unit
902	   packet is shown in Figure 2.

904	      Informative note: The first byte of a NAL unit co-serves as the
905	      RTP payload header.

907	    0                   1                   2                   3
908	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
909	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
910	   |F|NRI|  Type   |                                               |
911	   +-+-+-+-+-+-+-+-+                                               |
912	   |                                                               |
913	   |               Bytes 2..n of a Single NAL unit                 |
914	   |                                                               |
915	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
916	   |                               :...OPTIONAL RTP padding        |
917	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

919	          Figure 2 RTP payload format for single NAL unit packet

921	5.7. Aggregation Packets

923	   Aggregation packets are the NAL unit aggregation scheme of this
924	   payload specification.  The scheme is introduced to reflect the
925	   dramatically different MTU sizes of two key target networks: wireline
926	   IP networks (with an MTU size that is often limited by the Ethernet
927	   MTU size; roughly 1500 bytes), and IP or non-IP (e.g., ITU-T H.324/M)
928	   based wireless communication systems with preferred transmission unit
929	   sizes of 254 bytes or less.  To prevent media transcoding between the
930	   two worlds, and to avoid undesirable packetization overhead, a NAL
931	   unit aggregation scheme is introduced.

933	   Two types of aggregation packets are defined by this specification:

935	   o  Single-time aggregation packet (STAP): aggregates NAL units with
936	      identical NALU-time.  Two types of STAPs are defined, one without
937	      DON (STAP-A) and another including DON (STAP-B).

939	   o  Multi-time aggregation packet (MTAP): aggregates NAL units with
940	      potentially differing NALU-time.  Two different MTAPs are defined,
941	      differing in the length of the NAL unit timestamp offset.

943	   Each NAL unit to be carried in an aggregation packet is encapsulated
944	   in an aggregation unit.  Please see below for the four different
945	   aggregation units and their characteristics.

947	   The structure of the RTP payload format for aggregation packets is
948	   presented in Figure 3.

950	    0                   1                   2                   3
951	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
952	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
953	   |F|NRI|  Type   |                                               |
954	   +-+-+-+-+-+-+-+-+                                               |
955	   |                                                               |
956	   |             one or more aggregation units                     |
957	   |                                                               |
958	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
959	   |                               :...OPTIONAL RTP padding        |
960	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

962	            Figure 3 RTP payload format for aggregation packets

964	   MTAPs and STAPs share the following packetization rules:  The RTP
965	   timestamp MUST be set to the earliest of the NALU-times of all the
966	   NAL units to be aggregated.  The type field of the NAL unit type
967	   octet MUST be set to the appropriate value, as indicated in Table 4.
968	   The F bit MUST be cleared if all F bits of the aggregated NAL units
969	   are zero; otherwise, it MUST be set.  The value of NRI MUST be the
970	   maximum of all the NAL units carried in the aggregation packet.

972	                 Table 4.  Type field for STAPs and MTAPs

974	      Type   Packet    Timestamp offset   DON related fields
975	                       field length       (DON, DONB, DOND)
976	                       (in bits)          present
977	      --------------------------------------------------------
978	      24     STAP-A       0                 no
979	      25     STAP-B       0                 yes
980	      26     MTAP16      16                 yes
981	      27     MTAP24      24                 yes
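
   The F and NRI rules for the aggregation packet's own header octet
   can be sketched as follows.  The function aggregation_header() is a
   hypothetical helper; it assumes that each element of nal_units is a
   complete NAL unit that begins with its header octet.

```python
def aggregation_header(packet_type: int, nal_units) -> int:
    """Compute the first payload octet of an STAP or MTAP."""
    if packet_type not in (24, 25, 26, 27):
        raise ValueError("not an aggregation packet type (see Table 4)")
    if not nal_units:
        raise ValueError("at least one NAL unit is required")
    f = 0
    nri = 0
    for nalu in nal_units:
        header = nalu[0]
        f |= header >> 7                     # set if any F bit is set
        nri = max(nri, (header >> 5) & 0x3)  # maximum NRI of all units
    return (f << 7) | (nri << 5) | packet_type
```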

983	   The marker bit in the RTP header is set to the value that the marker
984	   bit of the last NAL unit of the aggregation packet would have if it
985	   were transported in its own RTP packet.

987	   The payload of an aggregation packet consists of one or more
988	   aggregation units.  See sections 5.7.1 and 5.7.2 for the four
989	   different types of aggregation units.  An aggregation packet can
990	   carry as many aggregation units as necessary; however, the total
991	   amount of data in an aggregation packet MUST fit into an IP
992	   packet, and the size SHOULD be chosen so that the resulting IP packet
993	   is smaller than the MTU size.  An aggregation packet MUST NOT contain
994	   fragmentation units specified in section 5.8.  Aggregation packets
995	   MUST NOT be nested; i.e., an aggregation packet MUST NOT contain
996	   another aggregation packet.
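
   A sender might check these constraints before building an STAP-A, as
   in the sketch below.  The function stap_a_payload_size() and the
   parameter max_payload_size (the payload budget left after the
   lower-layer and RTP headers) are assumptions made for this example.

```python
def stap_a_payload_size(nal_units, max_payload_size: int) -> int:
    """Validate NAL units for one STAP-A and return its payload size."""
    if not nal_units:
        raise ValueError("at least one aggregation unit is required")
    size = 1  # the STAP-A payload header octet
    for nalu in nal_units:
        if not nalu:
            raise ValueError("empty NAL unit")
        nal_unit_type = nalu[0] & 0x1F
        if 24 <= nal_unit_type <= 29:
            # Aggregation packets must not nest or carry FUs.
            raise ValueError("aggregation packets and FUs cannot be aggregated")
        if len(nalu) > 65535:
            raise ValueError("NAL unit too large for a 16-bit size field")
        size += 2 + len(nalu)  # 16-bit size field plus the NAL unit
    if size > max_payload_size:
        raise ValueError("aggregation packet exceeds the payload budget")
    return size
```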

998	5.7.1. Single-Time Aggregation Packet

1000	   A single-time aggregation packet (STAP) SHOULD be used whenever all
1001	   aggregated NAL units share the same NALU-time.  The payload
1002	   of an STAP-A does not include DON and consists of at least one
1003	   single-time aggregation unit, as presented in Figure 4.  The payload
1004	   of an STAP-B consists of a 16-bit unsigned decoding order number
1005	   (DON) (in network byte order) followed by at least one single-time
1006	   aggregation unit, as presented in Figure 5.

1008	    0                   1                   2                   3
1009	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1010	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1011	                   :                                               |
1012	   +-+-+-+-+-+-+-+-+                                               |
1013	   |                                                               |
1014	   |                single-time aggregation units                  |
1015	   |                                                               |
1016	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1017	   |                               :
1018	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1020	                    Figure 4 Payload format for STAP-A

1022	    0                   1                   2                   3
1023	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1024	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1025	                   :  decoding order number (DON)  |               |
1026	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1027	   |                                                               |
1028	   |                single-time aggregation units                  |
1029	   |                                                               |
1030	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1031	   |                               :
1032	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1034	                    Figure 5 Payload format for STAP-B

1036	   The DON field specifies the value of DON for the first NAL unit in an
1037	   STAP-B in transmission order.  For each successive NAL unit in
1038	   appearance order in an STAP-B, the value of DON is equal to (the
1039	   value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in
1040	   which '%' stands for the modulo operation.
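
   As a small illustration, the DON of every NAL unit in an STAP-B
   follows from the DON field and the unit's position.  The helper
   names below are hypothetical; the payload is assumed to start with
   the STAP-B header octet, followed by the 16-bit DON field.

```python
import struct


def stap_b_first_don(payload: bytes) -> int:
    """Read the DON field that follows the STAP-B header octet."""
    if len(payload) < 3 or payload[0] & 0x1F != 25:
        raise ValueError("not an STAP-B payload")
    (don,) = struct.unpack_from("!H", payload, 1)  # network byte order
    return don


def stap_b_dons(first_don: int, nal_unit_count: int):
    """DON of each NAL unit in an STAP-B, in appearance order."""
    return [(first_don + i) % 65536 for i in range(nal_unit_count)]
```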

1042	   A single-time aggregation unit consists of 16-bit unsigned size
1043	   information (in network byte order) that indicates the size of the
1044	   following NAL unit in bytes (excluding these two octets, but
1045	   including the NAL unit type octet of the NAL unit), followed by the
1046	   NAL unit itself, including its NAL unit type byte.  A single-time
1047	   aggregation unit is byte aligned within the RTP payload, but it may
1048	   not be aligned on a 32-bit word boundary.  Figure 6 presents the
1049	   structure of the single-time aggregation unit.

1051	    0                   1                   2                   3
1052	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1053	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1054	                   :        NAL unit size          |               |
1055	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1056	   |                                                               |
1057	   |                           NAL unit                            |
1058	   |                                                               |
1059	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1060	   |                               :
1061	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1063	            Figure 6 Structure for single-time aggregation unit
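
   The single-time aggregation unit layout can be written and read as
   sketched below.  The function names are illustrative.  The sketch
   assumes that any RTP padding has already been removed from the data,
   and it treats a zero size as an error because a NAL unit includes at
   least its header octet; that treatment is a choice of this example.

```python
import struct


def pack_aggregation_unit(nalu: bytes) -> bytes:
    """Prefix a NAL unit with its 16-bit size in network byte order."""
    if not 1 <= len(nalu) <= 0xFFFF:
        raise ValueError("NAL unit size must be in 1..65535")
    return struct.pack("!H", len(nalu)) + nalu


def unpack_aggregation_units(data: bytes):
    """Split concatenated single-time aggregation units into NAL units."""
    units = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if size == 0 or offset + size > len(data):
            raise ValueError("invalid NAL unit size")
        units.append(data[offset:offset + size])
        offset += size
    return units
```

   For an STAP-A, the data passed to unpack_aggregation_units() would
   be the payload after its first octet; for an STAP-B, the payload
   after the first three octets.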

1065	   Figure 7 presents an example of an RTP packet that contains an
1066	   STAP-A.  The STAP contains two single-time aggregation units, labeled
1067	   as 1 and 2 in the figure.

1069	    0                   1                   2                   3
1070	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1071	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1072	   |                          RTP Header                           |
1073	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1074	   |STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
1075	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1076	   |                         NALU 1 Data                           |
1077	   :                                                               :
1078	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1079	   |               | NALU 2 Size                   | NALU 2 HDR    |
1080	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1081	   |                         NALU 2 Data                           |
1082	   :                                                               :
1083	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1084	   |                               :...OPTIONAL RTP padding        |
1085	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1087	    Figure 7 An example of an RTP packet including an STAP-A containing
1088	                     two single-time aggregation units

1090	   Figure 8 presents an example of an RTP packet that contains an STAP-
1091	   B.  The STAP contains two single-time aggregation units, labeled as 1
1092	   and 2 in the figure.

1094	    0                   1                   2                   3
1095	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1096	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1097	   |                          RTP Header                           |
1098	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1099	   |STAP-B NAL HDR | DON                           | NALU 1 Size   |
1100	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1101	   | NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
1102	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1103	   :                                                               :
1104	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1105	   |               | NALU 2 Size                   | NALU 2 HDR    |
1106	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1107	   |                       NALU 2 Data                             |
1108	   :                                                               :
1109	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1110	   |                               :...OPTIONAL RTP padding        |
1111	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1113	    Figure 8 An example of an RTP packet including an STAP-B containing
1114	                     two single-time aggregation units
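
	   Informative example: as a minimal sketch, and not a normative
	   definition, the STAP layouts of Figures 7 and 8 could be parsed as
	   shown below.  The function name split_stap is hypothetical.  The
	   type values 24 (STAP-A) and 25 (STAP-B) are taken from the NAL unit
	   type table elsewhere in this memo, and the payload is assumed to
	   have had any RTP padding removed already.

```python
import struct

# Assumed type values from the NAL unit type table of this memo.
STAP_A, STAP_B = 24, 25


def split_stap(payload: bytes):
    """Return (don, [nal_unit, ...]) for an STAP-A or STAP-B payload.

    don is None for an STAP-A, which carries no DON field.
    """
    nal_type = payload[0] & 0x1F
    if nal_type not in (STAP_A, STAP_B):
        raise ValueError("not an STAP payload")
    pos, don = 1, None
    if nal_type == STAP_B:
        # 16-bit DON in network byte order follows the STAP-B NAL header.
        if len(payload) < 3:
            raise ValueError("truncated DON field")
        (don,) = struct.unpack_from("!H", payload, pos)
        pos += 2
    units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if pos + size > len(payload):
            raise ValueError("truncated NAL unit")
        # Each unit is byte aligned; no 32-bit alignment is assumed.
        units.append(payload[pos:pos + size])
        pos += size
    return don, units
```

	   The parser advances octet by octet and never rounds positions up
	   to a word boundary, since aggregation units are only byte aligned.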

1116	5.7.2. Multi-Time Aggregation Packets (MTAPs)

1118	   The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding
1119	   order number base (DONB) (in network byte order) and one or more
1120	   multi-time aggregation units, as presented in Figure 9.  DONB MUST
1121	   contain the value of DON for the first NAL unit in the NAL unit
1122	   decoding order among the NAL units of the MTAP.

1124	      Informative note: The first NAL unit in the NAL unit decoding
1125	      order is not necessarily the first NAL unit in the order in which
1126	      the NAL units are encapsulated in an MTAP.

1128	    0                   1                   2                   3
1129	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1130	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1131	                   :  decoding order number base   |               |
1132	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1133	   |                                                               |
1134	   |                 multi-time aggregation units                  |
1135	   |                                                               |
1136	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1137	   |                               :
1138	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1140	                Figure 9 NAL unit payload format for MTAPs

1142	   Two different multi-time aggregation units are defined in this
1143	   specification.  Both of them consist of 16-bit unsigned size
1144	   information of the following NAL unit (in network byte order), an 8-
1145	   bit unsigned decoding order number difference (DOND), and n bits (in
1146	   network byte order) of timestamp offset (TS offset) for this NAL
1147	   unit, whereby n can be 16 or 24.  The choice between the different
1148	   MTAP types (MTAP16 and MTAP24) is application dependent: the larger
1149	   the timestamp offset is, the higher the flexibility of the MTAP, but
1150	   the overhead is also higher.

1152	   The structures of the multi-time aggregation units for MTAP16 and
1153	   MTAP24 are presented in Figures 10 and 11, respectively.  The
1154	   starting or ending position of an aggregation unit within a packet is
1155	   not required to be on a 32-bit word boundary.  The DON of the NAL
1156	   unit contained in a multi-time aggregation unit is equal to (DONB +
1157	   DOND) % 65536, in which % denotes the modulo operation.  This memo
1158	   does not specify how the NAL units within an MTAP are ordered, but,
1159	   in most cases, NAL unit decoding order SHOULD be used.

1161	   The timestamp offset field MUST be set to a value equal to the value
1162	   of the following formula: If the NALU-time is larger than or equal to
1163	   the RTP timestamp of the packet, then the timestamp offset equals
1164	   (the NALU-time of the NAL unit - the RTP timestamp of the packet).
1165	   If the NALU-time is smaller than the RTP timestamp of the packet,
1166	   then the timestamp offset is equal to the NALU-time + (2^32 - the RTP
1167	   timestamp of the packet).
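
	   Informative example: the two cases above amount to a subtraction
	   modulo 2^32.  The following sketch shows this under the assumption
	   that both inputs are 32-bit RTP timestamp values.  The function
	   name ts_offset and the n_bits parameter (16 for MTAP16, 24 for
	   MTAP24) are hypothetical; the range check is an addition that
	   reflects the width of the TS offset field.

```python
def ts_offset(nalu_time: int, rtp_timestamp: int, n_bits: int = 16) -> int:
    """Compute the MTAP timestamp offset for one NAL unit.

    nalu_time and rtp_timestamp are 32-bit RTP timestamp values.
    """
    if nalu_time >= rtp_timestamp:
        offset = nalu_time - rtp_timestamp
    else:
        # The NALU-time has wrapped around relative to the RTP timestamp.
        offset = nalu_time + (2**32 - rtp_timestamp)
    if offset >= 1 << n_bits:
        raise ValueError("offset does not fit in the TS offset field")
    return offset
```

	   For instance, a NALU-time of 5 and an RTP timestamp of 2^32 - 10
	   give an offset of 15, which is the same as (5 - (2^32 - 10)) taken
	   modulo 2^32.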

1169	    0                   1                   2                   3
1170	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1171	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1172	   :        NAL unit size          |      DOND     |  TS offset    |
1173	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1174	   |  TS offset    |                                               |
1175	   +-+-+-+-+-+-+-+-+              NAL unit                         |
1176	   |                                                               |
1177	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1178	   |                               :
1179	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1181	             Figure 10  Multi-time aggregation unit for MTAP16

1183	    0                   1                   2                   3
1184	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1185	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1186	   :        NAL unit size          |      DOND     |  TS offset    |
1187	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1188	   |         TS offset             |                               |
1189	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1190	   |                              NAL unit                         |
1191	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1192	   |                               :
1193	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1195	             Figure 11  Multi-time aggregation unit for MTAP24

1197	   For the "earliest" multi-time aggregation unit in an MTAP the
1198	   timestamp offset MUST be zero.  Hence, the RTP timestamp of the MTAP
1199	   itself is identical to the earliest NALU-time.

1201	      Informative note: The "earliest" multi-time aggregation unit is
1202	      the one that would have the smallest extended RTP timestamp among
1203	      all the aggregation units of an MTAP if the NAL units contained
1204	      in the aggregation units were encapsulated in single NAL unit
1205	      packets.  An extended timestamp is a timestamp that has more than
1206	      32 bits and is capable of counting the wraparound of the
1207	      timestamp field, thus enabling one to determine the smallest
1208	      value if the timestamp wraps.  Such an "earliest" aggregation
1209	      unit may not be the first one in the order in which the
1210	      aggregation units are encapsulated in an MTAP.  The "earliest"
1211	      NAL unit need not be the same as the first NAL unit in the NAL
1212	      unit decoding order either.

1214	   Figure 12 presents an example of an RTP packet that contains a multi-
1215	   time aggregation packet of type MTAP16 that contains two multi-time
1216	   aggregation units, labeled as 1 and 2 in the figure.

1218	    0                   1                   2                   3
1219	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1220	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1221	   |                          RTP Header                           |
1222	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1223	   |MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
1224	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1225	   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
1226	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1227	   |  NALU 1 HDR   |  NALU 1 DATA                                  |
1228	   +-+-+-+-+-+-+-+-+                                               +
1229	   :                                                               :
1230	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1231	   |               | NALU 2 SIZE                   |  NALU 2 DOND  |
1232	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1233	   |       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
1234	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
1235	   :                                                               :
1236	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1237	   |                               :...OPTIONAL RTP padding        |
1238	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1240	   Figure 12  An RTP packet including a multi-time aggregation packet of
1241	          type MTAP16 containing two multi-time aggregation units

1243	   Figure 13 presents an example of an RTP packet that contains a multi-
1244	   time aggregation packet of type MTAP24 that contains two multi-time
1245	   aggregation units, labeled as 1 and 2 in the figure.

1247	    0                   1                   2                   3
1248	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1249	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1250	   |                          RTP Header                           |
1251	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1252	   |MTAP24 NAL HDR |  decoding order number base   | NALU 1 Size   |
1253	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1254	   |  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offs          |
1255	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1256	   |NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
1257	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
1258	   :                                                               :
1259	   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1260	   |               | NALU 2 SIZE                   |  NALU 2 DOND  |
1261	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1262	   |       NALU 2 TS offset                        |  NALU 2 HDR   |
1263	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1264	   |  NALU 2 DATA                                                  |
1265	   :                                                               :
1266	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1267	   |                               :...OPTIONAL RTP padding        |
1268	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1270	   Figure 13  An RTP packet including a multi-time aggregation packet of
1271	          type MTAP24 containing two multi-time aggregation units
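
	   Informative example: a minimal sketch of how a receiver might walk
	   the MTAP16 and MTAP24 layouts of Figures 12 and 13.  The function
	   name split_mtap is hypothetical.  The type values 26 (MTAP16) and
	   27 (MTAP24) are taken from the NAL unit type table elsewhere in
	   this memo.  The sketch assumes that the NAL unit size field counts
	   only the octets of the NAL unit itself, that RTP padding has been
	   removed, and that the NALU-time is recovered by adding the
	   timestamp offset to the RTP timestamp modulo 2^32.

```python
import struct

# Assumed type values from the NAL unit type table of this memo.
MTAP16, MTAP24 = 26, 27


def split_mtap(payload: bytes, rtp_timestamp: int):
    """Return a list of (don, nalu_time, nal_unit) tuples."""
    nal_type = payload[0] & 0x1F
    if nal_type == MTAP16:
        ts_len = 2
    elif nal_type == MTAP24:
        ts_len = 3
    else:
        raise ValueError("not an MTAP payload")
    if len(payload) < 3:
        raise ValueError("truncated DONB field")
    (donb,) = struct.unpack_from("!H", payload, 1)
    pos = 3
    units = []
    while pos < len(payload):
        if pos + 3 + ts_len > len(payload):
            raise ValueError("truncated aggregation unit header")
        # 16-bit size and 8-bit DOND, both unsigned, network byte order.
        size, dond = struct.unpack_from("!HB", payload, pos)
        pos += 3
        ts_off = int.from_bytes(payload[pos:pos + ts_len], "big")
        pos += ts_len
        if pos + size > len(payload):
            raise ValueError("truncated NAL unit")
        nal = payload[pos:pos + size]
        pos += size
        don = (donb + dond) % 65536
        nalu_time = (rtp_timestamp + ts_off) % 2**32
        units.append((don, nalu_time, nal))
    return units
```

	   With DONB equal to 65534 and DOND equal to 3, the resulting DON is
	   1, which illustrates the modulo 65536 wraparound.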

1273	5.7.3. Fragmentation Units (FUs)

1275	   This payload type allows fragmenting a NAL unit into several RTP
1276	   packets.  Doing so on the application layer instead of relying on
1277	   lower layer fragmentation (e.g., by IP) has the following advantages:

1279	   o  The payload format is capable of transporting NAL units bigger
1280	      than 64 kbytes over an IPv4 network.  Such NAL units may be
1281	      present in pre-recorded video, particularly in High Definition
1282	      formats (there is a limit on the number of slices per picture,
1283	      which results in a limit on NAL units per picture, which in turn
1284	      may result in big NAL units).

1286	   o  The fragmentation mechanism allows fragmenting a single NAL unit
1287	      and applying generic forward error correction as described in
1288	      section 12.5.

1290	   Fragmentation is defined only for a single NAL unit and not for any
1291	   aggregation packets.  A fragment of a NAL unit consists of an integer
1292	   number of consecutive octets of that NAL unit.  Each octet of the NAL
1293	   unit MUST be part of exactly one fragment of that NAL unit.

1295	   Fragments of the same NAL unit MUST be sent in consecutive order with
1296	   ascending RTP sequence numbers (with no other RTP packets within the
1297	   same RTP packet stream being sent between the first and last
1298	   fragment).  Similarly, a NAL unit MUST be reassembled in RTP sequence
1299	   number order.

1301	   When a NAL unit is fragmented and conveyed within fragmentation units
1302	   (FUs), it is referred to as a fragmented NAL unit.  STAPs and MTAPs
1303	   MUST NOT be fragmented.  FUs MUST NOT be nested; i.e., an FU MUST NOT
1304	   contain another FU.

1306	   The RTP timestamp of an RTP packet carrying an FU is set to the NALU-
1307	   time of the fragmented NAL unit.

1309	   Figure 14 presents the RTP payload format for FU-As.  An FU-A
1310	   consists of a fragmentation unit indicator of one octet, a
1311	   fragmentation unit header of one octet, and a fragmentation unit
1312	   payload.

1314	    0                   1                   2                   3
1315	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1316	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1317	   | FU indicator  |   FU header   |                               |
1318	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
1319	   |                                                               |
1320	   |                         FU payload                            |
1321	   |                                                               |
1322	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1323	   |                               :...OPTIONAL RTP padding        |
1324	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1326	                  Figure 14  RTP payload format for FU-A

1328	   Figure 15 presents the RTP payload format for FU-Bs.  An FU-B
1329	   consists of a fragmentation unit indicator of one octet, a
1330	   fragmentation unit header of one octet, a decoding order number (DON)
1331	   (in network byte order), and a fragmentation unit payload.  In other
1332	   words, the structure of FU-B is the same as the structure of FU-A,
1333	   except for the additional DON field.

1335	    0                   1                   2                   3
1336	    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1337	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1338	   | FU indicator  |   FU header   |               DON             |
1339	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
1340	   |                                                               |
1341	   |                         FU payload                            |
1342	   |                                                               |
1343	   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1344	   |                               :...OPTIONAL RTP padding        |
1345	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1347	                  Figure 15  RTP payload format for FU-B

1349	   NAL unit type FU-B MUST be used in the interleaved packetization mode
1350	   for the first fragmentation unit of a fragmented NAL unit.  NAL unit
1351	   type FU-B MUST NOT be used in any other case.  In other words, in the
1352	   interleaved packetization mode, each NALU that is fragmented has an
1353	   FU-B as the first fragment, followed by one or more FU-A fragments.

1355	   The FU indicator octet has the following format:

1357	      +---------------+
1358	      |0|1|2|3|4|5|6|7|
1359	      +-+-+-+-+-+-+-+-+
1360	      |F|NRI|  Type   |
1361	      +---------------+

1363	   Values equal to 28 and 29 in the Type field of the FU indicator octet
1364	   identify an FU-A and an FU-B, respectively.  The use of the F bit is
1365	   described in section 5.3.  The value of the NRI field MUST be set
1366	   according to the value of the NRI field in the fragmented NAL unit.

1368	   The FU header has the following format:

1370	      +---------------+
1371	      |0|1|2|3|4|5|6|7|
1372	      +-+-+-+-+-+-+-+-+
1373	      |S|E|R|  Type   |
1374	      +---------------+

1376	   S: 1 bit
1377	      When set to one, the Start bit indicates the start of a
1378	      fragmented NAL unit.  When the following FU payload is not the
1379	      start of a fragmented NAL unit payload, the Start bit is set to
1380	      zero.

1382	   E: 1 bit
1383	      When set to one, the End bit indicates the end of a fragmented
1384	      NAL unit, i.e., the last byte of the payload is also the last
1385	      byte of the fragmented NAL unit.  When the following FU payload
1386	      is not the last fragment of a fragmented NAL unit, the End bit is
1387	      set to zero.

1389	   R: 1 bit
1390	      The Reserved bit MUST be equal to 0 and MUST be ignored by the
1391	      receiver.

1393	   Type: 5 bits
1394	      The NAL unit payload type as defined in Table 7-1 of [1].

1396	   The value of DON in FU-Bs is selected as described in section 5.5.

1398	      Informative note: The DON field in FU-Bs allows gateways to
1399	      fragment NAL units to FU-Bs without organizing the incoming NAL
1400	      units to the NAL unit decoding order.

1402	   A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the
1403	   Start bit and End bit MUST NOT both be set to one in the same FU
1404	   header.

1406	   The FU payload consists of fragments of the payload of the fragmented
1407	   NAL unit so that if the fragmentation unit payloads of consecutive
1408	   FUs are sequentially concatenated, the payload of the fragmented NAL
1409	   unit can be reconstructed.  The NAL unit type octet of the fragmented
1410	   NAL unit is not included as such in the fragmentation unit payload,
1411	   but rather the information of the NAL unit type octet of the
1412	   fragmented NAL unit is conveyed in F and NRI fields of the FU
1413	   indicator octet of the fragmentation unit and in the type field of
1414	   the FU header.  An FU payload MAY have any number of octets and MAY
1415	   be empty.

1417	      Informative note: Empty FUs are allowed to reduce the latency of
1418	      a certain class of senders in nearly lossless environments.
1419	      These senders can be characterized in that they packetize NALU
1420	      fragments before the NALU is completely generated and, hence,
1421	      before the NALU size is known.  If zero-length NALU fragments
1422	      were not allowed, the sender would have to generate at least one
1423	      bit of data of the following fragment before the current fragment
1424	      could be sent.  Due to the characteristics of H.264, where
1425	      sometimes several macroblocks occupy zero bits, this is
1426	      undesirable and can add delay.  However, the (potential) use of
1427	      zero-length NALU fragments should be carefully weighed against
1428	      the increased risk of the loss of at least a part of the NALU
1429	      because of the additional packets employed for its transmission.
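
	   Informative example: a minimal sender-side sketch of FU-A
	   fragmentation as described above.  The names fragment_fu_a and
	   max_payload (the largest RTP payload the sender is willing to
	   emit, in octets) are hypothetical.  The sketch refuses NAL units
	   that fit into one payload, because a fragmented NAL unit must not
	   be carried in a single FU; such units would be sent as single NAL
	   unit packets instead.

```python
FU_A = 28  # Type value of the FU indicator for an FU-A


def fragment_fu_a(nal: bytes, max_payload: int):
    """Split one NAL unit into a list of FU-A RTP payloads."""
    if max_payload < 3:
        raise ValueError("max_payload leaves no room for FU payload")
    if len(nal) <= max_payload:
        raise ValueError("NAL unit fits in a single NAL unit packet")
    header, body = nal[0], nal[1:]
    # FU indicator: F and NRI copied from the NAL unit, Type set to 28.
    indicator = (header & 0xE0) | FU_A
    # The original NAL unit type travels in the FU header Type field.
    nal_type = header & 0x1F
    chunk = max_payload - 2  # room left after indicator and FU header
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]
    payloads = []
    for index, piece in enumerate(pieces):
        start = 0x80 if index == 0 else 0              # S bit
        end = 0x40 if index == len(pieces) - 1 else 0  # E bit
        fu_header = start | end | nal_type             # R bit stays 0
        payloads.append(bytes([indicator, fu_header]) + piece)
    return payloads
```

	   The NAL unit header octet itself is not copied into any FU
	   payload; its fields are carried by the FU indicator and the FU
	   header, so concatenating the FU payloads yields the NAL unit
	   payload only.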

1431	   If a fragmentation unit is lost, the receiver SHOULD discard all
1432	   following fragmentation units in transmission order corresponding to
1433	   the same fragmented NAL unit.

1435	   A receiver in an endpoint or in a MANE MAY aggregate the first n-1
1436	   fragments of a NAL unit to an (incomplete) NAL unit, even if fragment
1437	   n of that NAL unit is not received.  In this case, the
1438	   forbidden_zero_bit of the NAL unit MUST be set to one to indicate a
1439	   syntax violation.
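
	   Informative example: a minimal receiver-side sketch of FU-A
	   reassembly that follows the two rules above.  The function name
	   reassemble_fu_a is hypothetical, as is the convention of passing
	   the FU-A payloads of one fragmented NAL unit in RTP sequence
	   number order with None standing for a lost packet.

```python
def reassemble_fu_a(fragments):
    """Rebuild a NAL unit from FU-A payloads; None marks a lost packet.

    Returns (nal_unit, complete).  When complete is False, the returned
    NAL unit holds only the fragments received before the first gap and
    has its forbidden_zero_bit set to one.
    """
    first = fragments[0]
    if first is None or not first[1] & 0x80:
        raise ValueError("start fragment missing")
    # Rebuild the NAL unit header: F and NRI from the FU indicator,
    # Type from the FU header.
    header = (first[0] & 0xE0) | (first[1] & 0x1F)
    body = bytearray()
    complete = False
    for fu in fragments:
        if fu is None:
            break  # discard all following fragments of this NAL unit
        body += fu[2:]
        if fu[1] & 0x40:  # E bit: last fragment
            complete = True
            break
    if not complete:
        header |= 0x80  # forbidden_zero_bit signals a syntax violation
    return bytes([header]) + bytes(body), complete
```

	   Whether an incomplete NAL unit is worth forwarding at all is left
	   to the receiver; the sketch only shows how it would be marked.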

1441	6. Packetization Rules

1443	   The packetization modes are introduced in section 5.2.  The
1444	   packetization rules common to more than one of the packetization
1445	   modes are specified in section 6.1.  The packetization rules for the
1446	   single NAL unit mode, the non-interleaved mode, and the interleaved
1447	   mode are specified in sections 6.2, 6.3, and 6.4, respectively.

1449	6.1. Common Packetization Rules

1451	   All senders MUST enforce the following packetization rules regardless
1452	   of the packetization mode in use:

1454	   o  Coded slice NAL units or coded slice data partition NAL units
1455	      belonging to the same coded picture (and thus sharing the same RTP
1456	      timestamp value) MAY be sent in any order; however, for delay-
1457	      critical systems, they SHOULD be sent in their original decoding
1458	      order to minimize the delay.  Note that the decoding order is the
1459	      order of the NAL units in the bitstream.

1461	   o  Parameter sets are handled in accordance with the rules and
1462	      recommendations given in section 8.4.

1464	   o  MANEs MUST NOT duplicate any NAL unit except for sequence or
1465	      picture parameter set NAL units, as neither this memo nor the
1466	      H.264 specification provides means to identify duplicated NAL
1467	      units.  Sequence and picture parameter set NAL units MAY be
1468	      duplicated to make their correct reception more probable, but any
1469	      such duplication MUST NOT affect the contents of any active
1470	      sequence or picture parameter set.  Duplication SHOULD be
1471	      performed on the application layer and not by duplicating RTP
1472	      packets (with identical sequence numbers).

1474	   Senders using the non-interleaved mode and the interleaved mode MUST
1475	   enforce the following packetization rule:

1477	   o  MANEs MAY convert single NAL unit packets into one aggregation
1478	      packet, convert an aggregation packet into several single NAL unit
1479	      packets, or mix both concepts, in an RTP translator.  The RTP
1480	      translator SHOULD take into account at least the following
1481	      parameters: path MTU size, unequal protection mechanisms (e.g.,
1482	      through packet-based FEC according to RFC 2733 [17], especially
1483	      for sequence and picture parameter set NAL units and coded slice
1484	      data partition A NAL units), bearable latency of the system, and
1485	      buffering capabilities of the receiver.

1487	         Informative note: An RTP translator is required to handle RTCP
1488	         as per RFC 3550.

1490	6.2. Single NAL Unit Mode

1492	   This mode is in use when the value of the OPTIONAL packetization-mode
1493	   media type parameter is equal to 0 or the packetization-mode is not
1494	   present.  All receivers MUST support this mode.  It is primarily
1495	   intended for low-delay applications that are compatible with systems
1496	   using ITU-T Recommendation H.241 [3] (see section 12.1).  Only single
1497	   NAL unit packets MAY be used in this mode.  STAPs, MTAPs, and FUs
1498	   MUST NOT be used.  The transmission order of single NAL unit packets
1499	   MUST comply with the NAL unit decoding order.

1501	6.3. Non-Interleaved Mode

1503	   This mode is in use when the value of the OPTIONAL packetization-mode
1504	   media type parameter is equal to 1.  This mode SHOULD be supported.
1505	   It is primarily intended for low-delay applications.  Only single NAL
1506	   unit packets, STAP-As, and FU-As MAY be used in this mode.  STAP-Bs,
1507	   MTAPs, and FU-Bs MUST NOT be used.  The transmission order of NAL
1508	   units MUST comply with the NAL unit decoding order.

1510	6.4. Interleaved Mode

1512	   This mode is in use when the value of the OPTIONAL packetization-mode
1513	   media type parameter is equal to 2.  Some receivers MAY support this
1514	   mode.  STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used.  STAP-As and
1515	   single NAL unit packets MUST NOT be used.  The transmission order of
1516	   packets and NAL units is constrained as specified in section 5.5.
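
	   Informative example: the packet types permitted by sections 6.2,
	   6.3, and 6.4 can be summarized as a lookup table.  The sketch
	   below is one possible receiver-side check; the names
	   ALLOWED_TYPES and is_allowed are hypothetical.  The values 28 and
	   29 are given in section 5.7.3; the remaining type values (1-23
	   for single NAL unit packets, 24-27 for STAP-A, STAP-B, MTAP16,
	   and MTAP24) are taken from the NAL unit type table elsewhere in
	   this memo.

```python
SINGLE_NAL = frozenset(range(1, 24))
STAP_A, STAP_B, MTAP16, MTAP24, FU_A, FU_B = 24, 25, 26, 27, 28, 29

# Key: value of the packetization-mode media type parameter.
ALLOWED_TYPES = {
    0: SINGLE_NAL,                             # single NAL unit mode
    1: SINGLE_NAL | {STAP_A, FU_A},            # non-interleaved mode
    2: {STAP_B, MTAP16, MTAP24, FU_A, FU_B},   # interleaved mode
}


def is_allowed(payload: bytes, packetization_mode: int = 0) -> bool:
    """Check the type of an RTP payload against the mode in use.

    The default of 0 mirrors the rule that an absent
    packetization-mode parameter selects the single NAL unit mode.
    """
    return (payload[0] & 0x1F) in ALLOWED_TYPES[packetization_mode]
```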

1518	7. De-Packetization Process

1520	   The de-packetization process is implementation dependent.  Therefore,
1521	   the following description should be seen as an example of a suitable
1522	   implementation.  Other schemes may be used as well, as long as
1523	   their output for the same input matches that of the process
1524	   described below, meaning that the number of NAL units and their
1525	   order are both identical.  Optimizations relative to the
1526	   described algorithms are likely possible.  Section 7.1 presents the
1527	   de-packetization process for the single NAL unit and non-interleaved
1528	   packetization modes, whereas section 7.2 describes the process for
1529	   the interleaved mode.  Section 7.3 includes additional de-
1530	   packetization guidelines for intelligent receivers.

1532	   All normal RTP mechanisms related to buffer management apply.  In
1533	   particular, duplicated or outdated RTP packets (as indicated by the
1534	   RTP sequence number and the RTP timestamp) are removed.  To
1535	   determine the exact time for decoding, factors such as a possible
1536	   intentional delay to allow for proper inter-stream synchronization
1537	   must be taken into account.

1539	7.1. Single NAL Unit and Non-Interleaved Mode

1541	   The receiver includes a receiver buffer to compensate for
1542	   transmission delay jitter.  The receiver stores incoming packets in
1543	   reception order into the receiver buffer.  Packets are de-packetized
1544	   in RTP sequence number order.  If a de-packetized packet is a single
1545	   NAL unit packet, the NAL unit contained in the packet is passed
1546	   directly to the decoder.  If a de-packetized packet is an STAP-A, the
1547	   NAL units contained in the packet are passed to the decoder in the
1548	   order in which they are encapsulated in the packet.  For all the FU-A
1549	   packets containing fragments of a single NAL unit, the de-packetized
1550	   fragments are concatenated in their sending order to recover the NAL
1551	   unit, which is then passed to the decoder.

1553	      Informative note: If the decoder supports Arbitrary Slice Order,
1554	      coded slices of a picture can be passed to the decoder in any
1555	      order regardless of their reception and transmission order.

1557	7.2. Interleaved Mode

1559	   The general concept behind these de-packetization rules is to reorder
1560	   NAL units from transmission order to the NAL unit decoding order.

1562	   The receiver includes a receiver buffer, which is used to compensate
1563	   for transmission delay jitter and to reorder NAL units from
1564	   transmission order to the NAL unit decoding order.  In this section,
1565	   the receiver operation is described under the assumption that there
1566	   is no transmission delay jitter.  To distinguish it from a
1567	   practical receiver buffer that is also used for compensation of
1568	   transmission delay jitter, the receiver buffer is hereafter called
1569	   the de-interleaving buffer in this section.  Receivers SHOULD also
1570	   prepare for transmission delay jitter; i.e., either reserve separate
1571	   buffers for transmission delay jitter buffering and de-interleaving
1572	   buffering or use a receiver buffer for both transmission delay jitter
1573	   and de-interleaving.  Moreover, receivers SHOULD take transmission
1574	   delay jitter into account in the buffering operation; e.g., by
1575	   additional initial buffering before the start of decoding and
1576	   playback.

1578	   This section is organized as follows: subsection 7.2.1 presents how
1579	   to calculate the size of the de-interleaving buffer.  Subsection
1580	   7.2.2 specifies the receiver process for arranging received NAL
1581	   units into the NAL unit decoding order.

1583	7.2.1. Size of the De-interleaving Buffer

1585	   When the SDP Offer/Answer model or any other capability exchange
1586	   procedure is used in session setup, the properties of the received
1587	   stream SHOULD be such that the receiver capabilities are not
1588	   exceeded.  In the SDP Offer/Answer model, the receiver can indicate
1589	   its capabilities to allocate a de-interleaving buffer with the deint-
1590	   buf-cap media type parameter.  The sender indicates the requirement
1591	   for the de-interleaving buffer size with the sprop-deint-buf-req
1592	   media type parameter.  It is therefore RECOMMENDED to set the de-
1593	   interleaving buffer size, in terms of number of bytes, equal to or
1594	   greater than the value of sprop-deint-buf-req media type parameter.
1595	   See section 8.1 for further information on deint-buf-cap and sprop-
1596	   deint-buf-req media type parameters and section 8.2.2 for further
1597	   information on their use in the SDP Offer/Answer model.

1599	   When a declarative session description is used in session setup, the
1600	   sprop-deint-buf-req media type parameter signals the requirement for
1601	   the de-interleaving buffer size.  It is therefore RECOMMENDED to set
1602	   the de-interleaving buffer size, in terms of number of bytes, equal
1603	   to or greater than the value of sprop-deint-buf-req media type
1604	   parameter.

1606	7.2.2. De-interleaving Process

1608	   There are two buffering states in the receiver: initial buffering and
1609	   buffering while playing.  Initial buffering occurs when the RTP
1610	   session is initialized.  After initial buffering, decoding and
1611	   playback are started, and the buffering-while-playing mode is used.

1613	   Regardless of the buffering state, the receiver stores incoming NAL
1614	   units, in reception order, in the de-interleaving buffer as follows.
1615	   NAL units of aggregation packets are stored in the de-interleaving
1616	   buffer individually.  The value of DON is calculated and stored for
1617	   each NAL unit.

1619	   The receiver operation is described below with the help of the
1620	   following functions and constants:

1622	   o  Function AbsDON is specified in section 8.1.

1624	   o  Function don_diff is specified in section 5.5.

1626	   o  Constant N is the value of the OPTIONAL sprop-interleaving-depth
1627	      media type parameter (see section 8.1) incremented by 1.

1629	   Initial buffering lasts until one of the following conditions is
1630	   fulfilled:

1632	   o  There are N or more VCL NAL units in the de-interleaving buffer.

1634	   o  If sprop-max-don-diff is present, don_diff(m,n) is greater than
1635	      the value of sprop-max-don-diff, in which n corresponds to the NAL
1636	      unit having the greatest value of AbsDON among the received NAL
1637	      units and m corresponds to the NAL unit having the smallest value
1638	      of AbsDON among the received NAL units.

1640	   o  Initial buffering has lasted for the duration equal to or greater
1641	      than the value of the OPTIONAL sprop-init-buf-time media type
1642	      parameter.

1644	   The NAL units to be removed from the de-interleaving buffer are
1645	   determined as follows:

1647	   o  If the de-interleaving buffer contains at least N VCL NAL units,
1648	      NAL units are removed from the de-interleaving buffer and passed
1649	      to the decoder in the order specified below until the buffer
1650	      contains N-1 VCL NAL units.

1652	   o  If sprop-max-don-diff is present, all NAL units m for which
1653	      don_diff(m,n) is greater than sprop-max-don-diff are removed from
1654	      the de-interleaving buffer and passed to the decoder in the order
1655	      specified below.  Herein, n corresponds to the NAL unit having the
1656	      greatest value of AbsDON among the NAL units in the de-
1657	      interleaving buffer.

1659	   The order in which NAL units are passed to the decoder is specified
1660	   as follows:

1662	   o  Let PDON be a variable that is initialized to 0 at the beginning
1663	      of the RTP session.

1665	   o  For each NAL unit associated with a value of DON, a DON distance
1666	      is calculated as follows.  If the value of DON of the NAL unit is
1667	      larger than the value of PDON, the DON distance is equal to DON -
1668	      PDON.  Otherwise, the DON distance is equal to 65535 - PDON + DON
1669	      + 1.

1671	   o  NAL units are delivered to the decoder in ascending order of DON
1672	      distance.  If several NAL units share the same value of DON
1673	      distance, they can be passed to the decoder in any order.

1675	   o  When a desired number of NAL units have been passed to the
1676	      decoder, the value of PDON is set to the value of DON for the last
1677	      NAL unit passed to the decoder.
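
	   As an illustration only, the ordering rules above can be sketched
	   in Python.  The names don_distance and pass_to_decoder are
	   hypothetical and are not defined by this specification; the
	   sketch assumes that the NAL units selected for removal are
	   represented simply by their 16-bit DON values.

```python
def don_distance(don: int, pdon: int) -> int:
    """DON distance of a NAL unit relative to PDON, with 16-bit wraparound."""
    if don > pdon:
        return don - pdon
    # "Otherwise" branch of the text; note that don == pdon yields 65536.
    return 65535 - pdon + don + 1


def pass_to_decoder(dons, pdon):
    """Order the given DON values by ascending DON distance.

    Returns the ordered list and the updated PDON, which is the DON of
    the last NAL unit passed to the decoder.
    """
    ordered = sorted(dons, key=lambda d: don_distance(d, pdon))
    new_pdon = ordered[-1] if ordered else pdon
    return ordered, new_pdon
```

	   Python's sorted() is stable, so NAL units that share a DON
	   distance keep their reception order here, which is one of the
	   orders the text permits.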

1679	7.3. Additional De-Packetization Guidelines

1681	   The following additional de-packetization rules may be used to
1682	   implement an operational H.264 de-packetizer:

1684	   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
1685	      coded slice data partitions A (DPAs).  If a lost DPA is found, a
1686	      gateway may decide not to send the corresponding coded slice data
1687	      partitions B and C, as their information is meaningless for H.264
1688	      decoders.  In this way a MANE can reduce network load by
1689	      discarding useless packets without parsing a complex bitstream.

1691	   o  Intelligent RTP receivers (e.g., in gateways) may identify lost
1692	      FUs.  If a lost FU is found, a gateway may decide not to send the
1693	      following FUs of the same fragmented NAL unit, as their
1694	      information is meaningless for H.264 decoders.  In this way a MANE
1695	      can reduce network load by discarding useless packets without
1696	      parsing a complex bitstream.

1698	   o  Intelligent receivers having to discard packets or NALUs should
1699	      first discard all packets/NALUs in which the value of the NRI
1700	      field of the NAL unit type octet is equal to 0.  This will
1701	      minimize the impact on user experience and keep the reference
1702	      pictures intact.  If more packets have to be discarded, then
1703	      packets with a numerically lower NRI value should be discarded
1704	      before packets with a numerically higher NRI value.  However,
1705	      discarding any packets with an NRI greater than 0 very likely leads
1706	      to decoder drift and SHOULD be avoided.
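
	   The NRI-based discard priority in the last rule can be sketched
	   as follows.  This is a minimal illustration, assuming that each
	   NAL unit is held as a bytes object whose first octet is the NAL
	   unit header (F bit, 2-bit NRI, 5-bit type); the function names
	   are hypothetical.

```python
def nri(nal_header_octet: int) -> int:
    """Extract the 2-bit NRI field from the NAL unit header octet."""
    return (nal_header_octet >> 5) & 0x03


def discard_order(nal_units):
    """Return NAL units in the order they should be considered for discard.

    Units with NRI equal to 0 come first, then numerically lower NRI
    values before higher ones.
    """
    return sorted(nal_units, key=lambda nal: nri(nal[0]))
```

	   A receiver following the guideline would normally stop after
	   the NRI 0 units, since discarding anything else risks decoder
	   drift.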

1708	8. Payload Format Parameters

1710	   This section specifies the parameters that MAY be used to select
1711	   optional features of the payload format and certain features of the
1712	   bitstream.  The parameters are specified here as part of the media
1713	   subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec.  A
1714	   mapping of the parameters into the Session Description Protocol (SDP)
1715	   [6] is also provided for applications that use SDP.  Equivalent
1716	   parameters could be defined elsewhere for use with control protocols
1717	   that do not use SDP.

1719	   Some parameters provide a receiver with the properties of the stream
1720	   that will be sent.  The names of all these parameters start with
1721	   "sprop" for stream properties.  Some of these "sprop" parameters are
1722	   limited by other payload or codec configuration parameters.  For
1723	   example, the sprop-parameter-sets parameter is constrained by the
1724	   profile-level-id parameter.  The media sender selects all "sprop"
1725	   parameters rather than the receiver.  This uncommon characteristic of
1726	   the "sprop" parameters may not be compatible with some signaling
1727	   protocol concepts, in which case the use of these parameters SHOULD
1728	   be avoided.

1730	8.1. Media Type Registration

1732	   The media subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is
1733	   allocated from the IETF tree.

1735	   The receiver MUST ignore any unspecified parameter.

1737	   Media Type name:     video

1739	   Media subtype name:  H264

1741	   Required parameters: none

1743	   OPTIONAL parameters:

1745	      profile-level-id:
1746	         A base16 [7] (hexadecimal) representation of the following
1747	         three bytes in the sequence parameter set NAL unit specified
1748	         in [1]: 1) profile_idc, 2) a byte herein referred to as
1749	         profile-iop, composed of the values of constraint_set0_flag,
1750	         constraint_set1_flag, constraint_set2_flag,
1751	         constraint_set3_flag, and reserved_zero_4bits in bit-
1752	         significance order, starting from the most significant bit,
1753	         and 3) level_idc.  Note that reserved_zero_4bits is required
1754	         to be equal to 0 in [1], but other values for it may be
1755	         specified in the future by ITU-T or ISO/IEC.

1757	         The profile-level-id parameter indicates the default sub-
1758	         profile, i.e., the subset of coding tools that may have been
1759	         used to generate the stream or that the receiver supports, and
1760	         the default level of the stream or that the receiver supports.

1762	         The default sub-profile is indicated collectively by the
1763	         profile_idc byte and some fields in the profile-iop byte.
1764	         Depending on the values of the fields in the profile-iop byte,
1765	         the default sub-profile may be the same set of coding tools
1766	         supported by one profile, or a common subset of coding tools
1767	         of multiple profiles, as specified in subsection 7.4.2.1.1 of
1768	         [1].  The default level is indicated by the level_idc byte,
1769	         and, when profile_idc is equal to 66, 77 or 88 (the Baseline,
1770	         Main, or Extended profile) and level_idc is equal to 11,
1771	         additionally by bit 4 (constraint_set3_flag) of the profile-
1772	         iop byte.  When profile_idc is equal to 66, 77 or 88 (the
1773	         Baseline, Main, or Extended profile) and level_idc is equal to
1774	         11, and bit 4 (constraint_set3_flag) of the profile-iop byte
1775	         is equal to 1, the default level is level 1b.
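
	         As a non-normative sketch, the three bytes and the level 1b
	         rule described above could be extracted as shown below.  The
	         function name and the returned dictionary keys are
	         hypothetical.

```python
def parse_profile_level_id(value: str) -> dict:
    """Split a base16 profile-level-id into its three bytes."""
    raw = bytes.fromhex(value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must encode exactly three bytes")
    profile_idc, profile_iop, level_idc = raw
    # constraint_set3_flag is bit 4 of profile-iop (mask 0x10).
    constraint_set3_flag = (profile_iop >> 4) & 1
    # Level 1b: Baseline, Main, or Extended, level_idc 11, flag set.
    level_1b = (
        profile_idc in (66, 77, 88)
        and level_idc == 11
        and constraint_set3_flag == 1
    )
    return {
        "profile_idc": profile_idc,
        "profile_iop": profile_iop,
        "level_idc": level_idc,
        "level_1b": level_1b,
    }
```

	         For example, the illustrative value "42100b" parses to
	         profile_idc 66 and level_idc 11 with constraint_set3_flag
	         set, so the default level is level 1b.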

1777	         Table 5 lists all profiles defined in Annex A of [1] and, for
1778	         each of the profiles, the possible combinations of profile_idc
1779	         and profile-iop that represent the same sub-profile.

1781	            Table 5.  Combinations of profile_idc and profile-iop
1782	            representing the same sub-profile corresponding to the full
1783	            set of coding tools supported by one profile.  In the
1784	            following, x may be either 0 or 1, and other notation is as
1785	            follows. CB: Constrained Baseline profile, B: Baseline
1786	            profile, M: Main profile, E: Extended profile, H: High
1787	            profile, H10: High 10 profile, H42: High 4:2:2 profile,
1788	            H44: High 4:4:4 Predictive profile, H10I: High 10 Intra
1789	            profile, H42I: High 4:2:2 Intra profile, H44I: High 4:4:4
1790	            Intra profile, and C44I: CAVLC 4:4:4 Intra profile.

1792	              Profile     profile_idc             profile-iop
1793	                          (hexadecimal)           (binary)

1795	              CB          42                      x1xx0000
1796	                          4D                      1xxx0000
1797	                          58                      11xx0000
1798	                          64, 6E, 7A or F4        1xx00000
1799	              B           42                      x0xx0000
1800	                          58                      10xx0000
1801	              M           4D                      0x0x0000
1802	                          64, 6E, 7A or F4        01000000
1803	              E           58                      00xx0000
1804	              H           64                      00000000
1805	              H10         6E                      00000000
1806	              H42         7A                      00000000
1807	              H44         F4                      00000000
1808	              H10I        6E                      00010000
1809	              H42I        7A                      00010000
1810	              H44I        F4                      00010000
1811	              C44I        2C                      00010000
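
	         The wildcard matching implied by Table 5 can be sketched as
	         follows.  This is an illustration under stated assumptions:
	         only a subset of the rows (CB, B, M with profile_idc 4D, E,
	         and H) is transcribed, and the names ROWS, iop_matches, and
	         sub_profile are hypothetical.

```python
# (profile, profile_idc, profile-iop pattern); 'x' matches either bit value.
ROWS = [
    ("CB", 0x42, "x1xx0000"),
    ("CB", 0x4D, "1xxx0000"),
    ("CB", 0x58, "11xx0000"),
    ("B", 0x42, "x0xx0000"),
    ("B", 0x58, "10xx0000"),
    ("M", 0x4D, "0x0x0000"),
    ("E", 0x58, "00xx0000"),
    ("H", 0x64, "00000000"),
]


def iop_matches(profile_iop: int, pattern: str) -> bool:
    """Compare profile-iop, most significant bit first, against a pattern."""
    bits = format(profile_iop, "08b")
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def sub_profile(profile_idc: int, profile_iop: int):
    """Return the profile abbreviation for a matching row, or None."""
    for name, idc, pattern in ROWS:
        if idc == profile_idc and iop_matches(profile_iop, pattern):
            return name
    return None
```

	         A return value of None means only that the combination is
	         not among the transcribed rows, not that it is invalid.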

1813	         Note that other combinations of profile_idc and profile-iop
1814	         (not listed in Table 5) may represent a sub-profile
1815	         equivalent to the common subset of coding tools for more than
1816	         one profile.  Note also that a decoder conforming to a certain
1817	         profile may be able to decode bitstreams conforming to other
1818	         profiles.  For example, a decoder conforming to the High 4:4:4
1819	         profile at a certain level must be able to decode bitstreams
1820	         conforming to the Constrained Baseline, Main, High, High 10 or
1821	         High 4:2:2 profile at the same or a lower level.

1823	         If the profile-level-id parameter is used to indicate
1824	         properties of a NAL unit stream, it indicates that, to decode
1825	         the stream, the minimum subset of coding tools a decoder has
1826	         to support is the default sub-profile, and the lowest level
1827	         the decoder has to support is the default level.

1829	         If the profile-level-id parameter is used for capability
1830	         exchange or session setup procedure, it indicates the subset
1831	         of coding tools, which is equal to the default sub-profile,
1832	         and the highest level, which is equal to the default level,
1833	         that the codec supports.  All levels lower than the default
1834	         level are also supported by the codec.

1836	            Informative note: Capability exchange and session setup
1837	            procedures should provide means to list the capabilities
1838	            for each supported sub-profile separately.  For example,
1839	            the one-of-N codec selection procedure of the SDP
1840	            Offer/Answer model can be used (section 10.2 of [8]).  The
1841	            one-of-N codec selection procedure may also be used to
1842	            provide different combinations of profile_idc and profile-
1843	            iop that represent the same sub-profile.  When there are a
1844	            lot of different combinations of profile_idc and profile-
1845	            iop that represent the same sub-profile, using the one-of-N
1846	            codec selection procedure may result in a large SDP
1847	            message.  Therefore, a receiver should understand the
1848	            different equivalent combinations of profile_idc and
1849	            profile-iop that represent the same sub-profile, and be
1850	            ready to accept an offer using any of the equivalent
1851	            combinations.

1853	         If no profile-level-id is present, the Baseline Profile
1854	         without additional constraints at Level 1 MUST be implied.

1856	      max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
1857	         These parameters MAY be used to signal the capabilities of a
1858	         receiver implementation. These parameters MUST NOT be used for
1859	         any other purpose.  The profile-level-id parameter MUST be
1860	         present in the same receiver capability description that
1861	         contains any of these parameters.  The level conveyed in the
1862	         value of the profile-level-id parameter MUST be one that the
1863	         receiver is fully capable of supporting.  max-mbps, max-smbps,
1864	         max-fs, max-cpb, max-dpb, and max-br MAY be used to indicate
1865	         capabilities of the receiver that extend the required
1866	         capabilities of the signaled level, as specified below.

1868	         When more than one parameter from the set (max-mbps, max-smbps,
1869	         max-fs, max-cpb, max-dpb, max-br) is present, the receiver
1870	         MUST support all signaled capabilities simultaneously.  For
1871	         example, if both max-mbps and max-br are present, the signaled
1872	         level with the extension of both the frame rate and bit rate
1873	         is supported.  That is, the receiver is able to decode NAL
1874	         unit streams in which the macroblock processing rate is up to
1875	         max-mbps (inclusive), the bit rate is up to max-br
1876	         (inclusive), the coded picture buffer size is derived as
1877	         specified in the semantics of the max-br parameter below, and
1878	         other properties comply with the level specified in the value
1879	         of the profile-level-id parameter.

1881	         If a receiver can support all the properties of level A, the
1882	         level specified in the value of the profile-level-id MUST be
1883	         level A (i.e. MUST NOT be lower than level A).  In other
1884	         words, a sender or receiver MUST NOT signal values of max-
1885	         mbps, max-fs, max-cpb, max-dpb, and max-br that meet the
1886	         requirements of a higher level compared to the level specified
1887	         in the value of the profile-level-id parameter.

1889	            Informative note: When the OPTIONAL media type parameters
1890	            are used to signal the properties of a NAL unit stream,
1891	            max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br
1892	            are not present, and the value of profile-level-id must
1893	            always be such that the NAL unit stream complies fully with
1894	            the specified profile and level.

1896	      max-mbps: The value of max-mbps is an integer indicating the
1897	         maximum macroblock processing rate in units of macroblocks per
1898	         second.  The max-mbps parameter signals that the receiver is
1899	         capable of decoding video at a higher rate than is required by
1900	         the signaled level conveyed in the value of the profile-level-
1901	         id parameter.  When max-mbps is signaled, the receiver MUST be
1902	         able to decode NAL unit streams that conform to the signaled
1903	         level, with the exception that the MaxMBPS value in Table A-1
1904	         of [1] for the signaled level is replaced with the value of
1905	         max-mbps.  The value of max-mbps MUST be greater than or equal
1906	         to the value of MaxMBPS for the level given in Table A-1 of
1907	         [1].  Senders MAY use this knowledge to send pictures of a
1908	         given size at a higher picture rate than is indicated in the
1909	         signaled level.

1911	      max-smbps: The value of max-smbps is an integer indicating the
1912	         maximum static macroblock processing rate in units of static
1913	         macroblocks per second, under the hypothetical assumption that
1914	         all macroblocks are static macroblocks.  When max-smbps is
1915	         signaled, the MaxMBPS value in Table A-1 of [1] should be
1916	         replaced with the result of the following computation:

1918	         o If the parameter max-mbps is signaled, set a variable
1919	            MaxMacroblocksPerSecond to the value of max-mbps.
1920	            Otherwise, set MaxMacroblocksPerSecond equal to the value
1921	            of MaxMBPS for the level in Table A-1 of [1].

1923	         o Set a variable P_non-static to the proportion of non-static
1924	            macroblocks in picture n.

1926	         o Set a variable P_static to the proportion of static
1927	            macroblocks in picture n.

1929	         o The value of MaxMBPS in Table A-1 of [1] should be
1930	            considered by the encoder to be equal to:

1932	            MaxMacroblocksPerSecond * max-smbps / ( P_non-static *
1933	            max-smbps + P_static * MaxMacroblocksPerSecond )

1935	         The encoder should recompute this value for each picture. The
1936	         value of max-smbps MUST be greater than the value of MaxMBPS
1937	         for the level given in Table A-1 of [1].  Senders MAY use this
1938	         knowledge to send pictures of a given size at a higher picture
1939	         rate than is indicated in the signaled level.
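
	         The per-picture computation above can be sketched as
	         follows.  This is a minimal illustration; the function and
	         argument names are hypothetical, and the sketch assumes that
	         P_non-static equals 1 - P_static.

```python
def effective_max_mbps(max_smbps, level_max_mbps, p_static, max_mbps=None):
    """Value an encoder would use in place of MaxMBPS for one picture.

    level_max_mbps is MaxMBPS for the signaled level (Table A-1 of [1]);
    max_mbps is the optional max-mbps parameter, if it was signaled.
    """
    max_macroblocks_per_second = (
        max_mbps if max_mbps is not None else level_max_mbps
    )
    p_non_static = 1.0 - p_static  # assumption: the proportions sum to 1
    return (
        max_macroblocks_per_second * max_smbps
        / (p_non_static * max_smbps + p_static * max_macroblocks_per_second)
    )
```

	         With no static macroblocks the result is the ordinary
	         macroblock rate; with only static macroblocks it is
	         max-smbps.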

1941	      max-fs: The value of max-fs is an integer indicating the maximum
1942	         frame size in units of macroblocks.  The max-fs parameter
1943	         signals that the receiver is capable of decoding larger
1944	         picture sizes than are required by the signaled level conveyed
1945	         in the value of the profile-level-id parameter.  When max-fs
1946	         is signaled, the receiver MUST be able to decode NAL unit
1947	         streams that conform to the signaled level, with the exception
1948	         that the MaxFS value in Table A-1 of [1] for the signaled
1949	         level is replaced with the value of max-fs.  The value of max-
1950	         fs MUST be greater than or equal to the value of MaxFS for the
1951	         level given in Table A-1 of [1].  Senders MAY use this
1952	         knowledge to send larger pictures at a proportionally lower
1953	         frame rate than is indicated in the signaled level.

1955	      max-cpb: The value of max-cpb is an integer indicating the
1956	         maximum coded picture buffer size in units of 1000 bits for
1957	         the VCL HRD parameters (see A.3.1 item i of [1]) and in units
1958	         of 1200 bits for the NAL HRD parameters (see A.3.1 item j of
1959	         [1]).  The max-cpb parameter signals that the receiver has
1960	         more memory than the minimum amount of coded picture buffer
1961	         memory required by the signaled level conveyed in the value of
1962	         the profile-level-id parameter.  When max-cpb is signaled, the
1963	         receiver MUST be able to decode NAL unit streams that conform
1964	         to the signaled level, with the exception that the MaxCPB
1965	         value in Table A-1 of [1] for the signaled level is replaced
1966	         with the value of max-cpb.  The value of max-cpb MUST be
1967	         greater than or equal to the value of MaxCPB for the level
1968	         given in Table A-1 of [1].  Senders MAY use this knowledge to
1969	         construct coded video streams with greater variation of bit
1970	         rate than can be achieved with the MaxCPB value in Table A-1
1971	         of [1].

1973	            Informative note: The coded picture buffer is used in the
1974	            hypothetical reference decoder (Annex C) of H.264.  The use
1975	            of the hypothetical reference decoder is recommended in
1976	            H.264 encoders to verify that the produced bitstream
1977	            conforms to the standard and to control the output bitrate.
1978	            Thus, the coded picture buffer is conceptually independent
1979	            of any other potential buffers in the receiver, including
1980	            de-interleaving and de-jitter buffers.  The coded picture
1981	            buffer need not be implemented in decoders as specified in
1982	            Annex C of H.264, but rather standard-compliant decoders
1983	            can have any buffering arrangements provided that they can
1984	            decode standard-compliant bitstreams.  Thus, in practice,
1985	            the input buffer for video decoder can be integrated with
1986	            de-interleaving and de-jitter buffers of the receiver.

1988	      max-dpb: The value of max-dpb is an integer indicating the
1989	         maximum decoded picture buffer size in units of 1024 bytes.
1990	         The max-dpb parameter signals that the receiver has more
1991	         memory than the minimum amount of decoded picture buffer
1992	         memory required by the signaled level conveyed in the value of
1993	         the profile-level-id parameter.  When max-dpb is signaled, the
1994	         receiver MUST be able to decode NAL unit streams that conform
1995	         to the signaled level, with the exception that the MaxDPB
1996	         value in Table A-1 of [1] for the signaled level is replaced
1997	         with the value of max-dpb.  Consequently, a receiver that
1998	         signals max-dpb MUST be capable of storing the following
1999	         number of decoded frames, complementary field pairs, and non-
2000	         paired fields in its decoded picture buffer:

2002	            Min(1024 * max-dpb / ( PicWidthInMbs * FrameHeightInMbs *
2003	            256 * ChromaFormatFactor ), 16)

2005	         PicWidthInMbs, FrameHeightInMbs, and ChromaFormatFactor are
2006	         defined in [1].
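
	         The formula above can be evaluated as in the following
	         sketch.  The function name is hypothetical; the default
	         ChromaFormatFactor of 1.5 (4:2:0 sampling) and the rounding
	         down to a whole number of frames are assumptions of this
	         illustration.

```python
def dpb_frame_capacity(max_dpb, pic_width_in_mbs, frame_height_in_mbs,
                       chroma_format_factor=1.5):
    """Frames a receiver signaling max-dpb must be able to store.

    max_dpb is in units of 1024 bytes; picture dimensions are in
    macroblocks.
    """
    frames = 1024 * max_dpb / (
        pic_width_in_mbs * frame_height_in_mbs * 256 * chroma_format_factor
    )
    # Assumption: a fractional frame is of no use, so round down.
    return min(int(frames), 16)
```

	         For instance, with max-dpb equal to 891 and a picture of
	         22 x 18 macroblocks, the sketch yields 6 frames.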

2008	         The value of max-dpb MUST be greater than or equal to the
2009	         value of MaxDPB for the level given in Table A-1 of [1].
2010	         Senders MAY use this knowledge to construct coded video
2011	         streams with improved compression.

2013	            Informative note: This parameter was added primarily to
2014	            complement a similar codepoint in the ITU-T Recommendation
2015	            H.245, so as to facilitate signaling gateway designs.  The
2016	            decoded picture buffer stores reconstructed samples.  There
2017	            is no relationship between the size of the decoded picture
2018	            buffer and the buffers used in RTP, especially de-
2019	            interleaving and de-jitter buffers.

2021	      max-br: The value of max-br is an integer indicating the maximum
2022	         video bit rate in units of 1000 bits per second for the VCL
2023	         HRD parameters (see A.3.1 item i of [1]) and in units of 1200
2024	         bits per second for the NAL HRD parameters (see A.3.1 item j
2025	         of [1]).

2027	         The max-br parameter signals that the video decoder of the
2028	         receiver is capable of decoding video at a higher bit rate
2029	         than is required by the signaled level conveyed in the value
2030	         of the profile-level-id parameter.

2032	         When max-br is signaled, the video codec of the receiver MUST
2033	         be able to decode NAL unit streams that conform to the
2034	         signaled level, conveyed in the profile-level-id parameter,
2035	         with the following exceptions in the limits specified by the
2036	         level:

2038	         o The value of max-br replaces the MaxBR value of the signaled
2039	            level (in Table A-1 of [1]).

2041	         o When the max-cpb parameter is not present, the result of the
2042	            following formula replaces the value of MaxCPB in Table A-1
2043	            of [1]: (MaxCPB of the signaled level) * max-br / (MaxBR of
2044	            the signaled level).

2046	         For example, if a receiver signals capability for Level 1.2
2047	         with max-br equal to 1550, this indicates a maximum video
2048	         bitrate of 1550 kbits/sec for VCL HRD parameters, a maximum
2049	         video bitrate of 1860 kbits/sec for NAL HRD parameters, and a
2050	         CPB size of 4036458 bits (1550000 / 384000 * 1000 * 1000).
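
	         The arithmetic of this example can be reproduced with the
	         following sketch.  The function name is hypothetical;
	         level_max_br and level_max_cpb stand for the MaxBR and MaxCPB
	         entries of the signaled level in Table A-1 of [1], taken
	         here as 384 and 1000 to match the example.

```python
def max_br_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Return (VCL bit rate, NAL bit rate, VCL CPB size in bits)."""
    vcl_bitrate = max_br * 1000  # units of 1000 bits/s for VCL HRD
    nal_bitrate = max_br * 1200  # units of 1200 bits/s for NAL HRD
    if max_cpb is None:
        # MaxCPB scales with max-br when max-cpb is absent.
        cpb_vcl_bits = level_max_cpb * 1000 * max_br // level_max_br
    else:
        cpb_vcl_bits = max_cpb * 1000
    return vcl_bitrate, nal_bitrate, cpb_vcl_bits
```

	         Calling max_br_limits(1550, 384, 1000) gives 1550000 bits/s,
	         1860000 bits/s, and 4036458 bits, matching the figures above.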

2052	         The value of max-br MUST be greater than or equal to the value
2053	         MaxBR for the signaled level given in Table A-1 of [1].

2055	         Senders MAY use this knowledge to send higher bitrate video as
2056	         allowed in the level definition of Annex A of H.264, to
2057	         achieve improved video quality.

2059	            Informative note: This parameter was added primarily to
2060	            complement a similar codepoint in the ITU-T Recommendation
2061	            H.245, so as to facilitate signaling gateway designs.  No
2062	            assumption can be made from the value of this parameter
2063	            that the network is capable of handling such bit rates at
2064	            any given time.  In particular, no conclusion can be drawn
2065	            that the signaled bit rate is possible under congestion
2066	            control constraints.

2068	      redundant-pic-cap:
2069	         This parameter signals the capabilities of a receiver
2070	         implementation.  When equal to 0, the parameter indicates that
2071	         the receiver makes no attempt to use redundant coded pictures
2072	         to correct incorrectly decoded primary coded pictures.  When
2073	         equal to 0, the receiver is not capable of using redundant
2074	         slices; therefore, a sender SHOULD avoid sending redundant
2075	         slices to save bandwidth.  When equal to 1, the receiver is
2076	         capable of decoding any such redundant slice that covers a
2077	         corrupted area in a primary decoded picture (at least partly),
2078	         and therefore a sender MAY send redundant slices.  When the
2079	         parameter is not present, then a value of 0 MUST be used for
2080	         redundant-pic-cap.  When present, the value of redundant-pic-
2081	         cap MUST be either 0 or 1.

2083	         When the profile-level-id parameter is present in the same
2084	         signaling as the redundant-pic-cap parameter, and the profile
2085	         indicated in profile-level-id is such that it disallows the
2086	         use of redundant coded pictures (e.g., Main Profile), the
2087	         value of redundant-pic-cap MUST be equal to 0.  When a
2088	         receiver indicates redundant-pic-cap equal to 0, the received
2089	         stream SHOULD NOT contain redundant coded pictures.

2091	            Informative note: Even if redundant-pic-cap is equal to 0,
2092	            the decoder is able to ignore redundant coded pictures
2093	            provided that the decoder supports such a profile
2094	            (Baseline, Extended) in which redundant coded pictures are
2095	            allowed.

2097	            Informative note: Even if redundant-pic-cap is equal to 1,
2098	            the receiver may also choose other error concealment
2099	            strategies to replace or complement decoding of redundant
2100	            slices.

2102	      sprop-parameter-sets:
2103	         This parameter MAY be used to convey any sequence and picture
2104	         parameter set NAL units (herein referred to as the initial
2105	         parameter set NAL units) that can be placed in the NAL unit
2106	         stream to precede any other NAL units in decoding order.  The
2107	         parameter MUST NOT be used to indicate codec capability in any
2108	         capability exchange procedure.  The value of the parameter is
2109	         the base64 [7] representation of the initial parameter set NAL
2110	         units as specified in sections 7.3.2.1 and 7.3.2.2 of [1].

2112	         The parameter sets are conveyed in decoding order, and no
2113	         framing of the parameter set NAL units takes place.  A comma
2114	         (',') is used to separate any pair of parameter sets in the
2115	         list.  Note that the number of bytes in a parameter set NAL
2116	         unit is typically less than 10, but a picture parameter set
2117	         NAL unit can contain several hundred bytes.
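
	         A receiver might split and decode the parameter value as in
	         the following sketch.  The function name is hypothetical, and
	         the sample value used in the usage note is purely
	         illustrative.

```python
import base64


def parse_sprop_parameter_sets(value: str):
    """Return the parameter set NAL units, in the order they are listed."""
    # Each comma-separated item is the base64 form of one NAL unit.
    return [base64.b64decode(item) for item in value.split(",") if item]
```

	         For the illustrative value "Z0IACpZTBYmI,aMljiA==", the
	         sketch returns two NAL units whose header octets carry NAL
	         unit types 7 (sequence parameter set) and 8 (picture
	         parameter set).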

2119	            Informative note: When several payload types are offered in
2120	            the SDP Offer/Answer model, each with its own sprop-
2121	            parameter-sets parameter, then the receiver cannot assume
2122	            that those parameter sets do not use conflicting storage
2123	            locations (i.e., identical values of parameter set
2124	            identifiers).  Therefore, a receiver should double-buffer
2125	            all sprop-parameter-sets and make them available to the
2126	            decoder instance that decodes a certain payload type.

2128	         The "sprop-parameter-sets" parameter MUST only contain
2129	         parameter sets that are conforming to the profile-level-id,
2130	         i.e., the subset of coding tools indicated by any of the
2131	         parameter sets MUST be equal to the default sub-profile, and
2132	         the level indicated by any of the parameter sets MUST be equal
2133	         to the default level.

2135	      sprop-level-parameter-sets:
2136	         This parameter MAY be used to convey any sequence and picture
2137	         parameter set NAL units (herein referred to as the initial
2138	         parameter set NAL units) that can be placed in the NAL unit
2139	         stream to precede any other NAL units in decoding order and
2140	         that are associated with one or more levels lower than the
2141	         default level.  The parameter MUST NOT be used to indicate
2142	         codec capability in any capability exchange procedure.

2144	         The sprop-level-parameter-sets parameter contains parameter
2145	         sets for one or more levels which are lower than the default
2146	         level.  All parameter sets associated with one level are
2147	         clustered and prefixed with a three-byte field which has the
2148	         same syntax as profile-level-id.  This enables the receiver to
2149	         install the parameter sets for one level and discard the rest.
2150	         The three-byte field is named PLId, and all parameter sets
2151	         associated with one level are named PSL, which has the same
2152	         syntax as sprop-parameter-sets.  Parameter sets for each level
2153	         are represented in the form of PLId:PSL, i.e., PLId followed
2154	         by a colon (':') and the base64 [7] representation of the
2155	         initial parameter set NAL units for the level.  Each pair of
2156	         PLId:PSL is also separated by a colon.  Note that a PSL can
2157	         contain multiple parameter sets for that level, separated with
2158	         commas (',').
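
	         The PLId:PSL structure described above can be parsed as in
	         the following sketch.  The function name is hypothetical, and
	         the sketch assumes that PLId is written in base16 like
	         profile-level-id.  Because neither ':' nor ',' belongs to
	         the base64 alphabet, plain splitting is sufficient.

```python
import base64


def parse_sprop_level_parameter_sets(value: str):
    """Return a list of (PLId bytes, [parameter set NAL units]) pairs."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating PLId and PSL fields")
    result = []
    for plid_hex, psl in zip(fields[0::2], fields[1::2]):
        plid = bytes.fromhex(plid_hex)
        if len(plid) != 3:
            raise ValueError("PLId must encode exactly three bytes")
        nal_units = [base64.b64decode(item) for item in psl.split(",") if item]
        result.append((plid, nal_units))
    return result
```

	         A receiver can then install the parameter sets of the one
	         level it selects and discard the remaining pairs.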

2160	         The subset of coding tools indicated by each PLId field MUST
2161	         be equal to the default sub-profile, and the level indicated
2162	         by each PLId field MUST be lower than the default level.  All
2163	         sequence parameter sets contained in each PSL MUST have the
2164	         three bytes from profile_idc to level_idc, inclusive, equal to
2165	         the preceding PLId.

2167	            Informative note: This parameter allows for efficient level
2168	            downgrade in SDP Offer/Answer and out-of-band transport of
2169	            parameter sets, simultaneously.

2171	      use-level-parameter-sets:
2172	         This parameter MAY be used to indicate a receiver capability.
2173	         The value MAY be equal to either 0 or 1.  When the parameter
2174	         is not present, the value MUST be inferred to be equal to 0.
2175	         The value 0 indicates that the receiver does not understand
2176	         the sprop-level-parameter-sets parameter and will ignore
2177	         sprop-level-parameter-sets when present.  The value 1
2178	         indicates that the receiver understands the sprop-level-
2179	         parameter-sets parameter and is capable of using parameter
2180	         sets contained therein.

2182	            Informative note: An RFC 3984 receiver understands neither
2183	            sprop-level-parameter-sets nor use-level-parameter-sets.
2184	            Therefore, during SDP Offer/Answer, an RFC 3984
2185	            receiver as the answerer will simply ignore sprop-level-
2186	            parameter-sets, when present in an offer.  Assume that the
2187	            offered payload type was accepted at a level lower than the
2188	            default level.  If the offered payload type included sprop-
2189	            level-parameter-sets, and the offerer sees that the
2190	            answerer has not included use-level-parameter-sets equal to
2191	            1 in the answer, the offerer gets to know that in-band
2192	            transport of parameter sets is needed.

2194	      sprop-ssrc:
2195	         This parameter MAY be used to signal the properties of an RTP
2196	         packet stream.  It specifies the SSRC values in the RTP header
2197	         of all RTP packets in the RTP packet stream.  The syntax of
2198	         this parameter is the same as the syntax of the SSRC field in
2199	         the RTP header.

2201	            Informative note: This parameter allows for out-of-band
2202	            transport of parameter sets in topologies like Topo-Video-
2203	            switch-MCU [28].

2205	      packetization-mode:
2206	         This parameter signals the properties of an RTP payload type
2207	         or the capabilities of a receiver implementation.  Only a
2208	         single configuration point can be indicated; thus, when
2209	         capabilities to support more than one packetization-mode are
2210	         declared, multiple configuration points (RTP payload types)
2211	         must be used.

2213	         When the value of packetization-mode is equal to 0 or
2214	         packetization-mode is not present, the single NAL mode, as
2215	         defined in section 6.2 of RFC 3984, MUST be used.  This mode
2216	         is in use in standards using ITU-T Recommendation H.241 [3]
2217	         (see section 12.1).  When the value of packetization-mode is
2218	         equal to 1, the non-interleaved mode, as defined in section
2219	         6.3 of RFC 3984, MUST be used.  When the value of
2220	         packetization-mode is equal to 2, the interleaved mode, as
2221	         defined in section 6.4 of RFC 3984, MUST be used.  The value
2222	         of packetization-mode MUST be an integer in the range of 0 to
2223	         2, inclusive.

2225	      sprop-interleaving-depth:
2226	         This parameter MUST NOT be present when packetization-mode is
2227	         not present or the value of packetization-mode is equal to 0
2228	         or 1.  This parameter MUST be present when the value of
2229	         packetization-mode is equal to 2.

2231	         This parameter signals the properties of an RTP packet stream.
2232	         It specifies the maximum number of VCL NAL units that precede
2233	         any VCL NAL unit in the RTP packet stream in transmission
2234	         order and follow the VCL NAL unit in decoding order.
2235	         Consequently, it is guaranteed that receivers can reconstruct
2236	         NAL unit decoding order when the buffer size for NAL unit
2237	         decoding order recovery is at least the value of sprop-
2238	         interleaving-depth + 1 in terms of VCL NAL units.

2240	         The value of sprop-interleaving-depth MUST be an integer in
2241	         the range of 0 to 32767, inclusive.

2243	      sprop-deint-buf-req:
2244	         This parameter MUST NOT be present when packetization-mode is
2245	         not present or the value of packetization-mode is equal to 0
2246	         or 1.  It MUST be present when the value of packetization-mode
2247	         is equal to 2.

2249	         sprop-deint-buf-req signals the required size of the de-
2250	         interleaving buffer for the RTP packet stream.  The value of
2251	         the parameter MUST be greater than or equal to the maximum
2252	         buffer occupancy (in units of bytes) required in such a de-
2253	         interleaving buffer that is specified in section 7.2 of RFC
2254	         3984.  It is guaranteed that receivers can perform the de-
2255	         interleaving of interleaved NAL units into NAL unit decoding
2256	         order, when the de-interleaving buffer size is at least the
2257	         value of sprop-deint-buf-req in terms of bytes.

2259	         The value of sprop-deint-buf-req MUST be an integer in the
2260	         range of 0 to 4294967295, inclusive.

2262	            Informative note: sprop-deint-buf-req indicates the
2263	            required size of the de-interleaving buffer only.  When
2264	            network jitter can occur, an appropriately sized jitter
2265	            buffer has to be provisioned for as well.

2267	      deint-buf-cap:
2268	         This parameter signals the capabilities of a receiver
2269	         implementation and indicates the amount of de-interleaving
2270	         buffer space in units of bytes that the receiver has available
2271	         for reconstructing the NAL unit decoding order.  A receiver is
2272	         able to handle any stream for which the value of the sprop-
2273	         deint-buf-req parameter is smaller than or equal to this
2274	         parameter.

2276	         If the parameter is not present, then a value of 0 MUST be
2277	         used for deint-buf-cap.  The value of deint-buf-cap MUST be an
2278	         integer in the range of 0 to 4294967295, inclusive.

2280	            Informative note: deint-buf-cap indicates the maximum
2281	            possible size of the de-interleaving buffer of the receiver
2282	            only.  When network jitter can occur, an appropriately
2283	            sized jitter buffer has to be provisioned for as well.
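
	            Informative note: The following Python sketch is not part of
	            this specification.  It restates the presence and range
	            rules given above for packetization-mode, sprop-
	            interleaving-depth, and sprop-deint-buf-req, and the
	            comparison between sprop-deint-buf-req and deint-buf-cap.
	            The function names and the dictionary representation of the
	            parameters are hypothetical.

```python
UINT32_MAX = 4294967295


def check_interleaving_params(params):
    """Return a list of rule violations for one set of integer parameters."""
    problems = []
    # An absent packetization-mode is treated as 0 (single NAL unit mode).
    mode = params.get("packetization-mode", 0)
    if mode not in (0, 1, 2):
        problems.append("packetization-mode must be 0, 1, or 2")
    interleaved = mode == 2
    limits = {
        "sprop-interleaving-depth": 32767,
        "sprop-deint-buf-req": UINT32_MAX,
    }
    for name, upper in limits.items():
        if name not in params:
            if interleaved:
                problems.append(name + " MUST be present in interleaved mode")
            continue
        if not interleaved:
            problems.append(name + " MUST NOT be present in this mode")
        if not 0 <= params[name] <= upper:
            problems.append(name + " is out of range")
    return problems


def receiver_can_handle(sprop_deint_buf_req, deint_buf_cap=0):
    """An absent deint-buf-cap is treated as 0, as stated above."""
    return sprop_deint_buf_req <= deint_buf_cap
```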

2285	      sprop-init-buf-time:
2286	         This parameter MAY be used to signal the properties of an RTP
2287	         packet stream.  The parameter MUST NOT be present, if the
2288	         value of packetization-mode is equal to 0 or 1.

2290	         The parameter signals the initial buffering time that a
2291	         receiver MUST wait before starting decoding to recover the NAL
2292	         unit decoding order from the transmission order.  The
2293	         parameter is the maximum value of (transmission time of a NAL
2294	         unit - decoding time of the NAL unit), assuming reliable and
2295	         instantaneous transmission, the same timeline for transmission
2296	         and decoding, and that decoding starts when the first packet
2297	         arrives.

2299	         An example of specifying the value of sprop-init-buf-time
2300	         follows.  A NAL unit stream is sent in the following
2301	         interleaved order, in which the value corresponds to the
2302	         decoding time and the transmission order is from left to
2303	         right:

2305	            0  2  1  3  5  4  6  8  7 ...

2307	         Assuming a steady transmission rate of NAL units, the
2308	         transmission times are:

2310	            0  1  2  3  4  5  6  7  8 ...

2312	         Subtracting the decoding time from the transmission time
2313	         column-wise results in the following series:

2315	            0 -1  1  0 -1  1  0 -1  1 ...

2317	         Thus, in terms of intervals of NAL unit transmission times,
2318	         the value of sprop-init-buf-time in this example is 1.  The
2319	         parameter is coded as a non-negative base10 integer
2320	         representation in clock ticks of a 90-kHz clock.  If the
2321	         parameter is not present, then no initial buffering time value
2322	         is defined.  Otherwise the value of sprop-init-buf-time MUST
2323	         be an integer in the range of 0 to 4294967295, inclusive.
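
	            Informative note: The following Python sketch is not part of
	            this specification.  It reproduces the worked example above
	            by subtracting the decoding time from the transmission time
	            for each NAL unit and taking the maximum.  The conversion to
	            90-kHz clock ticks assumes a hypothetical NAL unit interval
	            of 1/30 second; the function names are also hypothetical.

```python
from fractions import Fraction


def init_buf_intervals(decoding_times):
    """Maximum of (transmission time - decoding time), in NAL unit intervals.

    decoding_times lists the decoding time of each NAL unit in transmission
    order; the transmission times are taken to be 0, 1, 2, ...
    """
    return max(t - d for t, d in enumerate(decoding_times))


def to_90khz_ticks(intervals, interval_seconds):
    """Express a number of NAL unit intervals in ticks of a 90-kHz clock."""
    return int(intervals * Fraction(interval_seconds) * 90000)


intervals = init_buf_intervals([0, 2, 1, 3, 5, 4, 6, 8, 7])
print(intervals)                                   # 1
print(to_90khz_ticks(intervals, Fraction(1, 30)))  # 3000
```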

2325	         In addition to the signaled sprop-init-buf-time, receivers
2326	         SHOULD take into account the transmission delay jitter
2327	         buffering, including buffering for the delay jitter caused by
2328	         mixers, translators, gateways, proxies, traffic-shapers, and
2329	         other network elements.

2331	      sprop-max-don-diff:
2332	         This parameter MAY be used to signal the properties of an RTP
2333	         packet stream.  It MUST NOT be used to signal transmitter or
2334	         receiver or codec capabilities.  The parameter MUST NOT be
2335	         present if the value of packetization-mode is equal to 0 or 1.
2336	         sprop-max-don-diff is an integer in the range of 0 to 32767,
2337	         inclusive.  If sprop-max-don-diff is not present, the value of
2338	         the parameter is unspecified.  sprop-max-don-diff is
2339	         calculated as follows:

2341	            sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)},
2342	            for any i and any j>i,

2344	         where i and j indicate the index of the NAL unit in the
2345	         transmission order and AbsDON denotes a decoding order number
2346	         of the NAL unit that does not wrap around to 0 after 65535.
2347	         In other words, AbsDON is calculated as follows: Let m and n
2348	         be consecutive NAL units in transmission order.  For the very
2349	         first NAL unit in transmission order (whose index is 0),
2350	         AbsDON(0) = DON(0).  For other NAL units, AbsDON is calculated
2351	         as follows:

2353	            If DON(m) == DON(n), AbsDON(n) = AbsDON(m)

2355	            If (DON(m) < DON(n) and DON(n) - DON(m) < 32768),
2356	              AbsDON(n) = AbsDON(m) + DON(n) - DON(m)

2358	            If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768),
2359	              AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n)

2361	            If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768),
2362	              AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n))

2364	            If (DON(m) > DON(n) and DON(m) - DON(n) < 32768),
2365	              AbsDON(n) = AbsDON(m) - (DON(m) - DON(n))

2367	         where DON(i) is the decoding order number of the NAL unit
2368	         having index i in the transmission order.  The decoding order
2369	         number is specified in section 5.5 of RFC 3984.

2371	            Informative note: Receivers may use sprop-max-don-diff to
2372	            trigger which NAL units in the receiver buffer can be
2373	            passed to the decoder.
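
	            Informative note: The following Python sketch is not part of
	            this specification.  It applies the AbsDON rules above to a
	            list of 16-bit DON values given in transmission order and
	            then evaluates max{AbsDON(i) - AbsDON(j)} for j>i.  The
	            function names are hypothetical, and returning 0 for fewer
	            than two NAL units is an assumption of the sketch.

```python
def abs_don_sequence(dons):
    """Map DON values (transmission order) to non-wrapping AbsDON values."""
    abs_dons = []
    for n, don in enumerate(dons):
        if n == 0:
            abs_dons.append(don)  # AbsDON(0) = DON(0)
            continue
        prev_don, prev_abs = dons[n - 1], abs_dons[-1]
        if prev_don == don:
            abs_don = prev_abs
        elif prev_don < don and don - prev_don < 32768:
            abs_don = prev_abs + don - prev_don
        elif prev_don > don and prev_don - don >= 32768:
            abs_don = prev_abs + 65536 - prev_don + don
        elif prev_don < don and don - prev_don >= 32768:
            abs_don = prev_abs - (prev_don + 65536 - don)
        else:  # prev_don > don and prev_don - don < 32768
            abs_don = prev_abs - (prev_don - don)
        abs_dons.append(abs_don)
    return abs_dons


def max_don_diff(dons):
    """Largest AbsDON(i) - AbsDON(j) over all i and all j > i."""
    abs_dons = abs_don_sequence(dons)
    if len(abs_dons) < 2:
        return 0  # assumption: no pair exists, so report 0
    best = None
    running_max = abs_dons[0]
    for value in abs_dons[1:]:
        diff = running_max - value  # best candidate for this j
        best = diff if best is None else max(best, diff)
        running_max = max(running_max, value)
    # The result is not clamped to the 0..32767 range of the parameter.
    return best
```

	            For example, the DON values 65534, 0, 65535, 1 map to the
	            AbsDON values 65534, 65536, 65535, 65537, for which the
	            sketch returns 1.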

2375	      max-rcmd-nalu-size:
2376	         This parameter MAY be used to signal the capabilities of a
2377	         receiver.  The parameter MUST NOT be used for any other
2378	         purposes.  The value of the parameter indicates the largest
2379	         NALU size in bytes that the receiver can handle efficiently.
2380	         The parameter value is a recommendation, not a strict upper
2381	         boundary.  The sender MAY create larger NALUs but must be
2382	         aware that the handling of these may come at a higher cost
2383	         than NALUs conforming to the limitation.

2385	         The value of max-rcmd-nalu-size MUST be an integer in the
2386	         range of 0 to 4294967295, inclusive.  If this parameter is not
2387	         specified, no known limitation to the NALU size exists.
2388	         Senders still have to consider the MTU size available between
2389	         the sender and the receiver and SHOULD run MTU discovery for
2390	         this purpose.

2392	         This parameter is motivated by, for example, an IP to H.223
2393	         video telephony gateway, where NALUs smaller than the H.223
2394	         transport data unit will be more efficient.  A gateway may
2395	         terminate IP; thus, MTU discovery will normally not work
2396	         beyond the gateway.

2398	            Informative note: Setting this parameter to a lower than
2399	            necessary value may have a negative impact.

2401	      sar-understood:
2402	         This parameter MAY be used to indicate a receiver capability
2403	         and not anything else.  The parameter indicates the maximum
2404	         value of aspect_ratio_idc (specified in [1]) smaller than 255
2405	         that the receiver understands.  Table E-1 of [1] specifies
2406	         aspect_ratio_idc equal to 0 as "unspecified", 1 to 16,
2407	         inclusive, as specific Sample Aspect Ratios (SARs), 17 to 254,
2408	         inclusive, as "reserved", and 255 as the Extended SAR, for
2409	         which SAR width and SAR height are explicitly signaled.
2410	         Therefore, a receiver with a decoder according to [1]
2411	         understands aspect_ratio_idc in the range of 1 to 16,
2412	         inclusive, and aspect_ratio_idc equal to 255, in the sense that
2413	         the receiver knows what exactly the SAR is.  For such a
2414	         receiver, the value of sar-understood is 16.  If in the future
2415	         Table E-1 of [1] is extended, e.g., such that the SAR for
2416	         aspect_ratio_idc equal to 17 is specified, then for a receiver
2417	         with a decoder that understands the extension, the value of
2418	         sar-understood is 17.  For a receiver with a decoder according
2419	         to the 2003 version of [1], the value of sar-understood is 13,
2420	         as the minimum reserved aspect_ratio_idc therein is 14.

2422	         When sar-understood is not present, the value MUST be inferred
2423	         to be equal to 13.

2425	      sar-supported:
2426	         This parameter MAY be used to indicate a receiver capability
2427	         and not anything else.  The value of this parameter is an
2428	         integer in the range of 1 to sar-understood, inclusive, or
2429	         equal to 255.  A value of sar-supported equal to N, where N is
2430	         smaller than 255, indicates that the receiver supports all SARs
2431	         corresponding to H.264 aspect_ratio_idc values (see Table E-1
2432	         of [1]) in the range from 1 to N, inclusive, without geometric
2433	         distortion.  The value of sar-supported equal to 255 indicates
2434	         that the receiver supports all sample aspect ratios which are
2435	         expressible using two 16-bit integer values as the numerator
2436	         and denominator, i.e., those that are expressible using the
2437	         H.264 aspect_ratio_idc value of 255 (Extended_SAR, see Table
2438	         E-1 of [1]), without geometric distortion.

2440	         H.264 compliant encoders SHOULD NOT send an aspect_ratio_idc
2441	         equal to 0, or an aspect_ratio_idc larger than sar-understood
2442	         and smaller than 255.  H.264 compliant encoders SHOULD send an
2443	         aspect_ratio_idc that the receiver is able to display without
2444	         geometrical distortion.  However, H.264 compliant encoders MAY
2445	         choose to send pictures using any SAR.

2447	         Note that the actual sample aspect ratio or extended sample
2448	         aspect ratio, when present, of the stream is conveyed in the
2449	         Video Usability Information (VUI) part of the sequence
2450	         parameter set.

2452	      Encoding considerations:
2453	         This type is only defined for transfer via RTP (RFC 3550).

2455	      Security considerations:
2456	         See section 9 of RFC xxxx.

2458	      Public specification:
2459	         Please refer to RFC xxxx and its section 15.

2461	      Additional information:
2462	         None

2464	      File extensions:     none

2466	      Macintosh file type code: none

2468	      Object identifier or OID: none

2470	      Person & email address to contact for further information:
2471	         Ye-Kui Wang, ye-kui.wang@nokia.com

2473	      Intended usage:      COMMON

2475	      Author:
2476	         Ye-Kui Wang, ye-kui.wang@nokia.com

2478	      Change controller:
2479	         IETF Audio/Video Transport working group delegated from the
2480	         IESG.

2482	8.2. SDP Parameters

2484	8.2.1. Mapping of Payload Type Parameters to SDP

2486	   The media type video/H264 string is mapped to fields in the Session
2487	   Description Protocol (SDP) [6] as follows:

2489	   o  The media name in the "m=" line of SDP MUST be video.

2491	   o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the
2492	      media subtype).

2494	   o  The clock rate in the "a=rtpmap" line MUST be 90000.

2496	   o  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-
2497	      smbps", "max-fs", "max-cpb", "max-dpb", "max-br", "redundant-pic-
2498	      cap", "sprop-parameter-sets", "sprop-level-parameter-sets", "use-
2499	      level-parameter-sets", "sprop-ssrc", "packetization-mode", "sprop-
2500	      interleaving-depth", "sprop-deint-buf-req", "deint-buf-cap",
2501	      "sprop-init-buf-time", "sprop-max-don-diff", "max-rcmd-nalu-size",
2502	      "sar-understood", and "sar-supported", when present, MUST be
2503	      included in the "a=fmtp" line of SDP.  These parameters are
2504	      expressed as a media type string, in the form of a semicolon-
2505	      separated list of parameter=value pairs.

2507	   An example of media representation in SDP is as follows (Baseline
2508	   Profile, Level 3.0, some of the constraints of the Main profile may
2509	   not be obeyed):

2511	      m=video 49170 RTP/AVP 98
2512	      a=rtpmap:98 H264/90000
2513	      a=fmtp:98 profile-level-id=42A01E;
2514	                packetization-mode=1;
2515	                sprop-parameter-sets=<base64 data>
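
	   Informative note: The following Python sketch is not part of this
	   specification.  It shows one way to split an "a=fmtp" line such as
	   the one above into parameter=value pairs and to separate the three
	   bytes of "profile-level-id".  The function names are hypothetical,
	   and the base64 text in the usage lines is a placeholder.

```python
def parse_fmtp(line):
    """Split 'a=fmtp:<pt> name=value;...' into (payload_type, params)."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    pt_text, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first '=' only; base64 values may end in '='.
        name, _, value = item.partition("=")
        params[name.strip()] = value.strip()
    return int(pt_text), params


def decode_profile_level_id(hex_text):
    """Return (profile_idc, profile_iop, level_idc) as integers."""
    raw = bytes.fromhex(hex_text)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be three bytes")
    return raw[0], raw[1], raw[2]


pt, params = parse_fmtp(
    "a=fmtp:98 profile-level-id=42A01E; packetization-mode=1; "
    "sprop-parameter-sets=AAAA,BBB="
)
print(pt, decode_profile_level_id(params["profile-level-id"]))
# 98 (66, 160, 30)
```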

2517	8.2.2. Usage with the SDP Offer/Answer Model

2519	   When H.264 is offered over RTP using SDP in an Offer/Answer model [8]
2520	   for negotiation for unicast usage, the following limitations and
2521	   rules apply:

2523	   o  The parameters identifying a media format configuration for H.264
2524	      are "profile-level-id" and "packetization-mode", when present.
2525	      These media format configuration parameters (except for the level
2526	      part of "profile-level-id") MUST be used symmetrically; i.e., the
2527	      answerer MUST either maintain all configuration parameters or
2528	      remove the media format (payload type) completely, if one or more
2529	      of the parameter values are not supported.  Note that the level
2530	      part of "profile-level-id" includes level_idc, and, for indication
2531	      of level 1b when profile_idc is equal to 66, 77 or 88, bit 4
2532	      (constraint_set3_flag) of profile-iop.  The level part of
2533	      "profile-level-id" is downgradable, i.e. the answerer MUST
2534	      maintain the same or a lower level or remove the media format
2535	      (payload type) completely.

2537	         Informative note: The requirement for symmetric use applies
2538	         only for the above media format configuration parameters
2539	         excluding the level part of "profile-level-id", and not for
2540	         the other stream properties and capability parameters.

2542	         Informative note: In H.264 [1], all the levels except for
2543	         level 1b are equal to the value of level_idc divided by 10.
2544	         Level 1b is a level higher than level 1.0 but lower than level
2545	         1.1, and is signaled in an ad-hoc manner, because the
2546	         level was specified after level 1.0 and level 1.1.  For the
2547	         Baseline, Main and Extended profiles (with profile_idc equal
2548	         to 66, 77 and 88, respectively), level 1b is indicated by
2549	         level_idc equal to 11 (i.e. same as level 1.1) and
2550	         constraint_set3_flag equal to 1.  For other profiles, level 1b
2551	         is indicated by level_idc equal to 9 (but note that level 1b
2552	         for these profiles is still higher than level 1, which has
2553	         level_idc equal to 10, and lower than level 1.1).  In SDP
2554	         Offer/Answer, an answer to an offer may indicate a level equal
2555	         to or lower than the level indicated in the offer.  Due to the
2556	         ad-hoc indication of level 1b, offerers and answerers must
2557	         check the value of bit 4 (constraint_set3_flag) of the middle
2558	         octet of the parameter "profile-level-id", when profile_idc is
2559	         equal to 66, 77 or 88 and level_idc is equal to 11.

2561	      To simplify handling and matching of these configurations, the
2562	      same RTP payload type number used in the offer SHOULD also be
2563	      used in the answer, as specified in [8].  An answer MUST NOT
2564	      contain a payload type number used in the offer unless the
2565	      configuration is exactly the same as in the offer or the
2566	      configuration in the answer only differs from that in the offer
2567	      with a level lower than the default level offered.

2569	         Informative note: An offerer, when receiving the answer, has
2570	         to compare payload types not declared in the offer based on
2571	         media type (i.e., video/H264) and the above media format
2572	         configuration parameters with any payload types it has already
2573	         declared, in order to determine whether the configuration in
2574	         question is new or equivalent to a configuration already
2575	         offered.

2577	   o  The parameters "sprop-deint-buf-req", "sprop-interleaving-depth",
2578	      "sprop-max-don-diff", "sprop-init-buf-time", and "sprop-ssrc"
2579	      describe the properties of the RTP packet stream that the offerer
2580	      or answerer is sending for the media format configuration.  This
2581	      differs from the normal usage of the Offer/Answer parameters:
2582	      normally such parameters declare the properties of the stream that
2583	      the offerer or the answerer is able to receive.  When dealing with
2584	      H.264, the offerer assumes that the answerer will be able to
2585	      receive media encoded using the configuration being offered.

2587	         Informative note: The above parameters apply for any stream
2588	         sent by the declaring entity with the same configuration;
2589	         i.e., they are dependent on their source.  Rather than being
2590	         bound to the payload type, the values may have to be applied
2591	         to another payload type when being sent, as they apply for the
2592	         configuration.

2594	   o  The capability parameters ("max-mbps", "max-smbps", "max-fs",
2595	      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "max-rcmd-
2596	      nalu-size", "sar-understood", "sar-supported") MAY be used to
2597	      declare further capabilities of the offerer or answerer for
2598	      receiving.  These parameters can only be present when the
2599	      direction attribute is sendrecv or recvonly, and the parameters
2600	      describe the limitations of what the offerer or answerer accepts
2601	      for receiving streams.

2603	   o  An offerer has to include the size of the de-interleaving buffer,
2604	      "sprop-deint-buf-req", in the offer for an interleaved H.264
2605	      stream.  To enable the offerer and answerer to inform each other
2606	      about their capabilities for de-interleaving buffering in
2607	      receiving streams, both parties are RECOMMENDED to include "deint-
2608	      buf-cap".  For interleaved streams, it is also RECOMMENDED to
2609	      consider offering multiple payload types with different buffering
2610	      requirements when the capabilities of the receiver are unknown.

2612	   o  The "sprop-parameter-sets" or "sprop-level-parameter-sets"
2613	      parameter, when present, is used for out-of-band transport of
2614	      parameter sets.  However, when out-of-band transport of parameter
2615	      sets is used, parameter sets MAY still be additionally transported
2616	      in-band.  If neither "sprop-parameter-sets" nor "sprop-level-
2617	      parameter-sets" is present, then only in-band transport of
2618	      parameter sets is used.

2620	      An offer MAY include either or both of "sprop-parameter-sets" and
2621	      "sprop-level-parameter-sets".  An answer MAY include "sprop-
2622	      parameter-sets", and MUST NOT include "sprop-level-parameter-
2623	      sets".

2625	      When an offered payload type is accepted without level downgrade,
2626	      i.e. the default level is accepted, the following applies.

2628	        o The answerer MUST be prepared to use the parameter sets
2629	           included in "sprop-parameter-sets", when present, for
2630	           decoding the incoming NAL unit stream, and ignore "sprop-
2631	           level-parameter-sets", when present.

2633	        o When "sprop-parameter-sets" is not present, in-band
2634	           transport of parameter sets MUST be used.

2636	      When level downgrade is in use, i.e., a level lower than the
2637	      default level offered is accepted, the following applies.

2639	        o If "use-level-parameter-sets" is not present in the answer
2640	           for the accepted payload type or the value is equal to 0 in
2641	           the answer for the accepted payload type, the answerer MUST
2642	           ignore "sprop-parameter-sets" and "sprop-level-parameter-
2643	           sets", when present in the offer for the accepted payload
2644	           type.

2646	        o Otherwise (the "use-level-parameter-sets" is present in the
2647	           answer for the accepted payload type and the value is equal
2648	           to 1), the answerer MUST be prepared to use the parameter
2649	           sets that are included in "sprop-level-parameter-sets" for
2650	           the accepted level, when present, for decoding the incoming
2651	           NAL unit stream, and ignore all other parameter sets
2652	           included in "sprop-level-parameter-sets" and "sprop-
2653	           parameter-sets", when present.

2655	        o When no parameter sets for the accepted level are present in
2656	           the "sprop-level-parameter-sets", in-band transport of
2657	           parameter sets MUST be used.

2659	      The answerer MAY include or omit "sprop-parameter-sets"; i.e.,
2660	      the answerer MAY use either out-of-band or in-band transport of
2661	      parameter sets for the stream it is sending, regardless of
2662	      whether out-of-band parameter sets transport has been used in the
2663	      offerer-to-answerer direction.  All parameter sets included in
2664	      the "sprop-parameter-sets", when present, for the accepted
2665	      payload type in an answer MUST be associated with the accepted
2666	      level, as indicated by the profile-level-id in the answer for the
2667	      accepted payload type.

2669	      Parameter sets included in "sprop-parameter-sets" in an answer
2670	      are independent of those parameter sets included in the offer, as
2671	      they are used for decoding two different video streams, one from
2672	      the answerer to the offerer, and the other in the opposite
2673	      direction.  The offerer MUST be prepared to use the parameter
2674	      sets included in the answer's "sprop-parameter-sets", when
2675	      present, for decoding the incoming NAL unit stream.

2677	      When "sprop-parameter-sets" or "sprop-level-parameter-sets" is
2678	      present and "sprop-ssrc" is present, the receiver of the
2679	      parameters MUST store the parameter sets included in the "sprop-
2680	      parameter-sets" or "sprop-level-parameter-sets" for the accepted
2681	      level and associate them to "sprop-ssrc".  Parameter sets
2682	      associated with one "sprop-ssrc" MUST only be used to decode NAL
2683	      units conveyed in packets with SSRC equal to the associated
2684	      "sprop-ssrc".  The "sprop-ssrc" MAY be used in topologies like
2685	      Topo-Video-switch-MCU [28] to enable out-of-band transport of
2686	      parameter sets.  When "sprop-ssrc" is used, and SSRC collision is
2687	      detected, the connection needs to be renegotiated using a new
2688	      random SSRC.
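
	   Informative note: The following Python sketch is not part of this
	   specification.  It illustrates the level comparison needed for the
	   unicast level-downgrade rule above, including the ad-hoc signaling
	   of level 1b.  The function names and the rank value 10.5 for level
	   1b are hypothetical.  The profile part is compared byte by byte
	   (ignoring bit 4 of profile-iop for profile_idc 66, 77, and 88),
	   which is a simplification of the sub-profile matching rules.

```python
def split_profile_level_id(profile_level_id):
    """Return (profile_idc, profile_iop, level_idc) from six hex digits."""
    raw = bytes.fromhex(profile_level_id)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be three bytes")
    return raw[0], raw[1], raw[2]


def level_rank(profile_level_id):
    """Return a sortable level; level 1b sits between 1.0 and 1.1."""
    profile_idc, profile_iop, level_idc = split_profile_level_id(profile_level_id)
    if profile_idc in (66, 77, 88):
        # Level 1b: level_idc 11 with constraint_set3_flag (bit 4) set.
        is_1b = level_idc == 11 and bool(profile_iop & 0x10)
    else:
        # Other profiles: level 1b is indicated by level_idc equal to 9.
        is_1b = level_idc == 9
    return 10.5 if is_1b else float(level_idc)


def answer_level_is_acceptable(offer, answer, unicast=True):
    """Check the profile part for equality and the level for downgrade."""
    o_idc, o_iop, _ = split_profile_level_id(offer)
    a_idc, a_iop, _ = split_profile_level_id(answer)
    # Bit 4 belongs to the level part for profile_idc 66, 77, and 88.
    mask = 0xEF if o_idc in (66, 77, 88) else 0xFF
    if (o_idc, o_iop & mask) != (a_idc, a_iop & mask):
        return False
    if unicast:
        return level_rank(answer) <= level_rank(offer)
    return level_rank(answer) == level_rank(offer)  # no downgrade in multicast
```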

2690	   For streams being delivered over multicast, the following rules
2691	   apply:

2693	   o  The media format configuration is identified by the same
2694	      parameters as above for unicast (i.e. "profile-level-id" and
2695	      "packetization-mode", when present).  These media format
2696	      configuration parameters (including the level part of "profile-
2697	      level-id", i.e. the level part of "profile-level-id" is not
2698	      downgradable for Offer/Answer in multicast) MUST be used
2699	      symmetrically; i.e., the answerer MUST either maintain all
2700	      configuration parameters or remove the media format (payload type)
2701	      completely.

2703	      To simplify handling and matching of these configurations, the
2704	      same RTP payload type number used in the offer SHOULD also be
2705	      used in the answer, as specified in [8].  An answer MUST NOT
2706	      contain a payload type number used in the offer unless the
2707	      configuration is the same as in the offer.

2709	   o  Parameter sets received MUST be associated with the originating
2710	      source, and MUST be only used in decoding the incoming NAL unit
2711	      stream from the same source.

2713	   o  The rules for other parameters are the same as above for unicast.

2715	   Below are the complete lists of how the different parameters shall be
2716	   interpreted in the different combinations of offer or answer and
2717	   direction attribute.

2719	   o  In offers and answers for which "a=sendrecv" or no direction
2720	      attribute is used, the following interpretation of the parameters
2721	      MUST be used.

2723	      Declaring actual configuration for sending and receiving streams:

2725	         - profile-level-id
2726	         - packetization-mode

2728	      Declaring actual properties of the stream to be sent:

2730	         - sprop-deint-buf-req
2731	         - sprop-interleaving-depth
2732	         - sprop-max-don-diff
2733	         - sprop-init-buf-time
2734	         - sprop-ssrc

2736	      Declaring receiver capabilities:

2738	         - max-mbps
2739	         - max-smbps
2740	         - max-fs
2741	         - max-cpb
2742	         - max-dpb
2743	         - max-br
2744	         - redundant-pic-cap
2745	         - deint-buf-cap
2746	         - max-rcmd-nalu-size
2747	         - sar-understood
2748	         - sar-supported
2749	         - use-level-parameter-sets

2751	      Out-of-band transporting of parameter sets:

2753	         - sprop-parameter-sets
2754	         - sprop-level-parameter-sets

2756	   o  In offers and answers for which "a=recvonly" is used, the
2757	      following interpretation of the parameters MUST be used.

2759	      Declaring actual configuration for receiving streams:

2761	         - profile-level-id
2762	         - packetization-mode

2764	      Declaring receiver capabilities:

2766	         - max-mbps
2767	         - max-smbps
2768	         - max-fs
2769	         - max-cpb
2770	         - max-dpb
2771	         - max-br
2772	         - redundant-pic-cap
2773	         - deint-buf-cap
2774	         - max-rcmd-nalu-size
2775	         - sar-understood
2776	         - sar-supported
2777	         - use-level-parameter-sets

2779	      Not usable (when present, they SHOULD be ignored):

2781	         - sprop-deint-buf-req
2782	         - sprop-interleaving-depth
2783	         - sprop-parameter-sets
2784	         - sprop-level-parameter-sets
2785	         - sprop-max-don-diff
2786	         - sprop-init-buf-time
2787	         - sprop-ssrc

2789	   o  In offers or answers for which "a=sendonly" is used, the following
2790	      interpretation of the parameters MUST be used.

2792	      Declaring actual configuration or properties for sending streams:

2794	         - profile-level-id
2795	         - packetization-mode
2796	         - sprop-deint-buf-req
2797	         - sprop-max-don-diff
2798	         - sprop-init-buf-time
2799	         - sprop-interleaving-depth
2800	         - sprop-ssrc

2802	      Out-of-band transporting of parameter sets:

2804	         - sprop-parameter-sets
2805	         - sprop-level-parameter-sets

2807	      Not usable (when present, they SHOULD be ignored):

2809	         - max-mbps
2810	         - max-smbps
2811	         - max-fs
2812	         - max-cpb
2813	         - max-dpb
2814	         - max-br
2815	         - redundant-pic-cap
2816	         - deint-buf-cap
2817	         - max-rcmd-nalu-size
2818	         - sar-understood
2819	         - sar-supported
2820	         - use-level-parameter-sets
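
	   Informative note: The following Python sketch is not part of this
	   specification.  It encodes the three lists above as sets and drops
	   the parameters that are not usable for a given direction
	   attribute.  The names are hypothetical.  The sketch also drops
	   parameters that appear in none of the lists and does not handle
	   "a=inactive"; both are assumptions of the sketch.

```python
CONFIGURATION = {"profile-level-id", "packetization-mode"}
STREAM_PROPERTIES = {
    "sprop-deint-buf-req", "sprop-interleaving-depth",
    "sprop-max-don-diff", "sprop-init-buf-time", "sprop-ssrc",
}
RECEIVER_CAPABILITIES = {
    "max-mbps", "max-smbps", "max-fs", "max-cpb", "max-dpb", "max-br",
    "redundant-pic-cap", "deint-buf-cap", "max-rcmd-nalu-size",
    "sar-understood", "sar-supported", "use-level-parameter-sets",
}
PARAMETER_SETS = {"sprop-parameter-sets", "sprop-level-parameter-sets"}

USABLE = {
    "sendrecv": CONFIGURATION | STREAM_PROPERTIES
    | RECEIVER_CAPABILITIES | PARAMETER_SETS,
    "recvonly": CONFIGURATION | RECEIVER_CAPABILITIES,
    "sendonly": CONFIGURATION | STREAM_PROPERTIES | PARAMETER_SETS,
}


def usable_params(params, direction="sendrecv"):
    """Keep only the parameters that are usable for the direction.

    A missing direction attribute is treated like "a=sendrecv".
    """
    allowed = USABLE[direction]
    return {name: value for name, value in params.items() if name in allowed}
```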

2822	   Furthermore, the following considerations are necessary:

2824	   o  Parameters used for declaring receiver capabilities are in general
2825	      downgradable; i.e., they express the upper limit for a sender's
2826	      possible behavior.  Thus, a sender MAY configure its encoder
2827	      using only values lower than or equal to these parameters.

2829	   o  Parameters declaring a configuration point are not downgradable,
2830	      with the exception of the level part of the "profile-level-id"
2831	      parameter for unicast usage.  This expresses values a receiver
2832	      expects to be used and must be used verbatim on the sender side.

2834	   o  When a sender's capabilities are declared, and non-downgradable
2835	      parameters are used in this declaration, then these parameters
2836	      express a configuration that is acceptable for the sender to
2837	      receive streams.  In order to achieve high interoperability
2838	      levels, it is often advisable to offer multiple alternative
2839	      configurations; e.g., for the packetization mode.  It is
2840	      impossible to offer multiple configurations in a single payload
2841	      type.  Thus, when multiple configuration offers are made, each
2842	      offer requires its own RTP payload type associated with the offer.

2844	   o  A receiver SHOULD understand all media type parameters, even if it
2845	      only supports a subset of the payload format's functionality.
2846	      This ensures that a receiver is capable of understanding when an
2847	      offer to receive media can be downgraded to what is supported by
2848	      the receiver of the offer.

2850	   o  An answerer MAY extend the offer with additional media format
2851	      configurations.  However, to enable their usage, in most cases a
2852	      second offer is required from the offerer to provide the stream
2853	      properties parameters that the media sender will use.  This also
2854	      has the effect that the offerer has to be able to receive this
2855	      media format configuration, not only to send it.

2857	   o  If an offerer wishes to have non-symmetric capabilities between
2858	      sending and receiving, the offerer should offer different RTP
2859	      sessions; i.e., different media lines declared as "recvonly" and
2860	      "sendonly", respectively.  This may have further implications on
2861	      the system.

2863	8.2.3. Usage in Declarative Session Descriptions

2865	   When H.264 over RTP is offered with SDP in a declarative style, as in
2866	   RTSP [26] or SAP [27], the following considerations are necessary.

2868	   o  All parameters capable of indicating both stream properties and
2869	      receiver capabilities are used to indicate only stream properties.
2870	      For example, in this case, the parameter "profile-level-id"
2871	      declares only the values used by the stream, not the capabilities
2872	      for receiving streams.  Consequently, the following
2873	      interpretation of the parameters MUST be used:

2875	      Declaring actual configuration or stream properties:

2877	         - profile-level-id
2878	         - packetization-mode
2879	         - sprop-interleaving-depth
2880	         - sprop-deint-buf-req
2881	         - sprop-max-don-diff
2882	         - sprop-init-buf-time
2883	         - sprop-ssrc

2885	      Out-of-band transporting of parameter sets:

2887	         - sprop-parameter-sets
2888	         - sprop-level-parameter-sets

2890	      Not usable (when present, they SHOULD be ignored):

2892	         - max-mbps
2893	         - max-smbps
2894	         - max-fs
2895	         - max-cpb
2896	         - max-dpb
2897	         - max-br
2898	         - redundant-pic-cap
2899	         - max-rcmd-nalu-size
2900	         - deint-buf-cap
2901	         - sar-understood
2902	         - sar-supported
2903	         - use-level-parameter-sets

2905	   o  A receiver of the SDP is required to support all parameters and
2906	      values of the parameters provided; otherwise, the receiver MUST
2907	      reject (RTSP) or not participate in (SAP) the session.  It falls
2908	      on the creator of the session to use values that are expected to
2909	      be supported by the receiving application.

2911	8.3. Examples

2913	   An SDP Offer/Answer exchange wherein both parties are expected to
2914	   both send and receive could look like the following.  Only the media
2915	   codec specific parts of the SDP are shown.  Some lines are wrapped
2916	   due to text constraints.

2918	      Offerer -> Answerer SDP message:

2920	      m=video 49170 RTP/AVP 100 99 98
2921	      a=rtpmap:98 H264/90000
2922	      a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
2923	        sprop-parameter-sets=<base64 data#0>
2924	      a=rtpmap:99 H264/90000
2925	      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
2926	        sprop-parameter-sets=<base64 data#1>
2927	      a=rtpmap:100 H264/90000
2928	      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
2929	        sprop-parameter-sets=<base64 data#2>;
2930	        sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
2931	        sprop-init-buf-time=102478; deint-buf-cap=128000

2933	   The above offer presents the same codec configuration in three
2934	   different packetization formats.  PT 98 represents single NALU mode,
2935	   PT 99 represents non-interleaved mode, and PT 100 indicates the
2936	   interleaved mode.  In the interleaved mode case, the interleaving
2937	   parameters that the offerer would use if the answer indicates support
2938	   for PT 100 are also included.  In all three cases the parameter
2939	   "sprop-parameter-sets" conveys the initial parameter sets that are
2940	   required by the answerer when receiving a stream from the offerer
2941	   when this configuration is accepted.  Note that the value for "sprop-
2942	   parameter-sets" could be different for each payload type.

2944	      Answerer -> Offerer SDP message:

2946	      m=video 49170 RTP/AVP 100 99 97
2947	      a=rtpmap:97 H264/90000
2948	      a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
2949	        sprop-parameter-sets=<base64 data#3>
2950	      a=rtpmap:99 H264/90000
2951	      a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
2952	        sprop-parameter-sets=<base64 data#4>;
2953	        max-rcmd-nalu-size=3980
2954	      a=rtpmap:100 H264/90000
2955	      a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
2956	        sprop-parameter-sets=<base64 data#5>;
2957	        sprop-interleaving-depth=60;
2958	        sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
2959	        deint-buf-cap=128000; max-rcmd-nalu-size=3980

2961	   As the Offer/Answer negotiation covers both sending and receiving
2962	   streams, an offer indicates the exact parameters for what the offerer
2963	   is willing to receive, whereas the answer indicates the same for what
2964	   the answerer agrees to receive.  In this case the offerer declared
2965	   that it is willing to receive payload type 98.  The answerer accepts
2966	   this by declaring an equivalent payload type 97; i.e., it has
2967	   identical values for the two parameters "profile-level-id" and
2968	   "packetization-mode" (since "packetization-mode" is equal to 0,
2969	   "sprop-deint-buf-req" is not present).  As the offered payload type
2970	   98 is accepted, the answerer needs to store parameter sets included
2971	   in sprop-parameter-sets=<base64 data#0> in case the offerer finally
2972	   decides to use this configuration. In the answer, the answerer
2973	   includes the parameter sets in sprop-parameter-sets=<base64 data#3>
2974	   that the answerer would use in the stream sent from the answerer if
2975	   this configuration is finally used.

2977	   The answerer also accepts the reception of the two configurations
2978	   that payload types 99 and 100 represent.  Again, the answerer needs
2979	   to store parameter sets included in sprop-parameter-sets=<base64
2980	   data#1> and sprop-parameter-sets=<base64 data#2> in case the offerer
2981	   finally decides to use either of these two configurations.  The
2982	   answerer provides the initial parameter sets for the answerer-to-
2983	   offerer direction, i.e. the parameter sets in sprop-parameter-
2984	   sets=<base64 data#4> and sprop-parameter-sets=<base64 data#5>, for
2985	   payload types 99 and 100, respectively, that it will use to send the
2986	   payload types.  The answerer also provides the offerer with its
2987	   memory limit for de-interleaving operations by providing a "deint-
2988	   buf-cap" parameter.  This is only useful if the offerer decides on
2989	   making a second offer, where it can take the new value into account.
2990	   The "max-rcmd-nalu-size" indicates that the answerer can efficiently
2991	   process NALUs up to the size of 3980 bytes.  However, there is no
2992	   guarantee that the network supports this size.

2994	   In the following example, the offer is accepted without level
2995	   downgrading (i.e. the default level, 3.0, is accepted), and both
2996	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
2997	   in the offer.  The answerer must ignore sprop-level-parameter-
2998	   sets=<base-64 data#1> and store parameter sets in sprop-parameter-
2999	   sets=<base-64 data#0> for decoding the incoming NAL unit stream.  The
3000	   offerer must store the parameter sets in sprop-parameter-sets=<base-
3001	   64 data#2> in the answer for decoding the incoming NAL unit stream.
3002	   Note that in this example, parameter sets in sprop-parameter-
3003	   sets=<base-64 data#2> must be associated with level 3.0.

3005	      Offer SDP:

3007	      m=video 49170 RTP/AVP 98
3008	      a=rtpmap:98 H264/90000
3009	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3010	        packetization-mode=1;
3011	        sprop-parameter-sets=<base-64 data#0>;
3012	        sprop-level-parameter-sets=<base-64 data#1>

3014	      Answer SDP:

3016	      m=video 49170 RTP/AVP 98
3017	      a=rtpmap:98 H264/90000
3018	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3019	        packetization-mode=1;
3020	        sprop-parameter-sets=<base-64 data#2>

3022	   In the following example, the offer (Baseline profile, level 1.1) is
3023	   accepted with level downgrading (the accepted level is 1b), and both
3024	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
3025	   in the offer.  The answerer must ignore sprop-parameter-sets=<base-64
3026	   data#0> and all parameter sets not for the accepted level (level 1b)
3027	   in sprop-level-parameter-sets=<base-64 data#1>, and must store
3028	   parameter sets for the accepted level (level 1b) in sprop-level-
3029	   parameter-sets=<base-64 data#1> for decoding the incoming NAL unit
3030	   stream.  The offerer must store the parameter sets in sprop-
3031	   parameter-sets=<base-64 data#2> in the answer for decoding the
3032	   incoming NAL unit stream.  Note that in this example, parameter sets
3033	   in sprop-parameter-sets=<base-64 data#2> must be associated with
3034	   level 1b.

3036	      Offer SDP:

3038	      m=video 49170 RTP/AVP 98
3039	      a=rtpmap:98 H264/90000
3040	      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
3041	        packetization-mode=1;
3042	        sprop-parameter-sets=<base-64 data#0>;
3043	        sprop-level-parameter-sets=<base-64 data#1>

3045	      Answer SDP:

3047	      m=video 49170 RTP/AVP 98
3048	      a=rtpmap:98 H264/90000
3049	      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
3050	        packetization-mode=1;
3051	        sprop-parameter-sets=<base-64 data#2>;
3052	        use-level-parameter-sets=1
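
   The profile-level-id values used in these examples can be decoded
   as sketched below.  This is a minimal Python illustration;
   decode_profile_level_id is a hypothetical helper, and the Level 1b
   rule shown covers only the Baseline, Main, and Extended profiles
   (profile_idc 66, 77, and 88).  Other profiles are not handled.

```python
# Illustrative sketch; decode_profile_level_id is a hypothetical helper.

def decode_profile_level_id(value):
    """Split a 6-hex-digit profile-level-id into its three bytes.

    Returns (profile_idc, profile_iop, level), where level is a
    string such as "3.0" or "1b".
    """
    if len(value) != 6:
        raise ValueError("profile-level-id must be 6 hex digits")
    profile_idc, profile_iop, level_idc = bytes.fromhex(value)
    # constraint_set0_flag is the most significant bit of profile-iop,
    # so constraint_set3_flag is bit 4.
    constraint_set3_flag = (profile_iop >> 4) & 1
    # Assumption: only profile_idc 66/77/88 are considered here, where
    # level_idc 11 with constraint_set3_flag set means Level 1b.
    if profile_idc in (66, 77, 88) and level_idc == 11 \
            and constraint_set3_flag:
        level = "1b"
    else:
        level = f"{level_idc // 10}.{level_idc % 10}"
    return profile_idc, profile_iop, level


for pli in ("42A01E", "42A00B", "42B00B", "42A014"):
    print(pli, decode_profile_level_id(pli))
```

   The only difference between the offered 42A00B and the answered
   42B00B is constraint_set3_flag, which is why the two values carry
   the same level_idc but denote Level 1.1 and Level 1b, respectively.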

3054	   In the following example, the offer (Baseline profile, level 1.1) is
3055	   accepted with level downgrading (the accepted level is 1b), and both
3056	   "sprop-parameter-sets" and "sprop-level-parameter-sets" are present
3057	   in the offer.  However, the answerer is a legacy RFC 3984
3058	   implementation and does not understand "sprop-level-parameter-sets",
3059	   hence it does not include "use-level-parameter-sets" (which the
3060	   answerer does not understand, either) in the answer.  Therefore, the
3061	   answerer must ignore both sprop-parameter-sets=<base-64 data#0> and
3062	   sprop-level-parameter-sets=<base-64 data#1>, and the offerer must
3063	   transport parameter sets in-band.

3065	      Offer SDP:

3067	      m=video 49170 RTP/AVP 98
3068	      a=rtpmap:98 H264/90000
3069	      a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
3070	        packetization-mode=1;
3071	        sprop-parameter-sets=<base-64 data#0>;
3072	        sprop-level-parameter-sets=<base-64 data#1>

3074	      Answer SDP:

3076	      m=video 49170 RTP/AVP 98
3077	      a=rtpmap:98 H264/90000
3078	      a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
3079	        packetization-mode=1

3081	   In the following example, the offer is accepted without level
3082	   downgrading, and "sprop-parameter-sets" is present in the offer.
3083	   Parameter sets in sprop-parameter-sets=<base-64 data#0> must be
3084	   stored and used by the encoder of the offerer and the decoder of
3085	   the answerer, and parameter sets in sprop-parameter-sets=<base-64
3086	   data#1> must be used by the encoder of the answerer and the decoder
3087	   of the offerer.  Note that sprop-parameter-sets=<base-64 data#0> is
3088	   basically independent of sprop-parameter-sets=<base-64 data#1>.

3090	      Offer SDP:

3092	      m=video 49170 RTP/AVP 98
3093	      a=rtpmap:98 H264/90000
3094	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3095	        packetization-mode=1;
3096	        sprop-parameter-sets=<base-64 data#0>

3098	      Answer SDP:

3100	      m=video 49170 RTP/AVP 98
3101	      a=rtpmap:98 H264/90000
3102	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3103	        packetization-mode=1;
3104	        sprop-parameter-sets=<base-64 data#1>

3106	   In the following example, the offer is accepted without level
3107	   downgrading, and neither "sprop-parameter-sets" nor "sprop-level-
3108	   parameter-sets" is present in the offer, meaning that there is no
3109	   out-of-band transmission of parameter sets, which then have to be
3110	   transported in-band.

3112	      Offer SDP:

3114	      m=video 49170 RTP/AVP 98
3115	      a=rtpmap:98 H264/90000
3116	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3117	        packetization-mode=1

3119	      Answer SDP:

3121	      m=video 49170 RTP/AVP 98
3122	      a=rtpmap:98 H264/90000
3123	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3124	        packetization-mode=1

3126	   In the following example, the offer is accepted with level
3127	   downgrading and "sprop-parameter-sets" is present in the offer.  As
3128	   sprop-parameter-sets=<base-64 data#0> contains level_idc indicating
3129	   Level 3.0, it cannot be used because the answerer wants Level 2.0;
3130	   it must be ignored by the answerer, and in-band parameter sets must
3131	   be used.

3133	      Offer SDP:

3135	      m=video 49170 RTP/AVP 98
3136	      a=rtpmap:98 H264/90000
3137	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3138	        packetization-mode=1;
3139	        sprop-parameter-sets=<base-64 data#0>

3141	      Answer SDP:

3143	      m=video 49170 RTP/AVP 98
3144	      a=rtpmap:98 H264/90000
3145	      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
3146	        packetization-mode=1

3148	   In the following example, the offer is also accepted with level
3149	   downgrading, and neither "sprop-parameter-sets" nor "sprop-level-
3150	   parameter-sets" is present in the offer, meaning that there is no
3151	   out-of-band transmission of parameter sets, which then have to be
3152	   transported in-band.

3154	      Offer SDP:

3156	      m=video 49170 RTP/AVP 98
3157	      a=rtpmap:98 H264/90000
3158	      a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
3159	        packetization-mode=1

3161	      Answer SDP:

3163	      m=video 49170 RTP/AVP 98
3164	      a=rtpmap:98 H264/90000
3165	      a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
3166	        packetization-mode=1
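
   The examples in this section follow a common pattern for deciding
   where the answerer obtains the initial parameter sets for decoding
   the stream from the offerer.  The Python sketch below restates that
   pattern; parameter_set_source and its arguments are hypothetical
   names, and it assumes that "sprop-level-parameter-sets", when
   present, contains parameter sets for the accepted level.

```python
# Illustrative sketch; parameter_set_source is a hypothetical helper.

def parameter_set_source(offer, offered_level, accepted_level,
                         understands_level_sets=True):
    """Return where the answerer takes initial parameter sets from.

    offer is a dict of fmtp parameter names to values taken from the
    offer.  The result is the name of the parameter to use, or
    "in-band" when no out-of-band parameter sets are usable.
    """
    if accepted_level == offered_level:
        # No level downgrading: the default-level sets apply.
        if "sprop-parameter-sets" in offer:
            return "sprop-parameter-sets"
        return "in-band"
    # Level downgrading: sprop-parameter-sets must be ignored.
    if understands_level_sets and "sprop-level-parameter-sets" in offer:
        return "sprop-level-parameter-sets"
    return "in-band"


both = {"sprop-parameter-sets": "<base-64 data#0>",
        "sprop-level-parameter-sets": "<base-64 data#1>"}
print(parameter_set_source(both, "3.0", "3.0"))
print(parameter_set_source(both, "1.1", "1b"))
print(parameter_set_source(both, "1.1", "1b",
                           understands_level_sets=False))
```

   The third call models the legacy RFC 3984 answerer, which does not
   understand "sprop-level-parameter-sets" and therefore relies on
   in-band parameter sets.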

3168	8.4. Parameter Set Considerations

3170	   The H.264 parameter sets are a fundamental part of the video codec
3171	   and vital to its operation; see section 1.2.  Due to their
3172	   characteristics and their importance for the decoding process, lost
3173	   or erroneously transmitted parameter sets can hardly be concealed
3174	   locally at the receiver.  A reference to a corrupt parameter set is
3175	   normally fatal to the decoding process.  Corruption could
3176	   occur, for example, due to the erroneous transmission or loss of a
3177	   parameter set NAL unit, but also due to the untimely transmission of
3178	   a parameter set update.  A parameter set update refers to a change of
3179	   at least one parameter in a picture parameter set or sequence
3180	   parameter set for which the picture parameter set or sequence
3181	   parameter set identifier remains unchanged.  Therefore, the following
3182	   recommendations are provided as a guideline for the implementer of
3183	   the RTP sender.

3185	   Parameter set NALUs can be transported using three different
3186	   principles:

3188	   A. Using a session control protocol (out-of-band) prior to the actual
3189	     RTP session.

3191	   B. Using a session control protocol (out-of-band) during an ongoing
3192	     RTP session.

3194	   C. Within the RTP packet stream in the payload (in-band) during an
3195	     ongoing RTP session.

3197	   It is recommended to implement principles A and B within a session
3198	   control protocol.  SIP and SDP can be used as described in the SDP
3199	   Offer/Answer model and in the previous sections of this memo.  This
3200	   section contains guidelines on how principles A and B should be
3201	   implemented within session control protocols.  It is independent of
3202	   the particular protocol used.  Principle C is supported by the RTP
3203	   payload format defined in this specification.  There are topologies
3204	   like Topo-Video-switch-MCU [28] for which the use of principle C may
3205	   be desirable.

3207	   If in-band signaling of parameter sets is used, the picture and
3208	   sequence parameter set NALUs SHOULD be transmitted in the RTP payload
3209	   using a reliable method of delivering RTP (see below), as a loss
3210	   of a parameter set of either type will likely prevent decoding of a
3211	   considerable portion of the corresponding RTP packet stream.

3213	   If in-band signaling of parameter sets is used, the sender SHOULD
3214	   take the error characteristics into account and use mechanisms to
3215	   provide a high probability for delivering the parameter sets
3216	   correctly.  Mechanisms that increase the probability for a correct
3217	   reception include packet repetition, FEC, and retransmission.  The
3218	   use of an unreliable, out-of-band control protocol has similar
3219	   disadvantages as the in-band signaling (possible loss) and, in
3220	   addition, may also lead to difficulties in the synchronization (see
3221	   below).  Therefore, it is NOT RECOMMENDED.

3223	   Parameter sets MAY be added or updated during the lifetime of a
3224	   session using principles B and C.  It is required that parameter sets
3225	   are present at the decoder prior to the NAL units that refer to them.
3226	   Updating or adding of parameter sets can result in further problems,
3227	   and therefore the following recommendations should be considered.

3229	   - When parameter sets are added or updated, care SHOULD be taken to
3230	     ensure that any parameter set is delivered prior to its usage.
3231	     When new parameter sets are added, previously unused parameter set
3232	     identifiers are used.  It is common that no synchronization is
3233	     present between out-of-band signaling and in-band traffic.  If
3234	     out-of-band signaling is used, it is RECOMMENDED that a sender
3235	     does not start sending NALUs requiring the added or updated
3236	     parameter sets prior to acknowledgement of delivery from the
3237	     signaling protocol.

3239	   - When parameter sets are updated, the following synchronization
3240	     issue should be taken into account.  When overwriting a parameter
3241	     set at the receiver, the sender has to ensure that the parameter
3242	     set in question is not needed by any NALU present in the network
3243	     or receiver buffers.  Otherwise, decoding with a wrong parameter
3244	     set may occur.  To lessen this problem, it is RECOMMENDED either
3245	     to overwrite only those parameter sets that have not been used for
3246	     a sufficiently long time (to ensure that all related NALUs have
3247	     been consumed), or to add a new parameter set instead (which may
3248	     have negative consequences for the efficiency of the video
3249	     coding).

3251	         Informative note: In some topologies like Topo-Video-switch-
3252	         MCU [28] the origin of the whole set of parameter sets may
3253	         come from multiple sources that may use non-unique parameter
3254	         sets identifiers.  In this case an offer may overwrite an
3255	         existing parameter set if no other mechanism that enables
3256	         uniqueness of the parameter sets in the out-of-band channel
3257	         exists.

3259	   - In a multiparty session, one participant MUST associate parameter
3260	     sets coming from different sources with the source identification
3261	     whenever possible, e.g. by using sprop-ssrc for out-of-band
3262	     transported parameter sets, as different sources typically use
3263	     independent parameter set identifier value spaces.

3265	   - Adding or modifying parameter sets by using both principles B and
3266	     C in the same RTP session may lead to inconsistencies of the
3267	     parameter sets because of the lack of synchronization between the
3268	     control and the RTP channel.  Therefore, principles B and C MUST
3269	     NOT both be used in the same session unless sufficient
3270	     synchronization can be provided.

3272	   In some scenarios (e.g., when only the subset of this payload format
3273	   specification corresponding to H.241 is used) or topologies, it is
3274	   not possible to employ out-of-band parameter set transmission.  In
3275	   this case, parameter sets have to be transmitted in-band.  Here, the
3276	   synchronization with the non-parameter-set-data in the bitstream is
3277	   implicit, but the possibility of a loss has to be taken into account.
3278	   The loss probability should be reduced using the mechanisms discussed
3279	   above.  In case a loss of a parameter set is detected, recovery may
3280	   be achieved by using a Decoder Refresh Point procedure, for example,
3281	   using RTCP feedback Full Intra Request (FIR) [29].  Two example
3282	   Decoder Refresh Point procedures are provided in the informative
3283	   Section 8.5.

3285	   - When parameter sets are initially provided using principle A and
3286	     then later added or updated in-band (principle C), there is a risk
3287	     associated with updating the parameter sets delivered out-of-band.
3288	     If receivers miss some in-band updates (for example, because of a
3289	     loss or a late tune-in), those receivers attempt to decode the
3290	     bitstream using outdated parameters.  It is therefore RECOMMENDED
3291	     that parameter set IDs be partitioned between the out-of-band and
3292	     in-band parameter sets.
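
   One way to partition parameter set IDs between out-of-band and
   in-band use is sketched below in Python.  The helper partition_ids
   and the split point of 8 are hypothetical; the ID ranges (0 to 31
   for sequence parameter sets, 0 to 255 for picture parameter sets)
   are taken from the H.264 specification [1] rather than from this
   memo.

```python
# Illustrative sketch; partition_ids and the split point are
# hypothetical choices, not requirements of this memo.

SPS_ID_RANGE = range(0, 32)    # seq_parameter_set_id values
PPS_ID_RANGE = range(0, 256)   # pic_parameter_set_id values


def partition_ids(id_range, out_of_band_count):
    """Reserve the lowest IDs for out-of-band use, the rest in-band."""
    ids = list(id_range)
    if not 0 <= out_of_band_count <= len(ids):
        raise ValueError("out_of_band_count outside the ID range")
    return ids[:out_of_band_count], ids[out_of_band_count:]


oob_sps, inband_sps = partition_ids(SPS_ID_RANGE, 8)
print(len(oob_sps), len(inband_sps))
```

   With disjoint ID sets, an in-band update can never overwrite a
   parameter set that was delivered out-of-band, so a receiver that
   misses an in-band update still holds valid out-of-band sets.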

3294	8.5. Decoder Refresh Point Procedure using In-Band Transport of
3295	   Parameter Sets (Informative)

3297	   When a sender with a video encoder according to [1] receives a
3298	   request for a decoder refresh point, the encoder shall enter the fast
3299	   update mode by using one of the procedures specified in Section 8.5.1
3300	   or 8.5.2 below.  The procedure in 8.5.1 is the preferred response in
3301	   a lossless transmission environment.  Both procedures satisfy the
3302	   requirement to enter the fast update mode for H.264 video encoding.

3304	8.5.1. IDR Procedure to Respond to a Request for a Decoder Refresh Point

3306	   This section gives one possible way to respond to a request for a
3307	   decoder refresh point.

3309	   The encoder shall, in the order presented here:

3311	   1) Immediately prepare to send an IDR picture.

3313	   2) Send a sequence parameter set to be used by the IDR picture to be
3314	     sent. The encoder may optionally also send other sequence
3315	     parameter sets.

3317	   3) Send a picture parameter set to be used by the IDR picture to be
3318	     sent. The encoder may optionally also send other picture parameter
3319	     sets.

3321	   4) Send the IDR picture.

3323	   5) From this point forward in time, send any other sequence or
3324	     picture parameter sets that have not yet been sent in this
3325	     procedure, prior to their reference by any NAL unit, regardless of
3326	     whether such parameter sets were previously sent prior to
3327	     receiving the request for a decoder refresh point.  As needed,
3328	     such parameter sets may be sent in a batch, one at a time, or in
3329	     any combination of these two methods.  Parameter sets may be re-
3330	     sent at any time for redundancy.  Caution should be taken when
3331	     parameter set updates are present, as described above in Section
3332	     8.4.
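
   The transmission order of steps 2 through 4 above can be sketched
   as follows.  This Python illustration uses a hypothetical helper,
   idr_refresh_order, and models NAL units as simple (type, id)
   tuples; it does not model step 1 or the later re-sending of
   parameter sets described in step 5.

```python
# Illustrative sketch; idr_refresh_order is a hypothetical helper.

def idr_refresh_order(idr_sps_id, idr_pps_id,
                      extra_sps=(), extra_pps=()):
    """Return the send order for a decoder refresh using an IDR."""
    # Step 2: the SPS used by the IDR picture, then optional others.
    order = [("SPS", idr_sps_id)]
    order += [("SPS", i) for i in extra_sps if i != idr_sps_id]
    # Step 3: the PPS used by the IDR picture, then optional others.
    order.append(("PPS", idr_pps_id))
    order += [("PPS", i) for i in extra_pps if i != idr_pps_id]
    # Step 4: the IDR picture itself.
    order.append(("IDR", None))
    return order


print(idr_refresh_order(0, 1, extra_sps=[0, 2], extra_pps=[3]))
```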

3334	8.5.2. Gradual Recovery Procedure to Respond to a Request for a Decoder
3335	   Refresh Point

3337	   This section gives another possible way to respond to a request for a
3338	   decoder refresh point.

3340	   The encoder shall, in the order presented here:

3342	   1) Send a recovery point SEI message (see Sections D.1.7 and D.2.7 of
3343	     [1]).

3345	   2) Repeat any sequence and picture parameter sets that were sent
3346	     before the recovery point SEI message, prior to their reference by
3347	     a NAL unit.

3349	   The encoder shall ensure that the decoder has access to all reference
3350	   pictures for inter prediction of pictures at or after the recovery
3351	   point, which is indicated by the recovery point SEI message, in
3352	   output order, assuming that the transmission from now on is error-
3353	   free.

3355	   The value of the recovery_frame_cnt syntax element in the recovery
3356	   point SEI message should be small enough to ensure a fast recovery.

3358	   As needed, such parameter sets may be re-sent in a batch, one at a
3359	   time, or in any combination of these two methods.  Parameter sets may
3360	   be re-sent at any time for redundancy.  Caution should be taken when
3361	   parameter set updates are present, as described above in Section 8.4.

3363	9. Security Considerations

3365	   RTP packets using the payload format defined in this specification
3366	   are subject to the security considerations discussed in the RTP
3367	   specification [5], and in any appropriate RTP profile (for example,
3368	   [15]).  This implies that confidentiality of the media streams is
3369	   achieved by encryption; for example, through the application of SRTP
3371	   [25].  Because the data compression used with this payload format is
3372	   applied end-to-end, any encryption needs to be performed after
3373	   compression.  A potential denial-of-service threat exists for data
3374	   encodings using compression techniques that have non-uniform
3375	   receiver-end computational load.  The attacker can inject
3376	   pathological datagrams into the stream that are complex to decode and
3377	   that cause the receiver to be overloaded.  H.264 is particularly
3378	   vulnerable to such attacks, as it is extremely simple to generate
3379	   datagrams containing NAL units that affect the decoding process of
3380	   many future NAL units.  Therefore, the usage of data origin
3381	   authentication and data integrity protection of at least the RTP
3382	   packet is RECOMMENDED; for example, with SRTP [25].

3384	   Note that the appropriate mechanism to ensure confidentiality and
3385	   integrity of RTP packets and their payloads is very dependent on the
3386	   application and on the transport and signaling protocols employed.
3387	   Thus, although SRTP is given as an example above, other possible
3388	   choices exist.

3390	   Decoders MUST exercise caution with respect to the handling of user
3391	   data SEI messages, particularly if they contain active elements, and
3392	   MUST restrict their domain of applicability to the presentation
3393	   containing the stream.

3395	   End-to-end security with authentication, integrity, or
3396	   confidentiality protection will prevent a MANE from performing media-
3397	   aware operations other than discarding complete packets.  In the
3398	   case of confidentiality protection, the MANE is even prevented from
3399	   discarding packets in a media-aware way.  To allow a MANE to
3400	   perform its operations, it is required to be a trusted entity that
3401	   is included in the security context establishment.

3403	10. Congestion Control

3405	   Congestion control for RTP SHALL be used in accordance with RFC 3550
3406	   [5], and with any applicable RTP profile; e.g., RFC 3551 [15].  An
3407	   additional requirement if best-effort service is being used is: users
3408	   of this payload format MUST monitor packet loss to ensure that the
3409	   packet loss rate is within acceptable parameters.  Packet loss is
3410	   considered acceptable if a TCP flow across the same network path, and
3411	   experiencing the same network conditions, would achieve an average
3412	   throughput, measured on a reasonable timescale, that is not less than
3413	   the RTP flow is achieving.  This condition can be satisfied by
3414	   implementing congestion control mechanisms to adapt the transmission
3415	   rate (or the number of layers subscribed for a layered multicast
3416	   session), or by arranging for a receiver to leave the session if the
3417	   loss rate is unacceptably high.

3419	   The bit rate adaptation necessary for obeying the congestion control
3420	   principle is easily achievable when real-time encoding is used.
3421	   However, when pre-encoded content is being transmitted, bandwidth
3422	   adaptation requires the availability of more than one coded
3423	   representation of the same content, at different bit rates, or the
3424	   existence of non-reference pictures or sub-sequences [21] in the
3425	   bitstream.  The switching between the different representations can
3426	   normally be performed in the same RTP session; e.g., by employing a
3427	   concept known as SI/SP slices of the Extended Profile, or by
3428	   switching streams at IDR picture boundaries.  Only when non-
3429	   downgradable parameters (such as the profile part of the
3430	   profile/level ID) are required to be changed does it become necessary
3431	   to terminate and re-start the media stream.  This may be accomplished
3432	   by using a different RTP payload type.

3434	   MANEs MAY follow the suggestions outlined in section 7.3 and remove
3435	   certain unusable packets from the packet stream when that stream was
3436	   damaged due to previous packet losses.  This can help reduce the
3437	   network load in certain special cases.

3439	11. IANA Considerations

3441	   IANA has registered one new media type; see section 8.1.

3443	12. Informative Appendix: Application Examples

3445	   This payload specification is very flexible in its use, in order to
3446	   cover the extremely wide application space anticipated for H.264.
3447	   However, this great flexibility also makes it difficult for an
3448	   implementer to decide on a reasonable packetization scheme.  Some
3449	   information on how to apply this specification to real-world
3450	   scenarios is likely to appear in the form of academic publications
3451	   and test model software and description in the near future.
3452	   However, some preliminary usage scenarios are described here as well.

3454	12.1. Video Telephony according to ITU-T Recommendation H.241 Annex A

3456	   H.323-based video telephony systems that use H.264 as an optional
3457	   video compression scheme are required to support H.241 Annex A [3] as
3458	   a packetization scheme.  The packetization mechanism defined in this
3459	   Annex is technically identical with a small subset of this
3460	   specification.

3462	   When a system operates according to H.241 Annex A, parameter set NAL
3463	   units are sent in-band.  Only Single NAL unit packets are used.  Many
3464	   such systems are not sending IDR pictures regularly, but only when
3465	   required by user interaction or by control protocol means; e.g., when
3466	   switching between video channels in a Multipoint Control Unit or for
3467	   error recovery requested by feedback.

3469	12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
3470	   Aggregation

3472	   The RTP part of this scheme is implemented and tested (though not the
3473	   control-protocol part; see below).

3475	   In most real-world video telephony applications, picture parameters
3476	   such as picture size or optional modes never change during the
3477	   lifetime of a connection.  Therefore, all necessary parameter sets
3478	   (usually only one) are sent as a side effect of the capability
3479	   exchange/announcement process, e.g., according to the SDP syntax
3480	   specified in section 8.2 of this document.  As all necessary
3481	   parameter set information is established before the RTP session
3482	   starts, there is no need for sending any parameter set NAL units.
3483	   Slice data partitioning is not used, either.  Thus, the RTP packet
3484	   stream basically consists of NAL units that carry single coded
3485	   slices.

3487	   The encoder chooses the size of coded slice NAL units so that they
3488	   offer the best performance.  Often, this is done by adapting the
3489	   coded slice size to the MTU size of the IP network.  For small
3490	   picture sizes, this may result in a one-picture-per-one-packet
3491	   strategy.  Intra refresh algorithms clean up the loss of packets and
3492	   the resulting drift-related artifacts.

3494	12.3. Video Telephony, Interleaved Packetization Using NAL Unit
3495	   Aggregation

3497	   This scheme allows better error concealment and is used in
3498	   H.263-based designs using RFC 2429 packetization [10].  It has been
3499	   implemented, and good results were reported [12].

3501	   The VCL encoder codes the source picture so that all macroblocks
3502	   (MBs) of one MB line are assigned to one slice.  All slices with even
3503	   MB row addresses are combined into one STAP, and all slices with odd
3504	   MB row addresses into another.  Those STAPs are transmitted as RTP
3505	   packets.  The establishment of the parameter sets is performed as
3506	   discussed above.
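
	   The row-to-STAP assignment described above can be sketched as
	   follows.  This is only an illustration, assuming one slice per MB
	   row and 0-based MB row addresses; the function name is hypothetical,
	   and no STAP or RTP bytes are produced.

```python
def assign_rows_to_staps(num_mb_rows):
    """Return (even_rows, odd_rows): the MB row addresses whose slices
    would be aggregated into the first and the second STAP."""
    if num_mb_rows < 1:
        raise ValueError("a picture has at least one macroblock row")
    even_rows = list(range(0, num_mb_rows, 2))
    odd_rows = list(range(1, num_mb_rows, 2))
    return even_rows, odd_rows


# A CIF picture is 18 MB rows high, so each STAP aggregates 9 slices.
even_stap, odd_stap = assign_rows_to_staps(18)
print(len(even_stap), len(odd_stap))
```

	   With this split, every row carried in a lost STAP has at least one
	   vertically adjacent row in the other STAP, which is presumably what
	   the error concealment relies on.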

3508	   Note that the use of STAPs is essential here, as the high number of
3509	   individual slices (18 for a CIF picture) would lead to unacceptably
3510	   high IP/UDP/RTP header overhead (unless the source coding tool FMO is
3511	   used, which is not assumed in this scenario).  Furthermore, some
3512	   wireless video transmission systems, such as H.324M and the IP-based
3513	   video telephony specified in 3GPP, are likely to use relatively small
3514	   transport packet sizes.  For example, a typical MTU size of an H.223
3515	   AL3 SDU is around 100 bytes [16].  Coding individual slices according
3516	   to this packetization scheme provides a further advantage in
3517	   communication between wired and wireless networks, as individual
3518	   slices are likely to be smaller than the preferred maximum packet
3519	   size of wireless systems.  Consequently, a gateway can convert the
3520	   STAPs used in a wired network into several RTP packets with only one
3521	   NAL unit, which are preferred in a wireless network, and vice versa.

3523	12.4. Video Telephony with Data Partitioning

3525	   This scheme has been implemented and has been shown to offer good
3526	   performance, especially at higher packet loss rates [12].

3528	   Data Partitioning is known to be useful only when some form of
3529	   unequal error protection is available.  Normally, in single-session
3530	   RTP environments, even error characteristics are assumed; i.e., the
3531	   packet loss probability of all packets of the session is the same
3532	   statistically.  However, there are means to reduce the packet loss
3533	   probability of individual packets in an RTP session.  A FEC packet
3534	   according to RFC 2733 [17], for example, specifies which media
3535	   packets are associated with the FEC packet.

3537	   In all cases, the incurred overhead is substantial but is in the same
3538	   order of magnitude as the number of bits that would otherwise have
3539	   been spent for intra information.  However, this mechanism does not
3540	   add any delay to the system.

3542	   Again, the complete parameter set establishment is performed through
3543	   control protocol means.

3545	12.5. Video Telephony or Streaming with FUs and Forward Error Correction

3547	   This scheme has been implemented and has been shown to provide good
3548	   performance, especially at higher packet loss rates [18].

3550	   The most efficient means to combat packet losses for scenarios where
3551	   retransmissions are not applicable is forward error correction (FEC).
3552	   Although application layer, end-to-end use of FEC is often less
3553	   efficient than an FEC-based protection of individual links
3554	   (especially when links of different characteristics are in the
3555	   transmission path), application layer, end-to-end FEC is unavoidable
3556	   in some scenarios.  RFC 2733 [17] provides means to use generic,
3557	   application layer, end-to-end FEC in packet-loss environments.  A
3558	   binary forward error correcting code is generated by applying the XOR
3559	   operation to the bits at the same bit position in different packets.

3561	   The binary code can be specified by the parameters (n,k) in which k
3562	   is the number of information packets used in the connection and n is
3563	   the total number of packets generated for k information packets;
3564	   i.e., n-k parity packets are generated for k information packets.
3565	   [Ed. (YkW): from Randell: References to RFC 2733 should be updated to
3566	   (and checked against) RFC 5109.  There are a lot of calculations and
3567	   the like that should be checked.  Also update [17] to RFC 5109. ]

3569	   When a code is used with parameters (n,k) within the RFC 2733
3570	   framework, the following properties are well known:

3572	   a) If applied over one RTP packet, RFC 2733 provides only packet
3573	     repetition.

3575	   b) RFC 2733 is most bit rate efficient if XOR-connected packets have
3576	     equal length.

3578	   c) At the same packet loss probability p and for a fixed k, the
3579	     greater the value of n is, the smaller the residual error
3580	     probability becomes.  For example, for a packet loss probability
3581	     of 10%, k=1, and n=2, the residual error probability is about 1%,
3582	     whereas for n=3, the residual error probability is about 0.1%.

3584	   d) At the same packet loss probability p and for a fixed code rate
3585	     k/n, the greater the value of n is, the smaller the residual error
3586	     probability becomes.  For example, at a packet loss probability of
3587	     p=10%, k=1 and n=2, the residual error rate is about 1%, whereas
3588	     for an extended Golay code with k=12 and n=24, the residual error
3589	     rate is about 0.01%.
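
	   The figures quoted in property c) can be reproduced with a short
	   calculation.  The sketch below covers only the k=1 (repetition)
	   case and assumes that packet losses are independent; the function
	   name is hypothetical.  It does not model the (24,12) code of
	   property d).

```python
def residual_loss_repetition(p, n):
    """Residual loss probability of an (n, k=1) code: the information
    packet is unrecoverable only if all n transmitted packets are lost.
    Assumes independent losses with probability p per packet."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1:
        raise ValueError("n must be at least 1")
    return p ** n


for n in (1, 2, 3):
    print(n, residual_loss_repetition(0.10, n))
```

	   For p = 10%, this gives 0.1 squared (1%) for n=2 and 0.1 cubed
	   (0.1%) for n=3, matching the values stated in c).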

3591	   For applying RFC 2733 in combination with H.264 baseline coded video
3592	   without using FUs, several options might be considered:

3594	   1) The video encoder produces NAL units for which each video frame is
3595	     coded in a single slice.  Applying FEC, one could use a simple
3596	     code; e.g., (n=2, k=1).  That is, each NAL unit would basically
3597	     just be repeated.  The disadvantage is obviously the bad code
3598	     performance according to d), above, and the low flexibility, as
3599	     only (n, k=1) codes can be used.

3601	   2) The video encoder produces NAL units for which each video frame is
3602	     encoded in one or more consecutive slices.  Applying FEC, one
3603	     could use a better code, e.g., (n=24, k=12), over a sequence of
3604	     NAL units.  Depending on the number of RTP packets per frame, a
3605	     loss may introduce a significant delay, which is reduced when more
3606	     RTP packets are used per frame.  Packets of completely different
3607	     length might also be connected, which decreases bit rate
3608	     efficiency according to b), above.  However, with some care and
3609	     for slices of 1kb or larger, similar length (100-200 bytes
3610	     difference) may be produced, which will not lower the bit
3611	     efficiency catastrophically.

3613	   3) The video encoder produces NAL units, for which a certain frame
3614	     contains k slices of possibly almost equal length.  Then, applying
3615	     FEC, a better code, e.g., (n=24, k=12), can be used over the
3616	     sequence of NAL units for each frame.  The delay compared to that
3617	     of 2), above,  may be reduced, but several disadvantages are
3618	     obvious.  First, the coding efficiency of the encoded video is
3619	     lowered significantly, as slice-structured coding reduces intra-
3620	     frame prediction and additional slice overhead is necessary.
3621	     Second, pre-encoded content or, when operating over a gateway,
3622	     the video is usually not coded with k slices in a way that
3623	     allows FEC to be applied.  Finally, the encoding of video
3624	     producing k slices of equal length is not straightforward and
3625	     might require more than one encoding pass.

3627	   Many of the mentioned disadvantages can be avoided by applying FUs in
3628	   combination with FEC.  Each NAL unit can be split into any number of
3629	   FUs of basically equal length; therefore, FEC with a reasonable k and
3630	   n can be applied, even if the encoder made no effort to produce
3631	   slices of equal length.  For example, a coded slice NAL unit
3632	   containing an entire frame can be split into k FUs, and a parity
3633	   check code (n=k+1, k) can be applied.  However, this has the
3634	   disadvantage that unless all created fragments can be recovered, the
3635	   whole slice will be lost.  Thus a larger section is lost than would
3636	   be if the frame had been split into several slices.
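
	   The following sketch illustrates the (n=k+1, k) parity idea on k
	   equal-length fragments.  It shows only the XOR principle; it does
	   not produce FU headers or the RFC 2733 packet format, and all
	   function names, as well as the zero-padding of the last fragment,
	   are assumptions made for illustration.

```python
from functools import reduce
from typing import List, Optional


def split_equal(payload: bytes, k: int) -> List[bytes]:
    """Split payload into k fragments of equal length (last one zero-padded)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = -(-len(payload) // k)  # ceiling division
    padded = payload.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(fragments: List[bytes]) -> bytes:
    """Single parity fragment: XOR of all k information fragments."""
    return reduce(xor_bytes, fragments)


def recover(received: List[Optional[bytes]], par: bytes) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the parity."""
    missing = [i for i, frag in enumerate(received) if frag is None]
    if len(missing) > 1:
        raise ValueError("one parity fragment repairs at most one loss")
    repaired = list(received)
    if missing:
        present = [frag for frag in received if frag is not None]
        repaired[missing[0]] = reduce(xor_bytes, present, par)
    return repaired


nal_unit = bytes(range(50))            # stand-in for a coded slice NAL unit
fragments = split_equal(nal_unit, 4)   # k = 4
par = parity(fragments)                # n = k + 1 = 5 packets in total
damaged = fragments[:2] + [None] + fragments[3:]
restored = b"".join(recover(damaged, par))[:len(nal_unit)]
print(restored == nal_unit)
```

	   If two or more of the n packets are lost, recovery fails and the
	   whole NAL unit is lost, which is the disadvantage noted above.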

3638	   The presented technique makes it possible to achieve good
3639	   transmission error tolerance, even if no additional source coding
3640	   layer redundancy (such as periodic intra frames) is present.
3641	   Consequently, the same coded video sequence can be used to achieve
3642	   the maximum compression efficiency and quality over error-free
3643	   transmission and for transmission over error-prone networks.
3644	   Furthermore, the technique allows the application of FEC to pre-
3645	   encoded sequences without adding delay.  In this case, pre-encoded
3646	   sequences that are not encoded for error-prone networks can still be
3647	   transmitted almost reliably without adding extensive delays.  In
3648	   addition, FUs of equal length result in a bit rate efficient use of
3649	   RFC 2733.

3651	   If the error probability depends on the length of the transmitted
3652	   packet (e.g., in case of mobile transmission [14]), the benefits of
3653	   applying FUs with FEC are even more obvious.  Basically, the
3654	   flexibility of the size of FUs allows appropriate FEC to be applied
3655	   for each NAL unit and unequal error protection of NAL units.

3657	   When FUs and FEC are used, the incurred overhead is substantial but
3658	   is in the same order of magnitude as the number of bits that have to
3659	   be spent for intra-coded macroblocks if no FEC is applied.  In [18],
3660	   it was shown that the FEC-based approach enhanced overall quality at
3661	   the same error rate and the same overall bit rate, including the
3662	   overhead.

3664	12.6. Low Bit-Rate Streaming

3666	   This scheme has been implemented with H.263 and non-standard RTP
3667	   packetization and has given good results [19].  There is no technical
3668	   reason why similarly good results could not be achievable with H.264.

3670	   In today's Internet streaming, some of the offered bit rates are
3671	   relatively low in order to allow terminals with dial-up modems to
3672	   access the content.  In wired IP networks, relatively large packets,
3673	   say 500 - 1500 bytes, are preferred to smaller and more frequently
3674	   occurring packets in order to reduce network congestion.  Moreover,
3675	   use of large packets decreases the amount of RTP/UDP/IP header
3676	   overhead.  For low bit-rate video, the use of large packets means
3677	   that sometimes up to a few pictures should be encapsulated in one
3678	   packet.

3680	   However, loss of a packet including many coded pictures would have
3681	   drastic consequences for visual quality, as there is practically no
3682	   other way to conceal a loss of an entire picture than to repeat the
3683	   previous one.  One way to construct relatively large packets and
3684	   maintain possibilities for successful loss concealment is to
3685	   construct MTAPs that contain interleaved slices from several
3686	   pictures.  An MTAP should not contain spatially adjacent slices from
3687	   the same picture or spatially overlapping slices from any picture.
3688	   If a packet is lost, it is likely that a lost slice is surrounded by
3689	   spatially adjacent slices of the same picture and spatially
3690	   corresponding slices of the temporally previous and succeeding
3691	   pictures.  Consequently, concealment of the lost slice is likely to
3692	   be relatively successful.

3694	12.7. Robust Packet Scheduling in Video Streaming

3696	   Robust packet scheduling has been implemented with MPEG-4 Part 2 and
3697	   simulated in a wireless streaming environment [20].  There is no
3698	   technical reason why similar or better results could not be
3699	   achievable with H.264.

3701	   Streaming clients typically have a receiver buffer that is capable of
3702	   storing a relatively large amount of data.  Initially, when a
3703	   streaming session is established, a client does not start playing the
3704	   stream back immediately.  Rather, it typically buffers the incoming
3705	   data for a few seconds.  This buffering helps maintain continuous
3706	   playback, as, in case of occasional increased transmission delays or
3707	   network throughput drops, the client can decode and play buffered
3708	   data.  Otherwise, without initial buffering, the client has to freeze
3709	   the display, stop decoding, and wait for incoming data.  The
3710	   buffering is also necessary for either automatic or selective
3711	   retransmission at any protocol level.  If any part of a picture is
3712	   lost, a retransmission mechanism may be used to resend the lost data.
3713	   If the retransmitted data is received before its scheduled decoding
3714	   or playback time, the loss is recovered perfectly.  Coded pictures
3715	   can be ranked according to their importance in the subjective quality
3716	   of the decoded sequence.  For example, non-reference pictures, such
3717	   as conventional B pictures, are subjectively least important, as
3718	   their absence does not affect decoding of any other pictures.  In
3719	   addition to non-reference pictures, the ITU-T H.264 | ISO/IEC 14496-
3720	   10 standard includes a temporal scalability method called sub-
3721	   sequences [21].  Subjective ranking can also be made on coded slice
3722	   data partition or slice group basis.  Coded slices and coded slice
3723	   data partitions that are subjectively the most important can be sent
3724	   earlier than their decoding order indicates, whereas coded slices and
3725	   coded slice data partitions that are subjectively the least important
3726	   can be sent later than their natural coding order indicates.
3727	   Consequently, any retransmitted parts of the most important slices
3728	   and coded slice data partitions are more likely to be received before
3729	   their scheduled decoding or playback time compared to the least
3730	   important slices and slice data partitions.

3732	13. Informative Appendix: Rationale for Decoding Order Number

3734	13.1. Introduction

3736	   The Decoding Order Number (DON) concept was introduced mainly to
3737	   enable efficient multi-picture slice interleaving (see section 12.6)
3738	   and robust packet scheduling (see section 12.7).  In both of these
3739	   applications, NAL units are transmitted out of decoding order.  DON
3740	   indicates the decoding order of NAL units and should be used in the
3741	   receiver to recover the decoding order.  Example use cases for
3742	   efficient multi-picture slice interleaving and for robust packet
3743	   scheduling are given in sections 13.2 and 13.3, respectively.
3744	   Section 13.4 describes the benefits of the DON concept in error
3745	   resiliency achieved by redundant coded pictures.  Section 13.5
3746	   summarizes considered alternatives to DON and justifies why DON was
3747	   chosen for this RTP payload specification.

3749	13.2. Example of Multi-Picture Slice Interleaving

3751	   An example of multi-picture slice interleaving follows.  A subset of
3752	   a coded video sequence is depicted below in output order.  R denotes
3753	   a reference picture, N denotes a non-reference picture, and the
3754	   number indicates a relative output time.

3756	      ... R1 N2 R3 N4 R5 ...

3758	   The decoding order of these pictures from left to right is as
3759	   follows:

3761	      ... R1 R3 N2 R5 N4 ...

3763	   The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
3764	   DON equal to 1, 2, 3, 4, and 5, respectively.

3766	   Each reference picture consists of three slice groups that are
3767	   scattered as follows (a number denotes the slice group number for
3768	   each macroblock in a QCIF frame):

3770	      0 1 2 0 1 2 0 1 2 0 1
3771	      2 0 1 2 0 1 2 0 1 2 0
3772	      1 2 0 1 2 0 1 2 0 1 2
3773	      0 1 2 0 1 2 0 1 2 0 1
3774	      2 0 1 2 0 1 2 0 1 2 0
3775	      1 2 0 1 2 0 1 2 0 1 2
3776	      0 1 2 0 1 2 0 1 2 0 1
3777	      2 0 1 2 0 1 2 0 1 2 0
3778	      1 2 0 1 2 0 1 2 0 1 2

3780	   For the sake of simplicity, we assume that all the macroblocks of a
3781	   slice group are included in one slice.  Three MTAPs are constructed
3782	   from three consecutive reference pictures so that each MTAP contains
3783	   three aggregation units, each of which contains all the macroblocks
3784	   from one slice group.  The first MTAP contains slice group 0 of
3785	   picture R1, slice group 1 of picture R3, and slice group 2 of picture
3786	   R5.  The second MTAP contains slice group 1 of picture R1, slice
3787	   group 2 of picture R3, and slice group 0 of picture R5.  The third
3788	   MTAP contains slice group 2 of picture R1, slice group 0 of picture
3789	   R3, and slice group 1 of picture R5.  Each non-reference picture is
3790	   encapsulated into an STAP-B.

3792	   Consequently, the transmission order of NAL units is the following:

3794	      R1, slice group 0, DON 1, carried in MTAP, RTP SN: N
3795	      R3, slice group 1, DON 2, carried in MTAP, RTP SN: N
3796	      R5, slice group 2, DON 4, carried in MTAP, RTP SN: N
3797	      R1, slice group 1, DON 1, carried in MTAP, RTP SN: N+1
3798	      R3, slice group 2, DON 2, carried in MTAP, RTP SN: N+1
3799	      R5, slice group 0, DON 4, carried in MTAP, RTP SN: N+1
3800	      R1, slice group 2, DON 1, carried in MTAP, RTP SN: N+2
3801	      R3, slice group 0, DON 2, carried in MTAP, RTP SN: N+2
3802	      R5, slice group 1, DON 4, carried in MTAP, RTP SN: N+2
3803	      N2, DON 3, carried in STAP-B, RTP SN: N+3
3804	      N4, DON 5, carried in STAP-B, RTP SN: N+4

3806	   The receiver is able to organize the NAL units back in decoding order
3807	   based on the value of DON associated with each NAL unit.
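
	   The construction of the three MTAPs and the receiver-side
	   reordering can be sketched as follows.  The sketch assumes that
	   MTAP m carries slice group (m + j) mod 3 of the j-th reference
	   picture, which reproduces the assignment described in the text; it
	   models NAL units as plain tuples, ignores DON wraparound, and uses
	   hypothetical names throughout.

```python
# (picture name, DON) for the three reference pictures of the example.
REFERENCE_PICTURES = [("R1", 1), ("R3", 2), ("R5", 4)]
# Non-reference pictures; each one is sent in its own STAP-B.
NON_REFERENCE_PICTURES = [("N2", 3), ("N4", 5)]


def build_mtaps(pictures, num_slice_groups):
    """MTAP m holds slice group (m + j) % num_slice_groups of picture j."""
    mtaps = []
    for m in range(num_slice_groups):
        mtaps.append([(name, (m + j) % num_slice_groups, don)
                      for j, (name, don) in enumerate(pictures)])
    return mtaps


def transmission_order(mtaps, stap_b_pictures):
    """Flatten the packets into the order in which NAL units are sent."""
    units = [unit for mtap in mtaps for unit in mtap]
    units += [(name, None, don) for name, don in stap_b_pictures]
    return units


def decoding_order(units):
    """Receiver side: a stable sort by DON restores the decoding order."""
    return sorted(units, key=lambda unit: unit[2])


sent = transmission_order(build_mtaps(REFERENCE_PICTURES, 3),
                          NON_REFERENCE_PICTURES)
for name, slice_group, don in decoding_order(sent):
    print(name, slice_group, don)
```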

3809	   If one of the MTAPs is lost, the spatially adjacent and temporally
3810	   co-located macroblocks are received and can be used to conceal the
3811	   loss efficiently.  If one of the STAPs is lost, the effect of the
3812	   loss does not propagate temporally.

3814	13.3. Example of Robust Packet Scheduling

3816	   An example of robust packet scheduling follows.  The communication
3817	   system used in the example consists of the following components in
3818	   the order that the video is processed from source to sink:

3820	      o camera and capturing
3821	      o pre-encoding buffer
3822	      o encoder
3823	      o encoded picture buffer
3824	      o transmitter
3825	      o transmission channel
3826	      o receiver
3827	      o receiver buffer
3828	      o decoder
3829	      o decoded picture buffer
3830	      o display

3832	   The video communication system used in the example operates as
3833	   follows.  Note that processing of the video stream happens gradually
3834	   and at the same time in all components of the system.  The source
3835	   video sequence is shot and captured to a pre-encoding buffer.  The
3836	   pre-encoding buffer can be used to order pictures from sampling order
3837	   to encoding order or to analyze multiple uncompressed frames for bit
3838	   rate control purposes, for example.  In some cases, the pre-encoding
3839	   buffer may not exist; instead, the sampled pictures are encoded right
3840	   away.  The encoder encodes pictures from the pre-encoding buffer and
3841	   stores the output; i.e., coded pictures, to the encoded picture
3842	   buffer.  The transmitter encapsulates the coded pictures from the
3843	   encoded picture buffer to transmission packets and sends them to a
3844	   receiver through a transmission channel.  The receiver stores the
3845	   received packets to the receiver buffer.  The receiver buffering
3846	   process typically includes buffering for transmission delay jitter.
3847	   The receiver buffer can also be used to recover correct decoding
3848	   order of coded data.  The decoder reads coded data from the receiver
3849	   buffer and produces decoded pictures as output into the decoded
3850	   picture buffer.  The decoded picture buffer is used to recover the
3851	   output (or display) order of pictures.  Finally, pictures are
3852	   displayed.

3854	   In the following example figures, I denotes an IDR picture, R denotes
3855	   a reference picture, N denotes a non-reference picture, and the
3856	   number after I, R, or N indicates the sampling time relative to the
3857	   previous IDR picture in decoding order.  Values below the sequence of
3858	   pictures indicate scaled system clock timestamps.  The system clock
3859	   is initialized arbitrarily in this example, and time runs from left
3860	   to right.  Each I, R, and N picture is mapped into the same timeline
3861	   compared to the previous processing step, if any, assuming that
3862	   encoding, transmission, and decoding take no time.  Thus, events
3863	   happening at the same time are located in the same column throughout
3864	   all example figures.

3866	   A subset of a sequence of coded pictures is depicted below in
3867	   sampling order.

3869	       ...  N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
3870	       ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
3871	       ...  58  59  60  61  62  63  64  65  66  ... 128 129 130 131 ...

3873	             Figure 16  Sequence of pictures in sampling order

3875	   The sampled pictures are buffered in the pre-encoding buffer to
3876	   arrange them in encoding order.  In this example, we assume that the
3877	   non-reference pictures are predicted from both the previous and the
3878	   next reference picture in output order, except for the non-reference
3879	   pictures immediately preceding an IDR picture, which are predicted
3880	   only from the previous reference picture in output order.  Thus, the
3881	   pre-encoding buffer has to contain at least two pictures, and the
3882	   buffering causes a delay of two picture intervals.  The output of the
3883	   pre-encoding buffering process and the encoding (and decoding) order
3884	   of the pictures are as follows:

3886	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3887	       ... -|---|---|---|---|---|---|---|---|- ...
3888	       ... 60  61  62  63  64  65  66  67  68  ...

3890	         Figure 17  Re-ordered pictures in the pre-encoding buffer

3892	   The encoder or the transmitter can set the value of DON for each
3893	   picture to a value of DON for the previous picture in decoding order
3894	   plus one.

3896	   For the sake of simplicity, let us assume that:

3898	   o  the frame rate of the sequence is constant,
3899	   o  each picture consists of only one slice,
3900	   o  each slice is encapsulated in a single NAL unit packet,
3901	   o  there is no transmission delay, and
3902	   o  pictures are transmitted at constant intervals (that is, 1 /
3903	      (frame rate)).

3905	   When pictures are transmitted in decoding order, they are received as
3906	   follows:

3908	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3909	       ... -|---|---|---|---|---|---|---|---|- ...
3910	       ... 60  61  62  63  64  65  66  67  68  ...

3912	              Figure 18  Received pictures in decoding order

3914	   The OPTIONAL sprop-interleaving-depth media type parameter is set to
3915	   0, as the transmission (or reception) order is identical to the
3916	   decoding order.

3918	   The decoder has to buffer for one picture interval initially in its
3919	   decoded picture buffer to organize pictures from decoding order to
3920	   output order as depicted below:

3922	        ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
3923	        ... -|---|---|---|---|---|---|---|---|- ...
3924	        ... 61  62  63  64  65  66  67  68  69  ...

3926	                          Figure 19  Output order

3928	   The amount of required initial buffering in the decoded picture
3929	   buffer can be signaled in the buffering period SEI message or with
3930	   the num_reorder_frames syntax element of H.264 video usability
3931	   information.  num_reorder_frames indicates the maximum number of
3932	   frames, complementary field pairs, or non-paired fields that precede
3933	   any frame, complementary field pair, or non-paired field in the
3934	   sequence in decoding order and that follow it in output order.  For
3935	   the sake of simplicity, we assume that num_reorder_frames is used to
3936	   indicate the initial buffer in the decoded picture buffer.  In this
3937	   example, num_reorder_frames is equal to 1.

3939	   It can be observed that if the IDR picture I00 is lost during
3940	   transmission and a retransmission request is issued when the value of
3941	   the system clock is 62, there is one picture interval of time (until
3942	   the system clock reaches timestamp 63) to receive the retransmitted
3943	   IDR picture I00.

3945	   Let us then assume that IDR pictures are transmitted two frame
3946	   intervals earlier than their decoding position; i.e., the pictures
3947	   are transmitted as follows:

3949	        ...  I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
3950	        ... --|---|---|---|---|---|---|---|---|- ...
3951	        ...  62  63  64  65  66  67  68  69  70  ...

3953	       Figure 20  Interleaving: Early IDR pictures in sending order

3955	   The OPTIONAL sprop-interleaving-depth media type parameter is set
3956	   equal to 1 according to its definition.  (The value of sprop-
3957	   interleaving-depth in this example can be derived as follows: Picture
3958	   I00 is the only picture preceding picture N58 or N59 in transmission
3959	   order and following it in decoding order.  Except for pictures I00,
3960	   N58, and N59, the transmission order is the same as the decoding
3961	   order of pictures.  As a coded picture is encapsulated into exactly
3962	   one NAL unit, the value of sprop-interleaving-depth is equal to the
3963	   maximum number of pictures preceding any picture in transmission
3964	   order and following the picture in decoding order.)
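
	   The derivation in the parenthetical remark can be checked with a
	   small calculation.  The sketch assumes, as in this example, that
	   each picture is carried in exactly one NAL unit and that picture
	   names are unique within the window examined; the function name is
	   hypothetical.

```python
def interleaving_depth(transmission, decoding):
    """Maximum number of units that precede a unit in transmission order
    and follow it in decoding order."""
    decoding_pos = {name: i for i, name in enumerate(decoding)}
    depth = 0
    for i, unit in enumerate(transmission):
        ahead = sum(1 for earlier in transmission[:i]
                    if decoding_pos[earlier] > decoding_pos[unit])
        depth = max(depth, ahead)
    return depth


decoding = ["N58", "N59", "I00", "R03", "N01", "N02", "R06", "N04", "N05"]
early_idr = ["I00", "N58", "N59", "R03", "N01", "N02", "R06", "N04", "N05"]

print(interleaving_depth(decoding, decoding))   # transmission in decoding order
print(interleaving_depth(early_idr, decoding))  # IDR picture sent early
```

	   The first call corresponds to Figure 18 (value 0) and the second to
	   Figure 20 (value 1).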

3966	   The receiver buffering process contains two pictures at a time
3967	   according to the value of the sprop-interleaving-depth parameter and
3968	   orders pictures from the reception order to the correct decoding
3969	   order based on the value of DON associated with each picture.  The
3970	   output of the receiver buffering process is as follows:

3972	       ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
3973	       ... -|---|---|---|---|---|---|---|---|- ...
3974	       ... 63  64  65  66  67  68  69  70  71  ...

3976	                 Figure 21  Interleaving: Receiver buffer

3978	   Again, an initial buffering delay of one picture interval is needed
3979	   to organize pictures from decoding order to output order, as depicted
3980	   below:

3982	        ... N58 N59 I00 N01 N02 R03 N04 N05 ...
3983	        ... -|---|---|---|---|---|---|---|- ...
3984	        ... 64  65  66  67  68  69  70  71  ...

3986	         Figure 22  Interleaving: Receiver buffer after reordering

3988	   Note that the maximum delay that IDR pictures can undergo during
3989	   transmission, including possible application, transport, or link
3990	   layer retransmission, is equal to three picture intervals.  Thus, the
3991	   loss resiliency of IDR pictures is improved in systems supporting
3992	   retransmission compared to the case in which pictures were
3993	   transmitted in their decoding order.

3995	13.4. Robust Transmission Scheduling of Redundant Coded Slices

3997	   A redundant coded picture is a coded representation of a picture or a
3998	   part of a picture that is not used in the decoding process if the
3999	   corresponding primary coded picture is correctly decoded.  There
4000	   should be no noticeable difference between any area of the decoded
4001	   primary picture and a corresponding area that would result from
4002	   application of the H.264 decoding process for any redundant picture
4003	   in the same access unit.  A redundant coded slice is a coded slice
4004	   that is a part of a redundant coded picture.

4006	   Redundant coded pictures can be used to provide unequal error
4007	   protection in error-prone video transmission.  If a primary coded
4008	   representation of a picture is decoded incorrectly, a corresponding
4009	   redundant coded picture can be decoded.  Examples of applications and
4010	   coding techniques using the redundant coded picture feature include
4011	   the video redundancy coding [22] and the protection of "key pictures"
4012	   in multicast streaming [23].

4014	   One property of many error-prone video communications systems is that
4015	   transmission errors are often bursty.  Therefore, they may affect
4016	   several consecutive transmission packets in transmission order.
4017	   In low bit-rate video communication, it is relatively common that an
4018	   entire coded picture can be encapsulated into one transmission
4019	   packet.  Consequently, a primary coded picture and the corresponding
4020	   redundant coded pictures may be transmitted in consecutive packets in
4021	   transmission order.  To make the transmission scheme more tolerant of
4022	   bursty transmission errors, it is beneficial to transmit the primary
4023	   coded picture and redundant coded picture separated by more than a
4024	   single packet.  The DON concept enables this.
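
The scheduling idea above can be sketched as follows.  This is a minimal illustration under stated assumptions, not part of the payload format: it assumes one packet per coded picture, assigns the hypothetical DON values 2n and 2n+1 to the primary and redundant coded picture of picture n, and ignores DON wraparound.  The names Packet, schedule, min_separation, and decoding_order are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    don: int      # decoding order number carried with the packet (assumed 2n / 2n+1)
    picture: int  # index n of the coded picture
    kind: str     # "primary" or "redundant"


def schedule(num_pictures, gap):
    """Return packets in transmission order.

    The redundant coded picture of picture n is sent in the same slot as
    the primary coded picture of picture n + gap, so the two copies of a
    picture are never adjacent when gap >= 1.
    """
    order = []
    for slot in range(num_pictures + gap):
        if slot < num_pictures:
            order.append(Packet(2 * slot, slot, "primary"))
        late = slot - gap
        if late >= 0:
            order.append(Packet(2 * late + 1, late, "redundant"))
    return order


def min_separation(order):
    """Smallest distance, in packets, between the two copies of any picture."""
    first_seen = {}
    distances = []
    for index, packet in enumerate(order):
        if packet.picture in first_seen:
            distances.append(index - first_seen[packet.picture])
        else:
            first_seen[packet.picture] = index
    return min(distances)


def decoding_order(received):
    """Receiver side: recover decoding order by sorting on DON (no wraparound)."""
    return sorted(received, key=lambda p: p.don)


if __name__ == "__main__":
    order = schedule(num_pictures=4, gap=2)
    print([(p.kind, p.picture) for p in order])
    print("minimum separation:", min_separation(order))
    print([p.don for p in decoding_order(order)])
```

In this sketch, a burst that removes at most min_separation(order) consecutive packets cannot remove both copies of the same picture, whereas with gap set to 0 the copies are adjacent and a two-packet burst can remove both.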

4026	13.5. Remarks on Other Design Possibilities

4028	   The slice header syntax structure of the H.264 coding standard
4029	   contains the frame_num syntax element that can indicate the decoding
4030	   order of coded frames.  However, using the frame_num syntax element
4031	   to recover the decoding order is neither feasible nor desirable,
4032	   for the following reasons:

4034	   o  The receiver is required to parse at least one slice header per
4035	      coded picture (before passing the coded data to the decoder).

4037	   o  Coded slices from multiple coded video sequences cannot be
4038	      interleaved, as the frame number syntax element is reset to 0 in
4039	      each IDR picture.

4041	   o  The coded fields of a complementary field pair share the same
4042	      value of the frame_num syntax element.  Thus, the decoding order
4043	      of the coded fields of a complementary field pair cannot be
4044	      recovered based on the frame_num syntax element or any other
4045	      syntax element of the H.264 coding syntax.
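
The last two reasons can be illustrated with a small sketch.  The slice records below are hypothetical and simplified: they assume that frame_num restarts at 0 with each IDR picture and that both fields of a complementary field pair carry the same value, as stated above.  Grouping by frame_num then shows which values fail to identify a unique position in decoding order.

```python
from collections import defaultdict

# Hypothetical slice records in true decoding order: (description, frame_num).
slices = [
    ("sequence A, IDR frame", 0),
    ("sequence A, top field", 1),
    ("sequence A, bottom field", 1),   # same frame_num as its paired field
    ("sequence B, IDR frame", 0),      # frame_num restarts at the IDR picture
    ("sequence B, next frame", 1),
]


def group_by_frame_num(records):
    """Map each frame_num value to the slices that carry it."""
    groups = defaultdict(list)
    for description, frame_num in records:
        groups[frame_num].append(description)
    return dict(groups)


def ambiguous_values(records):
    """frame_num values shared by more than one slice record."""
    groups = group_by_frame_num(records)
    return sorted(n for n, members in groups.items() if len(members) > 1)


if __name__ == "__main__":
    for frame_num, members in sorted(group_by_frame_num(slices).items()):
        print(frame_num, members)
    print("ambiguous frame_num values:", ambiguous_values(slices))
```

Every frame_num value in this example is shared by several records, so a receiver that saw these slices interleaved could not restore their order from frame_num alone.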

4047	   The RTP payload format for transport of MPEG-4 elementary streams
4048	   [24] enables interleaving of access units and transmission of
4049	   multiple access units in the same RTP packet.  An access unit is
4050	   specified in the H.264 coding standard to comprise all NAL units
4051	   associated with a primary coded picture according to subclause
4052	   7.4.1.2 of [1].  Consequently, slices of different pictures cannot be
4053	   interleaved, and the multi-picture slice interleaving technique (see
4054	   section 12.6) for improved error resilience cannot be used.

4056	14. Acknowledgements

4058	   Stephan Wenger, Miska Hannuksela, Thomas Stockhammer, Magnus
4059	   Westerlund, and David Singer are thanked as the authors of RFC 3984.
4060	   Dave Lindbergh, Philippe Gentric, Gonzalo Camarillo, Gary Sullivan,
4061	   Joerg Ott, and Colin Perkins are thanked for careful review during
4062	   the development of RFC 3984. Randell Jesup, Stephen Botzko, and
4063	   Magnus Westerlund are thanked for their valuable comments during the
4064	   development of this RFC.

4066	   This document was prepared using 2-Word-v2.0.template.dot.

4068	15. References

4070	15.1. Normative References

4072	   [1]   ITU-T Recommendation H.264, "Advanced video coding for generic
4073	         audiovisual services", November 2007. [Ed. (YkW): This should
4074	         be updated after a later version is approved.]

4076	   [2]   ISO/IEC International Standard 14496-10:2008.

4078	   [3]   ITU-T Recommendation H.241, "Extended video procedures and
4079	         control signals for H.300 series terminals", May 2006.

4081	   [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
4082	         Levels", BCP 14, RFC 2119, March 1997.

4084	   [5]   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
4085	         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
4086	         RFC 3550, July 2003.

4088	   [6]   Handley, M. and V. Jacobson, "SDP: Session Description
4089	         Protocol", RFC 2327, April 1998.

4091	   [7]   Josefsson, S., "The Base16, Base32, and Base64 Data Encodings",
4092	         RFC 3548, July 2003.

4094	   [8]   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
4095	         Session Description Protocol (SDP)", RFC 3264, June 2002.

4097	15.2. Informative References

4099	   [9]   Luthra, A., Sullivan, G.J., and T. Wiegand (eds.), Special
4100	         Issue on H.264/AVC. IEEE Transactions on Circuits and Systems
4101	         for Video Technology, July 2003.

4103	   [10]  Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C.,
4104	         Newell, D., Ott, J., Sullivan, G., Wenger, S., and C. Zhu, "RTP
4105	         Payload Format for the 1998 Version of ITU-T Rec. H.263 Video
4106	         (H.263+)", RFC 2429, October 1998.

4108	   [11]  ISO/IEC IS 14496-2.

4110	   [12]  Wenger, S., "H.26L over IP", IEEE Transactions on Circuits and
4111	         Systems for Video Technology, Vol. 13, No. 7, July 2003.

4113	   [13]  Wenger, S., "H.26L over IP: The IP Network Adaptation Layer",
4114	         Proceedings Packet Video Workshop 02, April 2002.

4116	   [14]  Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT
4117	         Coding Network Abstraction Layer and IP-based Transport" in
4118	         Proc. ICIP 2002, Rochester, NY, September 2002.

4120	   [15]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
4121	         Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

4123	   [16]  ITU-T Recommendation H.223, "Multiplexing protocol for low bit
4124	         rate multimedia communication", July 2001.

4126	   [17]  Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for
4127	         Generic Forward Error Correction", RFC 2733, December 1999.

4129	   [18]  Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier,
4130	         "Video Coding and Transport Layer Techniques for H.264/AVC-
4131	         Based Transmission over Packet-Lossy Networks", IEEE
4132	         International Conference on Image Processing (ICIP 2003),
4133	         Barcelona, Spain, September 2003.

4135	   [19]  Varsa, V. and M. Karczewicz, "Slice interleaving in compressed
4136	         video packetization", Packet Video Workshop 2000.

4138	   [20]  Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for
4139	         wireless video streaming," International Packet Video Workshop
4140	         2002.

4142	   [21]  Hannuksela, M.M., "Enhanced concept of GOP", JVT-B042,
4143	         available http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT-
4144	         B042.doc, January 2002.

4146	   [22]  Wenger, S., "Video Redundancy Coding in H.263+", 1997
4147	         International Workshop on Audio-Visual Services over Packet
4148	         Networks, September 1997.

4150	   [23]  Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient
4151	         Video Coding Using Unequally Protected Key Pictures", in Proc.
4152	         International Workshop VLBV03, September 2003.

4154	   [24]  van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and
4155	         P. Gentric, "RTP Payload Format for Transport of MPEG-4
4156	         Elementary Streams", RFC 3640, November 2003.

4158	   [25]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
4159	         Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
4160	         3711, March 2004.

4162	   [26]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
4163	         Protocol (RTSP)", RFC 2326, April 1998.

4165	   [27]  Handley, M., Perkins, C., and E. Whelan, "Session Announcement
4166	         Protocol", RFC 2974, October 2000.

4168	   [28]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
4169	         January 2008.

4171	   [29]  Wenger, S., Chandra, U., and M. Westerlund, "Codec Control
4172	         Messages in the RTP Audio-Visual Profile with Feedback (AVPF)",
4173	         RFC 5104, February 2008.

4175	Authors' Addresses

4177	   Ye-Kui Wang
4178	   Nokia Research Center
4179	   P.O. Box 1000
4180	   33721 Tampere
4181	   Finland

4183	   Phone: +358-50-466-7004
4184	   EMail: ye-kui.wang@nokia.com

4186	   Roni Even
4187	   14 David Hamelech
4188	   Tel Aviv 64953
4189	   Israel

4191	   Phone: +972-545481099
4192	   Email: ron.even.tlv@gmail.com

4194	   Tom Kristensen
4195	   TANDBERG
4196	   Philip Pedersens vei 22
4197	   N-1366 Lysaker
4198	   Norway

4200	   Phone: +47 67125125
4201	   Email: tom.kristensen@tandberg.com, tomkri@ifi.uio.no

4203	Intellectual Property Statement

4205	   The IETF takes no position regarding the validity or scope of any
4206	   Intellectual Property Rights or other rights that might be claimed to
4207	   pertain to the implementation or use of the technology described in
4208	   this document or the extent to which any license under such rights
4209	   might or might not be available; nor does it represent that it has
4210	   made any independent effort to identify any such rights.  Information
4211	   on the procedures with respect to rights in RFC documents can be
4212	   found in BCP 78 and BCP 79.

4214	   Copies of IPR disclosures made to the IETF Secretariat and any
4215	   assurances of licenses to be made available, or the result of an
4216	   attempt made to obtain a general license or permission for the use of
4217	   such proprietary rights by implementers or users of this
4218	   specification can be obtained from the IETF on-line IPR repository at
4219	   http://www.ietf.org/ipr.

4221	   The IETF invites any interested party to bring to its attention any
4222	   copyrights, patents or patent applications, or other proprietary
4223	   rights that may cover technology that may be required to implement
4224	   this standard.  Please address the information to the IETF at
4225	   ietf-ipr@ietf.org.

4227	Disclaimer of Validity

4229	   This document and the information contained herein are provided on an
4230	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
4231	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
4232	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
4233	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
4234	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
4235	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

4237	Copyright Statement

4239	   Copyright (C) The IETF Trust (2008).

4241	   This document is subject to the rights, licenses and restrictions
4242	   contained in BCP 78, and except as set forth therein, the authors
4243	   retain all their rights.

4245	Acknowledgement

4247	   Funding for the RFC Editor function is currently provided by the
4248	   Internet Society.

4250	16. Backward Compatibility to RFC 3984

4252	   The current document is a revision of RFC 3984 and intends to
4253	   obsolete it.  This section addresses the backward compatibility
4254	   issues.

4256	   The technical changes are listed in section 17.

4258	   Items 1), 2), 3), 7), 8), 9), 11), and 12) are bug fixes and do not
4259	   incur any backward compatibility issues.

4261	   Item 4), addition of six new media type parameters, does not incur
4262	   any backward compatibility issues for SDP Offer/Answer based
4263	   applications, as legacy RFC 3984 receivers ignore these parameters,
4264	   and it is fine for legacy RFC 3984 senders not to use these
4265	   parameters as they are optional.  However, there is a backward
4266	   compatibility issue for SDP declarative usage based applications,
4267	   e.g. those using RTSP and SAP, because the SDP receiver per RFC 3984
4268	   cannot accept a session for which the SDP includes an unrecognized
4269	   parameter.  Therefore, the RTSP or SAP server may have to prepare two
4270	   sets of streams, one for legacy RFC 3984 receivers and one for
4271	   receivers according to this memo.

4273	   Items 5), 6) and 10) are related to out-of-band transport of
4274	   parameter sets.  When a sender according to this memo is
4275	   communicating with a legacy receiver according to RFC 3984, there is
4276	   no backward compatibility issue. When the legacy receiver sees an SDP
4277	   message with no parameter-add, the value of parameter-add is inferred
4278	   to be equal to 1 by the legacy receiver (related to change item 5)).
4279	   As RFC 3984 allows inclusion of any parameter sets in sprop-
4280	   parameter-sets, it is acceptable to the legacy receiver that the
4281	   sender includes parameter sets only for the default level in sprop-
4282	   parameter-sets (related to change item 6)).  When new parameters,
4283	   e.g., sprop-level-parameter-sets, are present, the legacy receiver
4284	   ignores them (related to change item 10)).  When a legacy sender
4285	   according to RFC 3984 is communicating with a receiver according to
4286	   this memo, there is one backward compatibility issue.  When the
4287	   legacy sender includes parameter sets for a level different from the
4288	   default level indicated by profile-level-id in sprop-parameter-sets,
4289	   the parameter value of sprop-parameter-sets is invalid to the
4290	   receiver and therefore the session may be rejected.  In SDP
4291	   Offer/Answer between a legacy offerer according to RFC 3984 and an
4292	   answerer according to this memo, when the answerer includes in the
4293	   answer parameter sets that are not a superset of the parameter sets
4294	   included in the offer, the parameter value of sprop-parameter-sets is
4295	   invalid to the offerer, and the session may not be initiated properly
4296	   (related to change item 10)).

4298	   Item 13) removed the recommendation to use out-of-band transport of
4299	   parameter sets.  As out-of-band transport of parameter sets is still
4300	   allowed, this change does not incur any backward compatibility
4301	   issues.

4303	   Item 14) does not incur any backward compatibility issues as the
4304	   added subsection 8.5 is informative.

4306	17. Changes from RFC 3984

4308	   Following is the list of technical changes (including bug fixes) from
4309	   RFC 3984.  Besides this list of technical changes, numerous editorial
4310	   changes have been made, but not documented in this memo.

4312	   1) In subsections 5.4, 5.5, 6.2, 6.3, and 6.4, removed the statement
4313	     that the packetization mode in use may be signaled by external means.

4315	   2) In subsection 7.2.2, changed the sentence

4317	      There are N VCL NAL units in the deinterleaving buffer.

4319	      to

4321	      There are N or more VCL NAL units in the de-interleaving buffer.

4323	   3) In subsection 8.1, the semantics of sprop-init-buf-time, paragraph
4324	     2, changed the sentence

4326	      The parameter is the maximum value of (transmission time of a NAL
4327	      unit - decoding time of the NAL unit), assuming reliable and
4328	      instantaneous transmission, the same timeline for transmission
4329	      and decoding, and that decoding starts when the first packet
4330	      arrives.

4332	      to

4334	      The parameter is the maximum value of (decoding time of the NAL
4335	      unit - transmission time of a NAL unit), assuming reliable and
4336	      instantaneous transmission, the same timeline for transmission
4337	      and decoding, and that decoding starts when the first packet
4338	      arrives.

4340	   4) Added six new media type parameters, namely max-smbps, sprop-
4341	     level-parameter-sets, use-level-parameter-sets, sprop-ssrc, sar-
4342	     understood and sar-supported.

4344	   5) In subsection 8.1, removed the specification of parameter-add.
4345	     Other descriptions of parameter-add (in subsections 8.2 and 8.4)
4346	     are also removed.

4348	   6) In subsection 8.1, added a constraint to sprop-parameter-sets such
4349	     that it can only contain parameter sets for the same profile and
4350	     level as indicated by profile-level-id.

4352	   7) In subsection 8.2.2, removed sprop-deint-buf-req from being part
4353	     of the media format configuration in usage with the SDP
4354	     Offer/Answer model.

4356	   8) In subsection 8.2.2, made it clear that level is downgradable in
4357	     the SDP Offer/Answer model, i.e. the use of the level part of
4358	     "profile-level-id" does not need to be symmetric (the level
4359	     included in the answer can be lower than or equal to the level
4360	     included in the offer).

4362	   9) In subsection 8.2.2, removed the statement that the capability
4363	     parameters may be used to declare encoding capabilities.

4365	   10) In subsection 8.2.2, added rules on how to use sprop-parameter-
4366	     sets and sprop-level-parameter-sets for out-of-band transport of
4367	     parameter sets, with or without level downgrading.

4369	   11) In subsection 8.2.2, clarified the rules of using the media type
4370	     parameters with SDP Offer/Answer for multicast.

4372	   12) In subsection 8.2.2, completed and corrected the list of how
4373	     different media type parameters shall be interpreted in the
4374	     different combinations of offer or answer and direction attribute.

4376	   13) In subsection 8.4, changed the text such that both out-of-band and
4377	     in-band transport of parameter sets are allowed and neither is
4378	     recommended or required.

4380	   14) Added subsection 8.5 (informative) providing example methods for
4381	     decoder refresh to handle parameter set losses.

4383	18. Open issues

4385	   The issues remaining open are:

4387	   1) (From Randell) References to RFC 2733 should be updated to (and
4388	     checked against) RFC 5109.  There are a lot of calculations and
4389	     the like that should be checked.  Also update [17] to RFC 5109.