idnits 2.17.1 

draft-ietf-avt-rtp-vc1-02.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 14.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1432.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1403.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1410.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1416.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     This parameter MUST not be used with Simple and Main profiles. (For
     Main profile, the presence of B-pictures is indicated by the MAXBFRAMES
     field in STRUCT_C decoder initialization parameter.) For Advanced
     profile, if this parameter is not specified, a value of 1 MUST be assumed.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 2005) is 6737 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '11' is mentioned on line 805, but not defined

  == Missing Reference: '12' is mentioned on line 1286, but not defined

  == Missing Reference: '13' is mentioned on line 1286, but not defined

  == Missing Reference: '10' is mentioned on line 1323, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 2327 (ref. '4') (Obsoleted by RFC 4566)

  ** Obsolete normative reference: RFC 3548 (ref. '6') (Obsoleted by RFC 4648)

  ** Obsolete normative reference: RFC 3555 (ref. '7') (Obsoleted by RFC
     4855, RFC 4856)


     Summary: 6 errors (**), 0 flaws (~~), 7 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force
3	Internet Draft                                               A. Klemets
4	Document: draft-ietf-avt-rtp-vc1-02.txt                       Microsoft
5	Expires: May 2006                                         November 2005

7	                RTP Payload Format for Video Codec 1 (VC-1)

9	Status of this Memo

11	   By submitting this Internet-Draft, each author represents that any
12	   applicable patent or other IPR claims of which he or she is aware
13	   have been or will be disclosed, and any of which he or she becomes
14	   aware will be disclosed, in accordance with Section 6 of BCP 79.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as Internet-
19	   Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet-Drafts as reference
24	   material or to cite them other than as "work in progress."

26	   The list of current Internet-Drafts can be accessed at
27	   http://www.ietf.org/ietf/1id-abstracts.txt

29	   The list of Internet-Draft Shadow Directories can be accessed at
30	   http://www.ietf.org/shadow.html.

32	Copyright Notice

34	      Copyright (C) The Internet Society (2005).

36	Abstract

38	   This memo specifies an RTP payload format for encapsulating Video
39	   Codec 1 (VC-1) compressed bit streams, as defined by the Society of
40	   Motion Picture and Television Engineers (SMPTE) standard, SMPTE 421M.
41	   SMPTE is the main standardizing body in the motion imaging industry
42	   and the SMPTE 421M standard defines a compressed video bit stream
43	   format and decoding process for television.

45	Table of Contents

47	   1. Introduction
48	      1.1 Conventions used in this document
49	   2. Definitions and abbreviations
50	   3. Overview of VC-1
51	      3.1 VC-1 bit stream layering model
52	      3.2 Bit-stream Data Units in Advanced profile
53	      3.3 Decoder initialization parameters
54	      3.4 Ordering of frames
55	   4. Encapsulation of VC-1 format bit streams in RTP
56	      4.1 Access Units
57	      4.2 Fragmentation of VC-1 frames
58	      4.3 Time stamp considerations
59	      4.4 Random Access Points
60	      4.5 Removal of HRD parameters
61	      4.6 Repeating the Sequence Layer header
62	      4.7 Signaling of MIME format parameters
63	      4.8 MIME "mode=1" parameter
64	      4.9 MIME "mode=3" parameter
65	   5. RTP Payload Format syntax
66	      5.1 RTP header usage
67	      5.2 AU header syntax
68	      5.3 AU Control field syntax
69	   6. RTP Payload format parameters
70	      6.1 Media Type Registration
71	      6.2 Mapping of MIME parameters to SDP
72	      6.3 Usage with the SDP Offer/Answer Model
73	      6.4 Usage in Declarative Session Descriptions
74	   7. Security Considerations
75	   8. IANA Considerations
76	   9. References
77	      9.1 Normative references
78	      9.2 Informative references

80	1. Introduction

82	   This memo specifies an RTP payload format for the video coding
83	   standard Video Codec 1, also known as VC-1.  The specification for
84	   the VC-1 bit stream format and decoding process is published by the
85	   Society of Motion Picture and Television Engineers (SMPTE) as SMPTE
86	   421M [1].

88	   VC-1 has broad applicability, ranging from low bit rate Internet
89	   streaming applications to HDTV broadcast and Digital Cinema
90	   applications with nearly lossless coding.  The overall performance of
91	   VC-1 is such that bit rate savings of more than 50% are reported [8],
92	   when compared against MPEG-2.  See [8] for further details about how
93	   VC-1 compares against other codecs, such as MPEG-4 and H.264/AVC.
94	   (In [8], VC-1 is referred to by its earlier name, VC-9.)

96	   VC-1 is widely used for downloading and streaming of movies on the
97	   Internet, in the form of Windows Media Video 9 (WMV-9) [8], because
98	   the WMV-9 codec is compliant with the VC-1 standard.  VC-1 has also
99	   recently been adopted as a mandatory compression format for the high-
100	   definition DVD formats HD DVD and Blu-ray.

102	   SMPTE 421M defines the VC-1 bit stream syntax and specifies
103	   constraints that must be met by VC-1 conformant bit streams.  SMPTE
104	   421M also specifies the complete process required to decode the bit
105	   stream.  However, it does not specify the VC-1 compression algorithm,
106	   thus allowing for different ways to implement a VC-1 encoder.

108	   The VC-1 bit stream syntax has three profiles. Each profile has
109	   specific bit stream syntax elements and algorithms associated with
110	   it.  Depending on the application in which VC-1 is used, some
111	   profiles may be more suitable than others.  For example, Simple
112	   profile is designed for low bit rate Internet streaming and for
113	   playback on devices that can only handle low complexity decoding.
114	   Advanced profile is designed for broadcast applications, such as
115	   digital TV, HD DVD or HDTV.  Advanced profile is the only VC-1
116	   profile that supports interlaced video frames and non-square pixels.

118	   Section 2 defines the abbreviations used in this document.  Section 3
119	   provides a more detailed overview of VC-1.  Sections 4 and 5 define
120	   the RTP payload format for VC-1, and section 6 defines the MIME and
121	   SDP parameters for VC-1.  See section 7 for security considerations.

123	1.1 Conventions used in this document

125	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
126	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
127	   document are to be interpreted as described in BCP 14, RFC 2119 [2].

129	2. Definitions and abbreviations

132	   This document uses the definitions in SMPTE 421M [1].  For
133	   convenience, the following terms from SMPTE 421M are restated here:

135	   B-picture: A picture that is coded using motion compensated
136	   prediction from past and/or future reference fields or frames.  A B-
137	   picture cannot be used for predicting any other picture.

139	   Bit-stream data unit (BDU): A unit of the compressed data which may
140	   be parsed (i.e., syntax decoded) independently of other information
141	   at the same hierarchical level.  A BDU can be, for example, a
142	   sequence layer header, an entry-point header, a frame, or a slice.

144	   Encapsulated BDU (EBDU): A BDU which has been encapsulated using the
145	   encapsulation mechanism described in Annex E of SMPTE 421M [1], to
146	   prevent emulation of the start code prefix in the bit stream.

148	   Entry-point: A point in the bit stream that offers random access.

150	   frame: A frame contains lines of spatial information of a video
151	   signal.  For progressive video, these lines contain samples starting
152	   from one time instant and continuing through successive lines to the
153	   bottom of the frame.  For interlaced video, a frame consists of two
154	   fields, a top field and a bottom field.  One of these fields will
155	   commence one field period later than the other.

157	   interlace: The property of frames where alternating lines of the
158	   frame represent different instances in time.  In an interlaced frame,
159	   one of the fields is meant to be displayed first.

161	   I-picture: A picture coded using information only from itself.

163	   level: A defined set of constraints on the values which may be taken
164	   by the parameters (such as bit rate and buffer size) within a
165	   particular profile.  A profile may contain one or more levels.

167	   P-picture: A picture that is coded using motion compensated
168	   prediction from past reference fields or frames.

170	   picture: For progressive video, a picture is identical to a frame,
171	   while for interlaced video, a picture may refer to a frame, or the
172	   top field or the bottom field of the frame depending on the context.

174	   profile: A defined subset of the syntax of VC-1, with a specific set
175	   of coding tools, algorithms, and syntax associated with it.  There
176	   are three VC-1 profiles: Simple, Main and Advanced.

178	   progressive: The property of frames where all the samples of the
179	   frame represent the same instance in time.

181	   random access: A random access point in the bit stream is defined by
182	   the following guarantee: If decoding begins at this point, all frames
183	   needed for display after this point will have no decoding dependency
184	   on any data preceding this point, and are also present in the
185	   decoding sequence after this point.  A random access point is also
186	   called an entry-point.

188	   sequence: A coded representation of a series of one or more pictures.
189	   In VC-1 Advanced profile, a sequence consists of a series of one or
190	   more entry-point segments, where each entry-point segment consists of
191	   a series of one or more pictures, and where the first picture in each
192	   entry-point segment provides random access.  In VC-1 Simple and Main
193	   profiles, the first picture in each sequence is an I-picture.

195	   slice: A consecutive series of macroblock rows in a picture, which
196	   are encoded as a single unit.

198	   start codes (SC): 32-bit codes embedded in the coded bit stream that
199	   are unique, and identify the beginning of a BDU.  Start codes consist
200	   of a unique three-byte Start Code Prefix (SCP), and a one-byte Start
201	   Code Suffix (SCS).

203	3. Overview of VC-1

205	   The VC-1 bit stream syntax consists of three profiles: Simple, Main,
206	   and Advanced.  Simple and Main profiles are designed for relatively
207	   low bit rate applications.  For example, the maximum bit rate
208	   supported by Simple profile is 384 kbps.  Certain features that can
209	   be used to achieve high compression efficiency, such as non-square
210	   pixels and support for interlaced pictures, are only included in
211	   Advanced profile.

213	   The maximum bit rate supported by the Advanced profile is 135 Mbps,
214	   making it suitable for nearly lossless encoding of HDTV signals.
215	   Only Advanced profile supports carrying user-data (meta-data) in-band
216	   with the compressed bit stream.  The user-data can be used for closed
217	   captioning support, for example.

219	   Of the three profiles, only Advanced profile allows codec
220	   configuration parameters, such as the picture aspect ratio, to be
221	   changed through in-band signaling in the compressed bit stream.

223	   For each of the profiles, a certain number of "levels" have been
224	   defined.  Unlike a "profile", which implies a certain set of features
225	   or syntax elements, a "level" is a set of constraints on the values
226	   of parameters in a profile, such as the bit rate or buffer size.  VC-
227	   1 Simple profile has two levels, Main profile has three, and Advanced
228	   profile has five levels.  See Annex D of SMPTE 421M [1] for a
229	   detailed list of the profiles and levels.

231	3.1 VC-1 bit stream layering model

233	   The VC-1 bit stream is defined as a hierarchy of layers.  This is
234	   conceptually similar to the notion of a protocol stack of networking
235	   protocols.  The outermost layer is called the sequence layer.  The
236	   other layers are entry-point, picture, slice, macroblock and block.

238	   In Simple and Main profiles, a sequence in the sequence layer
239	   consists of a series of one or more coded pictures.  In Advanced
240	   profile, a sequence consists of one or more entry-point segments,
241	   where each entry-point segment consists of a series of one or more
242	   pictures, and where the first picture in each entry-point segment
243	   provides random access.  A picture is decomposed into macroblocks.  A
244	   slice comprises one or more contiguous rows of macroblocks.

246	   The entry-point and slice layers are only present in Advanced
247	   profile.  In Advanced profile, the start of each entry-point layer
248	   segment indicates a random access point.  In Simple and Main profiles
249	   each I-picture is a random access point.

251	   Each picture can be coded as an I-picture, P-picture, skipped
252	   picture, or as a B-picture.  These terms are defined in section 2 of
253	   this document and in section 4.12 of SMPTE 421M [1].

255	3.2 Bit-stream Data Units in Advanced profile

257	   In Advanced profile only, each picture and slice is byte-aligned and
258	   is considered a Bit-stream Data Unit (BDU).  A BDU is defined as a
259	   unit that can be parsed (i.e., syntax decoded) independently of other
260	   information in the same layer.

262	   The beginning of a BDU is signaled by an identifier called Start Code
263	   (SC).  Sequence layer headers and entry-point headers are also BDUs
264	   and thus can be easily identified by their Start Codes.  See Annex E
265	   of SMPTE 421M [1] for a complete list of Start Codes.  Note that
266	   blocks and macroblocks are not BDUs and thus do not have a Start Code
267	   and are not necessarily byte-aligned.

269	   The Start Code consists of four bytes.  The first three bytes are
270	   0x00, 0x00 and 0x01. The fourth byte is called the Start Code Suffix
271	   (SCS) and it is used to indicate the type of BDU that follows the
272	   Start Code.  For example, the SCS of a sequence layer header (0x0F)
273	   is different from the SCS of an entry-point header (0x0E).  The Start
274	   Code is always byte-aligned and is transmitted in network byte order.
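
As an illustration of the Start Code layout described above, the following is a minimal sketch that scans a byte string for the three-byte prefix and reports the suffix byte that follows it.  The function name and the suffix table are hypothetical; only the two suffix values quoted in the text are listed, and Annex E of SMPTE 421M remains the authoritative list.

```python
START_CODE_PREFIX = b"\x00\x00\x01"

# Only the two suffix values quoted in the text are listed (illustrative).
KNOWN_SUFFIXES = {0x0F: "sequence layer header", 0x0E: "entry-point header"}


def find_start_codes(data: bytes):
    """Return a list of (offset, suffix) pairs, one per Start Code in data."""
    found = []
    pos = data.find(START_CODE_PREFIX)
    # A Start Code needs one more byte (the suffix) after the prefix.
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, data[pos + 3]))
        # Resume the search after the suffix byte.
        pos = data.find(START_CODE_PREFIX, pos + 4)
    return found


if __name__ == "__main__":
    sample = bytes([0x00, 0x00, 0x01, 0x0F, 0xAA, 0x00, 0x00, 0x01, 0x0E, 0xBB])
    for offset, suffix in find_start_codes(sample):
        print(offset, hex(suffix), KNOWN_SUFFIXES.get(suffix, "other BDU"))
```

The sketch assumes its input is already in EBDU format, so that the prefix cannot appear by accident inside the coded data.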

276	   To prevent accidental emulation of the Start Code in the coded bit
277	   stream, SMPTE 421M defines an encapsulation mechanism that uses byte
278	   stuffing.  A BDU which has been encapsulated by this mechanism is
279	   referred to as an Encapsulated BDU, or EBDU.

281	3.3 Decoder initialization parameters

283	   In VC-1 Advanced profile, the sequence layer header contains
284	   parameters that are necessary to initialize the VC-1 decoder.

286	   A sequence layer header is not defined for VC-1 Simple and Main
287	   profiles.  For these profiles, decoder initialization parameters MUST
288	   be conveyed out-of-band from the coded bit stream.  Section 4.7
289	   specifies how the parameters are conveyed by this RTP payload format.

291	   For Advanced profile, the parameters in the sequence layer header
292	   apply to all entry-point segments until the next occurrence of a
293	   sequence layer header in the coded bit stream.

295	   The parameters in the sequence layer header include the Advanced
296	   profile level, the dimensions of the coded pictures, the aspect
297	   ratio, interlace information, the frame rate and up to 31 leaky
298	   bucket parameter sets for the Hypothetical Reference Decoder (HRD).

300	   Section 6.1 of SMPTE 421M [1] provides the formal specification of
301	   the sequence layer header.

303	   Each leaky bucket parameter set for the HRD specifies a peak
304	   transmission bit rate and a decoder buffer capacity.  The coded bit
305	   stream is restricted by these parameters.  The HRD model does not
306	   mandate buffering by the decoder.  Its purpose is to limit the
307	   encoder's bit rate fluctuations according to a basic buffering model,
308	   so that the resources necessary to decode the bit stream are
309	   predictable.  The HRD has a constant-delay mode and a variable-delay
310	   mode.  The constant-delay mode is appropriate for broadcast and
311	   streaming applications, while the variable-delay mode is designed for
312	   video conferencing applications.

314	   Annex C of SMPTE 421M [1] specifies the usage of the hypothetical
315	   reference decoder for VC-1 bit streams.  A general description of the
316	   theory of the HRD can be found in [9].

318	   The concept of an entry-point layer applies only to VC-1 Advanced
319	   profile.  The presence of an entry-point header indicates a random
320	   access point within the bit stream.  The entry-point header specifies
321	   current buffer fullness values for the leaky buckets in the HRD.  The
322	   header also specifies coding control parameters that are in effect
323	   until the occurrence of the next entry-point header in the bit
324	   stream.  See Section 6.2 of SMPTE 421M [1] for the formal
325	   specification of the entry-point header.

327	3.4 Ordering of frames

329	   Frames are transmitted in the same order in which they are captured,
330	   except if B-pictures are present in the coded bit stream.  In the
331	   latter case, the frames are transmitted such that the frames that the
332	   B-pictures depend on are transmitted first.  This is referred to as
333	   the coded order of the frames.

335	   The rules for how a decoder converts frames from the coded order to
336	   the display order are stated in section 5.4 of SMPTE 421M [1].  In
337	   short, if B-pictures may be present in the coded bit stream, a
338	   hypothetical decoder implementation needs to buffer one additional
339	   decoded frame.  When an I-frame or a P-frame is received, the frame
340	   can be decoded immediately but it is not displayed until the next I-
341	   or P-frame is received.  However, B-frames are displayed immediately.

343	   Figure 1 illustrates the timing relationship between the capture of
344	   frames, their coded order, and the display order of the decoded
345	   frames, when B-pictures are present in the coded bit stream.  The
346	   figure shows that the display of frame P4 is delayed until frame P7
347	   is received, while frames B2 and B3 are displayed immediately.

349	   Capture:        |I0  P1  B2  B3  P4  B5  B6  P7  B8  B9  ...
350	                   |
351	   Coded order:    |        I0  P1  P4  B2  B3  P7  B5  B6  ...
352	                   |
353	   Display order:  |            I0  P1  B2  B3  P4  B5  B6  ...
354	                   |
355	                   |+---+---+---+---+---+---+---+---+---+--> time
356	                    0   1   2   3   4   5   6   7   8   9

358	   Figure 1.  Frame reordering when B-pictures are present.

360	   If B-pictures are not present, the coded order and the display order
361	   are identical, and frames can then be displayed without the
362	   additional delay shown in Figure 1.
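
The reordering rule summarized in this section can be sketched as follows.  This is an illustration only, not the normative procedure of section 5.4 of SMPTE 421M: frames are represented by hypothetical string labels such as "I0" or "B2", and the flush of the last buffered frame at the end of the stream is an assumption made so that the example terminates cleanly.

```python
def coded_to_display_order(coded_frames):
    """Reorder frame labels (e.g. "I0", "P4", "B2") from coded to display order."""
    display = []
    held = None  # the one buffered I- or P-frame that is awaiting display
    for frame in coded_frames:
        if frame.startswith("B"):
            # B-frames are displayed immediately.
            display.append(frame)
        else:
            # An I- or P-frame releases the previously buffered I- or P-frame.
            if held is not None:
                display.append(held)
            held = frame
    if held is not None:
        display.append(held)  # end-of-stream flush (assumption)
    return display


if __name__ == "__main__":
    coded = ["I0", "P1", "P4", "B2", "B3", "P7", "B5", "B6"]
    print(coded_to_display_order(coded))
```

Applied to the coded order of Figure 1, the sketch yields I0, P1, B2, B3, P4, B5, B6, followed by the flushed P7.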

364	4. Encapsulation of VC-1 format bit streams in RTP

366	4.1 Access Units

368	   Each RTP packet contains an integral number of application data units
369	   (ADUs).  For VC-1 format bit streams, an ADU is equivalent to one
370	   Access Unit (AU).  An Access Unit is defined as the AU header
371	   (defined in section 5.2) followed by a variable length payload, with
372	   the rules and constraints described in sections 4.1 and 4.2.  Figure
373	   2 shows the layout of an RTP packet with multiple AUs.

375	   +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+
376	   | RTP     | AU(1) | AU(2) |     | AU(n) |
377	   | Header  |       |       |     |       |
378	   +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+

380	   Figure 2.  RTP packet structure.

382	   Each Access Unit MUST start with the AU header defined in section
383	   5.2.  The AU payload MUST contain data belonging to exactly one VC-1
384	   frame.  This means that data from different VC-1 frames will always
385	   be in different AUs; however, it is possible for a single VC-1 frame
386	   to be fragmented across multiple AUs (see section 4.2).

388	   The following rules apply to the contents of each AU payload when VC-
389	   1 Advanced profile is used:

391	   - The AU payload MUST contain VC-1 bit stream data in EBDU format
392	     (i.e., the bit stream must use the byte-stuffing encapsulation
393	     mode defined in Annex E of SMPTE 421M [1].)

395	   - The AU payload MAY contain multiple EBDUs, e.g., a sequence layer
396	     header, an entry-point header, a picture header and multiple
397	     slices and the associated user-data.  (However, all slices and
398	     their corresponding macroblocks MUST belong to the same video
399	     frame.)

401	   - The AU payload MUST start at an EBDU boundary, except when the AU
402	     payload contains a fragmented frame, in which case the rules in
403	     section 4.2 apply.

405	   When VC-1 Simple or Main profiles are used, the AU payload MUST start
406	   with a picture header, except when the AU payload contains a
407	   fragmented frame.  Section 4.2 describes how to handle fragmented
408	   frames.

410	   Access Units MUST be byte-aligned.  If the data in an AU (EBDUs in
411	   the case of Advanced profile and frame in the case of Simple and
412	   Main) does not end at an octet boundary, up to 7 zero-valued padding
413	   bits MUST be added to achieve octet-alignment.
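
The padding rule above amounts to simple modular arithmetic.  The following sketch, with hypothetical function names and a bit string standing in for the coded data, computes the number of zero-valued bits needed to reach the next octet boundary.

```python
def padding_bits(bit_length: int) -> int:
    """Number of zero bits (0 to 7) needed to reach an octet boundary."""
    return (-bit_length) % 8


def pad_to_octet(bits: str) -> str:
    """Append zero-valued padding bits to a string of '0'/'1' characters."""
    return bits + "0" * padding_bits(len(bits))


if __name__ == "__main__":
    print(padding_bits(13))       # 13 bits need 3 more to fill two octets
    print(pad_to_octet("10111"))  # five data bits followed by three zero bits
```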

415	4.2 Fragmentation of VC-1 frames

417	   Each AU payload SHOULD contain a complete VC-1 frame.  However, if
418	   this would cause the RTP packet to exceed the MTU size, the frame
419	   SHOULD be fragmented into multiple AUs to avoid IP-level
420	   fragmentation.  When an AU contains a fragmented frame, this MUST be
421	   indicated by setting the FRAG field in the AU header as defined in
422	   section 5.3.

424	   AU payloads that do not contain a fragmented frame, or that contain
425	   the first fragment of a frame, MUST start at an EBDU boundary if
426	   Advanced profile is used.  In this case, for Simple and Main
427	   profiles, the AU payload MUST begin with the start of a picture
428	   header.

430	   If Advanced profile is used, AU payloads that contain a fragment of a
431	   frame other than the first fragment SHOULD start at an EBDU
432	   boundary, such as at the start of a slice.

434	   However, slices are only defined for Advanced profile, and are not
435	   always used.  Blocks and macroblocks are not BDUs (have no Start
436	   Code) and are not byte-aligned.  Therefore, it may not always be
437	   possible to continue a fragmented frame at an EBDU boundary.

439	   In the case of Simple and Main profiles, since the blocks and
440	   macroblocks are not byte-aligned, the fragmentation boundary may be
441	   chosen arbitrarily.

443	   If an RTP packet contains an AU with the last fragment of a frame,
444	   additional AUs SHOULD NOT be included in the RTP packet.

446	   If the PTS Delta field in the AU header is present, each fragment of
447	   a frame MUST have the same presentation time.  If the DTS Delta field
448	   in the AU header is present, each fragment of a frame MUST have the
449	   same decode time.
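
A minimal sketch of the fragmentation decision follows.  It splits at arbitrary byte positions, which the text permits for Simple and Main profiles; an Advanced profile sender would instead prefer EBDU boundaries for continuation fragments.  The labels "complete", "first", "middle" and "last" are hypothetical stand-ins for the FRAG field values defined in section 5.3, and max_payload is assumed to be the space left in the packet after all headers have been accounted for.

```python
def fragment_frame(frame: bytes, max_payload: int):
    """Split one frame into (frag_label, payload) pairs of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if len(frame) <= max_payload:
        # The whole frame fits in a single AU payload.
        return [("complete", frame)]
    chunks = [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]
    # At least two chunks exist here, so "first" and "last" are always present.
    labels = ["first"] + ["middle"] * (len(chunks) - 2) + ["last"]
    return list(zip(labels, chunks))


if __name__ == "__main__":
    for label, payload in fragment_frame(b"abcdefgh", 3):
        print(label, payload)
```

Each resulting fragment would be carried in its own AU, with identical presentation and decode times where those fields are present.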

451	4.3 Time stamp considerations

453	   Video frames MUST be transmitted in the coded order.  Coded order
454	   implies that no frames are dependent on subsequent frames, as
455	   discussed in section 3.4.  The RTP timestamp field MUST be set to the
456	   presentation time of the video frame contained in the first AU in the
457	   RTP packet.  The presentation time can be used as the timestamp field
458	   in the RTP header because it differs from the sampling instant of the
459	   frame only by an arbitrary constant offset.

461	   Each AU header MAY specify the decode time of the video frame
462	   contained in the AU.  If B-pictures will not be present in the coded
463	   bit stream, then the decode time of a frame MUST be equal to the
464	   presentation time of the frame.

466	   If B-pictures may be present in the coded bit stream, then the decode
467	   time of non-B frames MUST be equal to the presentation time of the
468	   previous non-B frame in the coded order.  The decode time of B-frames
469	   MUST be equal to the presentation time of the B-frame.

471	   As an example, consider Figure 1 in section 3.4.  The decode time of
472	   non-B frame P4 is 4 time units, which is equal to the presentation
473	   time of the previous non-B frame in the coded order, which is P1.  On
474	   the other hand, the decode time of B-frame B2 is 5 time units, which
475	   is identical to its presentation time.
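
The decode time rules can be checked against Figure 1 with a short sketch.  Frames are given as hypothetical (label, presentation time) pairs in coded order.  The text does not state a decode time for the very first non-B frame, which has no predecessor, so the sketch leaves it as None; that choice is an assumption.

```python
def decode_times(coded_frames, b_pictures_possible=True):
    """Map each frame label to its decode time.

    coded_frames: list of (label, presentation_time) pairs in coded order.
    """
    result = {}
    prev_non_b_pts = None  # presentation time of the previous non-B frame
    for label, pts in coded_frames:
        if not b_pictures_possible or label.startswith("B"):
            # Without B-pictures, and for B-frames themselves, decode time
            # equals presentation time.
            result[label] = pts
        else:
            # Non-B frame: presentation time of the previous non-B frame.
            result[label] = prev_non_b_pts
            prev_non_b_pts = pts
    return result


if __name__ == "__main__":
    # Presentation times read from the display order row of Figure 1.
    coded = [("I0", 3), ("P1", 4), ("P4", 7), ("B2", 5), ("B3", 6)]
    print(decode_times(coded))
```

For the frames of Figure 1 this gives a decode time of 4 for P4 and 5 for B2, matching the example in the text.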

477	   Knowing if the stream will contain B-pictures may help the receiver
478	   allocate resources more efficiently and can reduce delay, as an
479	   absence of B-pictures in the stream implies that no reordering
480	   of frames will be needed between the decoding process and the display
481	   of the decoded frames.  This may be important for interactive
482	   applications.

484	   The receiver MUST assume that the coded bit stream may contain B-
485	   pictures in the following cases:

487	   - Advanced profile: If the value of the "bpic" MIME parameter
488	     defined in section 6.1 is 1, or if the "bpic" parameter is not
489	     specified.

491	   - Main profile: If the MAXBFRAMES field in STRUCT_C decoder
492	     initialization parameter has a non-zero value.  STRUCT_C is
493	     conveyed in the MIME "config" parameter, which is defined in
494	     section 6.1.

496	   Simple profile does not use B-pictures.
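
The cases above can be collected into one receiver-side predicate.  This is a sketch under stated assumptions: the profile is passed as a hypothetical lowercase string rather than as the actual MIME parameter value, bpic is None when the "bpic" parameter is absent, and maxbframes is assumed to have been extracted from STRUCT_C already.

```python
def may_contain_b_pictures(profile, bpic=None, maxbframes=0):
    """Return True if the receiver must assume B-pictures may be present."""
    if profile == "advanced":
        # An absent "bpic" parameter is treated the same as bpic=1.
        return bpic is None or bpic == 1
    if profile == "main":
        return maxbframes != 0
    if profile == "simple":
        # Simple profile does not use B-pictures.
        return False
    raise ValueError("unknown profile: %r" % (profile,))


if __name__ == "__main__":
    print(may_contain_b_pictures("advanced"))            # "bpic" not specified
    print(may_contain_b_pictures("advanced", bpic=0))
    print(may_contain_b_pictures("main", maxbframes=2))
```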

498	4.4 Random Access Points

500	   The entry-point header contains information that is needed by the
501	   decoder to decode the frames in that entry-point segment.  This means
502	   that in the event of lost RTP packets the decoder may be unable to
503	   decode frames until the next entry-point header is received.

505	   The first frame after an entry-point header is a random access point
506	   into the coded bit stream.  Simple and Main profiles do not have
507	   entry-point headers, so for those profiles each I-picture is a random
508	   access point.

510	   To allow the RTP receiver to detect that an RTP packet which was lost
511	   contained a random access point, this RTP payload format defines a
512	   field called "RA Count".  This field is present in every AU, and its
513	   value is incremented (modulo 256) for every random access point.  For
514	   additional details, see the definition of "RA Count" in section 5.2.
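
One way a receiver might use this counter is sketched below.  The exact semantics of "RA Count" are given in section 5.2; here it is only assumed that the counter advances by one (modulo 256) per random access point, and that the receiver tracks how many random access points it actually received between the two AUs being compared.  Both function names are hypothetical.

```python
def ra_count_delta(prev_ra_count: int, cur_ra_count: int) -> int:
    """Number of counter increments between two AUs, modulo 256."""
    return (cur_ra_count - prev_ra_count) % 256


def random_access_point_was_lost(prev_ra_count, cur_ra_count, received_ra_points):
    """True if the counter advanced more often than random access points were seen."""
    return ra_count_delta(prev_ra_count, cur_ra_count) != received_ra_points % 256


if __name__ == "__main__":
    # The counter moved from 10 to 11 across a gap, but no random access
    # point was received, so one must have been in a lost packet.
    print(random_access_point_was_lost(10, 11, 0))
    # Wrap-around from 255 to 0 with one random access point received.
    print(random_access_point_was_lost(255, 0, 1))
```

Because the counter is only 8 bits wide, a gap spanning an exact multiple of 256 random access points would go undetected by this comparison.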

516	   To make it easy to determine if an AU contains a random access point,
517	   this RTP payload format also defines a bit called the "RA" flag in
518	   the AU Control field.  This bit is set to 1 only on those AUs that
519	   contain a random access point.  The RA bit is defined in section 5.3.

521	4.5 Removal of HRD parameters

523	   The sequence layer header of Advanced profile may include up to 31
524	   leaky bucket parameter sets for the Hypothetical Reference Decoder
525	   (HRD).  Each leaky bucket parameter set specifies a possible peak
526	   transmission bit rate (HRD_RATE) and a decoder buffer capacity
527	   (HRD_BUFFER).  (See section 3.3 for additional discussion about the
528	   HRD.)

530	   If the actual peak transmission rate is known by the RTP sender, the
531	   RTP sender MAY remove all leaky bucket parameter sets except for the
532	   one corresponding to the actual peak transmission rate.

534	   For each leaky bucket parameter set in the sequence layer header,
535	   there is also a parameter in the entry-point header that specifies the
536	   initial fullness (HRD_FULL) of the leaky bucket.

538	   If the RTP sender has removed any leaky bucket parameter sets from
539	   the sequence layer header, then for any removed leaky bucket
540	   parameter set, it MUST also remove the corresponding HRD_FULL
541	   parameter in the entry-point header.

543	   Removing leaky bucket parameter sets, as described above, may
544	   significantly reduce the size of the sequence layer headers and the
545	   entry-point headers.
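
   The pairing rule described in this section can be sketched as
   follows.  The sketch works on already-decoded values rather than on
   the coded headers, and the function name, the list representation
   and the exact-match selection of the leaky bucket are assumptions
   made only for illustration.

```python
def keep_single_leaky_bucket(buckets, hrd_full, actual_rate):
    """Keep only the leaky bucket that matches the actual peak rate.

    buckets:  list of (hrd_rate, hrd_buffer) tuples, as decoded from
              the sequence layer header (hypothetical representation).
    hrd_full: list of HRD_FULL values from the entry-point header,
              one per leaky bucket, in the same order.
    """
    if len(buckets) != len(hrd_full):
        raise ValueError("one HRD_FULL value is required per leaky bucket")
    for index, (rate, _buffer) in enumerate(buckets):
        if rate == actual_rate:
            # Removing a bucket also removes its HRD_FULL parameter,
            # so both lists are reduced to the same single entry.
            return [buckets[index]], [hrd_full[index]]
    raise ValueError("no leaky bucket matches the actual peak rate")
```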

547	4.6 Repeating the Sequence Layer header

549	   To improve robustness against loss of RTP packets, it is RECOMMENDED
550	   that if the sequence layer header changes, it be repeated
551	   frequently in the bit stream.  In this case, it is RECOMMENDED
552	   that the number of leaky bucket parameters in the sequence layer
553	   header and the entry point headers be reduced to one, as described in
554	   section 4.5.  This will help reduce the overhead caused by repeating
555	   the sequence layer header.

557	   Note that any data in the VC-1 bit stream, including repeated copies
558	   of the sequence header itself, must be accounted for when computing
559	   the leaky bucket parameter for the HRD.  (See section 3.3 for a
560	   discussion about the HRD.)

562	   Note that if the value of TFCNTRFLAG in the sequence layer header is
563	   1, each picture header contains a frame counter field (TFCNTR).  Each
564	   time the sequence layer header is inserted in the bit stream, the
565	   value of this counter MUST be reset.

567	   To allow the RTP receiver to detect that an RTP packet which was lost
568	   contained a new sequence layer header, the AU Control field defines a
569	   bit called the "SL" flag.  This bit is toggled when a sequence layer
570	   header is transmitted, but only if that header is different from the
571	   most recently transmitted sequence layer header.  The SL bit is
572	   defined in section 5.3.

574	4.7 Signaling of MIME format parameters

576	   When this RTP payload format is used with SDP, the decoder
577	   initialization parameters described in section 3.3 MUST be signaled
578	   in SDP using the MIME parameters specified in section 6.1.  Section
579	   6.2 specifies how to map the MIME parameters to SDP.

581	   When Advanced profile is used, the decoder initialization parameters
582	   MAY be changed by inserting a new sequence layer header or an entry-
583	   point header in the coded bit stream.

585	   When Simple or Main profiles are used, it is not possible to change
586	   the decoder initialization parameters through the coded bit stream
587	   itself.  Any changes to the decoder initialization parameters would
588	   have to be done through out-of-band means, e.g., by updating the SDP
589	   [5].

591	   Note that the sequence layer header specifies the encoding level, the
592	   maximum size of the coded pictures and possibly also the maximum
593	   frame rate.  Thus, if the sequence layer header changes, the new
594	   header supersedes the values of the MIME parameters "level", "width",
595	   "height" and "framerate".

597	4.8 MIME "mode=1" parameter

599	   In certain applications using Advanced profile, the sequence layer
600	   header never changes.  This MAY be signaled with the MIME parameter
601	   "mode=1". (The "mode" parameter is defined in section 6.1.)  The
602	   "mode=1" parameter serves as a "hint" to the RTP receiver that all
603	   sequence layer headers in the bit stream will be identical.  If
604	   "mode=1" is signaled and a sequence layer header is present in the
605	   coded bit stream, then it MUST be identical to the sequence layer
606	   header specified by the MIME "config" parameter.

608	   Since the sequence layer header never changes in "mode=1", the RTP
609	   sender MAY remove it from the bit stream.  Note, however, that if
610	   the value of TFCNTRFLAG in the sequence layer header is 1,
611	   each picture header contains a frame counter field (TFCNTR).  This
612	   field is reset each time the sequence layer header occurs in the bit
613	   stream.  If the RTP sender chooses to remove the sequence layer
614	   header, then it MUST ensure that the resulting bit stream is still
615	   compliant with the VC-1 specification (e.g., by adjusting the TFCNTR
616	   field, if necessary).

618	4.9 MIME "mode=3" parameter

620	   In certain applications using Advanced profile, both the sequence
621	   layer header and the entry-point header never change.  This MAY be
622	   signaled with the MIME parameter "mode=3".  The same rules apply to
623	   "mode=3" as for "mode=1", described in section 4.8.  Additionally, if
624	   "mode=3" is signaled, then the RTP sender MAY "compress" the coded
625	   bit stream by not including sequence layer headers and entry-point
626	   headers in the RTP packets.

628	   The RTP receiver MUST "decompress" the coded bit stream by re-
629	   inserting the entry-point headers prior to delivering the coded bit
630	   stream to the VC-1 decoder.  The sequence layer header does not need
631	   to be decompressed by the receiver, since it never changes.

633	   If "mode=3" is signaled and the RTP receiver receives a complete AU
634	   or the first fragment of an AU, and the RA bit is set to 1 but the AU
635	   does not begin with an entry-point header, then this indicates that
636	   the entry-point header has been "compressed".  In that case, the RTP
637	   receiver MUST insert an entry-point header at the beginning of the
638	   AU.  When inserting the entry-point header, the RTP receiver MUST use
639	   the one that was specified by the MIME "config" parameter.
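
   The "decompression" rule for "mode=3" can be sketched as follows.
   This is an illustration under stated assumptions, not a normative
   procedure.  The function names are hypothetical.  The start code
   values (0x0000010E for an entry-point header and 0x0000010F for a
   sequence layer header) are assumed from SMPTE 421M [1] and should be
   verified against that specification.  An AU that begins with a
   sequence layer header is passed through unchanged, which is a
   simplification.

```python
ENTRY_POINT_START_CODE = b"\x00\x00\x01\x0e"      # assumed value
SEQUENCE_HEADER_START_CODE = b"\x00\x00\x01\x0f"  # assumed value


def entry_point_header_from_config(config_hex):
    """Extract the entry-point header (in EBDU format) from the value
    of the "config" parameter of an Advanced profile stream."""
    config = bytes.fromhex(config_hex)
    pos = config.find(ENTRY_POINT_START_CODE)
    if pos < 0:
        raise ValueError("config contains no entry-point header")
    return config[pos:]


def decompress_au(payload, frag, ra_bit, entry_point_header):
    """Re-insert the entry-point header at the start of an AU payload
    when it has been omitted at a random access point."""
    # FRAG 1 is the first fragment of a frame, FRAG 3 a complete frame.
    starts_frame = frag in (1, 3)
    if not (ra_bit and starts_frame):
        return payload
    if payload.startswith(ENTRY_POINT_START_CODE):
        return payload  # header was not omitted
    if payload.startswith(SEQUENCE_HEADER_START_CODE):
        return payload  # simplification: not handled by this sketch
    return entry_point_header + payload
```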

641	5. RTP Payload Format syntax

643	5.1 RTP header usage

645	   The format of the RTP header is specified in RFC 3550 [3] and is
646	   reprinted in Figure 3 for convenience.

648	      0                   1                   2                   3
649	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
650	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
651	     |V=2|P|X|  CC   |M|     PT      |       sequence number         |
652	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
653	     |                           timestamp                           |
654	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
655	     |           synchronization source (SSRC) identifier            |
656	     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
657	     |            contributing source (CSRC) identifiers             |
658	     |                             ....                              |
659	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

661	     Figure 3.  RTP header according to RFC 3550

663	   The fields of the fixed RTP header have their usual meaning, which is
664	   defined in RFC 3550 and by the RTP profile in use, with the following
665	   additional notes:

667	   Marker bit (M): 1 bit
668	           This bit is set to 1 if the RTP packet contains an Access
669	           Unit containing a complete VC-1 frame, or the last fragment
670	           of a VC-1 frame.

672	   Payload type (PT): 7 bits
673	           This document does not assign an RTP payload type for this
674	           RTP payload format. The assignment of a payload type has to
675	           be performed either through the RTP profile used or in a
676	           dynamic way.

678	   Sequence Number: 16 bits
679	           The RTP receiver can use the sequence number field to recover
680	           the coded order of the VC-1 frames.  (A typical VC-1 decoder
681	           will require the VC-1 frames to be delivered in coded order.)
682	           When VC-1 frames have been fragmented across RTP packets, the
683	           RTP receiver can use the sequence number field to ensure that
684	           no fragment is missing.

686	   Timestamp: 32 bits
687	           The RTP timestamp is set to the presentation time of the VC-1
688	           frame in the first Access Unit.
689	           A clock rate of 90 kHz, or higher, MUST be used.
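
   As an illustration of the header layout in Figure 3, the fixed RTP
   header can be parsed as sketched below.  The function name and the
   dictionary keys are hypothetical, and the sketch does not process
   the CSRC list or any RTP header extension.

```python
import struct


def parse_rtp_fixed_header(packet):
    """Parse the 12-byte fixed RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    # Network byte order: two octets, 16-bit sequence number,
    # 32-bit timestamp and 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        # Offset just past the CSRC list; a header extension, if any,
        # is not accounted for here.
        "payload_offset": 12 + 4 * csrc_count,
    }
```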

691	5.2 AU header syntax

693	   The Access Unit header consists of a one-byte AU Control field, the
694	   RA Count field and 3 optional fields.  All fields MUST be written in
695	   network byte order.  The structure of the AU header is illustrated in
696	   Figure 4.

698	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
699	   |AU     | RA    |  AUP  | PTS   | DTS   |
700	   |Control| Count |  Len  | Delta | Delta |
701	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

703	   Figure 4.  Structure of AU header.

705	   AU Control: 8 bits
706	           The usage of the AU Control field is defined in section 5.3.

708	   RA Count: 8 bits
709	           Random Access Point Counter.  This field is a binary modulo
710	           256 counter.  The value of this field MUST be incremented by
711	           1 each time an AU is transmitted where the RA bit in the AU
712	           Control field is set to 1.  The initial value of this field
713	           is undefined and MAY be chosen randomly.

715	   AUP Len: 16 bits
716	           Access Unit Payload Length.  Specifies the size, in bytes, of
717	           the payload of the Access Unit.  The field does not include
718	           the size of the AU header itself.  The field MUST be included
719	           in each AU header in an RTP packet, except for the last AU
720	           header in the packet.

722	   PTS Delta: 32 bits
723	           Presentation time delta.  Specifies the presentation time of
724	           the frame as a 2's complement offset (delta) from the
725	           timestamp field in the RTP header of this RTP packet.  The
726	           PTS Delta field MUST use the same clock rate as the timestamp
727	           field in the RTP header.
728	           This field SHOULD NOT be included in the first AU header in
729	           the RTP packet, because the RTP timestamp field specifies the
730	           presentation time of the frame in the first AU.

732	   DTS Delta: 32 bits
733	           Decode time delta.  Specifies the decode time of the frame as
734	           a 2's complement offset (delta) between the presentation time
735	           and the decode time.  Note that if the presentation time is
736	           larger than the decode time, this results in a value for the
737	           DTS Delta field that is greater than zero.  The DTS Delta
738	           field MUST use the same clock rate as the timestamp field in
739	           the RTP header.

741	5.3 AU Control field syntax

743	   The structure of the 8-bit AU Control field is shown in Figure 5.

745	     0    1    2    3    4    5    6    7
746	   +----+----+----+----+----+----+----+----+
747	   |  FRAG   | RA | SL | LP | PT | DT | R  |
748	   +----+----+----+----+----+----+----+----+

750	   Figure 5.  Syntax of AU Control field.

752	   FRAG: 2 bits
753	           Fragmentation Information.  This field indicates if the AU
754	           payload contains a complete frame or a fragment of a frame.
755	           It MUST be set as follows:
756	           0: The AU payload contains a fragment of a frame other than
757	           the first or last fragment.
758	           1: The AU payload contains the first fragment of a frame.
759	           2: The AU payload contains the last fragment of a frame.
760	           3: The AU payload contains a complete frame (not fragmented).

762	   RA: 1 bit
763	           Random Access Point indicator.  This bit MUST be set to 1 if
764	           the AU contains a frame that is a random access point.  In
765	           the case of Simple and Main profiles, any I-picture is a
766	           random access point.
767	           In the case of Advanced profile, the first frame after an
768	           entry-point header is a random access point.
769	           Note that if entry-point headers are not transmitted at every
770	           random access point, this MUST be indicated using the MIME
771	           parameter "mode=3".

773	   SL: 1 bit
774	           Sequence Layer Counter.  This bit MUST be toggled, i.e.,
775	           changed from 0 to 1 or from 1 to 0, if the AU contains a
776	           sequence layer header and if it is different from the most
777	           recently transmitted sequence layer header.  Otherwise, the
778	           value of this bit MUST be identical to the value of the SL
779	           bit in the previous AU.
780	           The initial value of this bit is undefined and MAY be chosen
781	           randomly.
782	           The bit MUST be 0 for Simple and Main profile bit streams or
783	           if the sequence layer header never changes.

785	   LP: 1 bit
786	           Length Present.  This bit MUST be set to 1 if the AU header
787	           includes the AUP Len field.

789	   PT: 1 bit
790	           PTS Delta Present.  This bit MUST be set to 1 if the AU
791	           header includes the PTS Delta field.

793	   DT: 1 bit
794	           DTS Delta Present.  This bit MUST be set to 1 if the AU
795	           header includes the DTS Delta field.

797	   R: 1 bit
798	           Reserved.  This bit MUST be set to 0 and MUST be ignored by
799	           receivers.
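
   The AU header defined in sections 5.2 and 5.3 can be parsed as
   sketched below.  The function name and the dictionary keys are
   hypothetical.  The sketch assumes that bit 0 in Figure 5 is the most
   significant bit of the AU Control octet and that the optional fields
   appear in the order shown in Figure 4.

```python
import struct


def parse_au_header(data):
    """Parse one AU header located at the start of `data`."""
    if len(data) < 2:
        raise ValueError("AU header needs at least AU Control and RA Count")
    control, ra_count = data[0], data[1]
    header = {
        "frag": control >> 6,          # FRAG: 2 bits
        "ra": bool(control & 0x20),    # RA bit
        "sl": bool(control & 0x10),    # SL bit
        "ra_count": ra_count,
        "aup_len": None,
        "pts_delta": None,
        "dts_delta": None,
    }
    offset = 2
    if control & 0x08:  # LP: AUP Len present (16 bits, unsigned)
        (header["aup_len"],) = struct.unpack_from("!H", data, offset)
        offset += 2
    if control & 0x04:  # PT: PTS Delta present (32 bits, 2's complement)
        (header["pts_delta"],) = struct.unpack_from("!i", data, offset)
        offset += 4
    if control & 0x02:  # DT: DTS Delta present (32 bits, 2's complement)
        (header["dts_delta"],) = struct.unpack_from("!i", data, offset)
        offset += 4
    # The R bit (0x01) is reserved and is ignored, as required.
    header["header_size"] = offset
    return header
```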

801	6. RTP Payload format parameters

803	6.1 Media Type Registration

805	   This registration uses the template defined in [11] and follows RFC
806	   3555 [7].

808	   Type name:  video

810	   Subtype name:  vc1

812	   Required parameters:

814	         profile:
815	           The value is an integer identifying the VC-1 profile.  The
816	           following values are defined:
817	           0: Simple profile.
818	           1: Main profile.
819	           3: Advanced profile.

821	           If the profile parameter is used to indicate properties of a
822	           coded bit stream, it indicates the VC-1 encoding profile that
823	           a decoder has to support in order to comply with [1] when it
824	           decodes the bit stream.

826	           If the profile parameter is used for capability exchange or
827	           in a session setup procedure, it indicates the VC-1 profile
828	           that the codec supports.

830	         level:
831	           The value is an integer specifying the level of the VC-1
832	           profile.
833	           For Advanced profile, valid values are 0 to 4, which
834	           correspond to levels L0 to L4, respectively.  For Simple and
835	           Main profiles, the following values are defined:
836	           1: Low Level
837	           2: Medium Level
838	           3: High Level (only valid for Main profile)

840	           If the level parameter is used to indicate properties of a
841	           coded bit stream, it indicates the level of the VC-1 profile
842	           that a decoder has to support in order to comply with [1]
843	           when it decodes the bit stream.  Note that when Advanced
844	           profile is used, this parameter may only apply while the
845	           sequence layer header specified in the config parameter is in
846	           use.

848	           If the level parameter is used for capability exchange or in
849	           a session setup procedure, it indicates the highest level of
850	           the VC-1 profile that the codec supports.  See section 6.3 for
851	           specific rules for how this parameter is used with the SDP
852	           Offer/Answer model.

854	   Optional parameters:

856	         config:
857	           The value is a base16 [6] (hexadecimal) representation of an
858	           octet string that expresses the decoder initialization
859	           parameters.  Decoder initialization parameters are mapped
860	           onto the base16 octet string on an MSB-first basis.  The
861	           first bit of the decoder initialization parameters MUST be
862	           located at the MSB of the first octet.  If the decoder
863	           initialization parameters are not a multiple of 8 bits, in the
864	           last octet up to 7 zero-valued padding bits MUST be added to
865	           achieve octet alignment.

867	           For Simple and Main profiles, the decoder initialization
868	           parameters are STRUCT_C, as defined in Annex J of SMPTE 421M
869	           [1].

871	           For Advanced profile, the decoder initialization parameters
872	           are a sequence layer header directly followed by an entry-
873	           point header.  The two headers MUST be in EBDU format,
874	           meaning that they must include their Start Codes and must use
875	           the encapsulation method defined in Annex E of SMPTE 421M
876	           [1].

878	           This parameter MUST NOT be used to indicate codec
879	           capabilities in any capability exchange procedure.

881	         width:
882	           The value is an integer greater than zero, specifying the
883	           maximum horizontal size of the coded picture, in pixels.

885	           If this parameter is not specified, it defaults to the
886	           maximum horizontal size allowed by the profile and level.

888	           Note: When Advanced profile is used, this parameter only
889	           applies while the sequence layer header specified in the
890	           config parameter is in use.

892	         height:
893	           The value is an integer greater than zero, specifying the
894	           maximum vertical size of the coded picture in pixels.

896	           If this parameter is not specified, it defaults to the
897	           maximum vertical size allowed by the profile and level.

899	           Note: When Advanced profile is used, this parameter only
900	           applies while the sequence layer header specified in the
901	           config parameter is in use.

903	         bitrate:
904	           The value is an integer greater than zero, specifying the
905	           peak transmission rate of the coded bit stream in bits per
906	           second.  The number does not include the overhead caused by
907	           RTP encapsulation, i.e., it does not include the AU headers,
908	           or any of the RTP, UDP or IP headers.

910	           If this parameter is not specified, it defaults to the
911	           maximum bit rate allowed by the profile and level.  (See the
912	           values for "RMax" in Annex D of SMPTE 421M [1].)

914	           Note: When Advanced profile is used, this parameter only
915	           applies while the sequence layer header specified in the
916	           config parameter is in use.

918	         buffer:
919	           The value is an integer specifying the leaky bucket size, B,
920	           in milliseconds, required to contain a stream transmitted at
921	           the transmission rate specified by the bitrate parameter.
922	           This parameter is defined in the hypothetical reference
923	           decoder model for VC-1, in Annex C of SMPTE 421M [1].

925	           Note that this parameter relates to the codec bit stream
926	           only, and does not account for any buffering time that may be
927	           required to compensate for jitter in the network.

929	           If this parameter is not specified, it defaults to the
930	           maximum buffer size allowed by the profile and level.  (See
931	           the values for "BMax" and "RMax" in Annex D of SMPTE 421M
932	           [1].)

934	           Note: When Advanced profile is used, this parameter only
935	           applies while the sequence layer header specified in the
936	           config parameter is in use.

938	         framerate:
939	           The value is an integer greater than zero, specifying the
940	           maximum number of frames per second in the coded bit stream,
941	           multiplied by 1000 and rounded to the nearest integer value.
942	           For example, 30000/1001 (approximately 29.97) frames per
943	           second is represented as 29970.

945	           If the parameter is not specified, it defaults to the maximum
946	           frame rate allowed by the profile and level.

948	           Note: When Advanced profile is used, this parameter only
949	           applies while the sequence layer header specified in the
950	           config parameter is in use.

952	         bpic:
953	           This parameter signals if B-pictures may be present when
954	           Advanced profile is used.  If this parameter is present, and
955	           B-pictures may be present in the coded bit stream, this
956	           parameter MUST be equal to 1.
957	           If B-pictures will never be present in the coded bit stream,
958	           even if the sequence layer header changes, this parameter
959	           SHOULD be present and its value SHOULD be equal to 0.

961	           This parameter MUST NOT be used with Simple and Main
962	           profiles. (For Main profile, the presence of B-pictures is
963	           indicated by the MAXBFRAMES field in STRUCT_C decoder
964	           initialization parameter.)
965	           For Advanced profile, if this parameter is not specified, a
966	           value of 1 MUST be assumed.

968	         mode:
969	           The value is an integer specifying the use of the sequence
970	           layer header and the entry-point header.  This parameter is
971	           only defined for Advanced profile.  The following values are
972	           defined:
973	           0: Both the sequence layer header and the entry-point header
974	           may change, and changed headers will be included in the RTP
975	           packets.
976	           1: The sequence layer header specified in the config
977	           parameter never changes.
978	           3: The sequence layer header and the entry-point header
979	           specified in the config parameter never change.  Entry-point
980	           headers MAY be omitted from the Access Units.  Each Access
981	           Unit that has the RA bit set to 1 contains a random access
982	           point even if an entry-point header is not included in the
983	           Access Unit.  If an entry-point header is not included at a
984	           random access point, then the RTP receiver MUST insert the
985	           entry-point header into the VC-1 bit stream prior to
986	           delivering the bit stream to the VC-1 decoder.

988	           If the mode parameter is not specified, a value of 0 MUST be
989	           assumed.  The mode parameter SHOULD be specified if modes 1
990	           or 3 apply to the VC-1 bit stream.

992	         max-width, max-height, max-bitrate, max-buffer, max-framerate:
993	           These parameters are defined for use in a capability exchange
994	           procedure.  The parameters do not specify properties of the
995	           coded bit stream, but rather upper limits or preferred values
996	           for the "width", "height", "bitrate", "buffer" and
997	           "framerate" parameters.  Section 6.3 provides specific rules
998	           for how these parameters are used with the SDP Offer/Answer
999	           model.

1001	           Any of the max-width, max-height, max-bitrate, max-buffer and
1002	           max-framerate parameters MAY be used to indicate capabilities
1003	           that exceed the required capabilities of the signaled profile
1004	           and level.  In that case, the parameter MUST be interpreted
1005	           as the maximum value that can be supported for that
1006	           capability.

1008	           If any of the parameters specifies a capability that is less
1009	           than the required capabilities of the signaled profile and
1010	           level, then the parameter SHOULD be interpreted as a
1011	           preferred value for that capability.

1013	           When more than one parameter from the set (max-width, max-
1014	           height, max-bitrate, max-buffer and max-framerate) is
1015	           present, all signaled capabilities MUST be supported
1016	           simultaneously.

1018	           A sender or receiver MUST NOT use these parameters to
1019	           indicate capabilities that meet the requirements of a higher
1020	           level of the VC-1 profile than the one specified in the
1021	           "level" parameter, if the sender or receiver can support all
1022	           the properties of the higher level, except if specifying a
1023	           higher level is not allowed due to other restrictions.  (As
1024	           an example of such a restriction, in the SDP Offer/Answer
1025	           model, the value of the level parameter that can be used in
1026	           an Answer is limited by what was specified in the Offer.)

1028	         max-width:
1029	           The value is an integer greater than zero, specifying a
1030	           horizontal size for the coded picture, in pixels.  If the
1031	           value is less than the maximum horizontal size allowed by the
1032	           profile and level, then the value specifies the preferred
1033	           horizontal size.  Otherwise, it specifies the maximum
1034	           horizontal size that is supported.

1036	           If this parameter is not specified, it defaults to the
1037	           maximum horizontal size allowed by the profile and level.

1039	         max-height:
1040	           The value is an integer greater than zero, specifying a
1041	           vertical size for the coded picture, in pixels.  If the value
1042	           is less than the maximum vertical size allowed by the profile
1043	           and level, then the value specifies the preferred vertical
1044	           size.  Otherwise, it specifies the maximum vertical size that
1045	           is supported.

1047	           If this parameter is not specified, it defaults to the
1048	           maximum vertical size allowed by the profile and level.

1050	         max-bitrate:
1051	           The value is an integer greater than zero, specifying a peak
1052	           transmission rate for the coded bit stream in bits per
1053	           second.  The number does not include the overhead caused by
1054	           RTP encapsulation, i.e., it does not include the AU headers,
1055	           or any of the RTP, UDP or IP headers.

1057	           If the value is less than the maximum bit rate allowed by the
1058	           profile and level, then the value specifies the preferred bit
1059	           rate.  Otherwise, it specifies the maximum bit rate that is
1060	           supported.

1062	           If this parameter is not specified, it defaults to the
1063	           maximum bit rate allowed by the profile and level.  (See the
1064	           values for "RMax" in Annex D of SMPTE 421M [1].)

1066	         max-buffer:
1067	           The value is an integer specifying a leaky bucket size, B, in
1068	           milliseconds, required to contain a stream transmitted at the
1069	           transmission rate specified by the max-bitrate parameter.
1070	           This parameter is defined in the hypothetical reference
1071	           decoder model for VC-1, in Annex C of SMPTE 421M [1].

1073	           Note that this parameter relates to the codec bit stream
1074	           only, and does not account for any buffering time that may be
1075	           required to compensate for jitter in the network.

1077	           If the value is less than the maximum leaky bucket size
1078	           allowed by the max-bitrate parameter and the profile and
1079	           level, then the value specifies the preferred leaky bucket
1080	           size.  Otherwise, it specifies the maximum leaky bucket size
1081	           that is supported for the bit rate specified by the max-
1082	           bitrate parameter.

1084	           If this parameter is not specified, it defaults to the
1085	           maximum buffer size allowed by the profile and level.  (See
1086	           the values for "BMax" and "RMax" in Annex D of SMPTE 421M
1087	           [1].)

1089	         max-framerate:
1090	           The value is an integer greater than zero, specifying a
1091	           number of frames per second for the coded bit stream.  The
1092	           value is the frame rate multiplied by 1000 and rounded to the
1093	           nearest integer value.  For example, 30000/1001
1094	           (approximately 29.97) frames per second is represented as
1095	           29970.

1097	           If the value is less than the maximum frame rate allowed by
1098	           the profile and level, then the value specifies the preferred
1099	           frame rate.  Otherwise, it specifies the maximum frame rate
1100	           that is supported.

1102	           If the parameter is not specified, it defaults to the maximum
1103	           frame rate allowed by the profile and level.

1105	   Encoding considerations:
1106	           This media type is framed and contains binary data.

1108	   Security considerations:
1109	           See Section 7 of this document.

1111	   Interoperability considerations:
1112	           None.

1114	   Published specification:
1115	           This payload format specification.

1117	   Applications which use this media type:
1118	           Multimedia streaming and conferencing tools.

1120	   Additional Information:
1121	           None.

1123	   Person & email address to contact for further information:
1124	           Anders Klemets <anderskl@microsoft.com>
1125	           IETF AVT working group.

1127	   Intended Usage:
1128	           COMMON

1130	   Restrictions on usage:
1131	           This media type depends on RTP framing, and hence is only
1132	           defined for transfer via RTP [3].

1134	   Authors:
1135	           Anders Klemets

1137	   Change controller:
1138	           IETF Audio/Video Transport Working Group delegated from the
1139	           IESG.
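
   The encoding rules for the "config" and "framerate" parameters in
   section 6.1 can be sketched as follows.  The function names are
   hypothetical, and the bit-string input format is chosen only for
   illustration.  Upper-case hexadecimal digits are used on the
   assumption that they match the base16 alphabet in [6].

```python
from fractions import Fraction


def config_to_base16(bits):
    """Encode decoder initialization parameters, given as a string of
    '0' and '1' characters in transmission order, as base16 text."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a non-empty string of 0s and 1s")
    # The first bit goes to the MSB of the first octet; up to 7
    # zero-valued padding bits are appended for octet alignment.
    padded = bits + "0" * (-len(bits) % 8)
    octets = int(padded, 2).to_bytes(len(padded) // 8, "big")
    return octets.hex().upper()


def framerate_param(numerator, denominator=1):
    """Return frames per second multiplied by 1000 and rounded to the
    nearest integer, as used by "framerate" and "max-framerate"."""
    return round(Fraction(numerator, denominator) * 1000)
```

   For example, framerate_param(30000, 1001) yields 29970, which
   matches the example given for the "framerate" parameter.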

1141	6.2 Mapping of MIME parameters to SDP

1143	   The information carried in the media type specification has a
1144	   specific mapping to fields in the Session Description Protocol (SDP)
1145	   [4].  If SDP is used to specify sessions using this payload format,
1146	   the mapping is done as follows:

1148	   o The media name in the "m=" line of SDP MUST be video (the type
1149	     name).

1151	   o The encoding name in the "a=rtpmap" line of SDP MUST be vc1 (the
1152	     subtype name).

1154	   o The clock rate in the "a=rtpmap" line MUST be at least 90000.

1156	   o The REQUIRED parameters "profile" and "level" MUST be included in
1157	     the "a=fmtp" line of SDP.
1158	     These parameters are expressed as a MIME media type string, in the
1159	     form of a semicolon separated list of parameter=value pairs.

1161	   o The OPTIONAL parameters "config", "width", "height", "bitrate",
1162	     "buffer", "framerate", "bpic", "mode", "max-width", "max-height",
1163	     "max-bitrate", "max-buffer" and "max-framerate", when present,
1164	     MUST be included in the "a=fmtp" line of SDP.
1165	     These parameters are expressed as a MIME media type string, in the
1166	     form of a semicolon separated list of parameter=value pairs:

1168	         a=fmtp:<dynamic payload type> <parameter
1169	         name>=<value>[,<value>][; <parameter name>=<value>]

1171	   o Any parameters unknown to the device that uses the SDP MUST be
1172	     ignored.  For example, parameters defined in later specifications
1173	     MAY be copied into the SDP and MUST be ignored by receivers that
1174	     do not understand them.
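
   The mapping rules above can be sketched as follows.  The function
   name, the port number and the payload type used in the usage note
   are hypothetical, and the "RTP/AVP" profile is assumed only for the
   purpose of the example.

```python
def build_vc1_sdp(payload_type, port, profile, level,
                  optional=None, clock_rate=90000):
    """Build the SDP media lines for a VC-1 stream."""
    if clock_rate < 90000:
        raise ValueError("the clock rate MUST be at least 90000")
    # The required parameters come first, followed by any optional
    # parameters, all carried in a single "a=fmtp" line.
    params = [("profile", profile), ("level", level)]
    params += list((optional or {}).items())
    fmtp = "; ".join(f"{name}={value}" for name, value in params)
    return [
        f"m=video {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} vc1/{clock_rate}",
        f"a=fmtp:{payload_type} {fmtp}",
    ]
```

   Calling build_vc1_sdp(98, 49170, 0, 2, {"width": 352, "height":
   288}) produces an "m=" line, an "a=rtpmap" line and an "a=fmtp" line
   whose parameter list begins with "profile=0; level=2".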

1176	6.3 Usage with the SDP Offer/Answer Model

1178	   When VC-1 is offered over RTP using SDP in an Offer/Answer model [5]
1179	   for negotiation for unicast usage, the following rules and
1180	   limitations apply:

1182	   o The "profile" parameter MUST be used symmetrically, i.e., the
1183	     answerer MUST either maintain the parameter or remove the media
1184	     format (payload type) completely if the offered encoding profile
1185	     is not supported.

1187	   o The "level" parameter describes the level of the VC-1 profile of
1188	     the coded bit stream that the offerer or answerer is sending for
1189	     this media format configuration, when the direction attribute is
1190	     sendonly or sendrecv.  If the direction attribute is sendrecv or
1191	     recvonly, the parameter also specifies the highest level of the
1192	     VC-1 profile that the receiver implementation accepts.

1194	     The answerer MUST NOT specify a numerically higher level in the
1195	     answer than what was specified in the offer, regardless of the
1196	     direction attribute.

1198	     If an offer specifies the recvonly direction attribute, the
1199	     answerer MAY specify a level that is lower than what was specified
1200	     in the offer, i.e., the level parameter can be "downgraded".

1202	     If the offer specifies the sendonly direction attribute, the level
1203	     parameter cannot be downgraded by the answerer.  In this case, the
1204	     answerer MUST either maintain the level parameter or remove the
1205	     media format (payload type) completely if the level is not
1206	     supported.

1208	     If the offer specifies the sendrecv direction attribute, or if the
1209	     direction attribute is unspecified, the answerer MAY specify a
1210	     level that is lower than what was specified in the offer.  Note
1211	     that the level parameter specified in the answer applies to the
1212	     coded bit stream that will be sent by the answerer, and the
1213	     offerer will still use the level parameter that it specified in
1214	     the offer.

1216	   o The parameters "config", "bpic", "width", "height", "framerate",
1217	     "bitrate", "buffer" and "mode" describe the properties of the
1218	     VC-1 bit stream that the offerer or answerer is sending for this
1219	     media format configuration.

1221	     In the case of unicast usage and when the direction attribute in
1222	     the offer or answer is recvonly, the interpretation of these
1223	     parameters is undefined and they MUST NOT be used.

1225	   o The parameters "max-width", "max-height", "max-framerate",
1226	     "max-bitrate" and "max-buffer" MAY be specified in an offer or an
1227	     answer, and their interpretation is as follows:

1229	     When the direction attribute is sendonly, the parameters describe
1230	     the limits of the VC-1 bit stream that the sender is capable of
1231	     producing for the given profile and level, or any lower level of
1232	     the same profile.

1234	     When the direction attribute is recvonly or sendrecv, the
1235	     parameters describe properties of the receiver implementation.  If
1236	     the value of a property is less than what is allowed by the level
1237	     of the VC-1 profile, then it SHOULD be interpreted only as a
1238	     preferred value suggested to the sender.  If the value of a
1239	     property is greater than what is allowed by the level of the VC-1
1240	     profile, then it MUST be interpreted by the sender as an upper
1241	     limit of what the receiver accepts for the given profile and
1242	     level, and any lower level of the same profile.

1244	     For example, if a recvonly or sendrecv offer specifies
1245	     "profile=0;level=1;max-bitrate=48000", then 48 kbps is merely a
1246	     suggested bit rate, because all receiver implementations of Simple
1247	     profile, Low Level, are required to support bit rates of up to 96
1248	     kbps.  But if the offer specifies "max-bitrate=200000", this means
1249	     that the receiver implementation supports a maximum of 200 kbps
1250	     for the given profile and level (or any lower level).

1252	   o If an offerer wishes to have non-symmetrical capabilities between
1253	     sending and receiving, e.g., use different levels in each
1254	     direction, then the offerer has to offer different RTP sessions.
1255	     This can be done by specifying different media lines declared as
1256	     "recvonly" and "sendonly", respectively.
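
   The unicast rules for "level" and for the "max-" parameters can be
   sketched as follows.  This is an illustrative reading of the rules
   above, not a normative algorithm.  The function names are
   hypothetical, levels are assumed to be compared as integers, an
   unspecified direction attribute is represented by None, and the
   handling of a value exactly equal to the level limit is an
   assumption, since the text only covers "less than" and "greater
   than".

```python
# Direction attributes covered by the rules above; None = unspecified.
DIRECTIONS = {"sendonly", "recvonly", "sendrecv", None}


def answer_level_allowed(offer_direction, offer_level, answer_level):
    """Return True if the answer may carry answer_level."""
    if offer_direction not in DIRECTIONS:
        raise ValueError("direction not covered by this sketch")
    if answer_level > offer_level:
        return False  # never numerically higher than the offer
    if offer_direction == "sendonly":
        # No downgrade: keep the level or remove the payload type.
        return answer_level == offer_level
    return True  # recvonly, sendrecv or unspecified: downgrade is allowed


def classify_max_value(value, level_limit):
    """Classify a "max-" value against the limit implied by the level."""
    if value < level_limit:
        return "preferred"    # only a suggested value
    if value > level_limit:
        return "upper-limit"  # what the receiver accepts at most
    return "level-limit"      # assumption: equal adds no information
```

   With the 96 kbps limit quoted above for Simple profile, Low level,
   classify_max_value(48000, 96000) yields "preferred" and
   classify_max_value(200000, 96000) yields "upper-limit".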

1258	   For streams being delivered over multicast, the following rules apply
1259	   in addition:

1261	   o The "level" parameter specifies the highest level of the VC-1
1262	     profile of the bit stream that will be sent, and/or received, on
1263	     the multicast session.  The value of this parameter MUST NOT be
1264	     changed by the answerer.  Thus, a payload type can either be
1265	     accepted unaltered or removed.

1267	   o The parameters "config", "bpic", "width", "height", "framerate",
1268	     "bitrate", "buffer" and "mode" specify properties of the VC-1 bit
1269	     stream that will be sent, and/or received, on the multicast
1270	     session.  The parameters MAY be specified even if the direction
1271	     attribute is recvonly.

1273	     The values of these parameters MUST NOT be changed by the
1274	     answerer.  Thus, a payload type can either be accepted unaltered
1275	     or removed.

1277	   o The values of the parameters "max-width", "max-height",
1278	     "max-framerate", "max-bitrate" and "max-buffer" MUST be supported
1279	     by the answerer for all streams declared as sendrecv or recvonly.
1280	     Otherwise, one of the following actions MUST be performed: the
1281	     media format is removed, or the session is rejected.
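
   For multicast, the "accepted unaltered or removed" rule can be
   sketched as a simple comparison of the offered and answered
   parameters.  The function name and the dictionary representation
   are hypothetical.  "profile" is included in the comparison because
   the symmetric-usage rule for unicast also applies here.

```python
# Parameters the answerer must not change on a multicast session.
FIXED_PARAMS = (
    "profile", "level", "config", "bpic", "width", "height",
    "framerate", "bitrate", "buffer", "mode",
)


def multicast_answer_unaltered(offer, answer):
    """Return True if the answer keeps every fixed parameter as offered.

    offer and answer map parameter names to values for one payload
    type.  An answerer that cannot keep them all is expected to remove
    the payload type instead of changing a value.
    """
    return all(offer.get(name) == answer.get(name) for name in FIXED_PARAMS)
```

   A parameter that is absent from the offer must also be absent from
   the answer for the comparison to succeed.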

1283	6.4 Usage in Declarative Session Descriptions

1285	   When VC-1 is offered over RTP using SDP in a declarative style, as in
1286	   RTSP [12] or SAP [13], the following rules and limitations apply:

1288	   o The parameters "profile" and "level" indicate only the properties
1289	     of the coded bit stream.  They do not imply a limit on capabilities
1290	     supported by the sender.

1292	   o The parameters "config", "width", "height", "bitrate" and "buffer"
1293	     MUST be specified.

1295	   o The parameters "max-width", "max-height", "max-framerate",
1296	     "max-bitrate" and "max-buffer" MUST NOT be used.

1298	   An example of media representation in SDP is as follows (Simple
1299	   profile, Medium level):

1301	   m=video 49170 RTP/AVP 98
1302	   a=rtpmap:98 vc1/90000
1303	   a=fmtp:98 profile=0;level=2;width=352;height=288;framerate=15000;
1304	   bitrate=384000;buffer=2000;config=4e291800
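
   The declarative rules can be checked mechanically.  The sketch
   below is illustrative only: the function name is hypothetical, and
   the input is assumed to be the parameter string of the "a=fmtp"
   line joined into a single line, without the "a=fmtp:98 " prefix.

```python
# Parameters that must be present in a declarative description:
# "profile" and "level" are REQUIRED in general, the rest by this section.
DECLARATIVE_REQUIRED = (
    "profile", "level", "config", "width", "height", "bitrate", "buffer",
)
# Parameters that MUST NOT be used in a declarative description.
DECLARATIVE_FORBIDDEN = (
    "max-width", "max-height", "max-framerate", "max-bitrate", "max-buffer",
)


def check_declarative_fmtp(param_string):
    """Return a list of rule violations (empty if none were found)."""
    params = dict(
        item.strip().split("=", 1)
        for item in param_string.split(";")
        if "=" in item
    )
    problems = [
        "missing " + name
        for name in DECLARATIVE_REQUIRED
        if name not in params
    ]
    problems += [
        "not allowed " + name
        for name in DECLARATIVE_FORBIDDEN
        if name in params
    ]
    return problems
```

   Applied to the parameter string of the example above, the check
   reports no violations.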

1306	7. Security Considerations

1308	   RTP packets using the payload format defined in this specification
1309	   are subject to the security considerations discussed in the RTP
1310	   specification [3], and in any appropriate RTP profile.  This implies
1311	   that confidentiality of the media streams is achieved by encryption;
1312	   for example, through the application of SRTP [10].

1314	   A potential denial-of-service threat exists for data encodings using
1315	   compression techniques that have non-uniform receiver-end
1316	   computational load.  The attacker can inject pathological RTP packets
1317	   into the stream that are complex to decode and that cause the
1318	   receiver to be overloaded.  VC-1 is particularly vulnerable to such
1319	   attacks, because it is possible for an attacker to generate RTP
1320	   packets containing frames that affect the decoding process of many
1321	   future frames.  Therefore, the usage of data origin authentication
1322	   and data integrity protection of at least the RTP packet is
1323	   RECOMMENDED; for example, with SRTP [10].

1325	   Note that the appropriate mechanism to ensure confidentiality and
1326	   integrity of RTP packets and their payloads is very dependent on the
1327	   application and on the transport and signaling protocols employed.
1328	   Thus, although SRTP is given as an example above, other possible
1329	   choices exist.

1331	8. IANA Considerations

1333	   IANA is requested to register the MIME type "video/vc1" and the
1334	   associated RTP payload format, as specified in section 6.1 of this
1335	   document, in the Media Types registry and in the RTP Payload Format
1336	   MIME types registry.

1338	9. References

1340	9.1 Normative references

1342	   [1] Proposed SMPTE 421M, "VC-1 Compressed Video Bitstream Format and
1343	       Decoding Process", www.smpte.org.
1344	   [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
1345	       Levels", BCP 14, RFC 2119, March 1997.
1346	   [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
1347	       "RTP: A Transport Protocol for Real-Time Applications", STD 64,
1348	       RFC 3550, July 2003.
1349	   [4] Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
1350	       RFC 2327, April 1998.
1351	   [5] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
1352	       Session Description Protocol (SDP)", RFC 3264, June 2002.
1353	   [6] Josefsson, S., Ed., "The Base16, Base32, and Base64 Data
1354	       Encodings", RFC 3548, July 2003.

1356	   [7] Casner, S. and P. Hoschka, "MIME Type Registration of RTP Payload
1357	       Formats", RFC 3555, July 2003.

1359	9.2 Informative references

1361	   [8] Srinivasan, S., Hsu, P., Holcomb, T., Mukerjee, K., Regunathan,
1362	       S.L., Lin, B., Liang, J., Lee, M., and J. Ribas-Corbera, "Windows
1363	       Media Video 9: overview and applications", Signal Processing:
1364	       Image Communication, Volume 19, Issue 9, October 2004.
1365	   [9] Ribas-Corbera, J., Chou, P.A., and S.L. Regunathan, "A
1366	       generalized hypothetical reference decoder for H.264/AVC", IEEE
1367	       Transactions on Circuits and Systems for Video Technology, August
1368	       2003.
1369	   [10]Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1370	       Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
1371	       3711, March 2004.
1372	   [11]Freed, N. and J. Klensin, "Media Type Specifications and
1373	       Registration Procedures", Work in Progress, July 2005.
1374	   [12]Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
1375	       Protocol (RTSP)", RFC 2326, April 1998.
1376	   [13]Handley, M., Perkins, C., and E. Whelan, "Session Announcement
1377	       Protocol", RFC 2974, October 2000.

1379	Author's Addresses

1381	   Anders Klemets
1382	   Microsoft Corp.
1383	   1 Microsoft Way
1384	   Redmond, WA 98052
1385	   USA
1386	   Email: anderskl@microsoft.com

1388	Acknowledgements

1390	   Thanks to Shankar Regunathan, Gary Sullivan, Regis Crinon, Magnus
1391	   Westerlund and Colin Perkins for providing detailed feedback on this
1392	   document.

1394	IPR Notices

1396	   The IETF takes no position regarding the validity or scope of any
1397	   Intellectual Property Rights or other rights that might be claimed to
1398	   pertain to the implementation or use of the technology described in
1399	   this document or the extent to which any license under such rights
1400	   might or might not be available; nor does it represent that it has
1401	   made any independent effort to identify any such rights.  Information
1402	   on the procedures with respect to rights in RFC documents can be
1403	   found in BCP 78 and BCP 79.

1405	   Copies of IPR disclosures made to the IETF Secretariat and any
1406	   assurances of licenses to be made available, or the result of an
1407	   attempt made to obtain a general license or permission for the use of
1408	   such proprietary rights by implementers or users of this
1409	   specification can be obtained from the IETF on-line IPR repository at
1410	   http://www.ietf.org/ipr.

1412	   The IETF invites any interested party to bring to its attention any
1413	   copyrights, patents or patent applications, or other proprietary
1414	   rights that may cover technology that may be required to implement
1415	   this standard.  Please address the information to the IETF at
1416	   ietf-ipr@ietf.org.

1418	Full Copyright Statement

1420	   Copyright (C) The Internet Society (2005).

1422	   This document is subject to the rights, licenses and restrictions
1423	   contained in BCP 78, and except as set forth therein, the authors
1424	   retain all their rights.

1426	   This document and the information contained herein are provided on an
1427	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1428	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1429	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1430	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1431	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1432	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.