idnits 2.17.1 

draft-ietf-avt-rtp-vc1-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 14.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1459.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1430.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1437.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1443.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 2005) is 6706 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: '10' is mentioned on line 316, but not defined

  == Missing Reference: '12' is mentioned on line 1300, but not defined

  == Missing Reference: '13' is mentioned on line 1300, but not defined

  == Missing Reference: '11' is mentioned on line 1337, but not defined

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'

  ** Obsolete normative reference: RFC 2327 (ref. '4') (Obsoleted by RFC 4566)

  ** Obsolete normative reference: RFC 3548 (ref. '6') (Obsoleted by RFC 4648)

  ** Obsolete normative reference: RFC 4288 (ref. '7') (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 3555 (ref. '8') (Obsoleted by RFC
     4855, RFC 4856)


     Summary: 7 errors (**), 0 flaws (~~), 6 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Engineering Task Force
3	Internet Draft                                               A. Klemets
4	Document: draft-ietf-avt-rtp-vc1-04.txt                       Microsoft
5	Expires: June 2006                                        December 2005

7	               RTP Payload Format for Video Codec 1 (VC-1)

9	Status of this Memo

11	   By submitting this Internet-Draft, each author represents that any
12	   applicable patent or other IPR claims of which he or she is aware
13	   have been or will be disclosed, and any of which he or she becomes
14	   aware will be disclosed, in accordance with Section 6 of BCP 79.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as Internet-
19	   Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet-Drafts as reference
24	   material or to cite them other than as "work in progress."

26	   The list of current Internet-Drafts can be accessed at
27	   http://www.ietf.org/ietf/1id-abstracts.txt

29	   The list of Internet-Draft Shadow Directories can be accessed at
30	   http://www.ietf.org/shadow.html.

32	Copyright Notice

34	      Copyright (C) The Internet Society (2005).

36	Abstract

38	   This memo specifies an RTP payload format for encapsulating Video
39	   Codec 1 (VC-1) compressed bit streams, as defined by the Society of
40	   Motion Picture and Television Engineers (SMPTE) standard, SMPTE 421M.
41	   SMPTE is the main standardizing body in the motion imaging industry
42	   and the SMPTE 421M standard defines a compressed video bit stream
43	   format and decoding process for television.

45	Table of Contents

47	   1. Introduction
48	      1.1 Conventions used in this document
49	   2. Definitions and abbreviations
50	   3. Overview of VC-1
51	      3.1 VC-1 bit stream layering model
52	      3.2 Bit-stream Data Units in Advanced profile
53	      3.3 Decoder initialization parameters
54	      3.4 Ordering of frames
55	   4. Encapsulation of VC-1 format bit streams in RTP
56	      4.1 Access Units
57	      4.2 Fragmentation of VC-1 frames
58	      4.3 Time stamp considerations
59	      4.4 Random Access Points
60	      4.5 Removal of HRD parameters
61	      4.6 Repeating the Sequence Layer header
62	      4.7 Signaling of media type parameters
63	      4.8 The "mode=1" media type parameter
64	      4.9 The "mode=3" media type parameter
65	   5. RTP Payload Format syntax
66	      5.1 RTP header usage
67	      5.2 AU header syntax
68	      5.3 AU Control field syntax
69	   6. RTP Payload format parameters
70	      6.1 Media type Registration
71	      6.2 Mapping of media type parameters to SDP
72	      6.3 Usage with the SDP Offer/Answer Model
73	      6.4 Usage in Declarative Session Descriptions
74	   7. Security Considerations
75	   8. IANA Considerations
76	   9. References
77	      9.1 Normative references
78	      9.2 Informative references

80	1. Introduction

82	   This memo specifies an RTP payload format for the video coding
83	   standard Video Codec 1, also known as VC-1.  The specification for
84	   the VC-1 bit stream format and decoding process is published by the
85	   Society of Motion Picture and Television Engineers (SMPTE) as SMPTE
86	   421M [1].

88	   VC-1 has broad applicability, ranging from low bit rate Internet
89	   streaming applications to HDTV broadcast and Digital Cinema
90	   applications with nearly lossless coding.  The overall performance of
91	   VC-1 is such that bit rate savings of more than 50% are reported [9],
92	   when compared against MPEG-2.  See [9] for further details about how
93	   VC-1 compares against other codecs, such as MPEG-4 and H.264/AVC.
94	   (In [9], VC-1 is referred to by its earlier name, VC-9.)

96	   VC-1 is widely used for downloading and streaming of movies on the
97	   Internet, in the form of Windows Media Video 9 (WMV-9) [9], because
98	   the WMV-9 codec is compliant with the VC-1 standard.  VC-1 has also
99	   recently been adopted as a mandatory compression format for the high-
100	   definition DVD formats HD DVD and Blu-ray.

102	   SMPTE 421M defines the VC-1 bit stream syntax and specifies
103	   constraints that must be met by VC-1 conformant bit streams.  SMPTE
104	   421M also specifies the complete process required to decode the bit
105	   stream.  However, it does not specify the VC-1 compression algorithm,
106	   thus allowing for different ways to implement a VC-1 encoder.

108	   The VC-1 bit stream syntax has three profiles. Each profile has
109	   specific bit stream syntax elements and algorithms associated with
110	   it.  Depending on the application in which VC-1 is used, some
111	   profiles may be more suitable than others.  For example, Simple
112	   profile is designed for low bit rate Internet streaming and for
113	   playback on devices that can only handle low complexity decoding.
114	   Advanced profile is designed for broadcast applications, such as
115	   digital TV, HD DVD or HDTV.  Advanced profile is the only VC-1
116	   profile that supports interlaced video frames and non-square pixels.

118	   Section 2 defines the abbreviations used in this document.  Section 3
119	   provides a more detailed overview of VC-1.  Sections 4 and 5 define
120	   the RTP payload format for VC-1, and section 6 defines the media type
121	   and SDP parameters for VC-1.  See section 7 for security
122	   considerations.

124	1.1 Conventions used in this document

126	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
127	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
128	   document are to be interpreted as described in BCP 14, RFC 2119 [2].

130	2. Definitions and abbreviations

132	   This document uses the definitions in SMPTE 421M [1].  For
133	   convenience, the following terms from SMPTE 421M are restated here:

135	   B-picture: A picture that is coded using motion compensated
136	   prediction from past and/or future reference fields or frames.  A B-
137	   picture cannot be used for predicting any other picture.

139	   Bit-stream data unit (BDU): A unit of the compressed data which may
140	   be parsed (i.e., syntax decoded) independently of other information
141	   at the same hierarchical level.  A BDU can be, for example, a
142	   sequence layer header, an entry-point header, a frame, or a slice.

144	   Encapsulated BDU (EBDU): A BDU which has been encapsulated using the
145	   encapsulation mechanism described in Annex E of SMPTE 421M [1], to
146	   prevent emulation of the start code prefix in the bit stream.

148	   Entry-point: A point in the bit stream that offers random access.

150	   frame: A frame contains lines of spatial information of a video
151	   signal.  For progressive video, these lines contain samples starting
152	   from one time instant and continuing through successive lines to the
153	   bottom of the frame.  For interlaced video, a frame consists of two
154	   fields, a top field and a bottom field.  One of these fields will
155	   commence one field period later than the other.

157	   interlace: The property of frames where alternating lines of the
158	   frame represent different instances in time.  In an interlaced frame,
159	   one of the fields is meant to be displayed first.

161	   I-picture: A picture coded using information only from itself.

163	   level: A defined set of constraints on the values which may be taken
164	   by the parameters (such as bit rate and buffer size) within a
165	   particular profile.  A profile may contain one or more levels.

167	   P-picture: A picture that is coded using motion compensated
168	   prediction from past reference fields or frames.

170	   picture: For progressive video, a picture is identical to a frame,
171	   while for interlaced video, a picture may refer to a frame, or the
172	   top field or the bottom field of the frame depending on the context.

174	   profile: A defined subset of the syntax of VC-1, with a specific set
175	   of coding tools, algorithms, and syntax associated with it.  There
176	   are three VC-1 profiles: Simple, Main and Advanced.

178	   progressive: The property of frames where all the samples of the
179	   frame represent the same instance in time.

181	   random access: A random access point in the bit stream is defined by
182	   the following guarantee: If decoding begins at this point, all frames
183	   needed for display after this point will have no decoding dependency
184	   on any data preceding this point, and are also present in the
185	   decoding sequence after this point.  A random access point is also
186	   called an entry-point.

188	   sequence: A coded representation of a series of one or more pictures.
189	   In VC-1 Advanced profile, a sequence consists of a series of one or
190	   more entry-point segments, where each entry-point segment consists of
191	   a series of one or more pictures, and where the first picture in each
192	   entry-point segment provides random access.  In VC-1 Simple and Main
193	   profiles, the first picture in each sequence is an I-picture.

195	   slice: A consecutive series of macroblock rows in a picture, which
196	   are encoded as a single unit.

198	   start codes (SC): 32-bit codes embedded in the coded bit stream that
199	   are unique, and identify the beginning of a BDU.  Start codes consist
200	   of a unique three-byte Start Code Prefix (SCP), and a one-byte Start
201	   Code Suffix (SCS).

203	3. Overview of VC-1

205	   The VC-1 bit stream syntax consists of three profiles: Simple, Main,
206	   and Advanced.  Simple and Main profiles are designed for relatively
207	   low bit rate applications.  For example, the maximum bit rate
208	   supported by Simple profile is 384 kbps.  Certain features that can
209	   be used to achieve high compression efficiency, such as non-square
210	   pixels and support for interlaced pictures, are only included in
211	   Advanced profile.

213	   The maximum bit rate supported by the Advanced profile is 135 Mbps,
214	   making it suitable for nearly lossless encoding of HDTV signals.
215	   Only Advanced profile supports carrying user-data (meta-data) in-band
216	   with the compressed bit stream.  The user-data can be used for closed
217	   captioning support, for example.

219	   Of the three profiles, only Advanced profile allows codec
220	   configuration parameters, such as the picture aspect ratio, to be
221	   changed through in-band signaling in the compressed bit stream.

223	   For each of the profiles, a certain number of "levels" have been
224	   defined.  Unlike a "profile", which implies a certain set of features
225	   or syntax elements, a "level" is a set of constraints on the values
226	   of parameters in a profile, such as the bit rate or buffer size.  VC-
227	   1 Simple profile has two levels, Main profile has three, and Advanced
228	   profile has five levels.  See Annex D of SMPTE 421M [1] for a
229	   detailed list of the profiles and levels.

231	3.1 VC-1 bit stream layering model

233	   The VC-1 bit stream is defined as a hierarchy of layers.  This is
234	   conceptually similar to the notion of a protocol stack of networking
235	   protocols.  The outermost layer is called the sequence layer.  The
236	   other layers are entry-point, picture, slice, macroblock and block.

238	   In Simple and Main profiles, a sequence in the sequence layer
239	   consists of a series of one or more coded pictures.  In Advanced
240	   profile, a sequence consists of one or more entry-point segments,
241	   where each entry-point segment consists of a series of one or more
242	   pictures, and where the first picture in each entry-point segment
243	   provides random access.  A picture is decomposed into macroblocks.  A
244	   slice comprises one or more contiguous rows of macroblocks.

246	   The entry-point and slice layers are only present in Advanced
247	   profile.  In Advanced profile, the start of each entry-point layer
248	   segment indicates a random access point.  In Simple and Main profiles
249	   each I-picture is a random access point.

251	   Each picture can be coded as an I-picture, P-picture, skipped
252	   picture, or as a B-picture.  These terms are defined in section 2 of
253	   this document and in section 4.12 of SMPTE 421M [1].

255	3.2 Bit-stream Data Units in Advanced profile

257	   In Advanced profile only, each picture and slice is byte-aligned and
258	   is considered a Bit-stream Data Unit (BDU).  A BDU is defined as a
259	   unit that can be parsed (i.e., syntax decoded) independently of other
260	   information in the same layer.

262	   The beginning of a BDU is signaled by an identifier called Start Code
263	   (SC).  Sequence layer headers and entry-point headers are also BDUs
264	   and thus can be easily identified by their Start Codes.  See Annex E
265	   of SMPTE 421M [1] for a complete list of Start Codes.  Note that
266	   blocks and macroblocks are not BDUs and thus do not have a Start Code
267	   and are not necessarily byte-aligned.

269	   The Start Code consists of four bytes.  The first three bytes are
270	   0x00, 0x00 and 0x01.  The fourth byte is called the Start Code Suffix
271	   (SCS) and it is used to indicate the type of BDU that follows the
272	   Start Code.  For example, the SCS of a sequence layer header (0x0F)
273	   is different from the SCS of an entry-point header (0x0E).  The Start
274	   Code is always byte-aligned and is transmitted in network byte order.
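
As an illustration of the Start Code layout described above, the following is a minimal sketch of how a receiver might locate Start Codes in an Advanced profile buffer.  The function name and the table of suffixes are assumptions made for this example; only the two suffix values quoted above are listed, and Annex E of SMPTE 421M [1] remains the authoritative list.

```python
# Hypothetical helper, not part of the payload format itself.
START_CODE_PREFIX = b"\x00\x00\x01"

# Only the two Start Code Suffix values quoted in the text are listed.
KNOWN_SUFFIXES = {
    0x0F: "sequence layer header",
    0x0E: "entry-point header",
}


def find_start_codes(data: bytes):
    """Return a list of (offset, suffix) pairs, one per Start Code found."""
    found = []
    pos = data.find(START_CODE_PREFIX)
    # A Start Code needs a fourth byte (the suffix) after the prefix.
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, data[pos + 3]))
        pos = data.find(START_CODE_PREFIX, pos + 3)
    return found


if __name__ == "__main__":
    sample = b"\x00\x00\x01\x0F\xAA\xBB\x00\x00\x01\x0E\xCC"
    for offset, suffix in find_start_codes(sample):
        print(offset, KNOWN_SUFFIXES.get(suffix, "other BDU"))
```

The sketch relies on the byte-stuffing encapsulation to guarantee that the prefix never appears inside a BDU, so a plain byte search is sufficient.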

276	   To prevent accidental emulation of the Start Code in the coded bit
277	   stream, SMPTE 421M defines an encapsulation mechanism that uses byte
278	   stuffing.  A BDU which has been encapsulated by this mechanism is
279	   referred to as an Encapsulated BDU, or EBDU.

281	3.3 Decoder initialization parameters

283	   In VC-1 Advanced profile, the sequence layer header contains
284	   parameters that are necessary to initialize the VC-1 decoder.

286	   A sequence layer header is not defined for VC-1 Simple and Main
287	   profiles.  For these profiles, decoder initialization parameters MUST
288	   be conveyed out-of-band from the coded bit stream.  Section 4.7
289	   specifies how the parameters are conveyed by this RTP payload format.

291	   For Advanced profile, the parameters in the sequence layer header
292	   apply to all entry-point segments until the next occurrence of a
293	   sequence layer header in the coded bit stream.

295	   The parameters in the sequence layer header include the Advanced
296	   profile level, the dimensions of the coded pictures, the aspect
297	   ratio, interlace information, the frame rate and up to 31 leaky
298	   bucket parameter sets for the Hypothetical Reference Decoder (HRD).

300	   Section 6.1 of SMPTE 421M [1] provides the formal specification of
301	   the sequence layer header.

303	   Each leaky bucket parameter set for the HRD specifies a peak
304	   transmission bit rate and a decoder buffer capacity.  The coded bit
305	   stream is restricted by these parameters.  The HRD model does not
306	   mandate buffering by the decoder.  Its purpose is to limit the
307	   encoder's bit rate fluctuations according to a basic buffering model,
308	   so that the resources necessary to decode the bit stream are
309	   predictable.  The HRD has a constant-delay mode and a variable-delay
310	   mode.  The constant-delay mode is appropriate for broadcast and
311	   streaming applications, while the variable-delay mode is designed for
312	   video conferencing applications.

314	   Annex C of SMPTE 421M [1] specifies the usage of the hypothetical
315	   reference decoder for VC-1 bit streams.  A general description of the
316	   theory of the HRD can be found in [10].

318	   The concept of an entry-point layer applies only to VC-1 Advanced
319	   profile.  The presence of an entry-point header indicates a random
320	   access point within the bit stream.  The entry-point header specifies
321	   current buffer fullness values for the leaky buckets in the HRD.  The
322	   header also specifies coding control parameters that are in effect
323	   until the occurrence of the next entry-point header in the bit
324	   stream.  See Section 6.2 of SMPTE 421M [1] for the formal
325	   specification of the entry-point header.

327	3.4 Ordering of frames

329	   Frames are transmitted in the same order in which they are captured,
330	   except if B-pictures are present in the coded bit stream.  In the
331	   latter case, the frames are transmitted such that the frames that the
332	   B-pictures depend on are transmitted first.  This is referred to as
333	   the coded order of the frames.

335	   The rules for how a decoder converts frames from the coded order to
336	   the display order are stated in section 5.4 of SMPTE 421M [1].  In
337	   short, if B-pictures may be present in the coded bit stream, a
338	   hypothetical decoder implementation needs to buffer one additional
339	   decoded frame.  When an I-frame or a P-frame is received, the frame
340	   can be decoded immediately but it is not displayed until the next I-
341	   or P-frame is received.  However, B-frames are displayed immediately.

343	   Figure 1 illustrates the timing relationship between the capture of
344	   frames, their coded order, and the display order of the decoded
345	   frames, when B-pictures are present in the coded bit stream.  The
346	   figure shows that the display of frame P4 is delayed until frame P7
347	   is received, while frames B2 and B3 are displayed immediately.

349	   Capture:        |I0  P1  B2  B3  P4  B5  B6  P7  B8  B9  ...
350	                   |
351	   Coded order:    |        I0  P1  P4  B2  B3  P7  B5  B6  ...
352	                   |
353	   Display order:  |            I0  P1  B2  B3  P4  B5  B6  ...
354	                   |
355	                   |+---+---+---+---+---+---+---+---+---+--> time
356	                    0   1   2   3   4   5   6   7   8   9

358	   Figure 1.  Frame reordering when B-pictures are present.

360	   If B-pictures are not present, the coded order and the display order
361	   are identical, and frames can then be displayed without the additional
362	   delay shown in Figure 1.
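
The reordering rule summarized in this section can be sketched as follows.  This is only an illustration of the hypothetical one-frame buffer described above, not the normative process in section 5.4 of SMPTE 421M [1].  The function name and the use of strings such as "I0" or "B2" to represent frames are assumptions made for the example.

```python
def display_order(coded_frames):
    """Reorder frames from coded order to display order.

    Frame names are illustrative strings whose first letter is the
    picture type ("I", "P" or "B").  Returns the frames displayed so
    far and the I- or P-frame still held in the buffer, if any.
    """
    held = None   # the one buffered I- or P-frame
    shown = []
    for frame in coded_frames:
        if frame.startswith("B"):
            # B-frames are displayed immediately.
            shown.append(frame)
        else:
            # A new I- or P-frame releases the previously held one.
            if held is not None:
                shown.append(held)
            held = frame
    return shown, held


if __name__ == "__main__":
    coded = ["I0", "P1", "P4", "B2", "B3", "P7", "B5", "B6"]
    print(display_order(coded))
```

Applied to the coded order in Figure 1, the sketch yields the display order I0, P1, B2, B3, P4, B5, B6, with P7 still held until the next I- or P-frame arrives.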

364	4. Encapsulation of VC-1 format bit streams in RTP

366	4.1 Access Units

368	   Each RTP packet contains an integral number of application data units
369	   (ADUs).  For VC-1 format bit streams, an ADU is equivalent to one
370	   Access Unit (AU).  An Access Unit is defined as the AU header
371	   (defined in section 5.2) followed by a variable length payload, with
372	   the rules and constraints described in sections 4.1 and 4.2.  Figure
373	   2 shows the layout of an RTP packet with multiple AUs.

375	   +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+
376	   | RTP     | AU(1) | AU(2) |     | AU(n) |
377	   | Header  |       |       |     |       |
378	   +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+

380	   Figure 2.  RTP packet structure.

382	   Each Access Unit MUST start with the AU header defined in section
383	   5.2.  The AU payload MUST contain data belonging to exactly one VC-1
384	   frame.  This means that data from different VC-1 frames will always
385	   be in different AUs.  However, it is possible for a single VC-1 frame
386	   to be fragmented across multiple AUs (see section 4.2).

388	   The following rules apply to the contents of each AU payload when VC-
389	   1 Advanced profile is used:

391	   - The AU payload MUST contain VC-1 bit stream data in EBDU format
392	     (i.e., the bit stream must use the byte-stuffing encapsulation
393	     mode defined in Annex E of SMPTE 421M [1].)

395	   - The AU payload MAY contain multiple EBDUs, e.g., a sequence layer
396	     header, an entry-point header, a picture header and multiple
397	     slices and the associated user-data.  (However, all slices and
398	     their corresponding macroblocks MUST belong to the same video
399	     frame.)

401	   - The AU payload MUST start at an EBDU boundary, except when the AU
402	     payload contains a fragmented frame, in which case the rules in
403	     section 4.2 apply.

405	   When VC-1 Simple or Main profiles are used, the AU payload MUST start
406	   with a picture header, except when the AU payload contains a
407	   fragmented frame.  Section 4.2 describes how to handle fragmented
408	   frames.

410	   Access Units MUST be byte-aligned.  If the data in an AU (EBDUs in
411	   the case of Advanced profile and frame in the case of Simple and
412	   Main) does not end at an octet boundary, up to 7 zero-valued padding
413	   bits MUST be added to achieve octet-alignment.
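
The number of padding bits follows directly from the bit length of the data.  A small sketch, with a hypothetical function name:

```python
def padding_bits(bit_length: int) -> int:
    """Number of zero-valued bits needed to reach the next octet boundary."""
    return (-bit_length) % 8


if __name__ == "__main__":
    for n in (16, 13, 1):
        print(n, padding_bits(n))
```

The result is always in the range 0 to 7, and it is 0 when the data already ends at an octet boundary.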

415	4.2 Fragmentation of VC-1 frames

417	   Each AU payload SHOULD contain a complete VC-1 frame.  However, if
418	   this would cause the RTP packet to exceed the MTU size, the frame
419	   SHOULD be fragmented into multiple AUs to avoid IP-level
420	   fragmentation.  When an AU contains a fragmented frame, this MUST be
421	   indicated by setting the FRAG field in the AU header as defined in
422	   section 5.3.

424	   AU payloads that do not contain a fragmented frame, or that contain
425	   the first fragment of a frame, MUST start at an EBDU boundary if
426	   Advanced profile is used.  In this case, for Simple and Main
427	   profiles, the AU payload MUST begin with the start of a picture
428	   header.

430	   If Advanced profile is used, AU payloads that contain a fragment of a
431	   frame other than the first fragment SHOULD start at an EBDU
432	   boundary, such as at the start of a slice.

434	   However, slices are only defined for Advanced profile, and are not
435	   always used.  Blocks and macroblocks are not BDUs (have no Start
436	   Code) and are not byte-aligned.  Therefore, it may not always be
437	   possible to continue a fragmented frame at an EBDU boundary.  One can
438	   determine if an AU payload starts at an EBDU boundary by inspecting
439	   the first three bytes of the AU payload.  The AU payload starts at an
440	   EBDU boundary if the first three bytes are identical to the Start
441	   Code Prefix (i.e., 0x00, 0x00, 0x01.)
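
The three-byte check described above can be written as a one-line test.  The function name is an assumption made for this example.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def starts_at_ebdu_boundary(au_payload: bytes) -> bool:
    """True if the AU payload begins with the Start Code Prefix."""
    return au_payload[:3] == START_CODE_PREFIX


if __name__ == "__main__":
    print(starts_at_ebdu_boundary(b"\x00\x00\x01\x0E\x12"))
    print(starts_at_ebdu_boundary(b"\x5A\x00\x00\x01"))
```

A payload shorter than three bytes cannot begin with the prefix, and the slice comparison reports False in that case without raising an error.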

443	   In the case of Simple and Main profiles, since the blocks and
444	   macroblocks are not byte-aligned, the fragmentation boundary may be
445	   chosen arbitrarily.
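
For Simple and Main profiles, where the boundary is arbitrary, fragmentation reduces to cutting the frame into pieces of bounded size.  The sketch below assumes that the caller has already subtracted the RTP header and AU header sizes from the MTU to obtain max_payload; both the function name and that parameter are hypothetical.  Setting the FRAG field (section 5.3) is not shown.

```python
def fragment_frame(frame: bytes, max_payload: int):
    """Split one coded frame into AU payloads of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if len(frame) <= max_payload:
        return [frame]   # fits in one AU, no fragmentation needed
    return [frame[i:i + max_payload]
            for i in range(0, len(frame), max_payload)]


if __name__ == "__main__":
    print(fragment_frame(b"abcdefg", 3))
```

For Advanced profile, a sender would instead prefer cut points that coincide with EBDU boundaries, as recommended earlier in this section.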

447	   If an RTP packet contains an AU with the last fragment of a frame,
448	   additional AUs SHOULD NOT be included in the RTP packet.

450	   If the PTS Delta field in the AU header is present, each fragment of
451	   a frame MUST have the same presentation time.  If the DTS Delta field
452	   in the AU header is present, each fragment of a frame MUST have the
453	   same decode time.

455	4.3 Time stamp considerations

457	   Video frames MUST be transmitted in the coded order.  Coded order
458	   implies that no frames are dependent on subsequent frames, as
459	   discussed in section 3.4.  The RTP timestamp field MUST be set to the
460	   presentation time of the video frame contained in the first AU in the
461	   RTP packet.  The presentation time can be used as the timestamp field
462	   in the RTP header because it differs from the sampling instant of the
463	   frame only by an arbitrary constant offset.

465	   If the video frame in an AU has a presentation time that differs from
466	   the RTP timestamp field, then the presentation time MUST be specified
467	   using the PTS Delta field in the AU header.  Since the RTP timestamp
468	   field must be identical to the presentation time of the first video
469	   frame, this can only happen if an RTP packet contains multiple AUs.
470	   The syntax of the PTS Delta field is defined in section 5.2.
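
The rule above can be sketched as follows for a packet that carries several AUs.  The sketch assumes that the PTS Delta is expressed as the presentation time minus the RTP timestamp, which is an assumption here since the field is defined in section 5.2.  Timestamp wrap-around is ignored, and the function name is hypothetical.

```python
def pts_deltas(presentation_times):
    """Derive the RTP timestamp and per-AU PTS Delta values for one packet.

    presentation_times lists the presentation time of the frame in each
    AU, in packet order.  A delta of None means that the PTS Delta field
    is not needed for that AU.
    """
    rtp_timestamp = presentation_times[0]   # set from the first AU
    deltas = [None if pts == rtp_timestamp else pts - rtp_timestamp
              for pts in presentation_times]
    return rtp_timestamp, deltas


if __name__ == "__main__":
    print(pts_deltas([9000, 9000, 12000]))
```

The first AU never needs a PTS Delta, which matches the observation above that the field is only required when a packet contains multiple AUs.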

472	   The decode time of a VC-1 frame is always monotonically increasing
473	   when the video frames are transmitted in the coded order.  If B-
474	   pictures will not be present in the coded bit stream, then the decode
475	   time of a frame SHALL be equal to the presentation time of the frame.

477	   If B-pictures may be present in the coded bit stream, then the decode
478	   times of frames are determined as follows:

480	   - Non-B frames:  The decode time SHALL be equal to the presentation
481	     time of the previous non-B frame in the coded order.

483	   - B-frames:  The decode time SHALL be equal to the presentation time
484	     of the B-frame.

486	   As an example, consider Figure 1 in section 3.4.  The decode time of
487	   non-B frame P4 is 4 time units, which is equal to the presentation
488	   time of the previous non-B frame in the coded order, which is P1.  On
489	   the other hand, the decode time of B-frame B2 is 5 time units, which
490	   is identical to its presentation time.
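
The decode time rules can be checked against Figure 1 with a short sketch.  Frames are represented as hypothetical (name, presentation time) pairs in coded order.  The text does not state a decode time for the very first non-B frame, which has no predecessor, so the sketch returns None for it; that choice is an assumption of the example.

```python
def decode_times(coded_frames, b_pictures_possible=True):
    """Compute decode times for frames given in coded order.

    coded_frames is a list of (name, presentation_time) pairs, where
    the first letter of the name is the picture type.
    """
    result = []
    prev_non_b_pts = None
    for name, pts in coded_frames:
        if not b_pictures_possible or name.startswith("B"):
            dts = pts               # decode time equals presentation time
        else:
            dts = prev_non_b_pts    # None for the first non-B frame
            prev_non_b_pts = pts
        result.append((name, dts))
    return result


if __name__ == "__main__":
    figure_1 = [("I0", 3), ("P1", 4), ("P4", 7), ("B2", 5),
                ("B3", 6), ("P7", 10), ("B5", 8), ("B6", 9)]
    print(decode_times(figure_1))
```

With the presentation times read from Figure 1, the sketch gives P4 a decode time of 4 and B2 a decode time of 5, in agreement with the example above, and the resulting decode times increase monotonically in coded order.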

492	   If the decode time of a video frame differs from its presentation
493	   time, then the decode time MUST be specified using the DTS Delta
494	   field in the AU header.  The syntax of the DTS Delta field is defined
495	   in section 5.2.

497	   Knowing if the stream will contain B-pictures may help the receiver
498	   allocate resources more efficiently and can reduce delay, as an
499	   absence of B-pictures in the stream implies that no reordering
500	   of frames will be needed between the decoding process and the display
501	   of the decoded frames.  This may be important for interactive
502	   applications.

504	   The receiver SHALL assume that the coded bit stream may contain B-
505	   pictures in the following cases:

507	   - Advanced profile: If the value of the "bpic" media type parameter
508	     defined in section 6.1 is 1, or if the "bpic" parameter is not
509	     specified.

511	   - Main profile: If the MAXBFRAMES field in the STRUCT_C decoder
512	     initialization parameter has a non-zero value.  STRUCT_C is
513	     conveyed in the "config" media type parameter, which is defined in
514	     section 6.1.

516	   Simple profile does not use B-pictures.
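
The receiver's decision can be summarized in a small sketch.  The function name, the profile strings, and the argument representation are assumptions made for this example; parsing of the "bpic" and "config" parameters is not shown.

```python
def may_contain_b_pictures(profile, bpic=None, maxbframes=0):
    """Decide whether the receiver must assume B-pictures may be present.

    profile    -- "simple", "main" or "advanced" (illustrative strings)
    bpic       -- value of the "bpic" parameter, or None if not specified
    maxbframes -- MAXBFRAMES value taken from STRUCT_C (Main profile)
    """
    if profile == "advanced":
        return bpic is None or bpic == 1
    if profile == "main":
        return maxbframes != 0
    if profile == "simple":
        return False   # Simple profile does not use B-pictures
    raise ValueError("unknown profile: %r" % (profile,))


if __name__ == "__main__":
    print(may_contain_b_pictures("advanced"))
    print(may_contain_b_pictures("advanced", bpic=0))
    print(may_contain_b_pictures("main", maxbframes=2))
```

Note the conservative default for Advanced profile: when "bpic" is absent, the receiver has to provision for frame reordering.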

518	4.4 Random Access Points

520	   The entry-point header contains information that is needed by the
521	   decoder to decode the frames in that entry-point segment.  This means
522	   that in the event of lost RTP packets the decoder may be unable to
523	   decode frames until the next entry-point header is received.

525	   The first frame after an entry-point header is a random access point
526	   into the coded bit stream.  Simple and Main profiles do not have
527	   entry-point headers, so for those profiles each I-picture is a random
528	   access point.

530	   To allow the RTP receiver to detect that an RTP packet which was lost
531	   contained a random access point, this RTP payload format defines a
532	   field called "RA Count".  This field is present in every AU, and its
533	   value is incremented (modulo 256) for every random access point.  For
534	   additional details, see the definition of "RA Count" in section 5.2.

536	   To make it easy to determine if an AU contains a random access point,
537	   this RTP payload format also defines a bit called the "RA" flag in
538	   the AU Control field.  This bit is set to 1 only on those AUs that
539	   contain a random access point.  The RA bit is defined in section 5.3.
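
   As an illustration of how a receiver might use "RA Count" after a
   gap in the RTP sequence numbers, the sketch below estimates how many
   random access points were carried by the lost AUs.  It rests on two
   assumptions that are not stated in this section: AUs are not
   fragmented, and an AU with the RA bit set carries the already
   incremented counter value.  The function name is hypothetical.

```python
def lost_random_access_points(prev_ra_count: int,
                              next_ra_count: int,
                              next_ra_bit: int) -> int:
    """Random access points lost between two received AUs (modulo 256).

    prev_ra_count: RA Count of the last AU received before the gap.
    next_ra_count: RA Count of the first AU received after the gap.
    next_ra_bit:   RA bit of the first AU received after the gap.
    """
    # If the AU after the gap is itself a random access point, its
    # counter value already includes its own increment (assumption).
    expected = (prev_ra_count + (1 if next_ra_bit else 0)) % 256
    return (next_ra_count - expected) % 256
```

   Because the counter is modulo 256, a result of 0 can also mean that
   an exact multiple of 256 random access points was lost; the sketch
   cannot distinguish these cases.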

541	4.5 Removal of HRD parameters

543	   The sequence layer header of Advanced profile may include up to 31
544	   leaky bucket parameter sets for the Hypothetical Reference Decoder
545	   (HRD).  Each leaky bucket parameter set specifies a possible peak
546	   transmission bit rate (HRD_RATE) and a decoder buffer capacity
547	   (HRD_BUFFER).  (See section 3.3 for additional discussion about the
548	   HRD.)

550	   If the actual peak transmission rate is known by the RTP sender, the
551	   RTP sender MAY remove all leaky bucket parameter sets except for the
552	   one corresponding to the actual peak transmission rate.

554	   For each leaky bucket parameter set in the sequence layer header,
555	   there is also a parameter in the entry-point header that specifies
556	   the initial fullness (HRD_FULL) of the leaky bucket.

558	   If the RTP sender has removed any leaky bucket parameter sets from
559	   the sequence layer header, then for any removed leaky bucket
560	   parameter set, it MUST also remove the corresponding HRD_FULL
561	   parameter in the entry-point header.

563	   Removing leaky bucket parameter sets, as described above, may
564	   significantly reduce the size of the sequence layer headers and the
565	   entry-point headers.
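
   The pairing between leaky bucket parameter sets and HRD_FULL values
   can be modeled abstractly as two parallel lists.  The sketch below
   is not a bit stream rewriter: it assumes the header fields have
   already been decoded into integers, and the type and function names
   are hypothetical.  A real sender would also have to re-encode the
   headers, including any field that counts the parameter sets.

```python
from typing import List, Tuple

# (HRD_RATE, HRD_BUFFER), already decoded to plain integers.
LeakyBucket = Tuple[int, int]


def keep_single_bucket(buckets: List[LeakyBucket],
                       hrd_full: List[int],
                       peak_rate: int
                       ) -> Tuple[List[LeakyBucket], List[int]]:
    """Keep only the parameter set matching the actual peak rate.

    buckets[i] (sequence layer header) and hrd_full[i] (entry-point
    header) describe the same leaky bucket, so they are removed
    together.
    """
    if len(buckets) != len(hrd_full):
        raise ValueError("one HRD_FULL value is expected per bucket")
    for i, (rate, _buffer) in enumerate(buckets):
        if rate == peak_rate:
            return [buckets[i]], [hrd_full[i]]
    # No set matches the peak rate: leave both lists unchanged.
    return list(buckets), list(hrd_full)
```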

567	4.6 Repeating the Sequence Layer header

569	   To improve robustness against loss of RTP packets, it is RECOMMENDED
570	   that if the sequence layer header changes, it should be repeated
571	   frequently in the bit stream.  In this case, it is RECOMMENDED
572	   that the number of leaky bucket parameters in the sequence layer
573	   header and the entry point headers be reduced to one, as described in
574	   section 4.5.  This will help reduce the overhead caused by repeating
575	   the sequence layer header.

577	   Note that any data in the VC-1 bit stream, including repeated copies
578	   of the sequence header itself, must be accounted for when computing
579	   the leaky bucket parameter for the HRD.  (See section 3.3 for a
580	   discussion about the HRD.)

582	   Note that if the value of TFCNTRFLAG in the sequence layer header is
583	   1, each picture header contains a frame counter field (TFCNTR).  Each
584	   time the sequence layer header is inserted in the bit stream, the
585	   value of this counter MUST be reset.

587	   To allow the RTP receiver to detect that an RTP packet which was lost
588	   contained a new sequence layer header, the AU Control field defines a
589	   bit called the "SL" flag.  This bit is toggled when a sequence layer
590	   header is transmitted, but only if that header is different from the
591	   most recently transmitted sequence layer header.  The SL bit is
592	   defined in section 5.3.
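
   A sender-side view of the "SL" flag might look like the following
   sketch.  The class and method names are hypothetical.  The sketch
   assumes that the very first sequence layer header does not toggle
   the bit, since there is no earlier header to compare it with, and it
   starts the bit at 0 although the initial value may be chosen freely.

```python
from typing import Optional


class SLFlag:
    """Tracks the SL bit that the sender writes into each AU."""

    def __init__(self, initial: int = 0) -> None:
        self.value = initial & 1
        self._last_header: Optional[bytes] = None

    def next_value(self, seq_header: Optional[bytes]) -> int:
        """Return the SL bit for the next AU.

        seq_header is the sequence layer header carried in that AU,
        or None if the AU does not carry one.
        """
        if seq_header is not None:
            changed = (self._last_header is not None
                       and seq_header != self._last_header)
            if changed:
                self.value ^= 1  # toggle only for a different header
            self._last_header = seq_header
        return self.value
```

   On the receiving side, an SL bit that differs from the one in the
   previously received AU, in an AU that carries no sequence layer
   header itself, suggests that a new header was in a lost packet.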

594	4.7 Signaling of media type parameters

596	   When this RTP payload format is used with SDP, the decoder
597	   initialization parameters described in section 3.3 MUST be signaled
598	   in SDP using the media type parameters specified in section 6.1.
599	   Section 6.2 specifies how to map the media type parameters to SDP
600	   [5], section 6.3 defines rules specific to the SDP Offer/Answer
601	   model, and section 6.4 defines rules for when SDP is used in a
602	   declarative style.

604	   When Simple or Main profiles are used, it is not possible to change
605	   the decoder initialization parameters through the coded bit stream.
606	   Any changes to the decoder initialization parameters would have to be
607	   done through out-of-band means, e.g., by updating the SDP.

609	   When Advanced profile is used, the decoder initialization parameters
610	   MAY be changed by inserting a new sequence layer header or an entry-
611	   point header in the coded bit stream.

613	   Note that the sequence layer header specifies the VC-1 level, the
614	   maximum size of the coded pictures and optionally also the maximum
615	   frame rate.  The media type parameters "level", "width", "height" and
616	   "framerate" specify upper limits for these parameters.  Thus, the
617	   sequence layer header MAY specify values that are lower than the
618	   values of the media type parameters "level", "width", "height" or
619	   "framerate", but the sequence layer header MUST NOT exceed the values
620	   of any of these media type parameters.

622	4.8 The "mode=1" media type parameter

624	   In certain applications using Advanced profile, the sequence layer
625	   header never changes.  This MAY be signaled with the media type
626	   parameter "mode=1". (The "mode" parameter is defined in section 6.1.)
627	   The "mode=1" parameter serves as a "hint" to the RTP receiver that
628	   all sequence layer headers in the bit stream will be identical.  If
629	   "mode=1" is signaled and a sequence layer header is present in the
630	   coded bit stream, then it MUST be identical to the sequence layer
631	   header specified by the "config" media type parameter.

633	   Since the sequence layer header never changes in "mode=1", the RTP
634	   sender MAY remove it from the bit stream.  Note, however, that if the
635	   value of TFCNTRFLAG in the sequence layer header is 1, each picture
636	   header contains a frame counter field (TFCNTR).  This field is reset
637	   each time the sequence layer header occurs in the bit stream.  If the
638	   RTP sender chooses to remove the sequence layer header, then it MUST
639	   ensure that the resulting bit stream is still compliant with the VC-1
640	   specification (e.g., by adjusting the TFCNTR field, if necessary).

642	4.9 The "mode=3" media type parameter

644	   In certain applications using Advanced profile, both the sequence
645	   layer header and the entry-point header never change.  This MAY be
646	   signaled with the media type parameter "mode=3".  The same rules
647	   apply to "mode=3" as for "mode=1", described in section 4.8.
648	   Additionally, if "mode=3" is signaled, then the RTP sender MAY
649	   "compress" the coded bit stream by not including sequence layer
650	   headers and entry-point headers in the RTP packets.

652	   The RTP receiver MUST "decompress" the coded bit stream by re-
653	   inserting the entry-point headers prior to delivering the coded bit
654	   stream to the VC-1 decoder.  The sequence layer header does not need
655	   to be decompressed by the receiver, since it never changes.

657	   If "mode=3" is signaled and the RTP receiver receives a complete AU
658	   or the first fragment of an AU, and the RA bit is set to 1 but the AU
659	   does not begin with an entry-point header, then this indicates that
660	   the entry-point header has been "compressed".  In that case, the RTP
661	   receiver MUST insert an entry-point header at the beginning of the
662	   AU.  When inserting the entry-point header, the RTP receiver MUST use
663	   the one that was specified by the "config" media type parameter.
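
   The "decompression" step for "mode=3" could be sketched as follows.
   The function names are hypothetical.  The start code value
   00 00 01 0E for the entry-point header is an assumption taken from
   the start code table of SMPTE 421M Annex E and should be checked
   against that specification.  The sketch does not handle an AU that
   begins with a sequence layer header.

```python
# Assumed entry-point start code (SMPTE 421M Annex E).
ENTRY_POINT_START_CODE = b"\x00\x00\x01\x0e"

FRAG_FIRST, FRAG_COMPLETE = 1, 3  # FRAG values from section 5.3


def entry_point_from_config(config_hex: str) -> bytes:
    """Extract the entry-point header EBDU from the "config" value."""
    config = bytes.fromhex(config_hex)
    pos = config.find(ENTRY_POINT_START_CODE)
    if pos < 0:
        raise ValueError("no entry-point header found in config")
    # In "config" the entry-point header directly follows the sequence
    # layer header and runs to the end of the octet string.
    return config[pos:]


def decompress_au(payload: bytes, frag: int, ra: int,
                  entry_point: bytes) -> bytes:
    """Re-insert a removed entry-point header, if one is missing."""
    starts_frame = frag in (FRAG_FIRST, FRAG_COMPLETE)
    if (starts_frame and ra == 1
            and not payload.startswith(ENTRY_POINT_START_CODE)):
        return entry_point + payload
    return payload
```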

665	5. RTP Payload Format syntax

667	5.1 RTP header usage

669	   The format of the RTP header is specified in RFC 3550 [3] and is
670	   reprinted in Figure 3 for convenience.

672	      0                   1                   2                   3
673	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
674	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
675	     |V=2|P|X|  CC   |M|     PT      |       sequence number         |
676	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
677	     |                           timestamp                           |
678	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
679	     |           synchronization source (SSRC) identifier            |
680	     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
681	     |            contributing source (CSRC) identifiers             |
682	     |                             ....                              |
683	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
684	     Figure 3.  RTP header according to RFC 3550

686	   The fields of the fixed RTP header have their usual meaning, which is
687	   defined in RFC 3550 and by the RTP profile in use, with the following
688	   additional notes:

690	   Marker bit (M): 1 bit
691	           This bit is set to 1 if the RTP packet contains an Access
692	           Unit containing a complete VC-1 frame, or the last fragment
693	           of a VC-1 frame.

695	   Payload type (PT): 7 bits
696	           This document does not assign an RTP payload type for this
697	           RTP payload format. The assignment of a payload type has to
698	           be performed either through the RTP profile used or in a
699	           dynamic way.

701	   Sequence Number: 16 bits
702	           The RTP receiver can use the sequence number field to recover
703	           the coded order of the VC-1 frames.  (A typical VC-1 decoder
704	           will require the VC-1 frames to be delivered in coded order.)
705	           When VC-1 frames have been fragmented across RTP packets, the
706	           RTP receiver can use the sequence number field to ensure that
707	           no fragment is missing.

709	   Timestamp: 32 bits
710	           The RTP timestamp is set to the presentation time of the VC-1
711	           frame in the first Access Unit.
712	           A clock rate of 90 kHz MUST be used.
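
   For readers who want to see the fields in code, the following is a
   minimal sketch of parsing the RTP header shown in Figure 3.  The
   class and function names are hypothetical.  The sketch skips the
   CSRC list and any header extension, and it does not remove RTP
   padding.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    marker: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload_offset: int  # index of the first payload byte


def parse_rtp_header(packet: bytes) -> RtpHeader:
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)  # skip the CSRC identifiers
    if b0 & 0x10:  # X bit set: skip the header extension
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack_from("!H", packet, offset + 2)
        offset += 4 + 4 * ext_words
    if len(packet) < offset:
        raise ValueError("truncated RTP header")
    return RtpHeader(b1 >> 7, b1 & 0x7F, seq, ts, ssrc, offset)
```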

714	5.2 AU header syntax

716	   The Access Unit header consists of a one-byte AU Control field, the
717	   RA Count field and 3 optional fields.  All fields MUST be written in
718	   network byte order.  The structure of the AU header is illustrated in
719	   Figure 4.

721	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
722	   |AU     | RA    |  AUP  | PTS   | DTS   |
723	   |Control| Count |  Len  | Delta | Delta |
724	   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

726	   Figure 4.  Structure of AU header.

728	   AU Control: 8 bits
729	           The usage of the AU Control field is defined in section 5.3.

731	   RA Count: 8 bits
732	           Random Access Point Counter.  This field is a binary modulo
733	           256 counter.  The value of this field MUST be incremented by
734	           1 each time an AU is transmitted where the RA bit in the AU
735	           Control field is set to 1.  The initial value of this field
736	           is undefined and MAY be chosen randomly.

738	   AUP Len: 16 bits
739	           Access Unit Payload Length.  Specifies the size, in bytes, of
740	           the payload of the Access Unit.  The field does not include
741	           the size of the AU header itself.  The field MUST be included
742	           in each AU header in an RTP packet, except for the last AU
743	           header in the packet.  If this field is not included, the
744	           payload of the Access Unit SHALL be assumed to extend to the
745	           end of the RTP payload.

747	   PTS Delta: 32 bits
748	           Presentation time delta.  Specifies the presentation time of
749	           the frame as a 2's complement offset (delta) from the
750	           timestamp field in the RTP header of this RTP packet.  The
751	           PTS Delta field MUST use the same clock rate as the timestamp
752	           field in the RTP header.
753	           This field SHOULD NOT be included in the first AU header in
754	           the RTP packet, because the RTP timestamp field specifies the
755	           presentation time of the frame in the first AU.  If this
756	           field is not included, the presentation time of the frame
757	           SHALL be assumed to be specified by the timestamp field in
758	           the RTP header.

760	   DTS Delta: 32 bits
761	           Decode time delta.  Specifies the decode time of the frame as
762	           a 2's complement offset (delta) between the presentation time
763	           and the decode time.  Note that if the presentation time is
764	           larger than the decode time, this results in a value for the
765	           DTS Delta field that is greater than zero.  The DTS Delta
766	           field MUST use the same clock rate as the timestamp field in
767	           the RTP header.  If this field is not included, the decode
768	           time of the frame SHALL be assumed to be identical to the
769	           presentation time of the frame.
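
   The relationship between the RTP timestamp, PTS Delta and DTS Delta
   can be illustrated with a few lines of arithmetic.  The function
   names are hypothetical, and the 32-bit wrap-around of the results is
   an assumption made to match the width of the RTP timestamp field.

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit field as a 2's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def presentation_time(rtp_timestamp: int,
                      pts_delta: Optional[int] = None) -> int:
    """Presentation time in 90 kHz units; None means field absent."""
    if pts_delta is None:
        return rtp_timestamp & MASK32
    return (rtp_timestamp + to_signed32(pts_delta)) & MASK32


def decode_time(pts: int, dts_delta: Optional[int] = None) -> int:
    """Decode time; DTS Delta is presentation time minus decode time."""
    if dts_delta is None:
        return pts & MASK32
    return (pts - to_signed32(dts_delta)) & MASK32
```

   For example, with an RTP timestamp of 90000 and a PTS Delta of 3000,
   the presentation time is 93000; a DTS Delta of 3000 then yields a
   decode time of 90000.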

771	5.3 AU Control field syntax

773	   The structure of the 8-bit AU Control field is shown in Figure 5.

775	     0    1    2    3    4    5    6    7
776	   +----+----+----+----+----+----+----+----+
777	   |  FRAG   | RA | SL | LP | PT | DT | R  |
778	   +----+----+----+----+----+----+----+----+

780	   Figure 5.  Syntax of AU Control field.

782	   FRAG: 2 bits
783	           Fragmentation Information.  This field indicates if the AU
784	           payload contains a complete frame or a fragment of a frame.
785	           It MUST be set as follows:
786	           0: The AU payload contains a fragment of a frame other than
787	           the first or last fragment.
788	           1: The AU payload contains the first fragment of a frame.
789	           2: The AU payload contains the last fragment of a frame.
790	           3: The AU payload contains a complete frame (not fragmented.)

792	   RA: 1 bit
793	           Random Access Point indicator.  This bit MUST be set to 1 if
794	           the AU contains a frame that is a random access point.  In
795	           the case of Simple and Main profiles, any I-picture is a
796	           random access point.
797	           In the case of Advanced profile, the first frame after an
798	           entry-point header is a random access point.
799	           Note that if entry-point headers are not transmitted at every
800	           random access point, this MUST be indicated using the media
801	           type parameter "mode=3".

803	   SL: 1 bit
804	           Sequence Layer Counter.  This bit MUST be toggled, i.e.,
805	           changed from 0 to 1 or from 1 to 0, if the AU contains a
806	           sequence layer header and if it is different from the most
807	           recently transmitted sequence layer header.  Otherwise, the
808	           value of this bit must be identical to the value of the SL
809	           bit in the previous AU.
810	           The initial value of this bit is undefined and MAY be chosen
811	           randomly.
812	           The bit MUST be 0 for Simple and Main profile bit streams or
813	           if the sequence layer header never changes.

815	   LP: 1 bit
816	           Length Present.  This bit MUST be set to 1 if the AU header
817	           includes the AUP Len field.

819	   PT: 1 bit
820	           PTS Delta Present.  This bit MUST be set to 1 if the AU
821	           header includes the PTS Delta field.

823	   DT: 1 bit
824	           DTS Delta Present.  This bit MUST be set to 1 if the AU
825	           header includes the DTS Delta field.

827	   R: 1 bit
828	           Reserved.  This bit MUST be set to 0 and MUST be ignored by
829	           receivers.
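
   Putting sections 5.2 and 5.3 together, an AU header parser might be
   sketched as shown below.  The class and function names are
   hypothetical.  The sketch assumes that bit 0 in Figure 5 is the most
   significant bit of the AU Control octet, following the usual bit
   numbering of IETF diagrams, and that the optional fields appear in
   the order shown in Figure 4.

```python
import struct
from typing import NamedTuple, Optional


class AuHeader(NamedTuple):
    frag: int
    ra: int
    sl: int
    ra_count: int
    aup_len: Optional[int]    # None if the LP bit is 0
    pts_delta: Optional[int]  # None if the PT bit is 0
    dts_delta: Optional[int]  # None if the DT bit is 0
    size: int                 # total size of the AU header in bytes


def parse_au_header(data: bytes, offset: int = 0) -> AuHeader:
    if len(data) < offset + 2:
        raise ValueError("truncated AU header")
    control, ra_count = data[offset], data[offset + 1]
    pos = offset + 2
    aup_len = pts_delta = dts_delta = None
    try:
        if (control >> 3) & 1:  # LP: AUP Len present
            (aup_len,) = struct.unpack_from("!H", data, pos)
            pos += 2
        if (control >> 2) & 1:  # PT: PTS Delta present
            (pts_delta,) = struct.unpack_from("!i", data, pos)
            pos += 4
        if (control >> 1) & 1:  # DT: DTS Delta present
            (dts_delta,) = struct.unpack_from("!i", data, pos)
            pos += 4
    except struct.error as exc:
        raise ValueError("truncated AU header") from exc
    # The R bit (least significant bit) is ignored, as required.
    return AuHeader(control >> 6, (control >> 5) & 1,
                    (control >> 4) & 1, ra_count,
                    aup_len, pts_delta, dts_delta, pos - offset)
```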

831	6. RTP Payload format parameters

833	6.1 Media type Registration

835	   This registration uses the template defined in RFC 4288 [7] and
836	   follows RFC 3555 [8].

838	   Type name:  video

840	   Subtype name:  vc1

842	   Required parameters:

844	         profile:
845	           The value is an integer identifying the VC-1 profile.  The
846	           following values are defined:
847	           0: Simple profile.
848	           1: Main profile.
849	           3: Advanced profile.

851	           If the profile parameter is used to indicate properties of a
852	           coded bit stream, it indicates the VC-1 profile that a
853	           decoder has to support when it decodes the bit stream.

855	           If the profile parameter is used for capability exchange or
856	           in a session setup procedure, it indicates the VC-1 profile
857	           that the codec supports.

859	         level:
860	           The value is an integer specifying the level of the VC-1
861	           profile.
862	           For Advanced profile, valid values are 0 to 4, which
863	           correspond to levels L0 to L4, respectively.  For Simple and
864	           Main profiles, the following values are defined:
865	           1: Low Level
866	           2: Medium Level
867	           3: High Level (only valid for Main profile)

869	           If the level parameter is used to indicate properties of a
870	           coded bit stream, it indicates the highest level of the VC-1
871	           profile that a decoder has to support when it decodes the bit
872	           stream.  Note that support for a level implies support for
873	           all numerically lower levels of the given profile.

875	           If the level parameter is used for capability exchange or in
876	           a session setup procedure, it indicates the highest level of
877	           the VC-1 profile that the codec supports.  See section 6.3 of
878	           RFC XXXX for specific rules for how this parameter is used
879	           with the SDP Offer/Answer model.

881	   Optional parameters:

883	         config:
884	           The value is a base16 [6] (hexadecimal) representation of an
885	           octet string that expresses the decoder initialization
886	           parameters.  Decoder initialization parameters are mapped
887	           onto the base16 octet string in an MSB-first basis.  The
888	           first bit of the decoder initialization parameters MUST be
889	           located at the MSB of the first octet.  If the decoder
890	           initialization parameters are not a multiple of 8 bits, up to
891	           7 zero-valued padding bits MUST be added in the last octet to
892	           achieve octet alignment.

894	           For Simple and Main profiles, the decoder initialization
895	           parameters are STRUCT_C, as defined in Annex J of SMPTE 421M
896	           [1].

898	           For Advanced profile, the decoder initialization parameters
899	           are a sequence layer header directly followed by an entry-
900	           point header.  The two headers MUST be in EBDU format,
901	           meaning that they must include their Start Codes and must use
902	           the encapsulation method defined in Annex E of SMPTE 421M
903	           [1].
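
   The MSB-first mapping and zero padding described for "config" can be
   illustrated with the sketch below.  The function name is
   hypothetical, the input is assumed to be a string of '0' and '1'
   characters in transmission order, and the use of uppercase
   hexadecimal digits is an assumption about the base16 alphabet.

```python
def encode_config(bits: str) -> str:
    """Map a bit string onto base16, MSB first, zero-padded."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("bits must be a non-empty string of 0 and 1")
    # Add up to 7 zero-valued padding bits to reach octet alignment.
    padded = bits + "0" * (-len(bits) % 8)
    octets = bytes(int(padded[i:i + 8], 2)
                   for i in range(0, len(padded), 8))
    return octets.hex().upper()
```

   For instance, the 12-bit input "101100001111" is padded with four
   zero bits and becomes the two octets B0 F0.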

905	         width:
906	           The value is an integer greater than zero, specifying the
907	           maximum horizontal size of the coded picture, in pixels.

909	           If this parameter is not specified, it defaults to the
910	           maximum horizontal size allowed by the specified profile and
911	           level.

913	         height:
914	           The value is an integer greater than zero, specifying the
915	           maximum vertical size of the coded picture in pixels.

917	           If this parameter is not specified, it defaults to the
918	           maximum vertical size allowed by the specified profile and
919	           level.

921	         bitrate:
922	           The value is an integer greater than zero, specifying the
923	           peak transmission rate of the coded bit stream in bits per
924	           second.  The number does not include the overhead caused by
925	           RTP encapsulation, i.e., it does not include the AU headers,
926	           or any of the RTP, UDP or IP headers.

928	           If this parameter is not specified, it defaults to the
929	           maximum bit rate allowed by the specified profile and level.
930	           (See the values for "RMax" in Annex D of SMPTE 421M [1].)

932	         buffer:
933	           The value is an integer specifying the leaky bucket size, B,
934	           in milliseconds, required to contain a stream transmitted at
935	           the transmission rate specified by the bitrate parameter.
936	           This parameter is defined in the hypothetical reference
937	           decoder model for VC-1, in Annex C of SMPTE 421M [1].

939	           Note that this parameter relates to the codec bit stream
940	           only, and does not account for any buffering time that may be
941	           required to compensate for jitter in the network.

943	           If this parameter is not specified, it defaults to the
944	           maximum buffer size allowed by the specified profile and
945	           level.  (See the values for "BMax" and "RMax" in Annex D of
946	           SMPTE 421M [1].)

948	         framerate:
949	           The value is an integer greater than zero, specifying the
950	           maximum number of frames per second in the coded bit stream,
951	           multiplied by 1000 and rounded to the nearest integer value.
952	           For example, 30000/1001 (approximately 29.97) frames per
953	           second is represented as 29970.

955	           If the parameter is not specified, it defaults to the maximum
956	           frame rate allowed by the specified profile and level.
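
   The conversion from a frame rate to the "framerate" value can be
   written as a one-line calculation.  The function name is
   hypothetical.  Exact fractions are used to avoid floating-point
   error; how exact ties are rounded is not specified in this document,
   and this sketch simply inherits Python's rounding behavior.

```python
from fractions import Fraction


def framerate_param(frames_per_second: Fraction) -> int:
    """Frames per second, multiplied by 1000 and rounded."""
    return round(frames_per_second * 1000)
```

   With this sketch, Fraction(30000, 1001) maps to 29970, matching the
   example given for this parameter.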

958	         bpic:
959	           This parameter signals if B-pictures may be present when
960	           Advanced profile is used.  If this parameter is present, and
961	           B-pictures may be present in the coded bit stream, this
962	           parameter MUST be equal to 1.
963	           A value of 0 indicates that B-pictures SHALL NOT be present
964	           in the coded bit stream, even if the sequence layer header
965	           changes.  It is RECOMMENDED to include this parameter, with a
966	           value of 0, if no B-pictures will be included in the coded
967	           bit stream.

969	           This parameter MUST NOT be used with Simple and Main
970	           profiles. (For Main profile, the presence of B-pictures is
971	           indicated by the MAXBFRAMES field in the STRUCT_C decoder
972	           initialization parameter.)

974	           For Advanced profile, if this parameter is not specified, a
975	           value of 1 SHALL be assumed.

977	         mode:
978	           The value is an integer specifying the use of the sequence
979	           layer header and the entry-point header.  This parameter is
980	           only defined for Advanced profile.  The following values are
981	           defined:
982	           0: Both the sequence layer header and the entry-point header
983	           may change, and changed headers will be included in the RTP
984	           packets.
985	           1: The sequence layer header specified in the config
986	           parameter never changes.  The rules in section 4.8 of RFC
987	           XXXX MUST be followed.
988	           3: The sequence layer header and the entry-point header
989	           specified in the config parameter never change.  The rules in
990	           section 4.9 of RFC XXXX MUST be followed.

992	           If the mode parameter is not specified, a value of 0 SHALL be
993	           assumed.  The mode parameter SHOULD be specified if modes 1
994	           or 3 apply to the VC-1 bit stream.

996	         max-width, max-height, max-bitrate, max-buffer, max-framerate:
997	           These parameters are defined for use in a capability exchange
998	           procedure.  The parameters do not signal properties of the
999	           coded bit stream, but rather upper limits or preferred values
1000	           for the "width", "height", "bitrate", "buffer" and
1001	           "framerate" parameters.  Section 6.3 of RFC XXXX provides
1002	           specific rules for how these parameters are used with the SDP
1003	           Offer/Answer model.

1005	           Receivers that signal support for a given profile and level
1006	           MUST support the maximum values for these parameters for that
1007	           profile and level.  For example, a receiver that indicates
1008	           support for Main profile, Low level, must support a width of
1009	           352 pixels and height of 288 pixels, even if this requires
1010	           scaling the image to fit the resolution of a smaller display
1011	           device.

1013	           A receiver MAY use any of the max-width, max-height, max-
1014	           bitrate, max-buffer and max-framerate parameters to indicate
1015	           preferred capabilities.  For example, a receiver may choose
1016	           to specify values for max-width and max-height that match the
1017	           resolution of its display device, since a bit stream encoded
1018	           using those parameters would not need to be rescaled.

1020	           If any of the max-width, max-height, max-bitrate, max-buffer
1021	           and max-framerate parameters signal a capability that is less
1022	           than the required capabilities of the signaled profile and
1023	           level, then the parameter SHALL be interpreted as a preferred
1024	           value for that capability.

1026	           Any of the parameters MAY also be used to signal capabilities
1027	           that exceed the required capabilities of the signaled profile
1028	           and level.  In that case, the parameter SHALL be interpreted
1029	           as the maximum value that can be supported for that
1030	           capability.
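
   The two interpretations of a "max-" parameter can be summarized in a
   short sketch.  The function name and the returned strings are
   hypothetical.  The limit required by the signaled profile and level
   is assumed to be looked up elsewhere, for example from Annex D of
   SMPTE 421M [1].

```python
def interpret_max_parameter(signaled: int, level_limit: int) -> str:
    """Classify a max-* value against the profile and level limit."""
    if signaled <= 0:
        raise ValueError("value must be greater than zero")
    # Below the required capability: a preference, not a hard limit.
    if signaled < level_limit:
        return "preferred"
    # Otherwise: the maximum value that can be supported.
    return "maximum"
```

   For example, with the 352-pixel width required for Main profile, Low
   level, a max-width of 320 is a preferred value, while a max-width of
   720 is a supported maximum.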

1032	           When more than one parameter from the set (max-width, max-
1033	           height, max-bitrate, max-buffer and max-framerate) is
1034	           present, all signaled capabilities MUST be supported
1035	           simultaneously.

1037	           A sender or receiver MUST NOT use these parameters to signal
1038	           capabilities that meet the requirements of a higher level of
1039	           the VC-1 profile than the one specified in the "level"
1040	           parameter, if the sender or receiver can support all the
1041	           properties of the higher level, except if specifying a higher
1042	           level is not allowed due to other restrictions.  (As an
1043	           example of such a restriction, in the SDP Offer/Answer model,
1044	           the value of the level parameter that can be used in an
1045	           Answer is limited by what was specified in the Offer.)

1047	         max-width:
1048	           The value is an integer greater than zero, specifying a
1049	           horizontal size for the coded picture, in pixels.  If the
1050	           value is less than the maximum horizontal size allowed by the
1051	           profile and level, then the value specifies the preferred
1052	           horizontal size.  Otherwise, it specifies the maximum
1053	           horizontal size that is supported.

1055	           If this parameter is not specified, it defaults to the
1056	           maximum horizontal size allowed by the specified profile and
1057	           level.

1059	         max-height:
1060	           The value is an integer greater than zero, specifying a
1061	           vertical size for the coded picture, in pixels.  If the value
1062	           is less than the maximum vertical size allowed by the profile
1063	           and level, then the value specifies the preferred vertical
1064	           size.  Otherwise, it specifies the maximum vertical size that
1065	           is supported.

1067	           If this parameter is not specified, it defaults to the
1068	           maximum vertical size allowed by the specified profile and
1069	           level.

1071	         max-bitrate:
1072	           The value is an integer greater than zero, specifying a peak
1073	           transmission rate for the coded bit stream in bits per
1074	           second.  The number does not include the overhead caused by
1075	           RTP encapsulation, i.e., it does not include the AU headers,
1076	           or any of the RTP, UDP or IP headers.

1078	           If the value is less than the maximum bit rate allowed by the
1079	           profile and level, then the value specifies the preferred bit
1080	           rate.  Otherwise, it specifies the maximum bit rate that is
1081	           supported.

1083	           If this parameter is not specified, it defaults to the
1084	           maximum bit rate allowed by the specified profile and level.
1085	           (See the values for "RMax" in Annex D of SMPTE 421M [1].)

1087	         max-buffer:
1088	           The value is an integer specifying a leaky bucket size, B, in
1089	           milliseconds, required to contain a stream transmitted at the
1090	           transmission rate specified by the max-bitrate parameter.
1091	           This parameter is defined in the hypothetical reference
1092	           decoder model for VC-1, in Annex C of SMPTE 421M [1].

1094	           Note that this parameter relates to the codec bit stream
1095	           only, and does not account for any buffering time that may be
1096	           required to compensate for jitter in the network.

1098	           If the value is less than the maximum leaky bucket size
1099	           allowed by the max-bitrate parameter and the profile and
1100	           level, then the value specifies the preferred leaky bucket
1101	           size.  Otherwise, it specifies the maximum leaky bucket size
1102	           that is supported for the bit rate specified by the max-
1103	           bitrate parameter.

1105	           If this parameter is not specified, it defaults to the
1106	           maximum buffer size allowed by the specified profile and
1107	           level.  (See the values for "BMax" and "RMax" in Annex D of
1108	           SMPTE 421M [1].)

1110	         max-framerate:
1111	           The value is an integer greater than zero, specifying a
1112	           number of frames per second for the coded bit stream.  The
1113	           value is the frame rate multiplied by 1000 and rounded to the
1114	           nearest integer value.  For example, 30000/1001
1115	           (approximately 29.97) frames per second is represented as
1116	           29970.

1118	           If the value is less than the maximum frame rate allowed by
1119	           the profile and level, then the value specifies the preferred
1120	           frame rate.  Otherwise, it specifies the maximum frame rate
1121	           that is supported.

1123	           If the parameter is not specified, it defaults to the maximum
1124	           frame rate allowed by the specified profile and level.

1126	   Encoding considerations:
1127	           This media type is framed and contains binary data.

1129	   Security considerations:
1130	           See Section 7 of RFC XXXX.

1132	   Interoperability considerations:
1133	           None.

1135	   Published specification:
1136	           RFC XXXX.

1138	   Applications which use this media type:
1139	           Multimedia streaming and conferencing tools.

1141	   Additional Information:
1142	           None.

1144	   Person & email address to contact for further information:
1145	           Anders Klemets <anderskl@microsoft.com>
1146	           IETF AVT working group.

1148	   Intended Usage:
1149	           COMMON

1151	   Restrictions on usage:
1152	           This media type depends on RTP framing, and hence is only
1153	           defined for transfer via RTP [3].

1155	   Authors:
1156	           Anders Klemets

1158	   Change controller:
1159	           IETF Audio/Video Transport Working Group delegated from the
1160	           IESG.

1162	6.2 Mapping of media type parameters to SDP

1164	   The information carried in the media type specification has a
1165	   specific mapping to fields in the Session Description Protocol (SDP)
1166	   [4].  If SDP is used to specify sessions using this payload format,
1167	   the mapping is done as follows:

1169	   o The media name in the "m=" line of SDP MUST be video (the type
1170	     name).

1172	   o The encoding name in the "a=rtpmap" line of SDP MUST be vc1 (the
1173	     subtype name).

1175	   o The clock rate in the "a=rtpmap" line MUST be 90000.

1177	   o The REQUIRED parameters "profile" and "level" MUST be included in
1178	     the "a=fmtp" line of SDP.
1179	     These parameters are expressed in the form of a semicolon
1180	     separated list of parameter=value pairs.

1182	   o The OPTIONAL parameters "config", "width", "height", "bitrate",
1183	     "buffer", "framerate", "bpic", "mode", "max-width", "max-height",
1184	     "max-bitrate", "max-buffer" and "max-framerate", when present,
1185	     MUST be included in the "a=fmtp" line of SDP.
1186	     These parameters are expressed in the form of a
1187	     semicolon-separated list of parameter=value pairs:

1189	         a=fmtp:<dynamic payload type> <parameter
1190	         name>=<value>[,<value>][; <parameter name>=<value>]

1192	   o Any parameters unknown to the device that uses the SDP MUST be
1193	     ignored.  For example, parameters defined in later specifications
1194	     MAY be copied into the SDP and MUST be ignored by receivers that
1195	     do not understand them.
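
   As an illustration only, the mapping rules above can be sketched in
   Python.  The helper names build_vc1_sdp and parse_vc1_fmtp, and the
   KNOWN_PARAMS set, are assumptions made for this sketch and are not
   defined by this specification.  Values are kept as plain strings.

```python
# Hypothetical helpers; the names are illustrative, not part of the draft.
KNOWN_PARAMS = {
    "profile", "level", "config", "width", "height", "bitrate", "buffer",
    "framerate", "bpic", "mode", "max-width", "max-height", "max-bitrate",
    "max-buffer", "max-framerate",
}
REQUIRED_PARAMS = ("profile", "level")


def build_vc1_sdp(port, payload_type, params):
    """Return the m=, a=rtpmap and a=fmtp lines for a VC-1 stream."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # Parameters are written as a semicolon-separated list of name=value pairs.
    fmtp = ";".join(f"{name}={value}" for name, value in params.items())
    return [
        f"m=video {port} RTP/AVP {payload_type}",   # media name is "video"
        f"a=rtpmap:{payload_type} vc1/90000",       # encoding name and clock rate
        f"a=fmtp:{payload_type} {fmtp}",
    ]


def parse_vc1_fmtp(line):
    """Parse an a=fmtp line, ignoring parameters this receiver does not know."""
    prefix, _, body = line.partition(" ")
    if not prefix.startswith("a=fmtp:"):
        raise ValueError("not an a=fmtp line")
    params = {}
    for item in body.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep and name in KNOWN_PARAMS:
            params[name] = value
    return params
```

   In this sketch a receiver silently drops any parameter that is not
   in KNOWN_PARAMS, which is one way to satisfy the rule that unknown
   parameters are ignored.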

1197	6.3 Usage with the SDP Offer/Answer Model

1199	   When VC-1 is offered over RTP using SDP in the Offer/Answer model [5]
1200	   to negotiate unicast usage, the following rules and limitations
1201	   apply:

1203	   o The "profile" parameter MUST be used symmetrically, i.e., the
1204	     answerer MUST either maintain the parameter or remove the media
1205	     format (payload type) completely if the offered VC-1 profile is
1206	     not supported.

1208	   o The "level" parameter specifies the highest level of the VC-1
1209	     profile supported by the codec.

1211	     The answerer MUST NOT specify a numerically higher level in the
1212	     answer than what was specified in the offer. The answerer MAY
1213	     specify a level that is lower than what was specified in the
1214	     offer, i.e., the level parameter can be "downgraded".

1216	     If the offer specifies the sendrecv or sendonly direction
1217	     attribute, and the answer downgrades the level parameter, this may
1218	     require a new offer to specify an updated "config" parameter.  If
1219	     the "config" parameter cannot be used with the level specified in
1220	     the answer, then the offerer MUST initiate another Offer/Answer
1221	     round, or not use the media format (payload type).

1223	   o The parameters "config", "bpic", "width", "height", "framerate",
1224	     "bitrate", "buffer" and "mode" describe the properties of the
1225	     VC-1 bit stream that the offerer or answerer is sending for this
1226	     media format configuration.

1228	     In the case of unicast usage and when the direction attribute in
1229	     the offer or answer is recvonly, the interpretation of these
1230	     parameters is undefined and they MUST NOT be used.

1232	   o The parameters "config", "width", "height", "bitrate" and "buffer"
1233	     MUST be specified when the direction attribute is sendrecv or
1234	     sendonly.

1236	   o The parameters "max-width", "max-height", "max-framerate",
1237	     "max-bitrate" and "max-buffer" MAY be specified in an offer or an
1238	     answer, and their interpretation is as follows:

1240	     When the direction attribute is sendonly, the parameters describe
1241	     the limits of the VC-1 bit stream that the sender is capable of
1242	     producing for the given profile and level, and for any lower level
1243	     of the same profile.

1245	     When the direction attribute is recvonly or sendrecv, the
1246	     parameters describe properties of the receiver implementation.  If
1247	     the value of a property is less than what is allowed by the level
1248	     of the VC-1 profile, then it SHALL be interpreted as a preferred
1249	     value and the sender's VC-1 bit stream SHOULD NOT exceed it.  If
1250	     the value of a property is greater than what is allowed by the
1251	     level of the VC-1 profile, then it SHALL be interpreted as the
1252	     upper limit of the value that the receiver accepts for the given
1253	     profile and level, and for any lower level of the same profile.

1255	     For example, if a recvonly or sendrecv offer specifies
1256	     "profile=0;level=1;max-bitrate=48000", then 48 kbps is merely a
1257	     suggested bit rate, because all receiver implementations of Simple
1258	     profile, Low level, are required to support bit rates of up to 96
1259	     kbps.  Assuming that the offer is accepted, the answerer should
1260	     specify "bitrate=48000" in the answer, but any value up to 96000
1261	     is allowed.  But if the offer specifies "max-bitrate=200000", this
1262	     means that the receiver implementation supports a maximum of 200
1263	     kbps for the given profile and level (or a lower level).  In this
1264	     case, the answerer is allowed to answer with a bitrate parameter
1265	     of up to 200000.

1267	   o If an offerer wishes to have non-symmetrical capabilities between
1268	     sending and receiving, e.g., use different levels in each
1269	     direction, then the offerer has to offer different RTP sessions.
1270	     This can be done by specifying different media lines declared as
1271	     "recvonly" and "sendonly", respectively.
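
   The unicast rules for "profile", "level" and "max-bitrate" can be
   sketched as follows.  This is a minimal illustration, not a complete
   Offer/Answer implementation.  The function names are hypothetical,
   and the caller is assumed to supply the bit rate limit of the
   profile and level, since the only figure quoted above is 96 kbps
   for Simple profile, Low level.  The case where "max-bitrate" equals
   the level limit is not discussed above; treating it as the level
   default is an assumption of this sketch.

```python
# Hypothetical helper names; only the rules themselves come from the text.

def check_unicast_answer(offer, answer):
    """Return rule violations for an answer that keeps the VC-1 payload type."""
    problems = []
    if answer["profile"] != offer["profile"]:
        problems.append("profile must be kept unchanged or the format removed")
    if int(answer["level"]) > int(offer["level"]):
        problems.append("level must not be higher than in the offer")
    return problems


def interpret_max_bitrate(max_bitrate, level_limit):
    """Classify a receiver's max-bitrate against its profile and level limit.

    Returns (interpretation, ceiling), where ceiling is the highest
    "bitrate" value the sender is allowed to use.
    """
    if max_bitrate < level_limit:
        # Only a preference: the sender SHOULD NOT exceed max_bitrate,
        # but any value up to the level limit is still allowed.
        return ("preferred", level_limit)
    if max_bitrate > level_limit:
        # The receiver accepts more than the level requires.
        return ("upper-limit", max_bitrate)
    # Assumption of this sketch: equal values add no information.
    return ("level-default", level_limit)
```

   With the values from the example above, interpret_max_bitrate(48000,
   96000) yields a preferred value with a ceiling of 96000, and
   interpret_max_bitrate(200000, 96000) yields an upper limit of
   200000.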

1273	   For streams being delivered over multicast, the following rules apply
1274	   in addition:

1276	   o The "level" parameter specifies the highest level of the VC-1
1277	     profile used by the participants in the multicast session.  The
1278	     value of this parameter MUST NOT be changed by the answerer.
1279	     Thus, a payload type can either be accepted unaltered or removed.

1281	   o The parameters "config", "bpic", "width", "height", "framerate",
1282	     "bitrate", "buffer" and "mode" specify properties of the VC-1 bit
1283	     stream that will be sent, and/or received, on the multicast
1284	     session.  The parameters MAY be specified even if the direction
1285	     attribute is recvonly.

1287	     The values of these parameters MUST NOT be changed by the
1288	     answerer.  Thus, a payload type can either be accepted unaltered
1289	     or removed.

1291	   o The values of the parameters "max-width", "max-height",
1292	     "max-framerate", "max-bitrate" and "max-buffer" MUST be supported
1293	     by the answerer for all streams declared as sendrecv or recvonly.
1294	     Otherwise, one of the following actions MUST be performed: the
1295	     media format is removed, or the session is rejected.
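
   For multicast, an answerer that keeps the payload type has to leave
   the stream parameters untouched.  A small sketch of that check is
   shown below.  The names FIXED_IN_MULTICAST and
   multicast_answer_changes are hypothetical, and the check covers only
   the multicast-specific parameters listed above.

```python
# Parameters whose values the answerer MUST NOT change on a multicast session.
FIXED_IN_MULTICAST = (
    "level", "config", "bpic", "width", "height",
    "framerate", "bitrate", "buffer", "mode",
)


def multicast_answer_changes(offer, answer):
    """Return the names of parameters the answer altered, added or dropped."""
    return [
        name for name in FIXED_IN_MULTICAST
        if offer.get(name) != answer.get(name)
    ]
```

   An empty result means the payload type was accepted unaltered.  A
   non-empty result indicates an answer that should instead have
   removed the payload type.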

1297	6.4 Usage in Declarative Session Descriptions

1299	   When VC-1 is offered over RTP using SDP in a declarative style, as in
1300	   RTSP [12] or SAP [13], the following rules and limitations apply.

1302	   o The parameters "profile" and "level" indicate only the properties
1303	     of the coded bit stream.  They do not imply a limit on capabilities
1304	     supported by the sender.

1306	   o The parameters "config", "width", "height", "bitrate" and "buffer"
1307	     MUST be specified.

1309	   o The parameters "max-width", "max-height", "max-framerate",
1310	     "max-bitrate" and "max-buffer" MUST NOT be used.

1312	   An example of media representation in SDP is as follows (Simple
1313	   profile, Medium level):

1315	   m=video 49170 RTP/AVP 98
1316	   a=rtpmap:98 vc1/90000
1317	   a=fmtp:98 profile=0;level=2;width=352;height=288;framerate=15000;
1318	   bitrate=384000;buffer=2000;config=4e291800
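
   The declarative rules can be checked mechanically.  The sketch below
   is illustrative only: check_declarative is a hypothetical name, and
   the "profile" and "level" entries come from the REQUIRED parameters
   of Section 6.2 rather than from the list in this section.

```python
DECLARATIVE_REQUIRED = (
    "profile", "level", "config", "width", "height", "bitrate", "buffer",
)
DECLARATIVE_FORBIDDEN = (
    "max-width", "max-height", "max-framerate", "max-bitrate", "max-buffer",
)


def check_declarative(fmtp_value):
    """Return (missing, forbidden) parameter names for a declarative fmtp value."""
    params = {}
    for item in fmtp_value.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:
            params[name] = value
    missing = [name for name in DECLARATIVE_REQUIRED if name not in params]
    forbidden = [name for name in DECLARATIVE_FORBIDDEN if name in params]
    return missing, forbidden


# The parameter list from the SDP example above, joined onto one line.
example = ("profile=0;level=2;width=352;height=288;framerate=15000;"
           "bitrate=384000;buffer=2000;config=4e291800")
print(check_declarative(example))
```

   Both lists come back empty for the example above, because it
   carries every mandatory parameter and none of the "max-" parameters.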

1320	7. Security Considerations

1322	   RTP packets using the payload format defined in this specification
1323	   are subject to the security considerations discussed in the RTP
1324	   specification [3], and in any appropriate RTP profile.  This implies
1325	   that confidentiality of the media streams is achieved by encryption;
1326	   for example, through the application of SRTP [11].

1328	   A potential denial-of-service threat exists for data encodings using
1329	   compression techniques that have non-uniform receiver-end
1330	   computational load.  The attacker can inject pathological RTP packets
1331	   into the stream that are complex to decode and that cause the
1332	   receiver to be overloaded.  VC-1 is particularly vulnerable to such
1333	   attacks, because it is possible for an attacker to generate RTP
1334	   packets containing frames that affect the decoding process of many
1335	   future frames.  Therefore, the usage of data origin authentication
1336	   and data integrity protection of at least the RTP packet is
1337	   RECOMMENDED; for example, with SRTP [11].

1339	   Note that the appropriate mechanism to ensure confidentiality and
1340	   integrity of RTP packets and their payloads is very dependent on the
1341	   application and on the transport and signaling protocols employed.
1342	   Thus, although SRTP is given as an example above, other possible
1343	   choices exist.

1345	   VC-1 bit streams can carry user-data, such as closed captioning
1346	   information and content meta-data.  The VC-1 specification does not
1347	   define how to interpret user-data.  Identifiers for user-data are
1348	   required to be registered with SMPTE.  It is conceivable for types of
1349	   user-data to be defined to include programmatic content, such as
1350	   scripts or commands that would be executed by the receiver.
1351	   Depending on the type of user-data, it might be possible for a sender
1352	   to generate user-data in a non-compliant manner to crash the receiver
1353	   or make it temporarily unavailable.  Senders that transport VC-1 bit
1354	   streams SHOULD ensure that the user-data is compliant with the
1355	   specification registered with SMPTE (see Annex F of [1]).  Receivers
1356	   SHOULD prevent malfunction in case of non-compliant user-data.

1358	8. IANA Considerations

1360	   IANA is requested to register the media type "video/vc1" and the
1361	   associated RTP payload format, as specified in Section 6.1 of this
1362	   document, in the Media Types registry and in the RTP Payload Format
1363	   MIME types registry.

1365	9. References

1367	9.1 Normative references

1369	   [1] Society of Motion Picture and Television Engineers, "VC-1
1370	       Compressed Video Bitstream Format and Decoding Process", SMPTE
1371	       421M.
1372	   [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
1373	       Levels", BCP 14, RFC 2119, March 1997.
1374	   [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
1375	       "RTP: A Transport Protocol for Real-Time Applications", STD 64,
1376	       RFC 3550, July 2003.
1377	   [4] Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
1378	       RFC 2327, April 1998.
1379	   [5] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
1380	       Session Description Protocol (SDP)", RFC 3264, June 2002.
1381	   [6] Josefsson, S., Ed., "The Base16, Base32, and Base64 Data
1382	       Encodings", RFC 3548, July 2003.
1383	   [7] Freed, N. and J. Klensin, "Media Type Specifications and
1384	       Registration Procedures", BCP 13, RFC 4288, December 2005.
1385	   [8] Casner, S. and P. Hoschka, "MIME Type Registration of RTP Payload
1386	       Formats", RFC 3555, July 2003.

1388	9.2 Informative references

1390	   [9] Srinivasan, S., Hsu, P., Holcomb, T., Mukerjee, K., Regunathan,
1391	       S.L., Lin, B., Liang, J., Lee, M., and J. Ribas-Corbera, "Windows
1392	       Media Video 9: overview and applications", Signal Processing:
1393	       Image Communication, Volume 19, Issue 9, October 2004.
1394	   [10]Ribas-Corbera, J., Chou, P.A., and S.L. Regunathan, "A
1395	       generalized hypothetical reference decoder for H.264/AVC", IEEE
1396	       Transactions on Circuits and Systems for Video Technology, August
1397	       2003.
1398	   [11]Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1399	       Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
1400	       3711, March 2004.
1401	   [12]Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
1402	       Protocol (RTSP)", RFC 2326, April 1998.
1403	   [13]Handley, M., Perkins, C., and E. Whelan, "Session Announcement
1404	       Protocol", RFC 2974, October 2000.

1406	Author's Addresses

1408	   Anders Klemets
1409	   Microsoft Corp.
1410	   1 Microsoft Way
1411	   Redmond, WA 98052
1412	   USA
1413	   Email: anderskl@microsoft.com

1415	Acknowledgements

1417	   Thanks to Shankar Regunathan, Gary Sullivan, Regis Crinon, Magnus
1418	   Westerlund and Colin Perkins for providing detailed feedback on this
1419	   document.

1421	IPR Notices

1423	   The IETF takes no position regarding the validity or scope of any
1424	   Intellectual Property Rights or other rights that might be claimed to
1425	   pertain to the implementation or use of the technology described in
1426	   this document or the extent to which any license under such rights
1427	   might or might not be available; nor does it represent that it has
1428	   made any independent effort to identify any such rights.  Information
1429	   on the procedures with respect to rights in RFC documents can be
1430	   found in BCP 78 and BCP 79.

1432	   Copies of IPR disclosures made to the IETF Secretariat and any
1433	   assurances of licenses to be made available, or the result of an
1434	   attempt made to obtain a general license or permission for the use of
1435	   such proprietary rights by implementers or users of this
1436	   specification can be obtained from the IETF on-line IPR repository at
1437	   http://www.ietf.org/ipr.

1439	   The IETF invites any interested party to bring to its attention any
1440	   copyrights, patents or patent applications, or other proprietary
1441	   rights that may cover technology that may be required to implement
1442	   this standard.  Please address the information to the IETF at
1443	   ietf-ipr@ietf.org.

1445	Full Copyright Statement

1447	   Copyright (C) The Internet Society (2005).

1449	   This document is subject to the rights, licenses and restrictions
1450	   contained in BCP 78, and except as set forth therein, the authors
1451	   retain all their rights.

1453	   This document and the information contained herein are provided on an
1454	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1455	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1456	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1457	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1458	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1459	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.