Internet Engineering Task Force
Internet Draft                                                A. Klemets
Document: draft-ietf-avt-rtp-vc1-04.txt                        Microsoft
Expires: June 2006                                         December 2005


              RTP Payload Format for Video Codec 1 (VC-1)


Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This memo specifies an RTP payload format for encapsulating Video
   Codec 1 (VC-1) compressed bit streams, as defined by the Society of
   Motion Picture and Television Engineers (SMPTE) standard, SMPTE 421M.
   SMPTE is the main standardizing body in the motion imaging industry
   and the SMPTE 421M standard defines a compressed video bit stream
   format and decoding process for television.

Table of Contents

   1. Introduction
      1.1 Conventions used in this document
   2. Definitions and abbreviations
   3. Overview of VC-1
      3.1 VC-1 bit stream layering model
      3.2 Bit-stream Data Units in Advanced profile
      3.3 Decoder initialization parameters
      3.4 Ordering of frames
   4. Encapsulation of VC-1 format bit streams in RTP
      4.1 Access Units
      4.2 Fragmentation of VC-1 frames
      4.3 Time stamp considerations
      4.4 Random Access Points
      4.5 Removal of HRD parameters
      4.6 Repeating the Sequence Layer header
      4.7 Signaling of media type parameters
      4.8 The "mode=1" media type parameter
      4.9 The "mode=3" media type parameter
   5. RTP Payload Format syntax
      5.1 RTP header usage
      5.2 AU header syntax
      5.3 AU Control field syntax
   6. RTP Payload format parameters
      6.1 Media type Registration
      6.2 Mapping of media type parameters to SDP
      6.3 Usage with the SDP Offer/Answer Model
      6.4 Usage in Declarative Session Descriptions
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1 Normative references
      9.2 Informative references

1. Introduction

   This memo specifies an RTP payload format for the video coding
   standard Video Codec 1, also known as VC-1. The specification for
   the VC-1 bit stream format and decoding process is published by the
   Society of Motion Picture and Television Engineers (SMPTE) as SMPTE
   421M [1].

   VC-1 has broad applicability: it is suitable for applications
   ranging from low bit rate Internet streaming to HDTV broadcast and
   Digital Cinema with nearly lossless coding. The overall performance
   of VC-1 is such that bit rate savings of more than 50% are reported
   [9] when compared against MPEG-2. See [9] for further details about
   how VC-1 compares against other codecs, such as MPEG-4 and
   H.264/AVC. (In [9], VC-1 is referred to by its earlier name, VC-9.)

   VC-1 is widely used for downloading and streaming of movies on the
   Internet, in the form of Windows Media Video 9 (WMV-9) [9], because
   the WMV-9 codec is compliant with the VC-1 standard. VC-1 has also
   recently been adopted as a mandatory compression format for the
   high-definition DVD formats HD DVD and Blu-ray.

   SMPTE 421M defines the VC-1 bit stream syntax and specifies
   constraints that must be met by VC-1 conformant bit streams. SMPTE
   421M also specifies the complete process required to decode the bit
   stream. However, it does not specify the VC-1 compression algorithm,
   thus allowing for different ways to implement a VC-1 encoder.

   The VC-1 bit stream syntax has three profiles. Each profile has
   specific bit stream syntax elements and algorithms associated with
   it. Depending on the application in which VC-1 is used, some
   profiles may be more suitable than others. For example, Simple
   profile is designed for low bit rate Internet streaming and for
   playback on devices that can only handle low complexity decoding.
   Advanced profile is designed for broadcast applications, such as
   digital TV, HD DVD or HDTV. Advanced profile is the only VC-1
   profile that supports interlaced video frames and non-square pixels.

   Section 2 defines the abbreviations used in this document. Section 3
   provides a more detailed overview of VC-1.
   Sections 4 and 5 define the RTP payload format for VC-1, and section
   6 defines the media type and SDP parameters for VC-1. See section 7
   for security considerations.

1.1 Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119 [2].

2. Definitions and abbreviations

   This document uses the definitions in SMPTE 421M [1]. For
   convenience, the following terms from SMPTE 421M are restated here:

   B-picture: A picture that is coded using motion compensated
   prediction from past and/or future reference fields or frames. A
   B-picture cannot be used for predicting any other picture.

   Bit-stream data unit (BDU): A unit of the compressed data which may
   be parsed (i.e., syntax decoded) independently of other information
   at the same hierarchical level. A BDU can be, for example, a
   sequence layer header, an entry-point header, a frame, or a slice.

   Encapsulated BDU (EBDU): A BDU which has been encapsulated using the
   encapsulation mechanism described in Annex E of SMPTE 421M [1], to
   prevent emulation of the start code prefix in the bit stream.

   Entry-point: A point in the bit stream that offers random access.

   frame: A frame contains lines of spatial information of a video
   signal. For progressive video, these lines contain samples starting
   from one time instant and continuing through successive lines to the
   bottom of the frame. For interlaced video, a frame consists of two
   fields, a top field and a bottom field. One of these fields will
   commence one field period later than the other.

   interlace: The property of frames where alternating lines of the
   frame represent different instances in time. In an interlaced frame,
   one of the fields is meant to be displayed first.

   I-picture: A picture coded using information only from itself.

   level: A defined set of constraints on the values which may be taken
   by the parameters (such as bit rate and buffer size) within a
   particular profile. A profile may contain one or more levels.

   P-picture: A picture that is coded using motion compensated
   prediction from past reference fields or frames.

   picture: For progressive video, a picture is identical to a frame,
   while for interlaced video, a picture may refer to a frame, or the
   top field or the bottom field of the frame depending on the context.

   profile: A defined subset of the syntax of VC-1, with a specific set
   of coding tools, algorithms, and syntax associated with it. There
   are three VC-1 profiles: Simple, Main and Advanced.

   progressive: The property of frames where all the samples of the
   frame represent the same instance in time.

   random access: A random access point in the bit stream is defined by
   the following guarantee: If decoding begins at this point, all frames
   needed for display after this point will have no decoding dependency
   on any data preceding this point, and are also present in the
   decoding sequence after this point. A random access point is also
   called an entry-point.

   sequence: A coded representation of a series of one or more pictures.
   In VC-1 Advanced profile, a sequence consists of a series of one or
   more entry-point segments, where each entry-point segment consists of
   a series of one or more pictures, and where the first picture in each
   entry-point segment provides random access. In VC-1 Simple and Main
   profiles, the first picture in each sequence is an I-picture.

   slice: A consecutive series of macroblock rows in a picture, which
   are encoded as a single unit.

   start codes (SC): 32-bit codes embedded in the coded bit stream that
   are unique, and identify the beginning of a BDU.
   Start codes consist of a unique three-byte Start Code Prefix (SCP),
   and a one-byte Start Code Suffix (SCS).

3. Overview of VC-1

   The VC-1 bit stream syntax consists of three profiles: Simple, Main,
   and Advanced. Simple and Main profiles are designed for relatively
   low bit rate applications. For example, the maximum bit rate
   supported by Simple profile is 384 kbps. Certain features that can
   be used to achieve high compression efficiency, such as non-square
   pixels and support for interlaced pictures, are only included in
   Advanced profile.

   The maximum bit rate supported by the Advanced profile is 135 Mbps,
   making it suitable for nearly lossless encoding of HDTV signals.
   Only Advanced profile supports carrying user-data (meta-data) in-band
   with the compressed bit stream. The user-data can be used for closed
   captioning support, for example.

   Of the three profiles, only Advanced profile allows codec
   configuration parameters, such as the picture aspect ratio, to be
   changed through in-band signaling in the compressed bit stream.

   For each of the profiles, a certain number of "levels" have been
   defined. Unlike a "profile", which implies a certain set of features
   or syntax elements, a "level" is a set of constraints on the values
   of parameters in a profile, such as the bit rate or buffer size.
   VC-1 Simple profile has two levels, Main profile has three, and
   Advanced profile has five levels. See Annex D of SMPTE 421M [1] for
   a detailed list of the profiles and levels.

3.1 VC-1 bit stream layering model

   The VC-1 bit stream is defined as a hierarchy of layers. This is
   conceptually similar to the notion of a protocol stack of networking
   protocols. The outermost layer is called the sequence layer. The
   other layers are entry-point, picture, slice, macroblock and block.

   In Simple and Main profiles, a sequence in the sequence layer
   consists of a series of one or more coded pictures. In Advanced
   profile, a sequence consists of one or more entry-point segments,
   where each entry-point segment consists of a series of one or more
   pictures, and where the first picture in each entry-point segment
   provides random access. A picture is decomposed into macroblocks. A
   slice comprises one or more contiguous rows of macroblocks.

   The entry-point and slice layers are only present in Advanced
   profile. In Advanced profile, the start of each entry-point layer
   segment indicates a random access point. In Simple and Main profiles
   each I-picture is a random access point.

   Each picture can be coded as an I-picture, P-picture, skipped
   picture, or as a B-picture. These terms are defined in section 2 of
   this document and in section 4.12 of SMPTE 421M [1].

3.2 Bit-stream Data Units in Advanced profile

   In Advanced profile only, each picture and slice is byte-aligned and
   is considered a Bit-stream Data Unit (BDU). A BDU is defined as a
   unit that can be parsed (i.e., syntax decoded) independently of other
   information in the same layer.

   The beginning of a BDU is signaled by an identifier called Start Code
   (SC). Sequence layer headers and entry-point headers are also BDUs
   and thus can be easily identified by their Start Codes. See Annex E
   of SMPTE 421M [1] for a complete list of Start Codes. Note that
   blocks and macroblocks are not BDUs and thus do not have a Start Code
   and are not necessarily byte-aligned.

   The Start Code consists of four bytes. The first three bytes are
   0x00, 0x00 and 0x01. The fourth byte is called the Start Code Suffix
   (SCS) and it is used to indicate the type of BDU that follows the
   Start Code. For example, the SCS of a sequence layer header (0x0F)
   is different from the SCS of an entry-point header (0x0E).
   The Start Code is always byte-aligned and is transmitted in network
   byte order.

   To prevent accidental emulation of the Start Code in the coded bit
   stream, SMPTE 421M defines an encapsulation mechanism that uses byte
   stuffing. A BDU which has been encapsulated by this mechanism is
   referred to as an Encapsulated BDU, or EBDU.

3.3 Decoder initialization parameters

   In VC-1 Advanced profile, the sequence layer header contains
   parameters that are necessary to initialize the VC-1 decoder.

   A sequence layer header is not defined for VC-1 Simple and Main
   profiles. For these profiles, decoder initialization parameters MUST
   be conveyed out-of-band from the coded bit stream. Section 4.7
   specifies how the parameters are conveyed by this RTP payload format.

   For Advanced profile, the parameters in the sequence layer header
   apply to all entry-point segments until the next occurrence of a
   sequence layer header in the coded bit stream.

   The parameters in the sequence layer header include the Advanced
   profile level, the dimensions of the coded pictures, the aspect
   ratio, interlace information, the frame rate and up to 31 leaky
   bucket parameter sets for the Hypothetical Reference Decoder (HRD).

   Section 6.1 of SMPTE 421M [1] provides the formal specification of
   the sequence layer header.

   Each leaky bucket parameter set for the HRD specifies a peak
   transmission bit rate and a decoder buffer capacity. The coded bit
   stream is restricted by these parameters. The HRD model does not
   mandate buffering by the decoder. Its purpose is to limit the
   encoder's bit rate fluctuations according to a basic buffering model,
   so that the resources necessary to decode the bit stream are
   predictable. The HRD has a constant-delay mode and a variable-delay
   mode.
   The constant-delay mode is appropriate for broadcast and streaming
   applications, while the variable-delay mode is designed for video
   conferencing applications.

   Annex C of SMPTE 421M [1] specifies the usage of the hypothetical
   reference decoder for VC-1 bit streams. A general description of the
   theory of the HRD can be found in [10].

   The concept of an entry-point layer applies only to VC-1 Advanced
   profile. The presence of an entry-point header indicates a random
   access point within the bit stream. The entry-point header specifies
   current buffer fullness values for the leaky buckets in the HRD. The
   header also specifies coding control parameters that are in effect
   until the occurrence of the next entry-point header in the bit
   stream. See Section 6.2 of SMPTE 421M [1] for the formal
   specification of the entry-point header.

3.4 Ordering of frames

   Frames are transmitted in the same order in which they are captured,
   except if B-pictures are present in the coded bit stream. In the
   latter case, the frames are transmitted such that the frames that the
   B-pictures depend on are transmitted first. This is referred to as
   the coded order of the frames.

   The rules for how a decoder converts frames from the coded order to
   the display order are stated in section 5.4 of SMPTE 421M [1]. In
   short, if B-pictures may be present in the coded bit stream, a
   hypothetical decoder implementation needs to buffer one additional
   decoded frame. When an I-frame or a P-frame is received, the frame
   can be decoded immediately but it is not displayed until the next
   I- or P-frame is received. However, B-frames are displayed
   immediately.

   Figure 1 illustrates the timing relationship between the capture of
   frames, their coded order, and the display order of the decoded
   frames, when B-pictures are present in the coded bit stream.
   The figure shows that the display of frame P4 is delayed until frame
   P7 is received, while frames B2 and B3 are displayed immediately.

   Capture:         |I0  P1  B2  B3  P4  B5  B6  P7  B8  B9  ...
                    |
   Coded order:     |        I0  P1  P4  B2  B3  P7  B5  B6  ...
                    |
   Display order:   |            I0  P1  B2  B3  P4  B5  B6  ...
                    |
                    |+---+---+---+---+---+---+---+---+---+--> time
                     0   1   2   3   4   5   6   7   8   9

   Figure 1. Frame reordering when B-pictures are present.

   If B-pictures are not present, the coded order and the display order
   are identical, and frames can then be displayed without the
   additional delay shown in Figure 1.

4. Encapsulation of VC-1 format bit streams in RTP

4.1 Access Units

   Each RTP packet contains an integral number of application data units
   (ADUs). For VC-1 format bit streams, an ADU is equivalent to one
   Access Unit (AU). An Access Unit is defined as the AU header
   (defined in section 5.2) followed by a variable length payload, with
   the rules and constraints described in sections 4.1 and 4.2. Figure
   2 shows the layout of an RTP packet with multiple AUs.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+
   | RTP     | AU(1) | AU(2) |     | AU(n) |
   | Header  |       |       |     |       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+

   Figure 2. RTP packet structure.

   Each Access Unit MUST start with the AU header defined in section
   5.2. The AU payload MUST contain data belonging to exactly one VC-1
   frame. This means that data from different VC-1 frames will always
   be in different AUs; however, it is possible for a single VC-1 frame
   to be fragmented across multiple AUs (see section 4.2).

   The following rules apply to the contents of each AU payload when
   VC-1 Advanced profile is used:

   - The AU payload MUST contain VC-1 bit stream data in EBDU format
     (i.e., the bit stream must use the byte-stuffing encapsulation
     mode defined in Annex E of SMPTE 421M [1]).

   - The AU payload MAY contain multiple EBDUs, e.g., a sequence layer
     header, an entry-point header, a picture header and multiple
     slices and the associated user-data. (However, all slices and
     their corresponding macroblocks MUST belong to the same video
     frame.)

   - The AU payload MUST start at an EBDU boundary, except when the AU
     payload contains a fragmented frame, in which case the rules in
     section 4.2 apply.

   When VC-1 Simple or Main profiles are used, the AU payload MUST start
   with a picture header, except when the AU payload contains a
   fragmented frame. Section 4.2 describes how to handle fragmented
   frames.

   Access Units MUST be byte-aligned. If the data in an AU (EBDUs in
   the case of Advanced profile, and a frame in the case of Simple and
   Main profiles) does not end at an octet boundary, up to 7 zero-valued
   padding bits MUST be added to achieve octet-alignment.

4.2 Fragmentation of VC-1 frames

   Each AU payload SHOULD contain a complete VC-1 frame. However, if
   this would cause the RTP packet to exceed the MTU size, the frame
   SHOULD be fragmented into multiple AUs to avoid IP-level
   fragmentation. When an AU contains a fragmented frame, this MUST be
   indicated by setting the FRAG field in the AU header as defined in
   section 5.3.

   AU payloads that do not contain a fragmented frame, or that contain
   the first fragment of a frame, MUST start at an EBDU boundary if
   Advanced profile is used. For Simple and Main profiles, such AU
   payloads MUST begin with the start of a picture header.

   If Advanced profile is used, AU payloads that contain a fragment of a
   frame other than the first fragment SHOULD start at an EBDU
   boundary, such as at the start of a slice.

   However, slices are only defined for Advanced profile, and are not
   always used. Blocks and macroblocks are not BDUs (have no Start
   Code) and are not byte-aligned.
   Therefore, it may not always be possible to continue a fragmented
   frame at an EBDU boundary. One can determine if an AU payload starts
   at an EBDU boundary by inspecting the first three bytes of the AU
   payload. The AU payload starts at an EBDU boundary if the first
   three bytes are identical to the Start Code Prefix (i.e., 0x00,
   0x00, 0x01).

   In the case of Simple and Main profiles, since the blocks and
   macroblocks are not byte-aligned, the fragmentation boundary may be
   chosen arbitrarily.

   If an RTP packet contains an AU with the last fragment of a frame,
   additional AUs SHOULD NOT be included in the RTP packet.

   If the PTS Delta field in the AU header is present, each fragment of
   a frame MUST have the same presentation time. If the DTS Delta field
   in the AU header is present, each fragment of a frame MUST have the
   same decode time.

4.3 Time stamp considerations

   Video frames MUST be transmitted in the coded order. Coded order
   implies that no frames are dependent on subsequent frames, as
   discussed in section 3.4. The RTP timestamp field MUST be set to the
   presentation time of the video frame contained in the first AU in the
   RTP packet. The presentation time can be used as the timestamp field
   in the RTP header because it differs from the sampling instant of the
   frame only by an arbitrary constant offset.

   If the video frame in an AU has a presentation time that differs from
   the RTP timestamp field, then the presentation time MUST be specified
   using the PTS Delta field in the AU header. Since the RTP timestamp
   field must be identical to the presentation time of the first video
   frame, this can only happen if an RTP packet contains multiple AUs.
   The syntax of the PTS Delta field is defined in section 5.2.

   The decode time of a VC-1 frame is always monotonically increasing
   when the video frames are transmitted in the coded order.
   If B-pictures will not be present in the coded bit stream, then the
   decode time of a frame SHALL be equal to the presentation time of the
   frame.

   If B-pictures may be present in the coded bit stream, then the decode
   times of frames are determined as follows:

   - Non-B frames: The decode time SHALL be equal to the presentation
     time of the previous non-B frame in the coded order.

   - B-frames: The decode time SHALL be equal to the presentation time
     of the B-frame.

   As an example, consider Figure 1 in section 3.4. The decode time of
   non-B frame P4 is 4 time units, which is equal to the presentation
   time of the previous non-B frame in the coded order, which is P1. On
   the other hand, the decode time of B-frame B2 is 5 time units, which
   is identical to its presentation time.

   If the decode time of a video frame differs from its presentation
   time, then the decode time MUST be specified using the DTS Delta
   field in the AU header. The syntax of the DTS Delta field is defined
   in section 5.2.

   Knowing if the stream will contain B-pictures may help the receiver
   allocate resources more efficiently and can reduce delay, as an
   absence of B-pictures in the stream implies that no reordering
   of frames will be needed between the decoding process and the display
   of the decoded frames. This may be important for interactive
   applications.

   The receiver SHALL assume that the coded bit stream may contain
   B-pictures in the following cases:

   - Advanced profile: If the value of the "bpic" media type parameter
     defined in section 6.1 is 1, or if the "bpic" parameter is not
     specified.

   - Main profile: If the MAXBFRAMES field in the STRUCT_C decoder
     initialization parameter has a non-zero value. STRUCT_C is
     conveyed in the "config" media type parameter, which is defined in
     section 6.1.

   Simple profile does not use B-pictures.
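
   The decode-time rules of this section can be sketched in Python as
   follows. This is only an illustration and not part of the payload
   format: the function name, the (name, is_b, pts) tuple layout, and
   the use of None for the first non-B frame (for which the text defines
   no previous non-B frame) are assumptions of this example. The
   presentation times are also assumed values, chosen to agree with the
   example above (P1 presented at 4 time units, B2 at 5).

```python
from typing import List, Optional, Tuple

# (name, is_b_frame, presentation_time); layout is hypothetical.
Frame = Tuple[str, bool, int]


def decode_times(coded_order: List[Frame],
                 may_contain_b: bool = True
                 ) -> List[Tuple[str, Optional[int]]]:
    """Apply the section 4.3 decode-time rules to frames in coded order."""
    result = []
    prev_non_b_pts = None  # presentation time of the previous non-B frame
    for name, is_b, pts in coded_order:
        if not may_contain_b or is_b:
            # No B-pictures in the stream, or a B-frame: decode time
            # equals the frame's own presentation time.
            dts = pts
        else:
            # Non-B frame: decode time equals the presentation time of
            # the previous non-B frame (None if there is none yet;
            # that case is an assumption of this sketch).
            dts = prev_non_b_pts
        if not is_b:
            prev_non_b_pts = pts
        result.append((name, dts))
    return result


# Frames in coded order, with assumed presentation times.
coded = [("I0", False, 3), ("P1", False, 4), ("P4", False, 7),
         ("B2", True, 5), ("B3", True, 6), ("P7", False, 10)]
times = dict(decode_times(coded))
```

   With these inputs, P4 gets decode time 4 and B2 gets decode time 5,
   as in the example. The non-B frames P1, P4 and P7 have a decode time
   that differs from their presentation time, so they would need the DTS
   Delta field.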

4.4 Random Access Points

   The entry-point header contains information that is needed by the
   decoder to decode the frames in that entry-point segment. This means
   that in the event of lost RTP packets the decoder may be unable to
   decode frames until the next entry-point header is received.

   The first frame after an entry-point header is a random access point
   into the coded bit stream. Simple and Main profiles do not have
   entry-point headers, so for those profiles each I-picture is a random
   access point.

   To allow the RTP receiver to detect that an RTP packet which was lost
   contained a random access point, this RTP payload format defines a
   field called "RA Count". This field is present in every AU, and its
   value is incremented (modulo 256) for every random access point. For
   additional details, see the definition of "RA Count" in section 5.2.

   To make it easy to determine if an AU contains a random access point,
   this RTP payload format also defines a bit called the "RA" flag in
   the AU Control field. This bit is set to 1 only on those AUs that
   contain a random access point. The RA bit is defined in section 5.3.

4.5 Removal of HRD parameters

   The sequence layer header of Advanced profile may include up to 31
   leaky bucket parameter sets for the Hypothetical Reference Decoder
   (HRD). Each leaky bucket parameter set specifies a possible peak
   transmission bit rate (HRD_RATE) and a decoder buffer capacity
   (HRD_BUFFER). (See section 3.3 for additional discussion about the
   HRD.)

   If the actual peak transmission rate is known by the RTP sender, the
   RTP sender MAY remove all leaky bucket parameter sets except for the
   one corresponding to the actual peak transmission rate.

   For each leaky bucket parameter set in the sequence layer header,
   there is also a parameter in the entry-point header that specifies
   the initial fullness (HRD_FULL) of the leaky bucket.
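
   As an illustration of the pruning that this section allows, the
   sketch below keeps only the leaky bucket parameter set that matches
   the actual peak transmission rate, and drops the HRD_FULL values at
   the same positions so that both headers stay consistent. It operates
   on already-parsed values, not on the bit stream itself. The function
   name, the list representation, and the exact-match rule used to pick
   the set are assumptions of this example.

```python
from typing import List, Tuple


def prune_leaky_buckets(buckets: List[Tuple[int, int]],
                        hrd_full: List[int],
                        actual_peak_rate: int
                        ) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Keep the (HRD_RATE, HRD_BUFFER) set matching the peak rate.

    buckets[i] comes from the sequence layer header and hrd_full[i]
    is the matching HRD_FULL value from the entry-point header.
    """
    if len(buckets) != len(hrd_full):
        raise ValueError("one HRD_FULL value is needed per parameter set")
    for i, (rate, _buffer) in enumerate(buckets):
        if rate == actual_peak_rate:
            # Remove every other set together with its HRD_FULL value.
            return [buckets[i]], [hrd_full[i]]
    # No set matches the peak rate: leave both lists unchanged.
    return list(buckets), list(hrd_full)


kept, full = prune_leaky_buckets(
    [(1000, 50), (2000, 80), (4000, 120)], [10, 20, 30], 2000)
```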

   If the RTP sender has removed any leaky bucket parameter sets from
   the sequence layer header, then for any removed leaky bucket
   parameter set, it MUST also remove the corresponding HRD_FULL
   parameter in the entry-point header.

   Removing leaky bucket parameter sets, as described above, may
   significantly reduce the size of the sequence layer headers and the
   entry-point headers.

4.6 Repeating the Sequence Layer header

   To improve robustness against loss of RTP packets, it is RECOMMENDED
   that the sequence layer header be repeated frequently in the bit
   stream if it changes. In this case, it is RECOMMENDED that the
   number of leaky bucket parameters in the sequence layer header and
   the entry-point headers be reduced to one, as described in section
   4.5. This will help reduce the overhead caused by repeating the
   sequence layer header.

   Note that any data in the VC-1 bit stream, including repeated copies
   of the sequence header itself, must be accounted for when computing
   the leaky bucket parameters for the HRD. (See section 3.3 for a
   discussion about the HRD.)

   Note that if the value of TFCNTRFLAG in the sequence layer header is
   1, each picture header contains a frame counter field (TFCNTR). Each
   time the sequence layer header is inserted in the bit stream, the
   value of this counter MUST be reset.

   To allow the RTP receiver to detect that an RTP packet which was lost
   contained a new sequence layer header, the AU Control field defines a
   bit called the "SL" flag. This bit is toggled when a sequence layer
   header is transmitted, but only if that header is different from the
   most recently transmitted sequence layer header. The SL bit is
   defined in section 5.3.
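
   The toggling rule for the SL flag can be sketched from the sender's
   point of view as shown below. The class and method names are
   hypothetical. The initial bit value of 0, and the choice not to
   toggle on the very first header (when there is no earlier header to
   compare against), are assumptions of this example rather than
   statements of the payload format.

```python
from typing import Optional


class SLFlagTracker:
    """Tracks the SL bit for a sending RTP session (illustrative)."""

    def __init__(self, initial_bit: int = 0) -> None:
        self.sl_bit = initial_bit
        self._last_header: Optional[bytes] = None

    def on_sequence_header(self, header: bytes) -> int:
        """Call when a sequence layer header is about to be sent.

        Toggles the SL bit only if the header differs from the most
        recently transmitted one, and returns the bit to place in
        the AU Control field.
        """
        if self._last_header is not None and header != self._last_header:
            self.sl_bit ^= 1
        self._last_header = header
        return self.sl_bit


tracker = SLFlagTracker()
bits = [tracker.on_sequence_header(h)
        for h in (b"hdr-A", b"hdr-A", b"hdr-B", b"hdr-B", b"hdr-A")]
```

   A receiver can compare the SL bit of consecutive AUs it receives: a
   change in value without a matching new sequence layer header suggests
   that the packet carrying that header was lost.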

4.7 Signaling of media type parameters

   When this RTP payload format is used with SDP, the decoder
   initialization parameters described in section 3.3 MUST be signaled
   in SDP using the media type parameters specified in section 6.1.
   Section 6.2 specifies how to map the media type parameters to SDP
   [4], section 6.3 defines rules specific to the SDP Offer/Answer
   model [5], and section 6.4 defines rules for when SDP is used in a
   declarative style.

   When Simple or Main profiles are used, it is not possible to change
   the decoder initialization parameters through the coded bit stream.
   Any changes to the decoder initialization parameters would have to be
   done through out-of-band means, e.g., by updating the SDP.

   When Advanced profile is used, the decoder initialization parameters
   MAY be changed by inserting a new sequence layer header or an
   entry-point header in the coded bit stream.

   Note that the sequence layer header specifies the VC-1 level, the
   maximum size of the coded pictures and optionally also the maximum
   frame rate. The media type parameters "level", "width", "height" and
   "framerate" specify upper limits for these parameters. Thus, the
   sequence layer header MAY specify values that are lower than the
   values of the media type parameters "level", "width", "height" or
   "framerate", but the sequence layer header MUST NOT exceed the values
   of any of these media type parameters.

4.8 The "mode=1" media type parameter

   In certain applications using Advanced profile, the sequence layer
   header never changes. This MAY be signaled with the media type
   parameter "mode=1". (The "mode" parameter is defined in section 6.1.)
   The "mode=1" parameter serves as a "hint" to the RTP receiver that
   all sequence layer headers in the bit stream will be identical.
   If "mode=1" is signaled and a sequence layer header is present in the
   coded bit stream, then it MUST be identical to the sequence layer
   header specified by the "config" media type parameter.

   Since the sequence layer header never changes in "mode=1", the RTP
   sender MAY remove it from the bit stream. Note, however, that if the
   value of TFCNTRFLAG in the sequence layer header is 1, each picture
   header contains a frame counter field (TFCNTR). This field is reset
   each time the sequence layer header occurs in the bit stream. If the
   RTP sender chooses to remove the sequence layer header, then it MUST
   ensure that the resulting bit stream is still compliant with the VC-1
   specification (e.g., by adjusting the TFCNTR field, if necessary).

4.9 The "mode=3" media type parameter

   In certain applications using Advanced profile, both the sequence
   layer header and the entry-point header never change. This MAY be
   signaled with the media type parameter "mode=3". The same rules
   apply to "mode=3" as for "mode=1", described in section 4.8.
   Additionally, if "mode=3" is signaled, then the RTP sender MAY
   "compress" the coded bit stream by not including sequence layer
   headers and entry-point headers in the RTP packets.

   The RTP receiver MUST "decompress" the coded bit stream by
   re-inserting the entry-point headers prior to delivering the coded
   bit stream to the VC-1 decoder. The sequence layer header does not
   need to be decompressed by the receiver, since it never changes.

   If "mode=3" is signaled and the RTP receiver receives a complete AU
   or the first fragment of an AU, and the RA bit is set to 1 but the AU
   does not begin with an entry-point header, then this indicates that
   the entry-point header has been "compressed". In that case, the RTP
   receiver MUST insert an entry-point header at the beginning of the
   AU.
   When inserting the entry-point header, the RTP receiver MUST use
   the one that was specified by the "config" media type parameter.

5. RTP Payload Format syntax

5.1 RTP header usage

   The format of the RTP header is specified in RFC 3550 [3] and is
   reprinted in Figure 3 for convenience.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3. RTP header according to RFC 3550

   The fields of the fixed RTP header have their usual meaning, which is
   defined in RFC 3550 and by the RTP profile in use, with the following
   additional notes:

   Marker bit (M): 1 bit
      This bit is set to 1 if the RTP packet contains an Access
      Unit containing a complete VC-1 frame, or the last fragment
      of a VC-1 frame.

   Payload type (PT): 7 bits
      This document does not assign an RTP payload type for this
      RTP payload format. The assignment of a payload type has to
      be performed either through the RTP profile used or in a
      dynamic way.

   Sequence Number: 16 bits
      The RTP receiver can use the sequence number field to recover
      the coded order of the VC-1 frames. (A typical VC-1 decoder
      will require the VC-1 frames to be delivered in coded order.)
      When VC-1 frames have been fragmented across RTP packets, the
      RTP receiver can use the sequence number field to ensure that
      no fragment is missing.

   Timestamp: 32 bits
      The RTP timestamp is set to the presentation time of the VC-1
      frame in the first Access Unit.
      A clock rate of 90 kHz MUST be used.

5.2 AU header syntax

   The Access Unit header consists of a one-byte AU Control field, the
   RA Count field and 3 optional fields. All fields MUST be written in
   network byte order. The structure of the AU header is illustrated in
   Figure 4.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |AU     | RA    |  AUP  | PTS   | DTS   |
   |Control| Count |  Len  | Delta | Delta |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4. Structure of AU header.

   AU Control: 8 bits
      The usage of the AU Control field is defined in section 5.3.

   RA Count: 8 bits
      Random Access Point Counter. This field is a binary modulo
      256 counter. The value of this field MUST be incremented by
      1 each time an AU is transmitted where the RA bit in the AU
      Control field is set to 1. The initial value of this field
      is undefined and MAY be chosen randomly.

   AUP Len: 16 bits
      Access Unit Payload Length. Specifies the size, in bytes, of
      the payload of the Access Unit. The field does not include
      the size of the AU header itself. The field MUST be included
      in each AU header in an RTP packet, except for the last AU
      header in the packet. If this field is not included, the
      payload of the Access Unit SHALL be assumed to extend to the
      end of the RTP payload.

   PTS Delta: 32 bits
      Presentation time delta. Specifies the presentation time of
      the frame as a 2's complement offset (delta) from the
      timestamp field in the RTP header of this RTP packet. The
      PTS Delta field MUST use the same clock rate as the timestamp
      field in the RTP header.
      This field SHOULD NOT be included in the first AU header in
      the RTP packet, because the RTP timestamp field specifies the
      presentation time of the frame in the first AU.
      If this field is not included, the presentation time of the
      frame SHALL be assumed to be specified by the timestamp field
      in the RTP header.

   DTS Delta: 32 bits
      Decode time delta. Specifies the decode time of the frame as
      a 2's complement offset (delta) between the presentation time
      and the decode time. Note that if the presentation time is
      larger than the decode time, this results in a value for the
      DTS Delta field that is greater than zero. The DTS Delta
      field MUST use the same clock rate as the timestamp field in
      the RTP header. If this field is not included, the decode
      time of the frame SHALL be assumed to be identical to the
      presentation time of the frame.

5.3 AU Control field syntax

   The structure of the 8-bit AU Control field is shown in Figure 5.

     0    1    2    3    4    5    6    7
   +----+----+----+----+----+----+----+----+
   |  FRAG   | RA | SL | LP | PT | DT | R  |
   +----+----+----+----+----+----+----+----+

   Figure 5. Syntax of AU Control field.

   FRAG: 2 bits
      Fragmentation Information. This field indicates if the AU
      payload contains a complete frame or a fragment of a frame.
      It MUST be set as follows:
      0: The AU payload contains a fragment of a frame other than
         the first or last fragment.
      1: The AU payload contains the first fragment of a frame.
      2: The AU payload contains the last fragment of a frame.
      3: The AU payload contains a complete frame (not fragmented.)

   RA: 1 bit
      Random Access Point indicator. This bit MUST be set to 1 if
      the AU contains a frame that is a random access point. In
      the case of Simple and Main profiles, any I-picture is a
      random access point.
      In the case of Advanced profile, the first frame after an
      entry-point header is a random access point.
      Note that if entry-point headers are not transmitted at every
      random access point, this MUST be indicated using the media
      type parameter "mode=3".

   SL: 1 bit
      Sequence Layer Counter. This bit MUST be toggled, i.e.,
      changed from 0 to 1 or from 1 to 0, if the AU contains a
      sequence layer header and if it is different from the most
      recently transmitted sequence layer header. Otherwise, the
      value of this bit must be identical to the value of the SL
      bit in the previous AU.
      The initial value of this bit is undefined and MAY be chosen
      randomly.
      The bit MUST be 0 for Simple and Main profile bit streams or
      if the sequence layer header never changes.

   LP: 1 bit
      Length Present. This bit MUST be set to 1 if the AU header
      includes the AUP Len field.

   PT: 1 bit
      PTS Delta Present. This bit MUST be set to 1 if the AU
      header includes the PTS Delta field.

   DT: 1 bit
      DTS Delta Present. This bit MUST be set to 1 if the AU
      header includes the DTS Delta field.

   R: 1 bit
      Reserved. This bit MUST be set to 0 and MUST be ignored by
      receivers.

6. RTP Payload format parameters

6.1 Media type Registration

   This registration uses the template defined in RFC 4288 [7] and
   follows RFC 3555 [8].

   Type name: video

   Subtype name: vc1

   Required parameters:

      profile:
         The value is an integer identifying the VC-1 profile. The
         following values are defined:
         0: Simple profile.
         1: Main profile.
         3: Advanced profile.

         If the profile parameter is used to indicate properties of a
         coded bit stream, it indicates the VC-1 profile that a
         decoder has to support when it decodes the bit stream.

         If the profile parameter is used for capability exchange or
         in a session setup procedure, it indicates the VC-1 profile
         that the codec supports.

      level:
         The value is an integer specifying the level of the VC-1
         profile.
         For Advanced profile, valid values are 0 to 4, which
         correspond to levels L0 to L4, respectively.
         For Simple and Main profiles, the following values are
         defined:
         1: Low Level
         2: Medium Level
         3: High Level (only valid for Main profile)

         If the level parameter is used to indicate properties of a
         coded bit stream, it indicates the highest level of the VC-1
         profile that a decoder has to support when it decodes the bit
         stream. Note that support for a level implies support for
         all numerically lower levels of the given profile.

         If the level parameter is used for capability exchange or in
         a session setup procedure, it indicates the highest level of
         the VC-1 profile that the codec supports. See section 6.3 of
         RFC XXXX for specific rules for how this parameter is used
         with the SDP Offer/Answer model.

   Optional parameters:

      config:
         The value is a base16 [6] (hexadecimal) representation of an
         octet string that expresses the decoder initialization
         parameters. Decoder initialization parameters are mapped
         onto the base16 octet string on an MSB-first basis. The
         first bit of the decoder initialization parameters MUST be
         located at the MSB of the first octet. If the decoder
         initialization parameters are not a multiple of 8 bits, up
         to 7 zero-valued padding bits MUST be added in the last
         octet to achieve octet alignment.

         For Simple and Main profiles, the decoder initialization
         parameters are STRUCT_C, as defined in Annex J of SMPTE 421M
         [1].

         For Advanced profile, the decoder initialization parameters
         are a sequence layer header directly followed by an
         entry-point header. The two headers MUST be in EBDU format,
         meaning that they must include their Start Codes and must use
         the encapsulation method defined in Annex E of SMPTE 421M
         [1].

      width:
         The value is an integer greater than zero, specifying the
         maximum horizontal size of the coded picture, in pixels.

         If this parameter is not specified, it defaults to the
         maximum horizontal size allowed by the specified profile and
         level.

      height:
         The value is an integer greater than zero, specifying the
         maximum vertical size of the coded picture in pixels.

         If this parameter is not specified, it defaults to the
         maximum vertical size allowed by the specified profile and
         level.

      bitrate:
         The value is an integer greater than zero, specifying the
         peak transmission rate of the coded bit stream in bits per
         second. The number does not include the overhead caused by
         RTP encapsulation, i.e., it does not include the AU headers,
         or any of the RTP, UDP or IP headers.

         If this parameter is not specified, it defaults to the
         maximum bit rate allowed by the specified profile and level.
         (See the values for "RMax" in Annex D of SMPTE 421M [1].)

      buffer:
         The value is an integer specifying the leaky bucket size, B,
         in milliseconds, required to contain a stream transmitted at
         the transmission rate specified by the bitrate parameter.
         This parameter is defined in the hypothetical reference
         decoder model for VC-1, in Annex C of SMPTE 421M [1].

         Note that this parameter relates to the codec bit stream
         only, and does not account for any buffering time that may be
         required to compensate for jitter in the network.

         If this parameter is not specified, it defaults to the
         maximum buffer size allowed by the specified profile and
         level. (See the values for "BMax" and "RMax" in Annex D of
         SMPTE 421M [1].)

      framerate:
         The value is an integer greater than zero, specifying the
         maximum number of frames per second in the coded bit stream,
         multiplied by 1000 and rounded to the nearest integer value.
         For example, 30000/1001 (approximately 29.97) frames per
         second is represented as 29970.

         If the parameter is not specified, it defaults to the maximum
         frame rate allowed by the specified profile and level.

      bpic:
         This parameter signals if B-pictures may be present when
         Advanced profile is used. If this parameter is present, and
         B-pictures may be present in the coded bit stream, this
         parameter MUST be equal to 1.
         A value of 0 indicates that B-pictures SHALL NOT be present
         in the coded bit stream, even if the sequence layer header
         changes. It is RECOMMENDED to include this parameter, with a
         value of 0, if no B-pictures will be included in the coded
         bit stream.

         This parameter MUST NOT be used with Simple and Main
         profiles. (For Main profile, the presence of B-pictures is
         indicated by the MAXBFRAMES field in the STRUCT_C decoder
         initialization parameters.)

         For Advanced profile, if this parameter is not specified, a
         value of 1 SHALL be assumed.

      mode:
         The value is an integer specifying the use of the sequence
         layer header and the entry-point header. This parameter is
         only defined for Advanced profile. The following values are
         defined:
         0: Both the sequence layer header and the entry-point header
            may change, and changed headers will be included in the
            RTP packets.
         1: The sequence layer header specified in the config
            parameter never changes. The rules in section 4.8 of RFC
            XXXX MUST be followed.
         3: The sequence layer header and the entry-point header
            specified in the config parameter never change. The rules
            in section 4.9 of RFC XXXX MUST be followed.

         If the mode parameter is not specified, a value of 0 SHALL be
         assumed. The mode parameter SHOULD be specified if modes 1
         or 3 apply to the VC-1 bit stream.

      max-width, max-height, max-bitrate, max-buffer, max-framerate:
         These parameters are defined for use in a capability exchange
         procedure.
         The parameters do not signal properties of the coded bit
         stream, but rather upper limits or preferred values for the
         "width", "height", "bitrate", "buffer" and "framerate"
         parameters. Section 6.3 of RFC XXXX provides specific rules
         for how these parameters are used with the SDP Offer/Answer
         model.

         Receivers that signal support for a given profile and level
         MUST support the maximum values for these parameters for that
         profile and level. For example, a receiver that indicates
         support for Main profile, Low level, must support a width of
         352 pixels and height of 288 pixels, even if this requires
         scaling the image to fit the resolution of a smaller display
         device.

         A receiver MAY use any of the max-width, max-height,
         max-bitrate, max-buffer and max-framerate parameters to
         indicate preferred capabilities. For example, a receiver may
         choose to specify values for max-width and max-height that
         match the resolution of its display device, since a bit
         stream encoded using those parameters would not need to be
         rescaled.

         If any of the max-width, max-height, max-bitrate, max-buffer
         and max-framerate parameters signal a capability that is less
         than the required capabilities of the signaled profile and
         level, then the parameter SHALL be interpreted as a preferred
         value for that capability.

         Any of the parameters MAY also be used to signal capabilities
         that exceed the required capabilities of the signaled profile
         and level. In that case, the parameter SHALL be interpreted
         as the maximum value that can be supported for that
         capability.

         When more than one parameter from the set (max-width,
         max-height, max-bitrate, max-buffer and max-framerate) is
         present, all signaled capabilities MUST be supported
         simultaneously.

         A sender or receiver MUST NOT use these parameters to signal
         capabilities that meet the requirements of a higher level of
         the VC-1 profile than the one specified in the "level"
         parameter, if the sender or receiver can support all the
         properties of the higher level, except if specifying a higher
         level is not allowed due to other restrictions. (As an
         example of such a restriction, in the SDP Offer/Answer model,
         the value of the level parameter that can be used in an
         Answer is limited by what was specified in the Offer.)

      max-width:
         The value is an integer greater than zero, specifying a
         horizontal size for the coded picture, in pixels. If the
         value is less than the maximum horizontal size allowed by the
         profile and level, then the value specifies the preferred
         horizontal size. Otherwise, it specifies the maximum
         horizontal size that is supported.

         If this parameter is not specified, it defaults to the
         maximum horizontal size allowed by the specified profile and
         level.

      max-height:
         The value is an integer greater than zero, specifying a
         vertical size for the coded picture, in pixels. If the value
         is less than the maximum vertical size allowed by the profile
         and level, then the value specifies the preferred vertical
         size. Otherwise, it specifies the maximum vertical size that
         is supported.

         If this parameter is not specified, it defaults to the
         maximum vertical size allowed by the specified profile and
         level.

      max-bitrate:
         The value is an integer greater than zero, specifying a peak
         transmission rate for the coded bit stream in bits per
         second. The number does not include the overhead caused by
         RTP encapsulation, i.e., it does not include the AU headers,
         or any of the RTP, UDP or IP headers.

         If the value is less than the maximum bit rate allowed by the
         profile and level, then the value specifies the preferred bit
         rate. Otherwise, it specifies the maximum bit rate that is
         supported.

         If this parameter is not specified, it defaults to the
         maximum bit rate allowed by the specified profile and level.
         (See the values for "RMax" in Annex D of SMPTE 421M [1].)

      max-buffer:
         The value is an integer specifying a leaky bucket size, B, in
         milliseconds, required to contain a stream transmitted at the
         transmission rate specified by the max-bitrate parameter.
         This parameter is defined in the hypothetical reference
         decoder model for VC-1, in Annex C of SMPTE 421M [1].

         Note that this parameter relates to the codec bit stream
         only, and does not account for any buffering time that may be
         required to compensate for jitter in the network.

         If the value is less than the maximum leaky bucket size
         allowed by the max-bitrate parameter and the profile and
         level, then the value specifies the preferred leaky bucket
         size. Otherwise, it specifies the maximum leaky bucket size
         that is supported for the bit rate specified by the
         max-bitrate parameter.

         If this parameter is not specified, it defaults to the
         maximum buffer size allowed by the specified profile and
         level. (See the values for "BMax" and "RMax" in Annex D of
         SMPTE 421M [1].)

      max-framerate:
         The value is an integer greater than zero, specifying a
         number of frames per second for the coded bit stream. The
         value is the frame rate multiplied by 1000 and rounded to the
         nearest integer value. For example, 30000/1001
         (approximately 29.97) frames per second is represented as
         29970.

         If the value is less than the maximum frame rate allowed by
         the profile and level, then the value specifies the preferred
         frame rate.
         Otherwise, it specifies the maximum frame rate that is
         supported.

         If the parameter is not specified, it defaults to the maximum
         frame rate allowed by the specified profile and level.

   Encoding considerations:
      This media type is framed and contains binary data.

   Security considerations:
      See Section 7 of RFC XXXX.

   Interoperability considerations:
      None.

   Published specification:
      RFC XXXX.

   Applications which use this media type:
      Multimedia streaming and conferencing tools.

   Additional Information:
      None.

   Person & email address to contact for further information:
      Anders Klemets
      IETF AVT working group.

   Intended Usage:
      COMMON

   Restrictions on usage:
      This media type depends on RTP framing, and hence is only
      defined for transfer via RTP [3].

   Authors:
      Anders Klemets

   Change controller:
      IETF Audio/Video Transport Working Group delegated from the
      IESG.

6.2 Mapping of media type parameters to SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [4]. If SDP is used to specify sessions using this payload format,
   the mapping is done as follows:

   o  The media name in the "m=" line of SDP MUST be video (the type
      name).

   o  The encoding name in the "a=rtpmap" line of SDP MUST be vc1 (the
      subtype name).

   o  The clock rate in the "a=rtpmap" line MUST be 90000.

   o  The REQUIRED parameters "profile" and "level" MUST be included in
      the "a=fmtp" line of SDP.
      These parameters are expressed in the form of a semicolon
      separated list of parameter=value pairs.

   o  The OPTIONAL parameters "config", "width", "height", "bitrate",
      "buffer", "framerate", "bpic", "mode", "max-width", "max-height",
      "max-bitrate", "max-buffer" and "max-framerate", when present,
      MUST be included in the "a=fmtp" line of SDP.
      These parameters are expressed in the form of a semicolon
      separated list of parameter=value pairs:

         a=fmtp:<dynamic payload type> <parameter
         name>=<value>[,<value>][; <parameter name>=<value>]

   o  Any parameters unknown to the device that uses the SDP MUST be
      ignored. For example, parameters defined in later specifications
      MAY be copied into the SDP and MUST be ignored by receivers that
      do not understand them.

6.3 Usage with the SDP Offer/Answer Model

   When VC-1 is offered over RTP using SDP in an Offer/Answer model [5]
   for negotiation for unicast usage, the following rules and
   limitations apply:

   o  The "profile" parameter MUST be used symmetrically, i.e., the
      answerer MUST either maintain the parameter or remove the media
      format (payload type) completely if the offered VC-1 profile is
      not supported.

   o  The "level" parameter specifies the highest level of the VC-1
      profile supported by the codec.

      The answerer MUST NOT specify a numerically higher level in the
      answer than what was specified in the offer. The answerer MAY
      specify a level that is lower than what was specified in the
      offer, i.e., the level parameter can be "downgraded".

      If the offer specifies the sendrecv or sendonly direction
      attribute, and the answer downgrades the level parameter, this may
      require a new offer to specify an updated "config" parameter. If
      the "config" parameter cannot be used with the level specified in
      the answer, then the offerer MUST initiate another Offer/Answer
      round, or not use the media format (payload type).

   o  The parameters "config", "bpic", "width", "height", "framerate",
      "bitrate", "buffer" and "mode" describe the properties of the
      VC-1 bit stream that the offerer or answerer is sending for this
      media format configuration.

      In the case of unicast usage and when the direction attribute in
      the offer or answer is recvonly, the interpretation of these
      parameters is undefined and they MUST NOT be used.

   o  The parameters "config", "width", "height", "bitrate" and "buffer"
      MUST be specified when the direction attribute is sendrecv or
      sendonly.

   o  The parameters "max-width", "max-height", "max-framerate",
      "max-bitrate" and "max-buffer" MAY be specified in an offer or an
      answer, and their interpretation is as follows:

      When the direction attribute is sendonly, the parameters describe
      the limits of the VC-1 bit stream that the sender is capable of
      producing for the given profile and level, and for any lower level
      of the same profile.

      When the direction attribute is recvonly or sendrecv, the
      parameters describe properties of the receiver implementation. If
      the value of a property is less than what is allowed by the level
      of the VC-1 profile, then it SHALL be interpreted as a preferred
      value and the sender's VC-1 bit stream SHOULD NOT exceed it. If
      the value of a property is greater than what is allowed by the
      level of the VC-1 profile, then it SHALL be interpreted as the
      upper limit of the value that the receiver accepts for the given
      profile and level, and for any lower level of the same profile.

      For example, if a recvonly or sendrecv offer specifies
      "profile=0;level=1;max-bitrate=48000", then 48 kbps is merely a
      suggested bit rate, because all receiver implementations of Simple
      profile, Low level, are required to support bit rates of up to 96
      kbps. Assuming that the offer is accepted, the answerer should
      specify "bitrate=48000" in the answer, but any value up to 96000
      is allowed.
      But if the offer specifies "max-bitrate=200000", this means that
      the receiver implementation supports a maximum of 200 kbps for
      the given profile and level (or lower level). In this case, the
      answerer is allowed to answer with a bitrate parameter of up to
      200000.

   o  If an offerer wishes to have non-symmetrical capabilities between
      sending and receiving, e.g., use different levels in each
      direction, then the offerer has to offer different RTP sessions.
      This can be done by specifying different media lines declared as
      "recvonly" and "sendonly", respectively.

   For streams being delivered over multicast, the following rules apply
   in addition:

   o  The "level" parameter specifies the highest level of the VC-1
      profile used by the participants in the multicast session. The
      value of this parameter MUST NOT be changed by the answerer.
      Thus, a payload type can either be accepted unaltered or removed.

   o  The parameters "config", "bpic", "width", "height", "framerate",
      "bitrate", "buffer" and "mode" specify properties of the VC-1 bit
      stream that will be sent, and/or received, on the multicast
      session. The parameters MAY be specified even if the direction
      attribute is recvonly.

      The values of these parameters MUST NOT be changed by the
      answerer. Thus, a payload type can either be accepted unaltered
      or removed.

   o  The values of the parameters "max-width", "max-height",
      "max-framerate", "max-bitrate" and "max-buffer" MUST be supported
      by the answerer for all streams declared as sendrecv or recvonly.
      Otherwise, one of the following actions MUST be performed: the
      media format is removed, or the session rejected.

6.4 Usage in Declarative Session Descriptions

   When VC-1 is offered over RTP using SDP in a declarative style, as in
   RTSP [12] or SAP [13], the following rules and limitations apply.

   o  The parameters "profile" and "level" indicate only the properties
      of the coded bit stream. They do not imply a limit on the
      capabilities supported by the sender.

   o  The parameters "config", "width", "height", "bitrate" and "buffer"
      MUST be specified.

   o  The parameters "max-width", "max-height", "max-framerate",
      "max-bitrate" and "max-buffer" MUST NOT be used.

   An example of media representation in SDP is as follows (Simple
   profile, Medium level):

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 vc1/90000
   a=fmtp:98 profile=0;level=2;width=352;height=288;framerate=15000;
   bitrate=384000;buffer=2000;config=4e291800

7. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [3], and in any appropriate RTP profile. This implies
   that confidentiality of the media streams is achieved by encryption;
   for example, through the application of SRTP [11].

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load. The attacker can inject pathological RTP packets
   into the stream that are complex to decode and that cause the
   receiver to be overloaded. VC-1 is particularly vulnerable to such
   attacks, because it is possible for an attacker to generate RTP
   packets containing frames that affect the decoding process of many
   future frames. Therefore, the usage of data origin authentication
   and data integrity protection of at least the RTP packet is
   RECOMMENDED; for example, with SRTP [11].

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

   VC-1 bit streams can carry user-data, such as closed captioning
   information and content meta-data. The VC-1 specification does not
   define how to interpret user-data. Identifiers for user-data are
   required to be registered with SMPTE. It is conceivable for types of
   user-data to be defined to include programmatic content, such as
   scripts or commands that would be executed by the receiver.
   Depending on the type of user-data, it might be possible for a sender
   to generate user-data in a non-compliant manner to crash the receiver
   or make it temporarily unavailable. Senders that transport VC-1 bit
   streams SHOULD ensure that the user-data is compliant with the
   specification registered with SMPTE (see Annex F of [1]). Receivers
   SHOULD prevent malfunction in case of non-compliant user-data.

8. IANA Considerations

   IANA is requested to register the media type "video/vc1" and the
   associated RTP payload format, as specified in section 6.1 of this
   document, in the Media Types registry and in the RTP Payload Format
   MIME types registry.

9. References

9.1 Normative references

   [1] Society of Motion Picture and Television Engineers, "VC-1
       Compressed Video Bitstream Format and Decoding Process", SMPTE
       421M.
   [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
       Levels", BCP 14, RFC 2119, March 1997.
   [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
       "RTP: A Transport Protocol for Real-Time Applications", STD 64,
       RFC 3550, July 2003.
   [4] Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
       RFC 2327, April 1998.
   [5] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
       Session Description Protocol (SDP)", RFC 3264, June 2002.
   [6] Josefsson, S., Ed., "The Base16, Base32, and Base64 Data
       Encodings", RFC 3548, July 2003.
   [7] Freed, N. and J. Klensin, "Media Type Specifications and
       Registration Procedures", BCP 13, RFC 4288, December 2005.
   [8] Casner, S. and P. Hoschka, "MIME Type Registration of RTP Payload
       Formats", RFC 3555, July 2003.

9.2 Informative references

   [9] Srinivasan, S., Hsu, P., Holcomb, T., Mukerjee, K., Regunathan,
       S.L., Lin, B., Liang, J., Lee, M., and J. Ribas-Corbera, "Windows
       Media Video 9: overview and applications", Signal Processing:
       Image Communication, Volume 19, Issue 9, October 2004.
   [10] Ribas-Corbera, J., Chou, P.A., and S.L. Regunathan, "A
       generalized hypothetical reference decoder for H.264/AVC", IEEE
       Transactions on Circuits and Systems for Video Technology, August
       2003.
   [11] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
       Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
       3711, March 2004.
   [12] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
       Protocol (RTSP)", RFC 2326, April 1998.
   [13] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
       Protocol", RFC 2974, October 2000.

Author's Addresses

   Anders Klemets
   Microsoft Corp.
   1 Microsoft Way
   Redmond, WA 98052
   USA
   Email: anderskl@microsoft.com

Acknowledgements

   Thanks to Shankar Regunathan, Gary Sullivan, Regis Crinon, Magnus
   Westerlund and Colin Perkins for providing detailed feedback on this
   document.
IPR Notices

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.

Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.