   Internet Engineering Task Force
   Internet Draft                                            A. Klemets
   Document: draft-ietf-avt-rtp-vc1-02.txt                    Microsoft
   Expires: May 2006                                      November 2005

             RTP Payload Format for Video Codec 1 (VC-1)

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This memo specifies an RTP payload format for encapsulating Video
   Codec 1 (VC-1) compressed bit streams, as defined by the Society of
   Motion Picture and Television Engineers (SMPTE) standard, SMPTE 421M.
   SMPTE is the main standardizing body in the motion imaging industry
   and the SMPTE 421M standard defines a compressed video bit stream
   format and decoding process for television.

Table of Contents

   1. Introduction
      1.1 Conventions used in this document
   2. Definitions and abbreviations
   3. Overview of VC-1
      3.1 VC-1 bit stream layering model
      3.2 Bit-stream Data Units in Advanced profile
      3.3 Decoder initialization parameters
      3.4 Ordering of frames
   4. Encapsulation of VC-1 format bit streams in RTP
      4.1 Access Units
      4.2 Fragmentation of VC-1 frames
      4.3 Time stamp considerations
      4.4 Random Access Points
      4.5 Removal of HRD parameters
      4.6 Repeating the Sequence Layer header
      4.7 Signaling of MIME format parameters
      4.8 MIME "mode=1" parameter
      4.9 MIME "mode=3" parameter
   5. RTP Payload Format syntax
      5.1 RTP header usage
      5.2 AU header syntax
      5.3 AU Control field syntax
   6. RTP Payload format parameters
      6.1 Media Type Registration
      6.2 Mapping of MIME parameters to SDP
      6.3 Usage with the SDP Offer/Answer Model
      6.4 Usage in Declarative Session Descriptions
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1 Normative references
      9.2 Informative references

1. Introduction

   This memo specifies an RTP payload format for the video coding
   standard Video Codec 1, also known as VC-1. The specification for
   the VC-1 bit stream format and decoding process is published by the
   Society of Motion Picture and Television Engineers (SMPTE) as SMPTE
   421M [1].

   VC-1 has broad applicability, being suitable for applications
   ranging from low bit rate Internet streaming to HDTV broadcast and
   Digital Cinema applications with nearly lossless coding. The overall
   performance of VC-1 is such that bit rate savings of more than 50%
   are reported [8] when compared against MPEG-2. See [8] for further
   details about how VC-1 compares against other codecs, such as MPEG-4
   and H.264/AVC. (In [8], VC-1 is referred to by its earlier name,
   VC-9.)

   VC-1 is widely used for downloading and streaming of movies on the
   Internet, in the form of Windows Media Video 9 (WMV-9) [8], because
   the WMV-9 codec is compliant with the VC-1 standard. VC-1 has also
   recently been adopted as a mandatory compression format for the
   high-definition DVD formats HD DVD and Blu-ray.

   SMPTE 421M defines the VC-1 bit stream syntax and specifies
   constraints that must be met by VC-1 conformant bit streams. SMPTE
   421M also specifies the complete process required to decode the bit
   stream. However, it does not specify the VC-1 compression algorithm,
   thus allowing for different ways to implement a VC-1 encoder.

   The VC-1 bit stream syntax has three profiles. Each profile has
   specific bit stream syntax elements and algorithms associated with
   it. Depending on the application in which VC-1 is used, some
   profiles may be more suitable than others. For example, Simple
   profile is designed for low bit rate Internet streaming and for
   playback on devices that can only handle low complexity decoding.
   Advanced profile is designed for broadcast applications, such as
   digital TV, HD DVD or HDTV.
   Advanced profile is the only VC-1 profile that supports interlaced
   video frames and non-square pixels.

   Section 2 defines the abbreviations used in this document. Section 3
   provides a more detailed overview of VC-1. Sections 4 and 5 define
   the RTP payload format for VC-1, and section 6 defines the MIME and
   SDP parameters for VC-1. See section 7 for security considerations.

1.1 Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119 [2].

2. Definitions and abbreviations

   This document uses the definitions in SMPTE 421M [1]. For
   convenience, the following terms from SMPTE 421M are restated here:

   B-picture: A picture that is coded using motion compensated
   prediction from past and/or future reference fields or frames. A
   B-picture cannot be used for predicting any other picture.

   Bit-stream data unit (BDU): A unit of the compressed data which may
   be parsed (i.e., syntax decoded) independently of other information
   at the same hierarchical level. A BDU can be, for example, a
   sequence layer header, an entry-point header, a frame, or a slice.

   Encapsulated BDU (EBDU): A BDU which has been encapsulated using the
   encapsulation mechanism described in Annex E of SMPTE 421M [1], to
   prevent emulation of the start code prefix in the bit stream.

   Entry-point: A point in the bit stream that offers random access.

   frame: A frame contains lines of spatial information of a video
   signal. For progressive video, these lines contain samples starting
   from one time instant and continuing through successive lines to the
   bottom of the frame. For interlaced video, a frame consists of two
   fields, a top field and a bottom field. One of these fields will
   commence one field period later than the other.

   interlace: The property of frames where alternating lines of the
   frame represent different instances in time. In an interlaced frame,
   one of the fields is meant to be displayed first.

   I-picture: A picture coded using information only from itself.

   level: A defined set of constraints on the values which may be taken
   by the parameters (such as bit rate and buffer size) within a
   particular profile. A profile may contain one or more levels.

   P-picture: A picture that is coded using motion compensated
   prediction from past reference fields or frames.

   picture: For progressive video, a picture is identical to a frame,
   while for interlaced video, a picture may refer to a frame, or the
   top field or the bottom field of the frame depending on the context.

   profile: A defined subset of the syntax of VC-1, with a specific set
   of coding tools, algorithms, and syntax associated with it. There
   are three VC-1 profiles: Simple, Main and Advanced.

   progressive: The property of frames where all the samples of the
   frame represent the same instance in time.

   random access: A random access point in the bit stream is defined by
   the following guarantee: If decoding begins at this point, all frames
   needed for display after this point will have no decoding dependency
   on any data preceding this point, and are also present in the
   decoding sequence after this point. A random access point is also
   called an entry-point.

   sequence: A coded representation of a series of one or more pictures.
   In VC-1 Advanced profile, a sequence consists of a series of one or
   more entry-point segments, where each entry-point segment consists of
   a series of one or more pictures, and where the first picture in each
   entry-point segment provides random access. In VC-1 Simple and Main
   profiles, the first picture in each sequence is an I-picture.

   slice: A consecutive series of macroblock rows in a picture, which
   are encoded as a single unit.

   start codes (SC): 32-bit codes embedded in the coded bit stream that
   are unique, and identify the beginning of a BDU. Start codes consist
   of a unique three-byte Start Code Prefix (SCP), and a one-byte Start
   Code Suffix (SCS).

3. Overview of VC-1

   The VC-1 bit stream syntax consists of three profiles: Simple, Main,
   and Advanced. Simple and Main profiles are designed for relatively
   low bit rate applications. For example, the maximum bit rate
   supported by Simple profile is 384 kbps. Certain features that can
   be used to achieve high compression efficiency, such as non-square
   pixels and support for interlaced pictures, are only included in
   Advanced profile.

   The maximum bit rate supported by the Advanced profile is 135 Mbps,
   making it suitable for nearly lossless encoding of HDTV signals.
   Only Advanced profile supports carrying user-data (meta-data) in-band
   with the compressed bit stream. The user-data can be used for closed
   captioning support, for example.

   Of the three profiles, only Advanced profile allows codec
   configuration parameters, such as the picture aspect ratio, to be
   changed through in-band signaling in the compressed bit stream.

   For each of the profiles, a certain number of "levels" have been
   defined. Unlike a "profile", which implies a certain set of features
   or syntax elements, a "level" is a set of constraints on the values
   of parameters in a profile, such as the bit rate or buffer size.
   VC-1 Simple profile has two levels, Main profile has three, and
   Advanced profile has five levels. See Annex D of SMPTE 421M [1] for
   a detailed list of the profiles and levels.

3.1 VC-1 bit stream layering model

   The VC-1 bit stream is defined as a hierarchy of layers. This is
   conceptually similar to the notion of a protocol stack of networking
   protocols. The outermost layer is called the sequence layer. The
   other layers are entry-point, picture, slice, macroblock and block.

   In Simple and Main profiles, a sequence in the sequence layer
   consists of a series of one or more coded pictures. In Advanced
   profile, a sequence consists of one or more entry-point segments,
   where each entry-point segment consists of a series of one or more
   pictures, and where the first picture in each entry-point segment
   provides random access. A picture is decomposed into macroblocks. A
   slice comprises one or more contiguous rows of macroblocks.

   The entry-point and slice layers are only present in Advanced
   profile. In Advanced profile, the start of each entry-point layer
   segment indicates a random access point. In Simple and Main profiles
   each I-picture is a random access point.

   Each picture can be coded as an I-picture, P-picture, skipped
   picture, or as a B-picture. These terms are defined in section 2 of
   this document and in section 4.12 of SMPTE 421M [1].

3.2 Bit-stream Data Units in Advanced profile

   In Advanced profile only, each picture and slice is byte-aligned and
   is considered a Bit-stream Data Unit (BDU). A BDU is defined as a
   unit that can be parsed (i.e., syntax decoded) independently of other
   information in the same layer.

   The beginning of a BDU is signaled by an identifier called Start Code
   (SC). Sequence layer headers and entry-point headers are also BDUs
   and thus can be easily identified by their Start Codes. See Annex E
   of SMPTE 421M [1] for a complete list of Start Codes. Note that
   blocks and macroblocks are not BDUs and thus do not have a Start Code
   and are not necessarily byte-aligned.

   The Start Code consists of four bytes. The first three bytes are
   0x00, 0x00 and 0x01.
   The fourth byte is called the Start Code Suffix (SCS) and it is used
   to indicate the type of BDU that follows the Start Code. For
   example, the SCS of a sequence layer header (0x0F) is different from
   the SCS of an entry-point header (0x0E). The Start Code is always
   byte-aligned and is transmitted in network byte order.

   To prevent accidental emulation of the Start Code in the coded bit
   stream, SMPTE 421M defines an encapsulation mechanism that uses byte
   stuffing. A BDU which has been encapsulated by this mechanism is
   referred to as an Encapsulated BDU, or EBDU.

3.3 Decoder initialization parameters

   In VC-1 Advanced profile, the sequence layer header contains
   parameters that are necessary to initialize the VC-1 decoder.

   A sequence layer header is not defined for VC-1 Simple and Main
   profiles. For these profiles, decoder initialization parameters MUST
   be conveyed out-of-band from the coded bit stream. Section 4.7
   specifies how the parameters are conveyed by this RTP payload format.

   For Advanced profile, the parameters in the sequence layer header
   apply to all entry-point segments until the next occurrence of a
   sequence layer header in the coded bit stream.

   The parameters in the sequence layer header include the Advanced
   profile level, the dimensions of the coded pictures, the aspect
   ratio, interlace information, the frame rate and up to 31 leaky
   bucket parameter sets for the Hypothetical Reference Decoder (HRD).

   Section 6.1 of SMPTE 421M [1] provides the formal specification of
   the sequence layer header.

   Each leaky bucket parameter set for the HRD specifies a peak
   transmission bit rate and a decoder buffer capacity. The coded bit
   stream is restricted by these parameters. The HRD model does not
   mandate buffering by the decoder. Its purpose is to limit the
   encoder's bit rate fluctuations according to a basic buffering model,
   so that the resources necessary to decode the bit stream are
   predictable. The HRD has a constant-delay mode and a variable-delay
   mode. The constant-delay mode is appropriate for broadcast and
   streaming applications, while the variable-delay mode is designed for
   video conferencing applications.

   Annex C of SMPTE 421M [1] specifies the usage of the hypothetical
   reference decoder for VC-1 bit streams. A general description of the
   theory of the HRD can be found in [9].

   The concept of an entry-point layer applies only to VC-1 Advanced
   profile. The presence of an entry-point header indicates a random
   access point within the bit stream. The entry-point header specifies
   current buffer fullness values for the leaky buckets in the HRD. The
   header also specifies coding control parameters that are in effect
   until the occurrence of the next entry-point header in the bit
   stream. See Section 6.2 of SMPTE 421M [1] for the formal
   specification of the entry-point header.

3.4 Ordering of frames

   Frames are transmitted in the same order in which they are captured,
   except if B-pictures are present in the coded bit stream. In the
   latter case, the frames are transmitted such that the frames that the
   B-pictures depend on are transmitted first. This is referred to as
   the coded order of the frames.

   The rules for how a decoder converts frames from the coded order to
   the display order are stated in section 5.4 of SMPTE 421M [1]. In
   short, if B-pictures may be present in the coded bit stream, a
   hypothetical decoder implementation needs to buffer one additional
   decoded frame. When an I-frame or a P-frame is received, the frame
   can be decoded immediately but it is not displayed until the next I-
   or P-frame is received. However, B-frames are displayed immediately.
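
   As a rough illustration of the reordering rule summarized above, the
   following Python sketch converts a list of frames from coded order
   to display order. The function name and the convention of labelling
   frames with strings such as "I0" or "B2" are assumptions made for
   this example only; they are not taken from SMPTE 421M or from this
   payload format. Releasing the last held frame at the end of the
   input is also an assumption of the sketch.

```python
from typing import Iterable, Iterator, Optional


def display_order(coded_frames: Iterable[str]) -> Iterator[str]:
    """Yield frame labels in display order, given labels in coded order.

    A label starting with "B" is treated as a B-frame; any other label
    is treated as an I- or P-frame (hypothetical labelling convention).
    """
    held: Optional[str] = None  # the one extra decoded I/P frame being buffered
    for frame in coded_frames:
        if frame.startswith("B"):
            # B-frames are displayed immediately.
            yield frame
        else:
            # An I/P frame releases the previously held I/P frame
            # and is itself held until the next I/P frame arrives.
            if held is not None:
                yield held
            held = frame
    if held is not None:
        # Assumption: flush the held frame when the input ends.
        yield held


if __name__ == "__main__":
    coded = ["I0", "P1", "P4", "B2", "B3", "P7", "B5", "B6"]
    print(" ".join(display_order(coded)))
```

   Applied to the coded order shown in Figure 1, the sketch yields
   I0 P1 B2 B3 P4 B5 B6, followed by P7 when the input is flushed.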

   Figure 1 illustrates the timing relationship between the capture of
   frames, their coded order, and the display order of the decoded
   frames, when B-pictures are present in the coded bit stream. The
   figure shows that the display of frame P4 is delayed until frame P7
   is received, while frames B2 and B3 are displayed immediately.

   Capture:        |I0  P1  B2  B3  P4  B5  B6  P7  B8  B9  ...
                   |
   Coded order:    |        I0  P1  P4  B2  B3  P7  B5  B6  ...
                   |
   Display order:  |            I0  P1  B2  B3  P4  B5  B6  ...
                   |
                   |+---+---+---+---+---+---+---+---+---+--> time
                    0   1   2   3   4   5   6   7   8   9

         Figure 1. Frame reordering when B-pictures are present.

   If B-pictures are not present, the coded order and the display order
   are identical, and frames can then be displayed without the
   additional delay shown in Figure 1.

4. Encapsulation of VC-1 format bit streams in RTP

4.1 Access Units

   Each RTP packet contains an integral number of application data units
   (ADUs). For VC-1 format bit streams, an ADU is equivalent to one
   Access Unit (AU). An Access Unit is defined as the AU header
   (defined in section 5.2) followed by a variable length payload, with
   the rules and constraints described in sections 4.1 and 4.2. Figure
   2 shows the layout of an RTP packet with multiple AUs.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+
   | RTP    | AU(1) | AU(2) |      | AU(n) |
   | Header |       |       |      |       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+

                Figure 2. RTP packet structure.

   Each Access Unit MUST start with the AU header defined in section
   5.2. The AU payload MUST contain data belonging to exactly one VC-1
   frame. This means that data from different VC-1 frames will always
   be in different AUs. However, it is possible for a single VC-1 frame
   to be fragmented across multiple AUs (see section 4.2).

   The following rules apply to the contents of each AU payload when
   VC-1 Advanced profile is used:

   - The AU payload MUST contain VC-1 bit stream data in EBDU format
     (i.e., the bit stream must use the byte-stuffing encapsulation
     mode defined in Annex E of SMPTE 421M [1].)

   - The AU payload MAY contain multiple EBDUs, e.g., a sequence layer
     header, an entry-point header, a picture header and multiple
     slices and the associated user-data. (However, all slices and
     their corresponding macroblocks MUST belong to the same video
     frame.)

   - The AU payload MUST start at an EBDU boundary, except when the AU
     payload contains a fragmented frame, in which case the rules in
     section 4.2 apply.

   When VC-1 Simple or Main profiles are used, the AU payload MUST start
   with a picture header, except when the AU payload contains a
   fragmented frame. Section 4.2 describes how to handle fragmented
   frames.

   Access Units MUST be byte-aligned. If the data in an AU (EBDUs in
   the case of Advanced profile and frame in the case of Simple and
   Main) does not end at an octet boundary, up to 7 zero-valued padding
   bits MUST be added to achieve octet-alignment.

4.2 Fragmentation of VC-1 frames

   Each AU payload SHOULD contain a complete VC-1 frame. However, if
   this would cause the RTP packet to exceed the MTU size, the frame
   SHOULD be fragmented into multiple AUs to avoid IP-level
   fragmentation. When an AU contains a fragmented frame, this MUST be
   indicated by setting the FRAG field in the AU header as defined in
   section 5.3.

   AU payloads that do not contain a fragmented frame, or that contain
   the first fragment of a frame, MUST start at an EBDU boundary if
   Advanced profile is used. In this case, for Simple and Main
   profiles, the AU payload MUST begin with the start of a picture
   header.
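
   For Advanced profile, one way a sender might choose fragment
   boundaries is sketched below in Python. This is only an illustration
   under stated assumptions: the function names and the max_payload
   parameter (the number of payload octets available per AU after the
   caller has subtracted all header overhead) are hypothetical, and
   setting the FRAG field is not shown. The sketch relies on the
   byte-stuffing described in section 3.2, which means the three-byte
   prefix 0x00 0x00 0x01 only appears at the start of an EBDU. When no
   EBDU boundary fits within the limit, the sketch falls back to an
   arbitrary cut.

```python
from typing import List

START_CODE_PREFIX = b"\x00\x00\x01"


def ebdu_offsets(frame: bytes) -> List[int]:
    """Return the byte offsets of every Start Code Prefix in the frame."""
    offsets = []
    pos = frame.find(START_CODE_PREFIX)
    while pos != -1:
        offsets.append(pos)
        pos = frame.find(START_CODE_PREFIX, pos + len(START_CODE_PREFIX))
    return offsets


def fragment_frame(frame: bytes, max_payload: int) -> List[bytes]:
    """Split one frame into AU payloads of at most max_payload octets.

    Cuts are placed at EBDU boundaries where possible; otherwise the
    frame is cut at the size limit (hypothetical fallback policy).
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if len(frame) <= max_payload:
        return [frame]  # no fragmentation needed

    cuts = [off for off in ebdu_offsets(frame) if off > 0]
    fragments = []
    start = 0
    while len(frame) - start > max_payload:
        limit = start + max_payload
        # Prefer the latest EBDU boundary that still fits in this fragment.
        candidates = [c for c in cuts if start < c <= limit]
        end = candidates[-1] if candidates else limit
        fragments.append(frame[start:end])
        start = end
    fragments.append(frame[start:])
    return fragments


if __name__ == "__main__":
    # Two made-up EBDUs; the suffix bytes 0xA0 and 0xA1 are placeholders.
    demo = START_CODE_PREFIX + b"\xa0" + b"\x11" * 10 \
        + START_CODE_PREFIX + b"\xa1" + b"\x22" * 10
    for part in fragment_frame(demo, 20):
        print(len(part), part[:4].hex())
```

   Concatenating the returned fragments in order reproduces the
   original frame, so a receiver can reassemble it before decoding.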

   If Advanced profile is used, AU payloads that contain a fragment of a
   frame other than the first fragment SHOULD start at an EBDU
   boundary, such as at the start of a slice.

   However, slices are only defined for Advanced profile, and are not
   always used. Blocks and macroblocks are not BDUs (have no Start
   Code) and are not byte-aligned. Therefore, it may not always be
   possible to continue a fragmented frame at an EBDU boundary.

   In the case of Simple and Main profiles, since the blocks and
   macroblocks are not byte-aligned, the fragmentation boundary may be
   chosen arbitrarily.

   If an RTP packet contains an AU with the last fragment of a frame,
   additional AUs SHOULD NOT be included in the RTP packet.

   If the PTS Delta field in the AU header is present, each fragment of
   a frame MUST have the same presentation time. If the DTS Delta field
   in the AU header is present, each fragment of a frame MUST have the
   same decode time.

4.3 Time stamp considerations

   Video frames MUST be transmitted in the coded order. Coded order
   implies that no frames are dependent on subsequent frames, as
   discussed in section 3.4. The RTP timestamp field MUST be set to the
   presentation time of the video frame contained in the first AU in the
   RTP packet. The presentation time can be used as the timestamp field
   in the RTP header because it differs from the sampling instant of the
   frame only by an arbitrary constant offset.

   Each AU header MAY specify the decode time of the video frame
   contained in the AU. If B-pictures will not be present in the coded
   bit stream, then the decode time of a frame MUST be equal to the
   presentation time of the frame.

   If B-pictures may be present in the coded bit stream, then the decode
   time of non-B frames MUST be equal to the presentation time of the
   previous non-B frame in the coded order.
   The decode time of B-frames MUST be equal to the presentation time
   of the B-frame.

   As an example, consider Figure 1 in section 3.4. The decode time of
   non-B frame P4 is 4 time units, which is equal to the presentation
   time of the previous non-B frame in the coded order, which is P1. On
   the other hand, the decode time of B-frame B2 is 5 time units, which
   is identical to its presentation time.

   Knowing if the stream will contain B-pictures may help the receiver
   allocate resources more efficiently and can reduce delay, as an
   absence of B-pictures in the stream implies that no reordering
   of frames will be needed between the decoding process and the display
   of the decoded frames. This may be important for interactive
   applications.

   The receiver MUST assume that the coded bit stream may contain
   B-pictures in the following cases:

   - Advanced profile: If the value of the "bpic" MIME parameter
     defined in section 6.1 is 1, or if the "bpic" parameter is not
     specified.

   - Main profile: If the MAXBFRAMES field in STRUCT_C decoder
     initialization parameter has a non-zero value. STRUCT_C is
     conveyed in the MIME "config" parameter, which is defined in
     section 6.1.

   Simple profile does not use B-pictures.

4.4 Random Access Points

   The entry-point header contains information that is needed by the
   decoder to decode the frames in that entry-point segment. This means
   that in the event of lost RTP packets the decoder may be unable to
   decode frames until the next entry-point header is received.

   The first frame after an entry-point header is a random access point
   into the coded bit stream. Simple and Main profiles do not have
   entry-point headers, so for those profiles each I-picture is a random
   access point.
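
   Returning briefly to the decode time rules of section 4.3, the
   following Python sketch shows how a sender might derive decode times
   from presentation times. It is an illustration only: the function
   name, the (label, presentation time) tuples and the
   b_pictures_possible flag are assumptions of this example, and the
   sketch returns None for the first non-B frame because the text above
   does not say what its decode time should be.

```python
from typing import Dict, List, Optional, Tuple


def decode_times(
    coded: List[Tuple[str, int]], b_pictures_possible: bool = True
) -> Dict[str, Optional[int]]:
    """Map each frame label to a decode time.

    coded lists (label, presentation_time) pairs in coded order.
    Labels starting with "B" are treated as B-frames (hypothetical
    convention for this sketch).
    """
    result: Dict[str, Optional[int]] = {}
    prev_non_b_pts: Optional[int] = None
    for label, pts in coded:
        if not b_pictures_possible or label.startswith("B"):
            # No B-pictures in the stream, or a B-frame:
            # decode time equals the frame's own presentation time.
            result[label] = pts
        else:
            # Non-B frame: decode time equals the presentation time of
            # the previous non-B frame in coded order (None if there is
            # no such frame; this case is left open by the sketch).
            result[label] = prev_non_b_pts
            prev_non_b_pts = pts
    return result


if __name__ == "__main__":
    # Presentation times read off the display order row of Figure 1.
    figure1 = [("I0", 3), ("P1", 4), ("P4", 7), ("B2", 5),
               ("B3", 6), ("P7", 10), ("B5", 8), ("B6", 9)]
    for label, dts in decode_times(figure1).items():
        print(label, dts)
```

   With the Figure 1 values, the sketch gives a decode time of 4 for P4
   and 5 for B2, matching the worked example in section 4.3.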

   To allow the RTP receiver to detect that an RTP packet which was lost
   contained a random access point, this RTP payload format defines a
   field called "RA Count". This field is present in every AU, and its
   value is incremented (modulo 256) for every random access point. For
   additional details, see the definition of "RA Count" in section 5.2.

   To make it easy to determine if an AU contains a random access point,
   this RTP payload format also defines a bit called the "RA" flag in
   the AU Control field. This bit is set to 1 only on those AUs that
   contain a random access point. The RA bit is defined in section 5.3.

4.5 Removal of HRD parameters

   The sequence layer header of Advanced profile may include up to 31
   leaky bucket parameter sets for the Hypothetical Reference Decoder
   (HRD). Each leaky bucket parameter set specifies a possible peak
   transmission bit rate (HRD_RATE) and a decoder buffer capacity
   (HRD_BUFFER). (See section 3.3 for additional discussion about the
   HRD.)

   If the actual peak transmission rate is known by the RTP sender, the
   RTP sender MAY remove all leaky bucket parameter sets except for the
   one corresponding to the actual peak transmission rate.

   For each leaky bucket parameter set in the sequence layer header,
   there is also a parameter in the entry-point header that specifies
   the initial fullness (HRD_FULL) of the leaky bucket.

   If the RTP sender has removed any leaky bucket parameter sets from
   the sequence layer header, then for any removed leaky bucket
   parameter set, it MUST also remove the corresponding HRD_FULL
   parameter in the entry-point header.

   Removing leaky bucket parameter sets, as described above, may
   significantly reduce the size of the sequence layer headers and the
   entry-point headers.
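
   The pruning rule of section 4.5 can be sketched in Python as
   follows. The sketch works on already-parsed values rather than on
   the bit stream itself; the LeakyBucket class, the function name and
   the choice to match the peak rate by exact equality are assumptions
   made for this illustration and are not defined by SMPTE 421M.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LeakyBucket:
    hrd_rate: int    # peak transmission bit rate (HRD_RATE)
    hrd_buffer: int  # decoder buffer capacity (HRD_BUFFER)


def prune_hrd(
    buckets: List[LeakyBucket], hrd_full: List[int], actual_peak_rate: int
) -> Tuple[List[LeakyBucket], List[int]]:
    """Keep only the leaky bucket that matches the actual peak rate.

    hrd_full[i] is the HRD_FULL value from the entry-point header that
    corresponds to buckets[i]. If no bucket matches, nothing is removed.
    """
    if len(buckets) != len(hrd_full):
        raise ValueError("each leaky bucket needs one HRD_FULL value")
    for index, bucket in enumerate(buckets):
        if bucket.hrd_rate == actual_peak_rate:
            # Removing a bucket also removes its HRD_FULL entry.
            return [bucket], [hrd_full[index]]
    return list(buckets), list(hrd_full)


if __name__ == "__main__":
    sets = [LeakyBucket(1_000_000, 500_000), LeakyBucket(2_000_000, 300_000)]
    print(prune_hrd(sets, [10, 20], 2_000_000))
```

   Keeping the two lists aligned by index is a design choice of the
   sketch; it mirrors the one-to-one correspondence between leaky
   bucket parameter sets and HRD_FULL parameters described above.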

4.6 Repeating the Sequence Layer header

   To improve robustness against loss of RTP packets, it is RECOMMENDED
   that, if the sequence layer header changes, it be repeated
   frequently in the bit stream. In this case, it is RECOMMENDED that
   the number of leaky bucket parameters in the sequence layer header
   and the entry-point headers be reduced to one, as described in
   section 4.5. This will help reduce the overhead caused by repeating
   the sequence layer header.

   Note that any data in the VC-1 bit stream, including repeated copies
   of the sequence header itself, must be accounted for when computing
   the leaky bucket parameter for the HRD. (See section 3.3 for a
   discussion about the HRD.)

   Note that if the value of TFCNTRFLAG in the sequence layer header is
   1, each picture header contains a frame counter field (TFCNTR). Each
   time the sequence layer header is inserted in the bit stream, the
   value of this counter MUST be reset.

   To allow the RTP receiver to detect that an RTP packet which was lost
   contained a new sequence layer header, the AU Control field defines a
   bit called the "SL" flag. This bit is toggled when a sequence layer
   header is transmitted, but only if that header is different from the
   most recently transmitted sequence layer header. The SL bit is
   defined in section 5.3.

4.7 Signaling of MIME format parameters

   When this RTP payload format is used with SDP, the decoder
   initialization parameters described in section 3.3 MUST be signaled
   in SDP using the MIME parameters specified in section 6.1. Section
   6.2 specifies how to map the MIME parameters to SDP.

   When Advanced profile is used, the decoder initialization parameters
   MAY be changed by inserting a new sequence layer header or an
   entry-point header in the coded bit stream.
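
   When a sender inserts a new sequence layer header in this way, the
   SL flag described in section 4.6 changes value. A minimal Python
   sketch of the sender-side bookkeeping is given below. The class
   name, the initial flag value of 0, and the decision not to toggle
   on the very first header are assumptions of this example; the
   normative definition of the SL bit is in section 5.3.

```python
from typing import Optional


class SLFlagTracker:
    """Track the SL flag value a sender would place in the AU Control field."""

    def __init__(self, initial_flag: int = 0) -> None:
        self.flag = initial_flag              # assumed starting value
        self.last_header: Optional[bytes] = None

    def on_sequence_header(self, header: bytes) -> int:
        """Record a transmitted sequence layer header and return the flag."""
        if self.last_header is not None and header != self.last_header:
            # Toggle only when the header differs from the one most
            # recently transmitted; a repeated header leaves it alone.
            self.flag ^= 1
        self.last_header = header
        return self.flag


def header_changed(previous_flag: int, current_flag: int) -> bool:
    """Receiver-side check: a flipped SL flag signals a new header."""
    return previous_flag != current_flag


if __name__ == "__main__":
    tracker = SLFlagTracker()
    for hdr in (b"A", b"A", b"B", b"B", b"A"):
        print(hdr, tracker.on_sequence_header(hdr))
```

   A receiver that sees the flag flip across a gap in RTP sequence
   numbers can conclude that a new sequence layer header was in one of
   the lost packets.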

   When Simple or Main profiles are used, it is not possible to change
   the decoder initialization parameters through the coded bit stream
   itself. Any changes to the decoder initialization parameters would
   have to be done through out-of-band means, e.g., by updating the SDP
   [5].

   Note that the sequence layer header specifies the encoding level, the
   maximum size of the coded pictures and possibly also the maximum
   frame rate. Thus, if the sequence layer header changes, the new
   header supersedes the values of the MIME parameters "level", "width",
   "height" and "framerate".

4.8 MIME "mode=1" parameter

   In certain applications using Advanced profile, the sequence layer
   header never changes. This MAY be signaled with the MIME parameter
   "mode=1". (The "mode" parameter is defined in section 6.1.) The
   "mode=1" parameter serves as a "hint" to the RTP receiver that all
   sequence layer headers in the bit stream will be identical. If
   "mode=1" is signaled and a sequence layer header is present in the
   coded bit stream, then it MUST be identical to the sequence layer
   header specified by the MIME "config" parameter.

   Since the sequence layer header never changes in "mode=1", the RTP
   sender MAY remove it from the bit stream. Note, however, that if the
   value of TFCNTRFLAG in the sequence layer header is 1, each picture
   header contains a frame counter field (TFCNTR). This field is reset
   each time the sequence layer header occurs in the bit stream. If the
   RTP sender chooses to remove the sequence layer header, then it MUST
   ensure that the resulting bit stream is still compliant with the VC-1
   specification (e.g., by adjusting the TFCNTR field, if necessary).

4.9 MIME "mode=3" parameter

   In certain applications using Advanced profile, both the sequence
   layer header and the entry-point header never change. This MAY be
   signaled with the MIME parameter "mode=3".
   The same rules apply to "mode=3" as for "mode=1", described in
   section 4.8. Additionally, if "mode=3" is signaled, then the RTP
   sender MAY "compress" the coded bit stream by not including sequence
   layer headers and entry-point headers in the RTP packets.

   The RTP receiver MUST "decompress" the coded bit stream by
   re-inserting the entry-point headers prior to delivering the coded
   bit stream to the VC-1 decoder. The sequence layer header does not
   need to be decompressed by the receiver, since it never changes.

   If "mode=3" is signaled and the RTP receiver receives a complete AU
   or the first fragment of an AU, and the RA bit is set to 1 but the AU
   does not begin with an entry-point header, then this indicates that
   the entry-point header has been "compressed". In that case, the RTP
   receiver MUST insert an entry-point header at the beginning of the
   AU. When inserting the entry-point header, the RTP receiver MUST use
   the one that was specified by the MIME "config" parameter.

5. RTP Payload Format syntax

5.1 RTP header usage

   The format of the RTP header is specified in RFC 3550 [3] and is
   reprinted in Figure 3 for convenience.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3. RTP header according to RFC 3550

   The fields of the fixed RTP header have their usual meaning, which is
   defined in RFC 3550 and by the RTP profile in use, with the following
   additional notes:

   Marker bit (M): 1 bit
      This bit is set to 1 if the RTP packet contains an Access
      Unit containing a complete VC-1 frame, or the last fragment
      of a VC-1 frame.

   Payload type (PT): 7 bits
      This document does not assign an RTP payload type for this
      RTP payload format. The assignment of a payload type has to
      be performed either through the RTP profile used or in a
      dynamic way.

   Sequence Number: 16 bits
      The RTP receiver can use the sequence number field to recover
      the coded order of the VC-1 frames. (A typical VC-1 decoder
      will require the VC-1 frames to be delivered in coded order.)
      When VC-1 frames have been fragmented across RTP packets, the
      RTP receiver can use the sequence number field to ensure that
      no fragment is missing.

   Timestamp: 32 bits
      The RTP timestamp is set to the presentation time of the VC-1
      frame in the first Access Unit.
      A clock rate of 90 kHz, or higher, MUST be used.

5.2 AU header syntax

   The Access Unit header consists of a one-byte AU Control field, the
   RA Count field and 3 optional fields. All fields MUST be written in
   network byte order. The structure of the AU header is illustrated in
   Figure 4.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |AU     | RA    |  AUP  | PTS   | DTS   |
   |Control| Count |  Len  | Delta | Delta |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4. Structure of AU header.

   AU Control: 8 bits
      The usage of the AU Control field is defined in section 5.3.

   RA Count: 8 bits
      Random Access Point Counter. This field is a binary modulo
      256 counter. The value of this field MUST be incremented by
      1 each time an AU is transmitted where the RA bit in the AU
      Control field is set to 1.
      The initial value of this field is undefined and MAY be chosen
      randomly.

   AUP Len: 16 bits
      Access Unit Payload Length. Specifies the size, in bytes, of
      the payload of the Access Unit. The field does not include
      the size of the AU header itself. The field MUST be included
      in each AU header in an RTP packet, except for the last AU
      header in the packet.

   PTS Delta: 32 bits
      Presentation time delta. Specifies the presentation time of
      the frame as a 2's complement offset (delta) from the
      timestamp field in the RTP header of this RTP packet. The
      PTS Delta field MUST use the same clock rate as the timestamp
      field in the RTP header.
      This field SHOULD NOT be included in the first AU header in
      the RTP packet, because the RTP timestamp field specifies the
      presentation time of the frame in the first AU.

   DTS Delta: 32 bits
      Decode time delta. Specifies the decode time of the frame as
      a 2's complement offset (delta) between the presentation time
      and the decode time. Note that if the presentation time is
      larger than the decode time, this results in a value for the
      DTS Delta field that is greater than zero. The DTS Delta
      field MUST use the same clock rate as the timestamp field in
      the RTP header.

5.3 AU Control field syntax

   The structure of the 8-bit AU Control field is shown in Figure 5.

     0    1    2    3    4    5    6    7
   +----+----+----+----+----+----+----+----+
   |  FRAG   | RA | SL | LP | PT | DT | R  |
   +----+----+----+----+----+----+----+----+

   Figure 5. Syntax of AU Control field.

   FRAG: 2 bits
      Fragmentation Information. This field indicates if the AU
      payload contains a complete frame or a fragment of a frame.
      It MUST be set as follows:
      0: The AU payload contains a fragment of a frame other than
         the first or last fragment.
      1: The AU payload contains the first fragment of a frame.
      2: The AU payload contains the last fragment of a frame.
      3: The AU payload contains a complete frame (not fragmented).

   RA: 1 bit
      Random Access Point indicator. This bit MUST be set to 1 if
      the AU contains a frame that is a random access point. In
      the case of Simple and Main profiles, any I-picture is a
      random access point.
      In the case of Advanced profile, the first frame after an
      entry-point header is a random access point.
      Note that if entry-point headers are not transmitted at every
      random access point, this MUST be indicated using the MIME
      parameter "mode=3".

   SL: 1 bit
      Sequence Layer Counter. This bit MUST be toggled, i.e.,
      changed from 0 to 1 or from 1 to 0, if the AU contains a
      sequence layer header and if it is different from the most
      recently transmitted sequence layer header. Otherwise, the
      value of this bit must be identical to the value of the SL
      bit in the previous AU.
      The initial value of this bit is undefined and MAY be chosen
      randomly.
      The bit MUST be 0 for Simple and Main profile bit streams or
      if the sequence layer header never changes.

   LP: 1 bit
      Length Present. This bit MUST be set to 1 if the AU header
      includes the AUP Len field.

   PT: 1 bit
      PTS Delta Present. This bit MUST be set to 1 if the AU
      header includes the PTS Delta field.

   DT: 1 bit
      DTS Delta Present. This bit MUST be set to 1 if the AU
      header includes the DTS Delta field.

   R: 1 bit
      Reserved. This bit MUST be set to 0 and MUST be ignored by
      receivers.

6. RTP Payload format parameters

6.1 Media Type Registration

   This registration uses the template defined in [11] and follows RFC
   3555 [7].

   Type name: video

   Subtype name: vc1

   Required parameters:

      profile:
         The value is an integer identifying the VC-1 profile. The
         following values are defined:
         0: Simple profile.
         1: Main profile.
         3: Advanced profile.

         If the profile parameter is used to indicate properties of a
         coded bit stream, it indicates the VC-1 encoding profile that
         a decoder has to support in order to comply with [1] when it
         decodes the bit stream.

         If the profile parameter is used for capability exchange or
         in a session setup procedure, it indicates the VC-1 profile
         that the codec supports.

      level:
         The value is an integer specifying the level of the VC-1
         profile.
         For Advanced profile, valid values are 0 to 4, which
         correspond to levels L0 to L4, respectively. For Simple and
         Main profiles, the following values are defined:
         1: Low Level
         2: Medium Level
         3: High Level (only valid for Main profile)

         If the level parameter is used to indicate properties of a
         coded bit stream, it indicates the level of the VC-1 profile
         that a decoder has to support in order to comply with [1]
         when it decodes the bit stream. Note that when Advanced
         profile is used, this parameter may only apply while the
         sequence layer header specified in the config parameter is in
         use.

         If the level parameter is used for capability exchange or in
         a session setup procedure, it indicates the highest level of
         the VC-1 profile that the codec supports. See section 6.3 for
         specific rules for how this parameter is used with the SDP
         Offer/Answer model.

   Optional parameters:

      config:
         The value is a base16 [6] (hexadecimal) representation of an
         octet string that expresses the decoder initialization
         parameters. Decoder initialization parameters are mapped
         onto the base16 octet string on an MSB-first basis. The
         first bit of the decoder initialization parameters MUST be
         located at the MSB of the first octet. If the length of the
         decoder initialization parameters is not a multiple of 8
         bits, up to 7 zero-valued padding bits MUST be added in the
         last octet to achieve octet alignment.
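         The MSB-first mapping and zero padding described above can be
         sketched as follows. The function names bits_to_config and
         config_to_octets are hypothetical, and the input is assumed to
         be a string of '0' and '1' characters in bit stream order.
         Python emits lowercase hexadecimal digits, which matches the
         lowercase "config" value used in the SDP example in section
         6.4.

```python
def bits_to_config(bits: str) -> str:
    """Map a bit string onto a base16 'config' value, MSB first.

    The first bit lands in the MSB of the first octet. If the length
    is not a multiple of 8, zero-valued padding bits (at most 7) are
    appended to the last octet.
    """
    if any(ch not in "01" for ch in bits):
        raise ValueError("bits must contain only '0' and '1'")
    padded = bits + "0" * (-len(bits) % 8)
    octets = bytes(
        int(padded[i:i + 8], 2) for i in range(0, len(padded), 8)
    )
    return octets.hex()


def config_to_octets(config: str) -> bytes:
    """Decode a base16 'config' value back into its octet string."""
    return bytes.fromhex(config)
```

         For example, a 22-bit input needs two padding bits to fill the
         third octet.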

         For Simple and Main profiles, the decoder initialization
         parameters are STRUCT_C, as defined in Annex J of SMPTE 421M
         [1].

         For Advanced profile, the decoder initialization parameters
         are a sequence layer header directly followed by an
         entry-point header. The two headers MUST be in EBDU format,
         meaning that they must include their Start Codes and must use
         the encapsulation method defined in Annex E of SMPTE 421M
         [1].

         This parameter MUST NOT be used to indicate codec
         capabilities in any capability exchange procedure.

      width:
         The value is an integer greater than zero, specifying the
         maximum horizontal size of the coded picture, in pixels.

         If this parameter is not specified, it defaults to the
         maximum horizontal size allowed by the profile and level.

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

      height:
         The value is an integer greater than zero, specifying the
         maximum vertical size of the coded picture, in pixels.

         If this parameter is not specified, it defaults to the
         maximum vertical size allowed by the profile and level.

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

      bitrate:
         The value is an integer greater than zero, specifying the
         peak transmission rate of the coded bit stream in bits per
         second. The number does not include the overhead caused by
         RTP encapsulation, i.e., it does not include the AU headers,
         or any of the RTP, UDP or IP headers.

         If this parameter is not specified, it defaults to the
         maximum bit rate allowed by the profile and level. (See the
         values for "RMax" in Annex D of SMPTE 421M [1].)

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

      buffer:
         The value is an integer specifying the leaky bucket size, B,
         in milliseconds, required to contain a stream transmitted at
         the transmission rate specified by the bitrate parameter.
         This parameter is defined in the hypothetical reference
         decoder model for VC-1, in Annex C of SMPTE 421M [1].

         Note that this parameter relates to the codec bit stream
         only, and does not account for any buffering time that may be
         required to compensate for jitter in the network.

         If this parameter is not specified, it defaults to the
         maximum buffer size allowed by the profile and level. (See
         the values for "BMax" and "RMax" in Annex D of SMPTE 421M
         [1].)

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

      framerate:
         The value is an integer greater than zero, specifying the
         maximum number of frames per second in the coded bit stream,
         multiplied by 1000 and rounded to the nearest integer value.
         For example, 30000/1001 (approximately 29.97) frames per
         second is represented as 29970.

         If the parameter is not specified, it defaults to the maximum
         frame rate allowed by the profile and level.

         Note: When Advanced profile is used, this parameter only
         applies while the sequence layer header specified in the
         config parameter is in use.

      bpic:
         This parameter signals if B-pictures may be present when
         Advanced profile is used. If this parameter is present, and
         B-pictures may be present in the coded bit stream, this
         parameter MUST be equal to 1.
         If B-pictures will never be present in the coded bit stream,
         even if the sequence layer header changes, this parameter
         SHOULD be present and its value SHOULD be equal to 0.

         This parameter MUST NOT be used with Simple and Main
         profiles. (For Main profile, the presence of B-pictures is
         indicated by the MAXBFRAMES field in the STRUCT_C decoder
         initialization parameter.)
         For Advanced profile, if this parameter is not specified, a
         value of 1 MUST be assumed.

      mode:
         The value is an integer specifying the use of the sequence
         layer header and the entry-point header. This parameter is
         only defined for Advanced profile. The following values are
         defined:
         0: Both the sequence layer header and the entry-point header
            may change, and changed headers will be included in the
            RTP packets.
         1: The sequence layer header specified in the config
            parameter never changes.
         3: The sequence layer header and the entry-point header
            specified in the config parameter never change.
            Entry-point headers MAY be omitted from the Access Units.
            Each Access Unit that has the RA bit set to 1 contains a
            random access point even if an entry-point header is not
            included in the Access Unit. If an entry-point header is
            not included at a random access point, then the RTP
            receiver MUST insert the entry-point header into the VC-1
            bit stream prior to delivering the bit stream to the VC-1
            decoder.

         If the mode parameter is not specified, a value of 0 MUST be
         assumed. The mode parameter SHOULD be specified if modes 1
         or 3 apply to the VC-1 bit stream.

      max-width, max-height, max-bitrate, max-buffer, max-framerate:
         These parameters are defined for use in a capability exchange
         procedure. The parameters do not specify properties of the
         coded bit stream, but rather upper limits or preferred values
         for the "width", "height", "bitrate", "buffer" and
         "framerate" parameters. Section 6.3 provides specific rules
         for how these parameters are used with the SDP Offer/Answer
         model.

         Any of the max-width, max-height, max-bitrate, max-buffer and
         max-framerate parameters MAY be used to indicate capabilities
         that exceed the required capabilities of the signaled profile
         and level. In that case, the parameter MUST be interpreted
         as the maximum value that can be supported for that
         capability.

         If any of the parameters specifies a capability that is less
         than the required capabilities of the signaled profile and
         level, then the parameter SHOULD be interpreted as a
         preferred value for that capability.

         When more than one parameter from the set (max-width,
         max-height, max-bitrate, max-buffer and max-framerate) is
         present, all signaled capabilities MUST be supported
         simultaneously.

         A sender or receiver MUST NOT use these parameters to
         indicate capabilities that meet the requirements of a higher
         level of the VC-1 profile than the one specified in the
         "level" parameter, if the sender or receiver can support all
         the properties of the higher level, except if specifying a
         higher level is not allowed due to other restrictions. (As
         an example of such a restriction, in the SDP Offer/Answer
         model, the value of the level parameter that can be used in
         an Answer is limited by what was specified in the Offer.)

      max-width:
         The value is an integer greater than zero, specifying a
         horizontal size for the coded picture, in pixels. If the
         value is less than the maximum horizontal size allowed by the
         profile and level, then the value specifies the preferred
         horizontal size. Otherwise, it specifies the maximum
         horizontal size that is supported.

         If this parameter is not specified, it defaults to the
         maximum horizontal size allowed by the profile and level.

      max-height:
         The value is an integer greater than zero, specifying a
         vertical size for the coded picture, in pixels.
         If the value is less than the maximum vertical size allowed
         by the profile and level, then the value specifies the
         preferred vertical size. Otherwise, it specifies the maximum
         vertical size that is supported.

         If this parameter is not specified, it defaults to the
         maximum vertical size allowed by the profile and level.

      max-bitrate:
         The value is an integer greater than zero, specifying a peak
         transmission rate for the coded bit stream in bits per
         second. The number does not include the overhead caused by
         RTP encapsulation, i.e., it does not include the AU headers,
         or any of the RTP, UDP or IP headers.

         If the value is less than the maximum bit rate allowed by the
         profile and level, then the value specifies the preferred bit
         rate. Otherwise, it specifies the maximum bit rate that is
         supported.

         If this parameter is not specified, it defaults to the
         maximum bit rate allowed by the profile and level. (See the
         values for "RMax" in Annex D of SMPTE 421M [1].)

      max-buffer:
         The value is an integer specifying a leaky bucket size, B, in
         milliseconds, required to contain a stream transmitted at the
         transmission rate specified by the max-bitrate parameter.
         This parameter is defined in the hypothetical reference
         decoder model for VC-1, in Annex C of SMPTE 421M [1].

         Note that this parameter relates to the codec bit stream
         only, and does not account for any buffering time that may be
         required to compensate for jitter in the network.

         If the value is less than the maximum leaky bucket size
         allowed by the max-bitrate parameter and the profile and
         level, then the value specifies the preferred leaky bucket
         size. Otherwise, it specifies the maximum leaky bucket size
         that is supported for the bit rate specified by the
         max-bitrate parameter.
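         The "preferred value versus maximum value" rule that recurs
         for max-width, max-height, max-bitrate and max-buffer above
         can be sketched as a single comparison. The function name
         interpret_max_param is hypothetical, and the caller is
         assumed to supply the limit that the signaled profile and
         level allow (for example, a value taken from Annex D of SMPTE
         421M [1]); no limits are built into the sketch.

```python
def interpret_max_param(value: int, level_limit: int) -> str:
    """Classify a max-* parameter value against the level limit.

    value       -- the signaled max-* value
    level_limit -- what the signaled profile and level allow
                   (supplied by the caller; not built in here)
    Returns "preferred" when the value is below the limit, and
    "maximum" otherwise, i.e. the value is a supported upper bound.
    """
    if value < level_limit:
        return "preferred"
    return "maximum"
```

         With the 96 kbps limit that section 6.3 gives for Simple
         profile, Low Level, a max-bitrate of 48000 is only a
         preference, while 200000 is a supported maximum.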

         If this parameter is not specified, it defaults to the
         maximum buffer size allowed by the profile and level. (See
         the values for "BMax" and "RMax" in Annex D of SMPTE 421M
         [1].)

      max-framerate:
         The value is an integer greater than zero, specifying a
         number of frames per second for the coded bit stream. The
         value is the frame rate multiplied by 1000 and rounded to the
         nearest integer value. For example, 30000/1001
         (approximately 29.97) frames per second is represented as
         29970.

         If the value is less than the maximum frame rate allowed by
         the profile and level, then the value specifies the preferred
         frame rate. Otherwise, it specifies the maximum frame rate
         that is supported.

         If the parameter is not specified, it defaults to the maximum
         frame rate allowed by the profile and level.

   Encoding considerations:
      This media type is framed and contains binary data.

   Security considerations:
      See Section 7 of this document.

   Interoperability considerations:
      None.

   Published specification:
      This payload format specification.

   Applications which use this media type:
      Multimedia streaming and conferencing tools.

   Additional Information:
      None.

   Person & email address to contact for further information:
      Anders Klemets
      IETF AVT working group.

   Intended Usage:
      COMMON

   Restrictions on usage:
      This media type depends on RTP framing, and hence is only
      defined for transfer via RTP [3].

   Authors:
      Anders Klemets

   Change controller:
      IETF Audio/Video Transport Working Group delegated from the
      IESG.

6.2 Mapping of MIME parameters to SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [4].
   If SDP is used to specify sessions using this payload format, the
   mapping is done as follows:

   o The media name in the "m=" line of SDP MUST be video (the type
     name).

   o The encoding name in the "a=rtpmap" line of SDP MUST be vc1 (the
     subtype name).

   o The clock rate in the "a=rtpmap" line MUST be at least 90000.

   o The REQUIRED parameters "profile" and "level" MUST be included in
     the "a=fmtp" line of SDP.
     These parameters are expressed as a MIME media type string, in the
     form of a semicolon separated list of parameter=value pairs.

   o The OPTIONAL parameters "config", "width", "height", "bitrate",
     "buffer", "framerate", "bpic", "mode", "max-width", "max-height",
     "max-bitrate", "max-buffer" and "max-framerate", when present,
     MUST be included in the "a=fmtp" line of SDP.
     These parameters are expressed as a MIME media type string, in the
     form of a semicolon separated list of parameter=value pairs:

     a=fmtp:<dynamic payload type> <parameter name>=<value>[,<value>]
        [; <parameter name>=<value>]

   o Any parameters unknown to the device that uses the SDP MUST be
     ignored. For example, parameters defined in later specifications
     MAY be copied into the SDP and MUST be ignored by receivers that
     do not understand them.

6.3 Usage with the SDP Offer/Answer Model

   When VC-1 is offered over RTP using SDP in an Offer/Answer model [5]
   for negotiation for unicast usage, the following rules and
   limitations apply:

   o The "profile" parameter MUST be used symmetrically, i.e., the
     answerer MUST either maintain the parameter or remove the media
     format (payload type) completely if the offered encoding profile
     is not supported.

   o The "level" parameter describes the level of the VC-1 profile of
     the coded bit stream that the offerer or answerer is sending for
     this media format configuration, when the direction attribute is
     sendonly or sendrecv.
     If the direction attribute is sendrecv or recvonly, the parameter
     also specifies the highest level of the VC-1 profile that the
     receiver implementation accepts.

     The answerer MUST NOT specify a numerically higher level in the
     answer than what was specified in the offer, regardless of the
     direction attribute.

     If an offer specifies the recvonly direction attribute, the
     answerer MAY specify a level that is lower than what was specified
     in the offer, i.e., the level parameter can be "downgraded".

     If the offer specifies the sendonly direction attribute, the level
     parameter cannot be downgraded by the answerer. In this case, the
     answerer MUST either maintain the level parameter or remove the
     media format (payload type) completely if the level is not
     supported.

     If the offer specifies the sendrecv direction attribute, or if the
     direction attribute is unspecified, the answerer MAY specify a
     level that is lower than what was specified in the offer. Note
     that the level parameter specified in the answer applies to the
     coded bit stream that will be sent by the answerer, and the
     offerer will still use the level parameter that it specified in
     the offer.

   o The parameters "config", "bpic", "width", "height", "framerate",
     "bitrate", "buffer" and "mode", describe the properties of the
     VC-1 bit stream that the offerer or answerer is sending for this
     media format configuration.

     In the case of unicast usage and when the direction attribute in
     the offer or answer is recvonly, the interpretation of these
     parameters is undefined and they MUST NOT be used.

   o The parameters "max-width", "max-height", "max-framerate",
     "max-bitrate" and "max-buffer" MAY be specified in an offer or an
     answer, and their interpretation is as follows:

     When the direction attribute is sendonly, the parameters describe
     the limits of the VC-1 bit stream that the sender is capable of
     producing for the given profile and level, or any lower level of
     the same profile.

     When the direction attribute is recvonly or sendrecv, the
     parameters describe properties of the receiver implementation. If
     the value of a property is less than what is allowed by the level
     of the VC-1 profile, then it SHOULD be interpreted only as a
     preferred value suggested by the sender. If the value of a
     property is greater than what is allowed by the level of the VC-1
     profile, then it MUST be interpreted by the sender as an upper
     limit of what the receiver accepts for the given profile and
     level, and any lower level of the same profile.

     For example, if a recvonly or sendrecv offer specifies
     "profile=0;level=1;max-bitrate=48000", then 48 kbps is merely a
     suggested bit rate, because all receiver implementations of Simple
     profile, Low Level, are required to support bit rates of up to 96
     kbps. But if the offer specifies "max-bitrate=200000", this means
     that the receiver implementation supports a maximum of 200 kbps
     for the given profile and level (or lower level).

   o If an offerer wishes to have non-symmetrical capabilities between
     sending and receiving, e.g., use different levels in each
     direction, then the offerer has to offer different RTP sessions.
     This can be done by specifying different media lines declared as
     "recvonly" and "sendonly", respectively.
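   The unicast rules for the "level" parameter in the list above can be
   sketched as a check that an answerer might run before sending its
   answer. This is a sketch under stated assumptions: the function name
   answer_level_allowed is hypothetical, both levels are assumed to
   belong to the same profile (the "profile" parameter is symmetric),
   and an unspecified direction attribute is modelled as None.

```python
from typing import Optional


def answer_level_allowed(
    offer_direction: Optional[str],
    offer_level: int,
    answer_level: int,
) -> bool:
    """Check an answer's level against the offer (unicast rules).

    offer_direction -- "sendonly", "recvonly", "sendrecv", or None
                       when the offer carries no direction attribute
    Returns True if the answer may carry answer_level. When False,
    the answerer has to pick another level or remove the media
    format (payload type).
    """
    # A numerically higher level is never allowed in the answer.
    if answer_level > offer_level:
        return False
    # A sendonly offer cannot be downgraded by the answerer.
    if offer_direction == "sendonly":
        return answer_level == offer_level
    # recvonly, sendrecv, or unspecified: downgrading is permitted.
    if offer_direction in ("recvonly", "sendrecv", None):
        return True
    raise ValueError("direction not covered by these rules")
```

   Directions other than the three named in the text, such as inactive,
   are rejected by the sketch rather than guessed at.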

   For streams being delivered over multicast, the following rules apply
   in addition:

   o The "level" parameter specifies the highest level of the VC-1
     profile of the bit stream that will be sent, and/or received, on
     the multicast session. The value of this parameter MUST NOT be
     changed by the answerer. Thus, a payload type can either be
     accepted unaltered or removed.

   o The parameters "config", "bpic", "width", "height", "framerate",
     "bitrate", "buffer" and "mode", specify properties of the VC-1 bit
     stream that will be sent, and/or received, on the multicast
     session. The parameters MAY be specified even if the direction
     attribute is recvonly.

     The values of these parameters MUST NOT be changed by the
     answerer. Thus, a payload type can either be accepted unaltered
     or removed.

   o The values of the parameters "max-width", "max-height",
     "max-framerate", "max-bitrate" and "max-buffer" MUST be supported
     by the answerer for all streams declared as sendrecv or recvonly.
     Otherwise, one of the following actions MUST be performed: the
     media format is removed, or the session rejected.

6.4 Usage in Declarative Session Descriptions

   When VC-1 is offered over RTP using SDP in a declarative style, as in
   RTSP [12] or SAP [13], the following rules and limitations apply.

   o The parameters "profile" and "level" indicate only the properties
     of the coded bit stream. They do not imply a limit on capabilities
     supported by the sender.

   o The parameters "config", "width", "height", "bitrate" and "buffer"
     MUST be specified.

   o The parameters "max-width", "max-height", "max-framerate",
     "max-bitrate" and "max-buffer" MUST NOT be used.
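   The rules above, together with the required parameters of section
   6.2, can be checked mechanically. The following sketch is
   illustrative only: parse_fmtp and check_declarative are hypothetical
   names, and the input is assumed to be the parameter list that
   follows the payload type in an "a=fmtp" line. Unknown parameters are
   left alone, in line with section 6.2.

```python
from typing import Dict, List

REQUIRED = {"profile", "level"}
DECLARATIVE_REQUIRED = {"config", "width", "height", "bitrate", "buffer"}
DECLARATIVE_FORBIDDEN = {
    "max-width", "max-height", "max-framerate",
    "max-bitrate", "max-buffer",
}


def parse_fmtp(param_list: str) -> Dict[str, str]:
    """Split 'name=value;name=value' into a dictionary."""
    params: Dict[str, str] = {}
    for item in param_list.split(";"):
        item = item.strip()
        if not item:
            continue
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("parameter without '=': " + item)
        params[name.strip()] = value.strip()
    return params


def check_declarative(params: Dict[str, str]) -> List[str]:
    """Return a list of rule violations; empty means acceptable."""
    present = set(params)
    problems = []
    for name in sorted((REQUIRED | DECLARATIVE_REQUIRED) - present):
        problems.append("missing " + name)
    for name in sorted(DECLARATIVE_FORBIDDEN & present):
        problems.append("not allowed: " + name)
    return problems
```

   A parameter list that carries all seven required names and none of
   the max-* names yields an empty list of problems.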

   An example of media representation in SDP is as follows (Simple
   profile, Medium level):

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 vc1/90000
   a=fmtp:98 profile=0;level=2;width=352;height=288;framerate=15000;
   bitrate=384000;buffer=2000;config=4e291800

7. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [3], and in any appropriate RTP profile. This implies
   that confidentiality of the media streams is achieved by encryption;
   for example, through the application of SRTP [10].

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load. The attacker can inject pathological RTP packets
   into the stream that are complex to decode and that cause the
   receiver to be overloaded. VC-1 is particularly vulnerable to such
   attacks, because it is possible for an attacker to generate RTP
   packets containing frames that affect the decoding process of many
   future frames. Therefore, the usage of data origin authentication
   and data integrity protection of at least the RTP packet is
   RECOMMENDED; for example, with SRTP [10].

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

8. IANA Considerations

   IANA is requested to register the MIME type "video/vc1" and the
   associated RTP payload format, as specified in section 6.1 of this
   document, in the Media Types registry and in the RTP Payload Format
   MIME types registry.

9. References

9.1 Normative references

   [1]  Proposed SMPTE 421M, "VC-1 Compressed Video Bitstream Format
        and Decoding Process", www.smpte.org.
   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.
   [3]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", STD 64,
        RFC 3550, July 2003.
   [4]  Handley, M. and V. Jacobson, "SDP: Session Description
        Protocol", RFC 2327, April 1998.
   [5]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
        Session Description Protocol (SDP)", RFC 3264, June 2002.
   [6]  Josefsson, S., Ed., "The Base16, Base32, and Base64 Data
        Encodings", RFC 3548, July 2003.
   [7]  Casner, S. and P. Hoschka, "MIME Type Registration of RTP
        Payload Formats", RFC 3555, July 2003.

9.2 Informative references

   [8]  Srinivasan, S., Hsu, P., Holcomb, T., Mukerjee, K., Regunathan,
        S.L., Lin, B., Liang, J., Lee, M., and J. Ribas-Corbera,
        "Windows Media Video 9: overview and applications", Signal
        Processing: Image Communication, Volume 19, Issue 9, October
        2004.
   [9]  Ribas-Corbera, J., Chou, P.A., and S.L. Regunathan, "A
        generalized hypothetical reference decoder for H.264/AVC", IEEE
        Transactions on Circuits and Systems for Video Technology,
        August 2003.
   [10] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
        Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
        3711, March 2004.
   [11] Freed, N. and J. Klensin, "Media Type Specifications and
        Registration Procedures", Work in Progress, July 2005.
   [12] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
        Protocol (RTSP)", RFC 2326, April 1998.
   [13] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
        Protocol", RFC 2974, October 2000.

Author's Addresses

   Anders Klemets
   Microsoft Corp.
   1 Microsoft Way
   Redmond, WA 98052
   USA
   Email: anderskl@microsoft.com

Acknowledgements

   Thanks to Shankar Regunathan, Gary Sullivan, Regis Crinon, Magnus
   Westerlund and Colin Perkins for providing detailed feedback on this
   document.

IPR Notices

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Full Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.