Internet Engineering Task Force
Internet Draft                                            A. Klemets
Document: draft-ietf-avt-rtp-vc1-06.txt                    Microsoft
Expires: July 2006                                      January 2006

          RTP Payload Format for Video Codec 1 (VC-1)

Status of this Memo

By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This memo specifies an RTP payload format for encapsulating Video
Codec 1 (VC-1) compressed bit streams, as defined by the Society of
Motion Picture and Television Engineers (SMPTE) standard, SMPTE 421M.
SMPTE is the main standardizing body in the motion imaging industry
and the SMPTE 421M standard defines a compressed video bit stream
format and decoding process for television.

Table of Contents

1. Introduction
   1.1 Conventions used in this document
2. Definitions and abbreviations
3. Overview of VC-1
   3.1 VC-1 bit stream layering model
   3.2 Bit-stream Data Units in Advanced profile
   3.3 Decoder initialization parameters
   3.4 Ordering of frames
4. Encapsulation of VC-1 format bit streams in RTP
   4.1 Access Units
   4.2 Fragmentation of VC-1 frames
   4.3 Time stamp considerations
   4.4 Random Access Points
   4.5 Removal of HRD parameters
   4.6 Repeating the Sequence Layer header
   4.7 Signaling of media type parameters
   4.8 The "mode=1" media type parameter
   4.9 The "mode=3" media type parameter
5. RTP Payload Format syntax
   5.1 RTP header usage
   5.2 AU header syntax
   5.3 AU Control field syntax
6. RTP Payload format parameters
   6.1 Media type Registration
   6.2 Mapping of media type parameters to SDP
   6.3 Usage with the SDP Offer/Answer Model
   6.4 Usage in Declarative Session Descriptions
7. Security Considerations
8. Congestion Control
9. IANA Considerations
10. References
   10.1 Normative references
   10.2 Informative references

1. Introduction

This memo specifies an RTP payload format for the video coding
standard Video Codec 1, also known as VC-1. The specification for
the VC-1 bit stream format and decoding process is published by the
Society of Motion Picture and Television Engineers (SMPTE) as SMPTE
421M [1].

VC-1 has broad applicability, ranging from low bit rate Internet
streaming applications to HDTV broadcast and Digital Cinema
applications with nearly lossless coding. The overall performance of
VC-1 is such that bit rate savings of more than 50% are reported [9],
when compared against MPEG-2. See [9] for further details about how
VC-1 compares against other codecs, such as MPEG-4 and H.264/AVC.
(In [9], VC-1 is referred to by its earlier name, VC-9.)

VC-1 is widely used for downloading and streaming of movies on the
Internet, in the form of Windows Media Video 9 (WMV-9) [9], because
the WMV-9 codec is compliant with the VC-1 standard. VC-1 has also
recently been adopted as a mandatory compression format for the
high-definition DVD formats HD DVD and Blu-ray.

SMPTE 421M defines the VC-1 bit stream syntax and specifies
constraints that must be met by VC-1 conformant bit streams. SMPTE
421M also specifies the complete process required to decode the bit
stream. However, it does not specify the VC-1 compression algorithm,
thus allowing for different ways to implement a VC-1 encoder.

The VC-1 bit stream syntax has three profiles. Each profile has
specific bit stream syntax elements and algorithms associated with
it. Depending on the application in which VC-1 is used, some
profiles may be more suitable than others.
For example, Simple profile is designed for low bit rate Internet
streaming and for playback on devices that can only handle low
complexity decoding. Advanced profile is designed for broadcast
applications, such as digital TV, HD DVD or HDTV. Advanced profile
is the only VC-1 profile that supports interlaced video frames and
non-square pixels.

Section 2 defines the abbreviations used in this document. Section 3
provides a more detailed overview of VC-1. Sections 4 and 5 define
the RTP payload format for VC-1, and section 6 defines the media type
and SDP parameters for VC-1. See section 7 for security
considerations, and section 8 for congestion control requirements.

1.1 Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in BCP 14, RFC 2119 [2].

2. Definitions and abbreviations

This document uses the definitions in SMPTE 421M [1]. For
convenience, the following terms from SMPTE 421M are restated here:

B-picture: A picture that is coded using motion compensated
prediction from past and/or future reference fields or frames. A
B-picture cannot be used for predicting any other picture.

BI-picture: A B-picture that is coded using information only from
itself. A BI-picture cannot be used for predicting any other
picture.

Bit-stream data unit (BDU): A unit of the compressed data which may
be parsed (i.e., syntax decoded) independently of other information
at the same hierarchical level. A BDU can be, for example, a
sequence layer header, an entry-point header, a frame, or a slice.

Encapsulated BDU (EBDU): A BDU which has been encapsulated using the
encapsulation mechanism described in Annex E of SMPTE 421M [1], to
prevent emulation of the start code prefix in the bit stream.

Entry-point: A point in the bit stream that offers random access.

frame: A frame contains lines of spatial information of a video
signal. For progressive video, these lines contain samples starting
from one time instant and continuing through successive lines to the
bottom of the frame. For interlaced video, a frame consists of two
fields, a top field and a bottom field. One of these fields will
commence one field period later than the other.

interlace: The property of frames where alternating lines of the
frame represent different instances in time. In an interlaced frame,
one of the fields is meant to be displayed first.

I-picture: A picture coded using information only from itself.

level: A defined set of constraints on the values which may be taken
by the parameters (such as bit rate and buffer size) within a
particular profile. A profile may contain one or more levels.

P-picture: A picture that is coded using motion compensated
prediction from past reference fields or frames.

picture: For progressive video, a picture is identical to a frame,
while for interlaced video, a picture may refer to a frame, or the
top field or the bottom field of the frame depending on the context.

profile: A defined subset of the syntax of VC-1, with a specific set
of coding tools, algorithms, and syntax associated with it. There
are three VC-1 profiles: Simple, Main and Advanced.

progressive: The property of frames where all the samples of the
frame represent the same instance in time.

random access: A random access point in the bit stream is defined by
the following guarantee: If decoding begins at this point, all frames
needed for display after this point will have no decoding dependency
on any data preceding this point, and are also present in the
decoding sequence after this point. A random access point is also
called an entry-point.

sequence: A coded representation of a series of one or more pictures.
In VC-1 Advanced profile, a sequence consists of a series of one or
more entry-point segments, where each entry-point segment consists of
a series of one or more pictures, and where the first picture in each
entry-point segment provides random access. In VC-1 Simple and Main
profiles, the first picture in each sequence is an I-picture.

slice: A consecutive series of macroblock rows in a picture, which
are encoded as a single unit.

start codes (SC): 32-bit codes embedded in the coded bit stream that
are unique, and identify the beginning of a BDU. Start codes consist
of a unique three-byte Start Code Prefix (SCP), and a one-byte Start
Code Suffix (SCS).

3. Overview of VC-1

The VC-1 bit stream syntax consists of three profiles: Simple, Main,
and Advanced. Simple profile is designed for low bit rates and for
low complexity applications, such as playback of media on personal
digital assistants. The maximum bit rate supported by Simple profile
is 384 kbps. Main profile targets high bit rate applications,
such as streaming and TV over IP. Main profile supports B-pictures,
which provide improved compression efficiency at the cost of higher
complexity.

Certain features that can be used to achieve high compression
efficiency, such as non-square pixels and support for interlaced
pictures, are only included in Advanced profile. The maximum bit
rate supported by the Advanced profile is 135 Mbps, making it
suitable for nearly lossless encoding of HDTV signals.

Only Advanced profile supports carrying user-data (meta-data) in-band
with the compressed bit stream. The user-data can be used for closed
captioning support, for example.

Of the three profiles, only Advanced profile allows codec
configuration parameters, such as the picture aspect ratio, to be
changed through in-band signaling in the compressed bit stream.

For each of the profiles, a certain number of "levels" have been
defined. Unlike a "profile", which implies a certain set of features
or syntax elements, a "level" is a set of constraints on the values
of parameters in a profile, such as the bit rate or buffer size.
VC-1 Simple profile has two levels, Main profile has three, and
Advanced profile has five levels. See Annex D of SMPTE 421M [1] for
a detailed list of the profiles and levels.

3.1 VC-1 bit stream layering model

The VC-1 bit stream is defined as a hierarchy of layers. This is
conceptually similar to the notion of a protocol stack of networking
protocols. The outermost layer is called the sequence layer. The
other layers are entry-point, picture, slice, macroblock and block.

In Simple and Main profiles, a sequence in the sequence layer
consists of a series of one or more coded pictures. In Advanced
profile, a sequence consists of one or more entry-point segments,
where each entry-point segment consists of a series of one or more
pictures, and where the first picture in each entry-point segment
provides random access. A picture is decomposed into macroblocks. A
slice comprises one or more contiguous rows of macroblocks.

The entry-point and slice layers are only present in Advanced
profile. In Advanced profile, the start of each entry-point layer
segment indicates a random access point. In Simple and Main profiles
each I-picture is a random access point.

Each picture can be coded as an I-picture, P-picture, skipped
picture, BI-picture, or as a B-picture. These terms are defined in
section 2 of this document and in section 4.12 of SMPTE 421M [1].

3.2 Bit-stream Data Units in Advanced profile

In Advanced profile, each picture and slice is considered a
Bit-stream Data Unit (BDU). A BDU is always byte-aligned and is
defined as a unit that can be parsed (i.e., syntax decoded)
independently of other information in the same layer.

The beginning of a BDU is signaled by an identifier called Start Code
(SC). Sequence layer headers and entry-point headers are also BDUs
and thus can be easily identified by their Start Codes. See Annex E
of SMPTE 421M [1] for a complete list of Start Codes. Blocks and
macroblocks are not BDUs and thus do not have a Start Code and are
not necessarily byte-aligned.

The Start Code consists of four bytes. The first three bytes are
0x00, 0x00 and 0x01. The fourth byte is called the Start Code Suffix
(SCS) and it is used to indicate the type of BDU that follows the
Start Code. For example, the SCS of a sequence layer header (0x0F)
is different from the SCS of an entry-point header (0x0E). The Start
Code is always byte-aligned and is transmitted in network byte order.

To prevent accidental emulation of the Start Code in the coded bit
stream, SMPTE 421M defines an encapsulation mechanism that uses byte
stuffing. A BDU which has been encapsulated by this mechanism is
referred to as an Encapsulated BDU, or EBDU.

3.3 Decoder initialization parameters

In VC-1 Advanced profile, the sequence layer header contains
parameters that are necessary to initialize the VC-1 decoder.

The parameters apply to all entry-point segments until the next
occurrence of a sequence layer header in the coded bit stream.

The parameters in the sequence layer header include the Advanced
profile level, the maximum dimensions of the coded frames, the aspect
ratio, interlace information, the frame rate and up to 31 leaky
bucket parameter sets for the Hypothetical Reference Decoder (HRD).

Section 6.1 of SMPTE 421M [1] provides the formal specification of
the sequence layer header.

A sequence layer header is not defined for VC-1 Simple and Main
profiles. For these profiles, decoder initialization parameters MUST
be conveyed out-of-band. The decoder initialization parameters for
Simple and Main profiles include the maximum dimensions of the coded
frames, and a leaky bucket parameter set for the HRD. Section 4.7
specifies how the parameters are conveyed by this RTP payload format.

Each leaky bucket parameter set for the HRD specifies a peak
transmission bit rate and a decoder buffer capacity. The coded bit
stream is restricted by these parameters. The HRD model does not
mandate buffering by the decoder. Its purpose is to limit the
encoder's bit rate fluctuations according to a basic buffering model,
so that the resources necessary to decode the bit stream are
predictable. The HRD has a constant-delay mode and a variable-delay
mode. The constant-delay mode is appropriate for broadcast and
streaming applications, while the variable-delay mode is designed for
video conferencing applications.

Annex C of SMPTE 421M [1] specifies the usage of the hypothetical
reference decoder for VC-1 bit streams. A general description of the
theory of the HRD can be found in [10].

For Simple and Main profiles, the current buffer fullness value for
the HRD leaky bucket is signaled using the BF syntax element in the
picture header of I-pictures and BI-pictures.

For Advanced profile, the entry-point header specifies current buffer
fullness values for the leaky buckets in the HRD. The entry-point
header also specifies coding control parameters that are in effect
until the occurrence of the next entry-point header in the bit
stream. The concept of an entry-point layer applies only to VC-1
Advanced profile.
See Section 6.2 of SMPTE 421M [1] for the formal specification of
the entry-point header.

3.4 Ordering of frames

Frames are transmitted in the same order in which they are captured,
except if B-pictures or BI-pictures are present in the coded bit
stream. A BI-picture is a special kind of B-picture, and in the
remainder of this section the terms B-picture and B-frame also apply
to BI-pictures and BI-frames, respectively.

When B-pictures are present in the coded bit stream, the frames are
transmitted such that the frames that the B-pictures depend on are
transmitted first. This is referred to as the coded order of the
frames.

The rules for how a decoder converts frames from the coded order to
the display order are stated in section 5.4 of SMPTE 421M [1]. In
short, if B-pictures may be present in the coded bit stream, a
hypothetical decoder implementation needs to buffer one additional
decoded frame. When an I-frame or a P-frame is received, the frame
can be decoded immediately but it is not displayed until the next
I- or P-frame is received. However, B-frames are displayed
immediately.

Figure 1 illustrates the timing relationship between the capture of
frames, their coded order, and the display order of the decoded
frames, when B-pictures are present in the coded bit stream. The
figure shows that the display of frame P4 is delayed until frame P7
is received, while frames B2 and B3 are displayed immediately.

Capture:       |I0  P1  B2  B3  P4  B5  B6  P7  B8  B9  ...
               |
Coded order:   |        I0  P1  P4  B2  B3  P7  B5  B6  ...
               |
Display order: |            I0  P1  B2  B3  P4  B5  B6  ...
               |
               |+---+---+---+---+---+---+---+---+---+--> time
                0   1   2   3   4   5   6   7   8   9

Figure 1. Frame reordering when B-pictures are present.

If B-pictures are not present, the coded order and the display order
are identical, and frames can then be displayed without the
additional delay shown in Figure 1.

4. Encapsulation of VC-1 format bit streams in RTP

4.1 Access Units

Each RTP packet contains an integral number of application data units
(ADUs). For VC-1 format bit streams, an ADU is equivalent to one
Access Unit (AU). An Access Unit is defined as the AU header
(defined in section 5.2) followed by a variable length payload, with
the rules and constraints described in sections 4.1 and 4.2. Figure
2 shows the layout of an RTP packet with multiple AUs.

+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+
| RTP     | AU(1) | AU(2) |     | AU(n) |
| Header  |       |       |     |       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+

Figure 2. RTP packet structure.

Each Access Unit MUST start with the AU header defined in section
5.2. The AU payload MUST contain data belonging to exactly one VC-1
frame. This means that data from different VC-1 frames will always
be in different AUs; however, it is possible for a single VC-1 frame
to be fragmented across multiple AUs (see section 4.2).

In the case of interlaced video, a VC-1 frame consists of two fields
that may be coded as separate pictures. The two pictures still
belong to the same VC-1 frame.

The following rules apply to the contents of each AU payload when
VC-1 Advanced profile is used:

- The AU payload MUST contain VC-1 bit stream data in EBDU format
  (i.e., the bit stream must use the byte-stuffing encapsulation
  mode defined in Annex E of SMPTE 421M [1].)

- The AU payload MAY contain multiple EBDUs, e.g., a sequence layer
  header, an entry-point header, a frame (picture) header, a field
  header, and multiple slices and the associated user-data.
  (However, all slices and their corresponding macroblocks MUST
  belong to the same video frame.)

- The AU payload MUST start at an EBDU boundary, except when the AU
  payload contains a fragmented frame, in which case the rules in
  section 4.2 apply.

When VC-1 Simple or Main profiles are used, the AU payload MUST start
at the beginning of a frame, except when the AU payload contains a
fragmented frame. Section 4.2 describes how to handle fragmented
frames.

Access Units MUST be byte-aligned. If the data in an AU (EBDUs in
the case of Advanced profile and frame in the case of Simple and
Main) does not end at an octet boundary, up to 7 zero-valued padding
bits MUST be added to achieve octet-alignment.

4.2 Fragmentation of VC-1 frames

Each AU payload SHOULD contain a complete VC-1 frame. However, if
this would cause the RTP packet to exceed the MTU size, the frame
SHOULD be fragmented into multiple AUs to avoid IP-level
fragmentation. When an AU contains a fragmented frame, this MUST be
indicated by setting the FRAG field in the AU header as defined in
section 5.3.

AU payloads that do not contain a fragmented frame, or that contain
the first fragment of a frame, MUST start at an EBDU boundary if
Advanced profile is used. In this case, for Simple and Main
profiles, the AU payload MUST start at the beginning of a frame.

If Advanced profile is used, AU payloads that contain a fragment of a
frame other than the first fragment, SHOULD start at an EBDU
boundary, such as at the start of a slice.

However, slices are only defined for Advanced profile, and are not
always used. Blocks and macroblocks are not BDUs (have no Start
Code) and are not byte-aligned. Therefore, it may not always be
possible to continue a fragmented frame at an EBDU boundary. One can
determine if an AU payload starts at an EBDU boundary by inspecting
the first three bytes of the AU payload. The AU payload starts at an
EBDU boundary if the first three bytes are identical to the Start
Code Prefix (i.e., 0x00, 0x00, 0x01.)
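
The boundary check described above can be sketched in Python as
follows. This is a minimal illustration for Advanced profile and is
not part of the payload format. The function names are hypothetical,
and only the two Start Code Suffix values quoted in section 3.2 are
recognized; Annex E of SMPTE 421M [1] has the complete list.

```python
START_CODE_PREFIX = b"\x00\x00\x01"

# Only the two suffix values quoted in section 3.2 are listed here.
# Any other suffix is reported generically (assumption of this sketch).
KNOWN_SUFFIXES = {
    0x0F: "sequence layer header",
    0x0E: "entry-point header",
}


def starts_at_ebdu_boundary(au_payload: bytes) -> bool:
    """Return True if the AU payload begins with the Start Code Prefix."""
    return au_payload[:3] == START_CODE_PREFIX


def describe_first_ebdu(au_payload: bytes):
    """Name the first EBDU in the payload, or return None when the
    payload does not start at an EBDU boundary (e.g., it continues a
    fragmented frame) or is too short to hold a Start Code Suffix."""
    if not starts_at_ebdu_boundary(au_payload) or len(au_payload) < 4:
        return None
    return KNOWN_SUFFIXES.get(au_payload[3], "other BDU type")
```

A receiver could call starts_at_ebdu_boundary() on an AU that carries
a fragment other than the first one to decide whether parsing can
resume at that fragment.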

In the case of Simple and Main profiles, since the blocks and
macroblocks are not byte-aligned, the fragmentation boundary may be
chosen arbitrarily.

If an RTP packet contains an AU with the last fragment of a frame,
additional AUs SHOULD NOT be included in the RTP packet.

If the PTS Delta field in the AU header is present, each fragment of
a frame MUST have the same presentation time. If the DTS Delta field
in the AU header is present, each fragment of a frame MUST have the
same decode time.

4.3 Time stamp considerations

VC-1 video frames MUST be transmitted in the coded order. Coded
order implies that no frames are dependent on subsequent frames, as
discussed in section 3.4. When a video frame consists of a single
picture, the presentation time of the frame is identical to the
presentation time of the picture. When the VC-1 interlace coding
mode is used, frames may contain two pictures, one for each field.
In that case, the presentation time of a frame is the presentation
time of the field that is displayed first.

The RTP timestamp field MUST be set to the presentation time of the
video frame contained in the first AU in the RTP packet. The
presentation time can be used as the timestamp field in the RTP
header because it differs from the sampling instant of the frame only
by an arbitrary constant offset.

If the video frame in an AU has a presentation time that differs from
the RTP timestamp field, then the presentation time MUST be specified
using the PTS Delta field in the AU header. Since the RTP timestamp
field must be identical to the presentation time of the first video
frame, this can only happen if an RTP packet contains multiple AUs.
The syntax of the PTS Delta field is defined in section 5.2.

The decode time of a VC-1 frame is always monotonically increasing
when the video frames are transmitted in the coded order.
If neither B- nor BI-pictures are present in the coded bit stream,
then the decode time of a frame SHALL be equal to the presentation
time of the frame. A BI-picture is a special kind of B-picture, and
in the remainder of this section the terms B-picture and B-frame also
apply to BI-pictures and BI-frames, respectively.

If B-pictures may be present in the coded bit stream, then the decode
times of frames are determined as follows:

- B-frames:
  The decode time SHALL be equal to the presentation time of the
  B-frame.

- First non-B frame in the coded order:
  The decode time SHALL be at least one frame period less than the
  decode time of the next frame in the coded order. A frame period
  is defined as the inverse of the frame rate used in the coded bit
  stream (e.g., 100 milliseconds if the frame rate is 10 frames per
  second.) For bit streams with a variable frame rate, the maximum
  frame rate SHALL determine the frame period. If the maximum frame
  rate is not specified, the maximum frame rate allowed by the
  profile and level SHALL be used.

- Non-B frames (other than the first frame in the coded order):
  The decode time SHALL be equal to the presentation time of the
  previous non-B frame in the coded order.

As an example, consider Figure 1 in section 3.4. To determine the
decode time of the first frame, I0, one must first determine the
decode time of the next frame, P1. Because P1 is a non-B frame, its
decode time is equal to the presentation time of I0, which is 3 time
units. Thus, the decode time of I0 must be at least one frame period
less than 3. In this example, the frame period is 1, because one
frame is displayed every time unit. Consequently, the decode time of
I0 is chosen as 2 time units.
The decode time of the third frame in the coded order, P4, is 4,
because it must be equal to the presentation time of the previous
non-B frame in the coded order, P1.

On the other hand, the decode time of B-frame B2 is 5 time units,
which is identical to its presentation time.

If the decode time of a video frame differs from its presentation
time, then the decode time MUST be specified using the DTS Delta
field in the AU header. The syntax of the DTS Delta field is defined
in section 5.2.

Receivers are not required to use the DTS Delta field. However,
possible uses include buffer management and pacing of frames prior to
decoding. If RTP packets are lost, it is possible to use the DTS
Delta field to determine if the sequence of lost RTP packets
contained reference frames or only B-frames. This can be done by
comparing the decode and presentation times of the first frame
received after the lost sequence against the presentation time of the
last reference frame received prior to the lost sequence.

Knowing if the stream will contain B-pictures may help the receiver
allocate resources more efficiently and can reduce delay, as an
absence of B-pictures in the stream implies that no reordering
of frames will be needed between the decoding process and the display
of the decoded frames. This may be important for interactive
applications.

The receiver SHALL assume that the coded bit stream may contain
B-pictures in the following cases:

- Advanced profile: If the value of the "bpic" media type parameter
  defined in section 6.1 is 1, or if the "bpic" parameter is not
  specified.

- Main profile: If the MAXBFRAMES field in the STRUCT_C decoder
  initialization parameter has a non-zero value. STRUCT_C is
  conveyed in the "config" media type parameter, which is defined in
  section 6.1.

Simple profile does not use B-pictures.
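
As an illustration of the decode time rules in this section, the
following Python sketch computes decode times for a list of frames
given in coded order when B-pictures may be present. It is a sketch
under stated assumptions, not a normative algorithm. The Frame type
and the function name are hypothetical, the first frame is assigned
the largest decode time the rule allows (exactly one frame period
before the next frame), and the presentation time of P7 (10) is
inferred from the pattern in Figure 1 rather than shown in it.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    name: str
    is_b: bool  # True for B-frames and BI-frames
    pts: int    # presentation time, in the time units of Figure 1


def decode_times(coded_order, frame_period):
    """Return one decode time per frame, for frames in coded order."""
    if not coded_order:
        return []
    if coded_order[0].is_b:
        # Assumption: a coded sequence always starts with a non-B frame.
        raise ValueError("first frame in coded order must be a non-B frame")

    dts = [None] * len(coded_order)
    prev_non_b_pts = coded_order[0].pts
    for i, frame in enumerate(coded_order[1:], start=1):
        if frame.is_b:
            # B-frames: decode time equals presentation time.
            dts[i] = frame.pts
        else:
            # Later non-B frames: presentation time of the previous
            # non-B frame in the coded order.
            dts[i] = prev_non_b_pts
            prev_non_b_pts = frame.pts

    # First non-B frame: one frame period before the decode time of the
    # next frame. With a single frame there is no next frame, so this
    # sketch falls back to the frame's own presentation time.
    next_dts = dts[1] if len(coded_order) > 1 else coded_order[0].pts
    dts[0] = next_dts - frame_period
    return dts


# Coded order from Figure 1; P7's presentation time is inferred.
FIGURE_1 = [
    Frame("I0", False, 3), Frame("P1", False, 4), Frame("P4", False, 7),
    Frame("B2", True, 5), Frame("B3", True, 6), Frame("P7", False, 10),
    Frame("B5", True, 8), Frame("B6", True, 9),
]
```

Calling decode_times(FIGURE_1, 1) yields the decode times 2 through 9
in coded order, which agrees with the worked example: 2 for I0, 4 for
P4, and 5 for B2.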

4.4 Random Access Points

   The entry-point header contains information that is needed by the
   decoder to decode the frames in that entry-point segment. This means
   that in the event of lost RTP packets the decoder may be unable to
   decode frames until the next entry-point header is received.

   The first frame after an entry-point header is a random access point
   into the coded bit stream. Simple and Main profiles do not have
   entry-point headers, so for those profiles each I-picture is a random
   access point.

   To allow the RTP receiver to detect that an RTP packet which was lost
   contained a random access point, this RTP payload format defines a
   field called "RA Count". This field is present in every AU, and its
   value is incremented (modulo 256) for every random access point. For
   additional details, see the definition of "RA Count" in section 5.2.

   To make it easy to determine if an AU contains a random access point,
   this RTP payload format also defines a bit called the "RA" flag in
   the AU Control field. This bit is set to 1 only on those AUs that
   contain a random access point. The RA bit is defined in section 5.3.

4.5 Removal of HRD parameters

   The sequence layer header of Advanced profile may include up to 31
   leaky bucket parameter sets for the Hypothetical Reference Decoder
   (HRD). Each leaky bucket parameter set specifies a possible peak
   transmission bit rate (HRD_RATE) and a decoder buffer capacity
   (HRD_BUFFER). (See section 3.3 for additional discussion about the
   HRD.)

   If the actual peak transmission rate is known by the RTP sender, the
   RTP sender MAY remove all leaky bucket parameter sets except for the
   one corresponding to the actual peak transmission rate.

   For each leaky bucket parameter set in the sequence layer header,
   there is also a parameter in the entry-point header that specifies
   the initial fullness (HRD_FULL) of the leaky bucket.

   If the RTP sender has removed any leaky bucket parameter sets from
   the sequence layer header, then for any removed leaky bucket
   parameter set, it MUST also remove the corresponding HRD_FULL
   parameter in the entry-point header.

   Removing leaky bucket parameter sets, as described above, may
   significantly reduce the size of the sequence layer headers and the
   entry-point headers.

4.6 Repeating the Sequence Layer header

   To improve robustness against loss of RTP packets, it is RECOMMENDED
   that if the sequence layer header changes, it should be repeated
   frequently in the bit stream. In this case, it is RECOMMENDED that
   the number of leaky bucket parameters in the sequence layer header
   and the entry-point headers be reduced to one, as described in
   section 4.5. This will help reduce the overhead caused by repeating
   the sequence layer header.

   Any data in the VC-1 bit stream, including repeated copies of the
   sequence header itself, must be accounted for when computing the
   leaky bucket parameter for the HRD. (See section 3.3 for a
   discussion about the HRD.)

   If the value of TFCNTRFLAG in the sequence layer header is 1, each
   picture header contains a frame counter field (TFCNTR). Each time
   the sequence layer header is inserted in the bit stream, the value of
   this counter MUST be reset.

   To allow the RTP receiver to detect that an RTP packet which was lost
   contained a new sequence layer header, the AU Control field defines a
   bit called the "SL" flag. This bit is toggled when a sequence layer
   header is transmitted, but only if that header is different from the
   most recently transmitted sequence layer header. The SL bit is
   defined in section 5.3.
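
   The loss detection described in sections 4.4 and 4.6 might be
   sketched as follows. This is a non-normative illustration: the
   function name gap_summary and the dictionary keys are hypothetical,
   and the sketch assumes that an AU whose RA bit is 1 already carries
   the incremented RA Count value. Because SL is a single bit, an even
   number of header changes inside the gap cannot be detected this way.

```python
def gap_summary(before, after):
    """Compare AU header fields on either side of a run of lost packets.

    before/after: dicts with keys 'ra_count' (0..255), 'ra' (0/1) and
    'sl' (0/1), taken from the last AU received before the gap and the
    first AU received after it (hypothetical layout).
    Returns (lost_random_access_points, sl_toggled).
    """
    # RA Count is a modulo-256 counter, so subtract modulo 256.
    advanced = (after["ra_count"] - before["ra_count"]) % 256
    # Assumption: a received random access point accounts for one of
    # the increments itself, so it is not counted as lost.
    own = 1 if after["ra"] else 0
    lost_ra = max(advanced - own, 0)
    # A toggled SL bit means a different sequence layer header was sent
    # after 'before'; it may be in the gap or in 'after' itself.
    sl_toggled = after["sl"] != before["sl"]
    return lost_ra, sl_toggled


wrapped = gap_summary({"ra_count": 254, "ra": 0, "sl": 0},
                      {"ra_count": 1, "ra": 1, "sl": 1})
quiet = gap_summary({"ra_count": 5, "ra": 0, "sl": 1},
                    {"ra_count": 5, "ra": 0, "sl": 1})
```

   A receiver that finds a non-zero count of lost random access points
   would typically wait for the next AU with the RA bit set before
   resuming decoding.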

4.7 Signaling of media type parameters

   When this RTP payload format is used with SDP, the decoder
   initialization parameters described in section 3.3 MUST be signaled
   in SDP using the media type parameters specified in section 6.1.
   Section 6.2 specifies how to map the media type parameters to SDP
   [4], section 6.3 defines rules specific to the SDP Offer/Answer
   model, and section 6.4 defines rules for when SDP is used in a
   declarative style.

   When Simple or Main profiles are used, it is not possible to change
   the decoder initialization parameters through the coded bit stream.
   Any changes to the decoder initialization parameters would have to be
   done through out-of-band means, e.g., by a SIP [14] re-invite or
   similar means that convey an updated session description.

   When Advanced profile is used, the decoder initialization parameters
   MAY be changed by inserting a new sequence layer header or an
   entry-point header in the coded bit stream.

   The sequence layer header specifies the VC-1 level, the maximum size
   of the coded frames and optionally also the maximum frame rate. The
   media type parameters "level", "width", "height" and "framerate"
   specify upper limits for these parameters. Thus, the sequence layer
   header MAY specify values that are lower than the values of the media
   type parameters "level", "width", "height" or "framerate", but the
   sequence layer header MUST NOT exceed the values of any of these
   media type parameters.

4.8 The "mode=1" media type parameter

   In certain applications using Advanced profile, the sequence layer
   header never changes. This MAY be signaled with the media type
   parameter "mode=1". (The "mode" parameter is defined in section 6.1.)
   The "mode=1" parameter serves as a "hint" to the RTP receiver that
   all sequence layer headers in the bit stream will be identical.
   If "mode=1" is signaled and a sequence layer header is present in the
   coded bit stream, then it MUST be identical to the sequence layer
   header specified by the "config" media type parameter.

   Since the sequence layer header never changes in "mode=1", the RTP
   sender MAY remove it from the bit stream. Note, however, that if the
   value of TFCNTRFLAG in the sequence layer header is 1, each picture
   header contains a frame counter field (TFCNTR). This field is reset
   each time the sequence layer header occurs in the bit stream. If the
   RTP sender chooses to remove the sequence layer header, then it MUST
   ensure that the resulting bit stream is still compliant with the VC-1
   specification (e.g., by adjusting the TFCNTR field, if necessary).

4.9 The "mode=3" media type parameter

   In certain applications using Advanced profile, both the sequence
   layer header and the entry-point header never change. This MAY be
   signaled with the media type parameter "mode=3". The same rules
   apply to "mode=3" as for "mode=1", described in section 4.8.
   Additionally, if "mode=3" is signaled, then the RTP sender MAY
   "compress" the coded bit stream by not including sequence layer
   headers and entry-point headers in the RTP packets.

   The RTP receiver MUST "decompress" the coded bit stream by
   re-inserting the entry-point headers prior to delivering the coded
   bit stream to the VC-1 decoder. The sequence layer header does not
   need to be decompressed by the receiver, since it never changes.

   If "mode=3" is signaled and the RTP receiver receives a complete AU
   or the first fragment of an AU, and the RA bit is set to 1 but the AU
   does not begin with an entry-point header, then this indicates that
   the entry-point header has been "compressed". In that case, the RTP
   receiver MUST insert an entry-point header at the beginning of the
   AU.
   When inserting the entry-point header, the RTP receiver MUST use
   the one that was specified by the "config" media type parameter.

5. RTP Payload Format syntax

5.1 RTP header usage

   The format of the RTP header is specified in RFC 3550 [3] and is
   reprinted in Figure 3 for convenience.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 3. RTP header according to RFC 3550

   The fields of the fixed RTP header have their usual meaning, which is
   defined in RFC 3550 and by the RTP profile in use, with the following
   additional notes:

   Marker bit (M): 1 bit
         This bit is set to 1 if the RTP packet contains an Access
         Unit containing a complete VC-1 frame, or the last fragment
         of a VC-1 frame.

   Payload type (PT): 7 bits
         This document does not assign an RTP payload type for this
         RTP payload format. The assignment of a payload type has to
         be performed either through the RTP profile used or in a
         dynamic way.

   Sequence Number: 16 bits
         The RTP receiver can use the sequence number field to recover
         the coded order of the VC-1 frames. (A typical VC-1 decoder
         will require the VC-1 frames to be delivered in coded order.)
         When VC-1 frames have been fragmented across RTP packets, the
         RTP receiver can use the sequence number field to ensure that
         no fragment is missing.

   Timestamp: 32 bits
         The RTP timestamp is set to the presentation time of the VC-1
         frame in the first Access Unit.
         A clock rate of 90 kHz MUST be used.

5.2 AU header syntax

   The Access Unit header consists of a one-byte AU Control field, the
   RA Count field and 3 optional fields. All fields MUST be written in
   network byte order. The structure of the AU header is illustrated in
   Figure 4.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |AU     | RA    |  AUP  | PTS   | DTS   |
   |Control| Count |  Len  | Delta | Delta |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 4. Structure of AU header.

   AU Control: 8 bits
         The usage of the AU Control field is defined in section 5.3.

   RA Count: 8 bits
         Random Access Point Counter. This field is a binary modulo
         256 counter. The value of this field MUST be incremented by
         1 each time an AU is transmitted where the RA bit in the AU
         Control field is set to 1. The initial value of this field
         is undefined and MAY be chosen randomly.

   AUP Len: 16 bits
         Access Unit Payload Length. Specifies the size, in bytes, of
         the payload of the Access Unit. The field does not include
         the size of the AU header itself. The field MUST be included
         in each AU header in an RTP packet, except for the last AU
         header in the packet. If this field is not included, the
         payload of the Access Unit SHALL be assumed to extend to the
         end of the RTP payload.

   PTS Delta: 32 bits
         Presentation time delta. Specifies the presentation time of
         the frame as a 2's complement offset (delta) from the
         timestamp field in the RTP header of this RTP packet. The
         PTS Delta field MUST use the same clock rate as the timestamp
         field in the RTP header.
         This field SHOULD NOT be included in the first AU header in
         the RTP packet, because the RTP timestamp field specifies the
         presentation time of the frame in the first AU.
         If this field is not included, the presentation time of the
         frame SHALL be assumed to be specified by the timestamp field
         in the RTP header.

   DTS Delta: 32 bits
         Decode time delta. Specifies the decode time of the frame as
         a 2's complement offset (delta) between the presentation time
         and the decode time. Note that if the presentation time is
         larger than the decode time, this results in a value for the
         DTS Delta field that is greater than zero. The DTS Delta
         field MUST use the same clock rate as the timestamp field in
         the RTP header. If this field is not included, the decode
         time of the frame SHALL be assumed to be identical to the
         presentation time of the frame.

5.3 AU Control field syntax

   The structure of the 8-bit AU Control field is shown in Figure 5.

     0    1    2    3    4    5    6    7
   +----+----+----+----+----+----+----+----+
   |  FRAG   | RA | SL | LP | PT | DT | R  |
   +----+----+----+----+----+----+----+----+

      Figure 5. Syntax of AU Control field.

   FRAG: 2 bits
         Fragmentation Information. This field indicates if the AU
         payload contains a complete frame or a fragment of a frame.
         It MUST be set as follows:
         0: The AU payload contains a fragment of a frame other than
            the first or last fragment.
         1: The AU payload contains the first fragment of a frame.
         2: The AU payload contains the last fragment of a frame.
         3: The AU payload contains a complete frame (not fragmented).

   RA: 1 bit
         Random Access Point indicator. This bit MUST be set to 1 if
         the AU contains a frame that is a random access point. In
         the case of Simple and Main profiles, any I-picture is a
         random access point.
         In the case of Advanced profile, the first frame after an
         entry-point header is a random access point.
         If entry-point headers are not transmitted at every random
         access point, this MUST be indicated using the media type
         parameter "mode=3".
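
   As a non-normative illustration, an AU header could be parsed as
   sketched below, using the field order of Figure 4 and the bit
   positions of Figure 5 (bit 0 is the most significant bit). The
   function name parse_au_header and the returned dictionary keys are
   hypothetical. The optional fields are read only when the LP, PT and
   DT bits of the AU Control field indicate that they are present.

```python
import struct


def parse_au_header(buf, offset=0):
    """Parse one AU header from buf starting at offset.

    Returns (fields, next_offset); next_offset is where the AU payload
    starts. Optional fields that are absent are reported as None.
    """
    control, ra_count = struct.unpack_from("!BB", buf, offset)
    pos = offset + 2
    fields = {
        "frag": (control >> 6) & 0x3,  # bits 0-1
        "ra": (control >> 5) & 0x1,    # bit 2
        "sl": (control >> 4) & 0x1,    # bit 3
        "ra_count": ra_count,
        "aup_len": None,
        "pts_delta": None,
        "dts_delta": None,
    }
    if (control >> 3) & 0x1:           # LP: AUP Len present (16 bits)
        (fields["aup_len"],) = struct.unpack_from("!H", buf, pos)
        pos += 2
    if (control >> 2) & 0x1:           # PT: PTS Delta present (signed 32 bits)
        (fields["pts_delta"],) = struct.unpack_from("!i", buf, pos)
        pos += 4
    if (control >> 1) & 0x1:           # DT: DTS Delta present (signed 32 bits)
        (fields["dts_delta"],) = struct.unpack_from("!i", buf, pos)
        pos += 4
    # Bit 7 (R) is reserved and ignored.
    return fields, pos


# FRAG=3, RA=1, SL=0, LP=1, PT=0, DT=1, R=0 gives control byte 0xEA.
sample = bytes([0xEA, 7]) + struct.pack("!H", 1200) + struct.pack("!i", 3000)
fields, payload_start = parse_au_header(sample)
```

   Given the RTP timestamp, a receiver could then derive the
   presentation time as the timestamp plus PTS Delta (when present) and
   the decode time as the presentation time minus DTS Delta (when
   present), following the definitions in section 5.2.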

   SL: 1 bit
         Sequence Layer Counter. This bit MUST be toggled, i.e.,
         changed from 0 to 1 or from 1 to 0, if the AU contains a
         sequence layer header and if it is different from the most
         recently transmitted sequence layer header. Otherwise, the
         value of this bit must be identical to the value of the SL
         bit in the previous AU.
         The initial value of this bit is undefined and MAY be chosen
         randomly.
         The bit MUST be 0 for Simple and Main profile bit streams or
         if the sequence layer header never changes.

   LP: 1 bit
         Length Present. This bit MUST be set to 1 if the AU header
         includes the AUP Len field.

   PT: 1 bit
         PTS Delta Present. This bit MUST be set to 1 if the AU
         header includes the PTS Delta field.

   DT: 1 bit
         DTS Delta Present. This bit MUST be set to 1 if the AU
         header includes the DTS Delta field.

   R: 1 bit
         Reserved. This bit MUST be set to 0 and MUST be ignored by
         receivers.

6. RTP Payload format parameters

6.1 Media type Registration

   This registration uses the template defined in RFC 4288 [7] and
   follows RFC 3555 [8].

   Type name: video

   Subtype name: vc1

   Required parameters:

      profile:
         The value is an integer identifying the VC-1 profile. The
         following values are defined:
            0: Simple profile.
            1: Main profile.
            3: Advanced profile.

         If the profile parameter is used to indicate properties of a
         coded bit stream, it indicates the VC-1 profile that a
         decoder has to support when it decodes the bit stream.

         If the profile parameter is used for capability exchange or
         in a session setup procedure, it indicates the VC-1 profile
         that the codec supports.

      level:
         The value is an integer specifying the level of the VC-1
         profile.

         For Advanced profile, valid values are 0 to 4, which
         correspond to levels L0 to L4, respectively.
         For Simple and Main profiles, the following values are
         defined:
            1: Low Level
            2: Medium Level
            3: High Level (only valid for Main profile)

         If the level parameter is used to indicate properties of a
         coded bit stream, it indicates the highest level of the VC-1
         profile that a decoder has to support when it decodes the bit
         stream. Note that support for a level implies support for
         all numerically lower levels of the given profile.

         If the level parameter is used for capability exchange or in
         a session setup procedure, it indicates the highest level of
         the VC-1 profile that the codec supports. See section 6.3 of
         RFC XXXX for specific rules for how this parameter is used
         with the SDP Offer/Answer model.

   Optional parameters:

      config:
         The value is a base16 [6] (hexadecimal) representation of an
         octet string that expresses the decoder initialization
         parameters. Decoder initialization parameters are mapped
         onto the base16 octet string on an MSB-first basis. The
         first bit of the decoder initialization parameters MUST be
         located at the MSB of the first octet. If the decoder
         initialization parameters are not a multiple of 8 bits, up to
         7 zero-valued padding bits MUST be added in the last octet to
         achieve octet alignment.

         For Simple and Main profiles, the decoder initialization
         parameters are STRUCT_C, as defined in Annex J of SMPTE 421M
         [1].

         For Advanced profile, the decoder initialization parameters
         are a sequence layer header directly followed by an
         entry-point header. The two headers MUST be in EBDU format,
         meaning that they must include their Start Codes and must use
         the encapsulation method defined in Annex E of SMPTE 421M
         [1].

      width:
         The value is an integer greater than zero, specifying the
         maximum horizontal size of the coded frames, in luma samples
         (pixels in the luma picture).
         For Simple and Main profiles, the value SHALL be identical to
         the actual horizontal size of the coded frames.
         For Advanced profile, the value SHALL be greater than, or
         equal to, the largest horizontal size of the coded frames.

         If this parameter is not specified, it defaults to the
         maximum horizontal size allowed by the specified profile and
         level.

      height:
         The value is an integer greater than zero, specifying the
         maximum vertical size of the coded frames, in luma samples
         (pixels in a progressively coded luma picture).

         For Simple and Main profiles, the value SHALL be identical to
         the actual vertical size of the coded frames.
         For Advanced profile, the value SHALL be greater than, or
         equal to, the largest vertical size of the coded frames.

         If this parameter is not specified, it defaults to the
         maximum vertical size allowed by the specified profile and
         level.

      bitrate:
         The value is an integer greater than zero, specifying the
         peak transmission rate of the coded bit stream in bits per
         second. The number does not include the overhead caused by
         RTP encapsulation, i.e., it does not include the AU headers,
         or any of the RTP, UDP or IP headers.

         If this parameter is not specified, it defaults to the
         maximum bit rate allowed by the specified profile and level.
         (See the values for "RMax" in Annex D of SMPTE 421M [1].)

      buffer:
         The value is an integer specifying the leaky bucket size, B,
         in milliseconds, required to contain a stream transmitted at
         the transmission rate specified by the bitrate parameter.
         This parameter is defined in the hypothetical reference
         decoder model for VC-1, in Annex C of SMPTE 421M [1].

         Note that this parameter relates to the codec bit stream
         only, and does not account for any buffering time that may be
         required to compensate for jitter in the network.

         If this parameter is not specified, it defaults to the
         maximum buffer size allowed by the specified profile and
         level. (See the values for "BMax" and "RMax" in Annex D of
         SMPTE 421M [1].)

      framerate:
         The value is an integer greater than zero, specifying the
         maximum number of frames per second in the coded bit stream,
         multiplied by 1000 and rounded to the nearest integer value.
         For example, 30000/1001 (approximately 29.97) frames per
         second is represented as 29970.

         This parameter can be used to control resource allocation at
         the receiver. For example, a receiver may choose to perform
         additional post-processing on decoded frames only if the
         frame rate is expected to be low. The parameter MUST NOT be
         used for pacing of the rendering process, since the actual
         frame rate may differ from the specified value.

         If the parameter is not specified, it defaults to the maximum
         frame rate allowed by the specified profile and level.

      bpic:
         This parameter signals that B- and BI-pictures may be present
         when Advanced profile is used. If this parameter is present,
         and B- or BI-pictures may be present in the coded bit stream,
         this parameter MUST be equal to 1.
         A value of 0 indicates that B- and BI-pictures SHALL NOT be
         present in the coded bit stream, even if the sequence layer
         header changes. It is RECOMMENDED to include this parameter,
         with a value of 0, if neither B- nor BI-pictures are included
         in the coded bit stream.

         This parameter MUST NOT be used with Simple and Main
         profiles. (For Main profile, the presence of B- and
         BI-pictures is indicated by the MAXBFRAMES field in the
         STRUCT_C decoder initialization parameter.)

         For Advanced profile, if this parameter is not specified, a
         value of 1 SHALL be assumed.

      mode:
         The value is an integer specifying the use of the sequence
         layer header and the entry-point header. This parameter is
         only defined for Advanced profile. The following values are
         defined:
         0: Both the sequence layer header and the entry-point header
            may change, and changed headers will be included in the
            RTP packets.
         1: The sequence layer header specified in the config
            parameter never changes. The rules in section 4.8 of RFC
            XXXX MUST be followed.
         3: The sequence layer header and the entry-point header
            specified in the config parameter never change. The rules
            in section 4.9 of RFC XXXX MUST be followed.

         If the mode parameter is not specified, a value of 0 SHALL be
         assumed. The mode parameter SHOULD be specified if modes 1
         or 3 apply to the VC-1 bit stream.

      max-width, max-height, max-bitrate, max-buffer, max-framerate:
         These parameters are defined for use in a capability exchange
         procedure. The parameters do not signal properties of the
         coded bit stream, but rather upper limits or preferred values
         for the "width", "height", "bitrate", "buffer" and
         "framerate" parameters. Section 6.3 of RFC XXXX provides
         specific rules for how these parameters are used with the SDP
         Offer/Answer model.

         Receivers that signal support for a given profile and level
         MUST support the maximum values for these parameters for that
         profile and level. For example, a receiver that indicates
         support for Main profile, Low level, must support a width of
         352 luma samples and a height of 288 luma samples, even if
         this requires scaling the image to fit the resolution of a
         smaller display device.

         A receiver MAY use any of the max-width, max-height,
         max-bitrate, max-buffer and max-framerate parameters to
         indicate preferred capabilities.
         For example, a receiver may choose to specify values for
         max-width and max-height that match the resolution of its
         display device, since a bit stream encoded using those
         parameters would not need to be rescaled.

         If any of the max-width, max-height, max-bitrate, max-buffer
         and max-framerate parameters signal a capability that is less
         than the required capabilities of the signaled profile and
         level, then the parameter SHALL be interpreted as a preferred
         value for that capability.

         Any of the parameters MAY also be used to signal capabilities
         that exceed the required capabilities of the signaled profile
         and level. In that case, the parameter SHALL be interpreted
         as the maximum value that can be supported for that
         capability.

         When more than one parameter from the set (max-width,
         max-height, max-bitrate, max-buffer and max-framerate) is
         present, all signaled capabilities MUST be supported
         simultaneously.

         A sender or receiver MUST NOT use these parameters to signal
         capabilities that meet the requirements of a higher level of
         the VC-1 profile than the one specified in the "level"
         parameter, if the sender or receiver can support all the
         properties of the higher level, except if specifying a higher
         level is not allowed due to other restrictions. (As an
         example of such a restriction, in the SDP Offer/Answer model,
         the value of the level parameter that can be used in an
         Answer is limited by what was specified in the Offer.)

      max-width:
         The value is an integer greater than zero, specifying a
         horizontal size for the coded frames, in luma samples (pixels
         in the luma picture). If the value is less than the maximum
         horizontal size allowed by the profile and level, then the
         value specifies the preferred horizontal size. Otherwise, it
         specifies the maximum horizontal size that is supported.

         If this parameter is not specified, it defaults to the
         maximum horizontal size allowed by the specified profile and
         level.

      max-height:
         The value is an integer greater than zero, specifying a
         vertical size for the coded frames, in luma samples (pixels
         in a progressively coded luma picture). If the value is less
         than the maximum vertical size allowed by the profile and
         level, then the value specifies the preferred vertical size.
         Otherwise, it specifies the maximum vertical size that is
         supported.

         If this parameter is not specified, it defaults to the
         maximum vertical size allowed by the specified profile and
         level.

      max-bitrate:
         The value is an integer greater than zero, specifying a peak
         transmission rate for the coded bit stream in bits per
         second. The number does not include the overhead caused by
         RTP encapsulation, i.e., it does not include the AU headers,
         or any of the RTP, UDP or IP headers.

         If the value is less than the maximum bit rate allowed by the
         profile and level, then the value specifies the preferred bit
         rate. Otherwise, it specifies the maximum bit rate that is
         supported.

         If this parameter is not specified, it defaults to the
         maximum bit rate allowed by the specified profile and level.
         (See the values for "RMax" in Annex D of SMPTE 421M [1].)

      max-buffer:
         The value is an integer specifying a leaky bucket size, B, in
         milliseconds, required to contain a stream transmitted at the
         transmission rate specified by the max-bitrate parameter.
         This parameter is defined in the hypothetical reference
         decoder model for VC-1, in Annex C of SMPTE 421M [1].

         Note that this parameter relates to the codec bit stream
         only, and does not account for any buffering time that may be
         required to compensate for jitter in the network.

         If the value is less than the maximum leaky bucket size
         allowed by the max-bitrate parameter and the profile and
         level, then the value specifies the preferred leaky bucket
         size. Otherwise, it specifies the maximum leaky bucket size
         that is supported for the bit rate specified by the
         max-bitrate parameter.

         If this parameter is not specified, it defaults to the
         maximum buffer size allowed by the specified profile and
         level. (See the values for "BMax" and "RMax" in Annex D of
         SMPTE 421M [1].)

      max-framerate:
         The value is an integer greater than zero, specifying a
         number of frames per second for the coded bit stream. The
         value is the frame rate multiplied by 1000 and rounded to the
         nearest integer value. For example, 30000/1001
         (approximately 29.97) frames per second is represented as
         29970.

         If the value is less than the maximum frame rate allowed by
         the profile and level, then the value specifies the preferred
         frame rate. Otherwise, it specifies the maximum frame rate
         that is supported.

         If the parameter is not specified, it defaults to the maximum
         frame rate allowed by the specified profile and level.

   Encoding considerations:
      This media type is framed and contains binary data.

   Security considerations:
      See Section 7 of RFC XXXX.

   Interoperability considerations:
      None.

   Published specification:
      RFC XXXX.

   Applications which use this media type:
      Multimedia streaming and conferencing tools.

   Additional Information:
      None.

   Person & email address to contact for further information:
      Anders Klemets
      IETF AVT working group.

   Intended Usage:
      COMMON

   Restrictions on usage:
      This media type depends on RTP framing, and hence is only
      defined for transfer via RTP [3].

   Authors:
      Anders Klemets

   Change controller:
      IETF Audio/Video Transport Working Group delegated from the
      IESG.

6.2 Mapping of media type parameters to SDP

   The information carried in the media type specification has a
   specific mapping to fields in the Session Description Protocol (SDP)
   [4]. If SDP is used to specify sessions using this payload format,
   the mapping is done as follows:

   o The media name in the "m=" line of SDP MUST be video (the type
     name).

   o The encoding name in the "a=rtpmap" line of SDP MUST be vc1 (the
     subtype name).

   o The clock rate in the "a=rtpmap" line MUST be 90000.

   o The REQUIRED parameters "profile" and "level" MUST be included in
     the "a=fmtp" line of SDP.
     These parameters are expressed in the form of a semicolon
     separated list of parameter=value pairs.

   o The OPTIONAL parameters "config", "width", "height", "bitrate",
     "buffer", "framerate", "bpic", "mode", "max-width", "max-height",
     "max-bitrate", "max-buffer" and "max-framerate", when present,
     MUST be included in the "a=fmtp" line of SDP.
     These parameters are expressed in the form of a semicolon
     separated list of parameter=value pairs:

        a=fmtp:<dynamic payload type> <parameter
        name>=<value>[,<value>][; <parameter name>=<value>]

   o Any parameters unknown to the device that uses the SDP MUST be
     ignored. For example, parameters defined in later specifications
     MAY be copied into the SDP and MUST be ignored by receivers that
     do not understand them.

6.3 Usage with the SDP Offer/Answer Model

   When VC-1 is offered over RTP using SDP in an Offer/Answer model [5]
   for negotiation for unicast usage, the following rules and
   limitations apply:

   o The "profile" parameter MUST be used symmetrically, i.e., the
     answerer MUST either maintain the parameter or remove the media
     format (payload type) completely if the offered VC-1 profile is
     not supported.

   o The "level" parameter specifies the highest level of the VC-1
     profile supported by the codec.

     The answerer MUST NOT specify a numerically higher level in the
     answer than what was specified in the offer. The answerer MAY
     specify a level that is lower than what was specified in the
     offer, i.e., the level parameter can be "downgraded".

     If the offer specifies the sendrecv or sendonly direction
     attribute, and the answer downgrades the level parameter, this may
     require a new offer to specify an updated "config" parameter. If
     the "config" parameter cannot be used with the level specified in
     the answer, then the offerer MUST initiate another Offer/Answer
     round, or not use the media format (payload type).

   o The parameters "config", "bpic", "width", "height", "framerate",
     "bitrate", "buffer" and "mode" describe the properties of the
     VC-1 bit stream that the offerer or answerer is sending for this
     media format configuration.

     In the case of unicast usage and when the direction attribute in
     the offer or answer is recvonly, the interpretation of these
     parameters is undefined and they MUST NOT be used.

   o The parameters "config", "width", "height", "bitrate" and "buffer"
     MUST be specified when the direction attribute is sendrecv or
     sendonly.

   o The parameters "max-width", "max-height", "max-framerate",
     "max-bitrate" and "max-buffer" MAY be specified in an offer or an
     answer, and their interpretation is as follows:

     When the direction attribute is sendonly, the parameters describe
     the limits of the VC-1 bit stream that the sender is capable of
     producing for the given profile and level, and for any lower level
     of the same profile.

     When the direction attribute is recvonly or sendrecv, the
     parameters describe properties of the receiver implementation.
If 1330 the value of a property is less than what is allowed by the level 1331 of the VC-1 profile, then it SHALL be interpreted as a preferred 1332 value and the sender's VC-1 bit stream SHOULD NOT exceed it. If 1333 the value of a property is greater than what is allowed by the 1334 level of the VC-1 profile, then it SHALL be interpreted as the 1335 upper limit of the value that the receiver accepts for the given 1336 profile and level, and for any lower level of the same profile. 1338 For example, if a recvonly or sendrecv offer specifies 1339 "profile=0;level=1;max-bitrate=48000", then 48 kbps is merely a 1340 suggested bit rate, because all receiver implementations of Simple 1341 profile, Low level, are required to support bit rates of up to 96 1342 kbps. Assuming that the offer is accepted, the answerer should 1343 specify "bitrate=48000" in the answer, but any value up to 96000 1344 is allowed. But if the offer specifies "max-bitrate=200000", this 1345 means that the receiver implementation supports a maximum of 200 1346 kbps for the given profile and level (or lower level.) In this 1347 case, the answerer is allowed to answer with a bitrate parameter 1348 of up to 200000. 1350 o If an offerer wishes to have non-symmetrical capabilities between 1351 sending and receiving, e.g., use different levels in each 1352 direction, then the offerer has to offer different RTP sessions. 1353 This can be done by specifiying different media lines declared as 1354 "recvonly" and "sendonly", respectively. 1356 For streams being delivered over multicast, the following rules apply 1357 in addition: 1359 o The "level" parameter specifies the highest level of the VC-1 1360 profile used by the participants in the multicast session. The 1361 value of this parameter MUST NOT be changed by the answerer. 1362 Thus, a payload type can either be accepted unaltered or removed. 

1364     o The parameters "config", "bpic", "width", "height", "framerate",
1365       "bitrate", "buffer" and "mode" specify properties of the VC-1 bit
1366       stream that will be sent, and/or received, on the multicast
1367       session.  The parameters MAY be specified even if the direction
1368       attribute is recvonly.

1370       The values of these parameters MUST NOT be changed by the
1371       answerer.  Thus, a payload type can either be accepted unaltered
1372       or removed.

1374     o The values of the parameters "max-width", "max-height",
1375       "max-framerate", "max-bitrate" and "max-buffer" MUST be supported
1376       by the answerer for all streams declared as sendrecv or recvonly.
1377       Otherwise, one of the following actions MUST be performed: the
1378       media format is removed, or the session is rejected.

1380  6.4 Usage in Declarative Session Descriptions

1382     When VC-1 is offered over RTP using SDP in a declarative style, as in
1383     RTSP [12] or SAP [13], the following rules and limitations apply:

1385     o The parameters "profile" and "level" indicate only the properties
1386       of the coded bit stream.  They do not imply a limit on the
1387       capabilities supported by the sender.

1389     o The parameters "config", "width", "height", "bitrate" and "buffer"
1390       MUST be specified.

1392     o The parameters "max-width", "max-height", "max-framerate",
1393       "max-bitrate" and "max-buffer" MUST NOT be used.

1395     An example of media representation in SDP is as follows (Simple
1396     profile, Medium level):

1398     m=video 49170 RTP/AVP 98
1399     a=rtpmap:98 vc1/90000
1400     a=fmtp:98 profile=0;level=2;width=352;height=288;framerate=15000;
1401     bitrate=384000;buffer=2000;config=4e291800

1403  7. Security Considerations

1405     RTP packets using the payload format defined in this specification
1406     are subject to the security considerations discussed in the RTP
1407     specification [3], and in any appropriate RTP profile.  This implies
1408     that confidentiality of the media streams is achieved by encryption;
1409     for example, through the application of SRTP [11].

1411     A potential denial-of-service threat exists for data encodings using
1412     compression techniques that have non-uniform receiver-end
1413     computational load.  The attacker can inject pathological RTP packets
1414     into the stream that are complex to decode and that cause the
1415     receiver to be overloaded.  VC-1 is particularly vulnerable to such
1416     attacks, because it is possible for an attacker to generate RTP
1417     packets containing frames that affect the decoding process of many
1418     future frames.  Therefore, the usage of data origin authentication
1419     and data integrity protection of at least the RTP packet is
1420     RECOMMENDED; for example, with SRTP [11].

1422     Note that the appropriate mechanism to ensure confidentiality and
1423     integrity of RTP packets and their payloads is very dependent on the
1424     application and on the transport and signaling protocols employed.
1425     Thus, although SRTP is given as an example above, other possible
1426     choices exist.

1428     VC-1 bit streams can carry user-data, such as closed captioning
1429     information and content meta-data.  The VC-1 specification does not
1430     define how to interpret user-data.  Identifiers for user-data are
1431     required to be registered with SMPTE.  It is conceivable for types of
1432     user-data to be defined to include programmatic content, such as
1433     scripts or commands that would be executed by the receiver.
1434     Depending on the type of user-data, it might be possible for a sender
1435     to generate user-data in a non-compliant manner to crash the receiver
1436     or make it temporarily unavailable.  Senders that transport VC-1 bit
1437     streams SHOULD ensure that the user-data is compliant with the
1438     specification registered with SMPTE (see Annex F of [1]).  Receivers
1439     SHOULD prevent malfunction in case of non-compliant user-data.

1441     It is important to note that VC-1 streams can have very high
1442     bandwidth requirements (up to 135 Mbps for high-definition video).
1443     This is sufficient to create the potential for denial of service if
1444     transmitted onto many Internet paths.  Therefore, users of this
1445     payload format MUST comply with the congestion control requirements
1446     described in section 8.

1448  8. Congestion Control

1450     Congestion control for RTP SHALL be used in accordance with RFC 3550
1451     [3], and with any applicable RTP profile; e.g., RFC 3551 [15].

1453     If best-effort service is being used, users of this payload format
1454     MUST monitor packet loss to ensure that the packet loss rate is
1455     within acceptable parameters.  Packet loss is considered acceptable
1456     if a TCP flow across the same network path, and experiencing the same
1457     network conditions, would achieve an average throughput, measured on
1458     a reasonable timescale, that is not less than what the RTP flow is
1459     achieving.  This condition can be satisfied by implementing
1460     congestion control mechanisms to adapt the transmission rate or by
1461     arranging for a receiver to leave the session if the loss rate is
1462     unacceptably high.

1464     The bit rate adaptation necessary for obeying the congestion control
1465     principle is easily achievable when real-time encoding is used.  When
1466     pre-encoded content is being transmitted, bandwidth adaptation
1467     requires one or more of the following:

1469     - The availability of more than one coded representation of the same
1470       content at different bit rates.  The switching between the
1471       different representations can normally be performed in the same
1472       RTP session, by switching streams at random access point
1473       boundaries.

1475     - The existence of non-reference frames (e.g., B-frames) in the bit
1476       stream.  Non-reference frames can be discarded by the transmitter
1477       prior to encapsulation in RTP.
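
The two adaptation options listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the payload format: the `Frame` record, the function names and the bit budgets are hypothetical, and the sketch does not parse VC-1 syntax. It also does not model switching at random access point boundaries, and it does not rewrite the FRMCNT or TFCNTR syntax elements, which have to increment as if no frames had been discarded.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Frame:
    """Hypothetical frame record; not a VC-1 syntax structure."""
    index: int          # position in coded (transmission) order
    size_bits: int      # coded size of the frame in bits
    is_reference: bool  # False for B-frames and other non-reference frames


def pick_representation(bitrates: Sequence[int], target_bps: int) -> Optional[int]:
    """Return the highest pre-encoded bit rate that fits target_bps, or None."""
    fitting = [b for b in bitrates if b <= target_bps]
    return max(fitting) if fitting else None


def drop_non_reference(frames: Sequence[Frame], budget_bits: int) -> List[Frame]:
    """Discard non-reference frames, latest first, until the total fits.

    Reference frames are never discarded, so the result can still exceed
    budget_bits when no non-reference frames are left to drop.
    """
    kept = list(frames)
    total = sum(f.size_bits for f in kept)
    for frame in reversed(frames):
        if total <= budget_bits:
            break
        if not frame.is_reference:
            kept.remove(frame)
            total -= frame.size_bits
    return kept
```

For example, with representations at 384000, 192000 and 96000 bits per second and a target of 200000, `pick_representation` returns 192000. Dropping frames is the finer-grained option, but it only helps while non-reference frames remain in the stream.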

1479     Only when non-downgradable parameters (such as the VC-1 "profile"
1480     parameter) are required to be changed does it become necessary to
1481     terminate and re-start the media stream.  This may be accomplished by
1482     using a different RTP payload type.

1484     Regardless of the method used for bandwidth adaptation, the resulting
1485     bit stream MUST be compliant with the VC-1 specification [1].  For
1486     example, if non-reference frames are discarded, then the FRMCNT
1487     syntax element (Simple and Main profile frames only) and the optional
1488     TFCNTR syntax element (Advanced profile frames only) must increment
1489     as if no frames had been discarded.  Because the TFCNTR syntax
1490     element counts the frames in the display order, which is different
1491     from the order in which they are transmitted (the coded order), the
1492     transmitter has to "look ahead" by, or buffer, some number of
1493     frames.

1495     As another example, when switching between different representations
1496     of the same content, it may be necessary to signal a discontinuity by
1497     modifying the FRMCNT field, or if Advanced profile is used, by
1498     setting the BROKEN_LINK flag in the entry-point header to 1.

1500     This payload format may also be used in networks that provide
1501     quality-of-service guarantees.  If enhanced service is being used,
1502     receivers SHOULD monitor packet loss to ensure that the service that
1503     was requested is actually being delivered.  If it is not, then they
1504     SHOULD assume that they are receiving best-effort service and behave
1505     accordingly.

1507  9. IANA Considerations

1509     IANA is requested to register the media type "video/vc1" and the
1510     associated RTP payload format, as specified in section 6.1 of this
1511     document, in the Media Types registry and in the RTP Payload Format
1512     MIME types registry.

1514  10. References

1516  10.1 Normative references

1518     [1] Society of Motion Picture and Television Engineers, "VC-1
1519         Compressed Video Bitstream Format and Decoding Process", SMPTE
1520         421M.
1521     [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement
1522         Levels", BCP 14, RFC 2119, March 1997.
1523     [3] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
1524         "RTP: A Transport Protocol for Real-Time Applications", STD 64,
1525         RFC 3550, July 2003.
1526     [4] Handley, M. and V. Jacobson, "SDP: Session Description Protocol",
1527         RFC 2327, April 1998.
1528     [5] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with
1529         Session Description Protocol (SDP)", RFC 3264, June 2002.
1530     [6] Josefsson, S., Ed., "The Base16, Base32, and Base64 Data
1531         Encodings", RFC 3548, July 2003.
1532     [7] Freed, N. and J. Klensin, "Media Type Specifications and
1533         Registration Procedures", BCP 13, RFC 4288, December 2005.
1534     [8] Casner, S. and P. Hoschka, "MIME Type Registration of RTP Payload
1535         Formats", RFC 3555, July 2003.

1537  10.2 Informative references

1539     [9] Srinivasan, S., Hsu, P., Holcomb, T., Mukerjee, K., Regunathan,
1540         S.L., Lin, B., Liang, J., Lee, M., and J. Ribas-Corbera, "Windows
1541         Media Video 9: overview and applications", Signal Processing:
1542         Image Communication, Volume 19, Issue 9, October 2004.
1543     [10] Ribas-Corbera, J., Chou, P.A., and S.L. Regunathan, "A
1544         generalized hypothetical reference decoder for H.264/AVC", IEEE
1545         Transactions on Circuits and Systems for Video Technology, August
1546         2003.
1547     [11] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1548         Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
1549         3711, March 2004.
1550     [12] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
1551         Protocol (RTSP)", RFC 2326, April 1998.
1552     [13] Handley, M., Perkins, C., and E. Whelan, "Session Announcement
1553         Protocol", RFC 2974, October 2000.
1554     [14] Handley, M., Schulzrinne, H., Schooler, E. and J. Rosenberg,
1555         "SIP: Session Initiation Protocol", RFC 2543, March 1999.

1557     [15] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video
1558         Conferences with Minimal Control", STD 65, RFC 3551, July 2003.

1560  Author's Addresses

1562     Anders Klemets
1563     Microsoft Corp.
1564     1 Microsoft Way
1565     Redmond, WA 98052
1566     USA
1567     Email: anderskl@microsoft.com

1569  Acknowledgements

1571     Thanks to Regis Crinon, Miska Hannuksela, Colin Perkins, Shankar
1572     Regunathan, Gary Sullivan, Stephan Wenger and Magnus Westerlund for
1573     providing detailed feedback on this document.

1575  IPR Notices

1577     The IETF takes no position regarding the validity or scope of any
1578     Intellectual Property Rights or other rights that might be claimed to
1579     pertain to the implementation or use of the technology described in
1580     this document or the extent to which any license under such rights
1581     might or might not be available; nor does it represent that it has
1582     made any independent effort to identify any such rights.  Information
1583     on the procedures with respect to rights in RFC documents can be
1584     found in BCP 78 and BCP 79.

1586     Copies of IPR disclosures made to the IETF Secretariat and any
1587     assurances of licenses to be made available, or the result of an
1588     attempt made to obtain a general license or permission for the use of
1589     such proprietary rights by implementers or users of this
1590     specification can be obtained from the IETF on-line IPR repository at
1591     http://www.ietf.org/ipr.

1593     The IETF invites any interested party to bring to its attention any
1594     copyrights, patents or patent applications, or other proprietary
1595     rights that may cover technology that may be required to implement
1596     this standard.  Please address the information to the IETF at
1597     ietf-ipr@ietf.org.

1599  Full Copyright Statement

1601     Copyright (C) The Internet Society (2006).

1603     This document is subject to the rights, licenses and restrictions
1604     contained in BCP 78, and except as set forth therein, the authors
1605     retain all their rights.

1607     This document and the information contained herein are provided on an
1608     "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1609     OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1610     ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1611     INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1612     INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1613     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.