Audio/Video Transport WG                                   T. Kristensen
Internet-Draft                                                  P. Luthi
Intended status: Standards Track                                TANDBERG
Expires: May 23, 2011                                  November 19, 2010


   RTP Payload Format for H.264 Reduced-Complexity Decoding Operation
                              (RCDO) Video
                    draft-ietf-avt-rtp-h264-rcdo-08

Abstract

   This document describes an RTP payload format for the Reduced-
   Complexity Decoding Operation (RCDO) for H.264 Baseline profile
   bitstreams, as specified in ITU-T Recommendation H.241. RCDO reduces
   the decoding cost and resource consumption of the video processing.
   The RCDO RTP payload format is based on the H.264 RTP payload format.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 23, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.
   All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Conventions, Definitions and Acronyms
   3.  Media Format Background
   4.  Payload Format
   5.  Congestion Control Considerations
   6.  Payload Format Parameters
     6.1.  Media Type Definition
   7.  Mapping to SDP
     7.1.  Offer/Answer Considerations
     7.2.  Declarative SDP Considerations
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative references
   Authors' Addresses

1. Introduction

   ITU-T Recommendation H.241 [3] specifies a reduced-complexity
   decoding operation (RCDO) for use with H.264 [2] Baseline profile
   bitstreams. It also specifies a bitstream constraint associated with
   RCDO and a mechanism for signalling RCDO within the bitstream. The
   RCDO signalling indicates that the bitstream conforms to the
   bitstream constraint and that the decoder shall apply the RCDO
   decoding process to the bitstream.

   RCDO for H.264 offers a solution to support higher resolutions at the
   same high framerates used in current implementations. This is
   achieved by reducing the processing requirements and thus the
   decoding cost/resource consumption of the video processing.

   This document defines media type parameters and allows use in systems
   based on the Session Description Protocol (SDP) [8] for signalling.

2. Conventions, Definitions and Acronyms

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [4].

   RFC-editor note: RFC XXXX is to be replaced by the RFC number this
   specification receives when published.

   RFC-editor note: RFC YYYY is to be replaced by the RFC number the
   normative reference RFC3984bis [1] receives when published.

3. Media Format Background

   The Reduced-Complexity Decoding Operation (RCDO) for H.264 Baseline
   profile bitstreams is specified in Annex B of H.241 [3]. RCDO is
   specified as a separate H.264 mode, and is distinct from any profile
   defined in H.264. An RCDO bitstream obeys all the constraints of
   the Baseline profile.

   The media format is based on the H.264 RTP payload format as
   specified in RFC YYYY [1]. Therefore, RFC YYYY constitutes the basis
   for this document and is referred to several times.

   In order to signal H.264 additional modes, Table 9f of H.241 [3]
   specifies an AdditionalModesSupported parameter. Currently, the only
   additional mode defined is RCDO.

      Informative note: Other additional modes may be defined in the
      future. H.264 additional modes may or may not be distinct from
      the Profiles in H.264.

   A separate media subtype, named H264-RCDO, is defined to ensure
   backward compatibility with deployed implementations of H.264.

4. Payload Format

   The payload format defined in Section 5 of RFC YYYY [1] SHALL be
   used. This includes the RTP header usage and the payload format in
   RFC YYYY. Examples of typical RTP packets can be found in RFC YYYY.

5. Congestion Control Considerations

   Congestion control for RTP SHALL be used in accordance with RFC 3550
   [6], and with any applicable RTP profile; e.g., RFC 3551 [7]. If
   best-effort service is being used, users of this payload format SHALL
   monitor packet loss to ensure that the packet loss rate is within
   acceptable parameters.

6. Payload Format Parameters

   This RTP payload format is identified using the H264-RCDO media
   subtype which is registered in accordance with RFC 4855 [10] and
   using the template of RFC 4288 [13].

6.1. Media Type Definition

   RFC-editor note: We need to sync with RFC YYYY [1], when it is
   certain no more changes will go into that draft, and copy across any
   changes to the identical part of the media subtype definition.

      Informative note: The media subtype definition for H264-RCDO is
      based on the definition of the H264 media subtype as specified in
      Section 8.1 of RFC YYYY [1]. Except for the profile-level-id
      parameter, where new semantics are specified below, the optional
      parameters are copied from RFC YYYY [1] in order to provide a
      complete, self-contained media subtype registration to IANA. The
      references are updated to match the numbering used in this
      document.

   The media subtype for RCDO for H.264 is allocated from the IETF tree.

   Type name: video
   Subtype name: H264-RCDO

   Required parameters:

   rate: Indicates the RTP timestamp clock rate. The rate value MUST
      be 90000.

   Optional parameters:

   profile-level-id: A base16 RFC 4648 [9] (hexadecimal) representation
      of the following three bytes in the sequence parameter set NAL
      unit specified in H.264 [2]: 1) profile_idc, 2) a byte herein
      referred to as profile-iop, composed of the values of
      constraint_set0_flag, constraint_set1_flag, constraint_set2_flag,
      constraint_set3_flag, and reserved_zero_4bits in bit-significance
      order, starting from the most significant bit, and 3) level_idc.
      Note that reserved_zero_4bits is required to be equal to 0 in
      H.264 [2], but other values for it may be specified in the future
      by ITU-T or ISO/IEC.

      The profile-level-id parameter indicates the default sub-profile,
      i.e. the subset of coding tools that may have been used to
      generate the stream or that the receiver supports, and the default
      level of the stream or the default level that the receiver
      supports.
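
      Informative note: The byte layout described above can be sketched
      in Python as follows. This is a non-normative illustration; the
      function name parse_profile_level_id and the returned dictionary
      are assumptions made for the example and are not defined by this
      document. The input 00800d is the value this document gives for
      level 1.3.

```python
def parse_profile_level_id(value):
    """Split a six-digit hexadecimal profile-level-id into its fields.

    Hypothetical helper for illustration only.
    """
    if len(value) != 6:
        raise ValueError("profile-level-id must be exactly six hex digits")
    # Three bytes in order: profile_idc, profile-iop, level_idc.
    profile_idc, profile_iop, level_idc = bytes.fromhex(value)
    return {
        "profile_idc": profile_idc,
        # profile-iop carries the flags from the most significant bit down.
        "constraint_set0_flag": (profile_iop >> 7) & 1,
        "constraint_set1_flag": (profile_iop >> 6) & 1,
        "constraint_set2_flag": (profile_iop >> 5) & 1,
        "constraint_set3_flag": (profile_iop >> 4) & 1,
        "reserved_zero_4bits": profile_iop & 0x0F,
        "level_idc": level_idc,
    }


fields = parse_profile_level_id("00800d")
print(fields)
```

      A caller would typically apply this to the profile-level-id value
      taken from an SDP "fmtp" line before checking the individual
      fields.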

      RCDO is distinct from any profile. This implies the profile value
      0 (no profile), so the profile_idc byte of the profile-level-id
      parameter is equal to 0. An RCDO bitstream MUST obey all the
      constraints of the Baseline profile. Therefore, only
      constraint_set0_flag is equal to 1 in the profile-iop part of the
      profile-level-id parameter; the remaining bits are set to 0.

      If the profile-level-id parameter is used to indicate properties
      of a NAL unit stream, it indicates that, to decode the stream, the
      lowest level the decoder has to support is the default level.
      If the profile-level-id parameter is used for capability exchange
      or session setup procedure, and if max-recv-level is not present,
      the default level from profile-level-id indicates the highest
      level the codec wishes to support. If max-recv-level is present,
      it indicates the highest level the codec supports for receiving.
      For either receiving or sending, all levels that are lower than
      the highest level supported MUST also be supported.

      For example, if a codec supports level 1.3, the profile-level-id
      becomes 00800d, in which 00 indicates the "no profile" value, 80
      indicates the constraints of the Baseline profile and 0d indicates
      level 1.3. When level 2.1 is supported, the profile-level-id
      becomes 008015.

      If no profile-level-id is present, level 1 MUST be implied, i.e.
      equivalent to profile-level-id 00800a.

         Informative note: The definitions of the remaining optional
         parameters below are copied verbatim from Section 8.1 of RFC
         YYYY [1]. Only the references are updated to match the
         numbering used in this document.

   max-recv-level: This parameter MAY be used to indicate the highest
      level a receiver supports when the highest level is higher than
      the default level (the level indicated by profile-level-id).
      The value of max-recv-level is a base16 (hexadecimal)
      representation of the two bytes after the syntax element
      profile_idc in the sequence parameter set NAL unit specified in
      H.264 [2]: profile-iop (as defined above) and level_idc. If (the
      level_idc byte of max-recv-level is equal to 11 and bit 4 of the
      profile-iop byte of max-recv-level is equal to 1) or (the
      level_idc byte of max-recv-level is equal to 9 and bit 4 of the
      profile-iop byte of max-recv-level is equal to 0), the highest
      level the receiver supports is level 1b. Otherwise, the highest
      level the receiver supports is equal to the level_idc byte of
      max-recv-level divided by 10.

      max-recv-level MUST NOT be present if the highest level the
      receiver supports is not higher than the default level.

   max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br: These
      parameters MAY be used to signal the capabilities of a receiver
      implementation. These parameters MUST NOT be used for any other
      purpose. The highest level conveyed in the value of the
      profile-level-id parameter or the max-recv-level parameter MUST
      be a level that the receiver is fully capable of supporting.
      max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br MAY be
      used to indicate capabilities of the receiver that extend the
      required capabilities of the signaled highest level, as specified
      below.

      When more than one parameter from the set (max-mbps, max-smbps,
      max-fs, max-cpb, max-dpb, max-br) is present, the receiver MUST
      support all signaled capabilities simultaneously. For example, if
      both max-mbps and max-br are present, the signaled highest level
      with the extension of both the frame rate and bit rate is
      supported.
      That is, the receiver is able to decode NAL unit
      streams in which the macroblock processing rate is up to max-mbps
      (inclusive), the bit rate is up to max-br (inclusive), the coded
      picture buffer size is derived as specified in the semantics of
      the max-br parameter below, and other properties comply with the
      highest level specified in the value of the profile-level-id
      parameter or the max-recv-level parameter.

      If a receiver can support all the properties of level A, the
      highest level specified in the value of the profile-level-id
      parameter or the max-recv-level parameter MUST be level A (i.e.
      MUST NOT be lower than level A). In other words, a receiver MUST
      NOT signal values of max-mbps, max-fs, max-cpb, max-dpb, and
      max-br that taken together meet the requirements of a higher level
      compared to the highest level specified in the value of the
      profile-level-id parameter or the max-recv-level parameter.

         Informative note: When the OPTIONAL media type parameters are
         used to signal the properties of a NAL unit stream, max-mbps,
         max-smbps, max-fs, max-cpb, max-dpb, and max-br are not
         present, and the value of profile-level-id must always be such
         that the NAL unit stream complies fully with the specified
         profile and level.

   max-mbps: The value of max-mbps is an integer indicating the maximum
      macroblock processing rate in units of macroblocks per second.
      The max-mbps parameter signals that the receiver is capable of
      decoding video at a higher rate than is required by the signaled
      highest level conveyed in the value of the profile-level-id
      parameter or the max-recv-level parameter. When max-mbps is
      signaled, the receiver MUST be able to decode NAL unit streams
      that conform to the signaled highest level, with the exception
      that the MaxMBPS value in Table A-1 of H.264 [2] for the signaled
      highest level is replaced with the value of max-mbps.
      The value of max-mbps MUST be greater than or equal to the value
      of MaxMBPS given in Table A-1 of H.264 [2] for the highest level.
      Senders MAY use this knowledge to send pictures of a given size at
      a higher picture rate than is indicated in the signaled highest
      level.

   max-smbps: The value of max-smbps is an integer indicating the
      maximum static macroblock processing rate in units of static
      macroblocks per second, under the hypothetical assumption that all
      macroblocks are static macroblocks. When max-smbps is signalled,
      the MaxMBPS value in Table A-1 of H.264 [2] should be replaced
      with the result of the following computation:

      o  If the parameter max-mbps is signalled, set a variable
         MaxMacroblocksPerSecond to the value of max-mbps. Otherwise,
         set MaxMacroblocksPerSecond equal to the value of MaxMBPS in
         Table A-1 of H.264 [2] for the highest level.

      o  Set a variable P_non-static to the proportion of non-static
         macroblocks in picture n.

      o  Set a variable P_static to the proportion of static macroblocks
         in picture n.

      o  The value of MaxMBPS in Table A-1 of H.264 [2] should be
         considered by the encoder to be equal to:

            MaxMacroblocksPerSecond * max-smbps /
               (P_non-static * max-smbps
                + P_static * MaxMacroblocksPerSecond)

      The encoder should recompute this value for each picture. The
      value of max-smbps MUST be greater than the value of MaxMBPS given
      in Table A-1 of H.264 [2] for the highest level. Senders MAY use
      this knowledge to send pictures of a given size at a higher
      picture rate than is indicated in the signaled highest level.

   max-fs: The value of max-fs is an integer indicating the maximum
      frame size in units of macroblocks.
      The max-fs parameter signals
      that the receiver is capable of decoding larger picture sizes than
      are required by the signaled highest level conveyed in the value
      of the profile-level-id parameter or the max-recv-level parameter.
      When max-fs is signaled, the receiver MUST be able to decode NAL
      unit streams that conform to the signaled highest level, with the
      exception that the MaxFS value in Table A-1 of H.264 [2] for the
      signaled highest level is replaced with the value of max-fs. The
      value of max-fs MUST be greater than or equal to the value of
      MaxFS given in Table A-1 of H.264 [2] for the highest level.
      Senders MAY use this knowledge to send larger pictures at a
      proportionally lower frame rate than is indicated in the signaled
      highest level.

   max-cpb: The value of max-cpb is an integer indicating the maximum
      coded picture buffer size in units of 1000 bits for the VCL HRD
      parameters (see A.3.1 item i of H.264 [2]) and in units of 1200
      bits for the NAL HRD parameters (see A.3.1 item j of H.264 [2]).
      The max-cpb parameter signals that the receiver has more memory
      than the minimum amount of coded picture buffer memory required by
      the signaled highest level conveyed in the value of the
      profile-level-id parameter or the max-recv-level parameter. When
      max-cpb is signaled, the receiver MUST be able to decode NAL unit
      streams that conform to the signaled highest level, with the
      exception that the MaxCPB value in Table A-1 of H.264 [2] for the
      signaled highest level is replaced with the value of max-cpb. The
      value of max-cpb MUST be greater than or equal to the value of
      MaxCPB given in Table A-1 of H.264 [2] for the highest level.
      Senders MAY use this knowledge to construct coded video streams
      with greater variation of bit rate than can be achieved with the
      MaxCPB value in Table A-1 of H.264 [2].

         Informative note: The coded picture buffer is used in the
         hypothetical reference decoder (Annex C) of H.264. The use of
         the hypothetical reference decoder is recommended in H.264
         encoders to verify that the produced bitstream conforms to the
         standard and to control the output bitrate. Thus, the coded
         picture buffer is conceptually independent of any other
         potential buffers in the receiver, including de-interleaving
         and de-jitter buffers. The coded picture buffer need not be
         implemented in decoders as specified in Annex C of H.264, but
         rather standard-compliant decoders can have any buffering
         arrangements provided that they can decode standard-compliant
         bitstreams. Thus, in practice, the input buffer for a video
         decoder can be integrated with the de-interleaving and
         de-jitter buffers of the receiver.

   max-dpb: The value of max-dpb is an integer indicating the maximum
      decoded picture buffer size in units of 1024 bytes. The max-dpb
      parameter signals that the receiver has more memory than the
      minimum amount of decoded picture buffer memory required by the
      signaled highest level conveyed in the value of the
      profile-level-id parameter or the max-recv-level parameter. When
      max-dpb is signaled, the receiver MUST be able to decode NAL unit
      streams that conform to the signaled highest level, with the
      exception that the MaxDPB value in Table A-1 of H.264 [2] for the
      signaled highest level is replaced with the value of max-dpb.
      Consequently, a receiver that signals max-dpb MUST be capable of
      storing the following number of decoded frames, complementary
      field pairs, and non-paired fields in its decoded picture buffer:

         Min(1024 * max-dpb / ( PicWidthInMbs * FrameHeightInMbs * 256 *
         ChromaFormatFactor ), 16)

      PicWidthInMbs, FrameHeightInMbs, and ChromaFormatFactor are
      defined in H.264 [2].
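
         Informative note: The Min() expression above can be evaluated
         as in the following non-normative Python sketch. The function
         name dpb_frame_capacity is hypothetical. Two points are
         assumptions of the sketch rather than statements of this
         document: the quotient is rounded down to a whole number of
         frames, and ChromaFormatFactor defaults to 3/2 (the value
         assumed here for 4:2:0 video). The inputs 891 and a picture of
         22 x 18 macroblocks are illustrative only.

```python
from fractions import Fraction
from math import floor


def dpb_frame_capacity(max_dpb, pic_width_in_mbs, frame_height_in_mbs,
                       chroma_format_factor=Fraction(3, 2)):
    """Frames a receiver signalling max-dpb must be able to store.

    Hypothetical helper; rounding down is an assumption of this sketch.
    """
    # max-dpb is expressed in units of 1024 bytes.
    bytes_available = 1024 * max_dpb
    # 256 luma samples per macroblock, scaled by the chroma factor.
    bytes_per_frame = (pic_width_in_mbs * frame_height_in_mbs * 256
                       * Fraction(chroma_format_factor))
    # Exact rational arithmetic avoids floating-point rounding surprises.
    return min(floor(bytes_available / bytes_per_frame), 16)


print(dpb_frame_capacity(891, 22, 18))
```

         The cap of 16 comes directly from the second argument of Min()
         in the expression above.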

      The value of max-dpb MUST be greater than or equal to the value of
      MaxDPB given in Table A-1 of H.264 [2] for the highest level.
      Senders MAY use this knowledge to construct coded video streams
      with improved compression.

         Informative note: This parameter was added primarily to
         complement a similar codepoint in the ITU-T Recommendation
         H.245, so as to facilitate signaling gateway designs. The
         decoded picture buffer stores reconstructed samples. There is
         no relationship between the size of the decoded picture buffer
         and the buffers used in RTP, especially de-interleaving and
         de-jitter buffers.

   max-br: The value of max-br is an integer indicating the maximum
      video bit rate in units of 1000 bits per second for the VCL HRD
      parameters (see A.3.1 item i of H.264 [2]) and in units of 1200
      bits per second for the NAL HRD parameters (see A.3.1 item j of
      H.264 [2]).

      The max-br parameter signals that the video decoder of the
      receiver is capable of decoding video at a higher bit rate than is
      required by the signaled highest level conveyed in the value of
      the profile-level-id parameter or the max-recv-level parameter.

      When max-br is signaled, the video codec of the receiver MUST be
      able to decode NAL unit streams that conform to the signaled
      highest level, with the following exceptions in the limits
      specified by the highest level:

      o  The value of max-br replaces the MaxBR value in Table A-1 of
         H.264 [2] for the highest level.

      o  When the max-cpb parameter is not present, the result of the
         following formula replaces the value of MaxCPB in Table A-1 of
         H.264 [2]: (MaxCPB of the signaled level) * max-br / (MaxBR of
         the signaled highest level).
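
         Informative note: The two replacements listed above can be
         sketched in Python as follows. This is a non-normative
         illustration. The function name limits_with_max_br and its
         argument names are hypothetical, and the CPB size is rounded
         down to whole bits as an assumption of the sketch. The Level
         1.2 figures used in the call (MaxBR 384, MaxCPB 1000) are read
         from the worked Level 1.2 example in this section, not looked
         up in Table A-1 of H.264 [2], and should be checked against
         that table before use.

```python
def limits_with_max_br(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Derive bit-rate and CPB limits when max-br is signalled.

    Hypothetical helper. level_max_br and level_max_cpb are the Table A-1
    values for the signalled highest level, in that table's units.
    """
    if max_br < level_max_br:
        raise ValueError("max-br must not be lower than MaxBR of the level")
    if max_cpb is None:
        # Scale MaxCPB by max-br / MaxBR, then convert units of 1000 bits
        # (VCL HRD) to bits. Integer division rounds down (an assumption).
        cpb_vcl_bits = level_max_cpb * max_br * 1000 // level_max_br
    else:
        # An explicit max-cpb takes the place of the scaled value.
        cpb_vcl_bits = max_cpb * 1000
    return {
        "vcl_bits_per_second": max_br * 1000,  # units of 1000 bits/s
        "nal_bits_per_second": max_br * 1200,  # units of 1200 bits/s
        "cpb_vcl_bits": cpb_vcl_bits,
    }


limits = limits_with_max_br(1550, level_max_br=384, level_max_cpb=1000)
print(limits)
```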

      For example, if a receiver signals capability for Level 1.2 with
      max-br equal to 1550, this indicates a maximum video bitrate of
      1550 kbits/sec for VCL HRD parameters, a maximum video bitrate of
      1860 kbits/sec for NAL HRD parameters, and a CPB size of 4036458
      bits (1550000 / 384000 * 1000 * 1000).

      The value of max-br MUST be greater than or equal to the value of
      MaxBR given in Table A-1 of H.264 [2] for the signaled highest
      level.

      Senders MAY use this knowledge to send higher bitrate video as
      allowed in the level definition of Annex A of H.264, to achieve
      improved video quality.

         Informative note: This parameter was added primarily to
         complement a similar codepoint in the ITU-T Recommendation
         H.245, so as to facilitate signaling gateway designs. No
         assumption can be made from the value of this parameter that
         the network is capable of handling such bit rates at any given
         time. In particular, no conclusion can be drawn that the
         signaled bit rate is possible under congestion control
         constraints.

   redundant-pic-cap: This parameter signals the capabilities of a
      receiver implementation. When equal to 0, the parameter indicates
      that the receiver makes no attempt to use redundant coded pictures
      to correct incorrectly decoded primary coded pictures. When equal
      to 0, the receiver is not capable of using redundant slices;
      therefore, a sender SHOULD avoid sending redundant slices to save
      bandwidth. When equal to 1, the receiver is capable of decoding
      any such redundant slice that covers a corrupted area in a primary
      decoded picture (at least partly), and therefore a sender MAY send
      redundant slices. When the parameter is not present, then a value
      of 0 MUST be used for redundant-pic-cap. When present, the value
      of redundant-pic-cap MUST be either 0 or 1.

      When the profile-level-id parameter is present in the same
      signaling as the redundant-pic-cap parameter, and the profile
      indicated in profile-level-id is such that it disallows the use of
      redundant coded pictures (e.g., Main Profile), the value of
      redundant-pic-cap MUST be equal to 0. When a receiver indicates
      redundant-pic-cap equal to 0, the received stream SHOULD NOT
      contain redundant coded pictures.

         Informative note: Even if redundant-pic-cap is equal to 0, the
         decoder is able to ignore redundant coded pictures provided
         that the decoder supports such a profile (Baseline, Extended)
         in which redundant coded pictures are allowed.

         Informative note: Even if redundant-pic-cap is equal to 1, the
         receiver may also choose other error concealment strategies to
         replace or complement decoding of redundant slices.

   sprop-parameter-sets: This parameter MAY be used to convey any
      sequence and picture parameter set NAL units (herein referred to
      as the initial parameter set NAL units) that can be placed in the
      NAL unit stream to precede any other NAL units in decoding order.
      The parameter MUST NOT be used to indicate codec capability in any
      capability exchange procedure. The value of the parameter is a
      comma (',') separated list of base64 RFC 4648 [9] representations
      of parameter set NAL units as specified in sections 7.3.2.1 and
      7.3.2.2 of H.264 [2]. Note that the number of bytes in a
      parameter set NAL unit is typically less than 10, but a picture
      parameter set NAL unit can contain several hundreds of bytes.

         Informative note: When several payload types are offered in the
         SDP Offer/Answer model, each with its own sprop-parameter-sets
         parameter, then the receiver cannot assume that those parameter
         sets do not use conflicting storage locations (i.e., identical
         values of parameter set identifiers).
         Therefore, a receiver
         should buffer all sprop-parameter-sets and make them available
         to the decoder instance that decodes a certain payload type.

      The "sprop-parameter-sets" parameter MUST only contain parameter
      sets that are conforming to the profile-level-id, i.e., the subset
      of coding tools indicated by any of the parameter sets MUST be
      equal to the default sub-profile, and the level indicated by any
      of the parameter sets MUST be equal to the default level.

   sprop-level-parameter-sets: This parameter MAY be used to convey any
      sequence and picture parameter set NAL units (herein referred to
      as the initial parameter set NAL units) that can be placed in the
      NAL unit stream to precede any other NAL units in decoding order
      and that are associated with one or more levels different than the
      default level. The parameter MUST NOT be used to indicate codec
      capability in any capability exchange procedure.

      The sprop-level-parameter-sets parameter contains parameter sets
      for one or more levels which are different than the default level.
      All parameter sets associated with one level are clustered and
      prefixed with a three-byte field which has the same syntax as
      profile-level-id. This enables the receiver to install the
      parameter sets for one level and discard the rest. The three-byte
      field is named PLId, and all parameter sets associated with one
      level are named PSL, which has the same syntax as
      sprop-parameter-sets. Parameter sets for each level are
      represented in the form of PLId:PSL, i.e., PLId followed by a
      colon (':') and the base64 RFC 4648 [9] representation of the
      initial parameter set NAL units for the level. Each pair of
      PLId:PSL is also separated by a colon. Note that a PSL can
      contain multiple parameter sets for that level, separated with
      commas (',').
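
         Informative note: The PLId:PSL structure described above can
         be split as in the following non-normative Python sketch. The
         function name parse_sprop_level_parameter_sets is hypothetical.
         The sketch assumes that each PLId appears as six hexadecimal
         digits (the same syntax as profile-level-id) and relies on the
         fact that neither ':' nor ',' occurs in the base64 alphabet.
         The byte strings in the example are placeholders, not valid
         parameter set NAL units.

```python
import base64


def parse_sprop_level_parameter_sets(value):
    """Return a list of (PLId bytes, [parameter set NAL unit bytes])."""
    fields = value.split(":")
    if len(fields) % 2 != 0:
        raise ValueError("expected alternating PLId and PSL fields")
    result = []
    # Even positions hold PLId fields, odd positions hold PSL fields.
    for plid_hex, psl in zip(fields[0::2], fields[1::2]):
        plid = bytes.fromhex(plid_hex)
        if len(plid_hex) != 6 or len(plid) != 3:
            raise ValueError("PLId must be six hex digits")
        nal_units = []
        for item in psl.split(","):
            if not item:
                raise ValueError("empty parameter set in PSL")
            nal_units.append(base64.b64decode(item, validate=True))
        result.append((plid, nal_units))
    return result


# Placeholder bytes standing in for real parameter set NAL units.
set_a, set_b, set_c = b"\x67\x00\x80\x0b", b"\x68\xce", b"\x67\x00\x80\x0c"
example = ":".join([
    "00800b",
    ",".join(base64.b64encode(s).decode("ascii") for s in (set_a, set_b)),
    "00800c",
    base64.b64encode(set_c).decode("ascii"),
])
parsed = parse_sprop_level_parameter_sets(example)
print(parsed)
```

         A receiver following the text above would then keep the
         entry whose PLId matches the level it selects and discard the
         others.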

      The subset of coding tools indicated by each PLId field MUST be
      equal to the default sub-profile, and the level indicated by each
      PLId field MUST be different than the default level. All sequence
      parameter sets contained in each PSL MUST have the three bytes
      from profile_idc to level_idc, inclusive, equal to the preceding
      PLId.

         Informative note: This parameter allows for efficient level
         downgrade or upgrade in SDP Offer/Answer and out-of-band
         transport of parameter sets, simultaneously.

   use-level-src-parameter-sets: This parameter MAY be used to indicate
      a receiver capability. The value MAY be equal to either 0 or 1.
      When the parameter is not present, the value MUST be inferred to
      be equal to 0. The value 0 indicates that the receiver does not
      understand the sprop-level-parameter-sets parameter, and does not
      understand the "fmtp" source attribute as specified in section 6.3
      of RFC 5576 [14], and will ignore sprop-level-parameter-sets when
      present, and will ignore sprop-parameter-sets when conveyed using
      the "fmtp" source attribute. The value 1 indicates that the
      receiver understands the sprop-level-parameter-sets parameter,
      and understands the "fmtp" source attribute as specified in
      section 6.3 of RFC 5576 [14], and is capable of using parameter
      sets contained in the sprop-level-parameter-sets or contained in
      the sprop-parameter-sets that is conveyed using the "fmtp" source
      attribute.

         Informative note: An RFC 3984 receiver does not understand
         sprop-level-parameter-sets, use-level-src-parameter-sets, or
         the "fmtp" source attribute as specified in section 6.3 of RFC
         5576 [14]. Therefore, during SDP Offer/Answer, an RFC 3984
         receiver as the answerer will simply ignore
         sprop-level-parameter-sets, when present in an offer, and
         sprop-parameter-sets conveyed using the "fmtp" source
         attribute as specified in section 6.3 of RFC 5576 [14].
Assume that the offered payload 574 type was accepted at a level lower than the default level. If 575 the offered payload type included sprop-level-parameter-sets or 576 included sprop-parameter-sets conveyed using the "fmtp" source 577 attribute, and the offerer sees that the answerer has not 578 included use-level-src-parameter-sets equal to 1 in the answer, 579 the offerer knows that in-band transport of parameter sets is 580 needed. 582 in-band-parameter-sets: This parameter MAY be used to indicate a 583 receiver capability. The value MAY be equal to either 0 or 1. 584 The value 1 indicates that the receiver discards out-of-band 585 parameter sets in sprop-parameter-sets and sprop-level-parameter- 586 sets; therefore, the sender MUST transmit all parameter sets in- 587 band. The value 0 indicates that the receiver utilizes out-of- 588 band parameter sets included in sprop-parameter-sets and/or sprop- 589 level-parameter-sets. However, in this case, the sender MAY still 590 choose to send parameter sets in-band. When in-band-parameter- 591 sets is equal to 1, use-level-src-parameter-sets MUST NOT be 592 present or MUST be equal to 0. When the parameter is not present, 593 this receiver capability is not specified, and therefore the 594 sender MAY send out-of-band parameter sets only, or it MAY send 595 in-band parameter sets only, or it MAY send both. 597 level-asymmetry-allowed: This parameter MAY be used in SDP Offer/ 598 Answer to indicate whether level asymmetry, i.e., sending media 599 encoded at a different level in the offerer-to-answerer direction 600 than the level in the answerer-to-offerer direction, is allowed. 601 The value MAY be equal to either 0 or 1. When the parameter is 602 not present, the value MUST be inferred to be equal to 0. The 603 value 1 in both the offer and the answer indicates that level 604 asymmetry is allowed. The value of 0 in either the offer or the 605 answer indicates that level asymmetry is not allowed.
607 If "level-asymmetry-allowed" is equal to 0 (or not present) in 608 either the offer or the answer, level asymmetry is not allowed. 609 In this case, the level to use in the direction from the offerer 610 to the answerer MUST be the same as the level to use in the 611 opposite direction. 613 packetization-mode: This parameter signals the properties of an RTP 614 payload type or the capabilities of a receiver implementation. 615 Only a single configuration point can be indicated; thus, when 616 capabilities to support more than one packetization-mode are 617 declared, multiple configuration points (RTP payload types) must 618 be used. 620 When the value of packetization-mode is equal to 0 or 621 packetization-mode is not present, the single NAL mode MUST be 622 used. This mode is in use in standards using ITU-T Recommendation 623 H.241 [3] (see section 12.1). When the value of packetization- 624 mode is equal to 1, the non-interleaved mode MUST be used. When 625 the value of packetization-mode is equal to 2, the interleaved 626 mode MUST be used. The value of packetization-mode MUST be an 627 integer in the range of 0 to 2, inclusive. 629 sprop-interleaving-depth: This parameter MUST NOT be present when 630 packetization-mode is not present or the value of packetization- 631 mode is equal to 0 or 1. This parameter MUST be present when the 632 value of packetization-mode is equal to 2. 634 This parameter signals the properties of an RTP packet stream. It 635 specifies the maximum number of VCL NAL units that precede any VCL 636 NAL unit in the RTP packet stream in transmission order and follow 637 the VCL NAL unit in decoding order. Consequently, it is 638 guaranteed that receivers can reconstruct NAL unit decoding order 639 when the buffer size for NAL unit decoding order recovery is at 640 least the value of sprop-interleaving-depth + 1 in terms of VCL 641 NAL units.
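As an illustration of the definition above, the following Python sketch counts, for each VCL NAL unit, how many units precede it in transmission order yet follow it in decoding order, and takes the maximum. The function name and the input representation (a list of decoding-order positions, given in transmission order) are assumptions made for this example; the sketch also assumes every listed unit is a VCL NAL unit with a distinct decoding position.

```python
def interleaving_depth(decoding_positions):
    """Return the interleaving depth of a stream (hypothetical helper).

    decoding_positions[i] is the decoding-order position of the i-th
    VCL NAL unit in transmission order.
    """
    depth = 0
    for i, position in enumerate(decoding_positions):
        # Units sent earlier that must be decoded later than this one.
        count = sum(1 for earlier in decoding_positions[:i] if earlier > position)
        depth = max(depth, count)
    return depth


if __name__ == "__main__":
    # Made-up stream: units 2 and 3 are sent before units 0 and 1.
    depth = interleaving_depth([2, 3, 0, 1])
    print("sprop-interleaving-depth:", depth)
    print("buffer size in VCL NAL units:", depth + 1)
```

For the made-up stream in the usage example, two units are sent ahead of units they follow in decoding order, so the depth is 2 and a buffer of 3 VCL NAL units suffices to recover decoding order.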
643 The value of sprop-interleaving-depth MUST be an integer in the 644 range of 0 to 32767, inclusive. 646 sprop-deint-buf-req: This parameter MUST NOT be present when 647 packetization-mode is not present or the value of packetization- 648 mode is equal to 0 or 1. It MUST be present when the value of 649 packetization-mode is equal to 2. 651 sprop-deint-buf-req signals the required size of the de- 652 interleaving buffer for the RTP packet stream. The value of the 653 parameter MUST be greater than or equal to the maximum buffer 654 occupancy (in units of bytes) required in such a de-interleaving 655 buffer that is specified in section 7.2 of RFC YYYY [1]. It is 656 guaranteed that receivers can perform the de-interleaving of 657 interleaved NAL units into NAL unit decoding order, when the de- 658 interleaving buffer size is at least the value of sprop-deint-buf- 659 req in terms of bytes. 661 The value of sprop-deint-buf-req MUST be an integer in the range 662 of 0 to 4294967295, inclusive. 664 Informative note: sprop-deint-buf-req indicates the required 665 size of the de-interleaving buffer only. When network jitter 666 can occur, an appropriately sized jitter buffer has to be 667 provisioned for as well. 669 deint-buf-cap: This parameter signals the capabilities of a receiver 670 implementation and indicates the amount of de-interleaving buffer 671 space in units of bytes that the receiver has available for 672 reconstructing the NAL unit decoding order. A receiver is able to 673 handle any stream for which the value of the sprop-deint-buf-req 674 parameter is smaller than or equal to this parameter. 676 If the parameter is not present, then a value of 0 MUST be used 677 for deint-buf-cap. The value of deint-buf-cap MUST be an integer 678 in the range of 0 to 4294967295, inclusive. 680 Informative note: deint-buf-cap indicates the maximum possible 681 size of the de-interleaving buffer of the receiver only.
When 682 network jitter can occur, an appropriately sized jitter buffer 683 has to be provisioned for as well. 685 sprop-init-buf-time: This parameter MAY be used to signal the 686 properties of an RTP packet stream. The parameter MUST NOT be 687 present, if the value of packetization-mode is equal to 0 or 1. 689 The parameter signals the initial buffering time that a receiver 690 MUST wait before starting decoding to recover the NAL unit 691 decoding order from the transmission order. The parameter is the 692 maximum value of (decoding time of the NAL unit - transmission 693 time of a NAL unit), assuming reliable and instantaneous 694 transmission, the same timeline for transmission and decoding, and 695 that decoding starts when the first packet arrives. 697 An example of specifying the value of sprop-init-buf-time follows. 698 A NAL unit stream is sent in the following interleaved order, in 699 which the value corresponds to the decoding time and the 700 transmission order is from left to right: 702 0 2 1 3 5 4 6 8 7 ... 704 Assuming a steady transmission rate of NAL units, the transmission 705 times are: 707 0 1 2 3 4 5 6 7 8 ... 709 Subtracting the decoding time from the transmission time column- 710 wise results in the following series: 712 0 -1 1 0 -1 1 0 -1 1 ... 714 Thus, in terms of intervals of NAL unit transmission times, the 715 value of sprop-init-buf-time in this example is 1. The parameter 716 is coded as a non-negative base10 integer representation in clock 717 ticks of a 90-kHz clock. If the parameter is not present, then no 718 initial buffering time value is defined. Otherwise the value of 719 sprop-init-buf-time MUST be an integer in the range of 0 to 720 4294967295, inclusive. 
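The worked example above can be reproduced with a short Python sketch. It follows the arithmetic of the example (transmission time minus decoding time, column-wise) and then converts the result to 90-kHz clock ticks. The function names are hypothetical, and the 3000 ticks per NAL unit interval used in the usage example (one NAL unit every 1/30 second) is an assumed figure chosen only for illustration.

```python
def transmission_minus_decoding(decoding_times, transmission_times):
    """Column-wise difference used in the worked example."""
    return [t - d for d, t in zip(decoding_times, transmission_times)]


def init_buf_time_ticks(decoding_times, transmission_times, ticks_per_interval):
    """Return a candidate sprop-init-buf-time value in 90-kHz ticks.

    Hypothetical helper: times are expressed in NAL unit transmission
    intervals, and ticks_per_interval is the assumed interval length.
    """
    intervals = max(transmission_minus_decoding(decoding_times, transmission_times))
    ticks = intervals * ticks_per_interval
    if not 0 <= ticks <= 4294967295:
        raise ValueError("sprop-init-buf-time out of range")
    return ticks


if __name__ == "__main__":
    decoding = [0, 2, 1, 3, 5, 4, 6, 8, 7]   # interleaved order from the example
    transmission = list(range(9))            # steady transmission rate
    print(transmission_minus_decoding(decoding, transmission))
    print(init_buf_time_ticks(decoding, transmission, ticks_per_interval=3000))
```

With the sequence from the example, the maximum difference is one interval, which under the assumed interval length corresponds to 3000 ticks.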
722 In addition to the signaled sprop-init-buf-time, receivers SHOULD 723 take into account the transmission delay jitter buffering, 724 including buffering for the delay jitter caused by mixers, 725 translators, gateways, proxies, traffic-shapers, and other network 726 elements. 728 sprop-max-don-diff: This parameter MAY be used to signal the 729 properties of an RTP packet stream. It MUST NOT be used to signal 730 transmitter or receiver or codec capabilities. The parameter MUST 731 NOT be present if the value of packetization-mode is equal to 0 or 732 1. sprop-max-don-diff is an integer in the range of 0 to 32767, 733 inclusive. If sprop-max-don-diff is not present, the value of the 734 parameter is unspecified. sprop-max-don-diff is calculated as 735 follows: 737 sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)}, for any i and 738 any j>i, 740 where i and j indicate the index of the NAL unit in the 741 transmission order and AbsDON denotes a decoding order number of 742 the NAL unit that does not wrap around to 0 after 65535. In other 743 words, AbsDON is calculated as follows: Let m and n be consecutive 744 NAL units in transmission order. For the very first NAL unit in 745 transmission order (whose index is 0), AbsDON(0) = DON(0). For 746 other NAL units, AbsDON is calculated as follows: 748 If DON(m) == DON(n), AbsDON(n) = AbsDON(m) 750 If (DON(m) < DON(n) and DON(n) - DON(m) < 32768), 751 AbsDON(n) = AbsDON(m) + DON(n) - DON(m) 753 If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768), 754 AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n) 756 If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768), 757 AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n)) 759 If (DON(m) > DON(n) and DON(m) - DON(n) < 32768), 760 AbsDON(n) = AbsDON(m) - (DON(m) - DON(n)) 762 where DON(i) is the decoding order number of the NAL unit having 763 index i in the transmission order. The decoding order number is 764 specified in section 5.5 of RFC YYYY [1]. 
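The AbsDON rules above translate directly into code. The following Python sketch applies the five cases to a list of 16-bit DON values given in transmission order and then evaluates max{AbsDON(i) - AbsDON(j)} for j > i. The function names are hypothetical. Returning 0 when no NAL unit is sent ahead of an earlier one in decoding order is an assumption of this sketch, made because the parameter is restricted to the range 0 to 32767.

```python
def abs_dons(dons):
    """Unwrap 16-bit DON values (in transmission order) into AbsDON values."""
    result = [dons[0]]  # AbsDON(0) = DON(0)
    for m, n in zip(dons, dons[1:]):
        prev = result[-1]
        if m == n:
            nxt = prev
        elif m < n and n - m < 32768:
            nxt = prev + n - m
        elif m > n and m - n >= 32768:
            nxt = prev + 65536 - m + n          # forward step across the wrap
        elif m < n and n - m >= 32768:
            nxt = prev - (m + 65536 - n)        # backward step across the wrap
        else:  # m > n and m - n < 32768
            nxt = prev - (m - n)
        result.append(nxt)
    return result


def sprop_max_don_diff(dons):
    """Largest AbsDON(i) - AbsDON(j) over all j > i, floored at 0 (assumption)."""
    values = abs_dons(dons)
    best = 0
    running_max = values[0]
    for value in values[1:]:
        best = max(best, running_max - value)
        running_max = max(running_max, value)
    return best


if __name__ == "__main__":
    dons = [65534, 65535, 1, 0, 2]  # made-up stream that wraps past 65535
    print(abs_dons(dons))
    print(sprop_max_don_diff(dons))
```

Tracking the running maximum of earlier AbsDON values keeps the search linear in the number of NAL units rather than comparing every pair.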
766 Informative note: Receivers may use sprop-max-don-diff to 767 trigger which NAL units in the receiver buffer can be passed to 768 the decoder. 770 max-rcmd-nalu-size: This parameter MAY be used to signal the 771 capabilities of a receiver. The parameter MUST NOT be used for 772 any other purposes. The value of the parameter indicates the 773 largest NALU size in bytes that the receiver can handle 774 efficiently. The parameter value is a recommendation, not a 775 strict upper boundary. The sender MAY create larger NALUs but 776 must be aware that the handling of these may come at a higher cost 777 than NALUs conforming to the limitation. 779 The value of max-rcmd-nalu-size MUST be an integer in the range of 780 0 to 4294967295, inclusive. If this parameter is not specified, 781 no known limitation to the NALU size exists. Senders still have 782 to consider the MTU size available between the sender and the 783 receiver and SHOULD run MTU discovery for this purpose. 785 This parameter is motivated by, for example, an IP to H.223 video 786 telephony gateway, where NALUs smaller than the H.223 transport 787 data unit will be more efficient. A gateway may terminate IP; 788 thus, MTU discovery will normally not work beyond the gateway. 790 Informative note: Setting this parameter to a lower than 791 necessary value may have a negative impact. 793 sar-understood: This parameter MAY be used to indicate a receiver 794 capability and not anything else. The parameter indicates the 795 maximum value of aspect_ratio_idc (specified in H.264 [2]) smaller 796 than 255 that the receiver understands. Table E-1 of H.264 [2] 797 specifies aspect_ratio_idc equal to 0 as "unspecified", 1 to 16, 798 inclusive, as specific Sample Aspect Ratios (SARs), 17 to 254, 799 inclusive, as "reserved", and 255 as the Extended SAR, for which 800 SAR width and SAR height are explicitly signaled. 
Therefore, a 801 receiver with a decoder according to H.264 [2] understands 802 aspect_ratio_idc in the range of 1 to 16, inclusive, and 803 aspect_ratio_idc equal to 255, in the sense that the receiver 804 knows exactly what the SAR is. For such a receiver, the value of 805 sar-understood is 16. If in the future Table E-1 of H.264 [2] is 806 extended, e.g., such that the SAR for aspect_ratio_idc equal to 17 807 is specified, then for a receiver with a decoder that understands 808 the extension, the value of sar-understood is 17. For a receiver 809 with a decoder according to the 2003 version of H.264 [2], the 810 value of sar-understood is 13, as the minimum reserved 811 aspect_ratio_idc therein is 14. 813 When sar-understood is not present, the value MUST be inferred to 814 be equal to 13. 816 sar-supported: This parameter MAY be used to indicate a receiver 817 capability and not anything else. The value of this parameter is 818 an integer in the range of 1 to sar-understood, inclusive, or equal 819 to 255. The value of sar-supported equal to N smaller than 255 820 indicates that the receiver supports all the SARs corresponding to 821 H.264 aspect_ratio_idc values (see Table E-1 of H.264 [2]) in the 822 range from 1 to N, inclusive, without geometric distortion. The 823 value of sar-supported equal to 255 indicates that the receiver 824 supports all sample aspect ratios which are expressible using two 825 16-bit integer values as the numerator and denominator, i.e., 826 those that are expressible using the H.264 aspect_ratio_idc value 827 of 255 (Extended_SAR, see Table E-1 of H.264 [2]), without 828 geometric distortion. 830 H.264 compliant encoders SHOULD NOT send an aspect_ratio_idc equal 831 to 0, or an aspect_ratio_idc larger than sar-understood and 832 smaller than 255. H.264 compliant encoders SHOULD send an 833 aspect_ratio_idc that the receiver is able to display without 834 geometrical distortion.
However, H.264 compliant encoders MAY 835 choose to send pictures using any SAR. 837 Note that the actual sample aspect ratio or extended sample aspect 838 ratio, when present, of the stream is conveyed in the Video 839 Usability Information (VUI) part of the sequence parameter set. 841 Encoding considerations: This type is only defined for transfer via 842 RTP (RFC 3550) and is framed and binary, see section 4.8 in 843 RFC4288. 845 Security considerations: See section 9 of RFC XXXX. 847 Interoperability considerations: None 849 Published specification: RFC XXXX and its reference section. 851 Applications that use this media type: Video streaming and 852 conferencing applications. 854 Additional information: None 856 Magic number(s): 858 File extension(s): 860 Macintosh file type code(s): 862 Person & email address to contact for further information: 863 Tom Kristensen , 865 Intended usage: COMMON 867 Restrictions on usage: This type depends on RTP framing, and hence 868 is only defined for transfer via RTP, ref RFC3550. Transport 869 within other framing protocols is not defined at this time. 871 Author: Tom Kristensen 873 Change controller: IETF Audio/Video Transport working group 874 delegated from the IESG. 876 7. Mapping to SDP 878 The mapping of the above defined payload format media subtype and its 879 parameters SHALL be done according to Section 3 of RFC 4855 [10]. 881 An example of the fmtp attribute in the media representation of a 882 level 2.2 bitstream is as follows: 884 a=fmtp:97 profile-level-id=008016 886 7.1. Offer/Answer Considerations 888 When H264-RCDO is offered over RTP using SDP in an Offer/Answer model 889 [5] for unicast and multicast usage, the limitations and rules 890 described in Section 8.2.2 of RFC YYYY [1] apply. Note that the 891 profile_idc byte of the H264-RCDO profile-level-id parameter can only 892 take the value of 0 (no profile). 
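To make the profile-level-id=008016 example concrete, the following Python sketch splits the value into its three bytes and checks the H264-RCDO constraint that the profile_idc byte is 0. The function names are hypothetical, the second byte is returned uninterpreted, and the level number is derived on the assumption that level_idc is ten times the level number, which matches the level 2.2 example (0x16 = 22).

```python
def parse_profile_level_id(value):
    """Split a six-hex-digit profile-level-id into its three bytes.

    Hypothetical helper returning (profile_idc, second_byte, level_idc).
    """
    if len(value) != 6:
        raise ValueError("profile-level-id must be six hex digits")
    profile_idc, second_byte, level_idc = bytes.fromhex(value)
    return profile_idc, second_byte, level_idc


def rcdo_level(value):
    """Return the level of an H264-RCDO profile-level-id as a string."""
    profile_idc, _second_byte, level_idc = parse_profile_level_id(value)
    if profile_idc != 0:
        raise ValueError("H264-RCDO requires profile_idc equal to 0")
    # Assumption: level_idc is ten times the level number.
    return f"{level_idc // 10}.{level_idc % 10}"


if __name__ == "__main__":
    print(parse_profile_level_id("008016"))
    print(rcdo_level("008016"))
```

A value such as 428016, whose first byte is not 0, is rejected by rcdo_level because it cannot be an H264-RCDO profile-level-id.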
894 For interoperability with systems not supporting H264-RCDO, it is 895 RECOMMENDED to offer the H264 media subtype as well. As specified in 896 RFC 3264 [5], listing the payload number for H264-RCDO before H264 in 897 the format list on the "m=" line signals that H264-RCDO is preferred 898 over H264. An example where this scheme is applied: 900 m=video 5555 RTP/AVP 97 98 901 a=rtpmap:97 H264-RCDO/90000 902 a=fmtp:97 profile-level-id=008016;max-mbps=42000;max-smbps=323500 903 a=rtpmap:98 H264/90000 904 a=fmtp:98 profile-level-id=428016;max-mbps=35000;max-smbps=323500 906 7.2. Declarative SDP Considerations 908 When H264-RCDO over RTP is offered with SDP in a declarative style, 909 as in the Real Time Streaming Protocol (RTSP) [11] or the Session 910 Announcement Protocol (SAP) [12], the considerations in Section 8.2.3 911 of RFC YYYY [1] apply. Note that the profile_idc byte of the H264- 912 RCDO profile-level-id parameter can only take the value of 0 (no 913 profile). 915 8. IANA Considerations 917 This document requests that IANA register H264-RCDO as specified in 918 Section 6.1. The media subtype is also requested to be added 919 to the IANA registry for "RTP Payload Format MIME types" 920 (http://www.iana.org/assignments/rtp-parameters). 922 9. Security Considerations 924 RTP packets using the payload format defined in this specification 925 are subject to the security considerations discussed in the RTP 926 specification [6], and in any applicable RTP profile. Refer also to 927 the security considerations of the RTP Payload Format for H.264 Video 928 specification in RFC YYYY [1]. No additional security considerations 929 are introduced by this specification. 931 10. Acknowledgements 933 The authors would like to acknowledge Gisle Bjoentegaard and Arild 934 Fuldseth for their technical contribution to the specification. In 935 the final phases, Roni Even provided a helpful review. 937 11. References 939 11.1.
Normative References 941 [1] Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP Payload 942 Format for H.264 Video", draft-ietf-avt-rtp-rfc3984bis-12 (work 943 in progress), October 2010. 945 [2] International Telecommunications Union, "Advanced video coding 946 for generic audiovisual services", ITU-T Recommendation H.264, 947 November 2007. 949 [3] International Telecommunications Union, "Extended video 950 procedures and control signals for H.300-series terminals", 951 ITU-T Recommendation H.241, May 2006. 953 [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement 954 Levels", BCP 14, RFC 2119, March 1997. 956 [5] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with 957 Session Description Protocol (SDP)", RFC 3264, June 2002. 959 [6] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, 960 "RTP: A Transport Protocol for Real-Time Applications", STD 64, 961 RFC 3550, July 2003. 963 [7] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video 964 Conferences with Minimal Control", STD 65, RFC 3551, July 2003. 966 [8] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 967 Description Protocol", RFC 4566, July 2006. 969 [9] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", 970 RFC 4648, October 2006. 972 [10] Casner, S., "Media Type Registration of RTP Payload Formats", 973 RFC 4855, February 2007. 975 11.2. Informative references 977 [11] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming 978 Protocol (RTSP)", RFC 2326, April 1998. 980 [12] Handley, M., Perkins, C., and E. Whelan, "Session Announcement 981 Protocol", RFC 2974, October 2000. 983 [13] Freed, N. and J. Klensin, "Media Type Specifications and 984 Registration Procedures", BCP 13, RFC 4288, December 2005. 986 [14] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media 987 Attributes in the Session Description Protocol (SDP)", 988 RFC 5576, June 2009. 
990 Authors' Addresses 992 Tom Kristensen 993 TANDBERG 994 Philip Pedersens vei 22 995 N-1366 Lysaker 996 Norway 998 Phone: +47 67125125 999 Email: tom.kristensen@tandberg.com, tomkri@ifi.uio.no 1000 URI: http://www.tandberg.com 1001 Patrick Luthi 1002 TANDBERG 1003 Philip Pedersens vei 22 1004 N-1366 Lysaker 1005 Norway 1007 Email: patrick.luthi@tandberg.com 1008 URI: http://www.tandberg.com