Audio/Video Transport WG                                      Y.-K. Wang
Internet Draft                                                     Nokia
Intended status: Standards track                                 R. Even
Expires: April 2009                                        Self-employed
                                                           T. Kristensen
                                                                Tandberg
                                                         October 6, 2008

                   RTP Payload Format for H.264 Video
                  draft-ietf-avt-rtp-rfc3984bis-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html

   This Internet-Draft will expire on April 6, 2009.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   This memo describes an RTP Payload format for the ITU-T
   Recommendation H.264 video codec and the technically identical
   ISO/IEC International Standard 14496-10 video codec. The RTP payload
   format allows for packetization of one or more Network Abstraction
   Layer Units (NALUs), produced by an H.264 video encoder, in each RTP
   payload. The payload format has wide applicability, as it supports
   applications from simple low bit-rate conversational usage, to
   Internet video streaming with interleaved transmission, to high bit-
   rate video-on-demand.

   This memo intends to obsolete RFC 3984.

Table of Contents

   1. Introduction
      1.1. The H.264 Codec
      1.2. Parameter Set Concept
      1.3. Network Abstraction Layer Unit Types
   2. Conventions
   3. Scope
   4. Definitions and Abbreviations
      4.1. Definitions
      4.2. Abbreviations
   5. RTP Payload Format
      5.1. RTP Header Usage
      5.2. Payload Structures
      5.3. NAL Unit Header Usage
      5.4. Packetization Modes
      5.5. Decoding Order Number (DON)
      5.6. Single NAL Unit Packet
      5.7. Aggregation Packets
         5.7.1. Single-Time Aggregation Packet
         5.7.2. Multi-Time Aggregation Packets (MTAPs)
         5.7.3. Fragmentation Units (FUs)
   6. Packetization Rules
      6.1. Common Packetization Rules
      6.2. Single NAL Unit Mode
      6.3. Non-Interleaved Mode
      6.4. Interleaved Mode
   7. De-Packetization Process
      7.1. Single NAL Unit and Non-Interleaved Mode
      7.2. Interleaved Mode
         7.2.1. Size of the Deinterleaving Buffer
         7.2.2. Deinterleaving Process
      7.3. Additional De-Packetization Guidelines
   8. Payload Format Parameters
      8.1. Media Type Registration
      8.2. SDP Parameters
         8.2.1. Mapping of Payload Type Parameters to SDP
         8.2.2. Usage with the SDP Offer/Answer Model
         8.2.3. Usage in Declarative Session Descriptions
      8.3. Examples
      8.4. Parameter Set Considerations
      8.5. Decoder Refresh Point Procedure using In-Band Transport of
           Parameter Sets (Informative)
         8.5.1. IDR Procedure to Respond to a Request for a Decoder
                Refresh Point
         8.5.2. Gradual Recovery Procedure to Respond to a Request for a
                Decoder Refresh Point
   9. Security Considerations
   10. Congestion Control
   11. IANA Consideration
   12. Informative Appendix: Application Examples
      12.1. Video Telephony according to ITU-T Recommendation H.241
            Annex A
      12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
            Aggregation
      12.3. Video Telephony, Interleaved Packetization Using NAL Unit
            Aggregation
      12.4. Video Telephony with Data Partitioning
      12.5. Video Telephony or Streaming with FUs and Forward Error
            Correction
      12.6. Low Bit-Rate Streaming
      12.7. Robust Packet Scheduling in Video Streaming
   13. Informative Appendix: Rationale for Decoding Order Number
      13.1. Introduction
      13.2. Example of Multi-Picture Slice Interleaving
      13.3. Example of Robust Packet Scheduling
      13.4. Robust Transmission Scheduling of Redundant Coded Slices
      13.5. Remarks on Other Design Possibilities
   14. Acknowledgements
   15. References
      15.1. Normative References
      15.2. Informative References
   Authors' Addresses
   Intellectual Property Statement
   Disclaimer of Validity
   Acknowledgement
   16. Backward compatibility to RFC 3984
   17. Changes from RFC 3984
      17.1. Technical changes
      17.2. Editorial changes
   18. Open issues
   19. Changes Log

1. Introduction

   This memo intends to obsolete RFC 3984. [Ed. (YkW): Add a brief
   summary of the changes to RFC 3984.]

1.1. The H.264 Codec

   This memo specifies an RTP payload specification for the video coding
   standard known as ITU-T Recommendation H.264 [1] and ISO/IEC
   International Standard 14496 Part 10 [2] (both also known as Advanced
   Video Coding, or AVC). In this memo the H.264 acronym is used for
   the codec and the standard, but the memo is equally applicable to the
   ISO/IEC counterpart of the coding standard.

   The H.264 video codec has a very broad application range that covers
   all forms of digital compressed video, from low bit-rate Internet
   streaming applications to HDTV broadcast and Digital Cinema
   applications with nearly lossless coding. Compared to the current
   state of technology, the overall performance of H.264 is such that
   bit rate savings of 50% or more are reported. Digital Satellite TV
   quality, for example, was reported to be achievable at 1.5 Mbit/s,
   compared to the current operation point of MPEG-2 video at around 3.5
   Mbit/s [9].

   The codec specification [1] itself distinguishes conceptually between
   a video coding layer (VCL) and a network abstraction layer (NAL).
   The VCL contains the signal processing functionality of the codec;
   mechanisms such as transform, quantization, and motion compensated
   prediction; and a loop filter. It follows the general concept of
   most of today's video codecs, a macroblock-based coder that uses
   inter picture prediction with motion compensation and transform
   coding of the residual signal.
   The VCL encoder outputs slices: a bit
   string that contains the macroblock data of an integer number of
   macroblocks, and the information of the slice header (containing the
   spatial address of the first macroblock in the slice, the initial
   quantization parameter, and similar information). Macroblocks in
   slices are arranged in scan order unless a different macroblock
   allocation is specified, by using the so-called Flexible Macroblock
   Ordering syntax. In-picture prediction is used only within a slice.
   More information is provided in [9].

   The Network Abstraction Layer (NAL) encoder encapsulates the slice
   output of the VCL encoder into Network Abstraction Layer Units (NAL
   units), which are suitable for transmission over packet networks or
   use in packet oriented multiplex environments. Annex B of H.264
   defines an encapsulation process to transmit such NAL units over
   byte-stream oriented networks. In the scope of this memo, Annex B is
   not relevant.

   Internally, the NAL uses NAL units. A NAL unit consists of a one-
   byte header and the payload byte string. The header indicates the
   type of the NAL unit, the (potential) presence of bit errors or
   syntax violations in the NAL unit payload, and information regarding
   the relative importance of the NAL unit for the decoding process.
   This RTP payload specification is designed to be unaware of the bit
   string in the NAL unit payload.

   One of the main properties of H.264 is the complete decoupling of the
   transmission time, the decoding time, and the sampling or
   presentation time of slices and pictures. The decoding process
   specified in H.264 is unaware of time, and the H.264 syntax does not
   carry information such as the number of skipped frames (as is common
   in the form of the Temporal Reference in earlier video compression
   standards).
   Also, there are NAL units that affect many pictures and
   that are, therefore, inherently timeless. For this reason, the
   handling of the RTP timestamp requires some special considerations
   for NAL units for which the sampling or presentation time is not
   defined or, at transmission time, unknown.

1.2. Parameter Set Concept

   One very fundamental design concept of H.264 is to generate self-
   contained packets, to make mechanisms such as the header duplication
   of RFC 2429 [10] or MPEG-4's Header Extension Code (HEC) [11]
   unnecessary. This was achieved by decoupling information relevant to
   more than one slice from the media stream. This higher layer meta
   information should be sent reliably, asynchronously, and in advance
   from the RTP packet stream that contains the slice packets.
   (Provisions for sending this information in-band are also available
   for applications that do not have an out-of-band transport channel
   appropriate for the purpose.) The combination of the higher-level
   parameters is called a parameter set. The H.264 specification
   includes two types of parameter sets: sequence parameter set and
   picture parameter set. An active sequence parameter set remains
   unchanged throughout a coded video sequence, and an active picture
   parameter set remains unchanged within a coded picture. The sequence
   and picture parameter set structures contain information such as
   picture size, optional coding modes employed, and macroblock to slice
   group map.

   To be able to change picture parameters (such as the picture size)
   without having to transmit parameter set updates synchronously to the
   slice packet stream, the encoder and decoder can maintain a list of
   more than one sequence and picture parameter set. Each slice header
   contains a codeword that indicates the sequence and picture parameter
   set to be used.

   This mechanism allows the decoupling of the transmission of parameter
   sets from the packet stream, and the transmission of them by external
   means (e.g., as a side effect of the capability exchange), or through
   a (reliable or unreliable) control protocol. It may even be possible
   that they are never transmitted but are fixed by an application
   design specification.

1.3. Network Abstraction Layer Unit Types

   Tutorial information on the NAL design can be found in [12], [13],
   and [14].

   All NAL units consist of a single NAL unit type octet, which also co-
   serves as the payload header of this RTP payload format. The payload
   of a NAL unit follows immediately.

   The syntax and semantics of the NAL unit type octet are specified in
   [1], but the essential properties of the NAL unit type octet are
   summarized below. The NAL unit type octet has the following format:

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |F|NRI|  Type   |
      +---------------+

   The semantics of the components of the NAL unit type octet, as
   specified in the H.264 specification, are described briefly below.

   F: 1 bit
      forbidden_zero_bit. The H.264 specification declares a value of
      1 as a syntax violation.

   NRI: 2 bits
      nal_ref_idc. A value of 00 indicates that the content of the NAL
      unit is not used to reconstruct reference pictures for inter
      picture prediction. Such NAL units can be discarded without
      risking the integrity of the reference pictures. Values greater
      than 00 indicate that the decoding of the NAL unit is required to
      maintain the integrity of the reference pictures.

   Type: 5 bits
      nal_unit_type. This component specifies the NAL unit payload
      type as defined in Table 7-1 of [1], and later within this memo.
      For a reference of all currently defined NAL unit types and their
      semantics, please refer to section 7.4.1 in [1].
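
   As an informal illustration of the octet layout above (not part of
   the specification), the three fields can be extracted with shifts
   and masks. Bit 0 of the diagram is the most significant bit of the
   octet. The names NalHeader, parse_nal_header, and build_nal_header
   are hypothetical and chosen only for this sketch.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int              # forbidden_zero_bit, 1 bit
    nri: int            # nal_ref_idc, 2 bits
    nal_unit_type: int  # nal_unit_type, 5 bits


def parse_nal_header(octet: int) -> NalHeader:
    """Split a NAL unit type octet into its F, NRI, and Type fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("NAL unit type octet must fit in one byte")
    return NalHeader(
        f=octet >> 7,                # bit 0 of the diagram (MSB)
        nri=(octet >> 5) & 0x03,     # bits 1-2
        nal_unit_type=octet & 0x1F,  # bits 3-7
    )


def build_nal_header(f: int, nri: int, nal_unit_type: int) -> int:
    """Assemble a NAL unit type octet from its three fields."""
    if f not in (0, 1) or not 0 <= nri <= 3 or not 0 <= nal_unit_type <= 31:
        raise ValueError("field out of range")
    return (f << 7) | (nri << 5) | nal_unit_type
```

   For instance, the octet 0x67 (binary 0110 0111) splits into F = 0,
   NRI = 3 (binary 11), and Type = 7.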

   This memo introduces new NAL unit types, which are presented in
   section 5.2. The NAL unit types defined in this memo are marked as
   unspecified in [1]. Moreover, this specification extends the
   semantics of F and NRI as described in section 5.3.

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [4].

   This specification uses the notion of setting and clearing a bit when
   bit fields are handled. Setting a bit is the same as assigning that
   bit the value of 1 (On). Clearing a bit is the same as assigning
   that bit the value of 0 (Off).

3. Scope

   This payload specification can only be used to carry the "naked"
   H.264 NAL unit stream over RTP, and not the bitstream format
   discussed in Annex B of H.264. Likely, the first applications of
   this specification will be in the conversational multimedia field,
   video telephony or video conferencing, but the payload format also
   covers other applications, such as Internet streaming and TV over IP.

4. Definitions and Abbreviations

4.1. Definitions

   This document uses the definitions of [1]. The following terms,
   defined in [1], are summarized for convenience:

      access unit: A set of NAL units always containing a primary coded
         picture. In addition to the primary coded picture, an access
         unit may also contain one or more redundant coded pictures or
         other NAL units not containing slices or slice data partitions of
         a coded picture. The decoding of an access unit always results
         in a decoded picture.

      coded video sequence: A sequence of access units that consists,
         in decoding order, of an instantaneous decoding refresh (IDR)
         access unit followed by zero or more non-IDR access units
         including all subsequent access units up to but not including any
         subsequent IDR access unit.

      IDR access unit: An access unit in which the primary coded
         picture is an IDR picture.

      IDR picture: A coded picture containing only slices with I or SI
         slice types that causes a "reset" in the decoding process. After
         the decoding of an IDR picture, all following coded pictures in
         decoding order can be decoded without inter prediction from any
         picture decoded prior to the IDR picture.

      primary coded picture: The coded representation of a picture to
         be used by the decoding process for a bitstream conforming to
         H.264. The primary coded picture contains all macroblocks of the
         picture.

      redundant coded picture: A coded representation of a picture or a
         part of a picture. The content of a redundant coded picture
         shall not be used by the decoding process for a bitstream
         conforming to H.264. The content of a redundant coded picture
         may be used by the decoding process for a bitstream that contains
         errors or losses.

      VCL NAL unit: A collective term used to refer to coded slice and
         coded data partition NAL units.

   In addition, the following definitions apply:

      decoding order number (DON): A field in the payload structure, or
         a derived variable indicating NAL unit decoding order. Values of
         DON are in the range of 0 to 65535, inclusive. After reaching
         the maximum value, the value of DON wraps around to 0.

      NAL unit decoding order: A NAL unit order that conforms to the
         constraints on NAL unit order given in section 7.4.1.2 in [1].

      NALU-time: The value that the RTP timestamp would have if the NAL
         unit were transported in its own RTP packet.

      transmission order: The order of packets in ascending RTP
         sequence number order (in modulo arithmetic). Within an
         aggregation packet, the NAL unit transmission order is the same
         as the order of appearance of NAL units in the packet.

      media aware network element (MANE): A network element, such as a
         middlebox or application layer gateway, that is capable of
         parsing certain aspects of the RTP payload headers or the RTP
         payload and reacting to the contents.

         Informative note: The concept of a MANE goes beyond normal
         routers or gateways in that a MANE has to be aware of the
         signaling (e.g., to learn about the payload type mappings of
         the media streams), and in that it has to be trusted when
         working with SRTP. The advantage of using MANEs is that they
         allow packets to be dropped according to the needs of the
         media coding. For example, if a MANE has to drop packets due
         to congestion on a certain link, it can identify those packets
         whose dropping has the smallest negative impact on the user
         experience and remove them in order to remove the congestion
         and/or keep the delay low.

      static macroblock: A certain amount of macroblocks in the video
         stream can be defined as static, as defined in section 8.3.2.8 in
         [3]. Static macroblocks free up additional processing cycles for
         the handling of non-static macroblocks. Based on a given amount
         of video processing resources and a given resolution, a higher
         number of static macroblocks enables a correspondingly higher
         frame rate.

4.2. Abbreviations

      DON:        Decoding Order Number
      DONB:       Decoding Order Number Base
      DOND:       Decoding Order Number Difference
      FEC:        Forward Error Correction
      FU:         Fragmentation Unit
      IDR:        Instantaneous Decoding Refresh
      IEC:        International Electrotechnical Commission
      ISO:        International Organization for Standardization
      ITU-T:      International Telecommunication Union,
                  Telecommunication Standardization Sector
      MANE:       Media Aware Network Element
      MTAP:       Multi-Time Aggregation Packet
      MTAP16:     MTAP with 16-bit timestamp offset
      MTAP24:     MTAP with 24-bit timestamp offset
      NAL:        Network Abstraction Layer
      NALU:       NAL Unit
      SAR:        Sample Aspect Ratio
      SEI:        Supplemental Enhancement Information
      STAP:       Single-Time Aggregation Packet
      STAP-A:     STAP type A
      STAP-B:     STAP type B
      TS:         Timestamp
      VCL:        Video Coding Layer
      VUI:        Video Usability Information

5. RTP Payload Format

5.1. RTP Header Usage

   The format of the RTP header is specified in RFC 3550 [5] and
   reprinted in Figure 1 for convenience. This payload format uses the
   fields of the header in a manner consistent with that specification.

   When one NAL unit is encapsulated per RTP packet, the RECOMMENDED RTP
   payload format is specified in section 5.6. The RTP payload (and the
   settings for some RTP header bits) for aggregation packets and
   fragmentation units are specified in sections 5.7 and 5.8,
   respectively.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 1 RTP header according to RFC 3550

   The RTP header information to be set according to this RTP payload
   format is set as follows:

   Marker bit (M): 1 bit
      Set for the very last packet of the access unit indicated by the
      RTP timestamp, in line with the normal use of the M bit in video
      formats, to allow efficient playout buffer handling. For
      aggregation packets (STAP and MTAP), the marker bit in the RTP
      header MUST be set to the value that the marker bit of the last
      NAL unit of the aggregation packet would have been if it were
      transported in its own RTP packet. Decoders MAY use this bit as
      an early indication of the last packet of an access unit, but
      MUST NOT rely on this property.

         Informative note: Only one M bit is associated with an
         aggregation packet carrying multiple NAL units. Thus, if a
         gateway has re-packetized an aggregation packet into several
         packets, it cannot reliably set the M bit of those packets.

   Payload type (PT): 7 bits
      The assignment of an RTP payload type for this new packet format
      is outside the scope of this document and will not be specified
      here. The assignment of a payload type has to be performed
      either through the profile used or in a dynamic way.

   Sequence number (SN): 16 bits
      Set and used in accordance with RFC 3550. For the single NALU
      and non-interleaved packetization mode, the sequence number is
      used to determine decoding order for the NALU.

   Timestamp: 32 bits
      The RTP timestamp is set to the sampling timestamp of the
      content. A 90 kHz clock rate MUST be used.

      If the NAL unit has no timing properties of its own (e.g.,
      parameter set and SEI NAL units), the RTP timestamp is set to the
      RTP timestamp of the primary coded picture of the access unit in
      which the NAL unit is included, according to section 7.4.1.2 of
      [1].

      The setting of the RTP Timestamp for MTAPs is defined in section
      5.7.2.

      Receivers SHOULD ignore any picture timing SEI messages included
      in access units that have only one display timestamp. Instead,
      receivers SHOULD use the RTP timestamp for synchronizing the
      display process.

      RTP senders SHOULD NOT transmit picture timing SEI messages for
      pictures that are not supposed to be displayed as multiple
      fields.

      If one access unit has more than one display timestamp carried in
      a picture timing SEI message, then the information in the SEI
      message SHOULD be treated as relative to the RTP timestamp, with
      the earliest event occurring at the time given by the RTP
      timestamp, and subsequent events later, as given by the
      difference in SEI message picture timing values. Let tSEI1,
      tSEI2, ..., tSEIn be the display timestamps carried in the SEI
      message of an access unit, where tSEI1 is the earliest of all
      such timestamps. Let tmadjst() be a function that adjusts the
      SEI message time scale to a 90-kHz time scale. Let TS be the
      RTP timestamp. Then, the display time for the event associated
      with tSEI1 is TS. The display time for the event with tSEIx,
      where x is in [2..n], is TS + tmadjst (tSEIx - tSEI1).

         Informative note: Displaying coded frames as fields is needed
         commonly in an operation known as 3:2 pulldown, in which film
         content that consists of coded frames is displayed on a
         display using interlaced scanning. The picture timing SEI
         message enables carriage of multiple timestamps for the same
         coded picture, and therefore the 3:2 pulldown process is
         perfectly controlled. The picture timing SEI message
         mechanism is necessary because only one timestamp per coded
         frame can be conveyed in the RTP timestamp.
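
      The display time rule above, TS + tmadjst (tSEIx - tSEI1), can
      be sketched informally as follows. This is an illustration under
      stated assumptions, not a normative procedure: the memo does not
      define how tmadjst() is implemented, so the sketch assumes the
      SEI timestamps are integer ticks of a known clock rate (the
      hypothetical parameter sei_time_scale_hz), rounds to the nearest
      90 kHz tick, and wraps results to the 32-bit RTP timestamp
      field.

```python
from typing import List, Sequence

RTP_CLOCK_HZ = 90_000  # clock rate required by this payload format


def tmadjst(delta_ticks: int, sei_time_scale_hz: int) -> int:
    """Rescale an SEI tick difference to 90 kHz ticks.

    Assumption: rounding to the nearest tick; the memo leaves this open.
    """
    if sei_time_scale_hz <= 0:
        raise ValueError("time scale must be positive")
    return (delta_ticks * RTP_CLOCK_HZ + sei_time_scale_hz // 2) // sei_time_scale_hz


def display_times(ts: int, sei_timestamps: Sequence[int],
                  sei_time_scale_hz: int) -> List[int]:
    """Return one display time per SEI timestamp, in input order.

    The earliest SEI timestamp (tSEI1) maps to the RTP timestamp TS;
    every other one is offset by its rescaled distance from tSEI1.
    """
    t_sei1 = min(sei_timestamps)
    return [
        (ts + tmadjst(t - t_sei1, sei_time_scale_hz)) & 0xFFFFFFFF
        for t in sei_timestamps
    ]
```

      With TS = 900000 and two SEI timestamps 1000 and 1500 on an
      assumed 30000 Hz time scale, the second event is 500 ticks
      (1/60 s) later, which is 1500 ticks at 90 kHz, so the display
      times are 900000 and 901500.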

         Informative note: Because H.264 allows the decoding order to
         be different from the display order, values of RTP timestamps
         may not be monotonically non-decreasing as a function of RTP
         sequence numbers. Furthermore, the value for interarrival
         jitter reported in the RTCP reports may not be a trustworthy
         indication of the network performance, as the calculation
         rules for interarrival jitter (section 6.4.1 of RFC 3550)
         assume that the RTP timestamp of a packet is directly
         proportional to its transmission time.

5.2. Payload Structures

   The payload format defines three different basic payload structures.
   A receiver can identify the payload structure by the first byte of
   the RTP packet payload, which co-serves as the RTP payload header
   and, in some cases, as the first byte of the payload. This byte is
   always structured as a NAL unit header. The NAL unit type field
   indicates which structure is present. The possible structures are as
   follows:

   Single NAL Unit Packet: Contains only a single NAL unit in the
   payload. The NAL header type field will be equal to the original NAL
   unit type; i.e., in the range of 1 to 23, inclusive. Specified in
   section 5.6.

   Aggregation Packet: Packet type used to aggregate multiple NAL units
   into a single RTP payload. This packet exists in four versions, the
   Single-Time Aggregation Packet type A (STAP-A), the Single-Time
   Aggregation Packet type B (STAP-B), Multi-Time Aggregation Packet
   (MTAP) with 16-bit offset (MTAP16), and Multi-Time Aggregation Packet
   (MTAP) with 24-bit offset (MTAP24). The NAL unit type numbers
   assigned for STAP-A, STAP-B, MTAP16, and MTAP24 are 24, 25, 26, and
   27, respectively. Specified in section 5.7.

   Fragmentation Unit: Used to fragment a single NAL unit over multiple
   RTP packets. Exists in two versions, FU-A and FU-B, identified
   with the NAL unit type numbers 28 and 29, respectively.
Specified in 563 section 5.8. 565 Informative note: This specification does not limit the size of 566 NAL units encapsulated in single NAL unit packets and 567 fragmentation units. The maximum size of a NAL unit encapsulated 568 in any aggregation packet is 65535 bytes. 570 Table 1 summarizes NAL unit types and the corresponding RTP packet 571 types when each of these NAL units is used directly as a packet payload, 572 and where the types are described in this memo. 574 Table 1. Summary of NAL unit types and the corresponding packet 575 types 577 NAL Unit Packet Packet Type Name Section 578 Type Type 579 --------------------------------------------------------- 580 0 reserved - 581 1-23 NAL unit Single NAL unit packet 5.6 582 24 STAP-A Single-time aggregation packet 5.7.1 583 25 STAP-B Single-time aggregation packet 5.7.1 584 26 MTAP16 Multi-time aggregation packet 5.7.2 585 27 MTAP24 Multi-time aggregation packet 5.7.2 586 28 FU-A Fragmentation unit 5.8 587 29 FU-B Fragmentation unit 5.8 588 30-31 reserved - 590 5.3. NAL Unit Header Usage 592 The structure and semantics of the NAL unit header were introduced in 593 section 1.3. For convenience, the format of the NAL unit header is 594 reprinted below: 596 +---------------+ 597 |0|1|2|3|4|5|6|7| 598 +-+-+-+-+-+-+-+-+ 599 |F|NRI| Type | 600 +---------------+ 602 This section specifies the semantics of F and NRI according to this 603 specification. 605 F: 1 bit 606 forbidden_zero_bit. A value of 0 indicates that the NAL unit 607 type octet and payload should not contain bit errors or other 608 syntax violations. A value of 1 indicates that the NAL unit type 609 octet and payload may contain bit errors or other syntax 610 violations. 612 MANEs SHOULD set the F bit to indicate detected bit errors in the 613 NAL unit. The H.264 specification requires that the F bit is 614 equal to 0.
When the F bit is set, the decoder is advised that 615 bit errors or any other syntax violations may be present in the 616 payload or in the NAL unit type octet. The simplest decoder 617 reaction to a NAL unit in which the F bit is equal to 1 is to 618 discard such a NAL unit and to conceal the lost data in the 619 discarded NAL unit. 621 NRI: 2 bits 622 nal_ref_idc. The semantics of value 00 and a non-zero value 623 remain unchanged from the H.264 specification. In other words, a 624 value of 00 indicates that the content of the NAL unit is not 625 used to reconstruct reference pictures for inter picture 626 prediction. Such NAL units can be discarded without risking the 627 integrity of the reference pictures. Values greater than 00 628 indicate that the decoding of the NAL unit is required to 629 maintain the integrity of the reference pictures. 631 In addition to the specification above, according to this RTP 632 payload specification, values of NRI indicate the relative 633 transport priority, as determined by the encoder. MANEs can use 634 this information to protect more important NAL units better than 635 they do less important NAL units. The highest transport priority 636 is 11, followed by 10, and then by 01; finally, 00 is the lowest. 638 Informative note: Any non-zero value of NRI is handled 639 identically in H.264 decoders. Therefore, receivers need not 640 manipulate the value of NRI when passing NAL units to the 641 decoder. 643 An H.264 encoder MUST set the value of NRI according to the H.264 644 specification (subclause 7.4.1) when the value of nal_unit_type 645 is in the range of 1 to 12, inclusive. In particular, the H.264 646 specification requires that the value of NRI SHALL be equal to 0 647 for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or 648 12. 
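Informative example: The header layout reprinted in this section and the type ranges of Table 1 can be exercised with a short Python sketch. The function names parse_nal_header and payload_structure are hypothetical and chosen only for this illustration; bit 0 in the figure is taken to be the most significant bit of the octet, as is conventional in IETF packet diagrams.

```python
def parse_nal_header(octet):
    """Split the first payload octet into (F, NRI, Type)."""
    f = (octet >> 7) & 0x01        # forbidden_zero_bit
    nri = (octet >> 5) & 0x03      # nal_ref_idc
    nal_type = octet & 0x1F        # NAL unit type / packet type
    return f, nri, nal_type


# Packet types 24..29 as listed in Table 1.
_PACKET_TYPES = {24: "STAP-A", 25: "STAP-B", 26: "MTAP16",
                 27: "MTAP24", 28: "FU-A", 29: "FU-B"}


def payload_structure(first_octet):
    """Name the payload structure indicated by the first payload octet."""
    _, _, nal_type = parse_nal_header(first_octet)
    if 1 <= nal_type <= 23:
        return "single NAL unit packet"
    # Types 0, 30, and 31 are reserved in Table 1.
    return _PACKET_TYPES.get(nal_type, "reserved")
```

A receiver would typically call payload_structure() on the first octet of each RTP payload before choosing a depacketization path.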
650 For NAL units having nal_unit_type equal to 7 or 8 (indicating a 651 sequence parameter set or a picture parameter set, respectively), 652 an H.264 encoder SHOULD set the value of NRI to 11 (in binary 653 format). For coded slice NAL units of a primary coded picture 654 having nal_unit_type equal to 5 (indicating a coded slice 655 belonging to an IDR picture), an H.264 encoder SHOULD set the 656 value of NRI to 11 (in binary format). 658 For a mapping of the remaining nal_unit_types to NRI values, the 659 following example MAY be used and has been shown to be efficient 660 in a certain environment [13]. Other mappings MAY also be 661 desirable, depending on the application and the H.264/AVC Annex A 662 profile in use. 664 Informative note: Data Partitioning is not available in 665 certain profiles; e.g., in the Main or Baseline profiles. 666 Consequently, the NAL unit types 2, 3, and 4 can occur only if 667 the video bitstream conforms to a profile in which data 668 partitioning is allowed and not in streams that conform to the 669 Main or Baseline profiles. 671 Table 2. Example of NRI values for coded slices and coded slice data 672 partitions of primary coded reference pictures 674 NAL Unit Type Content of NAL unit NRI (binary) 675 ---------------------------------------------------------------- 676 1 non-IDR coded slice 10 677 2 Coded slice data partition A 10 678 3 Coded slice data partition B 01 679 4 Coded slice data partition C 01 681 Informative note: As mentioned before, the NRI value of non- 682 reference pictures is 00 as mandated by H.264/AVC. 684 An H.264 encoder SHOULD set the value of NRI for coded slice and 685 coded slice data partition NAL units of redundant coded reference 686 pictures equal to 01 (in binary format). 688 Definitions of the values for NRI for NAL unit types 24 to 29, 689 inclusive, are given in sections 5.7 and 5.8 of this memo. 
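Informative example: The NRI guidance given so far for nal_unit_type values 1 to 12 can be collected into one lookup. The Python sketch below is only one reading of that guidance: the function name example_nri and the flags is_reference and is_redundant are hypothetical, and the values for types 1 to 4 follow the example mapping of Table 2, which the text presents as one possible mapping rather than a requirement.

```python
def example_nri(nal_unit_type, is_reference=True, is_redundant=False):
    """Return an NRI value (0..3) following the guidance in this section."""
    if nal_unit_type in (6, 9, 10, 11, 12):
        return 0b00                      # required to be 0 by H.264
    if nal_unit_type in (7, 8):
        return 0b11                      # sequence / picture parameter sets
    if nal_unit_type == 5:               # IDR slices are always reference
        return 0b01 if is_redundant else 0b11
    if 1 <= nal_unit_type <= 4:
        if not is_reference:
            return 0b00                  # non-reference pictures
        if is_redundant:
            return 0b01                  # redundant coded reference pictures
        # Table 2 example for primary coded reference pictures.
        return {1: 0b10, 2: 0b10, 3: 0b01, 4: 0b01}[nal_unit_type]
    raise ValueError("no NRI guidance in this section for this type")
```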
691 No recommendation for the value of NRI is given for NAL units 692 having nal_unit_type in the range of 13 to 23, inclusive, because 693 these values are reserved for ITU-T and ISO/IEC. No 694 recommendation for the value of NRI is given for NAL units having 695 nal_unit_type equal to 0 or in the range of 30 to 31, inclusive, 696 as the semantics of these values are not specified in this memo. 698 5.4. Packetization Modes 700 This memo specifies three cases of packetization modes: 702 o Single NAL unit mode 704 o Non-interleaved mode 706 o Interleaved mode 708 The single NAL unit mode is targeted for conversational systems that 709 comply with ITU-T Recommendation H.241 [3] (see section 12.1). The 710 non-interleaved mode is targeted for conversational systems that may 711 not comply with ITU-T Recommendation H.241. In the non-interleaved 712 mode, NAL units are transmitted in NAL unit decoding order. The 713 interleaved mode is targeted for systems that do not require very low 714 end-to-end latency. The interleaved mode allows transmission of NAL 715 units out of NAL unit decoding order. 717 The packetization mode in use MAY be signaled by the value of the 718 OPTIONAL packetization-mode media type parameter. The used 719 packetization mode governs which NAL unit types are allowed in RTP 720 payloads. Table 3 summarizes the allowed packet payload types for 721 each packetization mode. Packetization modes are explained in more 722 detail in section 6. 724 Table 3. 
Summary of allowed NAL unit types for each packetization 725 mode (yes = allowed, no = disallowed, ig = ignore) 727 Payload Packet Single NAL Non-Interleaved Interleaved 728 Type Type Unit Mode Mode Mode 729 ------------------------------------------------------------- 730 0 reserved ig ig ig 731 1-23 NAL unit yes yes no 732 24 STAP-A no yes no 733 25 STAP-B no no yes 734 26 MTAP16 no no yes 735 27 MTAP24 no no yes 736 28 FU-A no yes yes 737 29 FU-B no no yes 738 30-31 reserved ig ig ig 740 Some NAL unit or payload type values (indicated as reserved in 741 Table 3) are reserved for future extensions. NAL units of those 742 types SHOULD NOT be sent by a sender (directly as packet payloads, or 743 as aggregation units in aggregation packets, or as fragmented units 744 in FU packets) and SHOULD be ignored by a receiver. For example, the 745 payload types 1-23, with the associated packet type "NAL unit", are 746 allowed in "Single NAL Unit Mode" and in "Non-Interleaved Mode", but 747 disallowed in "Interleaved Mode". However, NAL units of NAL unit 748 types 1-23 can be used in "Interleaved Mode" as aggregation units in 749 STAP-B, MTAP16, and MTAP24 packets as well as fragmented units in FU-A 750 and FU-B packets. Similarly, NAL units of NAL unit types 1-23 can 751 also be used in the "Non-Interleaved Mode" as aggregation units in 752 STAP-A packets or fragmented units in FU-A packets, in addition to 753 being directly used as packet payloads. 755 5.5. Decoding Order Number (DON) 757 In the interleaved packetization mode, the transmission order of NAL 758 units is allowed to differ from the decoding order of the NAL units. 759 Decoding order number (DON) is a field in the payload structure or a 760 derived variable that indicates the NAL unit decoding order. 761 Rationale and examples of use cases for transmission out of decoding 762 order and for the use of DON are given in section 13.
764 The coupling of transmission and decoding order is controlled by the 765 OPTIONAL sprop-interleaving-depth media type parameter as follows. 766 When the value of the OPTIONAL sprop-interleaving-depth media type 767 parameter is equal to 0 (explicitly or by default), the transmission 768 order of NAL units MUST conform to the NAL unit decoding order. When 769 the value of the OPTIONAL sprop-interleaving-depth media type 770 parameter is greater than 0, 772 o the order of NAL units in an MTAP16 and an MTAP24 is NOT REQUIRED 773 to be the NAL unit decoding order, and 775 o the order of NAL units generated by decapsulating STAP-Bs, MTAPs, 776 and FUs in two consecutive packets is NOT REQUIRED to be the NAL 777 unit decoding order. 779 The RTP payload structures for a single NAL unit packet, an STAP-A, 780 and an FU-A do not include DON. STAP-B and FU-B structures include 781 DON, and the structure of MTAPs enables derivation of DON as 782 specified in section 5.7.2. 784 Informative note: When an FU-A occurs in interleaved mode, it 785 always follows an FU-B, which sets its DON. 787 Informative note: If a transmitter wants to encapsulate a single 788 NAL unit per packet and transmit packets out of their decoding 789 order, the STAP-B packet type can be used. 791 In the single NAL unit packetization mode, the transmission order of 792 NAL units, determined by the RTP sequence number, MUST be the same as 793 their NAL unit decoding order. In the non-interleaved packetization 794 mode, the transmission order of NAL units in single NAL unit packets, 795 STAP-As, and FU-As MUST be the same as their NAL unit decoding order. 796 The NAL units within an STAP MUST appear in the NAL unit decoding 797 order. Thus, the decoding order is first provided through the 798 implicit order within an STAP, and second provided through the RTP 799 sequence number for the order between STAPs, FUs, and single NAL unit 800 packets.
802 Signaling of the value of DON for NAL units carried in STAP-B, MTAP, 803 and a series of fragmentation units starting with an FU-B is 804 specified in sections 5.7.1, 5.7.2, and 5.8, respectively. The DON 805 value of the first NAL unit in transmission order MAY be set to any 806 value. Values of DON are in the range of 0 to 65535, inclusive. 807 After reaching the maximum value, the value of DON wraps around to 0. 809 The decoding order of two NAL units contained in any STAP-B, MTAP, or 810 a series of fragmentation units starting with an FU-B is determined 811 as follows. Let DON(i) be the decoding order number of the NAL unit 812 having index i in the transmission order. Function don_diff(m,n) is 813 specified as follows: 815 If DON(m) == DON(n), don_diff(m,n) = 0 817 If (DON(m) < DON(n) and DON(n) - DON(m) < 32768), 818 don_diff(m,n) = DON(n) - DON(m) 820 If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768), 821 don_diff(m,n) = 65536 - DON(m) + DON(n) 823 If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768), 824 don_diff(m,n) = - (DON(m) + 65536 - DON(n)) 826 If (DON(m) > DON(n) and DON(m) - DON(n) < 32768), 827 don_diff(m,n) = - (DON(m) - DON(n)) 829 A positive value of don_diff(m,n) indicates that the NAL unit having 830 transmission order index n follows, in decoding order, the NAL unit 831 having transmission order index m. When don_diff(m,n) is equal to 0, 832 then the NAL unit decoding order of the two NAL units can be in 833 either order. A negative value of don_diff(m,n) indicates that the 834 NAL unit having transmission order index n precedes, in decoding 835 order, the NAL unit having transmission order index m. 837 Values of DON related fields (DON, DONB, and DOND; see section 5.7) 838 MUST be such that the decoding order determined by the values of DON, 839 as specified above, conforms to the NAL unit decoding order. 
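Informative example: The five cases of don_diff() can be transcribed directly into Python. In this sketch the function takes the two DON values themselves, that is DON(m) and DON(n), rather than the transmission order indices m and n; this calling convention is an assumption made for brevity.

```python
def don_diff(don_m, don_n):
    """Signed decoding-order distance from DON(m) to DON(n).

    Both arguments are DON values in the range 0..65535. The branches
    mirror the five cases listed in the text, in the same order.
    """
    if don_m == don_n:
        return 0
    if don_m < don_n and don_n - don_m < 32768:
        return don_n - don_m
    if don_m > don_n and don_m - don_n >= 32768:
        return 65536 - don_m + don_n          # DON wrapped around past 65535
    if don_m < don_n and don_n - don_m >= 32768:
        return -(don_m + 65536 - don_n)       # n precedes m across the wrap
    return -(don_m - don_n)                   # don_m > don_n, no wrap
```

A positive result means that the second NAL unit follows the first in decoding order, so a unit with DON 1 is taken to follow a unit with DON 65535 after wraparound.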
If the 840 order of two NAL units in NAL unit decoding order is switched and the 841 new order does not conform to the NAL unit decoding order, the NAL 842 units MUST NOT have the same value of DON. If the order of two 843 consecutive NAL units in the NAL unit stream is switched and the new 844 order still conforms to the NAL unit decoding order, the NAL units 845 MAY have the same value of DON. For example, when arbitrary slice 846 order is allowed by the video coding profile in use, all the coded 847 slice NAL units of a coded picture are allowed to have the same value 848 of DON. Consequently, NAL units having the same value of DON can be 849 decoded in any order, and two NAL units having a different value of 850 DON should be passed to the decoder in the order specified above. 851 When two consecutive NAL units in the NAL unit decoding order have a 852 different value of DON, the value of DON for the second NAL unit in 853 decoding order SHOULD be the value of DON for the first, incremented 854 by one. 856 An example of the decapsulation process to recover the NAL unit 857 decoding order is given in section 7. 859 Informative note: Receivers should not expect that the absolute 860 difference of values of DON for two consecutive NAL units in the 861 NAL unit decoding order will be equal to one, even in error-free 862 transmission. An increment by one is not required, as at the 863 time of associating values of DON to NAL units, it may not be 864 known whether all NAL units are delivered to the receiver. For 865 example, a gateway may not forward coded slice NAL units of non- 866 reference pictures or SEI NAL units when there is a shortage of 867 bit rate in the network to which the packets are forwarded. In 868 another example, a live broadcast is interrupted by pre-encoded 869 content, such as commercials, from time to time. The first intra 870 picture of a pre-encoded clip is transmitted in advance to ensure 871 that it is readily available in the receiver. 
When transmitting 872 the first intra picture, the originator does not exactly know how 873 many NAL units will be encoded before the first intra picture of 874 the pre-encoded clip follows in decoding order. Thus, the values 875 of DON for the NAL units of the first intra picture of the pre- 876 encoded clip have to be estimated when they are transmitted, and 877 gaps in values of DON may occur. 879 5.6. Single NAL Unit Packet 881 The single NAL unit packet defined here MUST contain only one NAL 882 unit, of the types defined in [1]. This means that neither an 883 aggregation packet nor a fragmentation unit can be used within a 884 single NAL unit packet. A NAL unit stream composed by decapsulating 885 single NAL unit packets in RTP sequence number order MUST conform to 886 the NAL unit decoding order. The structure of the single NAL unit 887 packet is shown in Figure 2. 889 Informative note: The first byte of a NAL unit co-serves as the 890 RTP payload header. 892 0 1 2 3 893 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 894 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 895 |F|NRI| Type | | 896 +-+-+-+-+-+-+-+-+ | 897 | | 898 | Bytes 2..n of a Single NAL unit | 899 | | 900 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 901 | :...OPTIONAL RTP padding | 902 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 904 Figure 2 RTP payload format for single NAL unit packet 906 5.7. Aggregation Packets 908 Aggregation packets are the NAL unit aggregation scheme of this 909 payload specification. The scheme is introduced to reflect the 910 dramatically different MTU sizes of two key target networks: wireline 911 IP networks (with an MTU size that is often limited by the Ethernet 912 MTU size; roughly 1500 bytes), and IP or non-IP (e.g., ITU-T H.324/M) 913 based wireless communication systems with preferred transmission unit 914 sizes of 254 bytes or less. 
To prevent media transcoding between the 915 two worlds, and to avoid undesirable packetization overhead, a NAL 916 unit aggregation scheme is introduced. 918 Two types of aggregation packets are defined by this specification: 920 o Single-time aggregation packet (STAP): aggregates NAL units with 921 identical NALU-time. Two types of STAPs are defined, one without 922 DON (STAP-A) and another including DON (STAP-B). 924 o Multi-time aggregation packet (MTAP): aggregates NAL units with 925 potentially differing NALU-time. Two different MTAPs are defined, 926 differing in the length of the NAL unit timestamp offset. 928 Each NAL unit to be carried in an aggregation packet is encapsulated 929 in an aggregation unit. Please see below for the four different 930 aggregation units and their characteristics. 932 The structure of the RTP payload format for aggregation packets is 933 presented in Figure 3. 935 0 1 2 3 936 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 937 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 938 |F|NRI| Type | | 939 +-+-+-+-+-+-+-+-+ | 940 | | 941 | one or more aggregation units | 942 | | 943 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 944 | :...OPTIONAL RTP padding | 945 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 947 Figure 3 RTP payload format for aggregation packets 949 MTAPs and STAPs share the following packetization rules: The RTP 950 timestamp MUST be set to the earliest of the NALU-times of all the 951 NAL units to be aggregated. The type field of the NAL unit type 952 octet MUST be set to the appropriate value, as indicated in Table 4. 953 The F bit MUST be cleared if all F bits of the aggregated NAL units 954 are zero; otherwise, it MUST be set. The value of NRI MUST be the 955 maximum of all the NAL units carried in the aggregation packet. 957 Table 4. 
Type field for STAPs and MTAPs 959 Type Packet Timestamp offset DON related fields 960 field length (DON, DONB, DOND) 961 (in bits) present 962 -------------------------------------------------------- 963 24 STAP-A 0 no 964 25 STAP-B 0 yes 965 26 MTAP16 16 yes 966 27 MTAP24 24 yes 968 The marker bit in the RTP header is set to the value that the marker 969 bit of the last NAL unit of the aggregated packet would have if it 970 were transported in its own RTP packet. 972 The payload of an aggregation packet consists of one or more 973 aggregation units. See sections 5.7.1 and 5.7.2 for the four 974 different types of aggregation units. An aggregation packet can 975 carry as many aggregation units as necessary; however, the total 976 amount of data in an aggregation packet obviously MUST fit into an IP 977 packet, and the size SHOULD be chosen so that the resulting IP packet 978 is smaller than the MTU size. An aggregation packet MUST NOT contain 979 fragmentation units specified in section 5.8. Aggregation packets 980 MUST NOT be nested; i.e., an aggregation packet MUST NOT contain 981 another aggregation packet. 983 5.7.1. Single-Time Aggregation Packet 985 Single-time aggregation packet (STAP) SHOULD be used whenever NAL 986 units are aggregated that all share the same NALU-time. The payload 987 of an STAP-A does not include DON and consists of at least one 988 single-time aggregation unit, as presented in Figure 4. The payload 989 of an STAP-B consists of a 16-bit unsigned decoding order number 990 (DON) (in network byte order) followed by at least one single-time 991 aggregation unit, as presented in Figure 5. 
993 0 1 2 3 994 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 995 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 996 : | 997 +-+-+-+-+-+-+-+-+ | 998 | | 999 | single-time aggregation units | 1000 | | 1001 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1002 | : 1003 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1005 Figure 4 Payload format for STAP-A 1007 0 1 2 3 1008 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1009 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1010 : decoding order number (DON) | | 1011 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1012 | | 1013 | single-time aggregation units | 1014 | | 1015 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1016 | : 1017 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1019 Figure 5 Payload format for STAP-B 1021 The DON field specifies the value of DON for the first NAL unit in an 1022 STAP-B in transmission order. For each successive NAL unit in 1023 appearance order in an STAP-B, the value of DON is equal to (the 1024 value of DON of the previous NAL unit in the STAP-B + 1) % 65536, in 1025 which '%' stands for the modulo operation. 1027 A single-time aggregation unit consists of 16-bit unsigned size 1028 information (in network byte order) that indicates the size of the 1029 following NAL unit in bytes (excluding these two octets, but 1030 including the NAL unit type octet of the NAL unit), followed by the 1031 NAL unit itself, including its NAL unit type byte. A single-time 1032 aggregation unit is byte aligned within the RTP payload, but it may 1033 not be aligned on a 32-bit word boundary. Figure 6 presents the 1034 structure of the single-time aggregation unit. 
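Informative example: The size-prefixed layout of single-time aggregation units, the DON rule for STAP-B, and the F and NRI rules for aggregation packets from section 5.7 can be illustrated with the Python sketch below. The function names build_stap_a and parse_stap are hypothetical. The sketch assumes that RTP padding has already been removed and treats a zero-length NAL unit as an error, which is a choice of this example rather than a rule of this memo.

```python
import struct

STAP_A, STAP_B = 24, 25


def build_stap_a(nal_units):
    """Pack complete NAL units (bytes) into an STAP-A payload."""
    f, nri, body = 0, 0, bytearray()
    for nalu in nal_units:
        if not 1 <= len(nalu) <= 0xFFFF:
            raise ValueError("NAL unit size must fit in 16 bits")
        if nalu[0] & 0x1F >= 24:
            raise ValueError("aggregation packets and FUs cannot be nested")
        f |= nalu[0] >> 7                      # set if any F bit is set
        nri = max(nri, (nalu[0] >> 5) & 0x3)   # maximum NRI of all units
        body += struct.pack("!H", len(nalu)) + nalu
    return bytes([(f << 7) | (nri << 5) | STAP_A]) + bytes(body)


def parse_stap(payload):
    """Return a list of (DON, NAL unit); DON is None for STAP-A."""
    nal_type = payload[0] & 0x1F
    pos, don = 1, None
    if nal_type == STAP_B:
        (don,) = struct.unpack_from("!H", payload, pos)
        pos += 2
    elif nal_type != STAP_A:
        raise ValueError("not an STAP")
    units = []
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("empty or truncated aggregation unit")
        units.append((don, payload[pos:pos + size]))
        pos += size
        if don is not None:
            don = (don + 1) % 65536            # next unit in appearance order
    return units
```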
1036 0 1 2 3 1037 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1038 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1039 : NAL unit size | | 1040 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1041 | | 1042 | NAL unit | 1043 | | 1044 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1045 | : 1046 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1048 Figure 6 Structure for single-time aggregation unit 1050 Figure 7 presents an example of an RTP packet that contains an STAP- 1051 A. The STAP contains two single-time aggregation units, labeled as 1 1052 and 2 in the figure. 1054 0 1 2 3 1055 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1056 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1057 | RTP Header | 1058 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1059 |STAP-A NAL HDR | NALU 1 Size | NALU 1 HDR | 1060 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1061 | NALU 1 Data | 1062 : : 1063 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1064 | | NALU 2 Size | NALU 2 HDR | 1065 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1066 | NALU 2 Data | 1067 : : 1068 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1069 | :...OPTIONAL RTP padding | 1070 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1072 Figure 7 An example of an RTP packet including an STAP-A containing 1073 two single-time aggregation units 1075 Figure 8 presents an example of an RTP packet that contains an STAP- 1076 B. The STAP contains two single-time aggregation units, labeled as 1 1077 and 2 in the figure. 
1079 0 1 2 3 1080 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1081 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1082 | RTP Header | 1083 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1084 |STAP-B NAL HDR | DON | NALU 1 Size | 1085 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1086 | NALU 1 Size | NALU 1 HDR | NALU 1 Data | 1087 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1088 : : 1089 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1090 | | NALU 2 Size | NALU 2 HDR | 1091 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1092 | NALU 2 Data | 1093 : : 1094 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1095 | :...OPTIONAL RTP padding | 1096 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1098 Figure 8 An example of an RTP packet including an STAP-B containing 1099 two single-time aggregation units 1101 5.7.2. Multi-Time Aggregation Packets (MTAPs) 1103 The NAL unit payload of MTAPs consists of a 16-bit unsigned decoding 1104 order number base (DONB) (in network byte order) and one or more 1105 multi-time aggregation units, as presented in Figure 9. DONB MUST 1106 contain the value of DON for the first NAL unit in the NAL unit 1107 decoding order among the NAL units of the MTAP. 1109 Informative note: The first NAL unit in the NAL unit decoding 1110 order is not necessarily the first NAL unit in the order in which 1111 the NAL units are encapsulated in an MTAP. 
1113 0 1 2 3 1114 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1115 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1116 : decoding order number base | | 1117 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1118 | | 1119 | multi-time aggregation units | 1120 | | 1121 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1122 | : 1123 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1125 Figure 9 NAL unit payload format for MTAPs 1127 Two different multi-time aggregation units are defined in this 1128 specification. Both of them consist of 16-bit unsigned size 1129 information of the following NAL unit (in network byte order), an 8- 1130 bit unsigned decoding order number difference (DOND), and n bits (in 1131 network byte order) of timestamp offset (TS offset) for this NAL 1132 unit, whereby n can be 16 or 24. The choice between the different 1133 MTAP types (MTAP16 and MTAP24) is application dependent: the larger 1134 the timestamp offset is, the higher the flexibility of the MTAP, but 1135 the overhead is also higher. 1137 The structures of the multi-time aggregation units for MTAP16 and 1138 MTAP24 are presented in Figures 10 and 11, respectively. The 1139 starting or ending position of an aggregation unit within a packet is 1140 NOT REQUIRED to be on a 32-bit word boundary. The DON of the NAL 1141 unit contained in a multi-time aggregation unit is equal to (DONB + 1142 DOND) % 65536, in which % denotes the modulo operation. This memo 1143 does not specify how the NAL units within an MTAP are ordered, but, 1144 in most cases, NAL unit decoding order SHOULD be used. 1146 The timestamp offset field MUST be set to a value equal to the value 1147 of the following formula: If the NALU-time is larger than or equal to 1148 the RTP timestamp of the packet, then the timestamp offset equals 1149 (the NALU-time of the NAL unit - the RTP timestamp of the packet).
1150 If the NALU-time is smaller than the RTP timestamp of the packet, 1151 then the timestamp offset is equal to the NALU-time + (2^32 - the RTP 1152 timestamp of the packet). 1154 0 1 2 3 1155 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1156 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1157 : NAL unit size | DOND | TS offset | 1158 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1159 | TS offset | | 1160 +-+-+-+-+-+-+-+-+ NAL unit | 1161 | | 1162 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1163 | : 1164 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1166 Figure 10 Multi-time aggregation unit for MTAP16 1168 0 1 2 3 1169 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1170 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1171 : NAL unit size | DOND | TS offset | 1172 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1173 | TS offset | | 1174 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1175 | NAL unit | 1176 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1177 | : 1178 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1180 Figure 11 Multi-time aggregation unit for MTAP24 1182 For the "earliest" multi-time aggregation unit in an MTAP the 1183 timestamp offset MUST be zero. Hence, the RTP timestamp of the MTAP 1184 itself is identical to the earliest NALU-time. 1186 Informative note: The "earliest" multi-time aggregation unit is 1187 the one that would have the smallest extended RTP timestamp among 1188 all the aggregation units of an MTAP if the NAL units contained 1189 in the aggregation units were encapsulated in single NAL unit 1190 packets. An extended timestamp is a timestamp that has more than 1191 32 bits and is capable of counting the wraparound of the 1192 timestamp field, thus enabling one to determine the smallest 1193 value if the timestamp wraps. Such an "earliest" aggregation 1194 unit may not be the first one in the order in which the 1195 aggregation units are encapsulated in an MTAP. 
The "earliest" 1196 NAL unit need not be the same as the first NAL unit in the NAL 1197 unit decoding order either. 1199 Figure 12 presents an example of an RTP packet that contains a multi- 1200 time aggregation packet of type MTAP16 that contains two multi-time 1201 aggregation units, labeled as 1 and 2 in the figure. 1203 0 1 2 3 1204 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1205 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1206 | RTP Header | 1207 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1208 |MTAP16 NAL HDR | decoding order number base | NALU 1 Size | 1209 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1210 | NALU 1 Size | NALU 1 DOND | NALU 1 TS offset | 1211 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1212 | NALU 1 HDR | NALU 1 DATA | 1213 +-+-+-+-+-+-+-+-+ + 1214 : : 1215 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1216 | | NALU 2 SIZE | NALU 2 DOND | 1217 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1218 | NALU 2 TS offset | NALU 2 HDR | NALU 2 DATA | 1219 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1220 : : 1221 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1222 | :...OPTIONAL RTP padding | 1223 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1225 Figure 12 An RTP packet including a multi-time aggregation packet of 1226 type MTAP16 containing two multi-time aggregation units 1228 Figure 13 presents an example of an RTP packet that contains a multi- 1229 time aggregation packet of type MTAP24 that contains two multi-time 1230 aggregation units, labeled as 1 and 2 in the figure. 
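Informative example: The DON and timestamp offset arithmetic defined for MTAPs in this section can be checked with a small Python sketch. The function names mtap_don, ts_offset, and nalu_time are hypothetical. The inverse function nalu_time() is not spelled out in the text; it is derived here under the assumption that NALU-times and RTP timestamps are 32-bit values, and the check that the offset fits into 16 or 24 bits is likewise an addition of this example.

```python
def mtap_don(donb, dond):
    """DON of a NAL unit in an MTAP: (DONB + DOND) % 65536."""
    return (donb + dond) % 65536


def ts_offset(nalu_time_value, rtp_timestamp, offset_bits=16):
    """Timestamp offset for one multi-time aggregation unit.

    offset_bits is 16 for MTAP16 and 24 for MTAP24.
    """
    if nalu_time_value >= rtp_timestamp:
        offset = nalu_time_value - rtp_timestamp
    else:
        # The NALU-time has wrapped past 2^32 relative to the RTP timestamp.
        offset = nalu_time_value + (2**32 - rtp_timestamp)
    if offset >= 1 << offset_bits:
        raise ValueError("offset does not fit in the TS offset field")
    return offset


def nalu_time(rtp_timestamp, offset):
    """Recover the NALU-time at the receiver (assumed inverse, mod 2^32)."""
    return (rtp_timestamp + offset) % 2**32
```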
1232 0 1 2 3 1233 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1234 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1235 | RTP Header | 1236 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1237 |MTAP24 NAL HDR | decoding order number base | NALU 1 Size | 1238 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1239 | NALU 1 Size | NALU 1 DOND | NALU 1 TS offs | 1240 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1241 |NALU 1 TS offs | NALU 1 HDR | NALU 1 DATA | 1242 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 1243 : : 1244 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1245 | | NALU 2 SIZE | NALU 2 DOND | 1246 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1247 | NALU 2 TS offset | NALU 2 HDR | 1248 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1249 | NALU 2 DATA | 1250 : : 1251 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1252 | :...OPTIONAL RTP padding | 1253 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1255 Figure 13 An RTP packet including a multi-time aggregation packet of 1256 type MTAP24 containing two multi-time aggregation units 1258 5.8. Fragmentation Units (FUs) 1260 This payload type allows fragmenting a NAL unit into several RTP 1261 packets. Doing so on the application layer instead of relying on 1262 lower layer fragmentation (e.g., by IP) has the following advantages: 1264 o The payload format is capable of transporting NAL units bigger 1265 than 64 kbytes over an IPv4 network. Such NAL units may be present in pre- 1266 recorded video, particularly in High Definition formats (there is 1267 a limit on the number of slices per picture, which results in a 1268 limit of NAL units per picture, which may result in big NAL 1269 units). 1271 o The fragmentation mechanism allows fragmenting a single NAL unit 1272 and applying generic forward error correction as described in 1273 section 12.5.
1275 Fragmentation is defined only for a single NAL unit and not for any 1276 aggregation packets. A fragment of a NAL unit consists of an integer 1277 number of consecutive octets of that NAL unit. Each octet of the NAL 1278 unit MUST be part of exactly one fragment of that NAL unit. 1280 Fragments of the same NAL unit MUST be sent in consecutive order with 1281 ascending RTP sequence numbers (with no other RTP packets within the 1282 same RTP packet stream being sent between the first and last 1283 fragment). Similarly, a NAL unit MUST be reassembled in RTP sequence 1284 number order. 1286 When a NAL unit is fragmented and conveyed within fragmentation units 1287 (FUs), it is referred to as a fragmented NAL unit. STAPs and MTAPs 1288 MUST NOT be fragmented. FUs MUST NOT be nested; i.e., an FU MUST NOT 1289 contain another FU. 1291 The RTP timestamp of an RTP packet carrying an FU is set to the NALU- 1292 time of the fragmented NAL unit. 1294 Figure 14 presents the RTP payload format for FU-As. An FU-A 1295 consists of a fragmentation unit indicator of one octet, a 1296 fragmentation unit header of one octet, and a fragmentation unit 1297 payload. 1299 0 1 2 3 1300 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1301 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1302 | FU indicator | FU header | | 1303 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 1304 | | 1305 | FU payload | 1306 | | 1307 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1308 | :...OPTIONAL RTP padding | 1309 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1311 Figure 14 RTP payload format for FU-A 1313 Figure 15 presents the RTP payload format for FU-Bs. An FU-B 1314 consists of a fragmentation unit indicator of one octet, a 1315 fragmentation unit header of one octet, a decoding order number (DON) 1316 (in network byte order), and a fragmentation unit payload. 
In other 1317 words, the structure of FU-B is the same as the structure of FU-A, 1318 except for the additional DON field. 1320 0 1 2 3 1321 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1322 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1323 | FU indicator | FU header | DON | 1324 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-| 1325 | | 1326 | FU payload | 1327 | | 1328 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1329 | :...OPTIONAL RTP padding | 1330 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1332 Figure 15 RTP payload format for FU-B 1334 NAL unit type FU-B MUST be used in the interleaved packetization mode 1335 for the first fragmentation unit of a fragmented NAL unit. NAL unit 1336 type FU-B MUST NOT be used in any other case. In other words, in the 1337 interleaved packetization mode, each NALU that is fragmented has an 1338 FU-B as the first fragment, followed by one or more FU-A fragments. 1340 The FU indicator octet has the following format: 1342 +---------------+ 1343 |0|1|2|3|4|5|6|7| 1344 +-+-+-+-+-+-+-+-+ 1345 |F|NRI| Type | 1346 +---------------+ 1348 Values equal to 28 and 29 in the Type field of the FU indicator octet 1349 identify an FU-A and an FU-B, respectively. The use of the F bit is 1350 described in section 5.3. The value of the NRI field MUST be set 1351 according to the value of the NRI field in the fragmented NAL unit. 1353 The FU header has the following format: 1355 +---------------+ 1356 |0|1|2|3|4|5|6|7| 1357 +-+-+-+-+-+-+-+-+ 1358 |S|E|R| Type | 1359 +---------------+ 1361 S: 1 bit 1362 When set to one, the Start bit indicates the start of a 1363 fragmented NAL unit. When the following FU payload is not the 1364 start of a fragmented NAL unit payload, the Start bit is set to 1365 zero. 
1367 E: 1 bit 1368 When set to one, the End bit indicates the end of a fragmented 1369 NAL unit, i.e., the last byte of the payload is also the last 1370 byte of the fragmented NAL unit. When the following FU payload 1371 is not the last fragment of a fragmented NAL unit, the End bit is 1372 set to zero. 1374 R: 1 bit 1375 The Reserved bit MUST be equal to 0 and SHOULD be ignored by the 1376 receiver. 1378 Type: 5 bits 1379 The NAL unit payload type as defined in Table 7-1 of [1]. 1381 The value of DON in FU-Bs is selected as described in section 5.5. 1383 Informative note: The DON field in FU-Bs allows gateways to 1384 fragment NAL units to FU-Bs without organizing the incoming NAL 1385 units to the NAL unit decoding order. 1387 A fragmented NAL unit MUST NOT be transmitted in one FU; i.e., the 1388 Start bit and End bit MUST NOT both be set to one in the same FU 1389 header. 1391 The FU payload consists of fragments of the payload of the fragmented 1392 NAL unit so that if the fragmentation unit payloads of consecutive 1393 FUs are sequentially concatenated, the payload of the fragmented NAL 1394 unit can be reconstructed. The NAL unit type octet of the fragmented 1395 NAL unit is not included as such in the fragmentation unit payload, 1396 but rather the information of the NAL unit type octet of the 1397 fragmented NAL unit is conveyed in F and NRI fields of the FU 1398 indicator octet of the fragmentation unit and in the type field of 1399 the FU header. An FU payload MAY have any number of octets and MAY 1400 be empty. 1402 Informative note: Empty FUs are allowed to reduce the latency of 1403 a certain class of senders in nearly lossless environments. 1404 These senders can be characterized in that they packetize NALU 1405 fragments before the NALU is completely generated and, hence, 1406 before the NALU size is known. 
If zero-length NALU fragments 1407 were not allowed, the sender would have to generate at least one 1408 bit of data of the following fragment before the current fragment 1409 could be sent. Due to the characteristics of H.264, where 1410 sometimes several macroblocks occupy zero bits, this is 1411 undesirable and can add delay. However, the (potential) use of 1412 zero-length NALU fragments should be carefully weighed against 1413 the increased risk of the loss of at least a part of the NALU 1414 because of the additional packets employed for its transmission. 1416 If a fragmentation unit is lost, the receiver SHOULD discard all 1417 following fragmentation units in transmission order corresponding to 1418 the same fragmented NAL unit. 1420 A receiver in an endpoint or in a MANE MAY aggregate the first n-1 1421 fragments of a NAL unit to an (incomplete) NAL unit, even if fragment 1422 n of that NAL unit is not received. In this case, the 1423 forbidden_zero_bit of the NAL unit MUST be set to one to indicate a 1424 syntax violation. 1426 6. Packetization Rules 1428 The packetization modes are introduced in section 5.2. The 1429 packetization rules common to more than one of the packetization 1430 modes are specified in section 6.1. The packetization rules for the 1431 single NAL unit mode, the non-interleaved mode, and the interleaved 1432 mode are specified in sections 6.2, 6.3, and 6.4, respectively. 1434 6.1. Common Packetization Rules 1436 All senders MUST enforce the following packetization rules regardless 1437 of the packetization mode in use: 1439 o Coded slice NAL units or coded slice data partition NAL units 1440 belonging to the same coded picture (and thus sharing the same RTP 1441 timestamp value) MAY be sent in any order; however, for delay- 1442 critical systems, they SHOULD be sent in their original decoding 1443 order to minimize the delay. Note that the decoding order is the 1444 order of the NAL units in the bitstream. 
1446 o Parameter sets are handled in accordance with the rules and 1447 recommendations given in section 8.4. 1449 o MANEs MUST NOT duplicate any NAL unit except for sequence or 1450 picture parameter set NAL units, as neither this memo nor the 1451 H.264 specification provides means to identify duplicated NAL 1452 units. Sequence and picture parameter set NAL units MAY be 1453 duplicated to make their correct reception more probable, but any 1454 such duplication MUST NOT affect the contents of any active 1455 sequence or picture parameter set. Duplication SHOULD be 1456 performed on the application layer and not by duplicating RTP 1457 packets (with identical sequence numbers). 1459 Senders using the non-interleaved mode and the interleaved mode MUST 1460 enforce the following packetization rule: 1462 o MANEs MAY convert single NAL unit packets into one aggregation 1463 packet, convert an aggregation packet into several single NAL unit 1464 packets, or mix both concepts, in an RTP translator. The RTP 1465 translator SHOULD take into account at least the following 1466 parameters: path MTU size, unequal protection mechanisms (e.g., 1467 through packet-based FEC according to RFC 2733 [17], especially 1468 for sequence and picture parameter set NAL units and coded slice 1469 data partition A NAL units), bearable latency of the system, and 1470 buffering capabilities of the receiver. 1472 Informative note: An RTP translator is required to handle RTCP 1473 as per RFC 3550. 1475 6.2. Single NAL Unit Mode 1477 This mode is in use when the value of the OPTIONAL packetization-mode 1478 media type parameter is equal to 0 or the packetization-mode is not 1479 present. All receivers MUST support this mode. It is primarily 1480 intended for low-delay applications that are compatible with systems 1481 using ITU-T Recommendation H.241 [3] (see section 12.1). Only single 1482 NAL unit packets MAY be used in this mode. STAPs, MTAPs, and FUs 1483 MUST NOT be used. 
The transmission order of single NAL unit packets 1484 MUST comply with the NAL unit decoding order. 1486 6.3. Non-Interleaved Mode 1488 This mode is in use when the value of the OPTIONAL packetization-mode 1489 media type parameter is equal to 1. This mode SHOULD be supported. 1490 It is primarily intended for low-delay applications. Only single NAL 1491 unit packets, STAP-As, and FU-As MAY be used in this mode. STAP-Bs, 1492 MTAPs, and FU-Bs MUST NOT be used. The transmission order of NAL 1493 units MUST comply with the NAL unit decoding order. 1495 6.4. Interleaved Mode 1497 This mode is in use when the value of the OPTIONAL packetization-mode 1498 media type parameter is equal to 2. Some receivers MAY support this 1499 mode. STAP-Bs, MTAPs, FU-As, and FU-Bs MAY be used. STAP-As and 1500 single NAL unit packets MUST NOT be used. The transmission order of 1501 packets and NAL units is constrained as specified in section 5.5. 1503 7. De-Packetization Process 1505 The de-packetization process is implementation dependent. Therefore, 1506 the following description should be seen as an example of a suitable 1507 implementation. Other schemes may be used as well, as long as the 1508 output for the same input is the same as that of the process described below. 1509 Here, "the same output" means that the number of NAL units and their 1510 order are both identical. Optimizations relative to the 1511 described algorithms are likely possible. Section 7.1 presents the 1512 de-packetization process for the single NAL unit and non-interleaved 1513 packetization modes, whereas section 7.2 describes the process for 1514 the interleaved mode. Section 7.3 includes additional decapsulation 1515 guidelines for intelligent receivers. 1517 All normal RTP mechanisms related to buffer management apply. In 1518 particular, duplicated or outdated RTP packets (as indicated by the 1519 RTP sequence number and the RTP timestamp) are removed.
To 1520 determine the exact time for decoding, factors such as a possible 1521 intentional delay to allow for proper inter-stream synchronization 1522 must be taken into account. 1524 7.1. Single NAL Unit and Non-Interleaved Mode 1526 The receiver includes a receiver buffer to compensate for 1527 transmission delay jitter. The receiver stores incoming packets in 1528 reception order into the receiver buffer. Packets are decapsulated 1529 in RTP sequence number order. If a decapsulated packet is a single 1530 NAL unit packet, the NAL unit contained in the packet is passed 1531 directly to the decoder. If a decapsulated packet is an STAP-A, the 1532 NAL units contained in the packet are passed to the decoder in the 1533 order in which they are encapsulated in the packet. For all the FU-A 1534 packets containing fragments of a single NAL unit, the decapsulated 1535 fragments are concatenated in their sending order to recover the NAL 1536 unit, which is then passed to the decoder. 1538 Informative note: If the decoder supports Arbitrary Slice Order, 1539 coded slices of a picture can be passed to the decoder in any 1540 order regardless of their reception and transmission order. 1542 7.2. Interleaved Mode 1544 The general concept behind these de-packetization rules is to reorder 1545 NAL units from transmission order to the NAL unit decoding order. 1547 The receiver includes a receiver buffer, which is used to compensate 1548 for transmission delay jitter and to reorder NAL units from 1549 transmission order to the NAL unit decoding order. In this section, 1550 the receiver operation is described under the assumption that there 1551 is no transmission delay jitter. To distinguish it from a 1552 practical receiver buffer that is also used for compensation of 1553 transmission delay jitter, the receiver buffer is hereafter called 1554 the deinterleaving buffer in this section.
Receivers SHOULD also 1555 prepare for transmission delay jitter; i.e., either reserve separate 1556 buffers for transmission delay jitter buffering and deinterleaving 1557 buffering or use a receiver buffer for both transmission delay jitter 1558 and deinterleaving. Moreover, receivers SHOULD take transmission 1559 delay jitter into account in the buffering operation; e.g., by 1560 additional initial buffering before starting decoding and 1561 playback. 1563 This section is organized as follows: subsection 7.2.1 presents how to 1564 calculate the size of the deinterleaving buffer. Subsection 7.2.2 1565 specifies the receiver process for organizing received NAL units into 1566 the NAL unit decoding order. 1568 7.2.1. Size of the Deinterleaving Buffer 1570 When the SDP Offer/Answer model or any other capability exchange 1571 procedure is used in session setup, the properties of the received 1572 stream SHOULD be such that the receiver capabilities are not 1573 exceeded. In the SDP Offer/Answer model, the receiver can indicate 1574 its capabilities to allocate a deinterleaving buffer with the deint- 1575 buf-cap media type parameter. The sender indicates the requirement 1576 for the deinterleaving buffer size with the sprop-deint-buf-req media 1577 type parameter. It is therefore RECOMMENDED to set the 1578 deinterleaving buffer size, in terms of number of bytes, equal to or 1579 greater than the value of the sprop-deint-buf-req media type parameter. 1580 See section 8.1 for further information on the deint-buf-cap and sprop- 1581 deint-buf-req media type parameters and section 8.2.2 for further 1582 information on their use in the SDP Offer/Answer model. 1584 When a declarative session description is used in session setup, the 1585 sprop-deint-buf-req media type parameter signals the requirement for 1586 the deinterleaving buffer size.
It is therefore RECOMMENDED to set 1587 the deinterleaving buffer size, in terms of number of bytes, equal to 1588 or greater than the value of the sprop-deint-buf-req media type 1589 parameter. 1591 7.2.2. Deinterleaving Process 1593 There are two buffering states in the receiver: initial buffering and 1594 buffering while playing. Initial buffering occurs when the RTP 1595 session is initialized. After initial buffering, decoding and 1596 playback are started, and the buffering-while-playing mode is used. 1598 Regardless of the buffering state, the receiver stores incoming NAL 1599 units, in reception order, in the deinterleaving buffer as follows. 1600 NAL units of aggregation packets are stored in the deinterleaving 1601 buffer individually. The value of DON is calculated and stored for 1602 each NAL unit. 1604 The receiver operation is described below with the help of the 1605 following functions and constants: 1607 o Function AbsDON is specified in section 8.1. 1609 o Function don_diff is specified in section 5.5. 1611 o Constant N is the value of the OPTIONAL sprop-interleaving-depth 1612 media type parameter (see section 8.1) incremented by 1. 1614 Initial buffering lasts until one of the following conditions is 1615 fulfilled: 1617 o There are N or more VCL NAL units in the deinterleaving buffer. 1619 o If sprop-max-don-diff is present, don_diff(m,n) is greater than 1620 the value of sprop-max-don-diff, in which n corresponds to the NAL 1621 unit having the greatest value of AbsDON among the received NAL 1622 units and m corresponds to the NAL unit having the smallest value 1623 of AbsDON among the received NAL units. 1625 o Initial buffering has lasted for a duration equal to or greater 1626 than the value of the OPTIONAL sprop-init-buf-time media type 1627 parameter.
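Informative example (not part of the specification): the three conditions that end initial buffering can be sketched as follows. This is a minimal sketch under stated assumptions. The record type BufferedNal, the function initial_buffering_done, and its argument names are hypothetical and introduced only for illustration. AbsDON values are assumed to have been computed already as specified in section 8.1, the don_diff helper is a reading of the definition in section 5.5 and should be checked against that section, and the elapsed time is assumed to be expressed in the same units as sprop-init-buf-time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BufferedNal:
    """Hypothetical record for one NAL unit in the deinterleaving buffer."""
    don: int       # 16-bit decoding order number of the NAL unit
    abs_don: int   # AbsDON value, assumed precomputed as in section 8.1
    is_vcl: bool   # True if this is a VCL NAL unit


def don_diff(don_m: int, don_n: int) -> int:
    """Signed DON difference with 16-bit wrap-around (reading of section 5.5)."""
    if don_m == don_n:
        return 0
    if don_m < don_n:
        gap = don_n - don_m
        return gap if gap < 32768 else -(don_m + 65536 - don_n)
    gap = don_m - don_n
    return -gap if gap < 32768 else 65536 - don_m + don_n


def initial_buffering_done(buf: List[BufferedNal],
                           n: int,
                           max_don_diff: Optional[int] = None,
                           init_buf_time: Optional[int] = None,
                           elapsed: int = 0) -> bool:
    """Return True as soon as any one of the three conditions holds."""
    # Condition 1: N or more VCL NAL units are buffered.
    if sum(1 for unit in buf if unit.is_vcl) >= n:
        return True
    # Condition 2: only evaluated when sprop-max-don-diff is present.
    if max_don_diff is not None and buf:
        m = min(buf, key=lambda unit: unit.abs_don)  # smallest AbsDON
        g = max(buf, key=lambda unit: unit.abs_don)  # greatest AbsDON
        if don_diff(m.don, g.don) > max_don_diff:
            return True
    # Condition 3: only evaluated when sprop-init-buf-time is present.
    if init_buf_time is not None and elapsed >= init_buf_time:
        return True
    return False
```

A receiver would typically call such a check after every NAL unit is stored, and switch to the buffering-while-playing state the first time it returns True.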
1629 The NAL units to be removed from the deinterleaving buffer are 1630 determined as follows: 1632 o If the deinterleaving buffer contains at least N VCL NAL units, 1633 NAL units are removed from the deinterleaving buffer and passed to 1634 the decoder in the order specified below until the buffer contains 1635 N-1 VCL NAL units. 1637 o If sprop-max-don-diff is present, all NAL units m for which 1638 don_diff(m,n) is greater than sprop-max-don-diff are removed from 1639 the deinterleaving buffer and passed to the decoder in the order 1640 specified below. Herein, n corresponds to the NAL unit having the 1641 greatest value of AbsDON among the NAL units in the deinterleaving 1642 buffer. 1644 The order in which NAL units are passed to the decoder is specified 1645 as follows: 1647 o Let PDON be a variable that is initialized to 0 at the beginning 1648 of the RTP session. 1650 o For each NAL unit associated with a value of DON, a DON distance 1651 is calculated as follows. If the value of DON of the NAL unit is 1652 larger than the value of PDON, the DON distance is equal to DON - 1653 PDON. Otherwise, the DON distance is equal to 65535 - PDON + DON 1654 + 1. 1656 o NAL units are delivered to the decoder in ascending order of DON 1657 distance. If several NAL units share the same value of DON 1658 distance, they can be passed to the decoder in any order. 1660 o When a desired number of NAL units have been passed to the 1661 decoder, the value of PDON is set to the value of DON for the last 1662 NAL unit passed to the decoder. 1664 7.3. Additional De-Packetization Guidelines 1666 The following additional de-packetization rules may be used to 1667 implement an operational H.264 de-packetizer: 1669 o Intelligent RTP receivers (e.g., in gateways) may identify lost 1670 coded slice data partitions A (DPAs). 
If a lost DPA is found, a 1671 gateway may decide not to send the corresponding coded slice data 1672 partitions B and C, as their information is meaningless for H.264 1673 decoders. In this way a MANE can reduce network load by 1674 discarding useless packets without parsing a complex bitstream. 1676 o Intelligent RTP receivers (e.g., in gateways) may identify lost 1677 FUs. If a lost FU is found, a gateway may decide not to send the 1678 following FUs of the same fragmented NAL unit, as their 1679 information is meaningless for H.264 decoders. In this way a MANE 1680 can reduce network load by discarding useless packets without 1681 parsing a complex bitstream. 1683 o Intelligent receivers having to discard packets or NALUs should 1684 first discard all packets/NALUs in which the value of the NRI 1685 field of the NAL unit type octet is equal to 0. This will 1686 minimize the impact on user experience and keep the reference 1687 pictures intact. If more packets have to be discarded, then 1688 packets with a numerically lower NRI value should be discarded 1689 before packets with a numerically higher NRI value. However, 1690 discarding any packets with an NRI bigger than 0 very likely leads 1691 to decoder drift and SHOULD be avoided. 1693 8. Payload Format Parameters 1695 This section specifies the parameters that MAY be used to select 1696 optional features of the payload format and certain features of the 1697 bitstream. The parameters are specified here as part of the media 1698 subtype registration for the ITU-T H.264 | ISO/IEC 14496-10 codec. A 1699 mapping of the parameters into the Session Description Protocol (SDP) 1700 [6] is also provided for applications that use SDP. Equivalent 1701 parameters could be defined elsewhere for use with control protocols 1702 that do not use SDP. 1704 Some parameters provide a receiver with the properties of the stream 1705 that will be sent. The names of all these parameters start with 1706 "sprop" for stream properties. 
Some of these "sprop" parameters are 1707 limited by other payload or codec configuration parameters. For 1708 example, the sprop-parameter-sets parameter is constrained by the 1709 profile-level-id parameter. The media sender, rather than the receiver, selects all "sprop" 1710 parameters. This uncommon characteristic of 1711 the "sprop" parameters may not be compatible with some signaling 1712 protocol concepts, in which case the use of these parameters SHOULD 1713 be avoided. 1715 8.1. Media Type Registration 1717 The media subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec is 1718 allocated from the IETF tree. 1720 The receiver SHOULD ignore any unspecified parameter. 1722 Media Type name: video 1724 Media subtype name: H264 1726 Required parameters: none 1728 OPTIONAL parameters: 1730 profile-level-id: 1731 A base16 [7] (hexadecimal) representation of the following 1732 three bytes in the sequence parameter set NAL unit specified 1733 in [1]: 1) profile_idc, 2) a byte herein referred to as 1734 profile-iop, composed of the values of constraint_set0_flag, 1735 constraint_set1_flag, constraint_set2_flag, 1736 constraint_set3_flag, and reserved_zero_4bits in bit- 1737 significance order, starting from the most significant bit, 1738 and 3) level_idc. Note that reserved_zero_4bits is required 1739 to be equal to 0 in [1], but other values for it may be 1740 specified in the future by ITU-T or ISO/IEC. 1742 If the profile-level-id parameter is used to indicate 1743 properties of a NAL unit stream, it indicates the profile or a 1744 common subset of coding tools of more than one profile and the 1745 lowest level that a decoder has to support in order to comply 1746 with [1] when it decodes the stream. Bit 7 (the most 1747 significant bit, constraint_set0_flag), bit 6 1748 (constraint_set1_flag), and bit 5 (constraint_set2_flag) of 1749 the profile-iop byte indicate whether the NAL unit stream also 1750 obeys all constraints of the indicated profiles as follows.
1751 If bit 7, bit 6, or bit 5 of profile-iop is equal to 1, all 1752 constraints of the Baseline profile (profile_idc equal to 66), 1753 the Main profile (profile_idc equal to 77), or the Extended 1754 profile (profile_idc equal to 88), respectively, are obeyed in 1755 the NAL unit stream. 1757 When profile_idc is equal to 66, 77 or 88 (the Baseline, Main, 1758 or Extended profile) and level_idc is equal to 11, bit 4 1759 (constraint_set3_flag) of the profile-iop byte equal to 1 1760 indicates that the level for the NAL unit stream is level 1b. 1761 When profile_idc is equal to 100 or 110 (the High or High 10 1762 profile), constraint_set3_flag equal to 1 indicates that all 1763 constraints of the High 10 Intra profile (identified by 1764 profile_idc equal to 110 and constraint_set3_flag equal to 1) 1765 are obeyed in the NAL unit stream. When profile_idc is equal 1766 to 122 (the High 4:2:2 profile), constraint_set3_flag equal to 1767 1 indicates that all constraints of the High 4:2:2 Intra 1768 profile (identified by profile_idc equal to 122 and 1769 constraint_set3_flag equal to 1) are obeyed in the NAL unit 1770 stream. When profile_idc is equal to 244 (the High 4:4:4 1771 Predictive profile), constraint_set3_flag equal to 1 indicates 1772 that all constraints of the High 4:4:4 Intra profile 1773 (identified by profile_idc equal to 244 and 1774 constraint_set3_flag equal to 1) are obeyed in the NAL unit 1775 stream. 1777 If the profile-level-id parameter is used for capability 1778 exchange or session setup procedure, it indicates the profile 1779 or a common subset of coding tools of more than one profile 1780 that the codec supports and the highest level supported. 
Bit 1781 7, bit 6, bit 5, and bit 4 of the profile-iop byte indicate 1782 whether the codec has additional limitations whereby only the 1783 common subset of the coding tools and limitations of the 1784 profiles signaled with the profile-iop byte and of the profile 1785 indicated by profile_idc is supported by the codec. For 1786 example, if a codec supports only the common subset of the 1787 coding tools of the Baseline profile, the Main profile, and 1788 the Extended profile at level 2.1 and below, the profile- 1789 level-id may become 42E015, in which 42 (hexadecimal) stands 1790 for the Baseline profile, E0 (hexadecimal) indicates that only 1791 the common subset for all the three profiles is supported, and 1792 15 (hexadecimal) indicates level 2.1. The common subset of 1793 the coding tools of the Baseline profile, the Main profile, 1794 and the Extended profile is actually equivalent to the coding 1795 tools of the Constrained Baseline profile, for which the 1796 combination of profile_idc and profile-iop may be any of those 1797 corresponding to the Constrained Baseline profile in Table 5, 1798 which lists all profiles defined in Annex A of [1] and, for 1799 each of the profiles, the possible combinations of profile_idc 1800 and profile-iop that represent the same set of coding tools 1801 supported by the profile. 1803 Table 5. List of all profiles defined in Annex A of [1] 1804 and the possible combinations of profile_idc and profile- 1805 iop representing the same set of coding tools. In the 1806 following, x may be either 0 or 1, and other notions as 1807 follows. CB: Constrained Baseline profile, B: Baseline 1808 profile, M: Main profile, E: Extended profile, H: High 1809 profile, H10: High 10 profile, H42: High 4:2:2 profile, 1810 H44: High 4:4:4 Predictive profile, H10I: High 10 Intra 1811 profile, H42I: High 4:2:2 Intra profile, H44I: High 4:4:4 1812 Intra profile, and C44I: CAVLC 4:4:4 Intra profile. 
1814    Profile   profile_idc          profile-iop
1815              (hexadecimal)        (binary)
1817    CB        42                   x1xx0000
1818              4D                   1xxx0000
1819              58                   11xx0000
1820              64, 6E, 7A or F4     1xx00000
1821    B         42                   x0xx0000
1822              58                   10xx0000
1823    M         4D                   0x0x0000
1824              64, 6E, 7A or F4     01000000
1825    E         58                   00xx0000
1826    H         64                   00000000
1827    H10       6E                   00000000
1828    H42       7A                   00000000
1829    H44       F4                   00000000
1830    H10I      6E                   00010000
1831    H42I      7A                   00010000
1832    H44I      F4                   00010000
1833    C44I      2C                   00010000
1835 Note that other combinations of profile_idc and profile-iop 1836 (not listed in Table 5) may represent a common subset of 1837 coding tools for more than one profile. Note also that a 1838 decoder conforming to a certain profile may be able to decode 1839 bitstreams conforming to other profiles. For example, a 1840 decoder conforming to the High 4:4:4 profile at a certain level 1841 must be able to decode bitstreams conforming to the 1842 Constrained Baseline, Main, High, High 10 or High 4:2:2 1843 profile at the same or a lower level. 1845 Informative note: Capability exchange and session setup 1846 procedures should provide means to list the capabilities 1847 for each supported codec profile and each common subset of 1848 profiles that can be represented by profile_idc and 1849 profile-iop separately. For example, the one-of-N codec 1850 selection procedure of the SDP Offer/Answer model can be 1851 used (section 10.2 of [8]). However, in some cases the 1852 value N in the one-of-N codec selection procedure may be 1853 too large for an acceptable size of the SDP message. 1854 Therefore, a receiver should understand the different 1855 equivalent combinations of profile_idc and profile-iop that 1856 represent the same set of coding tools the receiver 1857 supports, and be ready to accept an offer using any of the 1858 equivalent combinations. 1860 If no profile-level-id is present, the Baseline Profile 1861 without additional constraints at Level 1 MUST be implied.
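Informative example (not part of the specification): the structure of profile-level-id and the Constrained Baseline rows of Table 5 can be illustrated with a short sketch. All function and constant names below are hypothetical. The default value "42000A" is an assumption: it encodes the Baseline profile (profile_idc 66), a profile-iop of 0, and level_idc 10 for Level 1, which is one way to express the implied default stated above. Only the first three Constrained Baseline rows of Table 5 are modeled.

```python
from typing import Optional, Tuple

# Assumed encoding of the implied default: Baseline, no constraints, Level 1.
DEFAULT_PROFILE_LEVEL_ID = "42000A"

# First three Constrained Baseline rows of Table 5: profile_idc -> pattern.
CB_PATTERNS = {0x42: "x1xx0000", 0x4D: "1xxx0000", 0x58: "11xx0000"}


def parse_profile_level_id(value: Optional[str] = None) -> Tuple[int, int, int]:
    """Split the base16 string into (profile_idc, profile_iop, level_idc)."""
    if value is None:
        value = DEFAULT_PROFILE_LEVEL_ID
    if len(value) != 6:
        raise ValueError("profile-level-id must be exactly six hex digits")
    raw = bytes.fromhex(value)  # raises ValueError on non-hex input
    return raw[0], raw[1], raw[2]


def iop_matches(profile_iop: int, pattern: str) -> bool:
    """Compare profile-iop with a Table 5 pattern; 'x' matches either bit."""
    bits = format(profile_iop, "08b")  # most significant bit first
    return all(p == "x" or p == b for p, b in zip(pattern, bits))


def is_constrained_baseline(profile_idc: int, profile_iop: int) -> bool:
    """True if the pair matches one of the modeled Constrained Baseline rows."""
    pattern = CB_PATTERNS.get(profile_idc)
    return pattern is not None and iop_matches(profile_iop, pattern)


def is_level_1b(profile_idc: int, profile_iop: int, level_idc: int) -> bool:
    """Level 1b: Baseline/Main/Extended, level_idc 11, constraint_set3_flag set."""
    return (profile_idc in (66, 77, 88)
            and level_idc == 11
            and bool(profile_iop & 0x10))


# The example value from the text: 42E015 (Baseline, common subset, level 2.1).
profile_idc, profile_iop, level_idc = parse_profile_level_id("42E015")
```

For the value 42E015 used in the text, the sketch yields profile_idc 66, profile-iop 0xE0, and level_idc 21, and the pair (0x42, 0xE0) matches the pattern x1xx0000 of the Constrained Baseline profile.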
1863 max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br: 1864 These parameters MAY be used to signal the capabilities of a 1865 receiver implementation. These parameters MUST NOT be used for 1866 any other purpose. The profile-level-id parameter MUST be 1867 present in the same receiver capability description that 1868 contains any of these parameters. The level conveyed in the 1869 value of the profile-level-id parameter MUST be one that the 1870 receiver is fully capable of supporting. max-mbps, max-smbps, 1871 max-fs, max-cpb, max-dpb, and max-br MAY be used to indicate 1872 capabilities of the receiver that extend the required 1873 capabilities of the signaled level, as specified below. 1875 When more than one parameter from the set (max-mbps, max-smbps, 1876 max-fs, max-cpb, max-dpb, max-br) is present, the receiver 1877 MUST support all signaled capabilities simultaneously. For 1878 example, if both max-mbps and max-br are present, the signaled 1879 level with the extension of both the frame rate and bit rate 1880 is supported. That is, the receiver is able to decode NAL 1881 unit streams in which the macroblock processing rate is up to 1882 max-mbps (inclusive), the bit rate is up to max-br 1883 (inclusive), the coded picture buffer size is derived as 1884 specified in the semantics of the max-br parameter below, and 1885 other properties comply with the level specified in the value 1886 of the profile-level-id parameter. 1888 If a receiver can support all the properties of level A, the 1889 level specified in the value of the profile-level-id MUST be 1890 level A (i.e., MUST NOT be lower than level A). In other 1891 words, a sender or receiver MUST NOT signal values of max- 1892 mbps, max-fs, max-cpb, max-dpb, and max-br that meet the 1893 requirements of a higher level compared to the level specified 1894 in the value of the profile-level-id parameter.
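Informative example (not part of the specification): one way a sender could combine the signaled level with the receiver's max-* parameters is sketched below, following the per-parameter semantics given later in this section (the Table A-1 value is replaced by the signaled value, which must not be smaller). The function name effective_limits and the dictionary layout are hypothetical. Only max-mbps, max-fs, and max-cpb are modeled; max-smbps, max-dpb, and the effect of max-br on the coded picture buffer size are deliberately left out. The numeric limits in the usage lines are illustrative inputs assumed to correspond to Level 3 and must be checked against Table A-1 of [1].

```python
from typing import Dict

# Media type parameter name -> column of Table A-1 of [1] that it replaces.
PARAM_TO_COLUMN = {
    "max-mbps": "MaxMBPS",
    "max-fs": "MaxFS",
    "max-cpb": "MaxCPB",
}


def effective_limits(level_limits: Dict[str, int],
                     signaled: Dict[str, int]) -> Dict[str, int]:
    """Apply all signaled max-* values at once on top of the level limits."""
    limits = dict(level_limits)  # do not modify the caller's table
    for param, value in signaled.items():
        column = PARAM_TO_COLUMN.get(param)
        if column is None:
            continue  # parameters not modeled here are skipped
        if value < level_limits[column]:
            # A signaled value below the level's own limit is not allowed.
            raise ValueError(f"{param} is below {column} of the signaled level")
        limits[column] = value
    return limits


# Illustrative inputs only (assumed Level 3 limits; verify against Table A-1).
level_3 = {"MaxMBPS": 40500, "MaxFS": 1620, "MaxCPB": 10000}
combined = effective_limits(level_3, {"max-mbps": 50000, "max-fs": 2000})
```

Because every signaled parameter is applied to the same result, the sketch reflects the requirement that the receiver supports all signaled capabilities simultaneously.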
1896 Informative note: When the OPTIONAL media type parameters 1897 are used to signal the properties of a NAL unit stream, 1898 max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br 1899 are not present, and the value of profile-level-id must 1900 always be such that the NAL unit stream complies fully with 1901 the specified profile and level. 1903 max-mbps: The value of max-mbps is an integer indicating the 1904 maximum macroblock processing rate in units of macroblocks per 1905 second. The max-mbps parameter signals that the receiver is 1906 capable of decoding video at a higher rate than is required by 1907 the signaled level conveyed in the value of the profile-level- 1908 id parameter. When max-mbps is signaled, the receiver MUST be 1909 able to decode NAL unit streams that conform to the signaled 1910 level, with the exception that the MaxMBPS value in Table A-1 1911 of [1] for the signaled level is replaced with the value of 1912 max-mbps. The value of max-mbps MUST be greater than or equal 1913 to the value of MaxMBPS for the level given in Table A-1 of 1914 [1]. Senders MAY use this knowledge to send pictures of a 1915 given size at a higher picture rate than is indicated in the 1916 signaled level. 1918 max-smbps: The value of max-smbps is an integer indicating the 1919 maximum static macroblock processing rate in units of static 1920 macroblocks per second, under the hypothetical assumption that 1921 all macroblocks are static macroblocks. When max-smbps is 1922 signalled the MaxMBPS value in Table A-1 of [1] should be 1923 replaced with the result of the following computation: 1925 o If the parameter max-mbps is signalled, set a variable 1926 MaxMacroblocksPerSecond to the value of max-mbps. 1927 Otherwise, set MaxMacroblocksPerSecond equal to the value 1928 of MaxMBPS for the level in Table A-1 [1]. 1930 o Set a variable P_non-static to the proportion of non-static 1931 macroblocks in picture n. 
1933 o Set a variable P_static to the proportion of static 1934 macroblocks in picture n. 1936 o The value of MaxMBPS in Table A-1 of [1] should be 1937 considered by the encoder to be equal to: 1939 MaxMacroblocksPerSecond * max-smbps / ( P_non-static * max- 1940 smbps + P_static * MaxMacroblocksPerSecond) 1942 The encoder should recompute this value for each picture. The 1943 value of max-smbps MUST be greater than the value of MaxMBPS 1944 for the level given in Table A-1 of [1]. Senders MAY use this 1945 knowledge to send pictures of a given size at a higher picture 1946 rate than is indicated in the signalled level. 1948 When rfc3984-compatible is equal to 1, max-smbps MUST NOT be 1949 present. 1951 max-fs: The value of max-fs is an integer indicating the maximum 1952 frame size in units of macroblocks. The max-fs parameter 1953 signals that the receiver is capable of decoding larger 1954 picture sizes than are required by the signaled level conveyed 1955 in the value of the profile-level-id parameter. When max-fs 1956 is signaled, the receiver MUST be able to decode NAL unit 1957 streams that conform to the signaled level, with the exception 1958 that the MaxFS value in Table A-1 of [1] for the signaled 1959 level is replaced with the value of max-fs. The value of max- 1960 fs MUST be greater than or equal to the value of MaxFS for the 1961 level given in Table A-1 of [1]. Senders MAY use this 1962 knowledge to send larger pictures at a proportionally lower 1963 frame rate than is indicated in the signaled level. 1965 max-cpb: The value of max-cpb is an integer indicating the 1966 maximum coded picture buffer size in units of 1000 bits for 1967 the VCL HRD parameters (see A.3.1 item i of [1]) and in units 1968 of 1200 bits for the NAL HRD parameters (see A.3.1 item j of 1969 [1]). 
The max-cpb parameter signals that the receiver has 1970 more memory than the minimum amount of coded picture buffer 1971 memory required by the signaled level conveyed in the value of 1972 the profile-level-id parameter. When max-cpb is signaled, the 1973 receiver MUST be able to decode NAL unit streams that conform 1974 to the signaled level, with the exception that the MaxCPB 1975 value in Table A-1 of [1] for the signaled level is replaced 1976 with the value of max-cpb. The value of max-cpb MUST be 1977 greater than or equal to the value of MaxCPB for the level 1978 given in Table A-1 of [1]. Senders MAY use this knowledge to 1979 construct coded video streams with greater variation of bit 1980 rate than can be achieved with the MaxCPB value in Table A-1 1981 of [1]. 1983 Informative note: The coded picture buffer is used in the 1984 hypothetical reference decoder (Annex C) of H.264. The use 1985 of the hypothetical reference decoder is recommended in 1986 H.264 encoders to verify that the produced bitstream 1987 conforms to the standard and to control the output bitrate. 1988 Thus, the coded picture buffer is conceptually independent 1989 of any other potential buffers in the receiver, including 1990 de-interleaving and de-jitter buffers. The coded picture 1991 buffer need not be implemented in decoders as specified in 1992 Annex C of H.264, but rather standard-compliant decoders 1993 can have any buffering arrangements provided that they can 1994 decode standard-compliant bitstreams. Thus, in practice, 1995 the input buffer for video decoder can be integrated with 1996 de-interleaving and de-jitter buffers of the receiver. 1998 max-dpb: The value of max-dpb is an integer indicating the 1999 maximum decoded picture buffer size in units of 1024 bytes. 
2000 The max-dpb parameter signals that the receiver has more 2001 memory than the minimum amount of decoded picture buffer 2002 memory required by the signaled level conveyed in the value of 2003 the profile-level-id parameter. When max-dpb is signaled, the 2004 receiver MUST be able to decode NAL unit streams that conform 2005 to the signaled level, with the exception that the MaxDPB 2006 value in Table A-1 of [1] for the signaled level is replaced 2007 with the value of max-dpb. Consequently, a receiver that 2008 signals max-dpb MUST be capable of storing the following 2009 number of decoded frames, complementary field pairs, and non- 2010 paired fields in its decoded picture buffer: 2012 Min(1024 * max-dpb / ( PicWidthInMbs * FrameHeightInMbs * 2013 256 * ChromaFormatFactor ), 16) 2015 PicWidthInMbs, FrameHeightInMbs, and ChromaFormatFactor are 2016 defined in [1]. 2018 The value of max-dpb MUST be greater than or equal to the 2019 value of MaxDPB for the level given in Table A-1 of [1]. 2020 Senders MAY use this knowledge to construct coded video 2021 streams with improved compression. 2023 Informative note: This parameter was added primarily to 2024 complement a similar codepoint in the ITU-T Recommendation 2025 H.245, so as to facilitate signaling gateway designs. The 2026 decoded picture buffer stores reconstructed samples. There 2027 is no relationship between the size of the decoded picture 2028 buffer and the buffers used in RTP, especially de- 2029 interleaving and de-jitter buffers. 2031 max-br: The value of max-br is an integer indicating the maximum 2032 video bit rate in units of 1000 bits per second for the VCL 2033 HRD parameters (see A.3.1 item i of [1]) and in units of 1200 2034 bits per second for the NAL HRD parameters (see A.3.1 item j 2035 of [1]). 
2037 The max-br parameter signals that the video decoder of the 2038 receiver is capable of decoding video at a higher bit rate 2039 than is required by the signaled level conveyed in the value 2040 of the profile-level-id parameter. 2042 When max-br is signaled, the video codec of the receiver MUST 2043 be able to decode NAL unit streams that conform to the 2044 signaled level, conveyed in the profile-level-id parameter, 2045 with the following exceptions in the limits specified by the 2046 level: 2048 o The value of max-br replaces the MaxBR value of the signaled 2049 level (in Table A-1 of [1]). 2051 o When the max-cpb parameter is not present, the result of the 2052 following formula replaces the value of MaxCPB in Table A-1 2053 of [1]: (MaxCPB of the signaled level) * max-br / (MaxBR of 2054 the signaled level). 2056 For example, if a receiver signals capability for Level 1.2 2057 with max-br equal to 1550, this indicates a maximum video 2058 bitrate of 1550 kbits/sec for VCL HRD parameters, a maximum 2059 video bitrate of 1860 kbits/sec for NAL HRD parameters, and a 2060 CPB size of 4036458 bits (1550000 / 384000 * 1000 * 1000). 2062 The value of max-br MUST be greater than or equal to the value 2063 MaxBR for the signaled level given in Table A-1 of [1]. 2065 Senders MAY use this knowledge to send higher bitrate video as 2066 allowed in the level definition of Annex A of H.264, to 2067 achieve improved video quality. 2069 Informative note: This parameter was added primarily to 2070 complement a similar codepoint in the ITU-T Recommendation 2071 H.245, so as to facilitate signaling gateway designs. No 2072 assumption can be made from the value of this parameter 2073 that the network is capable of handling such bit rates at 2074 any given time. In particular, no conclusion can be drawn 2075 that the signaled bit rate is possible under congestion 2076 control constraints. 
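The derivations given above for max-smbps, max-dpb, and max-br can be illustrated with a minimal, non-normative sketch. The function and argument names below are hypothetical and are not defined by this memo or by [1]; the Table A-1 values for the signaled level are passed in by the caller rather than looked up. Rounding the decoded picture buffer capacity and the derived CPB size down to whole units is an assumption of the sketch.

```python
from fractions import Fraction


def effective_max_mbps(level_max_mbps, max_smbps, p_non_static, p_static,
                       max_mbps=None):
    """Per-picture replacement for MaxMBPS when max-smbps is signaled."""
    # Use max-mbps if it was signaled, otherwise the level's MaxMBPS.
    mb_per_second = max_mbps if max_mbps is not None else level_max_mbps
    return (mb_per_second * max_smbps
            / (p_non_static * max_smbps + p_static * mb_per_second))


def dpb_capacity_in_frames(max_dpb, pic_width_in_mbs, frame_height_in_mbs,
                           chroma_format_factor):
    """Frames that a receiver signaling max-dpb must be able to store."""
    frame_bytes = (pic_width_in_mbs * frame_height_in_mbs * 256
                   * Fraction(chroma_format_factor))
    # Assumption of this sketch: round down to whole frames, then cap at 16.
    return min(int(Fraction(1024 * max_dpb) / frame_bytes), 16)


def max_br_limits(max_br, level_max_br, level_max_cpb, max_cpb=None):
    """Bit rate and CPB limits implied by max-br for the signaled level."""
    if max_cpb is not None:
        cpb_units = Fraction(max_cpb)
    else:
        # MaxCPB is scaled by max-br / MaxBR when max-cpb is absent.
        cpb_units = Fraction(level_max_cpb * max_br, level_max_br)
    return {
        "vcl_bits_per_second": max_br * 1000,
        "nal_bits_per_second": max_br * 1200,
        # Assumption of this sketch: truncate to whole bits.
        "vcl_cpb_bits": int(cpb_units * 1000),
    }
```

With the Level 1.2 figures implied by the example above (MaxBR of 384 and MaxCPB of 1000), max_br_limits(1550, 384, 1000) yields 1550000 and 1860000 bits per second and a CPB size of 4036458 bits, matching the values stated in the text.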
2078 redundant-pic-cap: 2079 This parameter signals the capabilities of a receiver 2080 implementation. When equal to 0, the parameter indicates that 2081 the receiver makes no attempt to use redundant coded pictures 2082 to correct incorrectly decoded primary coded pictures. When 2083 equal to 0, the receiver is not capable of using redundant 2084 slices; therefore, a sender SHOULD avoid sending redundant 2085 slices to save bandwidth. When equal to 1, the receiver is 2086 capable of decoding any such redundant slice that covers a 2087 corrupted area in a primary decoded picture (at least partly), 2088 and therefore a sender MAY send redundant slices. When the 2089 parameter is not present, then a value of 0 MUST be used for 2090 redundant-pic-cap. When present, the value of redundant-pic- 2091 cap MUST be either 0 or 1. 2093 When the profile-level-id parameter is present in the same 2094 signaling as the redundant-pic-cap parameter, and the profile 2095 indicated in profile-level-id is such that it disallows the 2096 use of redundant coded pictures (e.g., Main Profile), the 2097 value of redundant-pic-cap MUST be equal to 0. When a 2098 receiver indicates redundant-pic-cap equal to 0, the received 2099 stream SHOULD NOT contain redundant coded pictures. 2101 Informative note: Even if redundant-pic-cap is equal to 0, 2102 the decoder is able to ignore redundant coded pictures 2103 provided that the decoder supports such a profile 2104 (Baseline, Extended) in which redundant coded pictures are 2105 allowed. 2107 Informative note: Even if redundant-pic-cap is equal to 1, 2108 the receiver may also choose other error concealment 2109 strategies to replace or complement decoding of redundant 2110 slices. 2112 sprop-parameter-sets: 2113 This parameter MAY be used to convey any sequence and picture 2114 parameter set NAL units (herein referred to as the initial 2115 parameter set NAL units) that MUST precede any other NAL units 2116 in decoding order.
The parameter MUST NOT be used to indicate 2117 codec capability in any capability exchange procedure. The 2118 value of the parameter is the base64 [7] representation of the 2119 initial parameter set NAL units as specified in sections 2120 7.3.2.1 and 7.3.2.2 of [1]. The parameter sets are conveyed 2121 in decoding order, and no framing of the parameter set NAL 2122 units takes place. A comma is used to separate any pair of 2123 parameter sets in the list. Note that the number of bytes in 2124 a parameter set NAL unit is typically less than 10, but a 2125 picture parameter set NAL unit can contain several hundreds of 2126 bytes. 2128 Informative note: When several payload types are offered in 2129 the SDP Offer/Answer model, each with its own sprop- 2130 parameter-sets parameter, then the receiver cannot assume 2131 that those parameter sets do not use conflicting storage 2132 locations (i.e., identical values of parameter set 2133 identifiers). Therefore, a receiver should double-buffer 2134 all sprop-parameter-sets and make them available to the 2135 decoder instance that decodes a certain payload type. 2137 The "sprop-parameter-sets" MUST contain only parameter sets 2138 that are conforming to the profile-level-id, i.e. the profile 2139 or the used subset of coding tools indicated by the parameter 2140 sets is equivalent to that indicated by profile-level-id, and 2141 the level indicated by the parameter sets is the same as the 2142 level indicated by profile-level-id. 2144 When the sprop-level-parameter-sets parameter is present, 2145 sprop-parameter-sets MUST NOT be present. There MAY be zero 2146 or one instance of "sprop-level-parameter-sets" present. 2148 sprop-level-parameter-sets: 2149 This parameter MAY be used to convey any sequence and picture 2150 parameter set NAL units (herein referred to as the initial 2151 parameter set NAL units) that MUST precede any other NAL units 2152 in decoding order and that are associated with a particular 2153 level. 
The parameter MUST NOT be used to indicate codec 2154 capability in any capability exchange procedure. The value of 2155 the parameter consists of a three-byte field followed by a 2156 comma and the base64 [7] representation of the initial 2157 parameter set NAL units as specified in sections 7.3.2.1 and 2158 7.3.2.2 of [1]. The parameter sets are conveyed in decoding 2159 order, and no framing of the parameter set NAL units takes 2160 place. A comma is used to separate any pair of parameter sets 2161 in the list. The three-byte field MUST be equal to the three 2162 bytes from profile_idc to level_idc, inclusive, in all 2163 sequence parameter set NAL units contained in this media type 2164 parameter. The profile_idc in all sequence parameter set NAL 2165 units contained in this media type parameter MUST be equal to 2166 the first byte in the media type parameter profile-level-id. 2167 The level indicated by the three-byte field MUST be equal to 2168 or lower than the level indicated by the media type parameter 2169 profile-level-id. 2171 Informative note: This parameter allows for efficient level 2172 down-gradable SDP offer/answer and out-of-band transport of 2173 parameter sets, with only one round of offer/answer. 2175 The "sprop-level-parameter-sets" MUST contain only parameter 2176 sets that are conforming to the profile-level-id, i.e. the 2177 profile or the used subset of coding tools indicated by the 2178 parameter sets is equivalent to that indicated by profile- 2179 level-id, and the level indicated by the parameter sets is the 2180 same as the level indicated by profile-level-id. 2182 When rfc3984-compatible is equal to 1 or sprop-parameter-sets 2183 is present, sprop-level-parameter-sets MUST NOT be present. 2185 There MAY be zero or one or more than one instance of "sprop- 2186 level-parameter-sets" present. 2188 sprop-ssrc: 2189 This parameter MAY be used to signal the properties of an RTP 2190 packet stream. 
It specifies the SSRC values in the RTP header 2191 of all RTP packets in the RTP packet stream. The 2192 representation of this parameter is the same as for the SSRC 2193 field in the RTP header. 2195 Informative note: This parameter allows for out-of-band 2196 transport of parameter sets in topologies like Topo-Video- 2197 switch-MCU [28]. 2199 When rfc3984-compatible is equal to 1, sprop-ssrc MUST NOT be 2200 present. 2202 parameter-add: 2203 This parameter MAY be used to signal whether the receiver of 2204 this parameter is allowed to add parameter sets in its 2205 signaling response using the sprop-parameter-sets media type 2206 parameter. The value of this parameter is either 0 or 1. 0 2207 is equal to false; i.e., it is not allowed to add parameter 2208 sets. 1 is equal to true; i.e., it is allowed to add 2209 parameter sets. If the parameter is not present, its value 2210 MUST be 1. 2212 The use of parameter-add is deprecated. 2214 When rfc3984-compatible is equal to 0, parameter-add MUST NOT 2215 be present. 2217 packetization-mode: 2218 This parameter signals the properties of an RTP payload type 2219 or the capabilities of a receiver implementation. Only a 2220 single configuration point can be indicated; thus, when 2221 capabilities to support more than one packetization-mode are 2222 declared, multiple configuration points (RTP payload types) 2223 must be used. 2225 When the value of packetization-mode is equal to 0 or 2226 packetization-mode is not present, the single NAL mode, as 2227 defined in section 6.2 of RFC 3984, MUST be used. This mode 2228 is in use in standards using ITU-T Recommendation H.241 [3] 2229 (see section 12.1). When the value of packetization-mode is 2230 equal to 1, the non-interleaved mode, as defined in section 2231 6.3 of RFC 3984, MUST be used. When the value of 2232 packetization-mode is equal to 2, the interleaved mode, as 2233 defined in section 6.4 of RFC 3984, MUST be used. 
The value 2234 of packetization-mode MUST be an integer in the range of 0 to 2235 2, inclusive. 2237 sprop-interleaving-depth: 2238 This parameter MUST NOT be present when packetization-mode is 2239 not present or the value of packetization-mode is equal to 0 2240 or 1. This parameter MUST be present when the value of 2241 packetization-mode is equal to 2. 2243 This parameter signals the properties of an RTP packet stream. 2244 It specifies the maximum number of VCL NAL units that precede 2245 any VCL NAL unit in the RTP packet stream in transmission 2246 order and follow the VCL NAL unit in decoding order. 2247 Consequently, it is guaranteed that receivers can reconstruct 2248 NAL unit decoding order when the buffer size for NAL unit 2249 decoding order recovery is at least the value of sprop- 2250 interleaving-depth + 1 in terms of VCL NAL units. 2252 The value of sprop-interleaving-depth MUST be an integer in 2253 the range of 0 to 32767, inclusive. 2255 sprop-deint-buf-req: 2256 This parameter MUST NOT be present when packetization-mode is 2257 not present or the value of packetization-mode is equal to 0 2258 or 1. It MUST be present when the value of packetization-mode 2259 is equal to 2. 2261 sprop-deint-buf-req signals the required size of the 2262 deinterleaving buffer for the RTP packet stream. The value of 2263 the parameter MUST be greater than or equal to the maximum 2264 buffer occupancy (in units of bytes) required in such a 2265 deinterleaving buffer that is specified in section 7.2 of RFC 2266 3984. It is guaranteed that receivers can perform the 2267 deinterleaving of interleaved NAL units into NAL unit decoding 2268 order, when the deinterleaving buffer size is at least the 2269 value of sprop-deint-buf-req in terms of bytes. 2271 The value of sprop-deint-buf-req MUST be an integer in the 2272 range of 0 to 4294967295, inclusive. 2274 Informative note: sprop-deint-buf-req indicates the 2275 required size of the deinterleaving buffer only. 
When 2276 network jitter can occur, an appropriately sized jitter 2277 buffer has to be provisioned for as well. 2279 deint-buf-cap: 2280 This parameter signals the capabilities of a receiver 2281 implementation and indicates the amount of deinterleaving 2282 buffer space in units of bytes that the receiver has available 2283 for reconstructing the NAL unit decoding order. A receiver is 2284 able to handle any stream for which the value of the sprop- 2285 deint-buf-req parameter is smaller than or equal to this 2286 parameter. 2288 If the parameter is not present, then a value of 0 MUST be 2289 used for deint-buf-cap. The value of deint-buf-cap MUST be an 2290 integer in the range of 0 to 4294967295, inclusive. 2292 Informative note: deint-buf-cap indicates the maximum 2293 possible size of the deinterleaving buffer of the receiver 2294 only. When network jitter can occur, an appropriately 2295 sized jitter buffer has to be provisioned for as well. 2297 sprop-init-buf-time: 2298 This parameter MAY be used to signal the properties of an RTP 2299 packet stream. The parameter MUST NOT be present, if the 2300 value of packetization-mode is equal to 0 or 1. 2302 The parameter signals the initial buffering time that a 2303 receiver MUST wait before starting decoding to recover the NAL 2304 unit decoding order from the transmission order. The 2305 parameter is the maximum value of (decoding time of the NAL 2306 unit - transmission time of a NAL unit), assuming reliable and 2307 instantaneous transmission, the same timeline for transmission 2308 and decoding, and that decoding starts when the first packet 2309 arrives. 2311 An example of specifying the value of sprop-init-buf-time 2312 follows. A NAL unit stream is sent in the following 2313 interleaved order, in which the value corresponds to the 2314 decoding time and the transmission order is from left to 2315 right: 2317 0 2 1 3 5 4 6 8 7 ... 
2319 Assuming a steady transmission rate of NAL units, the 2320 transmission times are: 2322 0 1 2 3 4 5 6 7 8 ... 2324 Subtracting the decoding time from the transmission time 2325 column-wise results in the following series: 2327 0 -1 1 0 -1 1 0 -1 1 ... 2329 Thus, in terms of intervals of NAL unit transmission times, 2330 the value of sprop-init-buf-time in this example is 1. The 2331 parameter is coded as a non-negative base10 integer 2332 representation in clock ticks of a 90-kHz clock. If the 2333 parameter is not present, then no initial buffering time value 2334 is defined. Otherwise the value of sprop-init-buf-time MUST 2335 be an integer in the range of 0 to 4294967295, inclusive. 2337 In addition to the signaled sprop-init-buf-time, receivers 2338 SHOULD take into account the transmission delay jitter 2339 buffering, including buffering for the delay jitter caused by 2340 mixers, translators, gateways, proxies, traffic-shapers, and 2341 other network elements. 2343 sprop-max-don-diff: 2344 This parameter MAY be used to signal the properties of an RTP 2345 packet stream. It MUST NOT be used to signal transmitter or 2346 receiver or codec capabilities. The parameter MUST NOT be 2347 present if the value of packetization-mode is equal to 0 or 1. 2348 sprop-max-don-diff is an integer in the range of 0 to 32767, 2349 inclusive. If sprop-max-don-diff is not present, the value of 2350 the parameter is unspecified. sprop-max-don-diff is 2351 calculated as follows: 2353 sprop-max-don-diff = max{AbsDON(i) - AbsDON(j)}, 2354 for any i and any j>i, 2356 where i and j indicate the index of the NAL unit in the 2357 transmission order and AbsDON denotes a decoding order number 2358 of the NAL unit that does not wrap around to 0 after 65535. 2359 In other words, AbsDON is calculated as follows: Let m and n 2360 be consecutive NAL units in transmission order. For the very 2361 first NAL unit in transmission order (whose index is 0), 2362 AbsDON(0) = DON(0). 
For other NAL units, AbsDON is calculated 2363 as follows: 2365 If DON(m) == DON(n), AbsDON(n) = AbsDON(m) 2366 If (DON(m) < DON(n) and DON(n) - DON(m) < 32768), 2367 AbsDON(n) = AbsDON(m) + DON(n) - DON(m) 2369 If (DON(m) > DON(n) and DON(m) - DON(n) >= 32768), 2370 AbsDON(n) = AbsDON(m) + 65536 - DON(m) + DON(n) 2372 If (DON(m) < DON(n) and DON(n) - DON(m) >= 32768), 2373 AbsDON(n) = AbsDON(m) - (DON(m) + 65536 - DON(n)) 2375 If (DON(m) > DON(n) and DON(m) - DON(n) < 32768), 2376 AbsDON(n) = AbsDON(m) - (DON(m) - DON(n)) 2378 where DON(i) is the decoding order number of the NAL unit 2379 having index i in the transmission order. The decoding order 2380 number is specified in section 5.5 of RFC 3984. 2382 Informative note: Receivers may use sprop-max-don-diff to 2383 trigger which NAL units in the receiver buffer can be 2384 passed to the decoder. 2386 max-rcmd-nalu-size: 2387 This parameter MAY be used to signal the capabilities of a 2388 receiver. The parameter MUST NOT be used for any other 2389 purposes. The value of the parameter indicates the largest 2390 NALU size in bytes that the receiver can handle efficiently. 2391 The parameter value is a recommendation, not a strict upper 2392 boundary. The sender MAY create larger NALUs but must be 2393 aware that the handling of these may come at a higher cost 2394 than NALUs conforming to the limitation. 2396 The value of max-rcmd-nalu-size MUST be an integer in the 2397 range of 0 to 4294967295, inclusive. If this parameter is not 2398 specified, no known limitation to the NALU size exists. 2399 Senders still have to consider the MTU size available between 2400 the sender and the receiver and SHOULD run MTU discovery for 2401 this purpose. 2403 This parameter is motivated by, for example, an IP to H.223 2404 video telephony gateway, where NALUs smaller than the H.223 2405 transport data unit will be more efficient. A gateway may 2406 terminate IP; thus, MTU discovery will normally not work 2407 beyond the gateway. 
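For the sprop-max-don-diff calculation specified above, the AbsDON rules can be illustrated with a minimal, non-normative sketch. The function names are hypothetical. The sketch assumes the list holds the 16-bit DON values of the NAL units in transmission order, and it reports 0 when no pair of NAL units yields a positive difference; that lower bound is an assumption made here because the parameter range starts at 0.

```python
def abs_don_sequence(dons):
    """Map 16-bit DON values (transmission order) to non-wrapping AbsDON."""
    abs_dons = []
    for n, don_n in enumerate(dons):
        if n == 0:
            # AbsDON(0) = DON(0) for the very first NAL unit.
            abs_dons.append(don_n)
            continue
        don_m, abs_m = dons[n - 1], abs_dons[-1]
        if don_m == don_n:
            abs_n = abs_m
        elif don_m < don_n and don_n - don_m < 32768:
            abs_n = abs_m + don_n - don_m
        elif don_m > don_n and don_m - don_n >= 32768:
            # Forward step across the 65535 -> 0 wrap.
            abs_n = abs_m + 65536 - don_m + don_n
        elif don_m < don_n and don_n - don_m >= 32768:
            # Backward step across the 0 -> 65535 wrap.
            abs_n = abs_m - (don_m + 65536 - don_n)
        else:
            # don_m > don_n and don_m - don_n < 32768
            abs_n = abs_m - (don_m - don_n)
        abs_dons.append(abs_n)
    return abs_dons


def sprop_max_don_diff(dons):
    """max{AbsDON(i) - AbsDON(j)} over all i and j > i, floored at 0."""
    abs_dons = abs_don_sequence(dons)
    if not abs_dons:
        return 0
    best = 0
    running_max = abs_dons[0]
    for value in abs_dons[1:]:
        # Largest earlier AbsDON minus the current one.
        best = max(best, running_max - value)
        running_max = max(running_max, value)
    return best
```

As a usage example, if the interleaved sequence 0 2 1 3 5 4 6 8 7 from the sprop-init-buf-time example is treated as hypothetical DON values, sprop_max_don_diff returns 1.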
2409 Informative note: Setting this parameter to a lower than 2410 necessary value may have a negative impact. 2412 sar: This parameter MAY be used to indicate a receiver preference 2413 or a stream property. When used to indicate a receiver 2414 preference, the value of this parameter is an integer that 2415 indicates that the receiver supports all the sample aspect 2416 ratios (SAR) corresponding to H.264 aspect_ratio_idc values in 2417 the range from 1 to N, inclusive, where N is the value of this 2418 parameter (see Table E-1 of [1]), without geometric 2419 distortion. Therefore, the indicated range of SAR values is 2420 expressed as a preference of the receiver. [Ed. (YkW): It is 2421 an open issue on the semantics of sar when it is used to 2422 indicate a stream property.] 2424 H.264 compliant encoders SHOULD choose to send an SAR which 2425 can be displayed by receivers of the encoded bitstream without 2426 geometric distortion. However, H.264 compliant encoders MAY 2427 choose to send pictures using any SAR. 2429 When rfc3984-compatible is equal to 1, sar MUST NOT be 2430 present. 2432 esar: This parameter MAY be used only to indicate receiver 2433 capabilities. The value of this parameter is 1 or 0. 1 2434 indicates that the receiver supports all sample aspect ratios 2435 which are expressible using the H.264 aspect_ratio_idc value 2436 of 255 (Extended_SAR, see Table E-1 of [1]), without geometric 2437 distortion. If the parameter esar does not exist, its value 2438 MUST be inferred to be equal to 0. 2440 The actual sample aspect ratio or extended sample aspect 2441 ratio, when present, of the stream is conveyed in the Video 2442 Usability Information (VUI) part of a sequence parameter set. 2443 In the case where esar is signalled, the sar parameter can be 2444 used as well in order to signal the maximum index from Table 2445 E-1 of [1] the receiver supports. 2447 When rfc3984-compatible is equal to 1, esar MUST NOT be 2448 present.
2450 rfc3984-compatible: This parameter MAY be used in capability 2451 exchange or session setup procedure. When the parameter is not 2452 present, its value MUST be inferred to be equal to 1. If the 2453 value is equal to 1, then the capability exchange or session 2454 setup procedure is backwards compatible to the procedure as 2455 specified in RFC 3984. [Ed. (YkW): Add in section 16 that in 2456 this case some ways of operations (i.e. level downgrade 2457 together with out-of-band transport of parameter sets, and 2458 out-of-band transport of parameter sets in video switching 2459 topologies) cannot be applied and some newly-defined media 2460 type parameters cannot be present. Check whether some newly- 2461 defined media type parameters may be present and receivers can 2462 ignore them. The list of such "some ways of operations" 2463 should take into account both offer/answer and declarative 2464 usages.] Otherwise (the value is equal to 0), the capability 2465 exchange or session setup procedure is not backwards 2466 compatible to the procedure as specified in RFC 3984. [Ed. 2467 (YkW): Add in section 16 that in this case at least one of 2468 some ways of operations (i.e. level downgrade together with 2469 out-of-band transport of parameter sets, and out-of-band 2470 transport of parameter sets in video switching topologies) is 2471 applied and the needed newly-defined media type parameters are 2472 present. This also implies that parameter-add is not present, 2473 and there is no restriction to sprop-parameter-sets in 2474 signaling responses. The list of such "some ways of 2475 operations" should take into account both offer/answer and 2476 declarative usages.] 2478 Encoding considerations: 2479 This type is only defined for transfer via RTP (RFC 3550). 2481 Security considerations: 2482 See section 9 of RFC xxxx. 2484 Public specification: 2485 Please refer to RFC xxxx and its section 15. 
2487 Additional information: 2488 None 2490 File extensions: none 2492 Macintosh file type code: none 2494 Object identifier or OID: none 2496 Person & email address to contact for further information: 2497 Ye-Kui Wang, ye-kui.wang@nokia.com 2499 Intended usage: COMMON 2501 Author: 2502 Ye-Kui Wang, ye-kui.wang@nokia.com 2504 Change controller: 2505 IETF Audio/Video Transport working group delegated from the 2506 IESG. 2508 8.2. SDP Parameters 2510 8.2.1. Mapping of Payload Type Parameters to SDP 2512 The media type video/H264 string is mapped to fields in the Session 2513 Description Protocol (SDP) [6] as follows: 2515 o The media name in the "m=" line of SDP MUST be video. 2517 o The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the 2518 media subtype). 2520 o The clock rate in the "a=rtpmap" line MUST be 90000. 2522 o The OPTIONAL parameters "profile-level-id", "max-mbps", "max- 2523 smbps", "max-fs", "max-cpb", "max-dpb", "max-br", "redundant-pic- 2524 cap", "sprop-parameter-sets", "sprop-level-parameter-sets", 2525 "sprop-ssrc", "parameter-add", "packetization-mode", "sprop- 2526 interleaving-depth", "sprop-deint-buf-req", "deint-buf-cap", 2527 "sprop-init-buf-time", "sprop-max-don-diff", "max-rcmd-nalu-size", 2528 "sar", "esar", and "rfc3984-compatible", when present, MUST be 2529 included in the "a=fmtp" line of SDP. These parameters are 2530 expressed as a media type string, in the form of a semicolon 2531 separated list of parameter=value pairs. 2533 An example of media representation in SDP is as follows (Baseline 2534 Profile, Level 3.0, some of the constraints of the Main profile may 2535 not be obeyed): 2537 m=video 49170 RTP/AVP 98 2538 a=rtpmap:98 H264/90000 2539 a=fmtp:98 profile-level-id=42A01E; 2540 packetization-mode=1; 2541 sprop-parameter-sets= 2543 8.2.2. 
Usage with the SDP Offer/Answer Model 2545 When H.264 is offered over RTP using SDP in an Offer/Answer model [8] 2546 for negotiation for unicast usage, the following limitations and 2547 rules apply: [Ed. (YkW): Add rule for "sar".] 2548 o The parameters identifying a media format configuration for H.264 2549 are "profile-level-id", "packetization-mode", "sprop-deint-buf- 2550 req", when present, and "rfc3984-compatible", when present. These 2551 media format configuration parameters (except for the level part 2552 of "profile-level-id") MUST be used symmetrically; i.e., the 2553 answerer MUST either maintain all configuration parameters or 2554 remove the media format (payload type) completely, if one or more 2555 of the parameter values are not supported. Note that the level 2556 part of "profile-level-id" includes level_idc, and, for indication 2557 of level 1b when profile_idc is equal to 66, 77 or 88, bit 4 of 2558 profile-iop. The level part of "profile-level-id" is 2559 downgradable, i.e., the answerer MUST maintain the same or a lower 2560 level or remove the media format (payload type) completely. 2562 Informative note: The requirement for symmetric use applies 2563 only for the above media format configuration parameters 2564 excluding the level part of "profile-level-id", and not for 2565 the other stream properties and capability parameters. 2567 Informative note: In H.264, all the levels except for level 1b 2568 are equal to the value of level_idc divided by 10. Level 1b 2569 is a level higher than level 1.0 but lower than level 1.1, and 2570 is signaled in an ad-hoc manner, because the level was 2571 specified after level 1.0 and level 1.1. For the Baseline, 2572 Main and Extended profiles (with profile_idc equal to 66, 77 2573 and 88, respectively), level 1b is indicated by level_idc 2574 equal to 11 (i.e., the same as level 1.1) and constraint_set3_flag 2575 equal to 1.
For other profiles, level 1b is indicated by 2576 level_idc equal to 9 (but note that level 1b for these 2577 profiles is still higher than level 1, which has level_idc 2578 equal to 10, and lower than level 1.1). In SDP offer/answer, 2579 an answer to an offer may indicate a level value equal to or 2580 lower than the level value indicated in the offer. Due to the 2581 ad-hoc indication of level 1b, offerers and answerers must 2582 check the value of constraint_set3_flag, located in the middle 2583 octet of the parameter "profile-level-id", when profile_idc is 2584 equal to 66, 77 or 88 and level_idc is equal to 11. 2586 To simplify handling and matching of these configurations, the 2587 same RTP payload type number used in the offer SHOULD also be 2588 used in the answer, as specified in [8]. An answer MUST NOT 2589 contain a payload type number used in the offer unless the 2590 configuration ("profile-level-id", "packetization-mode", and, if 2591 present, "sprop-deint-buf-req") is the same as in the offer or 2592 the configuration only differs from that in the offer with a 2593 lower level indicated by "profile-level-id". 2595 Informative note: An offerer, when receiving the answer, has 2596 to compare payload types not declared in the offer based on 2597 media type (i.e., video/H264) and the above media format 2598 configuration parameters with any payload types it has already 2599 declared, in order to determine whether the configuration in 2600 question is new or equivalent to a configuration already 2601 offered. 2603 o The parameters "packetization-mode", and if present, "sprop-deint- 2604 buf-req", "sprop-interleaving-depth", "sprop-max-don-diff", 2605 "sprop-init-buf-time", and "sprop-ssrc", describe the properties 2606 of the RTP packet stream that the offerer or answerer is sending 2607 for this media format configuration.
This differs from the normal usage of the Offer/Answer parameters:
normally such parameters declare the properties of the stream that
the offerer or the answerer is able to receive.  When dealing with
H.264, the offerer assumes that the answerer will be able to receive
media encoded using the configuration being offered.

      Informative note: The above parameters apply for any stream
      sent by the declaring entity with the same configuration;
      i.e., they are dependent on their source.  Rather than being
      bound to the payload type, the values may have to be applied
      to another payload type when being sent, as they apply for the
      configuration.

o  The capability parameters ("max-mbps", "max-smbps", "max-fs",
   "max-cpb", "max-dpb", "max-br", "redundant-pic-cap",
   "max-rcmd-nalu-size", "esar") MAY be used to declare further
   capabilities of the offerer or answerer for receiving.  These
   parameters can only be present when the direction attribute is
   sendrecv or recvonly, and the parameters describe the limitations
   of what the offerer or answerer accepts for receiving streams.

o  As specified above, an offerer has to include the size of the
   deinterleaving buffer, "sprop-deint-buf-req", in the offer for an
   interleaved H.264 stream.  To enable the offerer and answerer to
   inform each other about their capabilities for deinterleaving
   buffering in receiving streams, both parties are RECOMMENDED to
   include "deint-buf-cap".  This information MAY be used when the
   value for "sprop-deint-buf-req" is selected in a second round of
   offer and answer.  For interleaved streams, it is also RECOMMENDED
   to consider offering multiple payload types with different
   buffering requirements when the capabilities of the receiver are
   unknown.

o  The "sprop-parameter-sets" or "sprop-level-parameter-sets"
   parameter, when present, is used for out-of-band transport of
   parameter sets.  However, in this case, parameter sets MAY still
   be additionally transported in-band.  If neither
   "sprop-parameter-sets" nor "sprop-level-parameter-sets" is
   present, then only in-band transport of parameter sets is used.
   When accepting an offered payload type, the answerer MUST be
   prepared to use the parameter sets included in
   "sprop-parameter-sets" or "sprop-level-parameter-sets", when
   present for the offered payload in the offer, for decoding the
   incoming NAL unit stream.  However, when level downgrade is in
   use, i.e., the answerer accepts a level lower than the level
   indicated by the offered "profile-level-id", the answerer SHOULD
   discard the parameter sets with a level higher than the accepted
   level.  The answerer MAY include "sprop-parameter-sets" or
   "sprop-level-parameter-sets" in the answer for an accepted payload
   type.  The offerer MUST be prepared to use the parameter sets
   included in "sprop-parameter-sets" or
   "sprop-level-parameter-sets", when present for the accepted
   payload type in the answer, for decoding the incoming NAL unit
   stream.

   When "sprop-parameter-sets" or "sprop-level-parameter-sets" is
   present and "sprop-ssrc" is present, the receiver of the
   parameters MUST store the parameter sets included in
   "sprop-parameter-sets" or "sprop-level-parameter-sets" and
   associate them with "sprop-ssrc".  These parameter sets MUST NOT
   be used to decode NAL units conveyed in packets with an SSRC not
   equal to the associated "sprop-ssrc".

For streams being delivered over multicast, the following rules
apply: [Ed. (YkW): Add rules for "deint-buf-cap" and "sar".  If
"deint-buf-cap" MUST NOT be used in offer/answer for multicast, say
it.
With the latest change, the rule for "deint-buf-cap" is the same 2673 as for unicast above.] 2675 o The media format configuration is identified by the same 2676 parameters as above for unicast (i.e. "profile-level-id", 2677 "packetization-mode", "sprop-deint-buf-req", when present, and 2678 "rfc3984-compatible", when present). These media format 2679 configuration parameters (including the level part of "profile- 2680 level-id", i.e. the level part of "profile-level-id" is not 2681 downgradable for offer/answer in multicast) MUST be used 2682 symmetrically; i.e., the answerer MUST either maintain all 2683 configuration parameters or remove the media format (payload type) 2684 completely. 2686 To simplify handling and matching of these configurations, the 2687 same RTP payload type number used in the offer SHOULD also be 2688 used in the answer, as specified in [8]. An answer MUST NOT 2689 contain a payload type number used in the offer unless the 2690 configuration ("profile-level-id", "packetization-mode", "sprop- 2691 deint-buf-req", when present, and "rfc3984-compatible") is the 2692 same as in the offer. 2694 o The rules for other parameters are the same as above for unicast. 2696 Below are the complete lists of how the different parameters shall be 2697 interpreted in the different combinations of offer or answer and 2698 direction attribute. 2700 o In offers and answers for which "a=sendrecv" or no direction 2701 attribute is used, the following interpretation of the parameters 2702 MUST be used. 

      Declaring actual configuration and properties for sending and
      receiving streams:

         - profile-level-id
         - packetization-mode
         - rfc3984-compatible

      Declaring actual properties of the stream to be sent:

         - sprop-deint-buf-req
         - sprop-interleaving-depth
         - sprop-max-don-diff
         - sprop-init-buf-time
         - sprop-ssrc

      Declaring receiver capabilities:

         - max-mbps
         - max-smbps
         - max-fs
         - max-cpb
         - max-dpb
         - max-br
         - redundant-pic-cap
         - deint-buf-cap
         - max-rcmd-nalu-size
         - esar

      Declaring how Offer/Answer negotiation shall be performed:

         - parameter-add

      Declaring receiver preferences:

         - sar

      Informative note: The interpretation of the optional parameter
      "sar" depends on the direction attribute.  When "a=sendonly",
      it indicates the range of sample aspect ratios the stream will
      contain.  [Ed. (YkW): This is problematic as it is not common
      to use multiple SARs in one stream.  Better to remove this
      informative note, as the use of each parameter for each
      direction attribute is specified in the section.]  When
      "a=sendrecv" or "a=recvonly", the value of this parameter
      indicates the range of sample aspect ratios that the receiver
      is able to display without geometric (shape) distortion.  The
      receiver shall still display pictures encoded at other sample
      aspect ratios (though perhaps with geometric distortion).

      Out-of-band transporting of parameter sets:

         - sprop-parameter-sets
         - sprop-level-parameter-sets

o  In offers and answers for which "a=recvonly" is used, the
   following interpretation of the parameters MUST be used.

      Declaring actual configuration and properties for receiving
      streams:

         - profile-level-id
         - packetization-mode
         - rfc3984-compatible

      Declaring receiver capabilities:

         - max-mbps
         - max-smbps
         - max-fs
         - max-cpb
         - max-dpb
         - max-br
         - redundant-pic-cap
         - deint-buf-cap
         - max-rcmd-nalu-size
         - esar

      Declaring receiver preferences:

         - sar

      Not usable (when present, they SHOULD be ignored):

         - sprop-deint-buf-req
         - sprop-interleaving-depth
         - sprop-parameter-sets
         - sprop-level-parameter-sets
         - sprop-max-don-diff
         - sprop-init-buf-time
         - sprop-ssrc
         - parameter-add

o  In offers or answers for which "a=sendonly" is used, the following
   interpretation of the parameters MUST be used.

      Declaring actual configuration and properties for sending
      streams:

         - profile-level-id
         - packetization-mode
         - rfc3984-compatible
         - sprop-deint-buf-req
         - sprop-max-don-diff
         - sprop-init-buf-time
         - sprop-interleaving-depth
         - sprop-ssrc

      Out-of-band transporting of parameter sets:

         - sprop-parameter-sets
         - sprop-level-parameter-sets

      Not usable (when present, they SHOULD be ignored):

         - max-mbps
         - max-smbps
         - max-fs
         - max-cpb
         - max-dpb
         - max-br
         - redundant-pic-cap
         - deint-buf-cap
         - max-rcmd-nalu-size
         - parameter-add
         - sar
         - esar

Furthermore, the following considerations are necessary:

o  Parameters used for declaring receiver capabilities are in general
   downgradable; i.e., they express the upper limit for a sender's
   possible behavior.  Thus, a sender MAY choose to set its encoder
   using only values of these parameters that are lower than or
   equal to the declared ones.
2837 "sprop-parameter-sets" MUST NOT be used in a sender's declaration 2838 of its receiving capabilities, as the limits of the values that 2839 are carried inside the parameter sets are implicit with the 2840 profile and level used. 2842 o Parameters declaring a configuration point are not downgradable, 2843 with the exception of the level part of the "profile-level-id" 2844 parameter for unicast usage. This expresses values a receiver 2845 expects to be used and must be used verbatim on the sender side. 2847 o When a sender's capabilities are declared, and non-downgradable 2848 parameters are used in this declaration, then these parameters 2849 express a configuration that is acceptable for the sender to 2850 receive streams. In order to achieve high interoperability 2851 levels, it is often advisable to offer multiple alternative 2852 configurations; e.g., for the packetization mode. It is 2853 impossible to offer multiple configurations in a single payload 2854 type. Thus, when multiple configuration offers are made, each 2855 offer requires its own RTP payload type associated with the offer. 2857 o A receiver SHOULD understand all media type parameters, even if it 2858 only supports a subset of the payload format's functionality. 2859 This ensures that a receiver is capable of understanding when an 2860 offer to receive media can be downgraded to what is supported by 2861 the receiver of the offer. 2863 o An answerer MAY extend the offer with additional media format 2864 configurations. However, to enable their usage, in most cases a 2865 second offer is required from the offerer to provide the stream 2866 properties parameters that the media sender will use. This also 2867 has the effect that the offerer has to be able to receive this 2868 media format configuration, not only to send it. 

o  If an offerer wishes to have non-symmetric capabilities between
   sending and receiving, the offerer should offer different RTP
   sessions; i.e., different media lines declared as "recvonly" and
   "sendonly", respectively.  This may have further implications on
   the system.

8.2.3.  Usage in Declarative Session Descriptions

When H.264 over RTP is offered with SDP in a declarative style, as in
RTSP [26] or SAP [27], the following considerations are necessary.

o  All parameters capable of indicating both an RTP packet stream's
   properties and a receiver's capabilities are used to indicate only
   the properties of an RTP packet stream.  For example, in this
   case, the parameter "profile-level-id" declares only the values
   used by the stream, not the capabilities for receiving streams.
   As a result, the following interpretation of the parameters MUST
   be used:

      Declaring actual configuration or stream properties:

         - profile-level-id
         - packetization-mode
         - rfc3984-compatible
         - sprop-parameter-sets
         - sprop-level-parameter-sets
         - sprop-interleaving-depth
         - sprop-deint-buf-req
         - sprop-max-don-diff
         - sprop-init-buf-time
         - sprop-ssrc

      Not usable (when present, they SHOULD be ignored):

         - max-mbps
         - max-smbps
         - max-fs
         - max-cpb
         - max-dpb
         - max-br
         - redundant-pic-cap
         - max-rcmd-nalu-size
         - parameter-add
         - deint-buf-cap
         - sar
         - esar

o  A receiver of the SDP is required to support all parameters and
   values of the parameters provided; otherwise, the receiver MUST
   reject (RTSP) or not participate in (SAP) the session.  It falls
   on the creator of the session to use values that are expected to
   be supported by the receiving application.

8.3.  Examples

An SDP Offer/Answer exchange wherein both parties are expected to
both send and receive could look like the following.
Only the media 2927 codec specific parts of the SDP are shown. Some lines are wrapped 2928 due to text constraints. 2930 Offerer -> Answerer SDP message: 2932 m=video 49170 RTP/AVP 100 99 98 2933 a=rtpmap:98 H264/90000 2934 a=fmtp:98 profile-level-id=42A01E; packetization-mode=0; 2935 sprop-parameter-sets= 2936 a=rtpmap:99 H264/90000 2937 a=fmtp:99 profile-level-id=42A01E; packetization-mode=1; 2938 sprop-parameter-sets= 2939 a=rtpmap:100 H264/90000 2940 a=fmtp:100 profile-level-id=42A01E; packetization-mode=2; 2941 sprop-parameter-sets=; 2942 sprop-interleaving-depth=45; sprop-deint-buf-req=64000; 2943 sprop-init-buf-time=102478; deint-buf-cap=128000 2945 The above offer presents the same codec configuration in three 2946 different packetization formats. PT 98 represents single NALU mode, 2947 PT 99 represents non-interleaved mode, and PT 100 indicates the 2948 interleaved mode. In the interleaved mode case, the interleaving 2949 parameters that the offerer would use if the answer indicates support 2950 for PT 100 are also included. In all three cases the parameter 2951 "sprop-parameter-sets" conveys the initial parameter sets that are 2952 required by the answerer when receiving a stream from the offerer 2953 when this configuration (profile-level-id and packetization mode) is 2954 accepted. Note that the value for "sprop-parameter-sets" could be 2955 different for each payload type. 
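
As a rough illustration of how an endpoint might read an offer such
as the one above, the following Python sketch splits the parameter
part of an "a=fmtp" line, names the packetization mode, and unpacks
"profile-level-id" into profile_idc, profile-iop, and level_idc.  It
also applies a simplified form of the unicast rule of Section 8.2.2
(same profile part, same or lower level).  All function names are
hypothetical and are not defined by this memo; the rank 10.5 used for
level 1b is only an assumption that places it between level 1.0 and
level 1.1 for ordering purposes.

```python
# Hypothetical helpers; names are illustrative and not part of this memo.

PACKETIZATION_MODES = {
    0: "single NAL unit mode",
    1: "non-interleaved mode",
    2: "interleaved mode",
}


def parse_fmtp(value):
    """Split the parameter part of an a=fmtp line into a dict."""
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, val = item.partition("=")
        params[name.strip()] = val.strip()
    return params


def split_profile_level_id(hex_value):
    """Return (profile_idc, profile_iop, level_idc) from 6 hex digits."""
    raw = bytes.fromhex(hex_value)
    if len(raw) != 3:
        raise ValueError("profile-level-id must be exactly 3 octets")
    return raw[0], raw[1], raw[2]


def level_rank(profile_idc, profile_iop, level_idc):
    """Map a level to a sortable number, with level 1b between 1.0 and 1.1."""
    constraint_set3_flag = (profile_iop >> 4) & 1  # bit 4 of profile-iop
    if profile_idc in (66, 77, 88):
        if level_idc == 11 and constraint_set3_flag:
            return 10.5  # level 1b (assumed rank, used for ordering only)
    elif level_idc == 9:
        return 10.5  # level 1b for the other profiles
    return float(level_idc)


def answer_level_ok(offer_plid, answer_plid):
    """Simplified unicast check: same profile part, same or lower level."""
    o_idc, o_iop, o_lvl = split_profile_level_id(offer_plid)
    a_idc, a_iop, a_lvl = split_profile_level_id(answer_plid)
    # For profiles 66, 77 and 88, bit 4 of profile-iop belongs to the level.
    mask = 0xEF if o_idc in (66, 77, 88) else 0xFF
    same_profile = o_idc == a_idc and (o_iop & mask) == (a_iop & mask)
    return same_profile and (
        level_rank(a_idc, a_iop, a_lvl) <= level_rank(o_idc, o_iop, o_lvl)
    )


offer = parse_fmtp("profile-level-id=42A01E; packetization-mode=1")
print(PACKETIZATION_MODES[int(offer["packetization-mode"])])
print(answer_level_ok(offer["profile-level-id"], "42A014"))
```

The sketch does not cover the remaining configuration parameters
("packetization-mode", "sprop-deint-buf-req", "rfc3984-compatible"),
which an implementation would also have to compare.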
2957 Answerer -> Offerer SDP message: 2959 m=video 49170 RTP/AVP 100 99 97 2960 a=rtpmap:97 H264/90000 2961 a=fmtp:97 profile-level-id=42A01E; packetization-mode=0; 2962 sprop-parameter-sets= 2963 a=rtpmap:99 H264/90000 2964 a=fmtp:99 profile-level-id=42A01E; packetization-mode=1; 2965 sprop-parameter-sets=; 2966 max-rcmd-nalu-size=3980 2967 a=rtpmap:100 H264/90000 2968 a=fmtp:100 profile-level-id=42A01E; packetization-mode=2; 2969 sprop-parameter-sets=; 2970 sprop-interleaving-depth=60; 2971 sprop-deint-buf-req=86000; sprop-init-buf-time=156320; 2972 deint-buf-cap=128000; max-rcmd-nalu-size=3980 2974 As the Offer/Answer negotiation covers both sending and receiving 2975 streams, an offer indicates the exact parameters for what the offerer 2976 is willing to receive, whereas the answer indicates the same for what 2977 the answerer accepts to receive. In this case the offerer declared 2978 that it is willing to receive payload type 98. The answerer accepts 2979 this by declaring an equivalent payload type 97; i.e., it has 2980 identical values for the two parameters "profile-level-id" and 2981 "packetization-mode" (since "packetization-mode" is equal to 0, 2982 "sprop-deint-buf-req" is not present). As the offered payload type 2983 98 is accepted, the answerer needs to store parameter sets included 2984 in sprop-parameter-sets= in case the offer finally 2985 decides to use this configuration. In the answer, the answerer 2986 includes the parameter sets in sprop-parameter-sets= 2987 that the answerer would use in the stream sent from the answerer if 2988 this configuration is finally used. 2990 The answerer also accepts the reception of the two configurations 2991 that payload types 99 and 100 represent. Again, the answerer needs 2992 to store parameter sets included in sprop-parameter-sets= and sprop-parameter-sets= in case the offer 2994 finally decides to use either of these two configurations. 
The 2995 answerer provides the initial parameter sets for the answerer-to- 2996 offerer direction, i.e. the parameter sets in sprop-parameter- 2997 sets= and sprop-parameter-sets=, for 2998 payload types 99 and 100, respectively, that it will use to send the 2999 payload types. The answerer also provides the offerer with its 3000 memory limit for deinterleaving operations by providing a "deint-buf- 3001 cap" parameter. This is only useful if the offerer decides on making 3002 a second offer, where it can take the new value into account. The 3003 "max-rcmd-nalu-size" indicates that the answerer can efficiently 3004 process NALUs up to the size of 3980 bytes. However, there is no 3005 guarantee that the network supports this size. 3007 Below are some more examples. 3009 In the following example, the original offer is accepted without 3010 downgrading the level part of "profile-level-id", and "sprop- 3011 parameter-sets" is present (i.e. there is out-of-band transmission of 3012 parameter sets). This case needs only one round of offer/answer. 3013 Note that sprop-parameter-sets= is basically 3014 independent of sprop-parameter-sets=. The former is 3015 to be used by the encoder of the offerer and the decoder of the 3016 answerer. The latter is to be used by the encoder of the answerer 3017 and the decoder of the offerer. Note that requiring them (including 3018 both sequence parameter sets and picture parameter sets) to be 3019 identical or the former to be a subset of the latter would seriously 3020 limit independent encoding optimizations by the independent encoders. 
3022 Offer SDP: 3024 m=video 49170 RTP/AVP 98 3025 a=rtpmap:98 H264/90000 3026 a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0 3027 packetization-mode=1; 3028 sprop-parameter-sets= 3030 Answer SDP: 3032 m=video 49170 RTP/AVP 98 3033 a=rtpmap:98 H264/90000 3034 a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0 3035 packetization-mode=1; 3036 sprop-parameter-sets= 3038 In the following example, the original offer is also accepted without 3039 downgrading the level part of "profile-level-id", but "sprop- 3040 parameter-sets" is not present, meaning that there is no out-of-band 3041 transmission of parameter sets, which then have to be transmitted in- 3042 band. This case also needs only one round of offer/answer. 3044 Offer SDP: 3046 m=video 49170 RTP/AVP 98 3047 a=rtpmap:98 H264/90000 3048 a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0 3049 packetization-mode=1 3051 Answer SDP: 3053 m=video 49170 RTP/AVP 98 3054 a=rtpmap:98 H264/90000 3055 a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0 3056 packetization-mode=1 3058 In the following example, the original offer is accepted with 3059 downgrading the level part of "profile-level-id", and "sprop- 3060 parameter-sets" is present (i.e. there is out-of-band transmission of 3061 parameter sets). This case needs at least two rounds of offer/answer 3062 unless the offer includes multiple payload types for different 3063 levels, which makes the SDP message larger. The reason for the need 3064 of more than one offer/answer round is because the "sprop-parameter- 3065 sets" in the original offer is not applicable to any level lower than 3066 the one indicated by the level part of "profile-level-id". 3068 Note that sprop-parameter-sets= contains level_idc 3069 indicating Level 3.0, therefore cannot be used as the answerer wants 3070 Level 2.0 and does not need to be stored by the answerer. 
The sprop-parameter-sets= is to be used by the encoder of the
offerer and the decoder of the answerer.  The sprop-parameter-sets=
is to be used by the encoder of the answerer and the decoder of the
offerer.

Offer SDP:

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 H264/90000
   a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
             packetization-mode=1;
             sprop-parameter-sets=

Answer SDP:

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 H264/90000
   a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
             packetization-mode=1;
             sprop-parameter-sets=

Offer SDP:

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 H264/90000
   a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
             packetization-mode=1;
             sprop-parameter-sets=

Answer SDP: //This SDP is identical to previous answer SDP

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 H264/90000
   a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
             packetization-mode=1;
             sprop-parameter-sets=

In the following example, the original offer is also accepted with
downgrading the level part of "profile-level-id", but
"sprop-parameter-sets" is not present, meaning that there is no
out-of-band transmission of parameter sets, which then have to be
transmitted in-band.  This case also needs only one round of
offer/answer.

Offer SDP:

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 H264/90000
   a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
             packetization-mode=1

Answer SDP:

   m=video 49170 RTP/AVP 98
   a=rtpmap:98 H264/90000
   a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
             packetization-mode=1

8.4.  Parameter Set Considerations

The H.264 parameter sets are a fundamental part of the video codec
and vital to its operation; see section 1.2.
Due to their 3132 characteristics and their importance for the decoding process, lost 3133 or erroneously transmitted parameter sets can hardly be concealed 3134 locally at the receiver. A reference to a corrupt parameter set has 3135 normally fatal results to the decoding process. Corruption could 3136 occur, for example, due to the erroneous transmission or loss of a 3137 parameter set data structure, but also due to the untimely 3138 transmission of a parameter set update. Therefore, the following 3139 recommendations are provided as a guideline for the implementer of 3140 the RTP sender. 3142 Parameter set NALUs can be transported using three different 3143 principles: 3145 A. Using a session control protocol (out-of-band) prior to the actual 3146 RTP session. 3148 B. Using a session control protocol (out-of-band) during an ongoing 3149 RTP session. 3151 C. Within the RTP packet stream in the payload (in-band) during an 3152 ongoing RTP session. 3154 It is recommended to implement principles A and B within a session 3155 control protocol. SIP and SDP can be used as described in the SDP 3156 Offer/Answer model and in the previous sections of this memo. This 3157 section contains guidelines on how principles A and B should be 3158 implemented within session control protocols. It is independent of 3159 the particular protocol used. Principle C is supported by the RTP 3160 payload format defined in this specification. There are topologies 3161 like Topo-Video-switch-MCU [28] that require the use of principle C. 3163 If in-band signaling of parameter sets is used, the picture and 3164 sequence parameter set NALUs SHOULD be transmitted in the RTP payload 3165 using a reliable method of delivering of RTP, as a loss of a 3166 parameter set of either type will likely prevent decoding of a 3167 considerable portion of the corresponding RTP packet stream. 
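
For principle A with SDP, the parameter sets arrive in the
"sprop-parameter-sets" parameter.  Assuming that its value is a
comma-separated list of base64-encoded NAL units, the following
Python sketch decodes such a value and sorts the NAL units into
sequence and picture parameter sets by nal_unit_type (7 and 8,
respectively, in H.264).  The function names are hypothetical, and
the sample value is used for illustration only.

```python
import base64

NAL_TYPE_SPS = 7  # sequence parameter set
NAL_TYPE_PPS = 8  # picture parameter set


def decode_sprop_parameter_sets(value):
    """Decode a comma-separated base64 list into (SPS list, PPS list)."""
    sps, pps = [], []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        nalu = base64.b64decode(token, validate=True)
        if not nalu:
            continue
        nal_unit_type = nalu[0] & 0x1F  # low 5 bits of the NAL unit header
        if nal_unit_type == NAL_TYPE_SPS:
            sps.append(nalu)
        elif nal_unit_type == NAL_TYPE_PPS:
            pps.append(nalu)
        else:
            raise ValueError(f"unexpected nal_unit_type {nal_unit_type}")
    return sps, pps


def sps_level_idc(sps_nalu):
    """Return level_idc, the fourth octet of an SPS NAL unit.

    The octets are: NAL unit header, profile_idc, constraint flags,
    level_idc.
    """
    return sps_nalu[3]


sps_list, pps_list = decode_sprop_parameter_sets("Z0IACpZTBYmI,aMljiA==")
# Keep only sequence parameter sets at or below an accepted level_idc
# of 20.  This plain comparison ignores the special signaling of
# level 1b.
usable_sps = [s for s in sps_list if sps_level_idc(s) <= 20]
```

A receiver could use such a filter when a level downgrade is in use
and parameter sets above the accepted level are to be discarded.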
3169 If in-band signaling of parameter sets is used, the sender SHOULD 3170 take the error characteristics into account and use mechanisms to 3171 provide a high probability for delivering the parameter sets 3172 correctly. Mechanisms that increase the probability for a correct 3173 reception include packet repetition, FEC, and retransmission. The 3174 use of an unreliable, out-of-band control protocol has similar 3175 disadvantages as the in-band signaling (possible loss) and, in 3176 addition, may also lead to difficulties in the synchronization (see 3177 below). Therefore, it is NOT RECOMMENDED. 3179 Parameter sets MAY be added or updated during the lifetime of a 3180 session using principles B and C. It is required that parameter sets 3181 are present at the decoder prior to the NAL units that refer to them. 3182 Updating or adding of parameter sets can result in further problems, 3183 and therefore the following recommendations should be considered. 3185 - When parameter sets are added or updated, care SHOULD be taken to 3186 ensure that any parameter set is delivered prior to its usage. It 3187 is common that no synchronization is present between out-of-band 3188 signaling and in-band traffic. If out-of-band signaling is used, 3189 it is RECOMMENDED that a sender does not start sending NALUs 3190 requiring the updated parameter sets prior to acknowledgement of 3191 delivery from the signaling protocol. 3193 - When parameter sets are updated, the following synchronization 3194 issue should be taken into account. When overwriting a parameter 3195 set at the receiver, the sender has to ensure that the parameter 3196 set in question is not needed by any NALU present in the network 3197 or receiver buffers. Otherwise, decoding with a wrong parameter 3198 set may occur. 
To lessen this problem, it is RECOMMENDED either 3199 to overwrite only those parameter sets that have not been used for 3200 a sufficiently long time (to ensure that all related NALUs have 3201 been consumed), or to add a new parameter set instead (which may 3202 have negative consequences for the efficiency of the video 3203 coding). 3205 Informative note: In some topologies like Topo-Video-switch- 3206 MCU [28] the origin of the whole set of parameter sets may 3207 come from multiple sources that may use non-unique parameter 3208 sets numbers. In this case an offer may overwrite an existing 3209 parameter set if no other mechanism that enables uniqueness of 3210 the parameter sets in the out-of-band channel exists. [Ed. 3211 (YkW): Would there be a better place for the text in this 3212 informative note?] 3214 - When new parameter sets are added, previously unused parameter set 3215 identifiers are used. This avoids the problem identified in the 3216 previous paragraph. However, in a multiparty session, unless a 3217 synchronized control protocol is used, there is a risk that 3218 multiple entities try to add different parameter sets for the same 3219 identifier, which has to be avoided. 3221 - Adding or modifying parameter sets by using both principles B and 3222 C in the same RTP session may lead to inconsistencies of the 3223 parameter sets because of the lack of synchronization between the 3224 control and the RTP channel. Therefore, principles B and C MUST 3225 NOT both be used in the same session unless sufficient 3226 synchronization can be provided. 3228 In some scenarios (e.g., when only the subset of this payload format 3229 specification corresponding to H.241 is used) or topologies, it is 3230 not possible to employ out-of-band parameter set transmission. In 3231 this case, parameter sets have to be transmitted in-band. 
Here, the synchronization with the non-parameter-set data in the
bitstream is implicit, but the possibility of a loss has to be taken
into account.  The loss probability should be reduced using the
mechanisms discussed above.  In case a loss of a parameter set is
detected, recovery may be achieved by using a Decoder Refresh Point
procedure, for example, using RTCP feedback Full Intra Request (FIR)
[29].  Two example Decoder Refresh Point procedures are provided in
the informative Section 8.5.

-  When parameter sets are initially provided using principle A and
   then later added or updated in-band (principle C), there is a risk
   associated with updating the parameter sets delivered out-of-band.
   If receivers miss some in-band updates (for example, because of a
   loss or a late tune-in), those receivers attempt to decode the
   bitstream using outdated parameters.  It is therefore RECOMMENDED
   that parameter set IDs be partitioned between the out-of-band and
   in-band parameter sets.

8.5.  Decoder Refresh Point Procedure using In-Band Transport of
      Parameter Sets (Informative)

When a video encoder according to ITU-T Rec. H.264 receives a request
for a decoder refresh point, the encoder shall enter the fast update
mode by using one of the procedures specified in Section 8.5.1 or
8.5.2 below.  The procedure in 8.5.1 is the preferred response in a
lossless transmission environment.  Both procedures satisfy the
requirement to enter the fast update mode for H.264 video encoding.

8.5.1.  IDR Procedure to Respond to a Request for a Decoder Refresh
        Point

This section gives one possible way to respond to a request for a
decoder refresh point.

The encoder shall, in the order presented here:

1) Immediately prepare to send an IDR picture.

2) Send a sequence parameter set to be used by the IDR picture to be
   sent.
   The encoder may optionally also send other sequence parameter
   sets.

3) Send a picture parameter set to be used by the IDR picture to be
   sent.  The encoder may optionally also send other picture
   parameter sets.

4) Send the IDR picture.

5) From this point forward in time, send or re-send any other
   sequence or picture parameter sets, not sent in this procedure,
   prior to their reference by any NAL unit, regardless of whether
   such parameter sets were previously sent prior to receiving the
   request for a decoder refresh point.  As needed, such parameter
   sets may be sent in a batch, one at a time, or in any combination
   of these two methods.  Parameter sets may be re-sent at any time
   for redundancy.  Caution should be taken when parameter set
   updates are present, as described above in Section 8.4.

8.5.2.  Gradual Recovery Procedure to Respond to a Request for a
        Decoder Refresh Point

This section gives another possible way to respond to a request for a
decoder refresh point.

The encoder shall, in the order presented here:

1) Send a recovery point SEI message (see Sections D.1.7 and D.2.7 of
   [1]).

2) Repeat any sequence and picture parameter sets that were sent
   before the recovery point SEI message, prior to their reference by
   a NAL unit.

The encoder shall ensure that the decoder has access to all reference
pictures for inter prediction of pictures at or after the recovery
point, which is indicated by the recovery point SEI message, in
output order, assuming that the transmission from now on is
error-free.  For example, the encoder may mark all reference pictures
as "unused for reference" by issuing a
memory_management_control_operation equal to 5 (see Section 8.2.5 of
[1]).
3313 The value of the recovery_frame_cnt syntax element in the recovery 3314 point SEI message shall be such that the time between the reception 3315 of the request for a decoder refresh point and completing the 3316 transmission of the access unit including the recovery point is less 3317 than or equal to 3 seconds. 3319 As needed, such parameter sets may be re-sent in a batch, one at a 3320 time, or in any combination of these two methods. Parameter sets may 3321 be re-sent at any time for redundancy. Caution should be taken when 3322 parameter set updates are present, as described above in Section 8.4. 3324 9. Security Considerations 3326 RTP packets using the payload format defined in this specification 3327 are subject to the security considerations discussed in the RTP 3328 specification [5], and in any appropriate RTP profile (for example, 3329 [15]). This implies that confidentiality of the media streams is 3330 achieved by encryption; for example, through the application of SRTP 3331 [25]. Because the data compression used with this payload format is 3332 applied end-to-end, any encryption needs to be performed after 3333 compression. A potential denial-of-service threat exists for data 3334 encodings using compression techniques that have non-uniform 3335 receiver-end computational load. The attacker can inject 3336 pathological datagrams into the stream that are complex to decode and 3337 that cause the receiver to be overloaded. H.264 is particularly 3338 vulnerable to such attacks, as it is extremely simple to generate 3339 datagrams containing NAL units that affect the decoding process of 3340 many future NAL units. Therefore, the usage of data origin 3341 authentication and data integrity protection of at least the RTP 3342 packet is RECOMMENDED; for example, with SRTP [25]. 

   Note that the appropriate mechanism to ensure confidentiality and
   integrity of RTP packets and their payloads is very dependent on the
   application and on the transport and signaling protocols employed.
   Thus, although SRTP is given as an example above, other possible
   choices exist.

   Decoders MUST exercise caution with respect to the handling of user
   data SEI messages, particularly if they contain active elements, and
   MUST restrict their domain of applicability to the presentation
   containing the stream.

   End-to-end security with authentication, integrity, or
   confidentiality protection will prevent a MANE from performing
   media-aware operations other than discarding complete packets. In
   the case of confidentiality protection, the MANE is also prevented
   from discarding packets in a media-aware way. For a MANE to perform
   its operations, it must be a trusted entity that is included in the
   security context establishment.

10. Congestion Control

   Congestion control for RTP SHALL be used in accordance with RFC 3550
   [5], and with any applicable RTP profile; e.g., RFC 3551 [15]. An
   additional requirement if best-effort service is being used is: users
   of this payload format MUST monitor packet loss to ensure that the
   packet loss rate is within acceptable parameters. Packet loss is
   considered acceptable if a TCP flow across the same network path, and
   experiencing the same network conditions, would achieve an average
   throughput, measured on a reasonable timescale, that is not less than
   the RTP flow is achieving. This condition can be satisfied by
   implementing congestion control mechanisms to adapt the transmission
   rate (or the number of layers subscribed for a layered multicast
   session), or by arranging for a receiver to leave the session if the
   loss rate is unacceptably high.
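
   As an informal illustration only, the acceptability condition above
   can be approximated by comparing the RTP sending rate with a rough
   estimate of the throughput a TCP flow would obtain. The sketch below
   assumes the simplified square-root TCP throughput model, rate =
   (MSS / RTT) * sqrt(3 / (2p)). That model, the function names, and
   the example numbers are assumptions introduced for this sketch and
   are not defined by this specification.

```python
import math


def tcp_throughput_estimate(mss_bytes, rtt_s, loss_rate):
    """Rough TCP throughput in bits per second.

    Uses the simplified square-root model (an assumption of this sketch):
    rate = (MSS / RTT) * sqrt(3 / (2 * p)).
    """
    if loss_rate <= 0:
        return math.inf  # no observed loss: the model gives no upper bound
    return 8 * mss_bytes / rtt_s * math.sqrt(1.5 / loss_rate)


def loss_is_acceptable(rtp_rate_bps, mss_bytes, rtt_s, loss_rate):
    """True if the estimated TCP throughput is not less than the RTP rate."""
    return tcp_throughput_estimate(mss_bytes, rtt_s, loss_rate) >= rtp_rate_bps


if __name__ == "__main__":
    # Hypothetical path: 1460-byte segments, 100 ms RTT, 1% packet loss.
    for rate in (1_000_000, 2_000_000):
        print(rate, loss_is_acceptable(rate, 1460, 0.1, 0.01))
```

   A sender that finds the condition violated would reduce its
   transmission rate, and a receiver might leave the session, as
   described above.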

   The bit rate adaptation necessary for obeying the congestion control
   principle is easily achievable when real-time encoding is used.
   However, when pre-encoded content is being transmitted, bandwidth
   adaptation requires the availability of more than one coded
   representation of the same content, at different bit rates, or the
   existence of non-reference pictures or sub-sequences [21] in the
   bitstream. The switching between the different representations can
   normally be performed in the same RTP session; e.g., by employing a
   concept known as SI/SP slices of the Extended Profile, or by
   switching streams at IDR picture boundaries. Only when
   non-downgradable parameters (such as the profile part of the
   profile/level ID) are required to be changed does it become necessary
   to terminate and re-start the media stream. This may be accomplished
   by using a different RTP payload type.

   MANEs MAY follow the suggestions outlined in section 7.3 and remove
   certain unusable packets from the packet stream when that stream was
   damaged due to previous packet losses. This can help reduce the
   network load in certain special cases.

11. IANA Considerations

   IANA has registered one new media type; see section 8.1.

12. Informative Appendix: Application Examples

   This payload specification is very flexible in its use, in order to
   cover the extremely wide application space anticipated for H.264.
   However, this great flexibility also makes it difficult for an
   implementer to decide on a reasonable packetization scheme. Some
   information on how to apply this specification to real-world
   scenarios is likely to appear in the form of academic publications
   and a test model software and description in the near future.
   However, some preliminary usage scenarios are described here as well.

12.1. Video Telephony according to ITU-T Recommendation H.241 Annex A

   H.323-based video telephony systems that use H.264 as an optional
   video compression scheme are required to support H.241 Annex A [3] as
   a packetization scheme. The packetization mechanism defined in this
   Annex is technically identical to a small subset of this
   specification.

   When a system operates according to H.241 Annex A, parameter set NAL
   units are sent in-band. Only Single NAL unit packets are used. Many
   such systems do not send IDR pictures regularly, but only when
   required by user interaction or by control protocol means; e.g., when
   switching between video channels in a Multipoint Control Unit or for
   error recovery requested by feedback.

12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit
      Aggregation

   The RTP part of this scheme is implemented and tested (though not the
   control-protocol part; see below).

   In most real-world video telephony applications, picture parameters
   such as picture size or optional modes never change during the
   lifetime of a connection. Therefore, all necessary parameter sets
   (usually only one) are sent as a side effect of the capability
   exchange/announcement process, e.g., according to the SDP syntax
   specified in section 8.2 of this document. As all necessary
   parameter set information is established before the RTP session
   starts, there is no need for sending any parameter set NAL units.
   Slice data partitioning is not used, either. Thus, the RTP packet
   stream basically consists of NAL units that carry single coded
   slices.

   The encoder chooses the size of coded slice NAL units so that they
   offer the best performance. Often, this is done by adapting the
   coded slice size to the MTU size of the IP network. For small
   picture sizes, this may result in a one-picture-per-one-packet
   strategy.
   Intra refresh algorithms clean up the loss of packets and
   the resulting drift-related artifacts.

12.3. Video Telephony, Interleaved Packetization Using NAL Unit
      Aggregation

   This scheme allows better error concealment and is used in H.263
   based designs using RFC 2429 packetization [10]. It has been
   implemented, and good results were reported [12].

   The VCL encoder codes the source picture so that all macroblocks
   (MBs) of one MB line are assigned to one slice. All slices with even
   MB row addresses are combined into one STAP, and all slices with odd
   MB row addresses into another. Those STAPs are transmitted as RTP
   packets. The establishment of the parameter sets is performed as
   discussed above.

   Note that the use of STAPs is essential here, as the high number of
   individual slices (18 for a CIF picture) would lead to unacceptably
   high IP/UDP/RTP header overhead (unless the source coding tool FMO is
   used, which is not assumed in this scenario). Furthermore, some
   wireless video transmission systems, such as H.324M and the IP-based
   video telephony specified in 3GPP, are likely to use relatively small
   transport packet size. For example, a typical MTU size of H.223 AL3
   SDU is around 100 bytes [16]. Coding individual slices according to
   this packetization scheme provides further advantage in communication
   between wired and wireless networks, as individual slices are likely
   to be smaller than the preferred maximum packet size of wireless
   systems. Consequently, a gateway can convert the STAPs used in a
   wired network into several RTP packets with only one NAL unit, which
   are preferred in a wireless network, and vice versa.

12.4. Video Telephony with Data Partitioning

   This scheme has been implemented and has been shown to offer good
   performance, especially at higher packet loss rates [12].

   Data Partitioning is known to be useful only when some form of
   unequal error protection is available. Normally, in single-session
   RTP environments, even error characteristics are assumed; i.e., the
   packet loss probability of all packets of the session is the same
   statistically. However, there are means to reduce the packet loss
   probability of individual packets in an RTP session. A FEC packet
   according to RFC 2733 [17], for example, specifies which media
   packets are associated with the FEC packet.

   In all cases, the incurred overhead is substantial but is in the same
   order of magnitude as the number of bits that have otherwise been
   spent for intra information. However, this mechanism does not add
   any delay to the system.

   Again, the complete parameter set establishment is performed through
   control protocol means.

12.5. Video Telephony or Streaming with FUs and Forward Error Correction

   This scheme has been implemented and has been shown to provide good
   performance, especially at higher packet loss rates [18].

   The most efficient means to combat packet losses for scenarios where
   retransmissions are not applicable is forward error correction (FEC).
   Although application layer, end-to-end use of FEC is often less
   efficient than an FEC-based protection of individual links
   (especially when links of different characteristics are in the
   transmission path), application layer, end-to-end FEC is unavoidable
   in some scenarios. RFC 2733 [17] provides means to use generic,
   application layer, end-to-end FEC in packet-loss environments. A
   binary forward error correcting code is generated by applying the XOR
   operation to the bits at the same bit position in different packets.
   The binary code can be specified by the parameters (n,k) in which k
   is the number of information packets used in the connection and n is
   the total number of packets generated for k information packets;
   i.e., n-k parity packets are generated for k information packets.
   [Ed. (YkW): from Randell: References to RFC 2733 should be updated to
   (and checked against) RFC 5109. There are a lot of calculations and
   the like that should be checked. Also update [17] to RFC 5109.]

   When a code is used with parameters (n,k) within the RFC 2733
   framework, the following properties are well known:

   a) If applied over one RTP packet, RFC 2733 provides only packet
      repetition.

   b) RFC 2733 is most bit rate efficient if XOR-connected packets have
      equal length.

   c) At the same packet loss probability p and for a fixed k, the
      greater the value of n is, the smaller the residual error
      probability becomes. For example, for a packet loss probability
      of 10%, k=1, and n=2, the residual error probability is about 1%,
      whereas for n=3, the residual error probability is about 0.1%.

   d) At the same packet loss probability p and for a fixed code rate
      k/n, the greater the value of n is, the smaller the residual error
      probability becomes. For example, at a packet loss probability of
      p=10%, k=1 and n=2, the residual error rate is about 1%, whereas
      for an extended Golay code with k=12 and n=24, the residual error
      rate is about 0.01%.

   For applying RFC 2733 in combination with H.264 baseline coded video
   without using FUs, several options might be considered:

   1) The video encoder produces NAL units for which each video frame is
      coded in a single slice. Applying FEC, one could use a simple
      code; e.g., (n=2, k=1). That is, each NAL unit would basically
      just be repeated.
      The disadvantage is obviously the bad code
      performance according to d), above, and the low flexibility, as
      only (n, k=1) codes can be used.

   2) The video encoder produces NAL units for which each video frame is
      encoded in one or more consecutive slices. Applying FEC, one
      could use a better code, e.g., (n=24, k=12), over a sequence of
      NAL units. Depending on the number of RTP packets per frame, a
      loss may introduce a significant delay, which is reduced when more
      RTP packets are used per frame. Packets of completely different
      length might also be connected, which decreases bit rate
      efficiency according to b), above. However, with some care and
      for slices of 1kb or larger, similar length (100-200 bytes
      difference) may be produced, which will not lower the bit
      efficiency catastrophically.

   3) The video encoder produces NAL units, for which a certain frame
      contains k slices of possibly almost equal length. Then, applying
      FEC, a better code, e.g., (n=24, k=12), can be used over the
      sequence of NAL units for each frame. The delay compared to that
      of 2), above, may be reduced, but several disadvantages are
      obvious. First, the coding efficiency of the encoded video is
      lowered significantly, as slice-structured coding reduces
      intra-frame prediction and additional slice overhead is necessary.
      Second, pre-encoded content or, when operating over a gateway, the
      video is usually not appropriately coded with k slices such that
      FEC can be applied. Finally, the encoding of video producing k
      slices of equal length is not straightforward and might require
      more than one encoding pass.

   Many of the mentioned disadvantages can be avoided by applying FUs in
   combination with FEC.
   Each NAL unit can be split into any number of
   FUs of basically equal length; therefore, FEC with a reasonable k and
   n can be applied, even if the encoder made no effort to produce
   slices of equal length. For example, a coded slice NAL unit
   containing an entire frame can be split into k FUs, and a parity
   check code (n=k+1, k) can be applied. However, this has the
   disadvantage that unless all created fragments can be recovered, the
   whole slice will be lost. Thus, a larger section is lost than would
   be if the frame had been split into several slices.

   The presented technique makes it possible to achieve good
   transmission error tolerance, even if no additional source coding
   layer redundancy (such as periodic intra frames) is present.
   Consequently, the same coded video sequence can be used to achieve
   the maximum compression efficiency and quality over error-free
   transmission and for transmission over error-prone networks.
   Furthermore, the technique allows the application of FEC to
   pre-encoded sequences without adding delay. In this case,
   pre-encoded sequences that are not encoded for error-prone networks
   can still be transmitted almost reliably without adding extensive
   delays. In addition, FUs of equal length result in a bit rate
   efficient use of RFC 2733.

   If the error probability depends on the length of the transmitted
   packet (e.g., in case of mobile transmission [14]), the benefits of
   applying FUs with FEC are even more obvious. Basically, the
   flexibility of the size of FUs allows appropriate FEC to be applied
   for each NAL unit and unequal error protection of NAL units.

   When FUs and FEC are used, the incurred overhead is substantial but
   is in the same order of magnitude as the number of bits that have to
   be spent for intra-coded macroblocks if no FEC is applied.
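
   The FU-plus-parity idea described above (splitting one NAL unit into
   k fragments of equal length and protecting them with an (n=k+1, k)
   parity check code) can be sketched as follows. This is a simplified
   illustration, not the RFC 2733 packet format: it operates on raw
   payload bytes only, ignores RTP and FU headers, and pads the last
   fragment with zero bytes so that all fragments have equal length.
   The function names are hypothetical.

```python
from typing import List, Optional


def split_into_fragments(nal_payload: bytes, k: int) -> List[bytes]:
    """Split a payload into k equal-length fragments (zero-padded at the end)."""
    size = -(-len(nal_payload) // k)  # ceiling division
    padded = nal_payload.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_parity(fragments: List[bytes]) -> bytes:
    """XOR the bytes at the same position in all fragments."""
    parity = bytearray(len(fragments[0]))
    for fragment in fragments:
        for i, value in enumerate(fragment):
            parity[i] ^= value
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost fragment (marked None) from the parity."""
    missing = [i for i, fragment in enumerate(received) if fragment is None]
    if len(missing) > 1:
        # A single parity packet cannot repair two losses: the slice is lost.
        raise ValueError("more than one fragment lost")
    repaired = list(received)
    if missing:
        present = [f for f in received if f is not None]
        repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


if __name__ == "__main__":
    payload = bytes(range(10))
    fragments = split_into_fragments(payload, 4)
    parity = xor_parity(fragments)
    damaged = [fragments[0], fragments[1], None, fragments[3]]
    restored = recover_missing(damaged, parity)
    print(b"".join(restored)[:len(payload)] == payload)
```

   The ValueError branch mirrors the disadvantage noted above: if the
   code cannot recover every fragment, the whole slice is lost.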
   In [18], it was shown that the FEC-based approach improved overall
   quality at the same error rate and the same overall bit rate,
   including the overhead.

12.6. Low Bit-Rate Streaming

   This scheme has been implemented with H.263 and non-standard RTP
   packetization and has given good results [19]. There is no technical
   reason why similarly good results could not be achievable with H.264.

   In today's Internet streaming, some of the offered bit rates are
   relatively low in order to allow terminals with dial-up modems to
   access the content. In wired IP networks, relatively large packets,
   say 500 - 1500 bytes, are preferred to smaller and more frequently
   occurring packets in order to reduce network congestion. Moreover,
   use of large packets decreases the amount of RTP/UDP/IP header
   overhead. For low bit-rate video, the use of large packets means
   that sometimes up to a few pictures should be encapsulated in one
   packet.

   However, loss of a packet including many coded pictures would have
   drastic consequences for visual quality, as there is practically no
   other way to conceal a loss of an entire picture than to repeat the
   previous one. One way to construct relatively large packets and
   maintain possibilities for successful loss concealment is to
   construct MTAPs that contain interleaved slices from several
   pictures. An MTAP should not contain spatially adjacent slices from
   the same picture or spatially overlapping slices from any picture.
   If a packet is lost, it is likely that a lost slice is surrounded by
   spatially adjacent slices of the same picture and spatially
   corresponding slices of the temporally previous and succeeding
   pictures. Consequently, concealment of the lost slice is likely to
   be relatively successful.

12.7. Robust Packet Scheduling in Video Streaming

   Robust packet scheduling has been implemented with MPEG-4 Part 2 and
   simulated in a wireless streaming environment [20]. There is no
   technical reason why similar or better results could not be
   achievable with H.264.

   Streaming clients typically have a receiver buffer that is capable of
   storing a relatively large amount of data. Initially, when a
   streaming session is established, a client does not start playing the
   stream back immediately. Rather, it typically buffers the incoming
   data for a few seconds. This buffering helps maintain continuous
   playback, as, in case of occasional increased transmission delays or
   network throughput drops, the client can decode and play buffered
   data. Otherwise, without initial buffering, the client has to freeze
   the display, stop decoding, and wait for incoming data. The
   buffering is also necessary for either automatic or selective
   retransmission at any protocol level. If any part of a picture is
   lost, a retransmission mechanism may be used to resend the lost data.
   If the retransmitted data is received before its scheduled decoding
   or playback time, the loss is recovered perfectly. Coded pictures
   can be ranked according to their importance in the subjective quality
   of the decoded sequence. For example, non-reference pictures, such
   as conventional B pictures, are subjectively least important, as
   their absence does not affect decoding of any other pictures. In
   addition to non-reference pictures, the ITU-T H.264 | ISO/IEC
   14496-10 standard includes a temporal scalability method called
   sub-sequences [21]. Subjective ranking can also be made on coded
   slice data partition or slice group basis.
   Coded slices and coded slice
   data partitions that are subjectively the most important can be sent
   earlier than their decoding order indicates, whereas coded slices and
   coded slice data partitions that are subjectively the least important
   can be sent later than their natural coding order indicates.
   Consequently, any retransmitted parts of the most important slices
   and coded slice data partitions are more likely to be received before
   their scheduled decoding or playback time compared to the least
   important slices and slice data partitions.

13. Informative Appendix: Rationale for Decoding Order Number

13.1. Introduction

   The Decoding Order Number (DON) concept was introduced mainly to
   enable efficient multi-picture slice interleaving (see section 12.6)
   and robust packet scheduling (see section 12.7). In both of these
   applications, NAL units are transmitted out of decoding order. DON
   indicates the decoding order of NAL units and should be used in the
   receiver to recover the decoding order. Example use cases for
   efficient multi-picture slice interleaving and for robust packet
   scheduling are given in sections 13.2 and 13.3, respectively.
   Section 13.4 describes the benefits of the DON concept in error
   resiliency achieved by redundant coded pictures. Section 13.5
   summarizes considered alternatives to DON and justifies why DON was
   chosen for this RTP payload specification.

13.2. Example of Multi-Picture Slice Interleaving

   An example of multi-picture slice interleaving follows. A subset of
   a coded video sequence is depicted below in output order. R denotes
   a reference picture, N denotes a non-reference picture, and the
   number indicates a relative output time.

      ... R1 N2 R3 N4 R5 ...

   The decoding order of these pictures from left to right is as
   follows:

      ... R1 R3 N2 R5 N4 ...

   The NAL units of pictures R1, R3, N2, R5, and N4 are marked with a
   DON equal to 1, 2, 3, 4, and 5, respectively.

   Each reference picture consists of three slice groups that are
   scattered as follows (a number denotes the slice group number for
   each macroblock in a QCIF frame):

      0 1 2 0 1 2 0 1 2 0 1
      2 0 1 2 0 1 2 0 1 2 0
      1 2 0 1 2 0 1 2 0 1 2
      0 1 2 0 1 2 0 1 2 0 1
      2 0 1 2 0 1 2 0 1 2 0
      1 2 0 1 2 0 1 2 0 1 2
      0 1 2 0 1 2 0 1 2 0 1
      2 0 1 2 0 1 2 0 1 2 0
      1 2 0 1 2 0 1 2 0 1 2

   For the sake of simplicity, we assume that all the macroblocks of a
   slice group are included in one slice. Three MTAPs are constructed
   from three consecutive reference pictures so that each MTAP contains
   three aggregation units, each of which contains all the macroblocks
   from one slice group. The first MTAP contains slice group 0 of
   picture R1, slice group 1 of picture R3, and slice group 2 of picture
   R5. The second MTAP contains slice group 1 of picture R1, slice
   group 2 of picture R3, and slice group 0 of picture R5. The third
   MTAP contains slice group 2 of picture R1, slice group 0 of picture
   R3, and slice group 1 of picture R5. Each non-reference picture is
   encapsulated into an STAP-B.
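
   The slice group map and the MTAP composition described above follow
   a simple modular pattern, which the sketch below reproduces. It is
   an illustration only: the function names and the (picture, slice
   group, DON) tuples are hypothetical, the formulas are inferred from
   the example rather than taken from normative text, and the
   wrap-around of real DON values is ignored when sorting.

```python
MB_COLS, MB_ROWS = 11, 9  # QCIF picture size in macroblocks
NUM_GROUPS = 3            # slice groups per reference picture


def slice_group_map():
    """Slice group of each macroblock: raster-scan index modulo 3."""
    return [[(row * MB_COLS + col) % NUM_GROUPS for col in range(MB_COLS)]
            for row in range(MB_ROWS)]


def build_mtaps(reference_pictures):
    """Compose one MTAP per slice-group offset.

    reference_pictures: three (name, DON) pairs in decoding order.
    MTAP m carries slice group (m + j) mod 3 of the j-th picture.
    """
    return [[(name, (m + j) % NUM_GROUPS, don)
             for j, (name, don) in enumerate(reference_pictures)]
            for m in range(NUM_GROUPS)]


def decoding_order(received_units):
    """Recover decoding order by sorting the received units on DON."""
    return sorted(received_units, key=lambda unit: unit[2])


if __name__ == "__main__":
    for row in slice_group_map():
        print(" ".join(str(group) for group in row))
    mtaps = build_mtaps([("R1", 1), ("R3", 2), ("R5", 4)])
    for number, mtap in enumerate(mtaps):
        print("MTAP", number, mtap)
    # Non-reference pictures travel in their own STAP-B packets.
    units = [u for mtap in mtaps for u in mtap] + [("N2", None, 3), ("N4", None, 5)]
    print([name for name, _, _ in decoding_order(units)])
```

   With this composition, no MTAP contains two slice groups of the same
   picture, so the loss of one MTAP removes only one of the three slice
   groups from each reference picture.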

   Consequently, the transmission order of NAL units is the following:

      R1, slice group 0, DON 1, carried in MTAP, RTP SN: N
      R3, slice group 1, DON 2, carried in MTAP, RTP SN: N
      R5, slice group 2, DON 4, carried in MTAP, RTP SN: N
      R1, slice group 1, DON 1, carried in MTAP, RTP SN: N+1
      R3, slice group 2, DON 2, carried in MTAP, RTP SN: N+1
      R5, slice group 0, DON 4, carried in MTAP, RTP SN: N+1
      R1, slice group 2, DON 1, carried in MTAP, RTP SN: N+2
      R3, slice group 0, DON 2, carried in MTAP, RTP SN: N+2
      R5, slice group 1, DON 4, carried in MTAP, RTP SN: N+2
      N2, DON 3, carried in STAP-B, RTP SN: N+3
      N4, DON 5, carried in STAP-B, RTP SN: N+4

   The receiver is able to organize the NAL units back in decoding order
   based on the value of DON associated with each NAL unit.

   If one of the MTAPs is lost, the spatially adjacent and temporally
   co-located macroblocks are received and can be used to conceal the
   loss efficiently. If one of the STAPs is lost, the effect of the
   loss does not propagate temporally.

13.3. Example of Robust Packet Scheduling

   An example of robust packet scheduling follows. The communication
   system used in the example consists of the following components in
   the order that the video is processed from source to sink:

      o camera and capturing
      o pre-encoding buffer
      o encoder
      o encoded picture buffer
      o transmitter
      o transmission channel
      o receiver
      o receiver buffer
      o decoder
      o decoded picture buffer
      o display

   The video communication system used in the example operates as
   follows. Note that processing of the video stream happens gradually
   and at the same time in all components of the system. The source
   video sequence is shot and captured to a pre-encoding buffer.
   The pre-encoding buffer can be used to order pictures from sampling
   order to encoding order or to analyze multiple uncompressed frames
   for bit rate control purposes, for example. In some cases, the
   pre-encoding buffer may not exist; instead, the sampled pictures are
   encoded right away. The encoder encodes pictures from the
   pre-encoding buffer and stores the output; i.e., coded pictures, to
   the encoded picture buffer. The transmitter encapsulates the coded
   pictures from the encoded picture buffer to transmission packets and
   sends them to a receiver through a transmission channel. The
   receiver stores the received packets to the receiver buffer. The
   receiver buffering process typically includes buffering for
   transmission delay jitter. The receiver buffer can also be used to
   recover correct decoding order of coded data. The decoder reads
   coded data from the receiver buffer and produces decoded pictures as
   output into the decoded picture buffer. The decoded picture buffer
   is used to recover the output (or display) order of pictures.
   Finally, pictures are displayed.

   In the following example figures, I denotes an IDR picture, R denotes
   a reference picture, N denotes a non-reference picture, and the
   number after I, R, or N indicates the sampling time relative to the
   previous IDR picture in decoding order. Values below the sequence of
   pictures indicate scaled system clock timestamps. The system clock
   is initialized arbitrarily in this example, and time runs from left
   to right. Each I, R, and N picture is mapped into the same timeline
   compared to the previous processing step, if any, assuming that
   encoding, transmission, and decoding take no time. Thus, events
   happening at the same time are located in the same column throughout
   all example figures.

   A subset of a sequence of coded pictures is depicted below in
   sampling order.

   ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ... N58 N59 I00 N01 ...
   ... --|---|---|---|---|---|---|---|---|- ... -|---|---|---|- ...
   ... 58  59  60  61  62  63  64  65  66  ... 128 129 130 131 ...

          Figure 16 Sequence of pictures in sampling order

   The sampled pictures are buffered in the pre-encoding buffer to
   arrange them in encoding order. In this example, we assume that the
   non-reference pictures are predicted from both the previous and the
   next reference picture in output order, except for the non-reference
   pictures immediately preceding an IDR picture, which are predicted
   only from the previous reference picture in output order. Thus, the
   pre-encoding buffer has to contain at least two pictures, and the
   buffering causes a delay of two picture intervals. The output of the
   pre-encoding buffering process and the encoding (and decoding) order
   of the pictures are as follows:

   ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
   ... -|---|---|---|---|---|---|---|---|- ...
   ... 60  61  62  63  64  65  66  67  68  ...

      Figure 17 Re-ordered pictures in the pre-encoding buffer

   The encoder or the transmitter can set the value of DON for each
   picture to a value of DON for the previous picture in decoding order
   plus one.

   For the sake of simplicity, let us assume that:

   o the frame rate of the sequence is constant,
   o each picture consists of only one slice,
   o each slice is encapsulated in a single NAL unit packet,
   o there is no transmission delay, and
   o pictures are transmitted at constant intervals (that is, 1 /
     (frame rate)).

   When pictures are transmitted in decoding order, they are received as
   follows:

   ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
   ... -|---|---|---|---|---|---|---|---|- ...
   ... 60  61  62  63  64  65  66  67  68  ...

          Figure 18 Received pictures in decoding order

   The OPTIONAL sprop-interleaving-depth media type parameter is set to
   0, as the transmission (or reception) order is identical to the
   decoding order.

   The decoder has to buffer for one picture interval initially in its
   decoded picture buffer to organize pictures from decoding order to
   output order as depicted below:

   ... N58 N59 I00 N01 N02 R03 N04 N05 R06 ...
   ... -|---|---|---|---|---|---|---|---|- ...
   ... 61  62  63  64  65  66  67  68  69  ...

                     Figure 19 Output order

   The amount of required initial buffering in the decoded picture
   buffer can be signaled in the buffering period SEI message or with
   the num_reorder_frames syntax element of H.264 video usability
   information. num_reorder_frames indicates the maximum number of
   frames, complementary field pairs, or non-paired fields that precede
   any frame, complementary field pair, or non-paired field in the
   sequence in decoding order and that follow it in output order. For
   the sake of simplicity, we assume that num_reorder_frames is used to
   indicate the initial buffer in the decoded picture buffer. In this
   example, num_reorder_frames is equal to 1.

   It can be observed that if the IDR picture I00 is lost during
   transmission and a retransmission request is issued when the value of
   the system clock is 62, there is one picture interval of time (until
   the system clock reaches timestamp 63) to receive the retransmitted
   IDR picture I00.

   Let us then assume that IDR pictures are transmitted two frame
   intervals earlier than their decoding position; i.e., the pictures
   are transmitted as follows:

   ... I00 N58 N59 R03 N01 N02 R06 N04 N05 ...
   ... --|---|---|---|---|---|---|---|---|- ...
   ...  62  63  64  65  66  67  68  69  70  ...

      Figure 20 Interleaving: Early IDR pictures in sending order

   The OPTIONAL sprop-interleaving-depth media type parameter is set
   equal to 1 according to its definition. (The value of
   sprop-interleaving-depth in this example can be derived as follows:
   Picture I00 is the only picture preceding picture N58 or N59 in
   transmission order and following it in decoding order. Except for
   pictures I00, N58, and N59, the transmission order is the same as the
   decoding order of pictures. As a coded picture is encapsulated into
   exactly one NAL unit, the value of sprop-interleaving-depth is equal
   to the maximum number of pictures preceding any picture in
   transmission order and following the picture in decoding order.)

   The receiver buffering process contains two pictures at a time
   according to the value of the sprop-interleaving-depth parameter and
   orders pictures from the reception order to the correct decoding
   order based on the value of DON associated with each picture. The
   output of the receiver buffering process is as follows:

   ... N58 N59 I00 R03 N01 N02 R06 N04 N05 ...
   ... -|---|---|---|---|---|---|---|---|- ...
   ... 63  64  65  66  67  68  69  70  71  ...

             Figure 21 Interleaving: Receiver buffer

   Again, an initial buffering delay of one picture interval is needed
   to organize pictures from decoding order to output order, as depicted
   below:

   ... N58 N59 I00 N01 N02 R03 N04 N05 ...
   ... -|---|---|---|---|---|---|---|- ...
   ... 64  65  66  67  68  69  70  71  ...

      Figure 22 Interleaving: Receiver buffer after reordering

   Note that the maximum delay that IDR pictures can undergo during
   transmission, including possible application, transport, or link
   layer retransmission, is equal to three picture intervals.
   Thus, the loss resiliency of IDR pictures is improved in systems
   supporting retransmission compared to the case in which pictures were
   transmitted in their decoding order.

13.4. Robust Transmission Scheduling of Redundant Coded Slices

   A redundant coded picture is a coded representation of a picture or a
   part of a picture that is not used in the decoding process if the
   corresponding primary coded picture is correctly decoded. There
   should be no noticeable difference between any area of the decoded
   primary picture and a corresponding area that would result from
   application of the H.264 decoding process for any redundant picture
   in the same access unit. A redundant coded slice is a coded slice
   that is a part of a redundant coded picture.

   Redundant coded pictures can be used to provide unequal error
   protection in error-prone video transmission. If a primary coded
   representation of a picture is decoded incorrectly, a corresponding
   redundant coded picture can be decoded. Examples of applications and
   coding techniques using the redundant coded picture feature include
   the video redundancy coding [22] and the protection of "key pictures"
   in multicast streaming [23].

   One property of many error-prone video communications systems is that
   transmission errors are often bursty. Therefore, they may affect
   more than one consecutive transmission packet in transmission order.
   In low bit-rate video communication, it is relatively common that an
   entire coded picture can be encapsulated into one transmission
   packet. Consequently, a primary coded picture and the corresponding
   redundant coded pictures may be transmitted in consecutive packets in
   transmission order.
To make the transmission scheme more tolerant of 3981 bursty transmission errors, it is beneficial to transmit the primary 3982 coded picture and redundant coded picture separated by more than a 3983 single packet. The DON concept enables this. 3985 13.5. Remarks on Other Design Possibilities 3987 The slice header syntax structure of the H.264 coding standard 3988 contains the frame_num syntax element that can indicate the decoding 3989 order of coded frames. However, the usage of the frame_num syntax 3990 element is not feasible or desirable to recover the decoding order, 3991 due to the following reasons: 3993 o The receiver is required to parse at least one slice header per 3994 coded picture (before passing the coded data to the decoder). 3996 o Coded slices from multiple coded video sequences cannot be 3997 interleaved, as the frame number syntax element is reset to 0 in 3998 each IDR picture. 4000 o The coded fields of a complementary field pair share the same 4001 value of the frame_num syntax element. Thus, the decoding order 4002 of the coded fields of a complementary field pair cannot be 4003 recovered based on the frame_num syntax element or any other 4004 syntax element of the H.264 coding syntax. 4006 The RTP payload format for transport of MPEG-4 elementary streams 4007 [24] enables interleaving of access units and transmission of 4008 multiple access units in the same RTP packet. An access unit is 4009 specified in the H.264 coding standard to comprise all NAL units 4010 associated with a primary coded picture according to subclause 4011 7.4.1.2 of [1]. Consequently, slices of different pictures cannot be 4012 interleaved, and the multi-picture slice interleaving technique (see 4013 section 12.6) for improved error resilience cannot be used. 4015 14. Acknowledgements 4017 Stephan Wenger, Miska Hannuksela, Thomas Stockhammer, Magnus 4018 Westerlund, and David Singer are thanked as the authors of RFC 3984. 
4019 Dave Lindbergh, Philippe Gentric, Gonzalo Camarillo, Gary Sullivan, 4020 Joerg Ott, and Colin Perkins are thanked for careful review during 4021 the development of RFC 3984. Randell Jesup is thanked for his 4022 valuable comments during the development of this RFC. 4024 This document was prepared using 2-Word-v2.0.template.dot. 4026 15. References 4028 15.1. Normative References 4030 [1] ITU-T Recommendation H.264, "Advanced video coding for generic 4031 audiovisual services", May 2003. [Ed. (YkW): This should be 4032 updated to the latest version.] 4034 [2] ISO/IEC International Standard 14496-10:2003. [Ed. (YkW): This 4035 should be updated to the latest version.] 4037 [3] ITU-T Recommendation H.241, "Extended video procedures and 4038 control signals for H.300 series terminals", May 2006. [Ed. 4039 (TK): This should be updated to the latest version, when 4040 published.] 4042 [4] Bradner, S., "Key words for use in RFCs to Indicate Requirement 4043 Levels", BCP 14, RFC 2119, March 1997. 4045 [5] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, 4046 "RTP: A Transport Protocol for Real-Time Applications", STD 64, 4047 RFC 3550, July 2003. 4049 [6] Handley, M. and V. Jacobson, "SDP: Session Description 4050 Protocol", RFC 2327, April 1998. 4052 [7] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", 4053 RFC 3548, July 2003. 4055 [8] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with 4056 Session Description Protocol (SDP)", RFC 3264, June 2002. 4058 15.2. Informative References 4060 [9] Luthra, A., Sullivan, G.J., and T. Wiegand (eds.), Special 4061 Issue on H.264/AVC. IEEE Transactions on Circuits and Systems 4062 on Video Technology, July 2003. 4064 [10] Bormann, C., Cline, L., Deisher, G., Gardos, T., Maciocco, C., 4065 Newell, D., Ott, J., Sullivan, G., Wenger, S., and C. Zhu, "RTP 4066 Payload Format for the 1998 Version of ITU-T Rec. H.263 Video 4067 (H.263+)", RFC 2429, October 1998. 4069 [11] ISO/IEC IS 14496-2. 
4071 [12] Wenger, S., "H.26L over IP", IEEE Transactions on Circuits and 4072 Systems for Video Technology, Vol. 13, No. 7, July 2003. 4074 [13] Wenger, S., "H.26L over IP: The IP Network Adaptation Layer", 4075 Proceedings Packet Video Workshop 02, April 2002. 4077 [14] Stockhammer, T., Hannuksela, M.M., and S. Wenger, "H.26L/JVT 4078 Coding Network Abstraction Layer and IP-based Transport" in 4079 Proc. ICIP 2002, Rochester, NY, September 2002. 4081 [15] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and Video 4082 Conferences with Minimal Control", STD 65, RFC 3551, July 2003. 4084 [16] ITU-T Recommendation H.223, "Multiplexing protocol for low bit 4085 rate multimedia communication", July 2001. 4087 [17] Rosenberg, J. and H. Schulzrinne, "An RTP Payload Format for 4088 Generic Forward Error Correction", RFC 2733, December 1999. 4090 [18] Stockhammer, T., Wiegand, T., Oelbaum, T., and F. Obermeier, 4091 "Video Coding and Transport Layer Techniques for H.264/AVC- 4092 Based Transmission over Packet-Lossy Networks", IEEE 4093 International Conference on Image Processing (ICIP 2003), 4094 Barcelona, Spain, September 2003. 4096 [19] Varsa, V. and M. Karczewicz, "Slice interleaving in compressed 4097 video packetization", Packet Video Workshop 2000. 4099 [20] Kang, S.H. and A. Zakhor, "Packet scheduling algorithm for 4100 wireless video streaming", International Packet Video Workshop 4101 2002. 4103 [21] Hannuksela, M.M., "Enhanced concept of GOP", JVT-B042, 4104 available http://ftp3.itu.int/av-arch/video-site/0201_Gen/JVT- 4105 B042.doc, January 2002. 4107 [22] Wenger, S., "Video Redundancy Coding in H.263+", 1997 4108 International Workshop on Audio-Visual Services over Packet 4109 Networks, September 1997. 4111 [23] Wang, Y.-K., Hannuksela, M.M., and M. Gabbouj, "Error Resilient 4112 Video Coding Using Unequally Protected Key Pictures", in Proc. 4113 International Workshop VLBV03, September 2003.
4115 [24] van der Meer, J., Mackie, D., Swaminathan, V., Singer, D., and 4116 P. Gentric, "RTP Payload Format for Transport of MPEG-4 4117 Elementary Streams", RFC 3640, November 2003. 4119 [25] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 4120 Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 4121 3711, March 2004. 4123 [26] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming 4124 Protocol (RTSP)", RFC 2326, April 1998. 4126 [27] Handley, M., Perkins, C., and E. Whelan, "Session Announcement 4127 Protocol", RFC 2974, October 2000. 4129 [28] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, 4130 January 2008. 4132 [29] Wenger, S., Chandra, U., and M. Westerlund, "Codec Control 4133 Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", 4134 RFC 5104, February 2008. 4136 Authors' Addresses 4138 Ye-Kui Wang 4139 Nokia Research Center 4140 P.O. Box 1000 4141 33721 Tampere 4142 Finland 4144 Phone: +358-50-466-7004 4145 EMail: ye-kui.wang@nokia.com 4147 Roni Even 4148 14 David Hamelech 4149 Tel Aviv 64953 4150 Israel 4152 Phone: +972-545481099 4153 Email: ron.even.tlv@gmail.com 4155 Tom Kristensen 4156 TANDBERG 4157 Philip Pedersens vei 22 4158 N-1366 Lysaker 4159 Norway 4161 Phone: +47 67125125 4162 Email: tom.kristensen@tandberg.com, tomkri@ifi.uio.no 4164 Intellectual Property Statement 4166 The IETF takes no position regarding the validity or scope of any 4167 Intellectual Property Rights or other rights that might be claimed to 4168 pertain to the implementation or use of the technology described in 4169 this document or the extent to which any license under such rights 4170 might or might not be available; nor does it represent that it has 4171 made any independent effort to identify any such rights. Information 4172 on the procedures with respect to rights in RFC documents can be 4173 found in BCP 78 and BCP 79.
4175 Copies of IPR disclosures made to the IETF Secretariat and any 4176 assurances of licenses to be made available, or the result of an 4177 attempt made to obtain a general license or permission for the use of 4178 such proprietary rights by implementers or users of this 4179 specification can be obtained from the IETF on-line IPR repository at 4180 http://www.ietf.org/ipr. 4182 The IETF invites any interested party to bring to its attention any 4183 copyrights, patents or patent applications, or other proprietary 4184 rights that may cover technology that may be required to implement 4185 this standard. Please address the information to the IETF at 4186 ietf-ipr@ietf.org. 4188 Disclaimer of Validity 4190 This document and the information contained herein are provided on an 4191 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 4192 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 4193 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 4194 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 4195 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 4196 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 4198 Copyright Statement 4200 Copyright (C) The IETF Trust (2008). 4202 This document is subject to the rights, licenses and restrictions 4203 contained in BCP 78, and except as set forth therein, the authors 4204 retain all their rights. 4206 Acknowledgement 4208 Funding for the RFC Editor function is currently provided by the 4209 Internet Society. 4211 16. Backward compatibility to RFC 3984 4213 The current document is a revision of RFC 3984 and intends to 4214 obsolete it. This section addresses the backward compatibility 4215 issues. 4217 TBD. 4219 17. Changes from RFC 3984 4221 To be updated. 4223 17.1. 
Technical changes 4225 The technical changes (including bug fixes) from RFC 3984 are: 4227 1) In subsections 5.4, 5.5, 6.2, 6.3, and 6.4, removed the statement that the 4228 packetization mode in use may be signaled by external means. 4230 2) In subsection 7.2.2, changed the sentence 4232 There are N VCL NAL units in the deinterleaving buffer. 4234 to 4236 There are N or more VCL NAL units in the deinterleaving buffer. 4238 3) In subsection 7.2.2, changed the sentence 4240 Herein, n corresponds to the NAL unit having the greatest value 4241 of AbsDON among the received NAL units. 4243 to 4245 Herein, n corresponds to the NAL unit having the greatest value 4246 of AbsDON among the NAL units in the deinterleaving buffer. 4248 4) In subsection 8.1, the semantics of sprop-init-buf-time, paragraph 4249 2, changed the sentence 4251 The parameter is the maximum value of (transmission time of a NAL 4252 unit - decoding time of the NAL unit), assuming reliable and 4253 instantaneous transmission, the same timeline for transmission 4254 and decoding, and that decoding starts when the first packet 4255 arrives. 4257 to 4259 The parameter is the maximum value of (decoding time of the NAL 4260 unit - transmission time of a NAL unit), assuming reliable and 4261 instantaneous transmission, the same timeline for transmission 4262 and decoding, and that decoding starts when the first packet 4263 arrives. 4265 5) In subsection 8.1, removed the specification of parameter-add. 4266 Other descriptions of parameter-add (in subsections 8.2 and 8.4) 4267 are also removed. 4269 6) In subsection 8.2.2, changed bullet item 1, such that in the SDP 4270 offer/answer model, the use of the level part of "profile-level- 4271 id" does not need to be symmetric, i.e., the value of the level 4272 part in the answer does not have to be the same as in the offer. 4274 In addition, an informative note was added to clarify the 4275 specification of level 1b in H.264.
4277 7) In subsection 8.2.2, changed bullet item 3 from 4279 The capability parameters ("max-mbps", "max-fs", "max-cpb", "max- 4280 dpb", "max-br", ,"redundant-pic-cap", "max-rcmd-nalu-size") MAY 4281 be used to declare further capabilities. Their interpretation 4282 depends on the direction attribute. When the direction attribute 4283 is sendonly, then the parameters describe the limits of the RTP 4284 packets and the NAL unit stream that the sender is capable of 4285 producing. When the direction attribute is sendrecv or recvonly, 4286 then the parameters describe the limitations of what the receiver 4287 accepts. 4289 to 4291 The capability parameters ("max-mbps", "max-fs", "max-cpb", "max- 4292 dpb", "max-br", ,"redundant-pic-cap", "max-rcmd-nalu-size") MAY 4293 be used to declare further capabilities. These parameters can 4294 only be present when the direction attribute is sendrecv or 4295 recvonly, and the parameters describe the limitations of what the 4296 receiver accepts. 4298 Such that the description matches the semantics of these 4299 parameters defined earlier. 4301 8) In subsection 8.2.2. the following paragraph 4303 For streams being delivered over multicast, the following rules 4304 apply in addition: 4306 was changed to the following (with an item added after the 4307 paragraph): 4309 For streams being delivered over multicast, the following rules 4310 apply: 4312 o The media format configuration is identified by the same 4313 parameters as above for unicast (i.e. "profile-level-id", 4314 "packetization-mode", and, if required by "packetization- 4315 mode", "sprop-deint-buf-req"). These media format 4316 configuration parameters (including the level part of 4317 "profile-level-id") MUST be used symmetrically; i.e., the 4318 answerer MUST either maintain all configuration parameters or 4319 remove the media format (payload type) completely. 4321 Because some items described above for unicast do not apply for 4322 multicast. 
4324 9) In subsection 8.2.2, the bullet item starting with "In an offer or 4325 answer for which the direction attribute "a=sendonly" is included 4326 for the media stream, the following interpretation of the 4327 parameters MUST be used:", removed the following because, when the 4328 direction attribute is sendonly, the sender will not receive 4329 streams. 4331 Declaring the capabilities of the sender when it receives a 4332 stream: 4334 - max-mbps 4335 - max-fs 4336 - max-cpb 4337 - max-dpb 4338 - max-br 4339 - redundant-pic-cap 4340 - deint-buf-cap 4341 - max-rcmd-nalu-size 4343 17.2. Editorial changes 4345 The editorial changes from RFC 3984 are: 4347 1) In subsection 4.1 (Definitions), added the definition of "NALU- 4348 time" (moved from subsection 5.7 with slight modifications), and 4349 changed the use of "NALU time" to "NALU-time" at two places in 4350 subsection 5.7. 4352 2) Added the subsection number 4.2 for Abbreviations. 4354 3) In subsection 5.2, added the following paragraph right before 4355 Table 1: 4357 Table 1 summarizes NAL unit types and the corresponding RTP 4358 packet types when each of these NAL units is directly used as a 4359 packet payload, and where the types are described in this memo. 4361 4) In subsection 5.2, changed Table 1 from 4363 Table 1. Summary of NAL unit types and their payload structures 4364 Type Packet Type name Section 4365 --------------------------------------------------------- 4366 0 undefined - 4367 1-23 NAL unit Single NAL unit packet per H.264 5.6 4368 24 STAP-A Single-time aggregation packet 5.7.1 4369 25 STAP-B Single-time aggregation packet 5.7.1 4370 26 MTAP16 Multi-time aggregation packet 5.7.2 4371 27 MTAP24 Multi-time aggregation packet 5.7.2 4372 28 FU-A Fragmentation unit 5.8 4373 29 FU-B Fragmentation unit 5.8 4374 30-31 undefined - 4376 to 4378 Table 1.
Summary of NAL unit types and the corresponding packet 4379 types 4381 NAL Unit Packet Packet Type Name Section 4382 Type Type 4383 --------------------------------------------------------- 4384 0 reserved - 4385 1-23 NAL unit Single NAL unit packet 5.6 4386 24 STAP-A Single-time aggregation packet 5.7.1 4387 25 STAP-B Single-time aggregation packet 5.7.1 4388 26 MTAP16 Multi-time aggregation packet 5.7.2 4389 27 MTAP24 Multi-time aggregation packet 5.7.2 4390 28 FU-A Fragmentation unit 5.8 4391 29 FU-B Fragmentation unit 5.8 4392 30-31 reserved - 4394 5) In subsection 5.3, removed "greater than 00" from the following 4395 sentence: 4397 In addition to the specification above, according to this RTP 4398 payload specification, values of NRI indicate the relative 4399 transport priority, as determined by the encoder. 4401 6) In subsection 5.3, the second informative note, corrected "nal 4402 unit" to "NAL unit". 4404 7) In the end of subsection 5.4, changed the text starting from the 4405 second sentence of the last text paragraph and Table 3 from 4407 The used packetization mode governs which NAL unit types are 4408 allowed in RTP payloads. Table 3 summarizes the allowed NAL unit 4409 types for each packetization mode. Some NAL unit type values 4410 (indicated as undefined in Table 3) are reserved for future 4411 extensions. NAL units of those types SHOULD NOT be sent by a 4412 sender and MUST be ignored by a receiver. For example, the Types 4413 1-23, with the associated packet type "NAL unit", are allowed in 4414 "Single NAL Unit Mode" and in "Non-Interleaved Mode", but 4415 disallowed in "Interleaved Mode". Packetization modes are 4416 explained in more detail in section 6. 4418 Table 3. 
Summary of allowed NAL unit types for each packetization 4419 mode (yes = allowed, no = disallowed, ig = ignore) 4421 Type Packet Single NAL Non-Interleaved Interleaved 4422 Unit Mode Mode Mode 4423 ------------------------------------------------------------- 4424 0 undefined ig ig ig 4425 1-23 NAL unit yes yes no 4426 24 STAP-A no yes no 4427 25 STAP-B no no yes 4428 26 MTAP16 no no yes 4429 27 MTAP24 no no yes 4430 28 FU-A no yes yes 4431 29 FU-B no no yes 4432 30-31 undefined ig ig ig 4434 to 4436 The used packetization mode governs which NAL unit types are 4437 allowed in RTP payloads. Table 3 summarizes the allowed packet 4438 payload types for each packetization mode. Packetization modes 4439 are explained in more detail in section 6. 4441 Table 3. Summary of allowed NAL unit types for each packetization 4442 mode (yes = allowed, no = disallowed, ig = ignore) 4444 Payload Packet Single NAL Non-Interleaved Interleaved 4445 Type Type Unit Mode Mode Mode 4446 ------------------------------------------------------------- 4447 0 reserved ig ig ig 4448 1-23 NAL unit yes yes no 4449 24 STAP-A no yes no 4450 25 STAP-B no no yes 4451 26 MTAP16 no no yes 4452 27 MTAP24 no no yes 4453 28 FU-A no yes yes 4454 29 FU-B no no yes 4455 30-31 reserved ig ig ig 4457 Some NAL unit or payload type values (indicated as undefined in 4458 Table 3) are reserved for future extensions. NAL units of those 4459 types SHOULD NOT be sent by a sender (direct as packet payloads, 4460 or as aggregation units in aggregation packets, or as fragmented 4461 units in FU packets) and SHOULD be ignored by a receiver. For 4462 example, the payload types 1-23, with the associated packet type 4463 "NAL unit", are allowed in "Single NAL Unit Mode" and in "Non- 4464 Interleaved Mode", but disallowed in "Interleaved Mode". 
4465 However, NAL units of NAL unit types 1-23 can be used in 4466 "Interleaved Mode" as aggregation units in STAP-B, MTAP16 and 4467 MTAP24 packets as well as fragmented units in FU-A and FU-B 4468 packets. Similarly, NAL units of NAL unit types 1-23 can also be 4469 used in the "Non-Interleaved Mode" as aggregation units in STAP-A 4470 packets or fragmented units in FU-A packets, in addition to being 4471 directly used as packet payloads. 4473 8) In subsections 5.6 and 5.7, changed "type" to "Type" in Figures 2 4474 and 3. 4476 9) Corrected the titles of Figures 7, 8, 12, and 13, wherein "and" was 4477 replaced by "containing". 4479 10)In subsection 5.7.2, corrected the sentence 4481 The DON of the following NAL unit is equal to (DONB + DOND) % 4482 65536, in which % denotes the modulo operation. 4484 to 4486 The DON of the NAL unit contained in a multi-time aggregation 4487 unit is equal to (DONB + DOND) % 65536, in which % denotes the 4488 modulo operation. 4490 11)In Figure 11, corrected "NALU unit size" to "NAL unit size". 4492 12)In subsection 5.7.2, the informative note under Figure 11, 4493 corrected the first sentence from 4495 The "earliest" multi-time aggregation unit is the one that would 4496 have the smallest extended RTP timestamp among all the 4497 aggregation units of an MTAP if the aggregation units were 4498 encapsulated in single NAL unit packets. 4500 to 4502 The "earliest" multi-time aggregation unit is the one that would 4503 have the smallest extended RTP timestamp among all the 4504 aggregation units of an MTAP if the NAL units contained in the 4505 aggregation units were encapsulated in single NAL unit packets. 4507 13)In subsection 5.7.3, corrected the following sentence by replacing 4508 "picture" with "NAL unit". 4510 The fragmentation mechanism allows fragmenting a single picture 4511 and applying generic forward error correction as described in 4512 section 12.5.
4514 14)In subsection 5.7.3, changed the following sentence by replacing 4515 'A' with "An". 4517 A FU payload MAY have any number of octets and MAY be empty. 4519 15)In subsection 5.7.3, the last informative note, changed the last 4520 sentence from 4522 However, the (potential) use of zero-length NALUs should be 4523 carefully weighed against the increased risk of the loss of the 4524 NALU because of the additional packets employed for its 4525 transmission. 4527 to 4529 However, the (potential) use of zero-length NALU fragments should 4530 be carefully weighed against the increased risk of the loss of at 4531 least a part of the NALU because of the additional packets 4532 employed for its transmission. 4534 16)In subsection 6.1, corrected the sentence 4536 Coded slice NAL units or coded slice data partition NAL units 4537 belonging to the same coded picture (and thus sharing the same 4538 RTP timestamp value) MAY be sent in any order permitted by the 4539 applicable profile defined in [1]; however, for delay-critical 4540 systems, they SHOULD be sent in their original coding order to 4541 minimize the delay. Note that the coding order is not 4542 necessarily the scan order, but the order the NAL packets become 4543 available to the RTP stack. 4545 to 4547 Coded slice NAL units or coded slice data partition NAL units 4548 belonging to the same coded picture (and thus sharing the same 4549 RTP timestamp value) MAY be sent in any order; however, for 4550 delay-critical systems, they SHOULD be sent in their original 4551 decoding order to minimize the delay. Note that the decoding 4552 order is the order of the NAL units in the bitstream. 4554 17)Removed "(Informative)" from the title of section 7, and changed 4555 the first sentences at the beginning of section 7 from 4557 The de-packetization process is implementation dependent. 4558 Therefore, the following description should be seen as an example 4559 of a suitable implementation.
Other schemes may be used as well. 4560 Optimizations relative to the described algorithms are likely 4561 possible. 4563 to 4565 The de-packetization process is implementation dependent. 4566 Therefore, the following description should be seen as an example 4567 of a suitable implementation. Other schemes may be used as well 4568 as long as the output for the same input is the same as the 4569 process described below. The output is the same when the 4570 number of NAL units and their order are both identical. 4571 Optimizations relative to the described algorithms are likely 4572 possible. 4574 18)In subsection 7.1, paragraph 1, corrected the last sentence from 4576 If a decapsulated packet is an FU-A, all the fragments of the 4577 fragmented NAL unit are concatenated and passed to the decoder. 4579 to 4581 For all the FU-A packets containing fragments of a single NAL 4582 unit, the decapsulated fragments are concatenated in their 4583 sending order to recover the NAL unit, which is then passed to 4584 the decoder. 4586 19)In subsection 7.2, paragraph 2, corrected the first sentence 4587 (copied below) by replacing "packets" with "NAL units". 4589 The receiver includes a receiver buffer, which is used to 4590 compensate for transmission delay jitter and to reorder packets 4591 from transmission order to the NAL unit decoding order. 4593 20)In subsection 7.2.2, paragraph 1, changed the following sentence 4594 by replacing "is" with "are". 4596 After initial buffering, decoding and playback is started, and 4597 the buffering-while-playing mode is used. 4599 21)In subsection 7.2.2, paragraph 1, changed the sentence 4601 The value of DON is calculated and stored for all NAL units. 4603 to 4605 The value of DON is calculated and stored for each NAL unit. 4607 22)In subsection 7.2.2, removed "an" from the following sentence. 4609 Let PDON be a variable that is initialized to 0 at the beginning 4610 of the an RTP session.
4612 23)In section 8, paragraph 2, changed the sentence 4614 The name of all these parameters starts with "sprop" for stream 4615 properties. 4617 to 4619 The names of all these parameters start with "sprop" for stream 4620 properties. 4622 24)In subsection 8.1, the semantics of max-mbps, max-fs, max-cpb, 4623 max-dpb, and max-br, changed the last paragraph above the 4624 informative note from 4626 A receiver MUST NOT signal values of max-mbps, max-fs, max- 4627 cpb, max-dpb, and max-br that meet the requirements of a 4628 higher level, referred to as level A herein, compared to the 4629 level specified in the value of the profile-level-id 4630 parameter, if the receiver can support all the properties of 4631 level A. 4633 to 4635 If a receiver can support all the properties of level A, the 4636 level specified in the value of the profile-level-id MUST be 4637 level A (i.e. MUST NOT be lower than level A). In other 4638 words, a sender or receiver MUST NOT signal values of max- 4639 mbps, max-fs, max-cpb, max-dpb, and max-br that meet the 4640 requirements of a higher level compared to the level specified 4641 in the value of the profile-level-id parameter. 4643 25)In subsection 8.1, the semantics of max-dpb, the informative note, 4644 removed "and is a property of the video decoder only" from the 4645 following sentence. 4647 The decoded picture buffer stores reconstructed samples and is a 4648 property of the video decoder only. 4650 26)In subsection 8.1, the semantics of max-br, paragraph 2, removed 4651 the following sentence that is repeated later in the exact form. 4653 The value of max-br MUST be greater than or equal to the value of 4654 MaxBR for the level given in Table A-1 of [1]. 4656 27)In subsection 8.1, the semantics of packetization-mode, changed 4657 the last sentence from 4659 The value of packetization mode MUST be an integer in the range 4660 of 0 to 2, inclusive. 
4662 to 4664 The value of packetization-mode MUST be an integer in the range 4665 of 0 to 2, inclusive. 4667 28)In subsection 8.1, the semantics of sprop-init-buf-time, paragraph 4668 2, changed the following sentence by replacing "MUST buffer" by 4669 "MUST wait". 4671 The parameter signals the initial buffering time that a receiver 4672 MUST buffer before starting decoding to recover the NAL unit 4673 decoding order from the transmission order. 4675 29)In subsection 8.1, changed "NAL unit stream" to "RTP packet 4676 stream" at several places in the semantics of sprop-interleaving- 4677 depth, sprop-deint-buf-req, sprop-init-buf-time and sprop-max-don- 4678 diff. 4680 30)In subsection 8.1, removed the paragraph talking about file 4681 formats after the description of "Encoding considerations". The 4682 references 29 and 30 (to file format specifications) were also 4683 removed. 4685 31)In subsection 8.2.1, the example SDP message. Added the required 4686 parameter "packetization-mode", which was missing. Another change 4687 is the change of a hypothetical value of "sprop-parameter-sets" to 4688 "", as readers may be confused by the hypothetical 4689 value. The same change regarding hypothetical values of "sprop- 4690 parameter-sets" was made to the SDP examples in subsection 8.3. 4692 32)In subsection 8.2.2, bullet item 2, changed the beginning sentence 4693 from 4695 The parameters "sprop-parameter-sets", "sprop-deint-buf-req", 4696 "sprop-interleaving-depth", "sprop-max-don-diff", and "sprop- 4697 init-buf-time" describe the properties of the NAL unit stream 4698 that the offerer or answerer is sending for this media format 4699 configuration. 
4701 to 4703 The parameters "packetization-mode", and if present, "sprop- 4704 deint-buf-req", "sprop-parameter-sets", "sprop-interleaving- 4705 depth", "sprop-max-don-diff", and "sprop-init-buf-time", describe 4706 the properties of the RTP packet stream that the offerer or 4707 answerer is sending for this media format configuration. 4709 33)In subsection 8.2.2, bullet item 2, the informative note, the last 4710 sentence, corrected the typo "then" to "than". 4712 34)In subsection 8.2.2, changed the following bullet item for 4713 multicast 4715 o The stream properties parameters ("sprop-parameter-sets", 4716 "sprop-deint-buf-req", "sprop-interleaving-depth", "sprop- 4717 max-don-diff", and "sprop-init-buf-time") MUST NOT be changed 4718 by the answerer. Thus, a payload type can either be accepted 4719 unaltered or removed. 4721 to 4723 o The stream properties parameters ("sprop-parameter-sets", 4724 "sprop-interleaving-depth", "sprop-max-don-diff", and "sprop- 4725 init-buf-time") MUST NOT be changed by the answerer. Thus, a 4726 payload type can either be accepted unaltered or removed. 4728 35)In subsection 8.2.3, changed two uses of "NAL unit stream" to "RTP 4729 packet stream", and changed the sentence "Declaring actual 4730 configuration or properties:" to "Declaring actual configuration 4731 or stream properties:". 4733 36)In subsection 8.3, the first sentence, changed "A SIP Offer/Answer 4734 exchange" to "An SDP Offer/Answer exchange". 4736 37)In subsection 8.3, the first example, changed "Offerer -> Answer 4737 SDP message:" to "Offerer -> Answerer SDP message:". 4739 38)In subsection 8.3, in the paragraph describing the offer SDP in 4740 the first example, changed 4742 PT 98 represents single NALU mode, PT 99 non-interleaved mode; PT 4743 100 indicates the interleaved mode. 4745 to 4747 PT 98 represents single NALU mode, PT 99 represents non- 4748 interleaved mode, and PT 100 indicates the interleaved mode. 
4750 And changed "that are required for the answerer" to "that are 4751 required by the answerer". 4753 And changed 4755 Note that the value for "sprop-parameter-sets", although 4756 identical in the example above, could be different for each 4757 payload type. 4759 to 4761 Note that the value for "sprop-parameter-sets" could be different 4762 for each payload type. 4764 39)In subsection 8.3, changed the two paragraphs describing the 4765 answer SDP in the first example from 4767 As the Offer/Answer negotiation covers both sending and receiving 4768 streams, an offer indicates the exact parameters for what the 4769 offerer is willing to receive, whereas the answer indicates the 4770 same for what the answerer accepts to receive. In this case the 4771 offerer declared that it is willing to receive payload type 98. 4772 The answerer accepts this by declaring a equivalent payload type 4773 97; i.e., it has identical values for the three parameters 4774 "profile-level-id", packetization-mode, and "sprop-deint-buf- 4775 req". This has the following implications for both the offerer 4776 and the answerer concerning the parameters that declare 4777 properties. The offerer initially declared a certain value of 4778 the "sprop-parameter-sets" in the payload definition for PT=98. 4779 However, as the answerer accepted this as PT=97, the values of 4780 "sprop-parameter-sets" in PT=98 must now be used instead when the 4781 offerer sends PT=97. Similarly, when the answerer sends PT=98 to 4782 the offerer, it has to use the properties parameters it declared 4783 in PT=97. 4785 The answerer also accepts the reception of the two configurations 4786 that payload types 99 and 100 represent. It provides the initial 4787 parameter sets for the answerer-to-offerer direction, and for 4788 buffering related parameters that it will use to send the payload 4789 types. 
It also provides the offerer with its memory limit for
4790       deinterleaving operations by providing a "deint-buf-cap"
4791       parameter. This is only useful if the offerer decides on making
4792       a second offer, where it can take the new value into account.
4793       The "max-rcmd-nalu-size" indicates that the answerer can
4794       efficiently process NALUs up to the size of 3980 bytes. However,
4795       there is no guarantee that the network supports this size.

4797       to

4799       As the Offer/Answer negotiation covers both sending and receiving
4800       streams, an offer indicates the exact parameters for what the
4801       offerer is willing to receive, whereas the answer indicates the
4802       same for what the answerer accepts to receive. In this case the
4803       offerer declared that it is willing to receive payload type 98.
4804       The answerer accepts this by declaring an equivalent payload type
4805       97; i.e., it has identical values for the two parameters
4806       "profile-level-id" and "packetization-mode" (since
4807       "packetization-mode" is equal to 0, "sprop-deint-buf-req" is not
4808       present). As the offered payload type 98 is accepted, the
4809       answerer needs to store parameter sets included in sprop-
4810       parameter-sets= in case the offer finally decides
4811       to use this configuration. In the answer, the answerer includes
4812       the parameter sets in sprop-parameter-sets= that
4813       the answerer would use in the stream sent from the answerer if
4814       this configuration is finally used.

4816       The answerer also accepts the reception of the two configurations
4817       that payload types 99 and 100 represent. Again, the answerer
4818       needs to store parameter sets included in sprop-parameter-
4819       sets= and sprop-parameter-sets= in
4820       case the offer finally decides to use either of these two
4821       configurations. The answerer provides the initial parameter sets
4822       for the answerer-to-offerer direction, i.e.
the parameter sets in
4823       sprop-parameter-sets= and sprop-parameter-
4824       sets=, for payload types 99 and 100, respectively,
4825       that it will use to send the payload types. The answerer also
4826       provides the offerer with its memory limit for deinterleaving
4827       operations by providing a "deint-buf-cap" parameter. This is
4828       only useful if the offerer decides on making a second offer,
4829       where it can take the new value into account. The "max-rcmd-
4830       nalu-size" indicates that the answerer can efficiently process
4831       NALUs up to the size of 3980 bytes. However, there is no
4832       guarantee that the network supports this size.

4834    40)At the end of subsection 8.3, added four more SDP offer/answer
4835       examples describing level downgrade and using or not using out-of-
4836       band transmission of parameter sets.

4838    41)In subsection 8.4, changed two uses of "RTP stream" to "RTP packet
4839       stream".

4841    42)In subsection 8.4, changed the sentence

4843       It is RECOMMENDED that parameter set IDs be partitioned between
4844       the out-of-band and in-band parameter sets.

4846       to

4848       It is therefore RECOMMENDED that parameter set IDs be partitioned
4849       between the out-of-band and in-band parameter sets.

4851    43)In subsection 13.3, changed

4853       pictures are transmitted at constant intervals (that is, 1 /
4854       frame rate).

4856       to

4858       pictures are transmitted at constant intervals (that is, 1 /
4859       (frame rate)).

4861 18. Open issues

4863    The issues remaining open are:

4865    1) Add a brief summary of the changes to RFC 3984 in the beginning of
4866       section 1.

4868    2) The semantics of sar, e.g. the allowed values and its relationship
4869       with esar, as well as its use in SDP offer/answer, need to be
4870       clarified.

4872    3) Use of deint-buf-cap in SDP offer/answer for multicast is missing
4873       (and was missing in RFC 3984).

4875    4) Some SDP offer/answer examples using new SDP parameters to be
4876       added.
4878    5) Sections 8.4 and 8.5 need to be updated under the new context of
4879       using sprop-parameter-sets or sprop-level-parameter-sets, and
4880       possibly together with sprop-ssrc, for out-of-band transporting of
4881       parameter sets.

4883    6) (from Randell) References to RFC 2733 should be updated to (and
4884       checked against) RFC 5109. There are a lot of calculations and
4885       the like that should be checked. Also update [17] to RFC 5109.

4887    7) Complete the section on backward compatibility with RFC 3984. For
4888       example, operations that are and are not backward compatible
4889       with RFC 3984 need to be identified and added.

4891    8) Update the list of changes from RFC 3984.

4893 19. Changes Log

4895    Technical changes compared to draft-wang-avt-rfc3984bis-01.txt

4897    - In section 5.4, changed the requirement that reserved NAL unit
4898      types MUST be ignored to SHOULD be ignored.

4900    - (Could be considered as an editorial change as well) In section
4901      8.1, updated the semantics of profile-level-id, especially related
4902      to constraint_set3_flag, signaling of level 1b, Constrained
4903      Baseline profile, and equivalent combinations of profile_idc and
4904      profile-iop.

4906    - In section 8.1, added 7 new optional media type parameters: max-
4907      smbps, sprop-level-parameter-sets, sar, esar, sprop-ssrc, and
4908      rfc3984-compatible.

4910    - In section 8.1, updated the semantics of sprop-parameter-sets.

4912    - In section 8.1, added back parameter-add, which is in RFC 3984,
4913      and noted that the use of parameter-add is deprecated.

4915    - In section 8.2.2, the usage of media type parameters with the SDP
4916      offer/answer model has been updated. For example, the parameter
4917      rfc3984-compatible is now part of the media format configuration.
4918      Level downgrade is better clarified. Use of sprop-parameter-sets
4919      or sprop-level-parameter-sets for out-of-band transporting of
4920      parameter sets is explained in more detail.
4922    - In section 8.4, updated the text such that both out-of-band and
4923      in-band parameter set transports are possible but neither is
4924      recommended. Clarified that in some scenarios in-band transport is
4925      desired.

4927    - Added section 8.5 (informative) discussing decoder refresh point
4928      procedure in case some parameter sets are lost in in-band
4929      transport.

4931    Editorial changes compared to draft-wang-avt-rfc3984bis-01.txt (the
4932    list is incomplete)

4934    - Added a definition of "static macroblock" in section 4.1.

4936    - Added abbreviations "SAR" and "VUI" in section 4.2.

4938    - Changed the informative reference [15] to normative reference [3].

4940    - Changed "undefined" in all the tables to "reserved".

4942    - Changed "MIME" to "media type" throughout the document.