Audio/Video Transport Payloads WG                            Y.-K. Wang
Internet Draft                                      Huawei Technologies
Intended status: Standards track                             T. Schierl
Expires: September 2011                                  Fraunhofer HHI
                                                         March 14, 2011


                    RTP Payload Format for MVC Video
                   draft-ietf-payload-rtp-mvc-00.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 14, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Abstract

   This memo describes an RTP payload format for the multiview
   extension of the ITU-T Recommendation H.264 video codec that is
   technically identical to ISO/IEC International Standard 14496-10.
   The RTP payload format allows for packetization of one or more
   Network Abstraction Layer (NAL) units, produced by the video encoder,
   in each RTP payload. The payload format can be applied in RTP-based
   3D video transmissions such as 3D video streaming, free-viewpoint
   video, and 3DTV.

Table of Contents

   1. Introduction
   2. Conventions
   3. The MVC Codec
      3.1. Overview
      3.2. Parameter Set Concept
      3.3. Network Abstraction Layer Unit Header
   4. Scope
   5. Definitions and Abbreviations
      5.1. Definitions
         5.1.1. Definitions per MVC specification
         5.1.2. Definitions local to this memo
      5.2. Abbreviations
   6. MVC RTP Payload Format
      6.1. Design Principles
      6.2. RTP Header Usage
      6.3. Common Structure of the RTP Payload Format
      6.4. NAL Unit Header Usage
      6.5. Packetization Modes
         6.5.1. Packetization Modes for single-session transmission
         6.5.2. Packetization Modes for multi-session transmission
      6.6. Aggregation Packets
      6.7. Fragmentation Units (FUs)
      6.8. Payload Content Scalability Information (PACSI) NAL Unit for
           MVC
      6.9. Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs)
      6.10. Cross-Session DON (CS-DON) for multi-session transmission
   7. Packetization Rules
   8. De-Packetization Process (Informative)
   9. Payload Format Parameters
      9.1. Media Type Registration
      9.2. SDP Parameters
         9.2.1. Mapping of Payload Type Parameters to SDP
         9.2.2. Usage with the SDP Offer/Answer Model
         9.2.3. Usage with multi-session transmission
         9.2.4. Usage in Declarative Session Descriptions
      9.3. Examples
      9.4. Parameter Set Considerations
   10. Security Considerations
   11. Congestion Control
   12. IANA Considerations
   13. Acknowledgments
   14. References
      14.1. Normative References
      14.2. Informative References
   Author's Addresses
   15. Open issues:
   16. Changes Log

1. Introduction

   This memo specifies an RTP [RFC3550] payload format for a
   forthcoming new mode of the H.264/AVC video coding standard, known
   as Multiview Video Coding (MVC). Formally, MVC will take the form
   of Amendment 4 to ISO/IEC 14496 Part 10 [MPEG4-10], and Annex H of
   ITU-T Rec. H.264 [H.264]. The latest draft specification of MVC is
   available in [MVC].

   MVC covers a wide range of 3D video applications, including 3D video
   streaming, free-viewpoint video, and 3DTV.

   This memo follows a backward-compatible enhancement philosophy by
   keeping as close an alignment to the H.264/AVC payload format
   [RFC3984] as possible. It documents the enhancements relevant from
   an RTP transport viewpoint, and defines signaling support for MVC,
   including a new media subtype name.

   Due to the similarity between MVC and SVC in system and transport
   aspects, this memo reuses the design principles as well as many
   features of the SVC RTP payload draft [I-D.draft-ietf-avt-rtp-svc].

   [Ed.Note(TS): Need text on session multiplexing and on the relation
   of this draft to [I-D.draft-ietf-avt-rtp-svc] here.]

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

   This specification uses the notion of setting and clearing a bit
   when bit fields are handled. Setting a bit is the same as assigning
   that bit the value of 1 (On). Clearing a bit is the same as
   assigning that bit the value of 0 (Off).

3. The MVC Codec

3.1. Overview

   MVC provides multi-view video bitstreams. An MVC bitstream contains
   a base view conforming to at least one of the profiles of H.264/AVC
   as defined in Annex A of [H.264], and one or more non-base views.
   To enable high compression efficiency, coding of a non-base view can
   utilize other views for inter-view prediction, so its decoding
   relies on the presence of the views it depends on. Each coded view
   itself may be temporally scalable. Besides temporal scalability,
   MVC also supports view scalability, wherein a subset of the encoded
   views can be extracted, decoded, and displayed whenever the
   application desires.

   The concept of video coding layer (VCL) and network abstraction
   layer (NAL) is inherited from H.264/AVC. The VCL contains the
   signal processing functionality of the codec: mechanisms such as
   transform, quantization, motion-compensated prediction, loop
   filtering, and inter-view prediction. The Network Abstraction Layer
   (NAL) encapsulates each slice generated by the VCL into one or more
   Network Abstraction Layer Units (NAL units). Please consult RFC
   3984 for a more in-depth discussion of the NAL unit concept. MVC
   specifies the decoding order of NAL units.

   In MVC, one access unit contains all NAL units pertaining to one
   output time instance for all the views. Within one access unit, the
   coded representation of each view, also called a view component,
   consists of one or more slices.

   The concept of temporal scalability is not newly introduced by SVC
   or MVC, as profiles defined in Annex A of [H.264] already support it.
   In [H.264], sub-sequences have been introduced in order to allow
   optional use of temporal layers. SVC extended this approach by
   advertising the temporal scalability information within the NAL unit
   header or prefix NAL units, both of which were inherited by MVC.

3.2. Parameter Set Concept

   The parameter set concept was first specified in [H.264]. Please
   refer to section 1.2 of [RFC3984] for more details. SVC introduced
   some new parameter set mechanisms. MVC has inherited the parameter
   set concept from [H.264].

   In particular, non-base views use a different type of sequence
   parameter set (SPS), referred to as the subset SPS, which has a
   different NAL unit type than "the old SPS" specified in [H.264].
   The base view still uses "the old SPS". Slices from different
   views can use either 1) the same sequence or picture parameter set,
   or 2) different sequence or picture parameter sets.

   The inter-view dependency and the decoding order of all the encoded
   views are indicated in a new syntax structure, the SPS MVC extension,
   included in each subset SPS.

3.3. Network Abstraction Layer Unit Header

   An MVC NAL unit of type 20 or 14 consists of a header of four octets
   and the payload byte string. MVC NAL units of type 20 are coded
   slices of non-base views. A special type of MVC NAL unit is the
   prefix NAL unit (type 14) that includes descriptive information of
   the associated H.264/AVC VCL NAL unit (type 1 or 5) that immediately
   follows the prefix NAL unit.

   MVC extends the one-byte H.264/AVC NAL unit header by three
   additional octets. The header indicates the type of the NAL unit,
   the (potential) presence of bit errors or syntax violations in the
   NAL unit payload, information regarding the relative importance of
   the NAL unit for the decoding process, the view identification
   information, the temporal layer identification information, and
   other fields as discussed below.

   The syntax and semantics of the NAL unit header are formally
   specified in [MVC], but the essential properties of the NAL unit
   header are summarized below.

   The first byte of the NAL unit header has the following format (the
   bit fields are the same as defined for the one-byte H.264/AVC NAL
   unit header, while the semantics of some fields have changed
   slightly, in a backward compatible way):

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |F|NRI|  Type   |
      +---------------+

   F: 1 bit

   forbidden_zero_bit. H.264/AVC declares a value of 1 as a syntax
   violation.

   NRI: 2 bits

   nal_ref_idc. A value of 00 indicates that the content of the NAL
   unit is not used to reconstruct reference pictures for future
   prediction. Such NAL units can be discarded without risking the
   integrity of the reference pictures in the same view.
   A value higher than 00 indicates that the decoding of the NAL unit
   is required to maintain the integrity of reference pictures in the
   same view, or that the NAL unit contains parameter sets.

   Type: 5 bits

   nal_unit_type. This component specifies the NAL unit type.

   In H.264/AVC, NAL unit types 14 and 20 are reserved for future
   extensions. MVC uses these two NAL unit types. NAL unit type 14 is
   used for the prefix NAL unit, and NAL unit type 20 is used for a
   coded slice of a non-base view. NAL unit types 14 and 20 indicate
   the presence of three additional octets in the NAL unit header, as
   shown below.

      +---------------+---------------+---------------+
      |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|I|   PRID    |        VID        | TID |A|V|O|
      +---------------+---------------+---------------+

   S: 1 bit

   svc_extension_flag. MUST be equal to 0 in MVC context. In the
   context of Scalable Video Coding (SVC), the flag must be equal to 1.

   I: 1 bit

   non_idr_flag. This component specifies whether the access unit the
   NAL unit belongs to is an IDR access unit (when equal to 0) or not
   (when equal to 1), as specified in [MVC].

   PRID: 6 bits

   priority_id. This component specifies a priority identifier for the
   NAL unit. A lower value of PRID indicates a higher priority.

   VID: 10 bits

   view_id. This component specifies the view identifier of the view
   the NAL unit belongs to.

   TID: 3 bits

   temporal_id. This component specifies the temporal layer (or frame
   rate) hierarchy. Informally put, a temporal layer consisting of
   view components with a lower temporal_id corresponds to a lower
   frame rate. A given temporal layer typically depends on the lower
   temporal layers (i.e., the temporal layers with lower temporal_id
   values) but never depends on any higher temporal layer (i.e., a
   temporal layer with a higher temporal_id value).

   A: 1 bit

   anchor_pic_flag. This component specifies whether the access unit
   the NAL unit belongs to is an anchor access unit (when equal to 1)
   or not (when equal to 0), as specified in [MVC].

   V: 1 bit

   inter_view_flag. This component specifies whether the view
   component is used for inter-view prediction (when equal to 1) or not
   (when equal to 0).

   O: 1 bit

   reserved_one_bit. Reserved bit for future extension. O shall be
   equal to 1. Receivers SHOULD ignore the value of reserved_one_bit.

   This memo reuses the same additional NAL unit types introduced in
   RFC 3984, which are presented in section 6.3. In addition, this
   memo introduces one more NAL unit type, 30, as specified in section
   6.8. These NAL unit types are marked as unspecified in [MVC] and
   intentionally reserved for use in systems specifications like this
   memo. Moreover, this specification extends the semantics of F, NRI,
   PRID, TID, A, and I as described in section 6.4.

4. Scope

   This payload specification can only be used to carry the "naked" NAL
   unit stream over RTP, and not the byte stream format according to
   Annex B of [MVC]. The likely applications of this specification
   are in IP-based multimedia communications, including 3D video
   streaming over IP, free-viewpoint video over IP, and 3DTV over IP.

   This specification allows a given RTP packet stream to encapsulate
   NAL units belonging to

   o the base view only, as specified in detail in [RFC3984], or

   o one or more non-base views, or

   o the base view and one or more non-base views.

   [Ed.Note(YkW): To be extended to allow separate carriage of
   different temporal layers in different RTP packet streams as in
   [I-D.draft-ietf-avt-rtp-svc].]

5. Definitions and Abbreviations

5.1. Definitions

5.1.1. Definitions per MVC specification

   This document uses the definitions of [MVC].
   The following terms, defined in [MVC], are summarized for
   convenience:

   access unit: A set of NAL units always containing exactly one
   primary coded picture with one or more view components. In addition
   to the primary coded picture, an access unit may also contain one or
   more redundant coded pictures, one auxiliary coded picture, or other
   NAL units not containing slices or slice data partitions of a coded
   picture. The decoding of an access unit always results in one
   decoded picture. All slices or slice data partitions in an access
   unit have the same value of picture order count.

   prefix NAL unit: A NAL unit with nal_unit_type equal to 14 that
   immediately precedes a NAL unit with nal_unit_type equal to 1, 5,
   or 12. The NAL unit that succeeds the prefix NAL unit is also
   referred to as the associated NAL unit. The prefix NAL unit
   contains data associated with the associated NAL unit, which are
   considered to be part of the associated NAL unit.

5.1.2. Definitions local to this memo

   MVC NAL unit: A NAL unit of NAL unit type 14 or 20 as specified in
   Annex H of [MVC]. An MVC NAL unit has a four-byte NAL unit header.

   operation point: An operation point of an MVC bitstream represents
   a certain level of temporal and view scalability. An operation
   point contains only those NAL units required for a valid bitstream
   to represent a certain subset of views at a certain temporal level.
   An operation point is described by the view_id values of the subset
   of views, and the highest temporal_id.

   multi-session transmission: The transmission mode in which the MVC
   bitstream is transmitted over multiple RTP sessions, with each
   stream having the same SSRC. These multiple RTP streams can be
   associated using the RTCP CNAME, or explicit signaling of the SSRC
   used. Dependency between RTP sessions MUST be signaled according to
   [RFC5583] and this memo.

   single-session transmission: The transmission mode in which the MVC
   bitstream is transmitted over a single RTP session, with a single
   SSRC and separate timestamp and sequence number spaces.

   [Ed.Note(TS): Need more definitions here.]

5.2. Abbreviations

   In addition to the abbreviations defined in [RFC3984], the following
   ones are defined.

   MVC:     Multiview Video Coding
   CS-DON:  Cross-Session Decoding Order Number
   MST:     multi-session transmission
   PACSI:   Payload Content Scalability Information
   SST:     single-session transmission

6. MVC RTP Payload Format

6.1. Design Principles

   The following design principles have been observed:

   o Backward compatibility with [RFC3984] wherever possible.

   o As the MVC base view is H.264/AVC compatible, the base view or any
     H.264/AVC compatible subset of it, when transmitted in its own RTP
     packet stream, MUST be encapsulated using [RFC3984]. Requiring
     this has the desirable side effect that the transmitted data can
     be received by [RFC3984] receivers and decoded by H.264/AVC
     decoders.

   o Media-Aware Network Elements (MANEs) as defined in [RFC3984] are
     signaling-aware and rely on signaling information. MANEs have
     state.

   o MANEs can aggregate multiple RTP streams, possibly from multiple
     RTP sessions.

   o MANEs can perform media-aware stream thinning. By using the
     payload header information identifying Layers within an RTP
     session, MANEs are able to remove packets from the incoming RTP
     packet stream. This implies rewriting the RTP headers of the
     outgoing packet stream and rewriting of RTCP Receiver Reports.

6.2. RTP Header Usage

   Please see section 5.1 of [RFC3984].

6.3. Common Structure of the RTP Payload Format

   Please see section 5.2 of [RFC3984].

6.4. NAL Unit Header Usage

   The structure and semantics of the NAL unit header were introduced
   in section 3.3.
   This section specifies the semantics of F, NRI, PRID, TID, A, and I
   according to this specification.

   Note that, in the context of this section, "protecting a NAL unit"
   means any RTP or network transport mechanism that could improve the
   probability of successful delivery of the packet conveying the NAL
   unit, including applying a QoS-enabled network, forward error
   correction (FEC), retransmissions, and advanced scheduling behavior,
   whenever possible.

   The semantics of F specified in section 5.3 of [RFC3984] also
   apply herein.

   For NRI, for a bitstream conforming to one of the profiles defined
   in Annex A of [H.264] and transported using [RFC3984], the semantics
   specified in section 5.3 of [RFC3984] are applicable, i.e., NRI also
   indicates the relative importance of NAL units. In the MVC context,
   in addition to the semantics specified in Annex H of [MVC], NRI also
   indicates the relative importance of NAL units within a view. MANEs
   MAY use this information to protect more important NAL units better
   than less important NAL units.
   [Ed.Note(YkW): "MVC context" to be clearly specified.]

   For PRID, the semantics specified in Annex H of [MVC] apply. Note
   that MANEs implementing unequal error protection MAY use this
   information to protect NAL units with smaller PRID values better
   than those with larger PRID values, for example by including only
   the more important NAL units in a forward error correction (FEC)
   protection mechanism. The importance for the decoding process
   decreases as the PRID value increases.

   For TID, in addition to the semantics specified in Annex H of [MVC],
   according to this memo, values of TID indicate the relative
   importance. A lower value of TID indicates a higher importance for
   NAL units within a view. MANEs MAY use this information to protect
   more important NAL units better than less important NAL units.
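
   As an informal illustration of how a MANE might read the fields
   discussed so far, the following Python sketch unpacks the four-octet
   header laid out in section 3.3 and orders NAL units by PRID and then
   TID. The names MvcNalHeader, parse_mvc_nal_header, and
   protection_order are hypothetical, and ranking by PRID before TID
   is an assumption made for this example only; this memo does not
   define a combined ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MvcNalHeader:
    f: int         # forbidden_zero_bit
    nri: int       # nal_ref_idc
    nal_type: int  # nal_unit_type
    s: int         # svc_extension_flag
    i: int         # non_idr_flag
    prid: int      # priority_id
    vid: int       # view_id
    tid: int       # temporal_id
    a: int         # anchor_pic_flag
    v: int         # inter_view_flag
    o: int         # reserved_one_bit


def parse_mvc_nal_header(data: bytes) -> MvcNalHeader:
    """Unpack the four-octet header of a type 14 or type 20 NAL unit."""
    if len(data) < 4:
        raise ValueError("an MVC NAL unit header is four octets long")
    first = data[0]
    nal_type = first & 0x1F
    if nal_type not in (14, 20):
        raise ValueError("only NAL unit types 14 and 20 carry the extension")
    # The three extension octets hold 24 bits: S I PRID VID TID A V O.
    ext = int.from_bytes(data[1:4], "big")
    return MvcNalHeader(
        f=first >> 7,
        nri=(first >> 5) & 0x3,
        nal_type=nal_type,
        s=(ext >> 23) & 0x1,
        i=(ext >> 22) & 0x1,
        prid=(ext >> 16) & 0x3F,
        vid=(ext >> 6) & 0x3FF,
        tid=(ext >> 3) & 0x7,
        a=(ext >> 2) & 0x1,
        v=(ext >> 1) & 0x1,
        o=ext & 0x1,
    )


def protection_order(headers):
    """Most important first: lower PRID, then lower TID (assumed tie-break)."""
    return sorted(headers, key=lambda h: (h.prid, h.tid))
```

   A MANE applying unequal error protection could, for instance, feed
   only the first few entries of protection_order() into its FEC
   mechanism.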

   For A, in addition to the semantics specified in Annex H of [MVC],
   according to this memo, MANEs MAY use this information to protect
   NAL units with A equal to 1 better than NAL units with A equal to 0.
   MANEs MAY also utilize information of NAL units with A equal to 1 to
   decide when to forward more packets for an RTP packet stream. For
   example, when it is sensed that view switching has happened such
   that the operation point has changed, MANEs MAY start to forward NAL
   units for a new target view only after forwarding a NAL unit with A
   equal to 1 for the new target view.

   For I, in addition to the semantics specified in Annex H of [MVC],
   according to this memo, MANEs MAY use this information to protect
   NAL units with I equal to 1 better than NAL units with I equal to 0.
   MANEs MAY also utilize information of NAL units with I equal to 1 to
   decide when to forward more packets for an RTP packet stream. For
   example, when it is sensed that view switching has happened such
   that the operation point has changed, MANEs MAY start to forward NAL
   units for a new target view only after forwarding a NAL unit with I
   equal to 1 for the new target view.

6.5. Packetization Modes

   [Ed.Note(TS): Need to add text from [I-D.draft-ietf-avt-rtp-svc] to
   this section with respect to MVC.]

6.5.1. Packetization Modes for single-session transmission

   This section will address the issues of sections 4.5.1 and 5.1 of
   [I-D.draft-ietf-avt-rtp-svc].

6.5.2. Packetization Modes for multi-session transmission

   This section will address the issues of sections 4.5.2 and 5.2 of
   [I-D.draft-ietf-avt-rtp-svc].

6.6. Aggregation Packets

   This section will address the issues of section 4.7 of
   [I-D.draft-ietf-avt-rtp-svc].

6.7. Fragmentation Units (FUs)

   This section will address the issues of section 4.8 of
   [I-D.draft-ietf-avt-rtp-svc].

6.8. Payload Content Scalability Information (PACSI) NAL Unit for MVC

   A new NAL unit type is specified in this memo, and referred to as
   payload content scalability information (PACSI) NAL unit. The PACSI
   NAL unit, if present, MUST be the first NAL unit in an aggregation
   packet, and it MUST NOT be present in other types of packets. The
   PACSI NAL unit indicates view and temporal scalability information
   and other characteristics that are common for all the remaining NAL
   units in the payload of the aggregation packet. Furthermore, a PACSI
   NAL unit MAY include a DONC field and contain zero or more SEI NAL
   units. The PACSI NAL unit makes it easier for MANEs to decide
   whether to forward, process, or discard the aggregation packet
   containing the PACSI NAL unit. Senders MAY create PACSI NAL units
   and receivers MAY ignore them, or use them as hints to enable
   efficient aggregation packet processing. Note that the NAL unit
   type for the PACSI NAL unit is selected among those values that are
   unspecified in [MVC] and [RFC3984].

   When the first aggregation unit of an aggregation packet contains a
   PACSI NAL unit, there MUST be at least one additional aggregation
   unit present in the same packet. The RTP header and payload header
   fields of the aggregation packet are set according to the remaining
   NAL units in the aggregation packet.

   When a PACSI NAL unit is included in a multi-time aggregation packet
   (MTAP), the decoding order number (DON) for the PACSI NAL unit MUST
   be set to indicate that the PACSI NAL unit has an identical DON to
   the first NAL unit in decoding order among the remaining NAL units
   in the aggregation packet.

   The structure of a PACSI NAL unit is as follows. The first four
   octets are exactly the same as the four-byte MVC NAL unit header as
   discussed in section 3.3.
   They are followed by two always-present octets, two optional
   octets, and zero or more SEI NAL units, each SEI NAL unit preceded
   by a 16-bit unsigned size field (in network byte order) that
   indicates the size of the following NAL unit in bytes (excluding
   these two octets, but including the NAL unit type octet of the SEI
   NAL unit). Figure 1 illustrates the PACSI NAL unit structure and an
   example of a PACSI NAL unit containing two SEI NAL units.

   The bits P, C, S, and E are specified only if the bit X is equal to
   1. The T bit MUST NOT be equal to 1 if the aggregation packet
   containing the PACSI NAL unit is not an STAP-A packet. The T bit
   MAY be equal to 1 if the aggregation packet containing the PACSI NAL
   unit is an STAP-A packet. The field DONC MUST NOT be present if the
   T bit is equal to 0, and MUST be present if the T bit is equal to 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|  Type   |S|   PRID    | TID |A|        VID        |I|V|R|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |X|T|RR |P|C|S|E|      RRR      |        DONC (optional)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        NAL unit size 1        |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         SEI NAL unit 1        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        NAL unit size 2        |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         SEI NAL unit 2        |
   |                                                               |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 1. PACSI NAL unit structure

   The values of the fields in PACSI NAL unit MUST be set as follows.
   The term "target NAL units" is used in the semantics of some fields.
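
   Before turning to the per-field rules, the layout in Figure 1 can
   be illustrated with a short parser. The following Python sketch
   skips the four header octets, decodes the X, T, P, C, S, and E bits,
   reads DONC only when T is 1, and then walks the size-prefixed SEI
   NAL units. The function name parse_pacsi and the returned
   dictionary are hypothetical, and the sketch assumes that its input
   holds exactly one complete PACSI NAL unit.

```python
import struct


def parse_pacsi(nal: bytes) -> dict:
    """Split a PACSI NAL unit (Figure 1) into flags, DONC, and SEI NAL units."""
    if len(nal) < 6:
        raise ValueError("need 4 header octets plus 2 always-present octets")
    if nal[0] & 0x1F != 30:
        raise ValueError("PACSI NAL units use NAL unit type 30")

    flags = nal[4]  # X T RR P C S E; nal[5] is the reserved RRR field
    result = {
        "X": flags >> 7,
        "T": (flags >> 6) & 1,
        "DONC": None,
        "sei": [],
    }
    if result["X"]:
        # P, C, S, and E are only specified when X is 1.
        result.update(P=(flags >> 3) & 1, C=(flags >> 2) & 1,
                      S=(flags >> 1) & 1, E=flags & 1)

    pos = 6
    if result["T"]:
        # DONC is present if and only if T is 1.
        if len(nal) < pos + 2:
            raise ValueError("T is 1 but DONC is missing")
        (result["DONC"],) = struct.unpack_from("!H", nal, pos)
        pos += 2

    # Each SEI NAL unit is preceded by a 16-bit size in network byte order.
    while pos < len(nal):
        if len(nal) < pos + 2:
            raise ValueError("truncated SEI size field")
        (size,) = struct.unpack_from("!H", nal, pos)
        pos += 2
        if len(nal) < pos + size:
            raise ValueError("truncated SEI NAL unit")
        result["sei"].append(nal[pos:pos + size])
        pos += size
    return result
```

   A receiver that ignores PACSI NAL units, as permitted in this
   section, would not need any of this; the sketch only shows how the
   optional DONC field shifts the position of the first SEI size
   field.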

   The target NAL units are those NAL units that are contained in the
   aggregation packet, but not included in the PACSI NAL unit, and
   that are within the access unit to which the first NAL unit
   following the PACSI NAL unit in the aggregation packet belongs.

   o The F bit MUST be set to 1 if the F bit in at least one of the
     remaining NAL units in the aggregation packet is equal to 1.
     Otherwise, the F bit MUST be set to 0.

   o The NRI field MUST be set to the highest value of NRI field among
     all the remaining NAL units in the aggregation packet.

   o The Type field MUST be set to 30.

   o The S bit MUST be set to 1.

   o The PRID field MUST be set to the lowest value of the PRID values
     of all the remaining NAL units in the aggregation packet.

   o The TID field MUST be set to the lowest value of the TID values of
     all the remaining NAL units with the lowest value of VID in the
     aggregation packet.

   o The A bit MUST be set to 1 if the A bit of at least one of the
     remaining NAL units in the aggregation packet is equal to 1.
     Otherwise, the A bit MUST be set to 0.

   o The VID field MUST be set to the lowest value of the VID values of
     all the remaining NAL units in the aggregation packet.

   o The I bit MUST be set to 1 if the I bit of at least one of the
     remaining NAL units in the aggregation packet is equal to 1.
     Otherwise, the I bit MUST be set to 0.

   o The V bit MUST be set to 1 if the V bit of at least one of the
     remaining NAL units in the aggregation packet is equal to 1.
     Otherwise, the V bit MUST be set to 0.

   o The R bit MUST be set to 0. Receivers SHOULD ignore the value of
     R.

   o If the X bit is equal to 1, the bits P, C, S, and E are specified
     as below. Otherwise, the bits P, C, S, and E are unspecified, and
     receivers MUST ignore these bits. The X bit SHOULD be identical
     for all the PACSI NAL units involved in all the RTP sessions
     conveying an MVC bitstream.
641 o The RR field MUST be set to '00' (in binary form). Receivers 642 SHOULD ignore the value of RR. 644 o If the T bit is equal to 1, the OPTIONAL field DONC MUST be 645 present and specified as below. Otherwise, the field DONC MUST NOT 646 be present. 648 o The P bit MUST be set to 1 if all the remaining NAL units in the 649 aggregation packet have redundant_pic_cnt greater than 0, i.e., 650 the slices are redundant slices. Otherwise, the P bit MUST be set 651 to 0. 653 Informative note: The P bit indicates whether the packet can be 654 discarded because it contains only redundant slice NAL units. 655 Without this bit, the corresponding information would have to be derived 656 from the syntax element redundant_pic_cnt, which is buried in the 657 variable-length coded slice header. 659 o The C bit MUST be set to 1 if the target NAL units belong to an 660 access unit for which the view components are intra coded. 661 Otherwise, the C bit MUST be set to 0. The C bit SHOULD be 662 identical for all the PACSI NAL units for which the target NAL units 663 belong to the same access unit. 665 Informative note: The C bit indicates whether the packet contains 666 intra slices, which may be the only packets to be forwarded for 667 fast-forward playback, e.g., when the network condition is 668 extremely bad. 670 o The S bit MUST be set to 1 if the first VCL NAL unit, in 671 transmission order, of the view component containing the first NAL 672 unit following the PACSI NAL unit in the aggregation packet is 673 present in the aggregation packet. Otherwise, the S bit MUST be set 674 to 0. 676 o The E bit MUST be set to 1 if the last VCL NAL unit, in 677 transmission order, of the view component containing the first NAL 678 unit following the PACSI NAL unit in the aggregation packet is 679 present in the aggregation packet. Otherwise, the E bit MUST be 680 set to 0.
682 Informative note: The S or E bit indicates whether the first or 683 last slice, in transmission order, of a view component is in the 684 packet, to enable a MANE to detect slice loss and take proper 685 action such as requesting a retransmission as soon as possible, 686 as well as to allow efficient playout buffer handling, 687 similar to the use of the M bit in the RTP header. The M bit in the RTP 688 header still indicates the end of an access unit, not the end of 689 a view component. 691 o The RRR field MUST be set to '00000000' (in binary form). 692 Receivers SHOULD ignore the value of RRR. 694 o When present, the field DONC indicates the CL-DON value for the 695 first NAL unit in the STAP-A in transmission order. 697 SEI NAL units included in the PACSI NAL unit, if any, MUST contain a 698 subset of the SEI messages associated with the access unit of the 699 first NAL unit following the PACSI NAL unit within the aggregation 700 packet. 702 Informative note: Senders may repeat in the 703 PACSI NAL unit those SEI NAL units whose presence in more than one packet is 704 essential for packet loss robustness. Receivers may use the 705 repeated SEI messages in place of missing SEI messages. 707 An SEI message SHOULD NOT be included both in a PACSI NAL unit and 708 in one of the remaining NAL units contained in the same 709 aggregation packet. 711 6.9. Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs) 713 This section will address the issues of section 4.7.1 of [I-D.draft- 714 ietf-avt-rtp-svc]. 716 6.10. Cross-Session DON (CS-DON) for multi-session transmission 718 This section will address the issues of section 4.11 of [I-D.draft- 719 ietf-avt-rtp-svc]. 721 7. Packetization Rules 723 [Ed.Note(TS): We need to adjust this section with respect to [I- 724 D.draft-ietf-avt-rtp-svc].] 726 Section 6 of [RFC3984] applies. The following rules apply in 727 addition.
729 All receivers MUST support the single NAL unit packetization mode to 730 provide backward compatibility to endpoints supporting only the 731 single NAL unit mode of RFC 3984. However, the single NAL unit 732 packetization mode SHOULD NOT be used if it can be avoided, since 733 encapsulating small NAL units, e.g., NAL units 734 containing parameter sets, SEI messages, or prefix NAL units, in 735 their own packets is typically less efficient because of the 736 relatively large overhead. 738 All receivers MUST support the non-interleaved packetization mode. 740 Informative note: The non-interleaved mode allows an application 741 to encapsulate a single NAL unit in a single RTP packet. 742 Historically, the single NAL unit mode was included in 743 [RFC3984] only for compatibility with ITU-T Rec. H.241 Annex A 744 [H.241]. There is no point in carrying this historic ballast 745 towards a new application space such as the one provided with MVC. 746 More technically speaking, the implementation complexity increase 747 for providing the additional mechanisms of the non-interleaved 748 mode (namely STAP-A and FU-A) is minor, and the benefits are 749 so great that STAP-A implementation is required. 751 A NAL unit of small size SHOULD be encapsulated in an aggregation 752 packet together with one or more other NAL units. For example, non- 753 VCL NAL units such as access unit delimiters, parameter sets, or SEI 754 NAL units are typically small. 756 A prefix NAL unit SHOULD be aggregated into the same packet as the 757 associated NAL unit following the prefix NAL unit in decoding order. 759 When the first aggregation unit of an aggregation packet contains a 760 PACSI NAL unit, there MUST be at least one additional aggregation 761 unit present in the same packet. 763 When an MVC bitstream is transported in more than one RTP session, 764 the following applies. 766 o Interleaved mode SHOULD be used for all the RTP sessions.
768 o An RTP session that does not use interleaved mode SHOULD be 769 constrained as follows. 771 - Non-interleaved mode MUST be used. 773 - STAP-A MUST be used, and any other type of packet MUST NOT be 774 used. 776 - Each STAP-A MUST contain a PACSI NAL unit and the DONC field 777 MUST be present in the PACSI NAL unit. 779 Informative note: The motivation for these constraints is to 780 allow the use of non-interleaved mode for the session conveying 781 the H.264/AVC compatible view, such that RFC 3984 receivers 782 without interleaved mode implementation can subscribe to the base 783 view session. 785 Non-VCL NAL units SHOULD be conveyed in the same session as the 786 associated VCL NAL units. To meet this, SEI messages that are 787 contained in a scalable nesting SEI message and are applicable to more 788 than one session SHOULD be separated and placed into multiple 789 scalable nesting SEI messages. The DON values MUST indicate the 790 cross-layer decoding order number values as if all these SEI 791 messages were in separate scalable nesting SEI messages and 792 contained at the beginning of the corresponding access units as 793 specified in [MVC]. 795 8. De-Packetization Process (Informative) 797 For a single RTP session, the de-packetization process specified in 798 section 7 of [RFC3984] applies. 800 For receiving more than one of multiple RTP sessions conveying a 801 scalable bitstream, an example of a suitable implementation of the 802 de-packetization process will be specified similarly to what is 803 finally included in [I-D.draft-ietf-avt-rtp-svc]. 805 9. Payload Format Parameters 807 This section specifies the parameters that MAY be used to select 808 optional features of the payload format and certain features of the 809 bitstream. The parameters are specified here as part of the media 810 type registration for the MVC codec.
A mapping of the parameters 811 into the Session Description Protocol (SDP) [RFC4566] is also 812 provided for applications that use SDP. Equivalent parameters could 813 be defined elsewhere for use with control protocols that do not use 814 SDP. 816 9.1. Media Type Registration 818 The media subtype for the MVC codec is allocated from the IETF tree. 820 The receiver MUST ignore any unspecified parameter. 822 Informative note: Requiring receivers to ignore unspecified parameters allows 823 for backward compatibility of future extensions. For example, if 824 a future specification that is backward compatible with this 825 specification specifies some new parameters, then a receiver 826 conforming to this specification is capable of receiving data per 827 the new payload format while ignoring those parameters newly specified in 828 the new payload specification. The same requirement is also present in 829 RFC 3984. 831 Media Type name: video 833 Media subtype name: H264-MVC 835 The media subtype "H264" MUST be used for RTP streams using RFC 3984, 836 i.e., not using any of the new features introduced by this 837 specification compared to RFC 3984. For RTP streams using any of 838 the new features introduced by this specification compared to RFC 839 3984, the media subtype "H264-MVC" SHOULD be used, and the media 840 subtype "H264" MAY be used. Use of the media subtype "H264" for RTP 841 streams using the new features allows RFC 3984 receivers to 842 negotiate and receive H.264/AVC or MVC streams packetized according 843 to this specification, but to ignore media parameters and NAL unit 844 types they do not recognize. 846 Required parameters: none 848 OPTIONAL parameters: to be specified. 850 Encoding considerations: 852 This type is only defined for transfer via RTP (RFC 3550). 854 Security considerations: 856 See section 10 of RFC XXXX. 858 Public specification: 860 Please refer to RFC XXXX and its section 14.
862 Additional information: none 864 File extensions: none 866 Macintosh file type code: none 868 Object identifier or OID: none 870 Person & email address to contact for further information: 872 Intended usage: COMMON 874 Author: NN 875 Change controller: 877 IETF Audio/Video Transport working group delegated from the IESG. 879 9.2. SDP Parameters 881 9.2.1. Mapping of Payload Type Parameters to SDP 883 The media type video/H264-MVC string is mapped to fields in the 884 Session Description Protocol (SDP) as follows: 886 The media name in the "m=" line of SDP MUST be video. 888 The encoding name in the "a=rtpmap" line of SDP MUST be H264-MVC 889 (the media subtype). 891 The clock rate in the "a=rtpmap" line MUST be 90000. 893 The OPTIONAL parameters, when present, MUST be included in the 894 "a=fmtp" line of SDP. These parameters are expressed as a media 895 type string, in the form of a semicolon separated list of 896 parameter=value pairs. 898 9.2.2. Usage with the SDP Offer/Answer Model 900 TBD. 902 9.2.3. Usage with multi-session transmission 904 If multi-session transmission is used, the rules on signaling media 905 decoding dependency in SDP as defined in 906 [RFC5583] apply. 908 9.2.4. Usage in Declarative Session Descriptions 910 TBD. 912 9.3. Examples 914 TBD. 916 9.4. Parameter Set Considerations 918 Please see section 10 of [RFC3984]. 920 10. Security Considerations 922 Please see section 11 of [RFC3984]. 924 11. Congestion Control 926 TBD. 928 12. IANA Considerations 930 Request for media type registration to be added. 932 13. Acknowledgments 934 The work of Thomas Schierl has been supported by the European 935 Commission under contract number FP7-ICT-248036, project COAST. 937 This document was prepared using 2-Word-v2.0.template.dot. 939 14. References 941 14.1. Normative References 943 [H.264] ITU-T Recommendation H.264, "Advanced video coding for 944 generic audiovisual services", March 2010. 946 [I-D.draft-ietf-avt-rtp-svc] Wenger, S., Wang, Y. 
-K., Schierl, T. 947 and A. Eleftheriadis, "RTP payload format for SVC video", 948 draft-ietf-avt-rtp-svc-27 (work in progress), February 949 2011. 951 [RFC5583] Schierl, T., and Wenger, S., "Signaling media decoding 952 dependency in the Session Description Protocol (SDP)", RFC 953 5583, July 2009. 955 [MPEG4-10] 956 ISO/IEC International Standard 14496-10:2005. 958 [MVC] Annex H of ITU-T Recommendation H.264, "Advanced video 959 coding for generic audiovisual services", March 2010. 961 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 962 Requirement Levels", BCP 14, RFC 2119, March 1997. 964 [RFC3548] Josefsson, S., "The Base16, Base32, and Base64 Data 965 Encodings", RFC 3548, July 2003. 967 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, 968 V., "RTP: A Transport Protocol for Real-Time Applications", 969 STD 64, RFC 3550, July 2003. 971 [RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund, 972 M., and Singer, D., "RTP Payload Format for H.264 Video", 973 RFC 3984, February 2005. 975 [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session 976 Description Protocol", RFC 4566, July 2006. 978 14.2. Informative References 980 [DVB-H] DVB - Digital Video Broadcasting (DVB); DVB-H 981 Implementation Guidelines, ETSI TR 102 377, 2005. 983 [H.241] ITU-T Rec. H.241, "Extended video procedures and control 984 signals for H.300-series terminals", May 2006. 986 [IGMP] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and 987 Thyagarajan, A., "Internet Group Management Protocol, 988 Version 3", RFC 3376, October 2002. 990 [McCanne] McCanne, S., Jacobson, V., and Vetterli, M., "Receiver- 991 driven layered multicast", in Proc. of ACM SIGCOMM'96, 992 pages 117--130, Stanford, CA, August 1996. 994 [MBMS] 3GPP - Technical Specification Group Services and System 995 Aspects; Multimedia Broadcast/Multicast Service (MBMS); 996 Protocols and codecs (Release 6), December 2005.
998 [MPEG2] ISO/IEC International Standard 13818-2:1993. 1000 [RFC3450] Luby, M., Gemmell, J., Vicisano, L., Rizzo, L., and 1001 Crowcroft, J., "Asynchronous layered coding (ALC) protocol 1002 instantiation", RFC 3450, December 2002. 1004 Authors' Addresses 1006 Ye-Kui Wang 1007 Huawei Technologies 1008 400 Crossing Blvd, 2nd Floor 1009 Bridgewater, NJ 08807 1010 USA 1011 Phone: +1-908-541-3518 1012 EMail: yekui.wang@huawei.com 1013 Thomas Schierl 1014 Fraunhofer HHI 1015 Einsteinufer 37 1016 D-10587 Berlin 1017 Germany 1018 Phone: +49-30-31002-227 1019 EMail: ts@thomas-schierl.de 1021 15. Open issues: 1023 - The use of CL-DON for session reordering also allows for 1024 interleaved transmission with non-interleaved packetization mode. 1025 There should be a clear separation between both tools. This issue 1026 should be handled the same way as for the SVC payload draft. 1028 - Since SVC session multiplexing (multi-session transmission (MST)) has been 1029 resolved, it would be preferable to just reference the MST sections in 1030 [I-D.draft-ietf-avt-rtp-svc]. Since the text in sections 6 and 7 1031 of [I-D.draft-ietf-avt-rtp-svc] is currently very SVC specific, 1032 the authors would have to try to rewrite these sections in a more 1033 generic way. If this is not possible, we need to copy text from 1034 [I-D.draft-ietf-avt-rtp-svc] with respect to MVC. 1036 16. Changes Log 1038 Initial version 00 1040 10 November 2007: YkW 1041 Initial version 1043 12 November 2007: TS 1044 - Added definition of "Session multiplexing" 1045 - Added the reference of [I-D.draft-ietf-mmusic-decoding- 1046 dependency], and its reference in section 9.2.3 1048 12 November 2007: YkW 1049 - Added the reference of [I-D.draft-ietf-avt-svc] and its 1050 reference in section 1.
1051 - Added in sections 3.1 and 3.2 paragraphs regarding inter- 1052 view prediction 1054 From draft-wang-avt-rtp-mvc-00 to draft-wang-avt-rtp-mvc-01 1055 18 February 2008: YkW 1056 - Alignment to the latest MVC draft in JVT-Z209 and version 07 1057 of [I-D.draft-ietf-avt-svc]. 1059 25 February 2008: TS 1061 - Minor modifications and updates throughout the document 1063 - Added open issue on clear separation between "decoding order 1064 recovery" and "interleaving" 1066 From draft-wang-avt-rtp-mvc-01 to draft-wang-avt-rtp-mvc-02 1068 09 July 2008: TS 1070 - Minor modifications and updates throughout the document 1072 - Added open issue 1074 - NAL unit header alignment with MVC spec 1076 - Section 6. References corresponding sections in [RFC3984] and [I- 1077 D.draft-ietf-avt-svc]. 1079 - TBD: Section 7, we may align [I-D.draft-ietf-avt-svc] in a way 1080 that SVC is not mentioned in these paragraphs, so that we can 1081 reference them from this document. 1083 21 August 2008: 1085 - Minor modifications, editing and adding notes throughout the 1086 document. 1088 - Updated references 1090 From draft-wang-avt-rtp-mvc-02 to draft-wang-avt-rtp-mvc-03 1092 04 February 2009: YkW 1094 - Updated author's address. 1096 04 February 2009: YkW 1098 - Updated the boiler template. 1100 From draft-wang-avt-rtp-mvc-03 to draft-wang-avt-rtp-mvc-04 1101 22 October 2009: YkW 1103 - Updated author's address and the boiler template (added the last 1104 sentence in Copyright Notice). 1106 From draft-wang-avt-rtp-mvc-04 to draft-wang-avt-rtp-mvc-05 1108 22 April 2010: YkW 1110 - To keep the draft alive, no change other than version number etc. 1112 From draft-wang-avt-rtp-mvc-05 to draft-ietf-avt-rtp-mvc-00 1114 28 April 2010: YkW 1116 - No change other than version number etc. 1118 From draft-ietf-avt-rtp-mvc-00 to draft-ietf-avt-rtp-mvc-01 1120 8/9 October 2010: 1122 - YkW: Updated the NAL unit header syntax and semantics in section 1123 3.3 per the latest MVC specification.
1125 - TS: Minor edits 1127 From draft-ietf-avt-rtp-mvc-01 to draft-ietf-payload-rtp-mvc-00 1129 14 March 2011: YkW 1131 - Minor changes such as updates of some references and of the working group 1132 name from AVT to AVT Payload, etc.
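
Informative note: As a non-normative illustration of the PACSI NAL unit layout in Figure 1, the following Python sketch reads the four-octet NAL unit header, the two always-present octets, the optional DONC field (present only when the T bit is 1), and the size-prefixed SEI NAL units. The names Pacsi and parse_pacsi are hypothetical and are not defined by this specification; the bit positions are taken directly from Figure 1 and would have to be adjusted if the figure changes. The sketch does not enforce the MUST-level value constraints (e.g. Type equal to 30); it only decodes the fields.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pacsi:
    """Hypothetical container for a subset of the PACSI fields of Figure 1."""
    f: int
    nri: int
    nal_type: int
    prid: int
    tid: int
    a: int
    vid: int
    x: int
    t: int
    p: int
    c: int
    s: int
    e: int
    donc: Optional[int] = None
    sei_nal_units: List[bytes] = field(default_factory=list)


def parse_pacsi(buf: bytes) -> Pacsi:
    """Decode a PACSI NAL unit laid out as in Figure 1 (illustrative only)."""
    if len(buf) < 6:
        raise ValueError("PACSI NAL unit needs at least 6 octets")
    b0 = buf[0]                              # F | NRI | Type
    ext = int.from_bytes(buf[1:4], "big")    # S | PRID | TID | A | VID | I | V | R
    flags = buf[4]                           # X | T | RR | P | C | S | E
    pacsi = Pacsi(
        f=b0 >> 7, nri=(b0 >> 5) & 0x3, nal_type=b0 & 0x1F,
        prid=(ext >> 17) & 0x3F, tid=(ext >> 14) & 0x7,
        a=(ext >> 13) & 0x1, vid=(ext >> 3) & 0x3FF,
        x=flags >> 7, t=(flags >> 6) & 1,
        p=(flags >> 3) & 1, c=(flags >> 2) & 1,
        s=(flags >> 1) & 1, e=flags & 1,
    )
    pos = 6                                  # skip the flags octet and the RRR octet
    if pacsi.t:                              # DONC is present only if T == 1
        if len(buf) < pos + 2:
            raise ValueError("T bit set but DONC field is missing")
        (pacsi.donc,) = struct.unpack_from("!H", buf, pos)
        pos += 2
    while pos < len(buf):                    # zero or more size-prefixed SEI NAL units
        if pos + 2 > len(buf):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", buf, pos)  # network byte order
        pos += 2
        if pos + size > len(buf):
            raise ValueError("SEI NAL unit extends past end of PACSI")
        pacsi.sei_nal_units.append(buf[pos:pos + size])
        pos += size
    return pacsi
```

A receiver or MANE could, for example, call parse_pacsi on the first aggregation unit of an STAP-A and inspect the p field to decide whether the packet carries only redundant slices, without parsing any slice header.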