Audio/Video Transport WG                                    Y.-K. Wang
Internet Draft                                     Huawei Technologies
Intended status: Standards track                            T. Schierl
Expires: April 2011                                     Fraunhofer HHI
                                                       October 9, 2010


                    RTP Payload Format for MVC Video
                     draft-ietf-avt-rtp-mvc-01.txt


Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 9, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Abstract

   This memo describes an RTP payload format for the multiview
   extension of the ITU-T Recommendation H.264 video codec that is
   technically identical to ISO/IEC International Standard 14496-10.
   The RTP payload format allows for packetization of one or more
   Network Abstraction Layer (NAL) units, produced by the video encoder,
   in each RTP payload. The payload format can be applied in RTP-based
   3D video transmissions such as 3D video streaming, free-viewpoint
   video, and 3DTV.

Table of Contents

   1. Introduction
   2. Conventions
   3. The MVC Codec
      3.1. Overview
      3.2. Parameter Set Concept
      3.3. Network Abstraction Layer Unit Header
   4. Scope
   5. Definitions and Abbreviations
      5.1. Definitions
         5.1.1. Definitions per MVC specification
         5.1.2. Definitions local to this memo
      5.2. Abbreviations
   6. MVC RTP Payload Format
      6.1. Design Principles
      6.2. RTP Header Usage
      6.3. Common Structure of the RTP Payload Format
      6.4. NAL Unit Header Usage
      6.5. Packetization Modes
         6.5.1. Packetization Modes for single-session transmission
         6.5.2. Packetization Modes for multi-session transmission
      6.6. Aggregation Packets
      6.7. Fragmentation Units (FUs)
      6.8. Payload Content Scalability Information (PACSI) NAL Unit
           for MVC
      6.9. Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs)
      6.10. Cross-Session DON (CS-DON) for multi-session transmission
   7. Packetization Rules
   8. De-Packetization Process (Informative)
   9. Payload Format Parameters
      9.1. Media Type Registration
      9.2. SDP Parameters
         9.2.1. Mapping of Payload Type Parameters to SDP
         9.2.2. Usage with the SDP Offer/Answer Model
         9.2.3. Usage with multi-session transmission
         9.2.4. Usage in Declarative Session Descriptions
      9.3. Examples
      9.4. Parameter Set Considerations
   10. Security Considerations
   11. Congestion Control
   12. IANA Considerations
   13. Acknowledgments
   14. References
      14.1. Normative References
      14.2. Informative References
   Author's Addresses
   15. Open issues
   16. Changes Log

1. Introduction

   This memo specifies an RTP [RFC3550] payload format for a forthcoming
   new mode of the H.264/AVC video coding standard, known as Multiview
   Video Coding (MVC). Formally, MVC will take the form of Amendment 4
   to ISO/IEC 14496 Part 10 [MPEG4-10] and Annex H of ITU-T Rec. H.264
   [H.264]. The latest draft specification of MVC is available in [MVC].

   MVC covers a wide range of 3D video applications, including 3D video
   streaming, free-viewpoint video, and 3DTV.

   This memo follows a backward-compatible enhancement philosophy by
   keeping as close an alignment to the H.264/AVC payload format
   [RFC3984] as possible. It documents the enhancements relevant from
   an RTP transport viewpoint and defines signaling support for MVC,
   including a new media subtype name.

   Due to the similarity between MVC and SVC in system and transport
   aspects, this memo reuses the design principles as well as many
   features of the SVC RTP payload format draft
   [I-D.draft-ietf-avt-rtp-svc].

   [Ed.Note(TS): Need text on session multiplexing and on the relation
   of this draft to [I-D.draft-ietf-avt-rtp-svc] here.]

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

   This specification uses the notion of setting and clearing a bit when
   bit fields are handled. Setting a bit is the same as assigning that
   bit the value of 1 (On). Clearing a bit is the same as assigning
   that bit the value of 0 (Off).

3. The MVC Codec

3.1. Overview

   MVC provides multiview video bitstreams. An MVC bitstream contains
   a base view conforming to at least one of the profiles of H.264/AVC
   as defined in Annex A of [H.264], and one or more non-base views. To
   enable high compression efficiency, coding of a non-base view can
   utilize other views for inter-view prediction; thus, its decoding
   relies on the presence of the views it depends on. Each coded view
   may itself be temporally scalable. Besides temporal scalability, MVC
   also supports view scalability, wherein a subset of the encoded views
   can be extracted, decoded, and displayed whenever desired by the
   application.
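
   Decoding a non-base view relies on the views it references. As a
   result, extracting a subset of views (view scalability) amounts to
   keeping the transitive closure of the inter-view dependencies of the
   target views. The Python fragment below is a minimal sketch of that
   computation. It assumes the dependencies are already available as a
   mapping from each view identifier to the identifiers of the views it
   is predicted from. In a real bitstream this information is carried
   in the SPS MVC extension (section 3.2). The function name and the
   three-view layout are hypothetical.

```python
from typing import Dict, Iterable, Set


def required_views(targets: Iterable[int],
                   deps: Dict[int, Set[int]]) -> Set[int]:
    """Return the target views plus every view they depend on,
    directly or indirectly, through inter-view prediction."""
    needed: Set[int] = set()
    pending = list(targets)
    while pending:
        view = pending.pop()
        if view in needed:
            continue
        needed.add(view)
        # Queue the reference views of this view that are not yet kept.
        pending.extend(deps.get(view, set()) - needed)
    return needed


# Hypothetical three-view layout: view 0 is the base view, view 2 is
# predicted from view 0, and view 1 is predicted from views 0 and 2.
DEPS = {0: set(), 1: {0, 2}, 2: {0}}

print(sorted(required_views([2], DEPS)))  # [0, 2]
print(sorted(required_views([1], DEPS)))  # [0, 1, 2]
```

   In the first call, view 1 is not reached from the target view. It can
   therefore be dropped without affecting the decoding of the extracted
   subset.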

   The concept of the video coding layer (VCL) and the network
   abstraction layer (NAL) is inherited from H.264/AVC. The VCL contains
   the signal processing functionality of the codec: mechanisms such as
   transform, quantization, motion-compensated prediction, loop
   filtering, and inter-view prediction. The NAL encapsulates each slice
   generated by the VCL into one or more NAL units. Please consult
   [RFC3984] for a more in-depth discussion of the NAL unit concept. MVC
   specifies the decoding order of NAL units.

   In MVC, one access unit contains all NAL units pertaining to one
   output time instance for all the views. Within one access unit, the
   coded representation of each view, also referred to as a view
   component, consists of one or more slices.

   The concept of temporal scalability is not newly introduced by SVC or
   MVC, as profiles defined in Annex A of [H.264] already support it.
   In [H.264], sub-sequences have been introduced in order to allow
   optional use of temporal layers. SVC extended this approach by
   advertising the temporal scalability information within the NAL unit
   header or prefix NAL units, both of which were inherited by MVC.

3.2. Parameter Set Concept

   The parameter set concept was first specified in [H.264]. Please
   refer to section 1.2 of [RFC3984] for more details. SVC introduced
   some new parameter set mechanisms. MVC has inherited the parameter
   set concept from [H.264].

   In particular, a different type of sequence parameter set (SPS),
   referred to as subset SPS, is used for non-base views. It uses a
   different NAL unit type than "the old SPS" specified in [H.264],
   while the base view still uses "the old SPS". Slices from different
   views can use either 1) the same sequence or picture parameter set,
   or 2) different sequence or picture parameter sets.
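
   The base view and the non-base views use different parameter set NAL
   unit types. A receiver or MANE can therefore tell from the five-bit
   Type field alone which NAL units matter only to MVC-capable decoders.
   The Python fragment below is a minimal sketch of that check. It
   assumes the nal_unit_type values of [H.264] (7 for the SPS, 15 for
   the subset SPS) together with the MVC types 14 and 20 described in
   section 3.3. The table and function names are illustrative only.

```python
# Selected nal_unit_type values from H.264 (illustrative labels only).
NAL_TYPE_LABELS = {
    1: "coded slice, non-IDR (base view)",
    5: "coded slice, IDR (base view)",
    7: "sequence parameter set (base view)",
    8: "picture parameter set",
    14: "prefix NAL unit",
    15: "subset sequence parameter set (non-base views)",
    20: "coded slice of a non-base view",
}

# Types that an H.264/AVC decoder of the base view does not need.
MVC_ONLY_TYPES = frozenset({14, 15, 20})


def nal_unit_type(first_octet: int) -> int:
    """Return the 5-bit Type field of the first NAL unit header octet."""
    return first_octet & 0x1F


def is_mvc_only(first_octet: int) -> bool:
    """True if the NAL unit carries data that only MVC decoders use."""
    return nal_unit_type(first_octet) in MVC_ONLY_TYPES


for octet in (0x67, 0x6F, 0x65, 0x74):
    t = nal_unit_type(octet)
    print(t, NAL_TYPE_LABELS[t], is_mvc_only(octet))
```

   The picture parameter set (type 8) is not classified as MVC-only
   because, as noted above, slices of different views may share it.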

   The inter-view dependency and the decoding order of all the encoded
   views are indicated in a new syntax structure, the SPS MVC extension,
   included in each subset SPS.

3.3. Network Abstraction Layer Unit Header

   An MVC NAL unit of type 20 or 14 consists of a header of four octets
   and the payload byte string. MVC NAL units of type 20 are coded
   slices of non-base views. A special type of MVC NAL unit is the
   prefix NAL unit (type 14). It includes descriptive information about
   the associated H.264/AVC VCL NAL unit (type 1 or 5) that immediately
   follows the prefix NAL unit.

   MVC extends the one-byte H.264/AVC NAL unit header by three
   additional octets. The header indicates the type of the NAL unit,
   the (potential) presence of bit errors or syntax violations in the
   NAL unit payload, information regarding the relative importance of
   the NAL unit for the decoding process, the view identification
   information, the temporal layer identification information, and
   other fields as discussed below.

   The syntax and semantics of the NAL unit header are formally
   specified in [MVC], but the essential properties of the NAL unit
   header are summarized below.

   The first byte of the NAL unit header has the following format. The
   bit fields are the same as defined for the one-byte H.264/AVC NAL
   unit header, while the semantics of some fields have changed
   slightly, in a backward-compatible way:

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |F|NRI|  Type   |
      +---------------+

   F: 1 bit

      forbidden_zero_bit. H.264/AVC declares a value of 1 as a syntax
      violation.

   NRI: 2 bits

      nal_ref_idc. A value of 00 indicates that the content of the NAL
      unit is not used to reconstruct reference pictures for future
      prediction. Such NAL units can be discarded without risking the
      integrity of the reference pictures in the same view.
      A value higher than 00 indicates that the decoding of the NAL unit
      is required to maintain the integrity of reference pictures in the
      same view, or that the NAL unit contains parameter sets.

   Type: 5 bits

      nal_unit_type. This component specifies the NAL unit type.

   In H.264/AVC, NAL unit types 14 and 20 are reserved for future
   extensions. MVC uses these two NAL unit types: NAL unit type 14 is
   used for the prefix NAL unit, and NAL unit type 20 is used for coded
   slices of non-base views. NAL unit types 14 and 20 indicate the
   presence of three additional octets in the NAL unit header, as shown
   below.

      +---------------+---------------+---------------+
      |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|I|   PRID    |        VID        | TID |A|V|O|
      +---------------+---------------+---------------+

   S: 1 bit

      svc_extension_flag. MUST be equal to 0 in the MVC context. In the
      context of Scalable Video Coding (SVC), the flag must be equal
      to 1.

   I: 1 bit

      non_idr_flag. This component specifies whether the access unit
      the NAL unit belongs to is an IDR access unit (when equal to 0) or
      not (when equal to 1), as specified in [MVC].

   PRID: 6 bits

      priority_id. This component specifies a priority identifier for
      the NAL unit. A lower value of PRID indicates a higher priority.

   VID: 10 bits

      view_id. This component specifies the view identifier of the view
      the NAL unit belongs to.

   TID: 3 bits

      temporal_id. This component specifies the temporal layer (or
      frame rate) hierarchy. Informally put, a temporal layer consisting
      of view components with a smaller temporal_id corresponds to a
      lower frame rate. A given temporal layer typically depends on the
      lower temporal layers (i.e., the temporal layers with smaller
      temporal_id values) but never depends on any higher temporal layer
      (i.e., a temporal layer with a higher temporal_id value).

   A: 1 bit

      anchor_pic_flag. This component specifies whether the access unit
      the NAL unit belongs to is an anchor access unit (when equal to 1)
      or not (when equal to 0), as specified in [MVC].

   V: 1 bit

      inter_view_flag. This component specifies whether the view
      component is used for inter-view prediction (when equal to 1) or
      not (when equal to 0).

   O: 1 bit

      reserved_one_bit. Reserved bit for future extension. O shall be
      equal to 1. Receivers SHOULD ignore the value of reserved_one_bit.

   This memo reuses the additional NAL unit types introduced in
   [RFC3984], which are presented in section 6.3. In addition, this memo
   introduces one more NAL unit type, 30, as specified in section 6.8.
   These NAL unit types are marked as unspecified in [MVC] and are
   intentionally reserved for use in systems specifications like this
   memo. Moreover, this specification extends the semantics of F, NRI,
   PRID, TID, A, and I as described in section 6.4.

4. Scope

   This payload specification can only be used to carry the "naked" NAL
   unit stream over RTP, and not the byte stream format according to
   Annex B of [MVC]. Likely applications of this specification are in
   the field of IP-based multimedia communications, including 3D video
   streaming over IP, free-viewpoint video over IP, and 3DTV over IP.

   This specification allows, in a given RTP packet stream, the
   encapsulation of NAL units belonging to

   o  the base view only, as detailed in [RFC3984], or

   o  one or more non-base views, or

   o  the base view and one or more non-base views.

   [Ed.Note(YkW): To be extended to allow separate carriage of different
   temporal layers in different RTP packet streams as in
   [I-D.draft-ietf-avt-rtp-svc].]

5. Definitions and Abbreviations

5.1. Definitions

5.1.1. Definitions per MVC specification

   This document uses the definitions of [MVC].
   The following terms, defined in [MVC], are summarized for
   convenience:

   access unit: A set of NAL units always containing exactly one
   primary coded picture with one or more view components. In addition
   to the primary coded picture, an access unit may also contain one or
   more redundant coded pictures, one auxiliary coded picture, or other
   NAL units not containing slices or slice data partitions of a coded
   picture. The decoding of an access unit always results in one decoded
   picture. All slices or slice data partitions in an access unit have
   the same value of picture order count.

   prefix NAL unit: A NAL unit with nal_unit_type equal to 14 that
   immediately precedes a NAL unit with nal_unit_type equal to 1, 5,
   or 12. The NAL unit that succeeds the prefix NAL unit is also
   referred to as the associated NAL unit. The prefix NAL unit contains
   data associated with the associated NAL unit, which are considered
   to be part of the associated NAL unit.

5.1.2. Definitions local to this memo

   MVC NAL unit: A NAL unit of NAL unit type 14 or 20 as specified in
   Annex H of [MVC]. An MVC NAL unit has a four-byte NAL unit header.

   operation point: An operation point of an MVC bitstream represents a
   certain level of temporal and view scalability. An operation point
   contains only those NAL units required for a valid bitstream to
   represent a certain subset of views at a certain temporal level. An
   operation point is described by the view_id values of the subset of
   views and the highest temporal_id.

   multi-session transmission: The transmission mode in which the MVC
   bitstream is transmitted over multiple RTP sessions, with each stream
   having the same SSRC. These multiple RTP streams can be associated
   using the RTCP CNAME or explicit signaling of the SSRC used.
   Dependency between RTP sessions MUST be signaled according to
   [RFC5583] and this memo.

   single-session transmission: The transmission mode in which the MVC
   bitstream is transmitted over a single RTP session, with a single
   SSRC and separate timestamp and sequence number spaces.

   [Ed.Note(TS): Need more definitions here.]

5.2. Abbreviations

   In addition to the abbreviations defined in [RFC3984], the following
   ones are defined.

      MVC:    Multiview Video Coding
      CS-DON: Cross-Session Decoding Order Number
      MST:    multi-session transmission
      PACSI:  Payload Content Scalability Information
      SST:    single-session transmission

6. MVC RTP Payload Format

6.1. Design Principles

   The following design principles have been observed:

   o  Backward compatibility with [RFC3984] wherever possible.

   o  As the MVC base view is H.264/AVC compatible, the base view or any
      H.264/AVC compatible subset of it, when transmitted in its own RTP
      packet stream, MUST be encapsulated using [RFC3984]. Requiring
      this has the desirable side effect that the transmitted data can
      be received by [RFC3984] receivers and decoded by H.264/AVC
      decoders.

   o  Media-Aware Network Elements (MANEs) as defined in [RFC3984] are
      signaling aware and rely on signaling information. MANEs have
      state.

   o  MANEs can aggregate multiple RTP streams, possibly from multiple
      RTP sessions.

   o  MANEs can perform media-aware stream thinning. By using the
      payload header information identifying layers within an RTP
      session, MANEs are able to remove packets from the incoming RTP
      packet stream. This implies rewriting the RTP headers of the
      outgoing packet stream and rewriting RTCP Receiver Reports.

6.2. RTP Header Usage

   Please see section 5.1 of [RFC3984].

6.3. Common Structure of the RTP Payload Format

   Please see section 5.2 of [RFC3984].

6.4. NAL Unit Header Usage

   The structure and semantics of the NAL unit header were introduced in
   section 3.3.
   This section specifies the semantics of F, NRI, PRID, TID, A, and I
   according to this specification.

   Note that, in the context of this section, "protecting a NAL unit"
   means any RTP or network transport mechanism that could improve the
   probability of successful delivery of the packet conveying the NAL
   unit. Such mechanisms include using a QoS-enabled network, forward
   error correction (FEC), retransmissions, and advanced scheduling
   behavior, whenever possible.

   The semantics of F specified in section 5.3 of [RFC3984] also apply
   herein.

   For NRI, for a bitstream conforming to one of the profiles defined in
   Annex A of [H.264] and transported using [RFC3984], the semantics
   specified in section 5.3 of [RFC3984] are applicable, i.e., NRI also
   indicates the relative importance of NAL units. In the MVC context,
   in addition to the semantics specified in Annex H of [MVC], NRI also
   indicates the relative importance of NAL units within a view. MANEs
   MAY use this information to protect more important NAL units better
   than less important NAL units.
   [Ed.Note(YkW): "MVC context" to be clearly specified.]

   For PRID, the semantics specified in Annex H of [MVC] apply. Note
   that MANEs implementing unequal error protection MAY use this
   information to protect NAL units with smaller PRID values better than
   those with larger PRID values. For example, they might include only
   the more important NAL units in a forward error correction (FEC)
   protection mechanism. The importance for the decoding process
   decreases as the PRID value increases.

   For TID, in addition to the semantics specified in Annex H of [MVC],
   values of TID indicate the relative importance according to this
   memo. A lower value of TID indicates a higher importance for NAL
   units within a view. MANEs MAY use this information to protect more
   important NAL units better than less important NAL units.
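
   The usage rules in this section are keyed to individual header
   fields, so a MANE applying them first has to extract those fields.
   The Python fragment below is a minimal sketch, not a normative
   procedure. It decodes the four-octet header of an MVC NAL unit (type
   14 or 20) following the bit layout of section 3.3. It then keeps only
   the NAL units inside an operation point (section 5.1.2). This relies
   on the property that a temporal layer never depends on a higher one.
   The sketch makes the following assumptions:

   o  The view set passed in already includes every view the target
      views depend on.

   o  Only MVC NAL units are handled. Base view NAL units of type 1 or 5
      take their view_id and temporal_id from the preceding prefix NAL
      unit, and parameter sets have to be forwarded separately.

   All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class MvcNalHeader:
    f: int         # forbidden_zero_bit
    nri: int       # nal_ref_idc
    nal_type: int  # nal_unit_type (14 or 20)
    s: int         # svc_extension_flag (0 for MVC)
    i: int         # non_idr_flag (0 = IDR access unit)
    prid: int      # priority_id
    vid: int       # view_id
    tid: int       # temporal_id
    a: int         # anchor_pic_flag
    v: int         # inter_view_flag
    o: int         # reserved_one_bit


def parse_mvc_nal_header(nal: bytes) -> MvcNalHeader:
    """Decode the four-octet header of a NAL unit of type 14 or 20,
    using the bit layout of section 3.3."""
    if len(nal) < 4:
        raise ValueError("an MVC NAL unit header is four octets long")
    b0 = nal[0]
    nal_type = b0 & 0x1F
    if nal_type not in (14, 20):
        raise ValueError(f"type {nal_type} has no three-octet extension")
    ext = int.from_bytes(nal[1:4], "big")  # 24-bit extension, MSB first
    return MvcNalHeader(
        f=b0 >> 7, nri=(b0 >> 5) & 0x3, nal_type=nal_type,
        s=ext >> 23, i=(ext >> 22) & 0x1, prid=(ext >> 16) & 0x3F,
        vid=(ext >> 6) & 0x3FF, tid=(ext >> 3) & 0x7,
        a=(ext >> 2) & 0x1, v=(ext >> 1) & 0x1, o=ext & 0x1,
    )


def in_operation_point(hdr: MvcNalHeader, views: Set[int],
                       max_tid: int) -> bool:
    """True if the NAL unit belongs to one of the given views and to a
    temporal layer not above max_tid."""
    return hdr.vid in views and hdr.tid <= max_tid


def thin(nal_units: Iterable[bytes], views: Set[int],
         max_tid: int) -> List[bytes]:
    """Keep only the MVC NAL units that lie inside the operation point."""
    kept = []
    for nal in nal_units:
        hdr = parse_mvc_nal_header(nal)
        if in_operation_point(hdr, views, max_tid):
            kept.append(nal)
    return kept


hdr = parse_mvc_nal_header(bytes([0x74, 0x45, 0x00, 0x8B]))
print(hdr.vid, hdr.tid, hdr.prid, hdr.i, hdr.a, hdr.v)  # 2 1 5 1 0 1
```

   For example, the octets 0x74 0x45 0x00 0x8B decode to the following
   field values: NRI 3, Type 20, I 1, PRID 5, VID 2, TID 1, A 0, and
   V 1. This is a coded slice of non-base view 2 in a non-IDR,
   non-anchor access unit, and the slice is used for inter-view
   prediction.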

   For A, in addition to the semantics specified in Annex H of [MVC],
   MANEs MAY use this information according to this memo to protect NAL
   units with A equal to 1 better than NAL units with A equal to 0.
   MANEs MAY also use the information of NAL units with A equal to 1 to
   decide when to forward more packets for an RTP packet stream. For
   example, when a MANE senses that view switching has happened and the
   operation point has changed, it MAY start to forward NAL units for a
   new target view only after forwarding a NAL unit with A equal to 1
   for the new target view.

   For I, in addition to the semantics specified in Annex H of [MVC],
   MANEs MAY use this information according to this memo to protect NAL
   units with I equal to 0 (i.e., NAL units of IDR access units) better
   than NAL units with I equal to 1. MANEs MAY also use the information
   of NAL units with I equal to 0 to decide when to forward more packets
   for an RTP packet stream. For example, when a MANE senses that view
   switching has happened and the operation point has changed, it MAY
   start to forward NAL units for a new target view only after
   forwarding a NAL unit with I equal to 0 for the new target view.

6.5. Packetization Modes

   [Ed.Note(TS): Need to add text from [I-D.draft-ietf-avt-rtp-svc] to
   this section with respect to MVC.]

6.5.1. Packetization Modes for single-session transmission

   This section will address the issues of sections 4.5.1 and 5.1 of
   [I-D.draft-ietf-avt-rtp-svc].

6.5.2. Packetization Modes for multi-session transmission

   This section will address the issues of sections 4.5.2 and 5.2 of
   [I-D.draft-ietf-avt-rtp-svc].

6.6. Aggregation Packets

   This section will address the issues of section 4.7 of
   [I-D.draft-ietf-avt-rtp-svc].

6.7. Fragmentation Units (FUs)

   This section will address the issues of section 4.8 of
   [I-D.draft-ietf-avt-rtp-svc].

6.8. Payload Content Scalability Information (PACSI) NAL Unit for MVC

   A new NAL unit type is specified in this memo and referred to as the
   payload content scalability information (PACSI) NAL unit. The PACSI
   NAL unit, if present, MUST be the first NAL unit in an aggregation
   packet, and it MUST NOT be present in other types of packets. The
   PACSI NAL unit indicates view and temporal scalability information
   and other characteristics that are common to all the remaining NAL
   units in the payload of the aggregation packet. Furthermore, a PACSI
   NAL unit MAY include a DONC field and contain zero or more SEI NAL
   units. The PACSI NAL unit makes it easier for MANEs to decide whether
   to forward, process, or discard the aggregation packet containing the
   PACSI NAL unit. Senders MAY create PACSI NAL units. Receivers MAY
   ignore them, or use them as hints to enable efficient aggregation
   packet processing. Note that the NAL unit type for the PACSI NAL
   unit is selected among those values that are unspecified in [MVC] and
   [RFC3984].

   When the first aggregation unit of an aggregation packet contains a
   PACSI NAL unit, there MUST be at least one additional aggregation
   unit present in the same packet. The RTP header and payload header
   fields of the aggregation packet are set according to the remaining
   NAL units in the aggregation packet.

   When a PACSI NAL unit is included in a multi-time aggregation packet
   (MTAP), the decoding order number (DON) for the PACSI NAL unit MUST
   be set to indicate that the PACSI NAL unit has an identical DON to
   the first NAL unit in decoding order among the remaining NAL units in
   the aggregation packet.

   The structure of a PACSI NAL unit is as follows. The first four
   octets are exactly the same as the four-byte MVC NAL unit header as
   discussed in section 3.3.
   They are followed by two always-present octets, two optional octets,
   and zero or more SEI NAL units. Each SEI NAL unit is preceded by a
   16-bit unsigned size field (in network byte order) that indicates the
   size of the following NAL unit in bytes. The size excludes these two
   octets but includes the NAL unit type octet of the SEI NAL unit.
   Figure 1 illustrates the PACSI NAL unit structure and an example of a
   PACSI NAL unit containing two SEI NAL units.

   The P, C, S, and E bits are specified only if the X bit is equal
   to 1. The T bit MUST NOT be equal to 1 if the aggregation packet
   containing the PACSI NAL unit is not an STAP-A packet. The T bit MAY
   be equal to 1 if the aggregation packet containing the PACSI NAL
   unit is an STAP-A packet. The DONC field MUST NOT be present if the
   T bit is equal to 0, and MUST be present if the T bit is equal to 1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |F|NRI|  Type   |S|   PRID    | TID |A|        VID        |I|V|R|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |X|T|RR |P|C|S|E|      RRR      |        DONC (optional)        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        NAL unit size 1        |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         SEI NAL unit 1        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        NAL unit size 2        |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         SEI NAL unit 2        |
   |                                                               |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 1. PACSI NAL unit structure

   The values of the fields in the PACSI NAL unit MUST be set as
   follows. The term "target NAL units" is used in the semantics of
   some fields.
   The target NAL units are those NAL units contained in the aggregation
   packet, but not included in the PACSI NAL unit, that are within the
   access unit to which the first NAL unit following the PACSI NAL unit
   in the aggregation packet belongs.

   o  The F bit MUST be set to 1 if the F bit in at least one of the
      remaining NAL units in the aggregation packet is equal to 1.
      Otherwise, the F bit MUST be set to 0.

   o  The NRI field MUST be set to the highest value of the NRI field
      among all the remaining NAL units in the aggregation packet.

   o  The Type field MUST be set to 30.

   o  The S bit MUST be set to 1.

   o  The PRID field MUST be set to the lowest value of the PRID values
      of all the remaining NAL units in the aggregation packet.

   o  The TID field MUST be set to the lowest value of the TID values
      of all the remaining NAL units with the lowest value of VID in the
      aggregation packet.

   o  The A bit MUST be set to 1 if the A bit of at least one of the
      remaining NAL units in the aggregation packet is equal to 1.
      Otherwise, the A bit MUST be set to 0.

   o  The VID field MUST be set to the lowest value of the VID values
      of all the remaining NAL units in the aggregation packet.

   o  The I bit MUST be set to 1 if the I bit of at least one of the
      remaining NAL units in the aggregation packet is equal to 1.
      Otherwise, the I bit MUST be set to 0.

   o  The V bit MUST be set to 1 if the V bit of at least one of the
      remaining NAL units in the aggregation packet is equal to 1.
      Otherwise, the V bit MUST be set to 0.

   o  The R bit MUST be set to 0. Receivers SHOULD ignore the value of
      R.

   o  If the X bit is equal to 1, the P, C, S, and E bits are specified
      as below. Otherwise, the P, C, S, and E bits are unspecified, and
      receivers MUST ignore these bits. The X bit SHOULD be identical
      for all the PACSI NAL units involved in all the RTP sessions
      conveying an MVC bitstream.

o  The RR field MUST be set to '00' (in binary form). Receivers
   SHOULD ignore the value of RR.

o  If the T bit is equal to 1, the OPTIONAL field DONC MUST be present
   and is specified as below. Otherwise, the field DONC MUST NOT be
   present.

o  The P bit MUST be set to 1 if all the remaining NAL units in the
   aggregation packet have redundant_pic_cnt greater than 0, i.e., the
   slices are redundant slices. Otherwise, the P bit MUST be set to 0.

   Informative note: The P bit indicates whether the packet can be
   discarded because it contains only redundant slice NAL units.
   Without this bit, the corresponding information can only be
   inferred from the syntax element redundant_pic_cnt, which is buried
   in the variable-length coded slice header.

o  The C bit MUST be set to 1 if the target NAL units belong to an
   access unit in which the view components are intra coded.
   Otherwise, the C bit MUST be set to 0. The C bit SHOULD be identical
   for all the PACSI NAL units for which the target NAL units belong
   to the same access unit.

   Informative note: The C bit indicates whether the packet contains
   intra slices, which may be the only packets forwarded, e.g., for
   fast-forward playback or when network conditions are extremely
   bad.

o  The S bit MUST be set to 1 if the first VCL NAL unit, in
   transmission order, of the view component containing the first NAL
   unit following the PACSI NAL unit in the aggregation packet is
   present in the aggregation packet. Otherwise, the S bit MUST be set
   to 0.

o  The E bit MUST be set to 1 if the last VCL NAL unit, in
   transmission order, of the view component containing the first NAL
   unit following the PACSI NAL unit in the aggregation packet is
   present in the aggregation packet. Otherwise, the E bit MUST be set
   to 0.

   Informative note: The S and E bits indicate whether the first and
   the last slice, in transmission order, of a view component are in
   the packet. This enables a MANE to detect slice loss and take
   proper action, such as requesting a retransmission as soon as
   possible, and allows efficient playout buffer handling similar to
   that enabled by the M bit in the RTP header. The M bit in the RTP
   header still indicates the end of an access unit, not the end of a
   view component.

o  The RRR field MUST be set to '00000000' (in binary form). Receivers
   SHOULD ignore the value of RRR.

o  When present, the field DONC indicates the CL-DON value for the
   first NAL unit in the STAP-A in transmission order.

SEI NAL units included in the PACSI NAL unit, if any, MUST contain a
subset of the SEI messages associated with the access unit of the
first NAL unit following the PACSI NAL unit within the aggregation
packet.

   Informative note: Senders may repeat in the PACSI NAL unit those
   SEI NAL units whose presence in more than one packet is essential
   for robustness against packet loss. Receivers may use the repeated
   SEI messages in place of missing SEI messages.

An SEI message SHOULD NOT be included both in a PACSI NAL unit and in
one of the remaining NAL units contained in the same aggregation
packet.

6.9. Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs)

This section will address the issues of section 4.7.1 of
[I-D.draft-ietf-avt-rtp-svc].

6.10. Cross-Session DON (CS-DON) for multi-session transmission

This section will address the issues of section 4.11 of
[I-D.draft-ietf-avt-rtp-svc].

7. Packetization Rules

[Ed.Note(TS): We need to adjust this section with respect to
[I-D.draft-ietf-avt-rtp-svc].]

Section 6 of [RFC3984] applies. The following rules apply in addition.
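
Several of the rules below concern STAP-A packets whose first
aggregation unit is a PACSI NAL unit. The following Python sketch
illustrates how a sender might assemble such a packet; it is not part
of this specification. It assumes the bit layout of Figure 1 and the
STAP-A format of [RFC3984] (a one-octet header with Type 24, followed
by aggregation units that each consist of a 16-bit size in network
byte order and a NAL unit). All function names are hypothetical. The
24-bit PACSI header extension (S, PRID, TID, A, VID, I, V, R) is
treated as an opaque input that the caller is assumed to have derived
according to the PACSI field rules; the sketch computes only F, NRI,
Type, the X/T/P/C/S/E octet, the RRR octet, the optional DONC field,
and the SEI size fields. It also assumes that the STAP-A header's F
and NRI are derived over all aggregated NAL units in the same way as
for the PACSI header (F as a logical OR, NRI as the maximum).

```python
import struct
from typing import List, Optional

STAP_A_TYPE = 24  # STAP-A NAL unit type from RFC 3984
PACSI_TYPE = 30   # PACSI NAL unit type (Type field rule for PACSI)


def f_and_nri(nal_units: List[bytes]) -> int:
    """Bits 7..5 of a NAL unit header: F = OR of all F bits, NRI = max NRI.

    Used for the PACSI header (over the remaining NAL units) and, as an
    assumption of this sketch, for the STAP-A header (over all units).
    """
    f = 1 if any(n[0] & 0x80 for n in nal_units) else 0
    nri = max((n[0] >> 5) & 0x03 for n in nal_units)
    return (f << 7) | (nri << 5)


def aggregation_unit(nal: bytes) -> bytes:
    """16-bit unsigned size in network byte order, followed by the NAL unit."""
    if not 0 < len(nal) <= 0xFFFF:
        raise ValueError("NAL unit size must be 1..65535 octets")
    return struct.pack("!H", len(nal)) + nal


def build_pacsi(remaining: List[bytes], ext24: int, *, x: bool = False,
                p: bool = False, c: bool = False, s: bool = False,
                e: bool = False, donc: Optional[int] = None,
                sei_units: Optional[List[bytes]] = None) -> bytes:
    """Serialize a PACSI NAL unit following the layout of Figure 1.

    ext24 is the 3-octet header extension (S|PRID|TID|A|VID|I|V|R),
    assumed to be derived by the caller from the remaining NAL units.
    """
    if not remaining:
        raise ValueError("PACSI needs at least one remaining NAL unit")
    if not ext24 & 0x800000:
        raise ValueError("S bit (MSB of the header extension) must be 1")
    out = bytes([f_and_nri(remaining) | PACSI_TYPE]) + ext24.to_bytes(3, "big")
    t = donc is not None                   # T == 1 exactly when DONC is present
    octet = (int(x) << 7) | (int(t) << 6)  # RR bits 5..4 stay '00'
    if x:                                  # P, C, S, E only specified if X == 1
        octet |= (int(p) << 3) | (int(c) << 2) | (int(s) << 1) | int(e)
    out += bytes([octet, 0x00])            # second octet is RRR, all zeros
    if t:
        out += struct.pack("!H", donc)     # 16-bit DONC, network byte order
    for sei in sei_units or []:
        out += aggregation_unit(sei)       # each SEI NAL unit has a 16-bit size
    return out


def build_stap_a(nal_units: List[bytes], pacsi: Optional[bytes] = None) -> bytes:
    """STAP-A: one header octet (Type 24), then one aggregation unit per NAL
    unit, with the PACSI NAL unit (if any) as the first aggregation unit."""
    if not nal_units:
        # A PACSI aggregation unit must be followed by at least one more.
        raise ValueError("STAP-A needs at least one NAL unit besides a PACSI")
    units = ([pacsi] if pacsi is not None else []) + list(nal_units)
    header = bytes([f_and_nri(units) | STAP_A_TYPE])
    return header + b"".join(aggregation_unit(u) for u in units)


# Usage with hypothetical payload bytes: a slice (F=0, NRI=3, Type=20)
# and a small SEI NAL unit (Type 6).
remaining = [bytes([0x74, 0x00, 0x00, 0x00, 0xAA]), bytes([0x06, 0x05])]
pacsi = build_pacsi(remaining, 0x800000, x=True, s=True, e=True, donc=5)
packet = build_stap_a(remaining, pacsi)
```

The empty-list check in build_stap_a reflects the rule that a PACSI
NAL unit in the first aggregation unit MUST be followed by at least
one additional aggregation unit. Passing a DONC value sets the T bit,
which this draft permits only when the aggregation packet is an
STAP-A.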

All receivers MUST support the single NAL unit packetization mode to
provide backward compatibility with endpoints supporting only the
single NAL unit mode of RFC 3984. However, whenever possible, the
single NAL unit packetization mode SHOULD NOT be used, because
encapsulating small NAL units (e.g., NAL units containing parameter
sets, SEI messages, or prefix NAL units) in their own packets is
typically inefficient due to the relatively large header overhead.

All receivers MUST support the non-interleaved packetization mode.

   Informative note: The non-interleaved mode allows an application
   to encapsulate a single NAL unit in a single RTP packet.
   Historically, the single NAL unit mode has been included in
   [RFC3984] only for compatibility with ITU-T Rec. H.241 Annex A
   [H.241]. There is no point in carrying this historic ballast into
   a new application space such as the one provided by MVC. More
   technically speaking, the implementation complexity increase for
   providing the additional mechanisms of the non-interleaved mode
   (namely STAP-A and FU-A) is minor, and the benefits are great
   enough that STAP-A implementation is required.

A NAL unit of small size SHOULD be encapsulated in an aggregation
packet together with one or more other NAL units. For example, non-VCL
NAL units such as access unit delimiters, parameter sets, or SEI NAL
units are typically small.

A prefix NAL unit SHOULD be aggregated into the same packet as the
associated NAL unit following the prefix NAL unit in decoding order.

When the first aggregation unit of an aggregation packet contains a
PACSI NAL unit, there MUST be at least one additional aggregation
unit present in the same packet.

When an MVC bitstream is transported in more than one RTP session,
the following applies.

o  Interleaved mode SHOULD be used for all the RTP sessions.

o  An RTP session that does not use interleaved mode SHOULD be
   constrained as follows.

   -  Non-interleaved mode MUST be used.

   -  STAP-A MUST be used, and any other type of packets MUST NOT be
      used.

   -  Each STAP-A MUST contain a PACSI NAL unit, and the DONC field
      MUST be present in the PACSI NAL unit.

   Informative note: The motivation for these constraints is to
   allow the use of non-interleaved mode for the session conveying
   the H.264/AVC compatible view, such that RFC 3984 receivers
   without an interleaved mode implementation can subscribe to the
   base view session.

Non-VCL NAL units SHOULD be conveyed in the same session as the
associated VCL NAL units. To achieve this, SEI messages that are
contained in a scalable nesting SEI message and are applicable to
more than one session SHOULD be separated and contained in multiple
scalable nesting SEI messages. The DON values MUST indicate the
cross-layer decoding order number values as if all these SEI messages
were in separate scalable nesting SEI messages and contained at the
beginning of the corresponding access units as specified in [MVC].

8. De-Packetization Process (Informative)

For a single RTP session, the de-packetization process specified in
section 7 of [RFC3984] applies.

For receiving more than one of the RTP sessions conveying an MVC
bitstream, an example of a suitable implementation of the
de-packetization process will be specified similarly to the one
eventually included in [I-D.draft-ietf-avt-rtp-svc].

9. Payload Format Parameters

This section specifies the parameters that MAY be used to select
optional features of the payload format and certain features of the
bitstream. The parameters are specified here as part of the media
type registration for the MVC codec.
A mapping of the parameters into the Session Description Protocol
(SDP) [RFC4566] is also provided for applications that use SDP.
Equivalent parameters could be defined elsewhere for use with control
protocols that do not use SDP.

9.1. Media Type Registration

The media subtype for the MVC codec is allocated from the IETF tree.

The receiver MUST ignore any unspecified parameter.

   Informative note: Requiring receivers to ignore unspecified
   parameters allows for backward-compatible future extensions. For
   example, if a future specification that is backward compatible
   with this specification defines new parameters, a receiver
   conforming to this specification can still receive data according
   to the new payload format while ignoring the newly specified
   parameters. This sentence is also present in RFC 3984.

Media Type name: video

Media subtype name: H264-MVC

   The media subtype "H264" MUST be used for RTP streams using RFC
   3984, i.e., not using any of the new features introduced by this
   specification compared to RFC 3984. For RTP streams using any of
   the new features introduced by this specification compared to RFC
   3984, the media subtype "H264-MVC" SHOULD be used, and the media
   subtype "H264" MAY be used. Use of the media subtype "H264" for RTP
   streams using the new features allows RFC 3984 receivers to
   negotiate and receive H.264/AVC or MVC streams packetized according
   to this specification, but to ignore media parameters and NAL unit
   types they do not recognize.

Required parameters: none

OPTIONAL parameters: to be specified.

Encoding considerations:

   This type is only defined for transfer via RTP (RFC 3550).

Security considerations:

   See section 10 of RFC XXXX.

Public specification:

   Please refer to RFC XXXX and its section 14.

Additional information: none

File extensions: none

Macintosh file type code: none

Object identifier or OID: none

Person & email address to contact for further information:

Intended usage: COMMON

Author: NN

Change controller:

   IETF Audio/Video Transport working group delegated from the IESG.

9.2. SDP Parameters

9.2.1. Mapping of Payload Type Parameters to SDP

The media type video/H264-MVC string is mapped to fields in the
Session Description Protocol (SDP) as follows:

The media name in the "m=" line of SDP MUST be video.

The encoding name in the "a=rtpmap" line of SDP MUST be H264-MVC (the
media subtype).

The clock rate in the "a=rtpmap" line MUST be 90000.

The OPTIONAL parameters, when present, MUST be included in the
"a=fmtp" line of SDP. These parameters are expressed as a media type
string, in the form of a semicolon-separated list of parameter=value
pairs.

9.2.2. Usage with the SDP Offer/Answer Model

TBD.

9.2.3. Usage with multi-session transmission

If multi-session transmission is used, the rules on signaling media
decoding dependency in SDP as defined in [RFC5583] apply.

9.2.4. Usage in Declarative Session Descriptions

TBD.

9.3. Examples

TBD.

9.4. Parameter Set Considerations

Please see section 10 of [RFC3984].

10. Security Considerations

Please see section 11 of [RFC3984].

11. Congestion Control

TBD.

12. IANA Considerations

Request for media type registration to be added.

13. Acknowledgments

The work of Thomas Schierl has been supported by the European
Commission under contract number FP7-ICT-248036, project COAST.

This document was prepared using 2-Word-v2.0.template.dot.

14. References

14.1. Normative References

[H.264]    ITU-T Recommendation H.264, "Advanced video coding for
           generic audiovisual services", 3rd Edition, November 2007.

[I-D.draft-ietf-avt-rtp-svc]
           Wenger, S., Wang, Y.-K., Schierl, T., and Eleftheriadis, A.,
           "RTP payload format for SVC video",
           draft-ietf-avt-rtp-svc-23 (work in progress), October 2010.

[RFC5583]  Schierl, T. and Wenger, S., "Signaling media decoding
           dependency in the Session Description Protocol (SDP)",
           RFC 5583, July 2009.

[MPEG4-10] ISO/IEC International Standard 14496-10:2005.

[MVC]      Joint Video Team, "Joint Draft 7 of MVC", available from
           http://ftp3.itu.ch/av-arch/jvt-site/2008_04_Geneva/JVT-AA209.zip,
           Geneva, Switzerland, April 2008.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3548]  Josefsson, S., "The Base16, Base32, and Base64 Data
           Encodings", RFC 3548, July 2003.

[RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and Jacobson,
           V., "RTP: A Transport Protocol for Real-Time Applications",
           STD 64, RFC 3550, July 2003.

[RFC3984]  Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
           M., and Singer, D., "RTP Payload Format for H.264 Video",
           RFC 3984, February 2005.

[RFC4566]  Handley, M., Jacobson, V., and Perkins, C., "SDP: Session
           Description Protocol", RFC 4566, July 2006.

14.2. Informative References

[DVB-H]    DVB - Digital Video Broadcasting (DVB); DVB-H
           Implementation Guidelines, ETSI TR 102 377, 2005.

[H.241]    ITU-T Rec. H.241, "Extended video procedures and control
           signals for H.300-series terminals", May 2006.

[IGMP]     Cain, B., Deering, S., Kouvelas, I., Fenner, B., and
           Thyagarajan, A., "Internet Group Management Protocol,
           Version 3", RFC 3376, October 2002.

[McCanne]  McCanne, S., Jacobson, V., and Vetterli, M., "Receiver-
           driven layered multicast", in Proc. of ACM SIGCOMM'96,
           pages 117-130, Stanford, CA, August 1996.

[MBMS]     3GPP - Technical Specification Group Services and System
           Aspects; Multimedia Broadcast/Multicast Service (MBMS);
           Protocols and codecs (Release 6), December 2005.

[MPEG2]    ISO/IEC International Standard 13818-2:1993.

[RFC3450]  Luby, M., Gemmell, J., Vicisano, L., Rizzo, L., and
           Crowcroft, J., "Asynchronous layered coding (ALC) protocol
           instantiation", RFC 3450, December 2002.

Authors' Addresses

Ye-Kui Wang
Huawei Technologies
400 Somerset Corporate Blvd, Suite 602
Bridgewater, NJ 08807
USA

Phone: +1-908-541-3518
EMail: yekuiwang@huawei.com

Thomas Schierl
Fraunhofer HHI
Einsteinufer 37
D-10587 Berlin
Germany

Phone: +49-30-31002-227
EMail: ts@thomas-schierl.de

15. Open issues

- The use of CL-DON for session reordering also allows for
  interleaved transmission with the non-interleaved packetization
  mode. There should be a clear separation between both tools. This
  issue should be handled the same way as for the SVC payload draft.

- Since SVC session multiplexing (multi-source transmission (MST))
  has been settled, it would be preferable to simply reference the
  MST sections in [I-D.draft-ietf-avt-rtp-svc]. Since the text in
  sections 6 and 7 of [I-D.draft-ietf-avt-rtp-svc] is currently very
  SVC specific, the authors would have to try to rewrite these
  sections in a more generic way. If this is not possible, text from
  [I-D.draft-ietf-avt-rtp-svc] needs to be copied and adapted to MVC.

16. Changes Log

Initial version 00

10 November 2007: YkW
   Initial version

12 November 2007: TS
   - Added definition of "Session multiplexing"
   - Added the reference of [I-D.draft-ietf-mmusic-decoding-dependency],
     and its reference in section 9.2.3

12 November 2007: YkW
   - Added the reference of [I-D.draft-ietf-avt-svc] and its
     reference in section 1.
   - Added in sections 3.1 and 3.2 paragraphs regarding inter-view
     prediction

From draft-wang-avt-rtp-mvc-00 to draft-wang-avt-rtp-mvc-01

18 February 2008: YkW
   - Alignment to the latest MVC draft in JVT-Z209 and version 07
     of [I-D.draft-ietf-avt-svc].

25 February 2008: TS

   - Minor modifications and updates throughout the document

   - Added open issue on clear separation between "decoding order
     recovery" and "interleaving"

From draft-wang-avt-rtp-mvc-01 to draft-wang-avt-rtp-mvc-02

09 July 2008: TS

   - Minor modifications and updates throughout the document

   - Added open issue

   - NAL unit header alignment with MVC spec

   - Section 6. References corresponding sections in [RFC3984] and
     [I-D.draft-ietf-avt-svc].

   - TBD: Section 7, we may align [I-D.draft-ietf-avt-svc] in a way
     that SVC is not mentioned in these paragraphs, so that we can
     reference them from this document.

21 August 2008:

   - Minor modifications, editing and adding notes throughout the
     document.

   - Updated references

From draft-wang-avt-rtp-mvc-02 to draft-wang-avt-rtp-mvc-03

04 February 2009: YkW

   - Updated author's address.

04 February 2009: YkW

   - Updated the boiler template.

From draft-wang-avt-rtp-mvc-03 to draft-wang-avt-rtp-mvc-04

22 October 2009: YkW

   - Updated author's address and the boiler template (added the last
     sentence in Copyright Notice).

From draft-wang-avt-rtp-mvc-04 to draft-wang-avt-rtp-mvc-05

22 April 2010: YkW

   - To keep the draft alive, no change other than version number etc.

From draft-wang-avt-rtp-mvc-05 to draft-ietf-avt-rtp-mvc-00

28 April 2010: YkW

   - No change other than version number etc.

From draft-ietf-avt-rtp-mvc-00 to draft-ietf-avt-rtp-mvc-01

8/9 October 2010:

   - YkW: Updated the NAL unit header syntax and semantics in section
     3.3 per the latest MVC specification.

   - TS: Minor edits