idnits 2.17.1 draft-wang-avt-rtp-mvc-05.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** You're using the IETF Trust Provisions' Section 6.b License Notice from 12 Sep 2009 rather than the newer Notice from 28 Dec 2009. (See https://trustee.ietf.org/license-info/) Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 7 instances of too long lines in the document, the longest one being 1 character in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 22, 2010) is 5111 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'I-D.draft-ietf-avt-svc' is mentioned on line 1078, but not defined == Unused Reference: 'RFC3548' is defined on line 959, but no explicit reference was found in the text == Unused Reference: 'DVB-H' is defined on line 975, but no explicit reference was found in the text == Unused Reference: 'IGMP' is defined on line 981, but no explicit reference was found in the text == Unused Reference: 'McCanne' is defined on line 985, but no explicit reference was found in the text == Unused Reference: 'MBMS' is defined on line 989, but no explicit reference was found in the text == Unused Reference: 'MPEG2' is defined on line 993, but no explicit reference was found in the text == Unused Reference: 'RFC3450' is defined on line 995, but no explicit reference was found in the text == Outdated reference: A later version (-27) exists of draft-ietf-avt-rtp-svc-13 == Outdated reference: A later version (-08) exists of draft-ietf-mmusic-decoding-dependency-02 -- Possible downref: Non-RFC (?) normative reference: ref. 'MPEG4-10' -- Possible downref: Non-RFC (?) normative reference: ref. 'MVC' ** Obsolete normative reference: RFC 3548 (Obsoleted by RFC 4648) ** Obsolete normative reference: RFC 3984 (Obsoleted by RFC 6184) ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866) -- Obsolete informational reference (is this intentional?): RFC 3450 (Obsoleted by RFC 5775) Summary: 5 errors (**), 0 flaws (~~), 11 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Audio/Video Transport WG Y.-K. 
Wang 2 Internet Draft Huawei Technologies 3 Intended status: Standards track T. Schierl 4 Expires: October 2010 Fraunhofer HHI 5 April 22, 2010 7 RTP Payload Format for MVC Video 8 draft-wang-avt-rtp-mvc-05.txt 10 Status of this Memo 12 This Internet-Draft is submitted to IETF in full conformance with the 13 provisions of BCP 78 and BCP 79. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet-Drafts. 19 Internet-Drafts are draft documents valid for a maximum of six months 20 and may be updated, replaced, or obsoleted by other documents at any 21 time. It is inappropriate to use Internet-Drafts as reference 22 material or to cite them other than as "work in progress." 24 The list of current Internet-Drafts can be accessed at 25 http://www.ietf.org/ietf/1id-abstracts.txt. 27 The list of Internet-Draft Shadow Directories can be accessed at 28 http://www.ietf.org/shadow.html. 30 This Internet-Draft will expire on October 22, 2010. 32 Copyright Notice 34 Copyright (c) 2010 IETF Trust and the persons identified as the 35 document authors. All rights reserved. 37 This document is subject to BCP 78 and the IETF Trust's Legal 38 Provisions Relating to IETF Documents 39 (http://trustee.ietf.org/license-info) in effect on the date of 40 publication of this document. Please review these documents 41 carefully, as they describe your rights and restrictions with respect 42 to this document. Code Components extracted from this document must 43 include Simplified BSD License text as described in Section 4.e of 44 the Trust Legal Provisions and are provided without warranty as 45 described in the BSD License. 47 Abstract 49 This memo describes an RTP payload format for the multiview 50 extension of the ITU-T Recommendation H.264 video codec that is 51 technically identical to ISO/IEC International Standard 14496-10. 
52 The RTP payload format allows for packetization of one or more 53 Network Abstraction Layer (NAL) units, produced by the video encoder, 54 in each RTP payload. The payload format has wide applicability, 55 such as 3D video streaming, free-viewpoint video, and 3DTV. 57 Table of Contents 59 1. Introduction...................................................3 60 2. Conventions....................................................4 61 3. The MVC Codec..................................................4 62 3.1. Overview..................................................4 63 3.2. Parameter Set Concept.....................................5 64 3.3. Network Abstraction Layer Unit Header.....................5 65 4. Scope..........................................................8 66 5. Definitions and Abbreviations..................................8 67 5.1. Definitions...............................................8 68 5.1.1. Definitions per MVC specification....................8 69 5.1.2. Definitions local to this memo.......................9 70 5.1. Abbreviations.............................................9 71 6. MVC RTP Payload Format.........................................9 72 6.1. Design Principles.........................................9 73 6.2. RTP Header Usage.........................................10 74 6.3. Common Structure of the RTP Payload Format...............10 75 6.4. NAL Unit Header Usage....................................10 76 6.5. Packetization Modes......................................11 77 6.5.1. Packetization Modes for single-session transmission.12 78 6.5.2. Packetization Modes for multi-session transmission..12 79 6.6. Aggregation Packets......................................12 80 6.7. Fragmentation Units (FUs)................................12 81 6.8. Payload Content Scalability Information (PACSI) NAL Unit for 82 MVC...........................................................12 83 6.9. 
Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs)16 84 6.10. Cross-Session DON (CS-DON) for multi-session transmission16 85 7. Packetization Rules...........................................16 86 8. De-Packetization Process (Informative)........................18 87 9. Payload Format Parameters.....................................18 88 9.1. Media Type Registration..................................18 89 9.2. SDP Parameters...........................................20 90 9.2.1. Mapping of Payload Type Parameters to SDP...........20 91 9.2.2. Usage with the SDP Offer/Answer Model...............20 92 9.2.3. Usage with multi-session transmission...............20 93 9.2.4. Usage in Declarative Session Descriptions...........20 94 9.3. Examples.................................................20 95 9.4. Parameter Set Considerations.............................20 96 10. Security Considerations......................................20 97 11. Congestion Control...........................................21 98 12. IANA Considerations..........................................21 99 13. Acknowledgments..............................................21 100 14. References...................................................21 101 14.1. Normative References....................................21 102 14.2. Informative References..................................22 103 Author's Addresses...............................................23 104 15. Open issues:.................................................23 105 16. Changes Log..................................................23 107 1. Introduction 109 This memo specifies an RTP [RFC3550] payload format for a forthcoming 110 new mode of the H.264/AVC video coding standard, known as Multiview 111 Video Coding (MVC). Formally, MVC will take the form of Amendment 4 112 to ISO/IEC 14496 Part 10 [MPEG4-10], and Annex H of ITU-T Rec. H.264 113 [H.264]. The latest draft specification of MVC is available in [MVC]. 
115 MVC covers a wide range of 3D video applications, including 3D video 116 streaming, free-viewpoint video as well as 3DTV. 118 This memo follows a backward compatible enhancement philosophy, by 119 keeping as close an alignment to the H.264/AVC payload format 120 [RFC3984] as possible. It documents the enhancements relevant from 121 an RTP transport viewpoint, and defines signaling support for MVC, 122 including a new media subtype name. 124 Due to the similarity between MVC and SVC in system and transport 125 aspects, this memo reuses the design principles as well as many 126 features of the SVC RTP payload draft [I-D.draft-ietf-avt-svc]. 128 [Ed.Note(TS):Need text on session multiplexing and on the relation of 129 this draft to [I-D.draft-ietf-avt-svc] here.] 131 2. Conventions 133 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 134 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 135 document are to be interpreted as described in BCP 14, RFC 2119 136 [RFC2119]. 138 This specification uses the notion of setting and clearing a bit when 139 bit fields are handled. Setting a bit is the same as assigning that 140 bit the value of 1 (On). Clearing a bit is the same as assigning 141 that bit the value of 0 (Off). 143 3. The MVC Codec 145 3.1. Overview 147 MVC provides multi-view video bitstreams. An MVC bitstream contains 148 a base view conforming to at least one of the profiles of H.264/AVC 149 as defined in Annex A of [H.264], and one or more non-base views. To 150 enable high compression efficiency, coding of a non-base view can 151 utilize other views for inter-view prediction, thus its decoding 152 relies on the presence of the views it depends on. Each coded view 153 itself may be temporally scalable. Besides temporal scalability, MVC 154 also supports view scalability, wherein a subset of the encoded views 155 can be extracted, decoded and displayed, whenever it is desired by 156 the application. 
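Informative sketch (not part of the payload format): the view and temporal scalability described above can be illustrated by filtering a list of NAL units. The sketch assumes the four-octet header layout given in section 3.3 of this memo (R, I, PRID, VID, TID, A, V, O) and assumes that a prefix NAL unit (type 14) immediately precedes the base view slice it describes. The names parse_mvc_header and extract_views are hypothetical, and the caller is assumed to list every view needed for decoding, including views used for inter-view prediction.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class MvcHeader(NamedTuple):
    nal_unit_type: int
    nal_ref_idc: int
    idr_flag: int
    priority_id: int
    view_id: int
    temporal_id: int
    anchor_pic_flag: int
    inter_view_flag: int


def parse_mvc_header(nal: bytes) -> MvcHeader:
    """Parse the four-octet header of a type 14 or type 20 NAL unit."""
    if len(nal) < 4:
        raise ValueError("an MVC NAL unit header needs four octets")
    b0, b1, b2, b3 = nal[0], nal[1], nal[2], nal[3]
    return MvcHeader(
        nal_unit_type=b0 & 0x1F,        # Type: low 5 bits of octet 0
        nal_ref_idc=(b0 >> 5) & 0x03,   # NRI: 2 bits
        idr_flag=(b1 >> 6) & 0x01,      # I follows the reserved bit R
        priority_id=b1 & 0x3F,          # PRID: 6 bits
        view_id=(b2 << 2) | (b3 >> 6),  # VID: 10 bits across two octets
        temporal_id=(b3 >> 3) & 0x07,   # TID: 3 bits
        anchor_pic_flag=(b3 >> 2) & 0x01,
        inter_view_flag=(b3 >> 1) & 0x01,
    )


def extract_views(nal_units: Iterable[bytes],
                  wanted_views: Set[int],
                  max_temporal_id: int) -> List[bytes]:
    """Keep only NAL units of the wanted views up to max_temporal_id.

    NAL units without an MVC header (parameter sets, SEI, and slices
    that are not preceded by a prefix NAL unit) are kept unchanged.
    """
    kept: List[bytes] = []
    pending: Optional[bool] = None  # decision taken from a prefix NAL unit
    for nal in nal_units:
        nal_type = nal[0] & 0x1F
        if nal_type in (14, 20):
            hdr = parse_mvc_header(nal)
            keep = (hdr.view_id in wanted_views
                    and hdr.temporal_id <= max_temporal_id)
            pending = keep if nal_type == 14 else None
        elif nal_type in (1, 5) and pending is not None:
            keep, pending = pending, None  # slice inherits the prefix decision
        else:
            keep, pending = True, None
        if keep:
            kept.append(nal)
    return kept
```

A MANE or receiver could apply such a filter per access unit; a real implementation would derive the set of required views from the inter-view dependency information in the subset SPS rather than from a caller-supplied set.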
158 The concept of video coding layer (VCL) and network abstraction layer 159 (NAL) is inherited from H.264/AVC. The VCL contains the signal 160 processing functionality of the codec: mechanisms such as transform, 161 quantization, motion-compensated prediction, loop filtering and 162 inter-view prediction. The Network Abstraction Layer (NAL) 163 encapsulates each slice generated by the VCL into one or more Network 164 Abstraction Layer Units (NAL units). Please consult RFC 3984 for a 165 more in-depth discussion of the NAL unit concept. MVC specifies the 166 decoding order of NAL units. 168 In MVC, one access unit contains all NAL units pertaining to one 169 output time instance for all the views. Within one access unit, the 170 coded representation of each view, also called a view component, 171 consists of one or more slices. 173 The concept of temporal scalability is not newly introduced by SVC or 174 MVC, as profiles defined in Annex A of [H.264] already support it. 175 In [H.264], sub-sequences were introduced in order to allow 176 optional use of temporal layers. SVC extended this approach by 177 advertising the temporal scalability information within the NAL unit 178 header or prefix NAL units, both of which were inherited by MVC. 180 3.2. Parameter Set Concept 182 The parameter set concept was first specified in [H.264]. Please 183 refer to section 1.2 of [RFC3984] for more details. SVC introduced 184 some new parameter set mechanisms. MVC has inherited the parameter 185 set concept from [H.264]. 187 In particular, a different type of sequence parameter set (SPS), 188 which is referred to as subset SPS and uses a different NAL unit type 189 than "the old SPS" specified in [H.264], is used for non-base views, 190 while the base view still uses "the old SPS". Slices from different 191 views can use either 1) the same sequence or picture 192 parameter set, or 2) different sequence or picture parameter sets.
194 The inter-view dependency and the decoding order of all the encoded 195 views are indicated in a new syntax structure, the SPS MVC extension, 196 included in each subset SPS. 198 3.3. Network Abstraction Layer Unit Header 200 An MVC NAL unit of type 20 or 14 consists of a header of four octets 201 and the payload byte string. MVC NAL units of type 20 are coded 202 slices of non-base views. A special type of an MVC NAL unit is the 203 prefix NAL unit (type 14) that includes descriptive information of 204 the associated H.264/AVC VCL NAL unit (type 1 or 5) that immediately 205 follows the prefix NAL unit. 207 MVC extends the one-byte H.264/AVC NAL unit header by three 208 additional octets. The header indicates the type of the NAL unit, 209 the (potential) presence of bit errors or syntax violations in the 210 NAL unit payload, information regarding the relative importance of 211 the NAL unit for the decoding process, the view identification 212 information, the temporal layer identification information, and other 213 fields as discussed below. 215 The syntax and semantics of the NAL unit header are formally 216 specified in [MVC], but the essential properties of the NAL unit 217 header are summarized below. 219 The first byte of the NAL unit header has the following format (the 220 bit fields are the same as defined for the one-byte H.264/AVC NAL 221 unit header, while the semantics of some fields have changed slightly, 222 in a backward compatible way):

224 +---------------+
225 |0|1|2|3|4|5|6|7|
226 +-+-+-+-+-+-+-+-+
227 |F|NRI|  Type   |
228 +---------------+

230 F: 1 bit 232 forbidden_zero_bit. H.264/AVC declares a value of 1 as a syntax 233 violation. 235 NRI: 2 bits 237 nal_ref_idc. A value of 00 indicates that the content of the NAL 238 unit is not used to reconstruct reference pictures for future 239 prediction. Such NAL units can be discarded without risking the 240 integrity of the reference pictures in the same view.
A value higher 241 than 00 indicates that the decoding of the NAL unit is required to 242 maintain the integrity of reference pictures in the same view, or 243 that the NAL unit contains parameter sets. 245 Type: 5 bits 247 nal_unit_type. This component specifies the NAL unit type. 249 In H.264/AVC, NAL unit types 14 and 20 are reserved for future 250 extensions. MVC uses these two NAL unit types. NAL unit type 14 is 251 used for the prefix NAL unit, and NAL unit type 20 is used for coded 252 slices of non-base views. NAL unit types 14 and 20 indicate the 253 presence of three additional octets in the NAL unit header, as shown 254 below.

256 +---------------+---------------+---------------+
257 |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
258 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
259 |R|I|   PRID    |        VID        | TID |A|V|O|
260 +---------------+---------------+---------------+

262 PRID: 6 bits 264 priority_id. This component specifies a priority identifier for the NAL 265 unit. A lower value of PRID indicates a higher priority. 267 TID: 3 bits 269 temporal_id. This component specifies the temporal layer (or frame 270 rate) hierarchy. Informally put, a temporal layer consisting of view 271 components with a smaller temporal_id corresponds to a lower frame rate. 272 A given temporal layer typically depends on the lower temporal layers 273 (i.e., the temporal layers with smaller temporal_id values) but never 274 depends on any higher temporal layer (i.e., a temporal layer with a 275 higher temporal_id value). 277 A: 1 bit 279 anchor_pic_flag. This component specifies whether the view component 280 is an anchor picture (when equal to 1) or not (when equal to 0), as 281 specified in [MVC]. 283 VID: 10 bits 285 view_id. This component specifies the view identifier of the view 286 the NAL unit belongs to. 288 I: 1 bit 290 idr_flag.
This component specifies whether the view component is a 291 view instantaneous decoding refresh (V-IDR) picture for the view 292 (when equal to 1) or not (when equal to 0), as specified in [MVC]. 294 V: 1 bit 296 inter_view_flag. This component specifies whether the view component 297 is used for inter-view prediction (when equal to 1) or not (when 298 equal to 0). 300 R: 1 bit 302 reserved_zero_one_bit. Reserved bit for future extension. R MUST be 303 equal to 0. Receivers SHOULD ignore the value of 304 reserved_zero_one_bit. 306 O: 1 bit 308 reserved_one_bit. Reserved bit for future extension. O MUST be 309 equal to 1. Receivers SHOULD ignore the value of 310 reserved_one_bit. 312 This memo reuses the same additional NAL unit types introduced in RFC 313 3984, which are presented in section 6.3. In addition, this memo 314 introduces one more NAL unit type, 30, as specified in section 6.8. 315 These NAL unit types are marked as unspecified in [MVC] and 316 intentionally reserved for use in systems specifications like this 317 memo. Moreover, this specification extends the semantics of F, NRI, 318 PRID, TID, A, and I as described in section 6.4. 320 4. Scope 322 This payload specification can only be used to carry the "naked" NAL 323 unit stream over RTP, and not the byte stream format according to 324 Annex B of [MVC]. Likely applications of this specification 325 are in IP-based multimedia communications, including 3D 326 video streaming over IP, free-viewpoint video over IP, and 3DTV over 327 IP. 329 This specification allows a given RTP packet stream to 330 encapsulate NAL units belonging to 332 o the base view only, as specified in detail in [RFC3984], or 334 o one or more non-base views, or 336 o the base view and one or more non-base views 338 [Ed.Note(YkW): To be extended to allow separate carriage of different 339 temporal layers in different RTP packet streams as in 340 [I-D.draft-ietf-avt-svc].] 342 5.
Definitions and Abbreviations 344 5.1. Definitions 346 5.1.1. Definitions per MVC specification 348 This document uses the definitions of [MVC]. The following terms, 349 defined in [MVC], are summed up for convenience: 351 access unit: A set of NAL units always containing exactly one 352 primary coded picture with one or more view components. In addition 353 to the primary coded picture, an access unit may also contain one or 354 more redundant coded pictures, one auxiliary coded picture, or other 355 NAL units not containing slices or slice data partitions of a coded 356 picture. The decoding of an access unit always results in one decoded 357 picture. All slices or slice data partitions in an access unit have 358 the same value of picture order count. 360 prefix NAL unit: A NAL unit with nal_unit_type equal to 14 that 361 immediately precedes a NAL unit with nal_unit_type equal to 1, 5, 362 or 12. The NAL unit that succeeds the prefix NAL unit is also 363 referred to as the associated NAL unit. The prefix NAL unit contains 364 data associated with the associated NAL unit, which are considered to 365 be part of the associated NAL unit. 367 5.1.2. Definitions local to this memo 369 MVC NAL unit: A NAL unit of NAL unit type 14 or 20 as specified in 370 Annex H of [MVC]. An MVC NAL unit has a four-byte NAL unit header. 372 operation point: An operation point of an MVC bitstream represents a 373 certain level of temporal and view scalability. An operation point 374 contains only those NAL units required for a valid bitstream to 375 represent a certain subset of views at a certain temporal level. An 376 operation point is described by the view_id values of the subset of 377 views, and the highest temporal_id. 379 multi-session transmission: The transmission mode in which the MVC 380 bitstream is transmitted over multiple RTP sessions, with each stream 381 having the same SSRC. 
These multiple RTP streams can be associated 382 using the RTCP CNAME, or explicit signalling of the SSRC used. 383 Dependency between RTP sessions MUST be signaled according to [I- 384 D.ietf-mmusic-decoding-dependency] and this memo. 386 single-session transmission: The transmission mode in which the MVC 387 bitstream is transmitted over a single RTP session, with a single 388 SSRC and separate timestamp and sequence number spaces. 390 [Ed.Note(TS):Need more definitions here.] 392 5.1. Abbreviations 394 In addition to the abbreviations defined in [RFC3984], the following 395 ones are defined. 397 MVC: Multiview Video Coding 398 CS-DON: Cross-Session Decoding Order Number 399 MST: multi-session transmission 400 PACSI: Payload Content Scalability Information 401 SST: single-session transmission 403 6. MVC RTP Payload Format 405 6.1. Design Principles 407 The following design principles have been observed: 409 o Backward compatibility with [RFC3984] wherever possible. 411 o As the MVC base view is H.264/AVC compatible, the base view or any 412 H.264/AVC compatible subset of it, when transmitted in its own RTP 413 packet stream, MUST be encapsulated using [RFC3984]. Requiring this 414 has the desirable side effect that the transmitted data can be 415 received by [RFC3984] receivers and decoded by H.264/AVC decoders. 417 o Media-Aware Network Elements (MANEs) as defined in [RFC3984] are 418 signaling aware and rely on signaling information. MANEs have state. 420 o MANEs can aggregate multiple RTP streams, possibly from multiple 421 RTP sessions. 423 o MANEs can perform media-aware stream thinning. By using the 424 payload header information identifying Layers within an RTP session, 425 MANEs are able to remove packets from the incoming RTP packet stream. 426 This implies rewriting the RTP headers of the outgoing packet stream 427 and rewriting of RTCP Receiver Reports. 429 6.2. RTP Header Usage 431 Please see section 5.1 of [RFC3984]. 433 6.3. 
Common Structure of the RTP Payload Format 435 Please see section 5.2 of [RFC3984]. 437 6.4. NAL Unit Header Usage 439 The structure and semantics of the NAL unit header were introduced in 440 section 3.3. This section specifies the semantics of F, NRI, PRID, 441 TID, A and I according to this specification. 443 Note that, in the context of this section, "protecting a NAL unit" 444 means any RTP or network transport mechanism that could improve the 445 probability of successful delivery of the packet conveying the NAL unit, 446 including use of a QoS-enabled network, forward error correction 447 (FEC), retransmissions, and advanced scheduling behavior, whenever 448 possible. 450 The semantics of F specified in section 5.3 of [RFC3984] also apply 451 herein. 453 For NRI, for a bitstream conforming to one of the profiles defined in 454 Annex A of [H.264] and transported using [RFC3984], the semantics 455 specified in section 5.3 of [RFC3984] are applicable, i.e., NRI also 456 indicates the relative importance of NAL units. In the MVC context, in 457 addition to the semantics specified in Annex H of [MVC], 458 NRI also indicates the relative importance of NAL units 459 within a view. MANEs MAY use this information to protect more 460 important NAL units better than less important NAL units. 461 [Ed.Note(YkW): "MVC context" to be clearly specified.] 463 For PRID, the semantics specified in Annex H of [MVC] apply. Note 464 that MANEs implementing unequal error protection MAY use this 465 information to protect NAL units with smaller PRID values better than 466 those with larger PRID values, for example by including only the more 467 important NAL units in a forward error correction (FEC) protection 468 mechanism. The importance for the decoding process decreases as the 469 PRID value increases. 471 For TID, in addition to the semantics specified in Annex H of [MVC], 472 according to this memo, values of TID indicate the relative 473 importance.
A lower value of TID indicates a higher importance for 474 NAL units within a view. MANEs MAY use this information to protect 475 more important NAL units better than less important NAL units. 477 For A, in addition to the semantics specified in Annex H of [MVC], 478 according to this memo, MANEs MAY use this information to protect NAL 479 units with A equal to 1 better than NAL units with A equal to 0. 480 MANEs MAY also utilize information of NAL units with A equal to 1 to 481 decide when to forward more packets for an RTP packet stream. For 482 example, when a MANE detects that view switching has occurred such that 483 the operation point has changed, it MAY start to forward NAL units 484 for a new target view only after forwarding a NAL unit with A equal 485 to 1 for the new target view. 487 For I, in addition to the semantics specified in Annex H of [MVC], 488 according to this memo, MANEs MAY use this information to protect NAL 489 units with I equal to 1 better than NAL units with I equal to 0. 490 MANEs MAY also utilize information of NAL units with I equal to 1 to 491 decide when to forward more packets for an RTP packet stream. For 492 example, when a MANE detects that view switching has occurred such that 493 the operation point has changed, it MAY start to forward NAL units 494 for a new target view only after forwarding a NAL unit with I equal 495 to 1 for the new target view. 497 6.5. Packetization Modes 499 [Ed.Note(TS): Need to add text from [I-D.draft-ietf-avt-rtp-svc] to 500 this section with respect to MVC.] 502 6.5.1. Packetization Modes for single-session transmission 504 This section will address the issues of sections 4.5.1 and 5.1 of [I- 505 D.draft-ietf-avt-rtp-svc]. 507 6.5.2. Packetization Modes for multi-session transmission 509 This section will address the issues of sections 4.5.2 and 5.2 of [I- 510 D.draft-ietf-avt-rtp-svc]. 512 6.6.
Aggregation Packets 514 This section will address the issues of section 4.7 of [I-D.draft- 515 ietf-avt-rtp-svc]. 517 6.7. Fragmentation Units (FUs) 519 This section will address the issues of section 4.8 of [I-D.draft- 520 ietf-avt-rtp-svc]. 522 6.8. Payload Content Scalability Information (PACSI) NAL Unit for MVC 524 A new NAL unit type is specified in this memo, and referred to as the 525 payload content scalability information (PACSI) NAL unit. The PACSI 526 NAL unit, if present, MUST be the first NAL unit in an aggregation 527 packet, and it MUST NOT be present in other types of packets. The 528 PACSI NAL unit indicates view and temporal scalability information 529 and other characteristics that are common to all the remaining NAL 530 units in the payload of the aggregation packet. Furthermore, a PACSI 531 NAL unit MAY include a DONC field and contain zero or more SEI NAL 532 units. The PACSI NAL unit makes it easier for MANEs to decide whether to 533 forward, process, or discard the aggregation packet containing the PACSI 534 NAL unit. Senders MAY create PACSI NAL units and receivers MAY 535 ignore them, or use them as hints to enable efficient aggregation 536 packet processing. Note that the NAL unit type for the PACSI NAL 537 unit is selected among those values that are unspecified in [MVC] and 538 [RFC3984]. 540 When the first aggregation unit of an aggregation packet contains a 541 PACSI NAL unit, there MUST be at least one additional aggregation 542 unit present in the same packet. The RTP header and payload header 543 fields of the aggregation packet are set according to the remaining 544 NAL units in the aggregation packet. 546 When a PACSI NAL unit is included in a multi-time aggregation packet 547 (MTAP), the decoding order number (DON) for the PACSI NAL unit MUST 548 be set to indicate that the PACSI NAL unit has an identical DON to 549 the first NAL unit in decoding order among the remaining NAL units in 550 the aggregation packet.
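Informative sketch (not part of the payload format): the placement rules stated above for the PACSI NAL unit can be checked on the simplest aggregation packet, an STAP-A as defined in [RFC3984] (one payload header octet of type 24 followed by aggregation units, each a 16-bit size in network byte order plus a NAL unit). The function names split_stap_a and pacsi_placement_ok are hypothetical, and MTAPs are not covered.

```python
import struct
from typing import List

STAP_A_TYPE = 24  # STAP-A payload type per RFC 3984
PACSI_TYPE = 30   # PACSI NAL unit type per this memo


def split_stap_a(payload: bytes) -> List[bytes]:
    """Split an STAP-A payload into the NAL units it aggregates."""
    if not payload or (payload[0] & 0x1F) != STAP_A_TYPE:
        raise ValueError("not an STAP-A payload")
    units: List[bytes] = []
    pos = 1  # skip the STAP-A payload header octet
    while pos < len(payload):
        if pos + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        if size == 0 or pos + size > len(payload):
            raise ValueError("NAL unit size exceeds the payload")
        units.append(payload[pos:pos + size])
        pos += size
    return units


def pacsi_placement_ok(units: List[bytes]) -> bool:
    """Check that a PACSI NAL unit, if any, is first and not alone."""
    types = [unit[0] & 0x1F for unit in units]
    if PACSI_TYPE not in types:
        return True  # PACSI NAL units are optional
    return (types[0] == PACSI_TYPE
            and PACSI_TYPE not in types[1:]
            and len(units) >= 2)
```

A receiver that does not understand PACSI NAL units can simply skip the first aggregation unit when its type is 30, since the memo allows receivers to ignore them.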
552 The structure of a PACSI NAL unit is as follows. The first four 553 octets are exactly the same as the four-byte MVC NAL unit header as 554 discussed in section 3.3. They are followed by two always-present 555 octets, two optional octets, and zero or more SEI NAL units, each SEI 556 NAL unit preceded by a 16-bit unsigned size field (in network byte 557 order) that indicates the size of the following NAL unit in bytes 558 (excluding these two octets, but including the NAL unit type octet of 559 the SEI NAL unit). Figure 1 illustrates the PACSI NAL unit structure 560 with an example of a PACSI NAL unit containing two SEI NAL units. 562 The bits P, C, S, and E are specified only if the bit X is equal to 1. 563 The T bit MUST NOT be equal to 1 if the aggregation packet containing 564 the PACSI NAL unit is not an STAP-A packet. The T bit MAY be equal 565 to 1 if the aggregation packet containing the PACSI NAL unit is an 566 STAP-A packet. The field DONC MUST NOT be present if the T bit is 567 equal to 0, and MUST be present if the T bit is equal to 1.

569  0                   1                   2                   3
570  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
571 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
572 |F|NRI|  Type   |S|   PRID    | TID |A|        VID        |I|V|R|
573 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
574 |X|T|RR |P|C|S|E|      RRR      |        DONC (optional)        |
575 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
576 |        NAL unit size 1        |                               |
577 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         SEI NAL unit 1        |
578 |                                                               |
579 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
580 |        NAL unit size 2        |                               |
581 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+         SEI NAL unit 2        |
582 |                                                               |
583 |                                               +-+-+-+-+-+-+-+-+
584 |                                               |
585 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

587 Figure 1. PACSI NAL unit structure 589 The values of the fields in the PACSI NAL unit MUST be set as follows. 590 The term "target NAL units" is used in the semantics of some fields.
591 The target NAL units are those NAL units that are contained in the aggregation 592 packet, but not included in the PACSI NAL unit, and that are within the 593 access unit to which the first NAL unit following the PACSI NAL unit 594 in the aggregation packet belongs. 596 o The F bit MUST be set to 1 if the F bit in at least one of the 597 remaining NAL units in the aggregation packet is equal to 1. 598 Otherwise, the F bit MUST be set to 0. 600 o The NRI field MUST be set to the highest value of the NRI field among 601 all the remaining NAL units in the aggregation packet. 603 o The Type field MUST be set to 30. 605 o The S bit MUST be set to 1. 607 o The PRID field MUST be set to the lowest value of the PRID values 608 of all the remaining NAL units in the aggregation packet. 610 o The TID field MUST be set to the lowest value of the TID values of 611 all the remaining NAL units with the lowest value of VID in the 612 aggregation packet. 614 o The A bit MUST be set to 1 if the A bit of at least one of the 615 remaining NAL units in the aggregation packet is equal to 1. 616 Otherwise, the A bit MUST be set to 0. 618 o The VID field MUST be set to the lowest value of the VID values of 619 all the remaining NAL units in the aggregation packet. 621 o The I bit MUST be set to 1 if the I bit of at least one of the 622 remaining NAL units in the aggregation packet is equal to 1. 623 Otherwise, the I bit MUST be set to 0. 625 o The V bit MUST be set to 1 if the V bit of at least one of the 626 remaining NAL units in the aggregation packet is equal to 1. 627 Otherwise, the V bit MUST be set to 0. 629 o The R bit MUST be set to 0. Receivers SHOULD ignore the value of R. 631 o If the X bit is equal to 1, the bits P, C, S, and E are specified 632 as below. Otherwise, the bits P, C, S, and E are unspecified, and 633 receivers MUST ignore these bits. The X bit SHOULD be identical for 634 all the PACSI NAL units involved in all the RTP sessions conveying an 635 MVC bitstream.
637 o The RR field MUST be set to '00' (in binary form). Receivers 638 SHOULD ignore the value of RR. 640 o If the T bit is equal to 1, the OPTIONAL field DONC MUST be present 641 and specified as below. Otherwise, the field DONC MUST NOT be present. 643 o The P bit MUST be set to 1 if all the remaining NAL units in the 644 aggregation packet have redundant_pic_cnt greater than 0, i.e. the 645 slices are redundant slices. Otherwise, the P bit MUST be set to 0. 647 Informative note: The P bit indicates whether the packet can be 648 discarded because it contains only redundant slice NAL units. 649 Without this bit, the corresponding information would have to be derived 650 from the syntax element redundant_pic_cnt, which is buried in the 651 variable-length coded slice header. 653 o The C bit MUST be set to 1 if the target NAL units belong to an 654 access unit for which the view components are intra coded. Otherwise, 655 the C bit MUST be set to 0. The C bit SHOULD be identical for all 656 the PACSI NAL units for which the target NAL units belong to the same 657 access unit. 659 Informative note: The C bit indicates whether the packet contains 660 intra slices, which may be the only packets to be forwarded for 661 fast-forward playback, e.g. when network conditions are 662 extremely bad. 664 o The S bit MUST be set to 1 if the first VCL NAL unit, in 665 transmission order, of the view component containing the first NAL 666 unit following the PACSI NAL unit in the aggregation packet is 667 present in the aggregation packet. Otherwise, the S bit MUST be set 668 to 0. 670 o The E bit MUST be set to 1 if the last VCL NAL unit, in 671 transmission order, of the view component containing the first NAL 672 unit following the PACSI NAL unit in the aggregation packet is 673 present in the aggregation packet. Otherwise, the E bit MUST be 674 set to 0.
676 Informative note: The S or E bit indicates whether the first or 677 last slice, in transmission order, of a view component is in the 678 packet, to enable a MANE to detect slice loss and take proper 679 action such as requesting a retransmission as soon as possible, 680 as well as to allow efficient playout buffer handling 681 similar to that enabled by the M bit in the RTP header. The M bit in the RTP 682 header still indicates the end of an access unit, not the end of 683 a view component. 685 o The RRR field MUST be set to '00000000' (in binary form). Receivers 686 SHOULD ignore the value of RRR. 688 o When present, the field DONC indicates the CL-DON value for the 689 first NAL unit in the STAP-A in transmission order. 691 SEI NAL units included in the PACSI NAL unit, if any, MUST contain a 692 subset of the SEI messages associated with the access unit of the 693 first NAL unit following the PACSI NAL unit within the aggregation 694 packet. 696 Informative note: Senders may repeat in the PACSI NAL unit those 697 SEI NAL units whose presence in more than one packet is 698 essential for packet loss robustness. Receivers may use the 699 repeated SEI messages in place of missing SEI messages. 701 An SEI message SHOULD NOT be included both in a PACSI NAL unit and 702 in one of the remaining NAL units contained in the same 703 aggregation packet. 705 6.9. Non-Interleaved Multi-Time Aggregation Packets (NI-MTAPs) 707 This section will address the issues of section 4.7.1 of [I-D.draft- 708 ietf-avt-rtp-svc]. 710 6.10. Cross-Session DON (CS-DON) for multi-session transmission 712 This section will address the issues of section 4.11 of [I-D.draft- 713 ietf-avt-rtp-svc]. 715 7. Packetization Rules 717 [Ed.Note(TS): We need to adjust this section with respect to [I- 718 D.draft-ietf-avt-rtp-svc].] 720 Section 6 of [RFC3984] applies. The following rules apply in 721 addition.
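Informative sketch (not part of the normative text): the PACSI NAL unit layout of Figure 1 in section 6.8 can be read with a short parser. The Python below is a minimal sketch under stated assumptions. The names Pacsi and parse_pacsi are hypothetical. Octets 1 to 3 of the MVC NAL unit header are kept opaque, because their layout is defined in section 3.3. The bit positions in the fifth octet are assumed to follow the left-to-right order X, T, RR, P, C, S, E of Figure 1, with X as the most significant bit. Rejecting a zero-length SEI NAL unit is also an assumption, based on the size field including the NAL unit type octet.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

PACSI_NAL_TYPE = 30  # Type value required for PACSI NAL units


@dataclass
class Pacsi:  # hypothetical container, not defined by the draft
    f: int
    nri: int
    header_ext: bytes      # octets 1..3 of the MVC NAL unit header, unparsed
    x: int
    t: int
    p: Optional[int]       # P, C, S, E stay None when X == 0 (unspecified)
    c: Optional[int]
    s: Optional[int]
    e: Optional[int]
    donc: Optional[int]    # present only when T == 1
    sei_nal_units: List[bytes]


def parse_pacsi(buf: bytes) -> Pacsi:
    """Parse one PACSI NAL unit, assuming the layout of Figure 1."""
    if len(buf) < 6:
        raise ValueError("PACSI NAL unit shorter than 6 octets")
    if buf[0] & 0x1F != PACSI_NAL_TYPE:
        raise ValueError("not a PACSI NAL unit (Type != 30)")
    flags = buf[4]
    x, t = flags >> 7, (flags >> 6) & 1
    if x:
        p, c, s, e = ((flags >> shift) & 1 for shift in (3, 2, 1, 0))
    else:
        p = c = s = e = None
    pos = 6                # skip the flags octet and the reserved RRR octet
    donc = None
    if t:
        if len(buf) < pos + 2:
            raise ValueError("T bit set but DONC field missing")
        (donc,) = struct.unpack_from("!H", buf, pos)  # network byte order
        pos += 2
    sei = []
    while pos < len(buf):
        if len(buf) < pos + 2:
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", buf, pos)
        pos += 2
        if size == 0 or pos + size > len(buf):
            raise ValueError("invalid SEI NAL unit size")
        sei.append(buf[pos:pos + size])
        pos += size
    return Pacsi(buf[0] >> 7, (buf[0] >> 5) & 3, buf[1:4],
                 x, t, p, c, s, e, donc, sei)


example = bytes([0x7E, 0x80, 0x00, 0x00,        # F=0, NRI=3, Type=30; octets 1..3
                 0xC6, 0x00,                    # X=1 T=1 RR=00 P=0 C=1 S=1 E=0; RRR
                 0x01, 0x02,                    # DONC = 258
                 0x00, 0x03, 0x06, 0x01, 0x02,  # size 3, then SEI NAL unit 1
                 0x00, 0x02, 0x06, 0xFF])       # size 2, then SEI NAL unit 2
pacsi = parse_pacsi(example)
print(pacsi.donc, [len(u) for u in pacsi.sei_nal_units])  # prints: 258 [3, 2]
```

The example octets are made up for illustration and do not carry valid SEI payloads. A receiver following the text above would ignore P, C, S, and E when X is 0, which the sketch models by returning None for them.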
723 All receivers MUST support the single NAL unit packetization mode to 724 provide backward compatibility to endpoints supporting only the 725 single NAL unit mode of RFC 3984. However, use of the single NAL unit 726 packetization mode SHOULD be avoided whenever possible, since 727 encapsulating NAL units of small sizes, e.g. small NAL units 728 containing parameter sets, SEI messages or prefix NAL units, in their 729 own packets is typically less efficient owing to the relatively large 730 overhead. 732 All receivers MUST support the non-interleaved packetization mode. 734 Informative note: The non-interleaved mode allows an application 735 to encapsulate a single NAL unit in a single RTP packet. 736 Historically, the single NAL unit mode was included in 737 [RFC3984] only for compatibility with ITU-T Rec. H.241 Annex A 738 [H.241]. There is no point in carrying this historic ballast 739 towards a new application space such as the one provided with MVC. 740 More technically speaking, the implementation complexity increase 741 for providing the additional mechanisms of the non-interleaved 742 mode (namely STAP-A and FU-A) is minor, and the benefits are 743 great enough that STAP-A implementation is required. 745 A NAL unit of small size SHOULD be encapsulated in an aggregation 746 packet together with one or more other NAL units. For example, non- 747 VCL NAL units such as access unit delimiters, parameter sets, or SEI 748 NAL units are typically small. 750 A prefix NAL unit SHOULD be aggregated into the same packet as the 751 associated NAL unit following the prefix NAL unit in decoding order. 753 When the first aggregation unit of an aggregation packet contains a 754 PACSI NAL unit, there MUST be at least one additional aggregation 755 unit present in the same packet. 757 When an MVC bitstream is transported in more than one RTP session, 758 the following applies. 760 o Interleaved mode SHOULD be used for all the RTP sessions.
762 o An RTP session that does not use interleaved mode SHOULD be 763 constrained as follows. 765 - Non-interleaved mode MUST be used. 767 - STAP-A MUST be used, and any other type of packet MUST NOT be 768 used. 770 - Each STAP-A MUST contain a PACSI NAL unit and the DONC field MUST 771 be present in the PACSI NAL unit. 773 Informative note: The motivation for these constraints is to 774 allow the use of non-interleaved mode for the session conveying 775 the H.264/AVC compatible view, such that RFC 3984 receivers 776 without interleaved mode implementation can subscribe to the base 777 view session. 779 Non-VCL NAL units SHOULD be conveyed in the same session as the 780 associated VCL NAL units. To meet this, SEI messages that are 781 contained in a scalable nesting SEI message and are applicable to more 782 than one session SHOULD be separated and placed into multiple 783 scalable nesting SEI messages. The DON values MUST indicate the 784 cross-layer decoding order number values as if all these SEI messages 785 were in separate scalable nesting SEI messages and contained at the 786 beginning of the corresponding access units as specified in [MVC]. 788 8. De-Packetization Process (Informative) 790 For a single RTP session, the de-packetization process specified in 791 section 7 of [RFC3984] applies. 793 When more than one of multiple RTP sessions conveying a 794 scalable bitstream is received, an example of a suitable implementation of the 795 de-packetization process is to be specified, similar to what will 796 finally be included in [I-D.draft-ietf-avt-svc]. 798 9. Payload Format Parameters 800 This section specifies the parameters that MAY be used to select 801 optional features of the payload format and certain features of the 802 bitstream. The parameters are specified here as part of the media 803 type registration for the MVC codec.
A mapping of the parameters 804 into the Session Description Protocol (SDP) [RFC4566] is also 805 provided for applications that use SDP. Equivalent parameters could 806 be defined elsewhere for use with control protocols that do not use 807 SDP. 809 9.1. Media Type Registration 811 The media subtype for the MVC codec is allocated from the IETF tree. 813 The receiver MUST ignore any unspecified parameter. 815 Informative note: Requiring receivers to ignore unspecified parameters allows 816 for backward compatibility of future extensions. For example, if 817 a future specification that is backward compatible with this 818 specification specifies some new parameters, then a receiver 819 conforming to this specification is capable of receiving data per 820 the new payload format while ignoring the parameters newly specified in 821 the new payload specification. This requirement is also present in 822 RFC 3984. 824 Media Type name: video 826 Media subtype name: H264-MVC 827 The media subtype "H264" MUST be used for RTP streams using RFC 3984, 828 i.e. not using any of the new features introduced by this 829 specification compared to RFC 3984. For RTP streams using any of the 830 new features introduced by this specification compared to RFC 3984, 831 the media subtype "H264-MVC" SHOULD be used, and the media subtype 832 "H264" MAY be used. Use of the media subtype "H264" for RTP streams 833 using the new features allows RFC 3984 receivers to negotiate and 834 receive H.264/AVC or MVC streams packetized according to this 835 specification, but to ignore media parameters and NAL unit types they 836 do not recognize. 838 Required parameters: none 840 OPTIONAL parameters: to be specified. 842 Encoding considerations: 844 This type is only defined for transfer via RTP (RFC 3550). 846 Security considerations: 848 See section 10 of RFC XXXX. 850 Public specification: 852 Please refer to RFC XXXX and its section 14.
854 Additional information: none 856 File extensions: none 858 Macintosh file type code: none 860 Object identifier or OID: none 862 Person & email address to contact for further information: 864 Intended usage: COMMON 866 Author: NN 868 Change controller: 870 IETF Audio/Video Transport working group delegated from the IESG. 872 9.2. SDP Parameters 874 9.2.1. Mapping of Payload Type Parameters to SDP 876 The media type video/H264-MVC string is mapped to fields in the 877 Session Description Protocol (SDP) as follows: 879 The media name in the "m=" line of SDP MUST be video. 881 The encoding name in the "a=rtpmap" line of SDP MUST be H264-MVC (the 882 media subtype). 884 The clock rate in the "a=rtpmap" line MUST be 90000. 886 The OPTIONAL parameters, when present, MUST be included in the 887 "a=fmtp" line of SDP. These parameters are expressed as a media type 888 string, in the form of a semicolon separated list of parameter=value 889 pairs. 891 9.2.2. Usage with the SDP Offer/Answer Model 893 TBD. 895 9.2.3. Usage with multi-session transmission 897 If multi-session transmission is used, the rules on signaling media 898 decoding dependency in SDP as defined in 899 [I-D.draft-ietf-mmusic-decoding-dependency] apply. 901 9.2.4. Usage in Declarative Session Descriptions 903 TBD. 905 9.3. Examples 907 TBD. 909 9.4. Parameter Set Considerations 911 Please see section 10 of [RFC3984]. 913 10. Security Considerations 915 Please see section 11 of [RFC3984]. 917 11. Congestion Control 919 TBD. 921 12. IANA Considerations 923 Request for media type registration to be added. 925 13. Acknowledgments 927 The author Thomas Schierl of Fraunhofer HHI is sponsored by the 928 European Commission under the contract number FP7-ICT-214063, project 929 SEA. 931 This document was prepared using 2-Word-v2.0.template.dot. 933 14. References 935 14.1. 
Normative References 937 [H.264] ITU-T Recommendation H.264, "Advanced video coding for 938 generic audiovisual services", 3rd Edition, November 2007. 940 [I-D.draft-ietf-avt-rtp-svc] Wenger, S., Wang, Y. -K., Schierl, T. 941 and A. Eleftheriadis, "RTP payload format for SVC video", 942 draft-ietf-avt-rtp-svc-13 (work in progress), July 2008. 944 [I-D.draft-ietf-mmusic-decoding-dependency] Schierl, T., and Wenger, 945 S., "Signaling media decoding dependency in Session 946 Description Protocol (SDP)", draft-ietf-mmusic-decoding- 947 dependency-02 (work in progress), May 2008. 949 [MPEG4-10] 950 ISO/IEC International Standard 14496-10:2005. 952 [MVC] Joint Video Team, "Joint Draft 7 of MVC ", available from 953 http://ftp3.itu.ch/av-arch/jvt-site/2008_04_Geneva/JVT- 954 AA209.zip, Geneva, Switzerland, April 2008. 956 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 957 Requirement Levels", BCP 14, RFC 2119, March 1997. 959 [RFC3548] Josefsson, S., "The Base16, Base32, and Base64 Data 960 Encodings", RFC 3548, July 2003. 962 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and Jacobson, 963 V., "RTP: A Transport Protocol for Real-Time Applications", 964 STD 64, RFC 3550, July 2003. 966 [RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund, M., 967 and Singer, D., "RTP Payload Format for H.264 Video", RFC 968 3984, February 2005. 970 [RFC4566] Handley, M., Jacobson, V., and Perkins, C., "SDP: Session 971 Description Protocol", RFC 4566, July 2006. 973 14.2. Informative References 975 [DVB-H] DVB - Digital Video Broadcasting (DVB); DVB-H 976 Implementation Guidelines, ETSI TR 102 377, 2005. 978 [H.241] ITU-T Rec. H.241, "Extended video procedures and control 979 signals for H.300-series terminals", May 2006. 981 [IGMP] Cain, B., Deering S., Kovenlas, I., Fenner, B., and 982 Thyagarajan, A., "Internet Group Management Protocol, 983 Version 3", RFC 3376, October 2002. 
985 [McCanne] McCanne, S., Jacobson, V., and Vetterli, M., "Receiver- 986 driven layered multicast", in Proc. of ACM SIGCOMM'96, 987 pages 117--130, Stanford, CA, August 1996. 989 [MBMS] 3GPP - Technical Specification Group Services and System 990 Aspects; Multimedia Broadcast/Multicast Service (MBMS); 991 Protocols and codecs (Release 6), December 2005. 993 [MPEG2] ISO/IEC International Standard 13818-2:1993. 995 [RFC3450] Luby, M., Gemmell, J., Vicisano, L., Rizzo, L., and 996 Crowcroft, J., "Asynchronous layered coding (ALC) protocol 997 instantiation", RFC 3450, December 2002. 999 Author's Addresses 1001 Ye-Kui Wang 1002 Huawei Technologies 1003 400 Somerset Corporate Blvd, Suite 602 1004 Bridgewater, NJ 08807 1005 USA 1007 Phone: +1-908-541-3518 1008 EMail: yekuiwang@huawei.com 1010 Thomas Schierl 1011 Fraunhofer HHI 1012 Einsteinufer 37 1013 D-10587 Berlin 1014 Germany 1016 Phone: +49-30-31002-227 1017 EMail: schierl@hhi.fhg.de 1019 15. Open issues: 1021 - The use of CL-DON for session reordering allows also for 1022 interleaved transmission with non-interleaved packetization mode. 1023 There should be a clear separation between both tools. This issue 1024 should be handled the same way as for the SVC payload draft. 1026 - Since SVC session multiplexing (multi source transmission(MST)) is 1027 cleared, it would be great to just reference the MST sections in 1028 [I-D.draft-ietf-avt-rtp-svc]. Since the text in sections 6 and 7 1029 of [I-D.draft-ietf-avt-rtp-svc] is currently very SVC specific, 1030 the authors would have to try to rewrite these sections in a more 1031 generic way. If this is not possible, we need to copy text from 1032 [I-D.draft-ietf-avt-rtp-svc] with respect to MVC. 1034 16. 
Changes Log 1036 Initial version 00 1038 10 November 2007: YkW 1039 Initial version 1041 12 November 2007: TS 1042 - Added definition of "Session multiplexing" 1043 - Added the reference of [I-D.draft-ietf-mmusic-decoding- 1044 dependency], and its reference in section 9.2.3 1046 12 November 2007: YkW 1047 - Added the reference of [I-D.draft-ietf-avt-svc] and its 1048 reference in section 1. 1049 - Added in sections 3.1 and 3.2 paragraphs regarding inter-view 1050 prediction 1052 From draft-wang-avt-rtp-mvc-00 to draft-wang-avt-rtp-mvc-01 1054 18 February 2008: YkW 1055 - Alignment to the latest MVC draft in JVT-Z209 and version 07 1056 of [I-D.draft-ietf-avt-svc]. 1058 25 February 2008: TS 1060 - Minor modifications and updates throughout the document 1062 - Added open issue on clear separation between "decoding order 1063 recovery" and "interleaving" 1065 From draft-wang-avt-rtp-mvc-01 to draft-wang-avt-rtp-mvc-02 1067 09 July 2008: TS 1069 - Minor modifications and updates throughout the document 1071 - Added open issue 1073 - NAL unit header alignment with MVC spec 1075 - Section 6. References corresponding sections in [RFC3984] and [I- 1076 D.draft-ietf-avt-svc]. 1078 - TBD: Section 7, we may align [I-D.draft-ietf-avt-svc] in a way 1079 that SVC is not mentioned in these paragraphs, so that we can 1080 reference them from this document. 1082 21 August 2008: 1084 - Minor modifications, editing and adding notes throughout the 1085 document. 1087 - Updated references 1089 From draft-wang-avt-rtp-mvc-02 to draft-wang-avt-rtp-mvc-03 1091 04 February 2009: YkW 1093 - Updated author's address. 1095 04 February 2009: YkW 1097 - Updated the boiler template. 1099 From draft-wang-avt-rtp-mvc-03 to draft-wang-avt-rtp-mvc-04 1101 22 October 2009: YkW 1103 - Updated author's address and the boiler template (added the last 1104 sentence in Copyright Notice).
1106 From draft-wang-avt-rtp-mvc-04 to draft-wang-avt-rtp-mvc-05 1108 22 April 2010: YkW 1110 - To keep the draft alive, no change other than version number etc.