Network Working Group                                          S. Wenger
Internet Draft                                                Y.-K. Wang
Document: draft-wenger-avt-rtp-svc-03.txt                     T. Schierl
Expires: April 2007
                                                            October 2006


                    RTP Payload Format for SVC Video

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 20, 2007.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   This memo describes an RTP Payload format for the scalable extension
   of the ITU-T Recommendation H.264 video codec, which is technically
   identical to the ISO/IEC International Standard 14496-10 video
   codec.  The RTP payload format allows for packetization of one
   or more Network Abstraction Layer Units (NALUs), produced by the
   video encoder, in each RTP payload.  The payload format has wide
   applicability, as it supports applications from simple low bit-rate
   conversational usage, to Internet video streaming with interleaved
   transmission, to high bit-rate video-on-demand.

Table of Contents

   RTP Payload Format for SVC Video
   1. Introduction
   1.1. SVC -- the scalable extensions of H.264/AVC
   2. Conventions
   3. The SVC Codec
   3.1. Overview
   3.2. Parameter Set Concept
   3.3. Network Abstraction Layer Unit Header
   4. Scope
   5. Definitions and Abbreviations
   5.1. Definitions
   5.2. Abbreviations
   6. RTP Payload Format
   6.1. Design Principles
   6.2. RTP Header Usage
   6.3. Common Structure of the RTP Payload Format
   6.4. NAL Unit Header Usage
   6.5. Packetization Modes
   6.6. Decoding Order Number (DON)
   6.7. Single NAL Unit Packet
   6.8. Aggregation Packets
   6.9. Fragmentation Units (FUs)
   6.10. Payload Content Scalability Information (PACSI) NAL Unit
   7. Packetization Rules
   8. De-Packetization Process (Informative)
   9. Payload Format Parameters
   9.1. MIME Registration
   9.2. SDP Parameters
   9.2.1. Mapping of MIME Parameters to SDP
   9.2.2. Usage with the SDP Offer/Answer Model
   9.2.3. Usage with Session and SSRC multiplexing
   9.2.4. Usage in Declarative Session Descriptions
   9.3. Examples
   9.4. Parameter Set Considerations
   10. Security Considerations
   11. Congestion Control
   12. IANA Consideration
   13. Informative Appendix: Application Examples
   13.1. Introduction
   13.2. Layered Multicast
   13.3. Streaming of an SVC scalable stream
   13.4. Multicast to MANE, SVC scalable stream to endpoint
   13.5. SSRC Multiplexing in case of using SRTP
   13.6. Scenarios currently not considered for complexity reasons
   13.7. Scenarios currently not considered for being unaligned with
         IP philosophy
   14. Acknowledgements
   15. References
   15.1. Normative References
   15.2. Informative References
   16. Author's Addresses
   17. Intellectual Property Statement
   18. Disclaimer of Validity
   19. Copyright Statement
   20. RFC Editor Considerations
   21. Open Issues
   22. Changes Log

1. Introduction

1.1. SVC -- the scalable extensions of H.264/AVC

   This memo specifies an RTP [RFC3550] payload format for a
   forthcoming new mode of the H.264/AVC video codec, known as Scalable
   Video Coding (SVC).  Formally, SVC will take the form of an Amendment
   to ISO/IEC 14496 Part 10 [MPEG4-10], and likely as one or more new
   Annexes of ITU-T Rec. H.264 [H.264].  It is planned to keep the
   technical alignment between the two mentioned specifications, as
   well as backward compatibility with previous versions of H.264/AVC.

   The current working draft of SVC is available for public review
   [SVC].  In this memo, SVC is used as an acronym for the mentioned
   scalable extensions of H.264/AVC.

   SVC covers all of H.264/AVC's applications, encompassing all forms
   of digitally compressed video, from low bit-rate Internet streaming
   applications to HDTV broadcast and Digital Cinema applications with
   nearly lossless coding.

   This memo tries to follow a backward compatible enhancement
   philosophy similar to what the video coding standardization
   committees implement, by keeping as close an alignment to the
   H.264/AVC payload RFC [RFC3984] as possible.  It basically documents
   the enhancements relevant from an RTP transport viewpoint, defines
   signaling support for SVC, and deprecates the single NAL unit
   packetization mode of RFC 3984.

2. Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

   This specification uses the notion of setting and clearing a bit
   when bit fields are handled.  Setting a bit is the same as assigning
   that bit the value of 1 (On).  Clearing a bit is the same as
   assigning that bit the value of 0 (Off).

3. The SVC Codec

3.1. Overview

   SVC provides scalable video bitstreams.  In SVC, a scalable video
   bitstream contains a base layer conforming to the existing profiles
   of H.264 as defined in [H.264] and one or more enhancement layers.
   An enhancement layer may enhance the temporal resolution (i.e. the
   frame rate), the spatial resolution, or the quality of the video
   content represented by the lower layer or part thereof.  The
   scalable layers can be aggregated to a single RTP packet stream, or
   transported independently.

   The concept of video coding layer (VCL) and network abstraction
   layer (NAL) is inherited from H.264.  The VCL contains the signal
   processing functionality of the codec; mechanisms such as transform,
   quantization, motion-compensated prediction, loop filtering and
   inter-layer prediction.  A coded picture of a base or enhancement
   layer consists of one or more slices.
   The Network Abstraction Layer
   (NAL) encapsulates each slice generated by the VCL into one or more
   Network Abstraction Layer Units (NAL units).  Please consult RFC 3984
   for a more in-depth discussion of the NAL unit concept.  SVC
   specifies the decoding order of these NAL units.

      [Edt. Note: The definition of a "coded picture" is currently
      under discussion in JVT.  For now, we apply the same
      definition as in the AVC specification within a given scalable
      layer.  That is, a "coded picture" consists of all the coded
      slices having identical values of dependency_id,
      quality_level and redundant_pic_cnt, respectively, in one
      access unit.]

   The term "Layer" in Video Coding Layer and Network Abstraction
   Layer refers to a conceptual distinction, and is closely related to
   syntax layers (block, macroblock, slice, ... layers).  "Layer" here
   describes a syntax level of the bitstream in contrast to the meaning
   of layer as a nested part of the bitstream which may be discarded.
   It should not be confused with base and enhancement layers.

   The concept of temporal scalability is not newly introduced by SVC,
   as H.264 already supports it.  In [H.264], sub-sequences have been
   introduced in order to allow optional use of temporal layers.  [SVC]
   extends this approach by advertising the temporal layer information
   within the NAL unit header, or suffix NAL units, as discussed in
   section 3.3 and [SVC].  By our definition, the base layer may be
   scalable in the temporal dimension (only).

   The concept of scaling the visual content quality in the granularity
   of complete enhancement layers, i.e. through omitting the transport
   and decoding of entire enhancement layers, is denoted as coarse-
   grained scalability (CGS).  This is what is commonly understood as
   scalability in the IETF community.  According to SVC, a CGS layer
   may be a spatial or quality (SNR) enhancement layer.
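
   Informative example: the following is a minimal sketch of CGS-style
   adaptation, not part of this specification.  It drops every NAL unit
   whose dependency_id exceeds a chosen limit, reading DID from the
   third header byte as laid out in section 3.3.  The function names
   are hypothetical, and two simplifications are assumed: filtering
   looks at dependency_id only (not quality_level), and NAL units of
   types other than 20 and 21 are treated as base layer data.

```python
SVC_NAL_TYPES = (20, 21)  # NAL unit types carrying the three extra header bytes


def dependency_id(nal: bytes) -> int:
    """Return DID for an SVC NAL unit, or 0 for any other NAL unit type."""
    if not nal:
        raise ValueError("empty NAL unit")
    if (nal[0] & 0x1F) not in SVC_NAL_TYPES:
        return 0  # assumption: non-SVC NAL units belong to the base layer
    if len(nal) < 4:
        raise ValueError("truncated SVC NAL unit header")
    # Third header byte holds TL (3 bits), DID (3 bits), QL (2 bits).
    return (nal[2] >> 2) & 0x07


def drop_layers_above(nal_units, max_did):
    """Keep only NAL units whose dependency_id does not exceed max_did."""
    return [nal for nal in nal_units if dependency_id(nal) <= max_did]
```

   A MANE or receiver applying such a filter removes whole enhancement
   layers while leaving the remaining NAL units untouched.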

   In some cases, the bit rate of a given enhancement layer may be
   reduced by truncating bits from individual NAL units.  Truncation
   leads to a graceful degradation of the video quality of the
   reproduced enhancement layer.  This concept is known as Fine
   Granularity Scalability (FGS).  In SVC, FGS is provided by a concept
   known as progressive refinement slices.

3.2. Parameter Set Concept

   The parameter set concept is inherited from [H.264].  Please see
   section 1.2 of RFC 3984 for more details.

   In SVC, pictures from different layers may use the same sequence or
   picture parameter set, but may also use different sequence or
   picture parameter sets.  If different sequence or picture parameter
   sets are used, then, at any time instant during the decoding
   process, there may be more than one active sequence or picture
   parameter set.  Any specific active sequence parameter set remains
   unchanged throughout a coded video sequence in the layer in which
   the active sequence parameter set is referred to.  The active
   picture parameter set remains unchanged within a coded picture.

3.3. Network Abstraction Layer Unit Header

   An SVC NAL unit consists of a header of four bytes and the payload
   byte string.  SVC thereby extends the NAL unit header defined in
   [H.264] by three additional bytes.  The header indicates the type of
   the NAL unit, the (potential) presence of bit errors or syntax
   violations in the NAL unit payload, information regarding the
   relative importance of the NAL unit for the decoding process, the
   layer decoding dependency information, and FGS fragmentation
   information.  This RTP payload specification is designed to be
   unaware of the bit string in the NAL unit payload.

   The NAL unit header co-serves as the payload header of this RTP
   payload format.  The payload of a NAL unit follows immediately.
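
   Informative example: the four-byte header and the field layout given
   in the remainder of this section can be sketched as the following
   parser.  This is an illustration under stated assumptions, not a
   normative definition: the function name and the dictionary keys are
   chosen here for readability, and NAL units of types other than 20
   and 21 are assumed to carry only the one-byte H.264 header.

```python
def parse_nal_header(nal: bytes):
    """Split a NAL unit into (header fields, payload bytes)."""
    if not nal:
        raise ValueError("empty NAL unit")
    fields = {
        "F": nal[0] >> 7,             # forbidden_zero_bit
        "NRI": (nal[0] >> 5) & 0x03,  # nal_ref_idc
        "Type": nal[0] & 0x1F,        # nal_unit_type
    }
    header_len = 1
    if fields["Type"] in (20, 21):    # SVC types: three more header bytes
        if len(nal) < 4:
            raise ValueError("truncated SVC NAL unit header")
        b1, b2, b3 = nal[1], nal[2], nal[3]
        fields.update({
            "RR": b1 >> 6,            # reserved_zero_two_bits
            "PRID": b1 & 0x3F,        # simple_priority_id
            "TL": b2 >> 5,            # temporal_level
            "DID": (b2 >> 2) & 0x07,  # dependency_id
            "QL": b2 & 0x03,          # quality_level
            "R": b3 >> 7,             # reserved_zero_bit
            "B": (b3 >> 6) & 1,       # layer_base_flag
            "U": (b3 >> 5) & 1,       # use_base_prediction_flag
            "D": (b3 >> 4) & 1,       # discardable_flag
            "G": (b3 >> 3) & 1,       # fragmented_flag
            "L": (b3 >> 2) & 1,       # last fragment indication
            "O": b3 & 0x03,           # fragment order
        })
        header_len = 4
    return fields, nal[header_len:]
```

   The parser never inspects the payload bytes, in line with the
   design goal that the payload format is unaware of the bit string in
   the NAL unit payload.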

   The syntax and semantics of the NAL unit header are formally
   specified in [SVC], but the essential properties of the NAL unit
   header are summarized below.

   The first byte of the NAL unit header has the following format (the
   bit fields are the same as in [H.264] and [RFC3984], while the
   semantics have changed slightly, in a backward compatible way):

      +---------------+
      |0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+
      |F|NRI|  Type   |
      +---------------+

   F: 1 bit
   forbidden_zero_bit.  H.264 declares a value of 1 as a syntax
   violation.

   NRI: 2 bits
   nal_ref_idc.  A value of 00 indicates that the content of the NAL
   unit is not used to reconstruct reference pictures for inter picture
   prediction.  Such NAL units can be discarded without risking the
   integrity of the reference pictures in the same layer.  Values
   greater than 00 indicate that the decoding of the NAL unit is
   required to maintain the integrity of the reference pictures.

   Type: 5 bits
   nal_unit_type.  This component specifies the NAL unit payload type
   as defined in table 7-1 of [SVC], and later within this memo.  For a
   reference of all currently defined NAL unit types and their
   semantics, please refer to section 7.4.1 in [SVC].

   Previously, NAL unit types 20 and 21 (among others) have been
   reserved for future extensions.  SVC is using these two NAL unit
   types.  They indicate the presence of three more bytes as shown
   below.

      +---------------+---------------+---------------+
      |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |RR |   PRID    | TL  | DID | QL|R|B|U|D|G|L| O |
      +---------------+---------------+---------------+

   RR: 2 bits
   reserved_zero_two_bits.  Reserved bits for future extension.  RR
   MUST be zero.

   PRID: 6 bits
   simple_priority_id.  This component specifies a priority identifier
   for the NAL unit.
   A lower value of PRID indicates a higher
   priority.

   TL: 3 bits
   temporal_level indicates the temporal layer (or frame rate)
   hierarchy.  Informally put, a layer consisting of pictures with a
   smaller temporal_level value has a lower frame rate.  A given
   temporal layer typically depends on the lower temporal layers (i.e.
   the temporal layers with smaller temporal_level values) but never
   depends on any higher temporal layer.

   DID: 3 bits
   dependency_id denotes the inter-layer coding dependency hierarchy.
   At any temporal location, a picture of a smaller dependency_id value
   may be used for inter-layer prediction for coding of a picture of a
   larger dependency_id value, while a picture of a larger
   dependency_id value is not allowed to be used for inter-layer
   prediction for coding of a picture of a smaller dependency_id value.

   QL: 2 bits
   quality_level designates the quality level hierarchy of a
   progressive refinement (PR) or quality (SNR) enhancement layer
   slice.  At any temporal location and with identical dependency_id
   value, a picture with quality_level equal to ql uses a picture with
   quality_level equal to ql-1 for inter-layer prediction.

   R: 1 bit
   reserved_zero_bit.  Reserved bit for future extension.  R MUST be
   zero.

   B: 1 bit
   layer_base_flag, when set, indicates that no inter-layer prediction
   (of coding mode, motion, sample value, and/or residual prediction)
   is used for the current slice; otherwise, inter-layer prediction
   may be used.

   U: 1 bit
   use_base_prediction_flag indicates that the base representation of
   the reference pictures (i.e. only NAL units of the reference
   pictures with QL equal to zero are used for inter prediction) is
   used during the inter prediction process.

   D: 1 bit
   discardable_flag.
   A value of 1 indicates that the content of the
   NAL unit with dependency_id equal to currDependencyId is not used in
   the decoding process of NAL units with dependency_id larger than
   currDependencyId.  Such NAL units can be discarded without risking
   the integrity of higher scalable layers with larger values of
   dependency_id.  discardable_flag equal to 0 indicates that the
   decoding of the NAL unit is required to maintain the integrity of
   higher scalable layers with larger values of dependency_id.

   G: 1 bit
   fragmented_flag indicates that the current NAL unit is fragmented,
   which may be the case for partitions of an FGS (progressive
   refinement) slice.

   L: 1 bit
   last_fragmented_flag indicates that the NAL unit is the last
   fragment of a fragmented NAL unit.

   O: 2 bits
   fragment_order indicates the order in which the NAL units with
   fragmented_flag equal to 1 shall be ordered before the parsing
   process is started, starting from lower values.

   This memo introduces the same additional NAL unit types as RFC 3984,
   which are presented in section 6.3.  The NAL unit types defined in
   this memo are marked as unspecified in [SVC].  Moreover, this
   specification extends the semantics of F, NRI, PRID, D, TL, DID and
   QL as described in section 6.4.

4. Scope

   This payload specification can only be used to carry the "naked" SVC
   NAL unit stream over RTP, and not the byte stream format according
   to Annex B of [SVC].  The applications of this specification will
   likely be in IP-based multimedia communications fields, including
   conversational multimedia, video telephony or video conferencing,
   Internet streaming and TV over IP.

   This specification allows NAL units belonging to any of the
   following to be encapsulated in a given RTP session:
   o the base layer only, as specified in detail in [RFC3984], or
   o one or more enhancement layers, or
   o the base layer and one or more enhancement layers.

5. Definitions and Abbreviations

5.1. Definitions

   This document uses the definitions of [SVC] and [H.264].  The
   following terms, defined in [SVC], are summed up for convenience:

   scalable bitstream:  An SVC compliant bit stream containing a base
   layer and at least one enhancement layer.

   suffix NAL unit:  A NAL unit that immediately follows another NAL
   unit in decoding order and contains descriptive information of the
   preceding NAL unit, which is referred to as the associated NAL unit.

   A suffix NAL unit shall have nal_unit_type equal to 20 or 21, shall
   have dependency_id and quality_level both equal to 0, and shall not
   contain a coded slice.  A suffix NAL unit belongs to the same coded
   picture as the associated NAL unit.  A suffix NAL unit may be used
   for indicating temporal levels within the base layer.

   base layer:  The base layer typically represents the minimal
   spatial resolution and/or minimal quality of an SVC bitstream.  The
   base layer must fully comply with [H.264].  The base layer is
   independently decodable without the requirement of using any other
   layer of the SVC bitstream.  In the SVC context, each slice NAL unit
   in the base layer is associated with a suffix NAL unit, which has a
   four-byte NAL unit header containing all the syntax elements
   described in section 3.3.

      [Edt. Note: The definition of "base layer" is not entirely
      clear, mainly because of temporal scalability.  One definition
      is to call all the coded pictures in the lowest inter-layer
      coding hierarchy (i.e. having both dependency_id and
      quality_level equal to 0) as the base layer.
      This concept
      works perfectly if there is no temporal scalability.  Another
      definition is to call all the coded pictures having
      temporal_level, dependency_id and quality_level all equal to
      0 as the base layer.  Yet another definition is to define the
      layer for which the bitstream of the scalable layer
      representation is non-scalable as the base layer.  However,
      the absolutely non-scalable stream is the bitstream
      consisting of only one IDR picture having both dependency_id
      and quality_level equal to 0.]

   operation point:  An operation point of an SVC bitstream represents
   a certain level of temporal, spatial and quality scalability.  An
   operation point contains all NAL units required for restoring a
   valid bitstream (conforming to [SVC]) up to a certain SVC layer.
   The operation point is further described by simple_priority_id,
   temporal_level, dependency_id, and quality_level values of that
   layer.

   scalable enhancement layer:  An SVC enhancement layer is identified
   by simple_priority_id, temporal_level, dependency_id, and
   quality_level as defined in [SVC] and summarized in section 3.3.

   access unit:  A set of NAL units pertaining to a certain temporal
   location.  An access unit includes the slice data of the pictures of
   all scalable layers at that temporal location and possibly other
   associated data, e.g. SEI messages and parameter sets.

   coded video sequence:  A sequence of access units that consists, in
   decoding order, of an instantaneous decoding refresh (IDR) access
   unit followed by zero or more non-IDR access units including all
   subsequent access units up to but not including any subsequent IDR
   access unit.

   IDR access unit:  An access unit in which all the primary coded
   pictures are IDR pictures.  Such an access unit allows for random
   access to any layer combination.

   IDR picture:  A coded picture with the property that the decoding of
   this coded picture and all the following coded pictures in decoding
   order, with the same value of dependency_id, can be performed
   without inter prediction from any picture prior to the coded picture
   in decoding order with the same value of dependency_id.  Thus an IDR
   picture allows for random access to the scalable layer to which it
   belongs.  An IDR picture causes a "reset" in the decoding process
   of the scalable layer containing the IDR picture.

   progressive refinement (PR) slice:  A progressive refinement slice
   is contained in an SVC NAL unit that may be truncated anywhere after
   the end of the slice header for bit-rate and quality reduction.  PR
   slices provide Fine Granularity Scalability (FGS).

   The following terms are itemized for clarification on RTP
   multiplexing strategies.  For further information and discussion on
   RTP multiplexing, we refer to section 5.2 of [RFC3550]:

   RTP packet stream:  A sequence of RTP packets with increasing
   sequence numbers, identical PT and SSRC, carried in one RTP session,
   and utilized to transport an integer number of SVC layers (which may
   be FGS scalable).

   Single-Sender RTP Session:  A (possibly multicast) RTP session in
   which all RTP packet streams in the session stem from entities that
   are in close cooperation, and can coordinate SSRC values.  By
   definition, in Single-Sender RTP Sessions, SSRC collisions on the
   forward media path cannot occur.  Note that, in practice, the
   "entities in close cooperation" likely run on the same machine and
   communicate through non-protocol means, or they communicate by
   protocols outside the RTP/SIP/SDP environment.

   Session multiplexing:  The scalable SVC bitstream is distributed
   onto different RTP sessions, whereby each RTP session carries one
   RTP packet stream.
   Each RTP session requires separate signaling
   and has a separate Timestamp, Sequence Number, and SSRC space.
   Dependency between sessions MUST be signaled according to
   [SDPsiglay].

   SSRC multiplexing:  The scalable SVC bitstream is distributed in a
   single RTP session, but that session comprises more than one RTP
   packet stream, each identified by its SSRC.
   The use of SSRC multiplexing MUST be signaled according to
   [SDPsiglay].

5.2. Abbreviations

   In addition to the abbreviations defined in [RFC3984], the following
   ones are defined.

   CGS:  Coarse Granularity Scalability
   FGS:  Fine Granularity Scalability

6. RTP Payload Format

6.1. Design Principles

   The authors observed the following design principles:

   o Backward compatibility with RFC 3984 wherever possible.

   o As the SVC base layer is H.264/AVC compatible, we assume the base
     layer (when transmitted in its own session) to be
     encapsulated using RFC 3984.  Requiring this has the desirable
     side effect that it can be used by RFC 3984 legacy devices.

   o MANEs are signaling aware and rely on signaling information.
     MANEs have state.

   o MANEs can terminate RTP sessions, and create different RTP
     sessions with perhaps modified content.  This form of a MANE acts
     as an RTP mixer.  Mixer-MANEs necessarily need to be in the SRTP
     security context.

   o MANEs can also perform very limited functionality, namely
     aggregate multiple RTP packet streams into a single RTP stream
     within the same session, by utilizing SSRC multiplexing.  In this
     case, a MANE acts as a translator, and does not necessarily need
     to be in the security context.

   o Packet integrity needs to be preserved end-to-end (whereby
     end-to-end can mean endpoint to endpoint but also endpoint to
     MANE, if (and only if) the MANE acts as a Mixer).

   o In case of layered multicast transmission as motivated in section
     13.2, SVC layers are transported in different RTP sessions
     (Session multiplexing).  If the application should require a
     layered transmission at the session level, the SVC layers are
     transported in different RTP packet streams within a single RTP
     session, each stream identified by a unique SSRC (SSRC
     multiplexing).  SSRC multiplexing may further allow for adaptation
     of an RTP session in the security context; further discussion can
     be found in section 13.5.

6.2. RTP Header Usage

   Please see section 5.1 of RFC 3984 [RFC3984].  The following applies
   in addition.

   When different layers of an SVC bitstream are transported over more
   than one RTP session, e.g. in layered multicast, for which the use
   case is given in 13.2, SSRC multiplexing, as described below, MAY be
   applied.

   When SSRC multiplexing is in use, the same IP address and port
   number are shared between all RTP streams and all layers, while the
   relative importance for the decoding process of each RTP stream
   and/or layer is differentiated by the SSRC values.  The SSRC value
   space is evenly allocated to a number of sub value spaces, with the
   number of sub value spaces being equal to the number of RTP packet
   streams forming the RTP session for which SSRC multiplexing is used.
   The first RTP packet stream conveying the lowest layers is mapped to
   the first sub SSRC value space with the lowest SSRC values, the
   second RTP packet stream conveying the second lowest layers is
   mapped to the second sub SSRC value space with the second lowest
   SSRC values, and so on.  For the RTP packets of a certain RTP packet
   stream, the SSRC value is randomly selected from the corresponding
   sub SSRC value space.  This way, a packet with a higher SSRC value
   contains data belonging to higher layers or layers of lower
   transport priority.
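
   Informative example: the SSRC allocation described above can be
   sketched as follows.  This is one possible reading, not a normative
   procedure.  The function names are hypothetical, the SSRC space is
   taken to be the 32-bit range of [RFC3550], and, as an assumption for
   the case where that range does not divide evenly, the last sub value
   space absorbs the remainder.

```python
import random

SSRC_SPACE = 1 << 32  # SSRC is a 32-bit field in the RTP header


def ssrc_subspace(stream_index, num_streams):
    """Return the inclusive (low, high) SSRC range for a packet stream.

    stream_index 0 is the stream conveying the lowest layers.
    """
    if not 0 <= stream_index < num_streams:
        raise ValueError("stream_index out of range")
    size = SSRC_SPACE // num_streams
    low = stream_index * size
    if stream_index == num_streams - 1:
        high = SSRC_SPACE - 1  # assumption: last range takes the remainder
    else:
        high = low + size - 1
    return low, high


def pick_ssrc(stream_index, num_streams, rng=random):
    """Randomly select an SSRC from the sub value space of a stream."""
    low, high = ssrc_subspace(stream_index, num_streams)
    return rng.randint(low, high)


def stream_index_for_ssrc(ssrc, num_streams):
    """Map an observed SSRC back to the index of its packet stream."""
    size = SSRC_SPACE // num_streams
    return min(ssrc // size, num_streams - 1)
```

   With this mapping a MANE that knows the number of packet streams
   can rank an incoming packet by its SSRC alone, which is the property
   the paragraph above relies on.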

   SSRC multiplexing as discussed above, in conjunction with multicast
   from multiple senders, requires that a) all streams SSRC multiplexed
   in the same session carry data of the same layered bitstream, and b)
   the different senders are aware (by unspecified means of
   signaling) of the relative importance of the RTP packet streams they
   emit.  Otherwise, it would be impossible to enforce the allocation
   of SSRC numbering spaces according to the importance for the
   decoding process.  In other words, SSRC multiplexing as discussed
   above works only for Single-Sender RTP sessions.

   Note: in practice, it appears that SSRC multiplexing, due to the
   above limitation, results in requiring a single entity to send all
   RTP packet streams.  No signaling means are currently available that
   would allow different senders to coordinate the SSRC value spaces to
   use.

6.3. Common Structure of the RTP Payload Format

   Please see section 5.2 of RFC 3984 [RFC3984].

6.4. NAL Unit Header Usage

   The structure and semantics of the NAL unit header were introduced
   in section 3.3.  This section specifies the semantics of F, NRI,
   PRID, D, TL, DID, QL, B, U, G, L, and O according to this
   specification.

   The semantics of F specified in section 5.3 of [RFC3984] also
   apply herein.

   For NRI, for the bitstream that is compliant with [H.264], the
   semantics specified in section 5.3 of [RFC3984] are applicable;
   otherwise only the semantics specified in SVC [SVC] are applicable.

   For PRID, the semantics specified in [SVC] apply.  MANEs
   implementing unequal error protection may use this information to
   protect NAL units with smaller PRID values better than those with
   larger PRID values, for example by including only the more important
   NAL units in a FEC protection mechanism.  The desirable transport
   priority decreases as the PRID value increases.
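
   Informative example: a MANE that includes only the more important
   NAL units in FEC protection might select them as sketched below.
   The function names and the max_prid threshold are hypothetical, and
   the handling of NAL units without a PRID field (types other than 20
   and 21) is an assumption made for this illustration: they are always
   protected.

```python
def prid(nal: bytes):
    """Return simple_priority_id for NAL unit types 20/21, else None."""
    if len(nal) >= 2 and (nal[0] & 0x1F) in (20, 21):
        return nal[1] & 0x3F  # low six bits of the second header byte
    return None


def select_for_fec(nal_units, max_prid):
    """Return the NAL units to be covered by FEC.

    NAL units with PRID <= max_prid are selected, since smaller PRID
    values are protected better.
    """
    protected = []
    for nal in nal_units:
        p = prid(nal)
        # Assumption: NAL units without a PRID field are always protected.
        if p is None or p <= max_prid:
            protected.append(nal)
    return protected
```

   How the threshold is chosen (for example from the available FEC
   overhead budget) is outside the scope of this sketch.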
613 For D, MANEs may use this information to protect NAL units with D
614 equal to 0 better than NAL units with D equal to 1. Furthermore, a
615 MANE or a receiver may determine whether a given NAL unit is
616 required for successfully decoding a certain operation point of the
617 SVC bitstream.

619 For TL, DID and QL, in addition to the semantics specified in [SVC],
620 according to this memo, values of TL, DID or QL indicate the
621 relative priority in their respective dimension. A lower value of
622 TL, DID or QL indicates a higher priority if the other two
623 components are respectively identical. MANEs may use this
624 information to protect more important NAL units better than less
625 important NAL units.

627 Informative note: PRID, D, TL, DID, and QL, in combination,
628 provide complete information about the relative priority of a NAL
629 unit compared to any other NAL unit. [Edt. note: examples may be
630 provided in Informative Appendix 13 in future versions.]

632 For B, in addition to the semantics specified in [SVC], according to
633 this memo, a MANE or receiver may use this information in order to
634 identify the [H.264] conforming base layer NAL units (if marked by a
635 suffix NAL unit) and may determine their temporal layer (by the TL
636 value of the suffix NAL unit). This allows for generating
637 an outgoing RTP stream with a certain temporal scalability layer
638 that conforms to [RFC3984] and [H.264].

640 For U, the semantics specified in [SVC] apply.

642 For G, L and O, in addition to the semantics specified in [SVC],
643 according to this memo, a MANE or receiver may detect a fragmented
644 PR slice by G, L and O. Using this knowledge, the MANE may perform
645 FGS adaptation on the PR slice by not forwarding all of the
646 fragments in fragment_order (O).

648 6.5. Packetization Modes

650 Please see section 5.4 of RFC 3984 [RFC3984]. The single NAL unit
651 packetization mode SHALL NOT be used.
653 Informative note: The non-interleaved mode allows an application
654 to encapsulate a single NAL unit in a single RTP packet.
655 Historically, the single NAL unit mode was included in
656 [RFC3984] only for compatibility with ITU-T Rec. H.241 Annex A.
657 There is no point in carrying this historic ballast into a new
658 application space such as the one provided with SVC. More
659 technically speaking, the implementation complexity increase for
660 providing the additional mechanisms of the non-interleaved mode
661 (namely STAPs) is so minor, and the benefits are so great, that we
662 require STAP implementation.

664 6.6. Decoding Order Number (DON)

666 Please see section 5.5 of RFC 3984 [RFC3984]. The following applies
667 in addition.

669 When different layers of an SVC bitstream are transported in more
670 than one RTP packet stream (regardless of the use of session or SSRC
671 multiplexing, or a combination thereof), the interleaved
672 packetization mode MUST be used, and the DON values of all the NAL
673 units MUST indicate the correct NAL unit decoding order over all the
674 RTP packet streams. If Session multiplexing is used, each session
675 MUST signal the same values for the MIME parameters
676 sprop-interleaving-depth, sprop-max-don-diff, sprop-deint-buf-req,
677 and sprop-init-buf-time (marked as optional, but mandatory for this
678 use case). Furthermore, these values must be valid for the reception
679 capabilities over all sessions. A receiver MUST signal the same value
680 of the MIME parameter deint-buf-cap (marked as optional, but mandatory
681 for this use case) for all sessions used for Session multiplexing.

683 6.7. Single NAL Unit Packet

685 Please see section 5.6 of RFC 3984 [RFC3984].

687 6.8. Aggregation Packets

689 Please see section 5.7 of RFC 3984 [RFC3984].

691 6.9. Fragmentation Units (FUs)

693 Please see section 5.8 of RFC 3984 [RFC3984].

695 6.10.
Payload Content Scalability Information (PACSI) NAL Unit

697 A new NAL unit type is specified and referred to as the payload content
698 scalability information (PACSI) NAL unit. The PACSI NAL unit, if
699 present, MUST be the first NAL unit in an aggregation packet, and it
700 MUST NOT be present in other types of packets. The PACSI NAL unit
701 indicates scalability characteristics that are common for all the
702 remaining NAL units in the payload, thus making it easier for MANEs
703 to decide whether to forward or discard the packet. Senders MAY
704 create PACSI NAL units and receivers can ignore them.

706 Informative note: The NAL unit type for the PACSI NAL unit is
707 selected among those values that are unspecified in the H.264/AVC
708 specification and in RFC 3984 -- and therefore are ignored by
709 receivers. Hence an SVC stream, even when including PACSI NAL
710 units, can be processed with RFC 3984 receivers and H.264/AVC
711 decoders.

713 When the first aggregation unit of an aggregation packet contains a
714 PACSI NAL unit, there MUST be at least one additional aggregation
715 unit present in the same packet. The RTP header fields are set
716 according to the remaining NAL units in the aggregation packet.

718 When a PACSI NAL unit is included in a multi-time aggregation
719 packet, the decoding order number for the PACSI NAL unit MUST be set
720 to indicate that the PACSI NAL unit is the first NAL unit in
721 decoding order among the NAL units in the aggregation packet or that
722 the PACSI NAL unit has an identical decoding order number to the first
723 NAL unit in decoding order among the remaining NAL units in the
724 aggregation packet.

726 The structure of the PACSI NAL unit is exactly the same as the four-byte
727 SVC NAL unit header specified in section 3.3, reproduced here once more
728 for convenience:
729 +---------------+---------------+---------------+---------------+
730 |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
731 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
732 |F|NRI|  Type   |RR |   PRID    | TL  | DID | QL|R|B|U|D|G|L| O |
733 +---------------+---------------+---------------+---------------+

735 The values of the fields in PACSI NAL unit MUST be set as follows.

737 o The F bit MUST be set to 1 if the F bit in at least one remaining
738 NAL unit in the payload is equal to 1. Otherwise, the F bit MUST
739 be set to 0.

741 o The NRI field MUST be set to the highest value of NRI field among
742 all the remaining NAL units in the payload.

744 o The Type field MUST be set to 30.

746 o The RR field or reserved_zero_two_bits field (2 bits) MUST be set
747 to 0.

749 o The PRID field MUST be set to the lowest value of the PRID values
750 associated with all the remaining NAL units in the payload.

752 o The TL field MUST be set to the lowest value of the TL values
753 associated with all the remaining NAL units in the payload.

755 o The DID field MUST be set to the lowest value of the DID values
756 associated with all the remaining NAL units in the payload.

758 o The QL field MUST be set to the lowest value of the QL values
759 associated with all the remaining NAL units in the payload.

761 o The R field or reserved_zero_bit field (1 bit) MUST be set to 0.

763 o The B field or layer_base_flag field (1 bit) MUST be set to 1 if
764 the layer_base_flag associated with all the remaining NAL units in
765 the payload is equal to 1. Otherwise, layer_base_flag MUST be set
766 to 0.

768 o The U field or use_base_prediction_flag field (1 bit) MUST be set
769 to 1 if the use_base_prediction_flag associated with all the
770 remaining NAL units in the payload is equal to 1. Otherwise,
771 use_base_prediction_flag MUST be set to 0.
773 o The D bit MUST be set to 0 if the D value associated with at least
774 one remaining NAL unit in the payload is equal to 0. Otherwise,
775 the D bit MUST be set to 1.

777 o The G field or fragmented_flag field (1 bit) MUST be set to 1 if
778 the fragmented_flag associated with all the remaining NAL units in
779 the payload is equal to 1. Otherwise, fragmented_flag MUST be set
780 to 0.

782 o The L field or last_fragment_flag field (1 bit) MUST be set to 1 if
784 the last_fragment_flag associated with all the remaining NAL units
785 in the payload is equal to 1. Otherwise, last_fragment_flag MUST
786 be set to 0.

788 o The O field or fragment_order field (2 bits) MUST be set to the
789 lowest value of fragment_order associated with all the remaining NAL
790 units in the payload.

792 7. Packetization Rules

794 Please see section 6 of RFC 3984 [RFC3984]. The following rules
795 apply in addition.

797 The single NAL unit mode SHALL NOT be used. (See also section 6.5
798 for the motivation.)

800 When a suffix NAL unit is encapsulated for transmission, it SHOULD
801 be aggregated to the same transmission packet as the NAL unit
802 preceding the suffix NAL unit in decoding order.

804 When different layers of an SVC bitstream are transported in more
805 than one RTP packet stream, the interleaved packetization mode MUST
806 be used.

808 8. De-Packetization Process (Informative)

810 Please see section 7 of RFC 3984 [RFC3984]. The following rules
811 apply in addition.

813 [Edt. note: Do we need more information here about cross-layer DON?
814 Maybe in the next version.]

816 9. Payload Format Parameters

818 [Edt. note: this section 9 and its subsections will be updated
819 according to the changes listed below, a little later in the
820 process. For now, we just list the adjustments necessary, so as not to
821 bury any new information in the RFC 3984 text.]

823 Section 8 of [RFC3984] applies with the following modification.
825 The sentence

827 "The parameters are specified here as part of the MIME subtype
828 registration for the ITU-T H.264 | ISO/IEC 14496-10 codec."

830 is replaced with
831 "The parameters are specified here as part of the MIME subtype
832 registration for the SVC codec."

834 9.1. MIME Registration

836 Editor's note: this needs to be updated by copy-pasting the
837 RFC 3984 MIME registration into this document, so as to make it
838 self-contained. Will be done later in the process.

840 The MIME subtype for the SVC codec is allocated from the IETF tree.

842 The receiver MUST ignore any unspecified parameter.

844 Media Type name: video

846 Media subtype name: H.264-SVC

848 Required parameters: none

850 OPTIONAL parameters:

852 The optional MIME parameters specified in [RFC3984] apply, with the
853 following constraints (to be edited in at the appropriate time):

855 sprop-interleaving-depth:
856 In case of using Session multiplexing, the same sprop-interleaving-
857 depth value MUST be signaled for all sessions and MUST be valid over
858 all sessions of the multiplex.

860 sprop-max-don-diff:
861 In case of using Session multiplexing, the same sprop-max-don-diff
862 value MUST be signaled for all sessions and MUST be valid over all
863 sessions of the multiplex.

865 sprop-deint-buf-req:
866 In case of using Session multiplexing, the same sprop-deint-buf-req
867 value MUST be signaled for all sessions and MUST be valid over all
868 sessions of the multiplex.

870 sprop-init-buf-time:
872 In case of using Session multiplexing, the same sprop-init-buf-time
873 value MUST be signaled for all sessions and MUST be valid over all
874 sessions of the multiplex.

876 deint-buf-cap:
877 In case of using Session multiplexing, the same deint-buf-cap value
878 MUST be signaled by the receiver for all sessions and MUST be valid
879 over all sessions of the multiplex.
881 In addition, the following optional MIME parameters apply:

883 sprop-scalability-info:
884 This parameter MAY be used to convey the NAL unit containing the
885 scalability information SEI message that MUST precede any other NAL
886 units in decoding order. The parameter MUST NOT be used to indicate
887 codec capability in any capability exchange procedure. The value of
888 the parameter is the base64 representation of the NAL unit
889 containing the scalability information SEI message as specified in
890 [SVC].

892 sprop-transport-priority:
893 This parameter MAY be used to signal the transport priority
894 indicator value(s) in terms of the second and third bytes of the SVC NAL
895 unit header for one or more SVC layer(s) conveyed in one RTP
896 session. A transport priority indicator is base64 coded. If more
897 than one layer is transmitted within one RTP session, the transport
898 priority indicator value of each layer MUST be itemized with
899 decreasing importance for decoding and MUST be comma-separated.

901 Encoding considerations:
902 This type is only defined for transfer
903 via RTP (RFC 3550).

905 Security considerations:
906 See section 10 of this specification.

908 Public specification:
909 Please refer to section 15 of this
910 specification.

912 Additional information:
913 None

915 File extensions: none
916 Macintosh file type code: none
917 Object identifier or OID: none
918 Person & email address to contact for further information:
919 Intended usage: COMMON
920 Author:
921 Change controller:
922 IETF Audio/Video Transport working group
923 delegated from the IESG.

925 9.2. SDP Parameters

927 9.2.1. Mapping of MIME Parameters to SDP

929 The MIME media type video/SVC string is mapped to fields in the
930 Session Description Protocol (SDP) as follows:

932 * The media name in the "m=" line of SDP MUST be video.

934 * The encoding name in the "a=rtpmap" line of SDP MUST be SVC (the
935 MIME subtype).
937 * The clock rate in the "a=rtpmap" line MUST be 90000.

939 * The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
940 "max-cpb", "max-dpb", "max-br", "redundant-pic-cap", "sprop-
941 parameter-sets", "parameter-add", "packetization-mode", "sprop-
942 interleaving-depth", "deint-buf-cap", "sprop-deint-buf-req",
943 "sprop-init-buf-time", "sprop-max-don-diff", "max-rcmd-nalu-
944 size", "sprop-transport-priority", and "sprop-scalability-
945 info", when present, MUST be included in the "a=fmtp" line of
946 SDP. These parameters are expressed as a MIME media type string,
947 in the form of a semicolon-separated list of parameter=value
948 pairs.

950 9.2.2. Usage with the SDP Offer/Answer Model

952 TBD.

954 9.2.3. Usage with Session and SSRC multiplexing

956 If Session or SSRC multiplexing is used, the rules on signaling
957 media decoding dependency in SDP as defined in [SDPsiglay] apply.
958 Furthermore, the use of SSRC multiplexing must be signaled according to
959 [SDPsiglay].

961 9.2.4. Usage in Declarative Session Descriptions

963 TBD.

965 9.3. Examples

967 TBD.

969 9.4. Parameter Set Considerations

971 Please see section 10 of RFC 3984 [RFC3984].

973 10. Security Considerations

975 Please see section 11 of RFC 3984 [RFC3984].

977 11. Congestion Control

979 Within any given RTP session carrying payload according to this
980 specification, the provisions of section 12 of RFC 3984 [RFC3984]
981 apply.

983 One key motivation for the recent attention to scalable codecs has
984 been the increasing awareness of network congestion among media
985 codec designers. While CGS scalability cannot reduce congestion for the
986 transport path of a given RTP session, MANEs and layered multicast
987 technologies can be used to alleviate congestion on a larger scale.
988 FGS scalability can be helpful to reduce session bandwidth both end-
989 to-end (with pre-coded content) and in network segments, again
990 assuming the use of MANEs.
992 MANEs MAY alleviate congestion on their outgoing network path by
993 a) removing the NAL units belonging to the hierarchically "highest"
994 enhancement layer (or set of enhancement layers) from an RTP
995 stream carrying base and enhancement layers.
996 b) removing some or all bits of a given FGS NAL unit as long as the
997 remaining bits still form a conforming SVC NAL unit.

999 [Edt. Note: In the following paragraph, "translator" and "mixer"
1000 are not used consistently with RFC 3550. What we think we would
1001 need is a "mixer" that mixes only a single input into a single output
1002 (as a mixer terminates sessions). A "translator" (that does not
1003 terminate the RTP session) carries certain unnecessary baggage which
1004 appears to make it undesirable for MANEs. The following paragraph
1005 can either be fixed into RFC 3550 style and logic (thereby removing
1006 an operation point we consider desirable), or we would need to
1007 explain in detail what we want to do (not really congestion control
1008 related and long). Perhaps we refer to the detailed discussions in
1009 the CCM draft... Added to open issues.

1011 In both cases, the incoming RTP session is terminated in the MANE,
1012 and a second RTP session originates at the MANE. The MANE acts as
1013 an RTP translator. The concept of scalability keeps the
1014 implementation and computational effort within the MANE low, and
1015 avoids expensive and delay-intensive full transcoding (in the sense
1016 of reconstruction and re-encoding).]

1018 When scalable layers are transported in their own RTP sessions, an
1019 RTP receiver SHOULD unsubscribe from one or more enhancement layers
1020 when it senses congestion, similar to what has been described in
1021 [McCanne/Vetterli]. This behavior could perhaps be sufficient to
1022 ease the network load to an acceptable level of congestion.
1023 Nevertheless, the receiver MUST follow the mechanisms described in
1024 section 12 of [RFC3984].

1026 12.
IANA Considerations

1028 [Edt. Note: A new MIME type should be registered with IANA.]

1030 13. Informative Appendix: Application Examples
1031 13.1. Introduction

1033 Scalable video coding is a concept that has been around at least
1034 since MPEG-2 [MPEG2], which goes back as early as 1993.
1035 Nevertheless, it has never gained wide acceptance, perhaps partly
1036 because applications didn't materialize in the form envisioned
1037 during standardization.

1039 MPEG and JVT, respectively, performed a requirements analysis before
1040 the SVC project was launched. Dozens of scenarios have been
1041 studied. While some of the scenarios appear not to follow the most
1042 basic design principles of the Internet -- and are therefore not
1043 appropriate for IETF standardization -- others are clearly in the
1044 scope of IETF work. Of these, this draft chooses the following
1045 subset for immediate consideration. Note that we do not reference
1046 the MPEG and JVT documents directly; partly because at least the
1047 MPEG documents have a limited lifespan and are not publicly
1048 available, and partly because the language used in these documents
1049 is inappropriately video-centric and imprecise when it comes to
1050 protocol matters.

1052 With these remarks, we now introduce three main application
1053 scenarios that we consider relevant and that are implementable
1054 with this specification.

1056 13.2. Layered Multicast

1058 This well-understood form of the use of layered coding
1059 [McCanne/Vetterli] implies that all layers are individually conveyed
1060 in their own RTP packet streams, each carried in its own RTP session
1061 using the IP (multicast) address and port number as the single
1062 demultiplexing point. Receivers "tune" into the layers by
1063 subscribing to the IP multicast, normally by using IGMP [IGMP].

1065 Layered Multicast has the great advantage of simplicity and easy
1066 implementation.
However, it also has the great disadvantage of
1067 utilizing many different transport addresses. While we consider
1068 this not to be a major problem for a professionally maintained
1069 content server, receiving client endpoints need to open many ports
1070 to IP multicast addresses in their firewalls. This is a practical
1071 problem from a firewall/NAT viewpoint. Furthermore, even today IP
1072 multicast is not as widely deployed as many wish.

1074 We consider layered multicast an important application scenario for
1075 three reasons. First, it is well understood and the implementation
1076 constraints are well known. Second, there may well be large-scale IP
1077 networks outside the immediate Internet context that may wish to
1078 employ layered multicast in the future. One possible example could
1079 be a combination of content creation and core-network distribution
1080 for the various mobile TV services, e.g. those being developed by
1081 3GPP (MBMS) [MBMS] and DVB (DVB-H) [DVB-H]. Finally, when one base
1082 and one enhancement layer are in use and are being conveyed
1083 separately, that represents one operation point of layered
1084 multicast.

1086 13.3. Streaming of an SVC scalable stream

1088 In this scenario, a streaming server has a repository of stored SVC
1089 coded layers for a given content. At the time of streaming, and
1090 according to the capabilities and connectivity of the client(s), the
1091 streaming server generates a scalable stream. This scalable stream
1092 is served to the client(s). Both unicast and multicast serving are
1093 possible. At the same time, the streaming server may use the same
1094 repository of stored layers to compose different streams (with a
1095 different set of layers) intended for different audiences.

1097 As every endpoint receives only a single SVC RTP session, the number
1098 of firewall pinholes can be optimized. In fact, only a single
1099 firewall pinhole is required.
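Informative note: The composition step described in this section can be sketched as follows. The sketch rests on simplifying assumptions that are not part of this specification: the stored layers form a single dependency chain (each layer depends on all layers before it), the per-layer bitrates are known to the server, and the base layer is always sent even if it alone exceeds the budget. The function name select_layers is hypothetical.

```python
def select_layers(layer_bitrates, budget):
    """Choose how many stored layers to include in the composed stream.

    layer_bitrates: per-layer bitrates in bit/s, base layer first.
    budget: bitrate in bit/s the client connection can sustain.
    Returns (number_of_layers, total_bitrate).
    """
    if not layer_bitrates:
        raise ValueError("at least the base layer is required")
    chosen = 1                  # the base layer is always included
    total = layer_bitrates[0]
    for rate in layer_bitrates[1:]:
        if total + rate > budget:
            break               # higher layers depend on this one; stop
        total += rate
        chosen += 1
    return chosen, total
```

Calling the function again with a different budget composes a different stream from the same repository of stored layers, for example for a different audience.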
1101 The main difference between this scenario and straightforward
1102 simulcasting lies in the architecture and the requirements of the
1103 streaming server, and is therefore out of the scope of IETF
1104 standardization. However, compelling arguments can be made why such
1105 a streaming server design makes sense. One possible argument is
1106 related to storage space and channel bandwidth. Another is
1107 bandwidth adaptivity without transcoding -- a considerable advantage
1108 in a congestion controlled network. When the streaming server
1109 learns about congestion, it can reduce sending bitrate by choosing
1110 fewer layers when composing the layered stream. SVC is designed to
1111 gracefully support both bandwidth rampdown and bandwidth rampup with
1112 a considerable dynamic range. This payload format is designed to
1113 allow for bandwidth flexibility in the mentioned sense, both for CGS
1114 and FGS layers. While, in theory, a transcoding step could achieve
1115 a similar dynamic range, the computational demands are impractically
1116 high and video quality is typically lowered -- therefore, few (if
1117 any) streaming servers implement full transcoding.

1119 13.4. Multicast to MANE, SVC scalable stream to endpoint

1121 This final scenario is a bit more complex, and designed to optimize
1122 the network traffic in a core network, while still requiring only a
1123 single pinhole in the endpoint's firewall. One of its key
1124 applications is the mobile TV market.

1126 Consider a large IP network, e.g. the core network of 3GPP.
1127 Streaming servers within this core network can be assumed to be
1128 professionally maintained. We assume that these servers can have
1129 many ports open to the network and that layered multicast is a real
1130 option. Therefore, we assume that the streaming server multicasts
1131 SVC scalable layers, instead of simulcasting different
1132 representations of the same content at different bit rates.
1134 Also consider many endpoints of different classes. Some of these
1135 endpoints may not have the processing power or the display size to
1136 meaningfully decode all layers; others may have these capabilities.
1137 Users of some endpoints may not wish to pay for high quality and are
1138 happy with a base service, which may be cheaper or even free. Other
1139 users are willing to pay for high quality. Finally, some connected
1140 users may have a bandwidth problem in that they can't receive the
1141 bandwidth they would want to receive -- be it through congestion,
1142 connectivity, change of service quality, or for whatever other
1143 reasons. However, all these users have in common that they don't
1144 want to be exposed too much, and therefore the number of firewall
1145 pinholes needs to be small.

1147 This situation can be handled best by introducing middleboxes close
1148 to the edge of the core network, which receive the layered multicast
1149 streams and compose the single SVC scalable bit stream according to
1150 the needs of the endpoint connected. These middleboxes are called
1151 MANEs throughout this specification. In practice, we envision the
1152 MANE to be part of (or at least physically and topologically close
1153 to) the base station of a mobile network, where all the signaling
1154 and media traffic necessarily are multiplexed on the same physical
1155 link. This is why we do not worry too much about decomposition
1156 aspects of the MANE as such.

1158 MANEs necessarily need to be fairly complex devices. They certainly
1159 need to understand the signaling in order, for example, to associate
1160 the PT octet in the RTP header with the SVC payload type.

1162 A MANE may terminate the multicast layered RTP sessions incoming
1163 from the core network side, and create new RTP sessions (perhaps
1164 even multicast sessions) to the endpoints connected to it. In RTP
1165 terminology, these types of MANEs are RTP mixers.
This implies, per
1166 RFC 3550, a very loose relationship between the incoming and
1167 outgoing RTP sessions. In particular, there is no direct
1168 relationship between the incoming and outgoing RTP sequence numbers,
1169 RTP timestamps, payload types used, etc.

1171 Mixer-based MANEs are conceptually easy to implement and can offer
1172 powerful features, primarily because they necessarily can "see" the
1173 payload (including the RTP payload headers), utilize the wealth of
1174 layering information available therein, and manipulate it.

1176 While a mixer-based MANE operation in its most trivial form
1177 (combining multiple RTP packet streams into a single one) can be
1178 implemented comparatively simply -- reordering the incoming packets
1179 according to the DON and sending them in the appropriate order --
1180 more complex forms can also be envisioned. For example, a mixer-
1181 type MANE can optimize the outgoing RTP stream to the MTU size
1182 of the outgoing path by utilizing the aggregation and fragmentation
1183 mechanisms of this memo.

1185 A MANE can also act as a translator. In this case, we envision its
1186 functionality to be limited to the manipulation of the transport
1187 addresses, so as to enable SSRC multiplexing. The most compelling use
1188 case appears to be to forward multiple incoming RTP packet streams
1189 (conveyed to their own transport addresses) to a single firewall
1190 pinhole. The translator variant of the MANE does not terminate RTP
1191 sessions, but rather "translates" them in a very simple way -- by
1192 changing the transport address -- so as to SSRC-multiplex multiple
1193 sessions onto a single transport address. What sounds trivial at
1194 first glance is in reality a highly complex process, primarily
1195 due to the need for appropriate RTCP processing.
This is
1196 particularly true when individual packets are intentionally being
1197 pruned or removed from the incoming session, which may be necessary
1198 to support FGS.

1200 Translator-based MANEs appear to be able to offer a limited amount
1201 of functionality without being in the security context, which opens
1202 up additional application range. Whether this form of a translator-
1203 based MANE is actually feasible, and whether it offers sufficient
1204 benefits to warrant the additional specification burden, is open for
1205 discussion, and input is solicited.

1207 While the implementation complexity of either case of a MANE, as
1208 discussed above, is fairly high, the computational demands are
1209 comparatively low. In particular, SVC and/or this specification
1210 contain means to easily generate the correct inter-layer decoding
1211 order of NAL units. It is also simple to identify the fine
1212 granularity scalable bits in a given NAL unit. No serious bit-
1213 oriented processing is required and no significant state information
1214 (beyond that of the signaling and perhaps the SVC sequence parameter
1215 sets) needs to be kept.

1217 13.5. SSRC Multiplexing in case of using SRTP

1219 When SRTP is in use, it is not possible to take advantage of the in-
1220 band information (SEI messages, NAL unit headers, PACSI NAL units)
1221 when processing layered streams. Therefore, a MANE outside the
1222 security context cannot make informed decisions when aggregating
1223 information. Some relevant information must be available in the RTP
1224 header to make meaningful decisions.

1226 The first, and most obvious, choice is to map SSRC values directly
1227 to certain layers by means of signaling. As MANEs need to be in
1228 the signaling context, this appears to be sensible. However, it
1229 requires a per-SSRC signaling mechanism -- a demultiplexing point
1230 that is currently not envisioned in SDP.
1232 A second design choice is to somehow make available the information
1233 about the properties of a specific layer -- to the extent a MANE can
1234 make a meaningful decision -- in the SSRC value. In other words, the
1235 SSRC is no longer chosen fully at random, but selected based on context.
1236 This is possible only when limiting the scope to a single sender to
1237 a multicast group, because the various senders have no means to
1238 coordinate their choice of SSRC values. In practice, that's not a
1239 major limitation.

1241 Any form of such a selection of SSRC values has two major drawbacks.
1242 First, without a sufficiently large random component the probability
1243 of SSRC collisions increases to a point that becomes unacceptable.
1244 We address this point by discouraging the use of multi-sender
1245 multicast. When only a single sender emits packets in a given RTP
1246 session, it can be expected that this sender is able to avoid SSRC
1247 collisions. In addition, we require a sufficiently large random
1248 component in the SSRC generation, which is constant for each layer
1249 stemming from the same sender. With this approach the probability of
1250 SSRC collisions is lowered, and the random component can be kept as
1251 large as 26 bits, assuming that the SVC bitstream in question contains
1252 64 layers.

1254 Second, and more critical, a straightforward copy of values known to
1255 be present at fixed locations in the RTP payload would make it easy
1256 for codebreakers to attack an SRTP encrypted stream, because an
1257 unencrypted representation and the encrypted form of a known value
1258 would both be present in the same packet. This is outright
1259 unacceptable from a security viewpoint.

1261 Therefore, we do not allow information to be simply copied from the
1262 bitstream into the SSRC field.
Instead, we rely on a non-reversible
1263 function that also necessarily contains the aforementioned random
1264 component and that, when executed, indicates the relative priority
1265 difference between two layers (signaled by two SSRC values).
1266 The SSRC value space is divided evenly into a number of sub value
1267 spaces, with the number of sub value spaces being equal to the
1268 number of RTP sessions for which SSRC multiplexing is used. Then
1269 the first RTP session conveying the lowest layers is mapped to the
1270 first sub SSRC value space with the lowest SSRC values, and the
1271 second RTP session conveying the second lowest layers is mapped to
1272 the second sub SSRC value space with the second lowest SSRC values,
1273 and so on. For the RTP packets of a certain RTP session, the SSRC
1274 value is randomly selected from the corresponding sub SSRC value
1275 space. This way, a packet with a higher SSRC value contains data
1276 belonging to higher layers or layers of lower transport priority.

1278 A translator-based MANE can make use of the aforementioned SSRC
1279 values as follows. Suppose that the MANE has identified, through
1280 sensed congestion or other unspecified means, that it needs to
1281 discard packets belonging to higher layers, say K of the N buffered
1282 packets, to maintain a packet sending rate. It then identifies the K
1283 packets with the highest SSRC values and discards them.

1285 13.6. Scenarios currently not considered for complexity reasons

1287 -- vacat --

1289 13.7. Scenarios currently not considered for being unaligned with
1290 IP philosophy

1292 Remarks have been made that the current draft does not take into
1293 consideration at least one application scenario which some JVT folks
1294 consider important.
In particular, their idea is to make the RTP 1295 payload format (or the media stream itself) self-contained enough 1296 that a stateless, non-signaling-aware device can "thin" an RTP 1297 session to meet the bandwidth demands of the endpoint. They call 1298 this device a "Router" or "Gateway", and sometimes a MANE. 1299 Obviously, it's not a Router or Gateway in the IETF sense. To 1300 distinguish it from a MANE as defined in RFC 3984 and in this 1301 specification, let's call it an MDfH (Magic Device from Heaven). 1303 To simplify discussions, let's assume point-to-point traffic only. 1304 The endpoint has a signaling relationship with the streaming server, 1305 but it is known that the MDfH is somewhere in the media path (e.g. 1306 because the physical network topology ensures this). It has been 1307 requested, at least implicitly through MPEG's and JVT's requirements 1308 document, that the MDfH should be capable of intercepting the SVC 1309 scalable bit stream, modifying it by dropping packets or parts thereof, 1310 and forwarding the resulting packet stream to the receiving 1311 endpoint. It has been requested that this payload specification 1312 contain protocol elements facilitating such an operation, and the 1313 argument has been made that the NRI field of RFC 3984 serves exactly 1314 the same purpose. 1316 The authors of this I-D do not consider the scenario above to be 1317 aligned with the most basic design philosophies the IETF follows, 1318 and therefore have not addressed the comments made (except through 1319 this section). In particular, we see the following problems with 1320 the MDfH approach: 1322 - At the very minimum, the MDfH would need to know which RTP streams 1323 are carrying SVC. We don't see how this could be accomplished other 1324 than by using a static payload type.
None of the IETF-defined RTP 1325 profiles envisions static payload types for SVC, and even the 1326 de facto profiles developed by some application standards 1327 organizations (3GPP for example) do not use this outdated concept. 1328 Therefore, the MDfH necessarily needs to be at least "listening" 1329 to the signaling. 1330 - If the RTP packet payload were encrypted, it would be impossible 1331 to interpret the payload header and/or the first bytes of the 1332 media stream. We understand that there are crypto schemes under 1333 discussion that encrypt only the last n bytes of an RTP payload, 1334 but we are more than unsure that this is fully in line with the 1335 IETF's security vision. 1337 Even if the above two problems were overcome through 1338 standardization outside of the IETF, we still foresee serious design 1339 flaws: 1341 - An MDfH can't simply dump RTP packets it doesn't want to forward. 1342 It either needs to act as a full RTP Translator (implying that it 1343 patches RTCP RRs and such), or it needs to patch the RTP sequence 1344 numbers to fulfill the RTP specification. If it does neither, 1345 the gaps in the sequence numbers look to the receiver like 1346 unintentional erasures, which has interesting 1347 effects on congestion control (if implemented), breaks pretty 1348 much every meta-payload ever developed, and so on. (Many more 1349 points could be made here.) 1350 - An MDfH also can't "prune" FGS packets. Again, doing so would 1351 not be compatible with meta-payloads, and would mess up RTCP RRs 1352 and congestion control (if the congestion control is based on 1353 octet count and not on packet count; there are discussions related 1354 to the former at least in the context of TFRC). 1356 In summary, based on our current knowledge, we are not willing to 1357 specify protocol mechanisms that support an operation point that has 1358 so little in common with classic RTP use. 1360 14.
Acknowledgements 1362 Funding for the RFC Editor function is currently provided by the 1363 Internet Society. Further, the author Thomas Schierl of Fraunhofer 1364 HHI is sponsored by the European Commission under contract 1365 number FP6-IST-0028097, project ASTRALS. 1367 15. References 1369 15.1. Normative References 1371 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1372 Jacobson, "RTP: A Transport Protocol for Real-Time 1373 Applications", STD 64, RFC 3550, July 2003. 1374 [MPEG4-10] ISO/IEC International Standard 14496-10:2003. 1375 [H.264] ITU-T Recommendation H.264, "Advanced video coding for 1376 generic audiovisual services", May 2003. 1377 [SDPsiglay] Schierl, T., "Signaling media decoding dependency in 1378 Session 1379 Description Protocol (SDP)", IETF Internet-Draft 1380 draft-schierl-mmusic-layered-codec-01, October 2006. 1381 [SVC] Joint Video Team, "Annex G of Joint Draft 7 of SVC 1382 Amendment 1383 (with proposed changes)", available from 1384 http://ftp3.itu.ch/av-arch/jvt-site/2006_07_Klagenfurt/JVT-T202.zip, 1386 July 2006. 1387 [RFC3984] Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund, M., 1388 and Singer, D., "RTP Payload Format for H.264 Video", RFC 3984, 1389 February 2005. 1391 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1392 Requirement Levels", BCP 14, RFC 2119, March 1997. 1394 15.2. Informative References 1396 [DVB-H] DVB - Digital Video Broadcasting (DVB); DVB-H 1397 Implementation Guidelines, ETSI TR 102 377, 2005. 1398 [IGMP] Cain, B., Deering, S., Kouvelas, I., Fenner, B., and 1399 Thyagarajan, A., "Internet Group Management Protocol, 1400 Version 3", RFC 3376, October 2002. 1401 [McCanne/Vetterli] 1402 V. Jacobson, S. McCanne and M. Vetterli. Receiver-driven 1403 layered multicast. In Proc. of ACM SIGCOMM'96, pages 1404 117--130, Stanford, CA, August 1996.
1405 [MBMS] 3GPP - Technical Specification Group Services and System 1406 Aspects; Multimedia Broadcast/Multicast Service (MBMS); 1407 Protocols and codecs (Release 6), December 2005. 1408 [MPEG2] ISO/IEC International Standard 13818-2:1993. 1409 [SRTP] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and 1410 Norrman, K., "The Secure Real-time Transport Protocol 1411 (SRTP)", RFC 3711, March 2004. 1413 16. Authors' Addresses 1415 Stephan Wenger Phone: +358-50-486-0637 1416 Nokia Research Center Email: stewe@stewe.org 1417 P.O. Box 100 1418 FIN-33721 Tampere 1419 Finland 1421 Ye-Kui Wang Phone: +358-50-486-7004 1422 Nokia Research Center Email: ye-kui.wang@nokia.com 1423 P.O. Box 100 1424 FIN-33721 Tampere 1425 Finland 1427 Thomas Schierl Phone: +49-30-31002-227 1428 Fraunhofer HHI Email: schierl@hhi.fhg.de 1429 Einsteinufer 37 1430 D-10587 Berlin 1431 Germany 1433 17. Intellectual Property Statement 1435 The IETF takes no position regarding the validity or scope of any 1436 Intellectual Property Rights or other rights that might be claimed to 1437 pertain to the implementation or use of the technology described in 1438 this document or the extent to which any license under such rights 1439 might or might not be available; nor does it represent that it has 1440 made any independent effort to identify any such rights. Information 1441 on the procedures with respect to rights in RFC documents can be 1442 found in BCP 78 and BCP 79. 1444 Copies of IPR disclosures made to the IETF Secretariat and any 1445 assurances of licenses to be made available, or the result of an 1446 attempt made to obtain a general license or permission for the use of 1447 such proprietary rights by implementers or users of this 1448 specification can be obtained from the IETF on-line IPR repository at 1449 http://www.ietf.org/ipr.
1451 The IETF invites any interested party to bring to its attention any 1452 copyrights, patents or patent applications, or other proprietary 1453 rights that may cover technology that may be required to implement 1454 this standard. Please address the information to the IETF at 1455 ietf-ipr@ietf.org. 1457 18. Disclaimer of Validity 1459 This document and the information contained herein are provided on an 1460 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1461 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 1462 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 1463 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 1464 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1465 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1467 19. Copyright Statement 1468 Copyright (C) The Internet Society (2006). This document is subject 1469 to the rights, licenses and restrictions contained in BCP 78, and 1470 except as set forth therein, the authors retain all their rights. 1472 20. RFC Editor Considerations 1474 none 1476 21. Open Issues 1478 1. Need to double check MANE, Mixers, and Translators throughout the 1479 document (consistently with RFC 3550). 1480 2. Packetization rules need work. 1481 3. Alignment with the SVC specification (ongoing) 1482 4. In context of SSRC multiplexing: make consistent higher/lower 1483 layers vs. RTP packet streams of higher/lower importance. 1485 22. 
Change Log 1487 From -00 to -01 1489 - 04.02.2006, StW: Added details to scope 1490 - 04.02.2006, StW: Added short subsection 6.1 "Design Principles" 1491 - 04.02.2006, StW: Added section 15, "Application Examples" 1492 - 06.02 - 03.03.2006, YkW: Various modifications throughout the 1493 document 1494 - 13.02.2006 - 03.03.2006, ThS: Added definitions and additional 1495 information to sections 3.3, 5.1, 7 and 8, parameters in section 9.1 and 1496 added section 14 for NAL unit re-ordering for layered multicast. 1497 Further modifications throughout the document 1499 From -01 to -02 1501 - 06.03.2006, StW: Editorial improvements 1502 - 26.05.2006, YkW: Updated NAL unit header syntax and semantics 1503 according to the latest draft SVC spec 1504 - 20.06.2006, Miska/YkW: Added section 6.10 "Payload Content 1505 Scalability Information (PACSI) NAL Unit" 1506 - 20.06.2006, YkW: Updated the NAL unit reordering process for layered 1507 multicast (removed the old section 14 "Informative Appendix: NAL Unit 1508 Re-ordering for Layered Multicast" and added the new section 13 "NAL 1509 Unit Reordering for Layered Multicast") 1511 From -02 to -03 1512 - 05.09.2006, YkW: Updated the NAL unit header syntax, definitions, 1513 etc., according to the foreseen July JVT output. Updated possible MANE 1514 adaptation operations according to SPID, TL, DID and QL. Clarified the 1515 removal of single NAL unit packetization mode. Added the support of 1516 SSRC multiplexing in layered multicast. 1517 - 08.09.2006, StW: Editorial changes throughout the document 1518 - 08.09.2006, YkW: Added the packetization rule for suffix NAL unit. 1519 - 19.09.2006, YkW: Moved/updated SSRC multiplexing support to section 1520 6.2 "RTP header usage". Moved/updated the cross layer DON constraint 1521 to Section 6.6 "Decoding order number". Moved/updated the 1522 packetization rule when an SVC bitstream is transported over more than 1523 one RTP session to Section 7 "Packetization rules".
Removed Section 13 1524 "Support of layered multicast". 1525 - 16.10, TS: Added detailed four-byte NAL unit header description. 1526 Changed "AVC" to "H.264" conforming to 3984. Modifications throughout 1527 the document. Extended description of 3rd byte of PACSI NAL unit. 1528 Corrected terms RTP session and RTP packet stream in case of SSRC 1529 multiplexing. Added terms in definition section on RTP multiplexing. 1530 Constraints on optional MIME parameters of 3984 for cross-layer DON 1531 (DON section and MIME parameters). Copied parts of SI paper regarding 1532 mixer, translator and SSRC mux with SRTP to section application 1533 examples. Added section on SDP usage with Session and SSRC 1534 multiplexing. Added points in Design principles on translator/mixer and 1535 RTP multiplexing. Added additional funding information in the Acknowledgements 1536 section. Corrected reference for SVC and added reference for generic 1537 signaling. 1538 - 17.10, StW: Fixed many editorials, clarified MANE, mixer, translator 1539 and RTP packet stream throughout doc (hopefully consistently) 1540 - 18.10., removed comments, clarified B-Bit, changed definition of base 1541 layer (does not need to be of the lowest temporal resolution),