idnits 2.17.1 

draft-wenger-avt-rtp-svc-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 1465.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1442.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1449.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1455.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 2006) is 6396 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'SRTP' is defined on line 1409, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'MPEG4-10'

  == Outdated reference: A later version (-04) exists of
     draft-schierl-mmusic-layered-codec-01

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SVC'

  ** Obsolete normative reference: RFC 3984 (Obsoleted by RFC 6184)


     Summary: 4 errors (**), 0 flaws (~~), 4 warnings (==), 9 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                        S. Wenger
3	Internet Draft                                               Y.-K. Wang
4	Document: draft-wenger-avt-rtp-svc-03.txt                    T. Schierl
5	Expires: April 2007
6	                                                          October 2006

8	                   RTP Payload Format for SVC Video

10	Status of this Memo

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been or will be disclosed, and any of which he or she becomes
15	   aware will be disclosed, in accordance with Section 6 of BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on April 20, 2007.

35	Copyright Notice

37	   Copyright (C) The Internet Society (2006).

39	Abstract

41	   This memo describes an RTP payload format for the scalable extension
42	   of the ITU-T Recommendation H.264 video codec, which is technically
43	   identical to the ISO/IEC International Standard 14496-10
44	   video codec.  The RTP payload format allows for packetization of one
45	   or more Network Abstraction Layer Units (NALUs), produced by the
46	   video encoder, in each RTP payload.  The payload format has wide
47	   applicability, as it supports applications from simple low bit-rate
48	   conversational usage, to Internet video streaming with interleaved
49	   transmission, to high bit-rate video-on-demand.

51	Table of Contents

53	   RTP Payload Format for SVC Video
54	   1. Introduction
55	   1.1. SVC -- the scalable extensions of H.264/AVC
56	   2. Conventions
57	   3. The SVC Codec
58	   3.1. Overview
59	   3.2. Parameter Set Concept
60	   3.3. Network Abstraction Layer Unit Header
61	   4. Scope
62	   5. Definitions and Abbreviations
63	   5.1. Definitions
64	   5.2. Abbreviations
65	   6. RTP Payload Format
66	   6.1. Design Principles
67	   6.2. RTP Header Usage
68	   6.3. Common Structure of the RTP Payload Format
69	   6.4. NAL Unit Header Usage
70	   6.5. Packetization Modes
71	   6.6. Decoding Order Number (DON)
72	   6.7. Single NAL Unit Packet
73	   6.8. Aggregation Packets
74	   6.9. Fragmentation Units (FUs)
75	   6.10. Payload Content Scalability Information (PACSI) NAL Unit
76	   7. Packetization Rules
77	   8. De-Packetization Process (Informative)
78	   9. Payload Format Parameters
79	   9.1. MIME Registration
80	   9.2. SDP Parameters
81	   9.2.1. Mapping of MIME Parameters to SDP
82	   9.2.2. Usage with the SDP Offer/Answer Model
83	   9.2.3. Usage with Session and SSRC multiplexing
84	   9.2.4. Usage in Declarative Session Descriptions
85	   9.3. Examples
86	   9.4. Parameter Set Considerations
87	   10.  Security Considerations
88	   11.  Congestion Control
89	   12.  IANA Consideration
90	   13.  Informative Appendix: Application Examples
91	   13.1. Introduction
92	   13.2. Layered Multicast
93	   13.3. Streaming of an SVC scalable stream
94	   13.4. Multicast to MANE, SVC scalable stream to endpoint
95	   13.5. SSRC Multiplexing in case of using SRTP
96	   13.6. Scenarios currently not considered for complexity reasons
97	   13.7. Scenarios currently not considered for being unaligned with
98	         IP philosophy
99	   14.  Acknowledgements
100	   15.  References
101	   15.1. Normative References
102	   15.2. Informative References
103	   16.  Author's Addresses
104	   17.  Intellectual Property Statement
105	   18.  Disclaimer of Validity
106	   19.  Copyright Statement
107	   20.  RFC Editor Considerations
108	   21.  Open Issues
109	   22.  Changes Log

111	1. Introduction

113	1.1. SVC -- the scalable extensions of H.264/AVC

115	   This memo specifies an RTP [RFC3550] payload format for a
116	   forthcoming new mode of the H.264/AVC video codec, known as Scalable
117	   Video Coding (SVC). Formally, SVC will take the form of an Amendment
118	   to ISO/IEC 14496 Part 10 [MPEG4-10], and likely of one or more new
119	   Annexes to ITU-T Rec. H.264 [H.264].  The two specifications are
120	   planned to remain technically aligned, and to remain backward
121	   compatible with previous versions of H.264/AVC.

123	   The current working draft of SVC is available for public review
124	   [SVC]. In this memo, SVC is used as an acronym for the mentioned
125	   scalable extensions of H.264/AVC.

127	   SVC covers all of H.264/AVC's applications, i.e., all forms of
128	   digital compressed video, ranging from low bit-rate Internet
129	   streaming applications to HDTV broadcast and Digital Cinema
130	   applications with nearly lossless coding.

132	   This memo follows a backward-compatible enhancement philosophy
133	   similar to the one applied by the video coding standardization
134	   committees, by keeping as close an alignment to the H.264/AVC
135	   payload format RFC [RFC3984] as possible.  It documents the
136	   enhancements relevant from an RTP transport viewpoint, defines
137	   signaling support for SVC, and deprecates the single NAL unit
138	   packetization mode of RFC 3984.

140	2. Conventions

142	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
143	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
144	   document are to be interpreted as described in BCP 14, RFC 2119
145	   [RFC2119].

147	   This specification uses the notion of setting and clearing a bit
148	   when bit fields are handled.  Setting a bit is the same as assigning
149	   that bit the value of 1 (On).  Clearing a bit is the same as
150	   assigning that bit the value of 0 (Off).
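The bit-field figures in section 3.3 number bits from 0 at the most
significant position of each byte, as is customary for network byte
order.  As an illustration only, the following Python sketch sets and
clears a bit under that numbering; the helper names set_bit and
clear_bit are hypothetical and not part of any specification.

```python
def set_bit(byte: int, pos: int) -> int:
    """Set the bit at position pos (0 = most significant) of an 8-bit value."""
    if not 0 <= pos <= 7:
        raise ValueError("bit position must be in 0..7")
    return byte | (0x80 >> pos)


def clear_bit(byte: int, pos: int) -> int:
    """Clear the bit at position pos (0 = most significant) of an 8-bit value."""
    if not 0 <= pos <= 7:
        raise ValueError("bit position must be in 0..7")
    return byte & ~(0x80 >> pos) & 0xFF


# Bit 0 of the first NAL unit header byte is F (section 3.3).
print(hex(set_bit(0x65, 0)))    # 0xe5
print(hex(clear_bit(0xE5, 0)))  # 0x65
```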

152	3. The SVC Codec

154	3.1. Overview

156	   SVC provides scalable video bitstreams.  In SVC, a scalable video
157	   bitstream contains a base layer conforming to the existing profiles
158	   of H.264 as defined in [H.264] and one or more enhancement layers.
159	   An enhancement layer may enhance the temporal resolution (i.e. the
160	   frame rate), the spatial resolution, or the quality of the video
161	   content represented by the lower layer or part thereof.  The
162	   scalable layers can be aggregated to a single RTP packet stream, or
163	   transported independently.

165	   The concepts of the video coding layer (VCL) and network abstraction
166	   layer (NAL) are inherited from H.264.  The VCL contains the signal
167	   processing functionality of the codec: mechanisms such as transform,
168	   quantization, motion-compensated prediction, loop filtering and
169	   inter-layer prediction.  A coded picture of a base or enhancement
170	   layer consists of one or more slices.  The Network Abstraction Layer
171	   (NAL) encapsulates each slice generated by the VCL into one or more
172	   Network Abstraction Layer Units (NAL units). Please consult RFC 3984
173	   for a more in-depth discussion of the NAL unit concept.  SVC
174	   specifies the decoding order of these NAL units.

176	          [Edt. Note: The definition of a ''coded picture'' is currently
177	          under discussion in JVT. For now, we apply the same
178	          definition as in the AVC specification within a given scalable
179	          layer. That is, a ''coded picture'' consists of all the coded
180	          slices having identical values of dependency_id,
181	          quality_level and redundant_pic_cnt, respectively, in one
182	          access unit.]

184	   The term ''Layer'' in Video Coding Layer and Network Abstraction
185	   Layer refers to a conceptual distinction, and is closely related to
186	   syntax layers (block, macroblock, slice, ... layers). ''Layer'' here
187	   describes a syntax level of the bitstream in contrast to the meaning
188	   of layer as a nested part of the bitstream which may be discarded.
189	   It should not be confused with base and enhancement layers.

191	   The concept of temporal scalability is not newly introduced by SVC,
192	   as H.264 already supports it.  In [H.264], sub-sequences have been
193	   introduced in order to allow optional use of temporal layers.  [SVC]
194	   extends this approach by advertising the temporal layer information
195	   within the NAL unit header, or suffix NAL units, as discussed in
196	   section 3.3 and [SVC].  By our definition, the base layer may be
197	   scalable in the temporal dimension (only).

199	   The concept of scaling the visual content quality in the granularity
200	   of complete enhancement layers, i.e. through omitting the transport
201	   and decoding of entire enhancement layers, is denoted as coarse-
202	   grained scalability (CGS).  This is what is commonly understood as
203	   scalability in the IETF community.  According to SVC, a CGS layer
204	   may be a spatial or quality (SNR) enhancement layer.

206	   In some cases, the bit rate of a given enhancement layer may be
207	   reduced by truncating bits from individual NAL units.  Truncation
208	   leads to a graceful degradation of the video quality of the
209	   reproduced enhancement layer.  This concept is known as Fine
210	   Granularity Scalability (FGS).  In SVC, FGS is provided by a concept
211	   known as progressive refinement slices.

213	3.2. Parameter Set Concept

215	   The parameter set concept is inherited from [H.264]. Please see
216	   section 1.2 of RFC 3984 for more details.

218	   In SVC, pictures from different layers may use the same sequence or
219	   picture parameter set, but may also use different sequence or
220	   picture parameter sets.  If different sequence or picture parameter
221	   sets are used, then, at any time instant during the decoding
222	   process, there may be more than one active sequence or picture
223	   parameter set. Any specific active sequence parameter set remains
224	   unchanged throughout a coded video sequence in the layer in which
225	   the active sequence parameter set is referred to.  The active
226	   picture parameter set remains unchanged within a coded picture.

228	3.3. Network Abstraction Layer Unit Header

229	   An SVC NAL unit consists of a four-byte header and the payload
230	   byte string.  SVC thereby extends the one-byte NAL unit header of
231	   [H.264] by three additional bytes.  The header indicates the type of
232	   the NAL unit, the (potential) presence of bit errors or syntax
233	   violations in the NAL unit payload, information regarding the
234	   relative importance of the NAL unit for the decoding process, the
235	   layer decoding dependency information, and FGS fragmentation
236	   information. This RTP payload specification is designed to be
237	   unaware of the bit string in the NAL unit payload.

239	   The NAL unit header co-serves as the payload header of this RTP
240	   payload format.  The payload of a NAL unit follows immediately.

242	   The syntax and semantics of the NAL unit header are formally
243	   specified in [SVC], but the essential properties of the NAL unit
244	   header are summarized below.

246	   The first byte of the NAL unit header has the following format (the
247	   bit fields are the same as in [H.264] and [RFC3984], while the
248	   semantics have changed slightly, in a backward compatible way):

250	         +---------------+
251	         |0|1|2|3|4|5|6|7|
252	         +-+-+-+-+-+-+-+-+
253	         |F|NRI|  Type   |
254	         +---------------+

256	   F: 1 bit
257	   forbidden_zero_bit.  H.264 declares a value of 1 as a syntax
258	   violation.

260	   NRI: 2 bits
261	   nal_ref_idc.  A value of 00 indicates that the content of the NAL
262	   unit is not used to reconstruct reference pictures for inter picture
263	   prediction.  Such NAL units can be discarded without risking the
264	   integrity of the reference pictures in the same layer.  Values
265	   greater than 00 indicate that the decoding of the NAL unit is
266	   required to maintain the integrity of the reference pictures.

268	   Type: 5 bits
269	   nal_unit_type.  This component specifies the NAL unit payload type
270	   as defined in table 7-1 of [SVC], and later within this memo.  For a
271	   reference of all currently defined NAL unit types and their
272	   semantics, please refer to section 7.4.1 in [SVC].
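To illustrate the layout above, the following Python sketch splits the
first NAL unit header byte into its F, NRI, and Type fields and checks
whether the three-byte SVC extension follows.  The names NalHeader,
parse_nal_header, and has_svc_extension are hypothetical; the check for
types 20 and 21 follows the SVC usage described in the next paragraph
and is not a complete NAL unit type table.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit; H.264 treats 1 as a syntax violation
    nri: int       # nal_ref_idc; 0 means not used for reference pictures
    nal_type: int  # nal_unit_type


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the first NAL unit header byte into F (1), NRI (2), Type (5 bits)."""
    return NalHeader(
        f=(first_byte >> 7) & 0x01,
        nri=(first_byte >> 5) & 0x03,
        nal_type=first_byte & 0x1F,
    )


def has_svc_extension(hdr: NalHeader) -> bool:
    """True if three SVC extension header bytes follow (types 20 and 21)."""
    return hdr.nal_type in (20, 21)


hdr = parse_nal_header(0x65)   # 0b0_11_00101: an H.264 IDR slice
print(hdr)                     # NalHeader(f=0, nri=3, nal_type=5)
print(has_svc_extension(hdr))  # False
print(hdr.nri == 0)            # False: needed for reference-picture integrity
```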

274	   NAL unit types 20 and 21 (among others) were previously reserved
275	   for future extensions.  SVC uses these two NAL unit types.  They
276	   indicate the presence of three additional header bytes, as shown
277	   below.

279	            +---------------+---------------+---------------+
280	            |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
281	            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
282	            |RR |   PRID    | TL  | DID | QL|R|B|U|D|G|L| O |
283	            +---------------+---------------+---------------+

285	   RR: 2 bits
286	   reserved_zero_two_bits.  Reserved bits for future extension.  RR
287	   MUST be zero.

289	   PRID: 6 bits
290	   simple_priority_id.  This component specifies a priority identifier
291	   for the NAL unit.  A lower value of PRID indicates a higher
292	   priority.

294	   TL: 3 bits
295	   temporal_level indicates the temporal layer (or frame rate)
296	   hierarchy.  Informally put, a layer consisting of pictures with a
297	   smaller temporal_level value has a lower frame rate.  A given
298	   temporal layer typically depends on the lower temporal layers (i.e.
299	   the temporal layers with smaller temporal_level values) but never
300	   depends on any higher temporal layer.

302	   DID: 3 bits
303	   dependency_id denotes the inter-layer coding dependency hierarchy.
304	   At any temporal location, a picture of a smaller dependency_id value
305	   may be used for inter-layer prediction for coding of a picture of a
306	   larger dependency_id value, while a picture of a larger
307	   dependency_id value cannot be used for inter-layer
308	   prediction for coding of a picture of a smaller dependency_id value.

310	   QL: 2 bits
311	   quality_level designates the quality level hierarchy of a
312	   progressive refinement (PR) or quality (SNR) enhancement layer
313	   slice. At any temporal location and with identical dependency_id
314	   value, a picture with quality_level equal to ql uses a picture with
315	   quality_level equal to ql-1 for inter-layer prediction.

317	   R: 1 bit
318	   reserved_zero_bit.  Reserved bit for future extension.  R MUST be
319	   zero.

321	   B: 1 bit
322	   layer_base_flag indicates that no inter-layer prediction (of coding
323	   mode, motion, sample value, and/or residual prediction) is used for
324	   the current slice; otherwise, inter-layer prediction may be used.

326	   U: 1 bit
327	   use_base_prediction_flag indicates that the base representation of
328	   the reference pictures (i.e. only NAL units of the reference
329	   pictures with QL equal to zero are used for inter prediction) is
330	   used during the inter prediction process.

332	   D: 1 bit
333	   discardable_flag.  A value of 1 indicates that the content of the
334	   NAL unit with dependency_id equal to currDependencyId is not used in
335	   the decoding process of NAL units with dependency_id larger than
336	   currDependencyId.  Such NAL units can be discarded without risking
337	   the integrity of higher scalable layers with larger values of
338	   dependency_id.  discardable_flag equal to 0 indicates that the
339	   decoding of the NAL unit is required to maintain the integrity of
340	   higher scalable layers with larger values of dependency_id.

342	   G: 1 bit
343	   fragmented_flag indicates that the current NAL unit is fragmented,
344	   which may be the case for partitions of an FGS (progressive
345	   refinement) slice.

347	   L: 1 bit
348	   last_fragment_flag indicates that the NAL unit is the last
349	   fragment of a fragmented NAL unit.

351	   O: 2 bits
352	   fragment_order indicates the order, starting from lower values, in
353	   which the NAL units with fragmented_flag equal to 1 shall be
354	   arranged before the parsing process is started.
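The following Python sketch, under stated assumptions, parses the three
extension bytes of a NAL unit of type 20 or 21 and then uses TL, DID,
QL, and D in a conservative forwarding decision of the kind a MANE
might make for a target layer.  All names are hypothetical.  The
keep_for_target rule is an illustrative simplification derived only
from the field semantics given above: no dependency on higher temporal
layers, no use of a higher dependency_id by lower ones, quality level
ql predicted from ql-1, and discardable_flag.  It is not the
sub-bitstream extraction process of [SVC].

```python
from typing import NamedTuple


class SvcExtension(NamedTuple):
    prid: int             # simple_priority_id (lower value = higher priority)
    tl: int               # temporal_level
    did: int              # dependency_id
    ql: int               # quality_level
    layer_base: bool      # B: layer_base_flag
    use_base_pred: bool   # U: use_base_prediction_flag
    discardable: bool     # D: discardable_flag
    fragmented: bool      # G: fragmented_flag
    last_fragment: bool   # L: last fragment flag
    fragment_order: int   # O: fragment order


def parse_svc_extension(header: bytes) -> SvcExtension:
    """Parse header bytes 1..3 of a type-20 or type-21 SVC NAL unit."""
    if len(header) < 4:
        raise ValueError("an SVC NAL unit header has four bytes")
    nal_type = header[0] & 0x1F
    if nal_type not in (20, 21):
        raise ValueError(f"NAL unit type {nal_type} has no SVC extension")
    b1, b2, b3 = header[1], header[2], header[3]
    if (b1 >> 6) != 0 or (b3 >> 7) != 0:
        raise ValueError("reserved bits RR and R must be zero")
    return SvcExtension(
        prid=b1 & 0x3F,
        tl=b2 >> 5,
        did=(b2 >> 2) & 0x07,
        ql=b2 & 0x03,
        layer_base=bool(b3 & 0x40),
        use_base_pred=bool(b3 & 0x20),
        discardable=bool(b3 & 0x10),
        fragmented=bool(b3 & 0x08),
        last_fragment=bool(b3 & 0x04),
        fragment_order=b3 & 0x03,
    )


def keep_for_target(ext: SvcExtension, max_tl: int,
                    target_did: int, max_ql: int) -> bool:
    """Hypothetical, conservative forwarding rule for one target layer."""
    if ext.tl > max_tl:         # lower temporal layers never use higher ones
        return False
    if ext.did > target_did:    # a higher dependency_id never predicts a lower one
        return False
    if ext.did == target_did:   # ql depends only on lower quality levels
        return ext.ql <= max_ql
    return not ext.discardable  # lower layer: drop only if discardable_flag


# F=0, NRI=3, Type=20 | PRID=5 | TL=2, DID=1, QL=0 | U=1, D=1
ext = parse_svc_extension(bytes([0x74, 0x05, 0x44, 0x30]))
print(ext.tl, ext.did, ext.ql)        # 2 1 0
print(keep_for_target(ext, 2, 1, 0))  # True: this layer is the target
print(keep_for_target(ext, 3, 2, 1))  # False: discardable for DID 2
```

Base layer NAL units (for example, H.264 types 1 and 5) carry no
extension bytes.  Their layer information is conveyed in the
associated suffix NAL unit, and this sketch does not pair the two.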

356	   This memo introduces the same additional NAL unit types as RFC 3984,
357	   which are presented in section 6.3.  The NAL unit types defined in
358	   this memo are marked as unspecified in [SVC].  Moreover, this
359	   specification extends the semantics of F, NRI, PRID, D, TL, DID and
360	   QL as described in section 6.4.

362	4. Scope

364	   This payload specification can only be used to carry the "naked" SVC
365	   NAL unit stream over RTP, and not the byte stream format according
366	   to Annex B of [SVC].  The likely applications of this specification
367	   are in IP-based multimedia communications, including
368	   conversational multimedia, video telephony or video conferencing,
369	   Internet streaming, and TV over IP.

371	   This specification allows a given RTP session to encapsulate
372	   NAL units belonging to
373	     o the base layer only (as specified in [RFC3984]), or
374	     o one or more enhancement layers, or
375	     o the base layer and one or more enhancement layers

377	5. Definitions and Abbreviations

379	5.1. Definitions

381	   This document uses the definitions of [SVC] and [H.264].  The
382	   following terms, defined in [SVC], are summed up for convenience:

384	   scalable bitstream:  An SVC compliant bit stream containing a base
385	   layer and at least one enhancement layer.

387	   suffix NAL unit:  A NAL unit that immediately follows another NAL
388	   unit in decoding order and contains descriptive information of the
389	   preceding NAL unit, which is referred to as the associated NAL unit.

391	   A suffix NAL unit shall have nal_unit_type equal to 20 or 21, shall
392	   have dependency_id and quality_level both equal to 0, and shall not
393	   contain a coded slice.  A suffix NAL unit belongs to the same coded
394	   picture as the associated NAL unit.  A suffix NAL unit may be used
395	   for indicating temporal levels within the base layer.

397	   base layer:  The base layer typically represents the minimal
398	   spatial resolution and/or minimal quality of an SVC bitstream.  The
399	   base layer must fully comply with [H.264].  The base layer is
400	   independently decodable, without using any other layer of the
401	   SVC bitstream.  In the SVC context, each slice NAL unit in
402	   the base layer is associated with a suffix NAL unit, which has a
403	   four-byte NAL unit header containing all the syntax elements
404	   described in section 3.3.

406	          [Edt. Note: The definition of ''base layer'' is not entirely
407	          clear, mainly because of temporal scalability. One definition
408	          is to call all the coded pictures in the lowest inter-layer
409	          coding hierarchy (i.e. having both dependency_id and
410	          quality_level equal to 0) as the base layer. This concept
411	          works perfectly if there is no temporal scalability. Another
412	          definition is to call all the coded pictures having
413	          temporal_level, dependency_id and quality_level all equal to
414	          0 as the base layer. Yet another definition is to define the
415	          layer for which the bitstream of the scalable layer
416	          representation is non-scalable as the base layer. However,
417	          the absolutely non-scalable stream is the bitstream
418	          consisting of only one IDR picture having both dependency_id
419	          and quality_level equal to 0.]

421	   operation point:  An operation point of an SVC bitstream represents a
422	   certain level of temporal, spatial and quality scalability.  An
423	   operation point contains all NAL units required for restoring a
424	   valid bitstream (conforming to [SVC]) up to a certain SVC layer.
425	   The operation point is further described by simple_priority_id,
426	   temporal_level, dependency_id, and quality_level values of that
427	   layer.

429	   scalable enhancement layer:  An SVC enhancement layer is identified
430	   by simple_priority_id, temporal_level, dependency_id, and
431	   quality_level as defined in [SVC] and summarized in section 3.3.

433	   access unit:  A set of NAL units pertaining to a certain temporal
434	   location. An access unit includes the slice data of the pictures of
435	   all scalable layers at that temporal location and possibly other
436	   associated data, e.g. SEI messages and parameter sets.

438	   coded video sequence:  A sequence of access units that consists, in
439	   decoding order, of an instantaneous decoding refresh (IDR) access
440	   unit followed by zero or more non-IDR access units including all
441	   subsequent access units up to but not including any subsequent IDR
442	   access unit.

444	   IDR access unit:  An access unit in which all the primary coded
445	   pictures are IDR pictures.  Such an access unit allows for random
446	   access to any layer combination.

448	   IDR picture:  A coded picture with the property that the decoding of
449	   this coded picture and all the following coded pictures in decoding
450	   order, with the same value of dependency_id, can be performed
451	   without inter prediction from any picture prior to the coded picture
452	   in decoding order with the same value of dependency_id.  Thus an IDR
453	   picture allows for random access to the scalable layer to which it
454	   belongs.  An IDR picture causes a "reset" in the decoding process
455	   of the scalable layer containing the IDR picture.

457	   progressive refinement (PR) slice:  A progressive refinement slice
458	   is contained in an SVC NAL unit that may be truncated anywhere after
459	   the end of the slice header, for bit-rate and quality reduction.  PR slices
460	   provide Fine Granularity Scalability (FGS).

462	   The following terms are itemized for clarification on RTP
463	   multiplexing strategies.  For further information and discussion on
464	   RTP multiplexing, we refer to section 5.2 of [RFC3550]:

466	   RTP packet stream: A sequence of RTP packets with increasing
467	   sequence numbers, identical PT and SSRC, carried in one RTP session,
468	   and utilized to transport an integer number of SVC layers (which may
469	   be FGS scalable).

471	   Single-Sender RTP Session: A (possibly multicast) RTP session in
472	   which all RTP packet streams in the session stem from entities that
473	   are in close cooperation, and can coordinate SSRC values.  By
474	   definition, in Single-Sender RTP Sessions, SSRC collisions on the
475	   forward media path cannot occur.  Note that, in practice, the
476	   ''entities in close cooperation'' likely run on the same machine and
477	   communicate through non-protocol means, or they communicate by
478	   protocols outside the RTP/SIP/SDP environment.

480	   Session multiplexing:  The scalable SVC bitstream is distributed
481	   onto different RTP sessions, whereby each RTP session carries one
482	   RTP packet stream.  Each RTP session requires separate signaling
483	   and has a separate Timestamp, Sequence Number, and SSRC space.
484	   Dependency between sessions MUST be signaled according to
485	   [SDPsiglay].

487	   SSRC multiplexing:  The scalable SVC bitstream is distributed in a
488	   single RTP session, but that session comprises more than one RTP
489	   packet stream, each identified by its SSRC.
490	   The use of SSRC multiplexing MUST be signaled according to
491	   [SDPsiglay].

493	5.2. Abbreviations

495	   In addition to the abbreviations defined in [RFC3984], the following
496	   ones are defined.

498	   CGS:       Coarse Granularity Scalability
499	   FGS:       Fine Granularity Scalability

501	6. RTP Payload Format

503	6.1. Design Principles

505	   The authors observed the following design principles:

507	   o Backward compatibility with RFC 3984 wherever possible.

509	   o As the SVC base layer is H.264/AVC compatible, we assume the base
510	     layer (when transmitted in its own session) to be
511	     encapsulated using RFC 3984.  Requiring this has the desirable
512	     side effect that it can be used by RFC 3984 legacy devices.

514	   o MANEs are signaling aware and rely on signaling information.
515	     MANEs have state.

517	   o MANEs can terminate RTP sessions and create different RTP
518	     sessions with possibly modified content.  This form of a MANE
519	     acts as an RTP mixer.  Mixer-MANEs necessarily need to be in the
520	     SRTP security context.

523	   o MANEs can also perform very limited functionality, namely
524	     aggregating multiple RTP packet streams into a single RTP
525	     stream within the same session by utilizing SSRC multiplexing.
526	     In this case, a MANE acts as a translator and does not
527	     necessarily need to be in the security context.

531	   o Packet integrity needs to be preserved end-to-end (whereby
532	     end-to-end can mean endpoint to endpoint but also endpoint to
533	     MANE, if (and only if) the MANE acts as a Mixer).

535	   o In case of layered multicast transmission as motivated in section
536	     13.2, SVC layers are transported in different RTP sessions
537	     (Session multiplexing).  If the application instead requires
538	     layered transmission within a single RTP session, the SVC layers
539	     are transported in different RTP packet streams within that RTP
540	     session, each stream identified by a unique SSRC (SSRC
541	     multiplexing).  SSRC multiplexing may further allow for adaptation
542	     of an RTP session in the security context; further discussion can
543	     be found in section 13.5.

545	6.2. RTP Header Usage

547	   Please see section 5.1 of RFC 3984 [RFC3984].  The following applies
548	   in addition.

550	   When different layers of an SVC bitstream are transported over more
551	   than one RTP session, e.g., in layered multicast (for the use case,
552	   see section 13.2), SSRC multiplexing, as described below, MAY be
553	   applied.

555	   When SSRC multiplexing is in use, the same IP address and port number
556	   are shared between all RTP streams and all layers, while the
557	   relative importance for the decoding process of each RTP stream
558	   and/or layer is differentiated by the SSRC values.  The SSRC value
559	   space is evenly divided into a number of sub value spaces, with the
560	   number of sub value spaces being equal to the number of RTP packet
561	   streams forming the RTP session for which SSRC multiplexing is used.
562	   The first RTP packet stream conveying the lowest layers is mapped to
563	   the first sub SSRC value space with the lowest SSRC values, the
564	   second RTP packet stream conveying the second lowest layers is
565	   mapped to the second sub SSRC value space with the second lowest
566	   SSRC values, and so on.  For the RTP packets of a certain RTP packet
567	   stream, the SSRC value is randomly selected from the corresponding
568	   sub SSRC value space. This way, a packet with a higher SSRC value
569	   contains data belonging to higher layers or layers of lower
570	   transport priority.
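   Informative note: the allocation above can be illustrated by the
   following minimal Python sketch.  It assumes that "divided evenly"
   means contiguous, equally sized ranges of the 32-bit SSRC space of
   RFC 3550, with any remainder left unused at the top; the function
   names are hypothetical and not part of this specification.  SSRC
   collision resolution per RFC 3550 is not shown, but a replacement
   SSRC would presumably have to be drawn from the same sub value space.

```python
import random

SSRC_SPACE = 2 ** 32  # the SSRC field is 32 bits wide (RFC 3550)


def sub_space(stream_index: int, num_streams: int) -> range:
    """Sub SSRC value space of a stream; index 0 carries the lowest layers."""
    if not 0 <= stream_index < num_streams:
        raise ValueError("stream_index out of range")
    width = SSRC_SPACE // num_streams
    start = stream_index * width
    return range(start, start + width)


def pick_ssrc(stream_index: int, num_streams: int, rng: random.Random) -> int:
    """Randomly select an SSRC from the stream's sub value space."""
    r = sub_space(stream_index, num_streams)
    return rng.randrange(r.start, r.stop)


def stream_of(ssrc: int, num_streams: int) -> int:
    """Map an SSRC back to its stream index (higher index = higher layers).

    Values in the unused remainder, if any, are attributed to the last stream.
    """
    width = SSRC_SPACE // num_streams
    return min(ssrc // width, num_streams - 1)


rng = random.Random()
ssrcs = [pick_ssrc(i, 3, rng) for i in range(3)]
print([stream_of(s, 3) for s in ssrcs])  # [0, 1, 2]
```

   A receiver or MANE that knows the number of streams can thus rank
   packets by importance from the RTP header alone, without inspecting
   the payload.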

   SSRC multiplexing as discussed above, in conjunction with multicast
   from multiple senders, requires that a) all streams SSRC-multiplexed
   in the same session carry data of the same layered bitstream, and b)
   the different senders are aware (by unspecified means of signaling)
   of the relative importance of the RTP packet streams they emit.
   Otherwise, it would be impossible to enforce the allocation of SSRC
   value spaces according to the importance for the decoding process.
   In other words, SSRC multiplexing as discussed above works only for
   single-sender RTP sessions.

   Note: in practice, due to the above limitation, SSRC multiplexing
   appears to require a single entity to send all RTP packet streams.
   No signaling means are currently available that would allow
   different senders to coordinate the SSRC value spaces to use.

6.3. Common Structure of the RTP Payload Format

   Please see section 5.2 of RFC 3984 [RFC3984].

6.4. NAL Unit Header Usage

   The structure and semantics of the NAL unit header were introduced
   in section 3.3.  This section specifies the semantics of F, NRI,
   PRID, D, TL, DID, QL, B, U, G, L, and O according to this
   specification.

   The semantics of F specified in section 5.3 of [RFC3984] also apply
   herein.

   For NRI, when the bitstream is compliant with [H.264], the semantics
   specified in section 5.3 of [RFC3984] apply; otherwise, only the
   semantics specified in [SVC] apply.

   For PRID, the semantics specified in [SVC] apply.  MANEs
   implementing unequal error protection may use this information to
   protect NAL units with smaller PRID values better than those with
   larger PRID values, for example by including only the more important
   NAL units in a FEC protection mechanism.  The desirable transport
   priority thus decreases as the PRID value increases.

   For D, MANEs may use this information to protect NAL units with D
   equal to 0 better than NAL units with D equal to 1.  Furthermore, a
   MANE or a receiver may use D to determine whether a given NAL unit
   is required for successfully decoding a certain operation point of
   the SVC bitstream.

   For TL, DID, and QL, in addition to the semantics specified in
   [SVC], according to this memo the values of TL, DID, and QL indicate
   the relative priority in their respective dimensions.  A higher
   value of TL, DID, or QL indicates a lower priority, provided that the
   other two components are identical.  MANEs may use this information
   to protect more important NAL units better than less important NAL
   units.

      Informative note: PRID, D, TL, DID, and QL, in combination,
      provide complete information on the relative priority of a NAL
      unit compared to any other NAL unit.  [Edt. note: examples may be
      provided in Informative Appendix 13 in future versions.]

   For B, in addition to the semantics specified in [SVC], according to
   this memo a MANE or receiver may use this information to identify
   the [H.264]-conforming base layer NAL units (if marked by a suffix
   NAL unit) and may determine their temporal layer (from the TL value
   of the suffix NAL unit).  This allows for generating an outgoing RTP
   stream, with a certain temporal scalability layer, that conforms to
   [RFC3984] and [H.264].

   For U, the semantics specified in [SVC] apply.

   For G, L, and O, in addition to the semantics specified in [SVC],
   according to this memo a MANE or receiver may detect a fragmented PR
   slice by means of G, L, and O.  Using this knowledge, a MANE may
   perform FGS adaptation on the PR slice by forwarding only a subset of
   the fragments, in the order given by fragment_order (O).

6.5. Packetization Modes

   Please see section 5.4 of RFC 3984 [RFC3984].  The single NAL unit
   packetization mode SHALL NOT be used.

     Informative note: The non-interleaved mode allows an application
     to encapsulate a single NAL unit in a single RTP packet.
     Historically, the single NAL unit mode was included in [RFC3984]
     only for compatibility with ITU-T Rec. H.241 Annex A.  There is no
     point in carrying this historic ballast into a new application
     space such as the one provided by SVC.  More technically speaking,
     the increase in implementation complexity for providing the
     additional mechanisms of the non-interleaved mode (namely STAPs) is
     so minor, and the benefits are so great, that we require STAP
     implementation.

6.6. Decoding Order Number (DON)

   Please see section 5.5 of RFC 3984 [RFC3984].  The following applies
   in addition.

   When different layers of an SVC bitstream are transported in more
   than one RTP packet stream (regardless of the use of session or SSRC
   multiplexing, or a combination thereof), the interleaved
   packetization mode MUST be used, and the DON values of all the NAL
   units MUST indicate the correct NAL unit decoding order across all
   the RTP packet streams.  If Session multiplexing is used, each
   session MUST signal the same value for the MIME parameters
   sprop-interleaving-depth, sprop-max-don-diff, sprop-deint-buf-req,
   and sprop-init-buf-time (which are marked as optional, but are
   mandatory for this use case).  Furthermore, these values must be
   valid for the reception capabilities over all sessions.  A receiver
   MUST signal the same value of the MIME parameter deint-buf-cap
   (likewise marked as optional, but mandatory for this use case) for
   all sessions used for Session multiplexing.
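   Informative note: a receiver or MANE could verify the Session
   multiplexing constraints above along the lines of the following
   Python sketch.  The function name and the input representation (one
   dictionary of MIME parameters per RTP session) are hypothetical.  The
   sketch only checks that each value is present and identical across
   sessions; whether the values are valid for the reception
   capabilities over all sessions is not checked.

```python
SESSION_MUX_PARAMS = (
    "sprop-interleaving-depth",
    "sprop-max-don-diff",
    "sprop-deint-buf-req",
    "sprop-init-buf-time",
)


def inconsistent_params(sessions, names=SESSION_MUX_PARAMS):
    """Return the parameters that are absent from, or differ between, sessions.

    `sessions` is a list of dicts mapping MIME parameter names to values,
    one dict per RTP session of the Session multiplex.
    """
    bad = []
    for name in names:
        values = {s.get(name) for s in sessions}
        # None marks a session that omitted the (here mandatory) parameter.
        if None in values or len(values) != 1:
            bad.append(name)
    return bad


base = {"sprop-interleaving-depth": "4", "sprop-max-don-diff": "10",
        "sprop-deint-buf-req": "64000", "sprop-init-buf-time": "90000"}
enh = {**base, "sprop-max-don-diff": "12"}
print(inconsistent_params([base, enh]))  # ['sprop-max-don-diff']
print(inconsistent_params([{"deint-buf-cap": "64000"}] * 2, ("deint-buf-cap",)))  # []
```

   The same function can be applied to the receiver's deint-buf-cap
   values by passing that parameter name explicitly, as in the last
   line.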

6.7. Single NAL Unit Packet

   Please see section 5.6 of RFC 3984 [RFC3984].

6.8. Aggregation Packets

   Please see section 5.7 of RFC 3984 [RFC3984].

6.9. Fragmentation Units (FUs)

   Please see section 5.8 of RFC 3984 [RFC3984].

6.10.    Payload Content Scalability Information (PACSI) NAL Unit

   A new NAL unit type is specified, referred to as the payload content
   scalability information (PACSI) NAL unit.  The PACSI NAL unit, if
   present, MUST be the first NAL unit in an aggregation packet, and it
   MUST NOT be present in other types of packets.  The PACSI NAL unit
   indicates scalability characteristics that are common to all the
   remaining NAL units in the payload, thus making it easier for MANEs
   to decide whether to forward or discard the packet.  Senders MAY
   create PACSI NAL units, and receivers can ignore them.

      Informative note: The NAL unit type for the PACSI NAL unit is
      selected among those values that are unspecified in the H.264/AVC
      specification and in RFC 3984 -- and are therefore ignored by
      receivers.  Hence an SVC stream, even when including PACSI NAL
      units, can be processed by RFC 3984 receivers and H.264/AVC
      decoders.

   When the first aggregation unit of an aggregation packet contains a
   PACSI NAL unit, there MUST be at least one additional aggregation
   unit present in the same packet.  The RTP header fields are set
   according to the remaining NAL units in the aggregation packet.

   When a PACSI NAL unit is included in a multi-time aggregation
   packet, the decoding order number for the PACSI NAL unit MUST be set
   such that either the PACSI NAL unit is indicated to be the first NAL
   unit in decoding order among the NAL units in the aggregation packet,
   or the PACSI NAL unit has a decoding order number identical to that
   of the first NAL unit in decoding order among the remaining NAL units
   in the aggregation packet.

   The structure of the PACSI NAL unit is exactly the same as that of
   the four-byte SVC NAL unit header specified in section 3.3, and is
   reproduced here once more for convenience:

    +---------------+---------------+---------------+---------------+
    |0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|0|1|2|3|4|5|6|7|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |F|NRI|  Type   |RR |   PRID    | TL  | DID | QL|R|B|U|D|G|L| O |
    +---------------+---------------+---------------+---------------+
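   Informative note: the bit layout above can be decoded and encoded
   with shifts and masks, as in the following minimal Python sketch.
   The class and method names are hypothetical; the sketch does not
   validate field ranges or reserved bits, and the example bytes (a
   header with Type 30, i.e., a PACSI NAL unit) are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SvcNalHeader:
    """Fields of the four-byte SVC NAL unit header, in diagram order."""
    f: int     # F (1 bit)
    nri: int   # NRI (2 bits)
    type: int  # Type (5 bits)
    rr: int    # RR, reserved_zero_two_bits (2 bits)
    prid: int  # PRID (6 bits)
    tl: int    # TL (3 bits)
    did: int   # DID (3 bits)
    ql: int    # QL (2 bits)
    r: int     # R, reserved_zero_bit (1 bit)
    b: int     # B, layer_base_flag (1 bit)
    u: int     # U, use_base_prediction_flag (1 bit)
    d: int     # D (1 bit)
    g: int     # G, fragmented_flag (1 bit)
    l: int     # L, last_fragment_flag (1 bit)
    o: int     # O, fragment_order (2 bits)

    @classmethod
    def parse(cls, data: bytes) -> "SvcNalHeader":
        """Decode the first four bytes of an SVC or PACSI NAL unit."""
        if len(data) < 4:
            raise ValueError("the SVC NAL unit header is four bytes long")
        b0, b1, b2, b3 = data[:4]
        return cls(
            f=b0 >> 7, nri=(b0 >> 5) & 0x3, type=b0 & 0x1F,
            rr=b1 >> 6, prid=b1 & 0x3F,
            tl=b2 >> 5, did=(b2 >> 2) & 0x7, ql=b2 & 0x3,
            r=b3 >> 7, b=(b3 >> 6) & 1, u=(b3 >> 5) & 1, d=(b3 >> 4) & 1,
            g=(b3 >> 3) & 1, l=(b3 >> 2) & 1, o=b3 & 0x3,
        )

    def to_bytes(self) -> bytes:
        """Encode the fields back into four bytes (no range checking)."""
        return bytes([
            (self.f << 7) | (self.nri << 5) | self.type,
            (self.rr << 6) | self.prid,
            (self.tl << 5) | (self.did << 2) | self.ql,
            (self.r << 7) | (self.b << 6) | (self.u << 5) | (self.d << 4)
            | (self.g << 3) | (self.l << 2) | self.o,
        ])


hdr = SvcNalHeader.parse(bytes([0x7E, 0x05, 0x2C, 0x52]))
print(hdr.type, hdr.prid, hdr.tl, hdr.did, hdr.ql, hdr.b, hdr.d, hdr.o)  # 30 5 1 3 0 1 1 2
```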

   The values of the fields in the PACSI NAL unit MUST be set as follows.

   o The F bit MUST be set to 1 if the F bit in at least one remaining
     NAL unit in the payload is equal to 1.  Otherwise, the F bit MUST
     be set to 0.

   o The NRI field MUST be set to the highest value of NRI field among
     all the remaining NAL units in the payload.

   o The Type field MUST be set to 30.

   o The RR field or reserved_zero_two_bits field (2 bits) MUST be set
     to 0.

   o The PRID field MUST be set to the lowest value of the PRID values
     associated with all the remaining NAL units in the payload.

   o The TL field MUST be set to the lowest value of the TL values
     associated with all the remaining NAL units in the payload.

   o The DID field MUST be set to the lowest value of the DID values
     associated with all the remaining NAL units in the payload.

   o The QL field MUST be set to the lowest value of the QL values
     associated with all the remaining NAL units in the payload.

   o The R field or reserved_zero_bit field (1 bit) MUST be set to 0.

   o The B field or layer_base_flag field (1 bit) MUST be set to 1 if
     the layer_base_flag associated with all the remaining NAL units in
     the payload is equal to 1.  Otherwise, the B field MUST be set to
     0.

   o The U field or use_base_prediction_flag field (1 bit) MUST be set
     to 1 if the use_base_prediction_flag associated with all the
     remaining NAL units in the payload is equal to 1.  Otherwise, the U
     field MUST be set to 0.

   o The D bit MUST be set to 0 if the D value associated with at least
     one remaining NAL unit in the payload is equal to 0.  Otherwise,
     the D bit MUST be set to 1.

   o The G field or fragmented_flag field (1 bit) MUST be set to 1 if
     the fragmented_flag associated with all the remaining NAL units in
     the payload is equal to 1.  Otherwise, the G field MUST be set to
     0.

   o The L field or last_fragment_flag field (1 bit) MUST be set to 1
     if the last_fragment_flag associated with all the remaining NAL
     units in the payload is equal to 1.  Otherwise, the L field MUST be
     set to 0.

   o The O field or fragment_order field (2 bits) MUST be set to the
     lowest value of fragment_order associated with all the remaining
     NAL units in the payload.
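   Informative note: the rules above amount to a per-field reduction
   over the remaining NAL units: logical OR for F, maximum for NRI,
   minimum for PRID, TL, DID, QL, and O, and logical AND for B, U, D,
   G, and L (for D, "0 if any remaining D is 0" is the same as a logical
   AND of single bits).  A minimal Python sketch follows; the names are
   hypothetical, and the header fields of the remaining NAL units are
   assumed to have been extracted already.

```python
from typing import Dict, NamedTuple, Sequence

PACSI_NAL_UNIT_TYPE = 30


class Hdr(NamedTuple):
    """Header fields of one remaining NAL unit in the aggregation packet."""
    f: int
    nri: int
    prid: int
    tl: int
    did: int
    ql: int
    b: int
    u: int
    d: int
    g: int
    l: int
    o: int


def pacsi_fields(nalus: Sequence[Hdr]) -> Dict[str, int]:
    """Derive the PACSI NAL unit header fields from the remaining NAL units."""
    if not nalus:
        raise ValueError("PACSI requires at least one additional NAL unit")

    def all_one(field: str) -> int:
        return int(all(getattr(n, field) == 1 for n in nalus))

    return {
        "F": int(any(n.f == 1 for n in nalus)),
        "NRI": max(n.nri for n in nalus),
        "Type": PACSI_NAL_UNIT_TYPE,
        "RR": 0,
        "PRID": min(n.prid for n in nalus),
        "TL": min(n.tl for n in nalus),
        "DID": min(n.did for n in nalus),
        "QL": min(n.ql for n in nalus),
        "R": 0,
        "B": all_one("b"),
        "U": all_one("u"),
        "D": all_one("d"),  # 0 if any remaining NAL unit has D equal to 0
        "G": all_one("g"),
        "L": all_one("l"),
        "O": min(n.o for n in nalus),
    }


base = Hdr(f=0, nri=3, prid=0, tl=0, did=0, ql=0, b=1, u=0, d=0, g=0, l=0, o=0)
enh = Hdr(f=0, nri=2, prid=4, tl=1, did=1, ql=1, b=0, u=1, d=1, g=0, l=0, o=0)
pacsi = pacsi_fields([base, enh])
print(pacsi["NRI"], pacsi["PRID"], pacsi["B"], pacsi["D"])  # 3 0 0 0
```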

7. Packetization Rules

   Please see section 6 of RFC 3984 [RFC3984].  The following rules
   apply in addition.

   The single NAL unit mode SHALL NOT be used (see section 6.5 for the
   motivation).

   When a suffix NAL unit is encapsulated for transmission, it SHOULD
   be aggregated into the same transmission packet as the NAL unit
   preceding the suffix NAL unit in decoding order.

   When different layers of an SVC bitstream are transported in more
   than one RTP packet stream, the interleaved packetization mode MUST
   be used.

8. De-Packetization Process (Informative)

   Please see section 7 of RFC 3984 [RFC3984].  The following rules
   apply in addition.

   [Edt. note: Do we need more information here about cross-layer DON?
   Maybe in the next version.]

9. Payload Format Parameters

   [Edt. note: this section 9 and its subsections will be updated
   according to the changes listed below, a little later in the
   process.  For now, we just list the adjustments necessary, so as not
   to bury any new information in the RFC 3984 text.]

   Section 8 of [RFC3984] applies with the following modification.

   The sentence

   "The parameters are specified here as part of the MIME subtype
   registration for the ITU-T H.264 | ISO/IEC 14496-10 codec."

   is replaced with

   "The parameters are specified here as part of the MIME subtype
   registration for the SVC codec."

9.1. MIME Registration

          Editor's note: this needs to be updated by copy-pasting the
          RFC 3984 MIME registration into this document, so as to make
          it self-contained.  This will be done later in the process.

   The MIME subtype for the SVC codec is allocated from the IETF tree.

   The receiver MUST ignore any unspecified parameter.

   Media Type name:     video

   Media subtype name:  H.264-SVC

   Required parameters: none

   OPTIONAL parameters:

   The optional MIME parameters specified in [RFC3984] apply, with the
   following constraints (to be edited in at the appropriate time):

   sprop-interleaving-depth:
   When Session multiplexing is used, the same sprop-interleaving-depth
   value MUST be signaled for all sessions and MUST be valid over all
   sessions of the multiplex.

   sprop-max-don-diff:
   When Session multiplexing is used, the same sprop-max-don-diff value
   MUST be signaled for all sessions and MUST be valid over all
   sessions of the multiplex.

   sprop-deint-buf-req:
   When Session multiplexing is used, the same sprop-deint-buf-req
   value MUST be signaled for all sessions and MUST be valid over all
   sessions of the multiplex.

   sprop-init-buf-time:
   When Session multiplexing is used, the same sprop-init-buf-time
   value MUST be signaled for all sessions and MUST be valid over all
   sessions of the multiplex.

   deint-buf-cap:
   When Session multiplexing is used, the same deint-buf-cap value MUST
   be signaled by the receiver for all sessions and MUST be valid over
   all sessions of the multiplex.

   In addition, the following optional MIME parameters apply:

   sprop-scalability-info:
   This parameter MAY be used to convey the NAL unit containing the
   scalability information SEI message that MUST precede any other NAL
   units in decoding order.  The parameter MUST NOT be used to indicate
   codec capability in any capability exchange procedure.  The value of
   the parameter is the base64 representation of the NAL unit
   containing the scalability information SEI message as specified in
   [SVC].

   sprop-transport-priority:
   This parameter MAY be used to signal the transport priority
   indicator value(s), in terms of the second and third bytes of the
   SVC NAL unit header, for one or more SVC layers conveyed in one RTP
   session.  A transport priority indicator is base64 coded.  If more
   than one layer is transmitted within one RTP session, the transport
   priority indicator values of the layers MUST be listed in order of
   decreasing importance for decoding and MUST be comma-separated.

      Encoding considerations:
                           This type is only defined for transfer
                           via RTP (RFC 3550).

      Security considerations:
                           See section 10 of this specification.

      Public specification:
                           Please refer to section 15 of this
                           specification.

      Additional information:
                           None

      File extensions:     none
      Macintosh file type code: none
      Object identifier or OID: none
      Person & email address to contact for further information:
      Intended usage:      COMMON
      Author:
      Change controller:
                           IETF Audio/Video Transport working group
                           delegated from the IESG.
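   Informative note: assuming that "second and third bytes" refers to
   the octets of the four-byte SVC NAL unit header (see the diagram in
   section 6.10) that carry RR, PRID, TL, DID, and QL, a
   sprop-transport-priority value could be assembled as in the
   following Python sketch.  The function names and example header
   bytes are hypothetical; the caller is responsible for listing the
   layers in order of decreasing importance.

```python
import base64
from typing import Iterable


def transport_priority_indicator(svc_nal_header: bytes) -> str:
    """Base64-code the second and third bytes of a four-byte SVC NAL unit header."""
    if len(svc_nal_header) < 4:
        raise ValueError("expected a four-byte SVC NAL unit header")
    return base64.b64encode(svc_nal_header[1:3]).decode("ascii")


def sprop_transport_priority(headers_by_importance: Iterable[bytes]) -> str:
    """Comma-separated indicators; the most important layer comes first."""
    return ",".join(transport_priority_indicator(h) for h in headers_by_importance)


base_layer = bytes([0x6E, 0x00, 0x00, 0x40])  # PRID 0, TL 0, DID 0, QL 0
enh_layer = bytes([0x6E, 0x05, 0x24, 0x00])   # PRID 5, TL 1, DID 1, QL 0
print("sprop-transport-priority=" + sprop_transport_priority([base_layer, enh_layer]))
# sprop-transport-priority=AAA=,BSQ=
```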

9.2. SDP Parameters

9.2.1.   Mapping of MIME Parameters to SDP

   The MIME media type video/SVC string is mapped to fields in the
   Session Description Protocol (SDP) as follows:

   *  The media name in the "m=" line of SDP MUST be video.

   *  The encoding name in the "a=rtpmap" line of SDP MUST be SVC (the
      MIME subtype).

   *  The clock rate in the "a=rtpmap" line MUST be 90000.

   *  The OPTIONAL parameters "profile-level-id", "max-mbps", "max-fs",
      "max-cpb", "max-dpb", "max-br", "redundant-pic-cap",
      "sprop-parameter-sets", "parameter-add", "packetization-mode",
      "sprop-interleaving-depth", "deint-buf-cap",
      "sprop-deint-buf-req", "sprop-init-buf-time",
      "sprop-max-don-diff", "max-rcmd-nalu-size",
      "sprop-transport-priority", and "sprop-scalability-info", when
      present, MUST be included in the "a=fmtp" line of SDP.  These
      parameters are expressed as a MIME media type string, in the form
      of a semicolon-separated list of parameter=value pairs.
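   Informative note: the following Python sketch builds the "a=rtpmap"
   and "a=fmtp" lines described above.  Section 9.1 names the media
   subtype "H.264-SVC", whereas this section uses "SVC"; the sketch
   therefore takes the encoding name as a parameter, defaulting to
   "SVC".  It also assumes that the [RFC3984] rule that an absent
   packetization-mode means single NAL unit mode (0) carries over, and
   rejects mode 0 because section 6.5 forbids it.  The function name and
   the port number in the example are hypothetical.

```python
from typing import Dict, List, Union


def svc_sdp_attributes(pt: int, params: Dict[str, Union[int, str]],
                       encoding_name: str = "SVC") -> List[str]:
    """Build a=rtpmap and a=fmtp lines for one SVC payload type."""
    # RFC 3984: an absent packetization-mode means single NAL unit mode (0),
    # which section 6.5 of this memo forbids.
    if int(params.get("packetization-mode", 0)) == 0:
        raise ValueError("packetization-mode must be 1 or 2 for SVC")
    fmtp = "; ".join(f"{name}={value}" for name, value in params.items())
    return [f"a=rtpmap:{pt} {encoding_name}/90000", f"a=fmtp:{pt} {fmtp}"]


lines = svc_sdp_attributes(96, {"packetization-mode": 2,
                                "sprop-interleaving-depth": 4})
print("\n".join(["m=video 49170 RTP/AVP 96"] + lines))
# m=video 49170 RTP/AVP 96
# a=rtpmap:96 SVC/90000
# a=fmtp:96 packetization-mode=2; sprop-interleaving-depth=4
```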

9.2.2.   Usage with the SDP Offer/Answer Model

   TBD.

9.2.3.   Usage with Session and SSRC multiplexing

   If Session or SSRC multiplexing is used, the rules on signaling
   media decoding dependency in SDP as defined in [SDPsiglay] apply.
   Further, the use of SSRC multiplexing must be signaled according to
   [SDPsiglay].

9.2.4.   Usage in Declarative Session Descriptions

   TBD.

9.3. Examples

   TBD.

9.4. Parameter Set Considerations

   Please see section 10 of RFC 3984 [RFC3984].

10. Security Considerations

   Please see section 11 of RFC 3984 [RFC3984].

11. Congestion Control

   Within any given RTP session carrying payload according to this
   specification, the provisions of section 12 of RFC 3984 [RFC3984]
   apply.

   One key motivation for the recent attention to scalable codecs has
   been the increasing awareness of network congestion among media
   codec designers.  While CGS scalability cannot reduce congestion on
   the transport path of a given RTP session, MANEs and layered
   multicast technologies can be used to alleviate congestion on a
   larger scale.  FGS scalability can help reduce session bandwidth
   both end-to-end (with pre-coded content) and in network segments,
   again assuming the use of MANEs.

   MANEs MAY alleviate congestion on their outgoing network path by
   a) removing the NAL units belonging to the hierarchically "highest"
      enhancement layer (or set of enhancement layers) from an RTP
      stream carrying base and enhancement layers, or
   b) removing some or all bits of a given FGS NAL unit as long as the
      remaining bits still form a conforming SVC NAL unit.
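   Informative note: option a) can be approximated by the following
   simplified Python sketch, which keeps only NAL units whose DID, TL,
   and QL do not exceed a target operation point.  It assumes that
   every NAL unit carries the SVC header values (for [H.264] base layer
   NAL units, those of the associated suffix NAL unit) and that any
   unit above the target in any dimension may be removed.  A real MANE
   would also need to respect the layer dependencies conveyed in-band,
   e.g., in the scalability information SEI message.  All names are
   hypothetical.

```python
from typing import Iterable, List, NamedTuple


class NalUnit(NamedTuple):
    did: int        # DID from the SVC NAL unit header
    tl: int         # TL from the SVC NAL unit header
    ql: int         # QL from the SVC NAL unit header
    payload: bytes


def thin(nalus: Iterable[NalUnit], max_did: int, max_tl: int,
         max_ql: int) -> List[NalUnit]:
    """Keep only NAL units at or below the target (DID, TL, QL) point."""
    return [n for n in nalus
            if n.did <= max_did and n.tl <= max_tl and n.ql <= max_ql]


stream = [NalUnit(0, 0, 0, b"base"), NalUnit(0, 1, 0, b"base-t1"),
          NalUnit(1, 0, 0, b"spatial"), NalUnit(1, 0, 1, b"quality")]
# Under congestion, drop the highest dependency layer (DID 1).
print([n.payload for n in thin(stream, max_did=0, max_tl=7, max_ql=3)])
# [b'base', b'base-t1']
```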

   [Edt. note: In the following paragraph, "translator" and "mixer"
   are not used consistently with RFC 3550.  What we think we would
   need is a "mixer" that mixes only a single input into a single
   output (as a mixer terminates sessions).  A "translator" (which does
   not terminate the RTP session) carries certain unnecessary baggage
   that appears to make it undesirable for MANEs.  The following
   paragraph can either be fixed to follow RFC 3550 style and logic
   (thereby removing an operation point we consider desirable), or we
   would need to explain in detail what we want to do (which is long
   and not really related to congestion control).  Perhaps we can refer
   to the detailed discussions in the CCM draft.  Added to open issues.

   In both cases, the incoming RTP session is terminated in the MANE,
   and a second RTP session originates at the MANE.  The MANE acts as
   an RTP translator.  The concept of scalability keeps the
   implementation and computational effort within the MANE low, and
   avoids expensive and delay-intensive full transcoding (in the sense
   of reconstruction and re-encoding).]

   When scalable layers are transported in their own RTP sessions, an
   RTP receiver SHOULD unsubscribe from one or more enhancement layers
   when it senses congestion, similar to what has been described in
   [McCanne/Vetterli].  This behavior may be sufficient to reduce the
   network load to an acceptable level.  Nevertheless, the receiver
   MUST follow the mechanisms described in section 12 of [RFC3984].

12. IANA Consideration

   [Edt. Note: A new MIME type should be registered from IANA.]

13. Informative Appendix: Application Examples

13.1.    Introduction

   Scalable video coding is a concept that has been around at least
   since MPEG-2 [MPEG2], which goes back as early as 1993.
   Nevertheless, it has never gained wide acceptance; perhaps partly
   because applications didn't materialize in the form envisioned
   during standardization.

   MPEG and JVT performed a requirements analysis before the SVC
   project was launched.  Dozens of scenarios were studied.  While some
   of the scenarios appear not to follow the most basic design
   principles of the Internet -- and are therefore not appropriate for
   IETF standardization -- others are clearly in the scope of IETF
   work.  Of these, this draft chooses the following subset for
   immediate consideration.  Note that we do not reference the MPEG and
   JVT documents directly, partly because at least the MPEG documents
   have a limited lifespan and are not publicly available, and partly
   because the language used in these documents is inappropriately
   video-centric and imprecise when it comes to protocol matters.

   With these remarks, we now introduce three main application
   scenarios that we consider relevant, and that are implementable
   with this specification.

13.2.    Layered Multicast

   This well-understood form of the use of layered coding
   [McCanne/Vetterli] implies that all layers are individually conveyed
   in their own RTP packet streams, each carried in its own RTP session
   using the IP (multicast) address and port number as the single
   demultiplexing point.  Receivers "tune" into the layers by
   subscribing to the IP multicast, normally by using IGMP [IGMP].

   Layered multicast has the great advantage of simplicity and easy
   implementation.  However, it also has the great disadvantage of
   utilizing many different transport addresses.  While we do not
   consider this a major problem for a professionally maintained
   content server, receiving client endpoints need to open many ports
   to IP multicast addresses in their firewalls.  This is a practical
   problem from a firewall/NAT viewpoint.  Furthermore, even today IP
   multicast is not as widely deployed as many wish.

   We consider layered multicast an important application scenario for
   three reasons.  First, it is well understood and the implementation
   constraints are well known.  Second, there may well be large-scale IP
   networks outside the immediate Internet context that may wish to
   employ layered multicast in the future.  One possible example could
   be a combination of content creation and core-network distribution
   for the various mobile TV services, e.g., those being developed by
   3GPP (MBMS) [MBMS] and DVB (DVB-H) [DVB-H].  Finally, when one base
   layer and one enhancement layer are in use and are conveyed
   separately, this represents one operation point of layered
   multicast.

13.3.    Streaming of an SVC scalable stream

   In this scenario, a streaming server has a repository of stored SVC
   coded layers for a given content.  At the time of streaming, and
   according to the capabilities and connectivity of the client(s), the
   streaming server generates a scalable stream.  This scalable stream
   is served to the client(s).  Both unicast and multicast serving are
   possible.  At the same time, the streaming server may use the same
   repository of stored layers to compose different streams (with a
   different set of layers) intended for different audiences.

   As every endpoint receives only a single SVC RTP session, the number
   of firewall pinholes can be optimized.  In fact, only a single
   firewall pinhole is required.

   The main difference between this scenario and straightforward
   simulcasting lies in the architecture and the requirements of the
   streaming server, and is therefore out of the scope of IETF
   standardization.  However, compelling arguments can be made for why
   such a streaming server design makes sense.  One possible argument is
   related to storage space and channel bandwidth.  Another is
   bandwidth adaptivity without transcoding -- a considerable advantage
   in a congestion-controlled network.  When the streaming server
   learns about congestion, it can reduce the sending bitrate by
   choosing fewer layers when composing the layered stream.  SVC is
   designed to gracefully support both bandwidth rampdown and bandwidth
   rampup with a considerable dynamic range.  This payload format is
   designed to allow for bandwidth flexibility in the mentioned sense,
   for both CGS and FGS layers.  While, in theory, a transcoding step
   could achieve a similar dynamic range, the computational demands are
   impractically high and video quality is typically lowered --
   therefore, few (if any) streaming servers implement full
   transcoding.

13.4.    Multicast to MANE, SVC scalable stream to endpoint

   This final scenario is a bit more complex, and designed to optimize
   the network traffic in a core network, while still requiring only a
   single pinhole in the endpoint's firewall.  One of its key
   applications is the mobile TV market.

   Consider a large IP network, e.g., the core network of 3GPP.
   Streaming servers within this core network can be assumed to be
   professionally maintained.  We assume that these servers can have
   many ports open to the network and that layered multicast is a real
   option.  Therefore, we assume that the streaming server multicasts
   SVC scalable layers, instead of simulcasting different
   representations of the same content at different bit rates.

   Also consider many endpoints of different classes.  Some of these
   endpoints may not have the processing power or the display size to
   meaningfully decode all layers; others may have these capabilities.
   Users of some endpoints may not wish to pay for high quality and are
   happy with a base service, which may be cheaper or even free.  Other
   users are willing to pay for high quality.  Finally, some connected
   users may have a bandwidth problem in that they can't receive the
   bandwidth they would want to receive -- be it through congestion,
   connectivity, change of service quality, or whatever other reasons.
   However, all these users have in common that they don't want to be
   exposed too much, and therefore the number of firewall pinholes
   needs to be small.

   This situation can be handled best by introducing middleboxes close
   to the edge of the core network, which receive the layered multicast
   streams and compose the single SVC scalable bit stream according to
   the needs of the endpoint connected.  These middleboxes are called
   MANEs throughout this specification.  In practice, we envision the
   MANE to be part of (or at least physically and topologically close
   to) the base station of a mobile network, where all the signaling
   and media traffic necessarily are multiplexed on the same physical
   link.  This is why we do not worry too much about decomposition
   aspects of the MANE as such.

   MANEs necessarily need to be fairly complex devices.  They certainly
   need to understand the signaling, for example, to associate the PT
   field in the RTP header with the SVC payload type.

   A MANE may terminate the multicasted layered RTP sessions incoming
   from the core network side, and create new RTP sessions (perhaps
   even multicast sessions) to the endpoints connected to them.  In RTP
   terminology, these types of MANEs are RTP mixers.  This implies, per
   RFC 3550, a very loose relationship between the incoming and
   outgoing RTP sessions.  In particular, there is no direct
   relationship between the incoming and outgoing RTP sequence numbers,
   RTP timestamps, payload types used, etc.

   Mixer-based MANEs are conceptually easy to implement and can offer
   powerful features, primarily because they necessarily can "see" the
   payload (including the RTP payload headers), utilize the wealth of
   layering information available therein, and manipulate it.

   While a mixer-based MANE operation in its most trivial form
   (combining multiple RTP packet streams into a single one) can be
   implemented comparatively simply -- reordering the incoming packets
   according to the DON and sending them in the appropriate order --
   more complex forms can also be envisioned.  For example, a
   mixer-type MANE can optimize the outgoing RTP stream for the MTU
   size of the outgoing path by utilizing the aggregation and
   fragmentation mechanisms of this memo.
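   Informative note: the trivial mixer operation described above can be
   sketched in Python as follows.  It uses the don_diff() comparison
   defined in section 5.5 of [RFC3984] to cope with 16-bit DON
   wraparound, and assumes that all NAL units being merged lie within a
   DON window of fewer than 32768 values, so that the comparison is a
   consistent order.  Buffering, timing, and RTP header rewriting are
   ignored; names other than don_diff are hypothetical.

```python
from functools import cmp_to_key
from typing import Iterable, List, Tuple

NalUnit = Tuple[int, bytes]  # (DON, NAL unit bytes)


def don_diff(m: int, n: int) -> int:
    """Positive if DON n follows DON m (16-bit wraparound, RFC 3984 5.5)."""
    if m == n:
        return 0
    if m < n:
        return n - m if n - m < 32768 else -(m + 65536 - n)
    return 65536 - m + n if m - n >= 32768 else -(m - n)


def merge_by_don(*streams: Iterable[NalUnit]) -> List[NalUnit]:
    """Merge NAL units from several RTP packet streams into decoding order."""
    units = [u for stream in streams for u in stream]
    # a sorts before b exactly when don_diff(a, b) > 0; sorted() is stable,
    # so NAL units with equal DON keep the order in which they were passed in.
    return sorted(units, key=cmp_to_key(lambda a, b: -don_diff(a[0], b[0])))


base = [(65534, b"base-0"), (0, b"base-1")]
enh = [(65535, b"enh-0"), (1, b"enh-1")]
print([nal for _, nal in merge_by_don(base, enh)])
# [b'base-0', b'enh-0', b'base-1', b'enh-1']
```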

1185	   A MANE can also act as a translator.  In this case, we envision its
1186	   functionality to be limited to the manipulation of the transport
1187	   addresses, so as to enable SSRC multiplexing.  The most compelling
1188	   use case appears to be forwarding multiple incoming RTP packet
1189	   streams (conveyed to their own transport addresses) to a single
1190	   firewall pinhole.  The translator variant of the MANE does not
1191	   terminate RTP sessions, but rather ''translates'' them in a very
1192	   simple way -- by changing the transport address -- so as to
1193	   SSRC-multiplex multiple sessions onto a single transport address.
1194	   What sounds trivial at first glance is in reality a highly complex
1195	   process, primarily due to the need for appropriate RTCP processing.
1196	   This is particularly true when individual packets are
1197	   intentionally pruned or removed from the incoming session, which
1198	   may be necessary to support FGS.
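   The forwarding half of such a translator can be sketched in Python
   as below.  RTCP processing, which the text identifies as the hard
   part, is deliberately left out.  The sketch assumes packets arrive as
   raw RTP datagrams tagged with the transport address they were
   received on.  The helper names (forward_ssrc_multiplexed, rtp) are
   hypothetical.  Rejecting an SSRC that was already seen on a
   different incoming session is likewise an illustrative policy.  It
   reflects the fact that, once the sessions share one transport
   address, the receiver can only tell them apart by SSRC.

```python
import struct
from typing import Dict, Iterable, List, Tuple

Address = Tuple[str, int]  # (IP address, UDP port)


def ssrc_of(packet: bytes) -> int:
    """Read the SSRC from bytes 8..11 of the fixed RTP header (RFC 3550)."""
    return struct.unpack_from("!I", packet, 8)[0]


def forward_ssrc_multiplexed(
    arrivals: Iterable[Tuple[Address, bytes]],
    pinhole: Address,
) -> List[Tuple[Address, bytes]]:
    """Hypothetical helper: forward packets from several incoming RTP
    sessions, unmodified, to one outgoing transport address.  Only the
    transport address changes; the receiver demultiplexes by SSRC."""
    owner: Dict[int, Address] = {}  # SSRC -> incoming session first seen on
    out: List[Tuple[Address, bytes]] = []
    for source, packet in arrivals:
        ssrc = ssrc_of(packet)
        first_seen = owner.setdefault(ssrc, source)
        if first_seen != source:
            # Identical SSRCs from two sessions become indistinguishable
            # after multiplexing; a real translator needs collision handling.
            raise ValueError(f"SSRC {ssrc:#010x} used by {first_seen} and {source}")
        out.append((pinhole, packet))
    return out


def rtp(seq: int, ssrc: int) -> bytes:
    """Hypothetical helper: minimal 12-byte RTP header (V=2, PT=96)."""
    return struct.pack("!BBHII", 0x80, 96, seq, 0, ssrc)


arrivals = [(("192.0.2.1", 5004), rtp(1, 0x1000)),
            (("192.0.2.1", 5006), rtp(7, 0x2000))]
print(forward_ssrc_multiplexed(arrivals, ("198.51.100.7", 6000))[1][0])
# ('198.51.100.7', 6000)
```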

1200	   Translator-based MANEs appear to be able to offer a limited amount
1201	   of functionality without being in the security context, which opens
1202	   up additional application areas.  Whether this form of a
1203	   translator-based MANE is actually feasible, and whether it offers
1204	   sufficient benefits to warrant the additional specification burden,
1205	   is open for discussion, and input is solicited.

1207	   While the implementation complexity of either type of MANE, as
1208	   discussed above, is fairly high, the computational demands are
1209	   comparatively low.  In particular, SVC and/or this specification
1210	   contain means to easily generate the correct inter-layer decoding
1211	   order of NAL units.  It is also simple to identify the fine
1212	   granularity scalable bits in a given NAL unit.  No serious
1213	   bit-oriented processing is required, and no significant state
1214	   information (beyond that of the signaling and perhaps the SVC
1215	   sequence parameter sets) needs to be kept.

1217	13.5.    SSRC Multiplexing when Using SRTP

1219	   When SRTP is in use, a MANE outside the security context cannot
1220	   take advantage of the in-band information (SEI messages, NAL unit
1221	   headers, PACSI NAL units) when processing layered streams, and
1222	   therefore cannot make informed decisions when aggregating
1223	   information.  Some relevant information must thus be available in
1224	   the RTP header, which SRTP does not encrypt, to make such decisions.

1226	   The first, and most obvious, choice is to map SSRC values directly
1227	   to certain layers by means of signaling.  As MANEs need to be in
1228	   the signaling context, this appears to be sensible.  However, it
1229	   requires a per-SSRC signaling mechanism -- a demultiplexing point
1230	   that is currently not envisioned in SDP.

1232	   A second design choice is to convey, in the SSRC value, information
1233	   about the properties of a specific layer, to the extent needed for
1234	   a MANE to make a meaningful decision.  In other words, the SSRC is
1235	   no longer chosen fully at random, but is selected based on context.
1236	   This is possible only when limiting the scope to a single sender to
1237	   a multicast group, because the various senders have no means to
1238	   coordinate their choice of SSRC values.  In practice, this is not a
1239	   major limitation.

1241	   Any form of such a selection of SSRC values has two major drawbacks.
1242	   First, without a sufficiently large random component, the
1243	   probability of SSRC collisions increases to an unacceptable level.
1244	   We address this point by discouraging the use of multi-sender
1245	   multicast.  When only a single sender emits packets in a given RTP
1246	   session, it can be expected that this sender is able to avoid SSRC
1247	   collisions.  In addition, we require a sufficiently large random
1248	   component in the SSRC generation, which is constant for each layer
1249	   stemming from the same sender.  Although SSRC collisions become
1250	   more likely than with fully random SSRC values, the random
1251	   component can still be kept as large as 26 bits (32 - 6), assuming
1252	   that the SVC bitstream in question contains 64 (2^6) layers.
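   The 26-bit figure follows from spending ceil(log2(L)) of the 32 SSRC
   bits on labeling L layers and leaving the rest random.  The Python
   sketch below reproduces that arithmetic.  As an illustration only, it
   also applies the generic birthday approximation to estimate the
   chance that at least two of n independently chosen random components
   coincide.  The memo itself gives no such formula, and both function
   names are hypothetical.

```python
import math


def random_bits(num_layers: int, ssrc_bits: int = 32) -> int:
    """Hypothetical helper: SSRC bits left for the random component after
    reserving enough bits to label num_layers layers."""
    layer_bits = math.ceil(math.log2(num_layers)) if num_layers > 1 else 0
    return ssrc_bits - layer_bits


def collision_probability(n: int, bits: int) -> float:
    """Hypothetical helper: birthday approximation of the probability that
    at least two of n values drawn uniformly from 2**bits possibilities
    are equal."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x
    return -math.expm1(-n * (n - 1) / (2.0 * 2 ** bits))


print(random_bits(64))               # 26
print(collision_probability(2, 26))  # about 1.5e-08 (26 random bits)
print(collision_probability(2, 32))  # about 2.3e-10 (fully random SSRC)
```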

1254	   Second, and more critically, a straightforward copy of values known
1255	   to be present at fixed locations in the RTP payload would make it
1256	   easy for codebreakers to attack an SRTP-encrypted stream, because
1257	   both the unencrypted and the encrypted representation of a known
1258	   value would be present in the same packet.  This is outright
1259	   unacceptable from a security viewpoint.

1261	   Therefore, we do not allow information from the bitstream to be
1262	   simply copied into the SSRC field.  Instead, we rely on a
1263	   non-reversible function, which necessarily also contains the
1264	   aforementioned random component and which, when evaluated,
1265	   indicates the relative priority difference between two layers
1266	   (signaled by two SSRC values).  The SSRC value space is evenly
1267	   divided into a number of SSRC sub-spaces, equal to the number of
1268	   RTP sessions for which SSRC multiplexing is used.  The first RTP
1269	   session, conveying the lowest layers, is mapped to the first SSRC
1270	   sub-space, which holds the lowest SSRC values; the second RTP
1271	   session, conveying the second-lowest layers, is mapped to the
1272	   second SSRC sub-space, which holds the second-lowest SSRC values;
1273	   and so on.  For the RTP packets of a given RTP session, the SSRC
1274	   value is randomly selected from the corresponding SSRC sub-space.
1275	   This way, a packet with a higher SSRC value contains data belonging
1276	   to higher layers, or to layers of lower transport priority.
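   The sub-space allocation described above can be sketched in Python as
   follows.  The sketch rests on three assumptions: sessions are indexed
   from 0 (lowest layers) upward, the last sub-space absorbs the
   remainder when 2^32 is not divisible by the number of sessions, and
   all helper names are hypothetical.  It covers only the sub-space
   mapping, not the non-reversible function mentioned above.

```python
import random
from typing import Tuple

SSRC_SPACE = 2 ** 32  # the SSRC is a 32-bit field (RFC 3550)


def subspace_bounds(session: int, num_sessions: int) -> Tuple[int, int]:
    """Hypothetical helper: inclusive SSRC range of RTP session `session`
    (0 = session conveying the lowest layers)."""
    width = SSRC_SPACE // num_sessions
    low = session * width
    # assumption: the last sub-space absorbs any remainder of the division
    high = SSRC_SPACE - 1 if session == num_sessions - 1 else low + width - 1
    return low, high


def choose_ssrc(session: int, num_sessions: int, rng: random.Random) -> int:
    """Hypothetical helper: pick an SSRC uniformly within the sub-space."""
    low, high = subspace_bounds(session, num_sessions)
    return rng.randint(low, high)


def session_of(ssrc: int, num_sessions: int) -> int:
    """Hypothetical helper: recover the sub-space index (higher = higher layers)."""
    return min(ssrc // (SSRC_SPACE // num_sessions), num_sessions - 1)


rng = random.Random(1)
ssrcs = [choose_ssrc(k, 3, rng) for k in range(3)]
print([session_of(s, 3) for s in ssrcs])  # [0, 1, 2]
print(ssrcs == sorted(ssrcs))             # True
```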

1278	   A translator-based MANE can make use of the aforementioned SSRC
1279	   values as follows.  Suppose that the MANE has identified, through
1280	   sensed congestion or other unspecified means, that it needs to
1281	   discard packets belonging to higher layers, say K of the N buffered
1282	   packets, to maintain a packet sending rate.  It then identifies the
1283	   K packets with the highest SSRC values and discards them.
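   The discard rule can be sketched in Python as below.  The sketch
   assumes that the MANE buffers raw RTP packets and reads the SSRC from
   the fixed RTP header, which SRTP leaves unencrypted.  The function
   names are hypothetical.  The surviving packets keep their original
   order, and ties between equal SSRC values (packets of the same layer)
   are broken arbitrarily.

```python
import heapq
import struct
from typing import List


def ssrc_of(packet: bytes) -> int:
    """SSRC field: bytes 8..11 of the fixed RTP header (RFC 3550)."""
    return struct.unpack_from("!I", packet, 8)[0]


def drop_highest_ssrc(buffer: List[bytes], k: int) -> List[bytes]:
    """Hypothetical helper: discard the k buffered packets with the highest
    SSRC values, i.e., those of the highest layers / lowest priority."""
    if k <= 0:
        return list(buffer)
    victims = set(heapq.nlargest(k, range(len(buffer)),
                                 key=lambda i: ssrc_of(buffer[i])))
    return [p for i, p in enumerate(buffer) if i not in victims]


def rtp(seq: int, ssrc: int) -> bytes:
    """Hypothetical helper: minimal 12-byte RTP header (V=2, PT=96)."""
    return struct.pack("!BBHII", 0x80, 96, seq, 0, ssrc)


buffer = [rtp(n, s) for n, s in enumerate([10, 3000, 20, 4000, 15])]
print([ssrc_of(p) for p in drop_highest_ssrc(buffer, 2)])  # [10, 20, 15]
```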

1285	13.6.    Scenarios currently not considered for complexity reasons

1287	   -- intentionally left blank --

1289	13.7.    Scenarios currently not considered because they are not
1290	          aligned with IP philosophy

1292	   Remarks have been made that the current draft does not take into
1293	   consideration at least one application scenario that some JVT
1294	   members consider important.  In particular, their idea is to make
1295	   the RTP payload format (or the media stream itself) self-contained
1296	   enough that a stateless, non-signaling-aware device can ''thin'' an
1297	   RTP session to meet the bandwidth demands of the endpoint.  They
1298	   call this device a ''Router'' or ''Gateway'', and sometimes a MANE.
1299	   Obviously, it is not a Router or Gateway in the IETF sense.  To
1300	   distinguish it from a MANE as defined in RFC 3984 and in this
1301	   specification, we call it an MDfH (Magic Device from Heaven).

1303	   To simplify discussions, let us assume point-to-point traffic only.
1304	   The endpoint has a signaling relationship with the streaming server,
1305	   but it is known that the MDfH is somewhere in the media path (e.g.,
1306	   because the physical network topology ensures this).  It has been
1307	   requested, at least implicitly through MPEG's and JVT's requirements
1308	   document, that the MDfH should be capable of intercepting the SVC
1309	   scalable bit stream, modifying it by dropping packets or parts
1310	   thereof, and forwarding the resulting packet stream to the receiving
1311	   endpoint.  It has also been requested that this payload
1312	   specification contain protocol elements facilitating such an
1313	   operation, and the argument has been made that the NRI field of RFC
1314	   3984 serves exactly this purpose.

1316	   The authors of this I-D do not consider the scenario above to be
1317	   aligned with the most basic design philosophies the IETF follows,
1318	   and therefore have not addressed the comments made (except through
1319	   this section).  In particular, we see the following problems with
1320	   the MDfH approach:

1322	   - At the very minimum, the MDfH would need to know which RTP streams
1323	     are carrying SVC.  We do not see how this could be accomplished
1324	     other than by using a static payload type.  None of the
1325	     IETF-defined RTP profiles envision static payload types for SVC,
1326	     and even the de-facto profiles developed by some application
1327	     standards organizations (3GPP, for example) do not use this
1328	     outdated concept.  Therefore, the MDfH necessarily needs to be at
1329	     least ''listening'' to the signaling.
1330	   - If the RTP packet payload were encrypted, it would be impossible
1331	     to interpret the payload header and/or the first bytes of the
1332	     media stream.  We understand that there are crypto schemes under
1333	     discussion that encrypt only the last n bytes of an RTP payload,
1334	     but we strongly doubt that this is fully in line with the
1335	     IETF's security vision.

1337	   Even if the above two problems were overcome through
1338	   standardization outside of the IETF, we still foresee serious design
1339	   flaws:

1341	   - An MDfH cannot simply drop RTP packets it does not want to
1342	     forward.  It either needs to act as a full RTP translator
1343	     (implying that it patches RTCP RRs and such), or it needs to
1344	     patch the RTP sequence numbers to fulfill the RTP specification.
1345	     If it does neither, the receiver would interpret the gaps in the
1346	     sequence numbers as unintentional erasures, which has adverse
1347	     effects on congestion control (if implemented), breaks virtually
1348	     every meta-payload ever developed, and so on.  (Many more points
1349	     could be made here.)
1350	   - An MDfH also cannot ''prune'' FGS packets.  Again, doing so would
1351	     not be compatible with meta-payloads, and would corrupt RTCP RRs
1352	     and congestion control (if the congestion control is based on
1353	     octet count and not on packet count; there are discussions related
1354	     to the former at least in the context of TFRC).
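   The sequence-number patching mentioned in the first item above can be
   illustrated with the Python sketch below.  Packets that survive
   thinning are renumbered contiguously (modulo 2^16), so the receiver
   no longer sees the gaps as losses.  This is a minimal illustration.
   It assumes raw RTP packets with the sequence number in bytes 2..3,
   and the function names are hypothetical.  As the text points out, it
   leaves the RTCP side (receiver reports, sender packet and octet
   counts) inconsistent unless the device acts as a full translator.

```python
import struct
from typing import List


def seq_of(packet: bytes) -> int:
    """RTP sequence number: bytes 2..3 of the fixed header (RFC 3550)."""
    return struct.unpack_from("!H", packet, 2)[0]


def renumber(forwarded: List[bytes], first_seq: int) -> List[bytes]:
    """Hypothetical helper: rewrite the sequence numbers of the packets that
    survived thinning so they are contiguous again, wrapping modulo 2**16.
    All other header and payload bytes are left untouched."""
    out = []
    seq = first_seq & 0xFFFF
    for packet in forwarded:
        patched = bytearray(packet)
        struct.pack_into("!H", patched, 2, seq)
        out.append(bytes(patched))
        seq = (seq + 1) & 0xFFFF
    return out


def rtp(seq: int, ssrc: int = 0x1234) -> bytes:
    """Hypothetical helper: minimal 12-byte RTP header (V=2, PT=96)."""
    return struct.pack("!BBHII", 0x80, 96, seq, 0, ssrc)


# Packets 101, 103 and 104 were dropped; the survivors still show the gaps.
survivors = [rtp(s) for s in (100, 102, 105)]
print([seq_of(p) for p in renumber(survivors, 65535)])  # [65535, 0, 1]
```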

1356	   In summary, based on our current knowledge, we are not willing to
1357	   specify protocol mechanisms that support an operation point that has
1358	   so little in common with classic RTP use.

1360	14. Acknowledgements

1362	   Funding for the RFC Editor function is currently provided by the
1363	   Internet Society.  Further, the author Thomas Schierl of Fraunhofer
1364	   HHI is sponsored by the European Commission under the contract
1365	   number FP6-IST-0028097, project ASTRALS.

1367	15. References

1369	15.1.    Normative References

1371	[RFC3550]   Schulzrinne, H., Casner, S., Frederick, R., and V.
1372	            Jacobson, "RTP: A Transport Protocol for Real-Time
1373	            Applications", STD 64, RFC 3550, July 2003.
1374	[MPEG4-10]  ISO/IEC International Standard 14496-10:2003.
1375	[H.264]     ITU-T Recommendation H.264, "Advanced video coding for
1376	            generic audiovisual services", May 2003.
1377	[SDPsiglay] Schierl, T., "Signaling media decoding dependency in
1378	            Session Description Protocol (SDP)", Internet-Draft
1379	            draft-schierl-mmusic-layered-codec-01, October 2006.
1381	[SVC]       Joint Video Team, "Annex G of Joint Draft 7 of SVC
1382	            Amendment (with proposed changes)", July 2006, available from
1383	            http://ftp3.itu.ch/av-arch/jvt-site/2006_07_Klagenfurt/JVT-T202.zip
1387	[RFC3984]   Wenger, S., Hannuksela, M., Stockhammer, T., Westerlund,
1388	            M., and D. Singer, "RTP Payload Format for H.264 Video",
1389	            RFC 3984, February 2005.
1391	[RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
1392	            Requirement Levels", BCP 14, RFC 2119, March 1997.

1394	15.2.    Informative References

1396	[DVB-H]     DVB - Digital Video Broadcasting (DVB); DVB-H
1397	            Implementation Guidelines, ETSI TR 102 377, 2005
1398	[IGMP]      Cain, B., Deering, S., Kouvelas, I., Fenner, B., and A.
1399	            Thyagarajan, "Internet Group Management Protocol,
1400	            Version 3", RFC 3376, October 2002.
1401	[McCanne/Vetterli]
1402	            McCanne, S., Jacobson, V., and M. Vetterli,
1403	            "Receiver-driven layered multicast", Proc. of ACM SIGCOMM
1404	            '96, pages 117-130, Stanford, CA, August 1996.
1405	[MBMS]      3GPP - Technical Specification Group Services and System
1406	            Aspects; Multimedia Broadcast/Multicast Service (MBMS);
1407	            Protocols and codecs (Release 6), December 2005.
1408	[MPEG2]     ISO/IEC International Standard 13818-2:1993.
1409	[SRTP]      Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1410	            Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1411	            RFC 3711, March 2004.

1413	16. Authors' Addresses

1415	   Stephan Wenger                 Phone: +358-50-486-0637
1416	   Nokia Research Center          Email: stewe@stewe.org
1417	   P.O. Box 100
1418	   FIN-33721 Tampere
1419	   Finland

1421	   Ye-Kui Wang                    Phone: +358-50-486-7004
1422	   Nokia Research Center          Email: ye-kui.wang@nokia.com
1423	   P.O. Box 100
1424	   FIN-33721 Tampere
1425	   Finland

1427	   Thomas Schierl                 Phone: +49-30-31002-227
1428	   Fraunhofer HHI                 Email: schierl@hhi.fhg.de
1429	   Einsteinufer 37
1430	   D-10587 Berlin
1431	   Germany

1433	17. Intellectual Property Statement

1435	   The IETF takes no position regarding the validity or scope of any
1436	   Intellectual Property Rights or other rights that might be claimed to
1437	   pertain to the implementation or use of the technology described in
1438	   this document or the extent to which any license under such rights
1439	   might or might not be available; nor does it represent that it has
1440	   made any independent effort to identify any such rights.  Information
1441	   on the procedures with respect to rights in RFC documents can be
1442	   found in BCP 78 and BCP 79.

1444	   Copies of IPR disclosures made to the IETF Secretariat and any
1445	   assurances of licenses to be made available, or the result of an
1446	   attempt made to obtain a general license or permission for the use of
1447	   such proprietary rights by implementers or users of this
1448	   specification can be obtained from the IETF on-line IPR repository at
1449	   http://www.ietf.org/ipr.

1451	   The IETF invites any interested party to bring to its attention any
1452	   copyrights, patents or patent applications, or other proprietary
1453	   rights that may cover technology that may be required to implement
1454	   this standard.  Please address the information to the IETF at
1455	   ietf-ipr@ietf.org.

1457	18. Disclaimer of Validity

1459	   This document and the information contained herein are provided on an
1460	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
1461	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
1462	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
1463	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
1464	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
1465	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1467	19. Copyright Statement
1468	   Copyright (C) The Internet Society (2006).  This document is subject
1469	   to the rights, licenses and restrictions contained in BCP 78, and
1470	   except as set forth therein, the authors retain all their rights.

1472	20. RFC Editor Considerations

1474	   none

1476	21. Open Issues

1478	   1. Need to double-check MANE, mixers, and translators throughout the
1479	   document (for consistency with RFC 3550).
1480	   2. Packetization rules need work.
1481	   3. Alignment with the SVC specification (ongoing).
1482	   4. In the context of SSRC multiplexing: make the use of higher/lower
1483	   layers vs. RTP packet streams of higher/lower importance consistent.

1485	22. Changes Log

1487	From -00 to -01

1489	- 04.02.2006, StW: Added details to scope
1490	- 04.02.2006, StW: Added short subsection 6.1 ''Design Principles''
1491	- 04.02.2006, StW: Added section 15, ''Application Examples''
1492	- 06.02 - 03.03.2006, YkW: Various modifications throughout the
1493	document
1494	- 13.02.2006 - 03.03.2006, ThS: Added definitions and additional
1495	information to sections 3.3, 5.1, 7, and 8, parameters in section 9.1,
1496	and added section 14 for NAL unit re-ordering for layered multicast.
1497	Further modifications throughout the document.

1499	From -01 to -02

1501	- 06.03.2006, StW: Editorial improvements
1502	- 26.05.2006, YkW: Updated NAL unit header syntax and semantics
1503	according to the latest draft SVC spec
1504	- 20.06.2006, Miska/YkW: Added section 6.10 ''Payload Content
1505	Scalability Information (PACSI) NAL Unit''
1506	- 20.06.2006, YkW: Updated the NAL unit reordering process for layered
1507	multicast (removed the old section 14 ''Informative Appendix: NAL Unit
1508	Re-ordering for Layered Multicast'' and added the new section 13 ''NAL
1509	Unit Reordering for Layered Multicast'')

1511	From -02 to -03
1512	- 05.09.2006, YkW: Updated the NAL unit header syntax, definitions,
1513	etc., according to the foreseen July JVT output.  Updated possible MANE
1514	adaptation operations according to SPID, TL, DID and QL.  Clarified the
1515	removal of single NAL unit packetization mode.  Added the support of
1516	SSRC multiplexing in layered multicast.
1517	- 08.09.2006, StW: Editorial changes throughout the document
1518	- 08.09.2006, YkW: Added the packetization rule for suffix NAL unit.
1519	- 19.09.2006, YkW: Moved/updated SSRC multiplexing support to section
1520	6.2 ''RTP header usage''. Moved/updated the cross layer DON constraint
1521	to Section 6.6 ''Decoding order number''. Moved/updated the
1522	packetization rule when an SVC bitstream is transported over more than
1523	one RTP session to Section 7 ''Packetization rules''. Removed Section 13
1524	''Support of layered multicast''.
1525	- 16.10, TS: Added detailed four-byte NAL unit header description.
1526	Changed ''AVC'' to ''H.264'' for conformance with RFC 3984. Modifications throughout
1527	the document. Extended description of 3rd byte of PACSI NAL unit.
1528	Corrected terms RTP session and RTP packet stream in case of SSRC
1529	multiplexing. Added terms in definition section on RTP multiplexing.
1530	Constraints on optional MIME parameters of 3984 for cross-layer DON
1531	(DON section and MIME parameters). Copied parts of SI paper regarding
1532	mixer, translator and SSRC mux with SRTP to section application
1533	examples. Added section on SDP usage with Session and SSRC
1534	multiplexing. Added points in Design principles on translator/mixer and
1535	RTP multiplexing. Added additional funding information in the
1536	Acknowledgements section. Corrected reference for SVC and added
1537	reference for generic signaling.
1538	- 17.10, StW: Fixed many editorial issues, clarified MANE, mixer,
1539	translator, and RTP packet stream throughout (hopefully consistently)
1540	- 18.10., removed comments, clarified B-Bit, changed definition of
1541	base layer (does not need to be of the lowest temporal resolution),