Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Standards Track                            M. Lindqvist
Expires: August 29, 2013                                      F. Jansson
                                                                Ericsson
                                                       February 25, 2013


                    Using Simulcast in RTP Sessions
               draft-westerlund-avtcore-rtp-simulcast-02

Abstract

   In some applications it may be necessary to send multiple media
   encodings derived from the same media source in independent RTP media
   streams.  This is called Simulcast.  This document discusses the best
   way of accomplishing this in RTP and how to signal it in SDP.  It is
   concluded that a solution where the different simulcast versions are
   based on separate SDP media descriptions provides the best support
   for simulcast.  A solution is defined by making two extensions to
   SDP.  The first extension consists of two new SDP attributes that
   express the capability to send and to receive simulcast streams,
   respectively.  The second extension describes how to group media
   descriptions belonging to the same simulcast source by using the
   grouping framework.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 29, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.




Westerlund, et al.       Expires August 29, 2013                [Page 1]

Internet-Draft                RTP Simulcast                February 2013


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Terminology
     2.2.  Requirements Language
   3.  Simulcast Scenarios
     3.1.  Simulcasting to RTP Mixer
       3.1.1.  Simulcast Combined with Scalable Encoding
     3.2.  Multicast Transported Simulcasted Media
       3.2.1.  Diversity in Receiver Population
       3.2.2.  Bit-rate Adaptation
     3.3.  Same Encoding to Multiple Destinations
     3.4.  Different Encoding to Independent Destinations
   4.  Network Aspects
   5.  Simulcast Alternatives
     5.1.  Using the Payload Type
     5.2.  Using Single RTP session
     5.3.  Using Multiple RTP sessions
     5.4.  Conclusions
   6.  Simulcast Signaling Proposal
     6.1.  Simulcast Capability
     6.2.  Grouping Simulcast Media Descriptions
       6.2.1.  Declarative Use
       6.2.2.  Offer/Answer Use
     6.3.  Two-Phase Negotiation
     6.4.  Media Stream Requirements
     6.5.  Relating Alternative Encodings
     6.6.  Multiple Stream handling
   7.  Simulcast Signaling Examples
     7.1.  Alice: Desktop Client
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1. Normative References
     11.2. Informative References
   Authors' Addresses


1.  Introduction

   Simulcast is the act of simultaneously sending multiple different
   versions of the same media content, e.g. the same video source
   encoded with different video encoders or target resolutions.  This
   can be done in several ways and for different purposes.  This
   document focuses on the case where one wants to provide multiple
   streams with different encodings over RTP [RFC3550] towards an
   intermediary so that the intermediary can select which encoding to
   forward to other participants in the session, and more specifically
   on how the grouping of the streams is defined.  From an RTP
   perspective, simulcast is a specific application of the aspects
   discussed in RTP Multiplexing Architecture
   [I-D.westerlund-avtcore-multiplex-architecture].

   The different encodings of a media content that are considered in
   this document can differ in:

   Bit-rate:  The encodings differ in the number of bits spent to encode
      the media, thus giving different quality.

   Codec:  Different media codecs are used to ensure that different
      receivers that do not have a common set of decoders can decode at
      least one of the versions.  This can include codec configuration
      options that are not compatible, like video encoder profiles, or
      the capability of receiving the transport packetization.

   Sampling:  Different sampling of media, in spatial as well as in
      temporal domain, may be used to suit different rendering
      capabilities or needs at the receiving endpoints, as well as a
      method to achieve different bit-rates.  For video streams, spatial
      sampling affects image resolution and temporal sampling affects
      video frame rate.  For audio, spatial sampling relates to the
      number of audio channels and temporal sampling affects audio
      bandwidth.  A difference in sampling may also result in a
      difference in bit-rate.

   There are different reasons for an application to provide multiple
   different encodings of a single media source.  As soon as an
   application has the need to send multiple encodings, there is a
   potential need for simulcast.  This need can arise even when using
   media codecs that have scalability features built in.  The purpose of
   this document is to describe a few scenarios where the use of
   simulcast is motivated, to elaborate on possible alternatives and
   available mechanisms, and to find a suitable solution for signaling
   and performing RTP simulcast.  The discussion results in a signaling
   proposal to support simulcast.


2.  Definitions

2.1.  Terminology

   The following terms and abbreviations are used in this document:

   Encoding:  A particular encoding is the choice of the media encoder
      (codec) that has been used to compress the media and the fidelity
      of that encoding through the choice of sampling, bit-rate and
      other codec configuration parameters.

   Different encodings:  An encoding is different when some parameter
      that characterizes the encoding of a particular media source is
      changed.  Such changes can affect one or more of the following
      parameters: codec, codec configuration, bit-rate, and sampling.

   Simulcast versions:  Media streams used for simulcast that use
      different encodings and thus constitute different versions of the
      same media source.
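
   The terminology above can be illustrated with a small data model.
   The following Python sketch is illustrative only: the Encoding class,
   its field names and the example values are hypothetical and not part
   of any specification.  It reports which of the four parameters
   (codec, codec configuration, bit-rate, sampling) differ between two
   encodings of the same media source.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Encoding:
    """Hypothetical model of one encoding of a media source."""
    codec: str          # media encoder, e.g. "H264"
    codec_config: str   # codec configuration, e.g. a profile identifier
    bitrate_kbps: int   # target bit-rate
    sampling: str       # e.g. "1280x720@30" (spatial @ temporal) for video


def differing_parameters(a: Encoding, b: Encoding) -> List[str]:
    """Names of the parameters whose values differ between two encodings."""
    return [f.name for f in fields(Encoding)
            if getattr(a, f.name) != getattr(b, f.name)]


def is_different_encoding(a: Encoding, b: Encoding) -> bool:
    """Encodings are different if at least one parameter has changed."""
    return bool(differing_parameters(a, b))


# Two simulcast versions of one camera: same codec and configuration,
# different bit-rate and sampling (illustrative values only).
high = Encoding("H264", "profile-level-id=42e01f", 1500, "1280x720@30")
low = Encoding("H264", "profile-level-id=42e01f", 300, "320x180@15")
print(differing_parameters(high, low))  # ['bitrate_kbps', 'sampling']
```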

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].


3.  Simulcast Scenarios

   This section discusses different usage scenarios for the term
   simulcast and clarifies which of those this document focuses on.  It
   also reviews why simulcast and scalable codecs can be a useful
   combination.

3.1.  Simulcasting to RTP Mixer

   This scenario relates to a multi-party session where one or more
   central nodes are used to facilitate the media transport between the
   session participants.  Thus, this targets the RTP Mixer Topology
   defined in [RFC5117] (Section 3.4: Topo-Mixer).  This scenario is
   targeted for further discussion in this document.

   Simulcasting different media encodings of video that differ both in
   resolution and in bit-rate is highly applicable to video conferencing
   scenarios.  For example, an RTP mixer selects the video of the most
   active speaker and sends that participant's video stream as a high
   resolution stream to the other participants, and in addition also
   sends a number of low resolution video streams of the other
   participants, enabling the receiving user to both display the current
   speaker in high quality and monitor the other participants in lower
   quality/resolution/size.  As participants should not receive the
   stream showing themselves, the set of streams will be unique for each
   participant.
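
   The selection in the example above can be sketched as follows.  This
   is a minimal Python illustration, not a mixer implementation: the
   participant names and the "high"/"low" version labels are
   hypothetical, and a real mixer would also track speaker changes and
   stream state.  For each receiver it computes which source and which
   simulcast version to forward.

```python
from typing import Dict, List, Tuple

Selection = List[Tuple[str, str]]  # (source participant, simulcast version)


def streams_for_receiver(receiver: str, participants: List[str],
                         active_speaker: str) -> Selection:
    """Versions forwarded to one receiver: the active speaker in high
    resolution, everyone else in low resolution, never its own video."""
    selection: Selection = []
    for source in participants:
        if source == receiver:
            continue  # participants do not receive the stream showing themselves
        version = "high" if source == active_speaker else "low"
        selection.append((source, version))
    return selection


def mixer_plan(participants: List[str], active_speaker: str) -> Dict[str, Selection]:
    """Per-receiver forwarding plan; each receiver's set is unique."""
    return {r: streams_for_receiver(r, participants, active_speaker)
            for r in participants}


plan = mixer_plan(["A", "B", "C"], active_speaker="A")
print(plan["B"])  # [('A', 'high'), ('C', 'low')]
print(plan["A"])  # [('B', 'low'), ('C', 'low')]
```

   With this simple policy the active speaker sees all other
   participants in low resolution; an application might instead choose
   to show the previous speaker in high resolution.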

   A number of alternatives exist to provide both high and low
   resolutions from an RTP Mixer:

   Simulcast:  The clients send one stream for the low resolution and
      another for the high resolution to the RTP Mixer.

   Scalable Video Coding:  The clients send one stream to the RTP Mixer,
      using a scalable video encoder whose single stream provides the
      high resolution and also enables the mixer to extract a low
      resolution representation from it.

   Transcoding in the Mixer:  The clients send a high resolution stream
      to the RTP Mixer which performs a transcoding to a lower
      resolution stream.

   The Transcoding alternative requires that the RTP mixer has
   sufficient transcoding resources to produce the required number of
   low resolution streams.  In the worst case, all participants'
   streams may need to be transcoded.  If the resources are not
   available, a different solution is needed.  There will also normally
   be a quality loss and an increase in latency associated with the
   transcoding operation.

   Scalable video encoding requires a more complex encoder compared to
   non-scalable encoding.  Also, if the resolution difference between
   the streams is large, a scalable codec may in fact be only marginally
   more bandwidth efficient than the simulcast case where the different
   resolutions are sent as separate streams from the clients to the
   mixer.  At the same time, with scalable video encoding using the
   currently available scalable video codecs, the transmission of all
   but the lowest resolution will consume more bandwidth from the mixer
   to the other participants compared to a non-scalable encoding.

   Simulcasting has the benefit that it is conceptually simple.  It
   enables the use of any media codec that the participants agree on,
   allowing the RTP mixer to be codec-agnostic.

                               +------------+      +---+
                    +---+      |            |----->| B |
                    |   |=====>|            |      +---+
                    | A |      |   Mixer    |
                    |   |----->|            |      +---+
                    +---+      |            |=====>| C |
                               +------------+      +---+

           Figure 1: RTP Mixer selecting from simulcast versions

   The sender A provides the mixer with both a high resolution version
   "===>" and a low resolution version "--->".  The mixer selects who in
   its receiver population should get a particular version.

3.1.1.  Simulcast Combined with Scalable Encoding

   As explained in the previous section, a scalable codec is not always
   more bandwidth efficient than simulcast, especially in the path from
   the mixer to the receiver.

   There are however cases where a combination of simulcast and scalable
   encoding can be beneficial.  By using simulcast in cases where the
   scalable codec is less efficient, it is possible to optimize the
   efficiency of the complete system.  A good example of this usage
   would be where the video is encoded using SVC transported in RTP
   [RFC6190], where each simulcast stream has a different resolution,
   and each SVC media stream uses temporal scalability and signal to
   noise ratio (SNR) scalability within that single media stream.  If
   only resolution and temporal variations are needed, this can be
   implemented using the non-scalable part of H.264, as each simulcast
   version provides the different resolution, and each media stream
   within a simulcast encoding has temporal scalability through the use
   of non-reference frames.

3.2.  Multicast Transported Simulcasted Media

   When using multicast, in particular Source-Specific Multicast (SSM)
   [RFC3569], to distribute RTP/RTCP packets to a large receiver
   population, one faces some issues.  There are at least two different
   issues for which simulcast can potentially be useful.

3.2.1.  Diversity in Receiver Population

   If there is any diversity in the receivers regarding e.g. capability,
   codec support or code base, there are potentially restrictions in
   what streams can be delivered to the receivers.  If using the lowest
   common denominator over a diverse receiver population isn't
   acceptable, simulcast can be one possible solution.  By offering
   different stream alternatives, it is possible to let the receivers
   choose the simulcast version that matches their capabilities.  By
   using explicit signalling for simulcast, it is not necessary for the
   stream distributor to handle multiple receiver configurations
   individually for a multi-media session, nor to ensure that each
   receiver gets an encoding that matches their capabilities.

   The simulcast version granularity the receivers can select will be on
   multicast group level.  Thus, this use case puts a strict requirement
   on supporting separation through different RTP sessions.  The reason
   is that having a single RTP session straddle several multicast
   groups makes any reporting on the received sources very difficult to
   interpret.  Using one RTP session per simulcast version instead
   provides consistency.

3.2.2.  Bit-rate Adaptation

   If the network paths from the media sender to the receivers can
   support different bit-rates, there is a need to support media streams
   encoded to different bit-rates.  If these path differences are of a
   more static nature, for example depending primarily on the underlying
   link layers, using simulcast has an advantage over scalable encoding.
   The reason is that the efficiency of scalable coding will never be
   better than encoding to a single target rate.  When the receiver can
   determine its current network interface connectivity, it can choose
   the simulcast version with certainty.  That choice will also remain
   correct until another network interface becomes the active one.
   This assumes that the multicast transmission uses dedicated resources
   and will thus not be congested due to other network traffic.  To
   support this behavior, the signalling must indicate which media
   streams are alternatives to each other, and it must also be possible
   to determine the aggregate bit-rate of the selected multicast
   group(s) for comparison with the available network properties.
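
   The static, receiver-driven selection described above can be
   sketched as follows.  The sketch assumes that the signalling exposes,
   for each alternative, its multicast group and bit-rate; the class,
   the function, the group addresses (taken from the SSM range 232/8)
   and the numbers are hypothetical.  The receiver then joins only the
   group of the chosen version.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SimulcastVersion:
    group: str          # multicast group carrying this version (hypothetical)
    bitrate_kbps: int   # signalled bit-rate of the version


def choose_version(versions: List[SimulcastVersion], link_capacity_kbps: int,
                   other_groups_kbps: int = 0) -> Optional[SimulcastVersion]:
    """Pick the highest-rate version that still fits once the aggregate
    bit-rate of the other groups already joined (e.g. audio) is counted."""
    budget = link_capacity_kbps - other_groups_kbps
    fitting = [v for v in versions if v.bitrate_kbps <= budget]
    return max(fitting, key=lambda v: v.bitrate_kbps, default=None)


versions = [SimulcastVersion("232.1.1.1", 250),
            SimulcastVersion("232.1.1.2", 800),
            SimulcastVersion("232.1.1.3", 2500)]
print(choose_version(versions, link_capacity_kbps=1000, other_groups_kbps=64).group)
# 232.1.1.2
```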

   Simulcast can also be used in more dynamic situations where each
   receiver continuously gathers reception statistics to detect path
   congestion and, based on that, may change which version to receive.
   The main issue with such usage is how to switch from one version to
   another with minimal playback interruption, while also avoiding extra
   load on the network during the actual switch.  Here, scalable
   encoding in general has better characteristics, since scalability
   layers are typically synchronized.
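
   The dynamic, statistics-driven switching described above can be
   sketched as a simple control loop.  This is a hypothetical Python
   illustration: the thresholds, the class name and the use of a
   per-interval loss fraction as the only input are assumptions.
   Stepping up more slowly than stepping down limits oscillation between
   versions, but it does not address the playback interruption at each
   switch, which remains the main issue noted above.

```python
from typing import List


class VersionSwitcher:
    """Hypothetical receiver-side policy: step down one simulcast version
    when the measured loss fraction exceeds `down_loss`, and step up only
    after `up_intervals` consecutive loss-free reporting intervals."""

    def __init__(self, bitrates_kbps: List[int], down_loss: float = 0.05,
                 up_intervals: int = 3) -> None:
        self.bitrates = sorted(bitrates_kbps)
        self.index = len(self.bitrates) - 1   # start at the highest version
        self.down_loss = down_loss
        self.up_intervals = up_intervals
        self.clean = 0                        # consecutive loss-free intervals

    def on_report(self, loss_fraction: float) -> int:
        """Feed one interval's loss fraction; return the bit-rate to receive."""
        if loss_fraction > self.down_loss and self.index > 0:
            self.index -= 1
            self.clean = 0
        elif loss_fraction == 0.0:
            self.clean += 1
            if self.clean >= self.up_intervals and self.index < len(self.bitrates) - 1:
                self.index += 1
                self.clean = 0
        else:
            self.clean = 0
        return self.bitrates[self.index]


s = VersionSwitcher([250, 800, 2500])
print([s.on_report(x) for x in [0.10, 0.0, 0.0, 0.0]])  # [800, 800, 800, 2500]
```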

   When comparing simulcast and scalable encoding, the trade-offs are
   different and the down-sides occur at different places.  Simulcast
   will have a higher bit-rate load at a media sender and that will also
   be the case for any network path shared between receivers of multiple
   simulcast versions.  However, for parts of the network path where
   there is only a single simulcast version, the achievable quality at a
   given bit-rate will be slightly higher for simulcast.  It will also
   be more difficult to seamlessly switch between simulcast versions
   than between different scalable encodings, as simulcast actually
   switches from one media stream version to another instead of adding
   or removing some enhancement layers.

3.3.  Same Encoding to Multiple Destinations

   One interpretation of simulcast is when one encoding is sent to
   multiple receivers.  This is well supported in RTP by simply copying
   all outgoing RTP and RTCP traffic to several transport destinations,
   if the intention is to create a common RTP session.  As long as all
   participants do the same, a full mesh is constructed and everyone in
   the multi-party session has a similar view of the joint RTP session.
   This is analogous to an Any Source Multicast (ASM) session, but
   without the traffic optimization, as multiple copies of the same
   content are likely to pass over the same link.

                              +---+      +---+
                              | A |<---->| B |
                              +---+      +---+
                                ^         ^
                                 \       /
                                  \     /
                                   v   v
                                   +---+
                                   | C |
                                   +---+

                    Figure 2: Full Mesh / Multi-unicast

   As this type of simulcast is analogous to ASM usage and RTP has good
   support for ASM sessions, no further consideration is made in this
   document for this scenario.
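
   The copying described above amounts to a loop over destinations.
   The following minimal Python sketch is illustrative; the function
   name is hypothetical and the addresses are from the documentation
   range 192.0.2.0/24.  Every copy crosses the sender's own access link,
   which is the missing traffic optimization compared to ASM.

```python
from typing import Callable, Iterable, Tuple

Address = Tuple[str, int]


def multi_unicast(packet: bytes, destinations: Iterable[Address],
                  send: Callable[[bytes, Address], object]) -> int:
    """Send an identical copy of one outgoing RTP or RTCP packet to every
    destination; returns the number of copies sent."""
    copies = 0
    for dest in destinations:
        send(packet, dest)
        copies += 1
    return copies


# With a real UDP socket the send callable is simply sock.sendto, e.g.:
#   import socket
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   multi_unicast(rtp_packet, [("192.0.2.10", 5004), ("192.0.2.20", 5004)],
#                 sock.sendto)
```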

3.4.  Different Encoding to Independent Destinations

   Another alternative interpretation of simulcast includes multiple
   destinations, where each destination gets a specifically tailored
   version, but where the destinations are independent.  A typical
   example for this would be a streaming server distributing the same
   live session to a number of receivers, adapting the quality and
   resolution of the multi-media session to each receiver's capability
   and available bit-rate.  This case can be solved in RTP by having
   independent RTP sessions between the sender and the receivers.  Thus
   this case is not considered further.


4.  Network Aspects

   The network aspects that are relevant for simulcast are:

   Quality of Service:  When using simulcast it might be of interest to
      prioritize a particular simulcast version, rather than applying
      equal treatment to all versions.  For example, lower bit-rate
      versions may be prioritized over higher bit-rate versions to
      minimize congestion or packet losses in the low bit-rate versions.
      Thus, there is a benefit in using a simulcast solution that
      supports QoS as well as possible.  By separating simulcast versions
      into different RTP sessions and sending those RTP sessions over
      different transport flows, a simulcast version can be prioritized
      by existing flow-based QoS mechanisms.  When using unicast, QoS
      mechanisms based on individual packet marking are also feasible,
      which do not require separation of simulcast versions into
      different RTP sessions to apply different QoS.

   NAT/FW Traversal:  Using multiple RTP sessions will incur more cost
      for NAT/FW traversal unless they can re-use the same transport
      flow, which can be achieved either by multiplexing multiple RTP
      sessions on a single lower layer transport
      [I-D.westerlund-avtcore-transport-multiplexing] or by Multiplexing
      Negotiation Using SDP Port Numbers
      [I-D.ietf-mmusic-sdp-bundle-negotiation].  If flow-based QoS with
      any differentiation is desirable, the cost of additional transport
      flows is likely unavoidable.

   Multicast:  Multiple RTP sessions will be required to enable
      combining simulcast with multicast.  Different simulcast versions
      have to be separated to different multicast groups to allow a
      multicast receiver to pick the version it wants, rather than
      receive all of them.  In this case, the only reasonable
      implementation is to use different RTP sessions for each multicast
      group so that reporting and other RTCP functions operate as
      intended.
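
   For the Quality of Service aspect above, sending each simulcast
   version in its own transport flow lets each flow carry its own
   marking.  The Python sketch below rests on assumptions: the DSCP
   choices are hypothetical, socket.IP_TOS is platform dependent
   (available on Linux and most BSD-derived systems), and whether the
   network honours the marking depends on local policy.

```python
import socket
from typing import Dict

# Hypothetical policy: the low bit-rate version gets the lower drop
# precedence AF41 (DSCP 34), the high bit-rate version AF42 (DSCP 36).
DSCP_BY_VERSION: Dict[str, int] = {"low": 34, "high": 36}


def dscp_to_tos(dscp: int) -> int:
    """The six DSCP bits sit above the two ECN bits in the IPv4 TOS byte."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in six bits")
    return dscp << 2


def open_version_socket(version: str) -> socket.socket:
    """One UDP socket, i.e. one transport flow, per simulcast version, with
    all of its packets marked for flow-based QoS treatment."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS,
                    dscp_to_tos(DSCP_BY_VERSION[version]))
    return sock


print(dscp_to_tos(DSCP_BY_VERSION["low"]), dscp_to_tos(DSCP_BY_VERSION["high"]))
# 136 144
```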


5.  Simulcast Alternatives

   Simulcast is in this document defined as the act of sending multiple
   alternative encodings of the same underlying media source.  When
   transmitting multiple independent streams that originate from the
   same source, it could potentially be done in several different ways
   using RTP.  A general discussion of the considerations for using the
   different RTP multiplexing alternatives can be found in Guidelines
   for using the Multiplexing Features of RTP
   [I-D.westerlund-avtcore-multiplex-architecture].  Discussion and
   clarification on how to handle multiple streams in an RTP session can
   be found in [I-D.lennox-avtcore-rtp-multi-stream].

   The sub-sections below briefly describe potential ways of achieving
   RTP media stream multiplexing and identification of which streams are
   alternative simulcast encodings of the same source.  The descriptions
   also cover how each alternative interacts with multiple sources
   (SSRCs) that are present in the same RTP session for reasons other
   than simulcast.  Multiple SSRCs may occur for various reasons, such
   as multiple participants in multipoint topologies like multicast,
   transport relays or full mesh transport simulcasting, multiple source
   devices such as multiple cameras or microphones at one end-point, or
   other RTP mechanisms such as RTP Retransmission [RFC4588].

5.1.  Using the Payload Type

   An alternative could be to use only the RTP payload type to identify
   the different simulcast streams.  This could be tempting, since
   simulcast streams may differ in codec, codec configuration, or
   sampling, all of which are typically specified in SDP by a format
   number on the media line that is in turn connected to an RTP Payload
   Type.  Thus all simulcast streams would be sent in the same RTP
   session using only a single SSRC per actual media source.  However,
   as discussed in Guidelines for using the Multiplexing Features of RTP
   [I-D.westerlund-avtcore-multiplex-architecture], using Payload Type
   Multiplexing does not generally work and is hereby dismissed as a
   potential solution.

5.2.  Using Single RTP session

   This idea is based on using a unique SSRC for each alternative
   encoding of an actual media source within a single RTP session.  The
   identification of streams and how they are specified to be related
   alternatives needs an additional mechanism, for example using SSRC
   grouping [RFC5576], and potentially also a new SDES item such as
   SRCNAME proposed in [I-D.westerlund-avtext-rtcp-sdes-srcname], with
   semantics that indicate them as alternatives of a particular media
   source.  When there are multiple actual media sources in a session,
   each media source will have to use a number of SSRCs to represent the
   different simulcast alternatives it produces.  For example, if there
   are n media sources and each produces the same number m of simulcast
   versions, there will be n*m SSRCs in use in the RTP session.  Each
   SSRC can use any of the configured payload
   types for this RTP session.  All session level attributes and
   parameters that are not source specific will apply and must function
   with all the alternative encodings in use.
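
   The n*m bookkeeping above can be sketched as follows.  In this
   hypothetical Python illustration the source and version names are
   made up and the seeded generator is only for reproducibility; a real
   sender would choose SSRCs randomly as specified in RTP [RFC3550] and
   would still rely on RTP's collision resolution for SSRCs chosen by
   other participants.

```python
import random
from typing import Dict, List


def allocate_ssrcs(sources: List[str], versions: List[str],
                   rng: random.Random) -> Dict[str, Dict[str, int]]:
    """Give each (media source, simulcast version) pair its own random
    32-bit SSRC within one RTP session: n sources and m versions yield n*m
    distinct SSRCs.  The per-source map is what an SSRC grouping would
    have to convey to receivers."""
    used = set()
    table: Dict[str, Dict[str, int]] = {}
    for source in sources:
        table[source] = {}
        for version in versions:
            ssrc = rng.getrandbits(32)
            while ssrc in used:  # avoid collisions among our own SSRCs
                ssrc = rng.getrandbits(32)
            used.add(ssrc)
            table[source][version] = ssrc
    return table


table = allocate_ssrcs(["camera-1", "camera-2"], ["high", "medium", "low"],
                       random.Random(2013))
print(sum(len(v) for v in table.values()))  # 6, i.e. n*m = 2*3
```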

   In the currently used signaling system based on SDP [RFC4566] and
   Offer/Answer [RFC3264], the properties of media streams are typically
   negotiated at the media block (m-line) level.  Sending simulcast
   alternatives as different SSRCs belonging to the same media
   description is likely possible to achieve, but SSRC-centric signaling
   providing the needed media stream properties is currently almost
   non-existent, and it would require considerable effort to make the
   necessary SDP extensions.

   A single RTP session can be described in SDP by more than a single
   m-line, as with BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation], and
   it can re-use the same m-line grouping [RFC5888] as would be used for
   multiple RTP sessions (Section 5.3), but the RTP aspects described in
   this section will still apply.  This would enable the same signalling
   expressiveness for multiple RTP sessions as for a single RTP session.

5.3.  Using Multiple RTP sessions

   Using multiple RTP sessions means that each different simulcast
   version of an actual media source is transmitted in a separate RTP
   session, using whatever session identifier is available to
   distinguish the different versions.  Since each RTP session is
   described by one or more SDP m-lines, this solution needs explicit
   m-line grouping [RFC5888] with semantics that indicate them as
   simulcast alternatives.  It is also important to identify the SSRCs
   in the different sessions that are alternative encodings of the same
   media source, if there is more than a single media source in each RTP
   session.  This could be accomplished using the same SSRC across the
   sessions, but that is not robust against SSRC collisions and could
   potentially force cascading SSRC changes between sessions.  A better
   choice would be to use different SSRCs, but relate the streams
   through the new SDES item proposed in
   [I-D.westerlund-avtext-rtcp-sdes-srcname].
   Each RTP session will have its own set of configured RTP payload
   types available for use with any SSRC in that session.  In addition,
   all other attributes for sessions or sources can be used as normal to
   indicate the configuration of that particular alternative.
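
   Relating streams through SRCNAME can be sketched as follows.  SRCNAME
   is only a proposed SDES item, so the data model here is hypothetical:
   the sketch assumes a receiver has already extracted (SSRC, SRCNAME)
   pairs from RTCP SDES packets in each RTP session, and the session
   labels and values are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_srcname(sessions: Dict[str, List[Tuple[int, str]]]
                     ) -> Dict[str, Dict[str, int]]:
    """Map each SRCNAME (one media source) to {RTP session: SSRC}, i.e. the
    set of simulcast versions of that source, even though SSRCs differ."""
    groups: Dict[str, Dict[str, int]] = defaultdict(dict)
    for session, reports in sessions.items():
        for ssrc, srcname in reports:
            groups[srcname][session] = ssrc
    return dict(groups)


# (SSRC, SRCNAME) pairs learned from RTCP SDES in each session (made-up values).
sessions = {
    "video-high": [(0x1A2B3C4D, "cam-left"), (0x55AA55AA, "cam-right")],
    "video-low": [(0x0BADCAFE, "cam-left"), (0x12345678, "cam-right")],
}
print(group_by_srcname(sessions)["cam-left"])
```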

5.4.  Conclusions

   If it is at all desirable to support simulcast based on multicast,
   the solution must support using multiple RTP sessions.  The main
   reason is that receiver based selection of simulcast version must be
   possible, which is accomplished in multicast through receiver
   selection of which multicast group(s) it joins.  This also has the
   advantage of being able to use the existing SDP media description
   (m=) expressiveness to signal or negotiate simulcast versions.

   When using simulcast based on unicast, it is desirable to be able to
   use the same media description signalling expressiveness regardless
   if multiple RTP sessions are used or not.  Assuming that MMUSIC
   decides to enable single RTP media stream negotiation per SDP media
   description and combine that with BUNDLE to identify RTP sessions, it
   appears that using one or more RTP sessions for simulcast over
   unicast will be able to use the same signalling solution.  Thus the
   decision to use one or more RTP sessions can be taken based on other
   limitations, such as the cost of NAT/FW traversal or the need for
   flow-based QoS.

   A proposed solution for SDP media description level signaling of
   simulcast version parameters is outlined below.


6.  Simulcast Signaling Proposal

   Signaling simulcast is about negotiating between media sender and
   receiver what the different simulcast versions should be, how to
   identify them in terms of RTP streams, and how to relate those RTP
   streams.

   The proposed solution consists of:

   o  Signaling simulcast capability as SDP media level attributes in a
      first round of Offer/Answer

      *  Separate send and receive simulcast capabilities

      *  Media properties that are supported as base for different
         simulcast versions are listed as parameters

   o  Adding SDP media descriptions for the simulcast streams in a
      second round of Offer/Answer

      *  Grouping SDP media descriptions from the same media source,
         belonging to the same simulcast, using the SDP grouping
         framework [RFC5888]

      *  Separate send and receive simulcast groupings

      *  Negotiating parameters for each simulcast version using regular,
         individual SDP media descriptions

      *  Identifying RTP media streams (SSRCs) from the same media source
         using the new SDES item SRCNAME
         [I-D.westerlund-avtext-rtcp-sdes-srcname]

   This is further outlined below.

6.1.  Simulcast Capability

   There are numerous media properties that can be varied to construct a
   set of simulcast versions.  A simulcast-enabled endpoint could also
   support simulcast based on several of those properties.  If those
   properties are relatively independent and each simulcast version
   needs an explicit definition (an m-line) in the SDP, this leads to an
   exponential number of simulcast version candidates and a very long
   SDP that is likely also hard to interpret.  There is thus a need to
   limit the simulcast version candidates included in the SDP to cover
   as small a set of properties as possible.
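
   To make the growth concrete: with one m-line per version, the number
   of candidates is the product of the number of alternatives per
   property, so k properties with n alternatives each give n^k
   candidates.  The Python sketch below only illustrates that
   arithmetic; the alternative values in it are hypothetical and not
   taken from this document.

```python
from itertools import product
from math import prod

# Hypothetical alternatives per media property (illustration only).
ALTERNATIVES = {
    "imageattr": ["[x=640,y=360]", "[x=320,y=180]", "[x=160,y=90]"],
    "fmtp": ["profile-level-id=42c01e", "profile-level-id=42c00d"],
    "rtpmap": ["H264/90000", "VP8/90000"],
}


def candidate_count(alternatives):
    """Candidates = product of the number of alternatives per property."""
    return prod(len(values) for values in alternatives.values())


def enumerate_candidates(alternatives):
    """List every combination; each would need its own m-line in the SDP."""
    names = list(alternatives)
    return [dict(zip(names, combo)) for combo in product(*alternatives.values())]


if __name__ == "__main__":
    print(candidate_count(ALTERNATIVES))  # 3 * 2 * 2 = 12
    for candidate in enumerate_candidates(ALTERNATIVES)[:3]:
        print(candidate)
```

   Adding one more property with two alternatives would double the
   count to 24 m-lines.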

   If a legacy endpoint that does not support simulcast were presented
   with an SDP including media descriptions for a set of simulcast
   versions, it might not know how to correctly handle or interpret
   these "surplus" media descriptions.

   Given the functionality that simulcast is intended to achieve, the
   reasons for a single endpoint to send simulcast versions are not the
   same as its reasons to receive simulcast versions.

   For these reasons, it is proposed to define two new SDP media-level
   attributes, "a=sim-send" and "a=sim-recv", which explicitly signal
   support for simulcast media transmission and simulcast media
   reception, respectively, for that media description.  "a=sim-send"
   and "a=sim-recv" MAY be used independently and simultaneously.  These
   attributes are also proposed to have parameters indicating the media
   properties used to create the simulcast versions.  The attributes are
   undefined at SDP session level and MUST NOT be used there.

   simulcast   = "a="( "sim-send:" / "sim-recv:" ) prop-list
   prop-list   = prop-entry *(WSP prop-entry)
   prop-entry  = prop ["=" q-value]
   prop        = "rtpmap"
               / "fmtp"
               / "imageattr"
               / "ptime"
               / "crypto"
               / token ; for future extensions
   q-value     = ( "0" "." 1*2DIGIT )
               / ( "1" "." 1*2("0") )
               ; Values between 0.00 and 1.00
   ; WSP and DIGIT defined in [RFC5234]
   ; token defined in [RFC4566]


                       Figure 3: ABNF for Simulcast
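
   A minimal Python sketch of a parser for these attribute lines,
   following the ABNF in Figure 3, is shown below.  It assumes at most
   one q-value per property (the q-value is described as optional
   below), approximates the RFC 4566 token rule with a character class,
   and returns None for an omitted q-value; the function name is
   hypothetical.

```python
import re

# RFC 4566 token characters (used for the "token" extension rule).
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
# q-value from Figure 3: 0.0-0.99 with 1-2 decimals, or 1.0 / 1.00.
Q_VALUE = r"0\.[0-9]{1,2}|1\.0{1,2}"
PROP_ENTRY = re.compile(rf"({TOKEN})(?:=({Q_VALUE}))?")


def parse_simulcast(line):
    """Parse 'a=sim-send:...' or 'a=sim-recv:...' (hypothetical helper).

    Returns (direction, [(property, q_or_None), ...]); raises ValueError
    when the line does not match the ABNF.
    """
    for direction in ("sim-send", "sim-recv"):
        prefix = f"a={direction}:"
        if line.startswith(prefix):
            prop_list = line[len(prefix):]
            break
    else:
        raise ValueError("not a sim-send/sim-recv attribute")

    entries = []
    # prop-list = prop-entry *(WSP prop-entry): exactly one SP/HTAB between entries.
    for item in re.split(r"[ \t]", prop_list):
        match = PROP_ENTRY.fullmatch(item)
        if match is None:
            raise ValueError(f"invalid prop-entry: {item!r}")
        q = float(match.group(2)) if match.group(2) is not None else None
        entries.append((match.group(1), q))
    return direction, entries
```

   Applied to the video media description of Alice's first offer in
   Section 7, it returns the direction "sim-send" with imageattr at 1.0
   and fmtp at 0.8.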

   The media property values are taken from existing (and could likely
   be extended to cover future) SDP attributes that express media
   properties that can be varied to create different simulcast versions:

   rtpmap:  Differences in codec type, sampling rate (see Section 6.4),
      and number of channels

   fmtp:  Differences in codec-specific encoding parameters

   imageattr:  Differences in video resolution, aspect ratio, and
      framerate [RFC6236]

   ptime:  Differences in frame aggregation per packet

   crypto:  Differences in encryption [RFC4568]

   ...:

   The optional q-value expresses the relative preference to base a
   simulcast version on that media property, with 1.00 meaning maximum
   (100%) preference and 0.00 meaning no (0%) preference.  Several media
   properties can share the same q-value, in which case they are equally
   preferred.
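
   One way a receiver of these attributes might rank the listed
   properties is sketched below in Python.  This document does not say
   how a property listed without a q-value should be ranked, so the
   default_q parameter is an explicit assumption; the helper name is
   also hypothetical.

```python
from itertools import groupby


def preference_tiers(entries, default_q=1.0):
    """Group (property, q) pairs into tiers of equal preference, best first.

    entries: e.g. [("imageattr", 1.0), ("fmtp", 0.8)]; q may be None.
    default_q: ASSUMED value for properties listed without a q-value;
    this draft does not define one.
    """
    scored = [(prop, default_q if q is None else q) for prop, q in entries]
    # Stable sort keeps the offered order within a tier of equal q-values.
    scored.sort(key=lambda entry: entry[1], reverse=True)
    return [
        (q, [prop for prop, _ in tier])
        for q, tier in groupby(scored, key=lambda entry: entry[1])
    ]


if __name__ == "__main__":
    print(preference_tiers([("imageattr", 1.0), ("fmtp", 0.8)]))
```

   Properties sharing a q-value end up in the same tier, reflecting that
   they are equally preferred.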

   An offerer wanting to use simulcast SHALL include one or both of
   these attributes, depending on the direction(s) in which simulcast
   will be used.  An offerer that receives an answer without
   "a=sim-send" or "a=sim-recv" MUST NOT define or use any simulcast
   alternatives for that media description in the corresponding
   direction towards the answerer.

   An answerer that does not understand the concept of simulcast will
   not recognize these attributes either and will therefore omit them
   from the SDP answer, as defined in existing SDP Offer/Answer
   procedures.  An answerer that does understand the attributes and
   wants to support simulcast in the indicated direction SHALL reverse
   the direction of the attribute ("sim-send" becomes "sim-recv" and
   vice versa) and include it in the answer.

   An offerer that intends to send simulcast alternatives, and thus
   includes "a=sim-send", MUST also include at least one media property
   parameter that it intends to use to construct the simulcast
   alternatives, but MAY include more.  Including multiple media
   property parameters in "a=sim-send" SHALL be interpreted as an offer
   to send simulcast versions covering all combinations thereof, which
   MAY be further restricted by other information in the SDP, for
   example the number of simulcast-related media descriptions or the
   use of max-ssrc signaling
   [I-D.westerlund-mmusic-max-ssrc].

   An offerer that is capable of receiving simulcast alternatives, and
   thus includes "a=sim-recv", MUST also include at least one media
   property parameter that it is willing to use as a discriminator
   between received simulcast alternatives, but MAY include more.
   Including multiple media property parameters in "a=sim-recv" SHALL be
   interpreted as an offer to receive simulcast versions covering all
   combinations thereof, which MAY be further restricted by other
   information in the SDP, for example the number of simulcast-related
   media descriptions or the use of max-ssrc signaling
   [I-D.westerlund-mmusic-max-ssrc].

   An answerer that lacks either the capability or the desire to use
   simulcast versions based on a certain media property parameter in a
   specific direction MUST remove that media property parameter from
   "a=sim-send" or "a=sim-recv".  The answerer MUST NOT add any media
   property parameters that were not included in the offer.
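
   The answerer rules above can be sketched in Python as follows.  The
   function name and the supported_props input, which stands for the
   answerer's local capability and policy, are hypothetical.  The
   sketch emits property names without q-values, as in the server's
   first answer in Section 7; whether an answerer should echo q-values
   is not specified here.  It also assumes that an answer left with no
   properties omits the attribute, since the ABNF requires at least one.

```python
REVERSE = {"sim-send": "sim-recv", "sim-recv": "sim-send"}


def answer_simulcast(offered_direction, offered_props, supported_props):
    """Return the answer's simulcast attribute line, or None to omit it.

    offered_direction: "sim-send" or "sim-recv" as found in the offer.
    offered_props: property names from the offered attribute, in order.
    supported_props: properties the answerer can and wants to use in the
        reverse direction (hypothetical local policy input); pass an empty
        set to decline simulcast in this direction.
    """
    # Remove properties the answerer lacks capability or desire for;
    # iterating over the offer guarantees that nothing new is added.
    kept = [prop for prop in offered_props if prop in supported_props]
    if not kept:
        return None  # ASSUMPTION: omit the attribute rather than send an empty list
    # Reverse the direction: sim-send becomes sim-recv and vice versa.
    return f"a={REVERSE[offered_direction]}:" + " ".join(kept)


if __name__ == "__main__":
    # Alice offers sim-send with imageattr and fmtp; the server keeps imageattr.
    print(answer_simulcast("sim-send", ["imageattr", "fmtp"], {"imageattr"}))
```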

6.2.  Grouping Simulcast Media Descriptions

   To relate media descriptions holding simulcast versions, two new
   simulcast grouping semantics are defined, "SimulCast Receive" (SCR)
   and "SimulCast Send" (SCS).  The semantics expressing the intent to
   send simulcast streams need to be separate from the semantics
   describing the capability to recognize and receive simulcast
   streams.  Both semantics indicate that simulcast is desired and that
   the grouped media descriptions (m-lines) carry simulcast versions of
   media sources.  There may be multiple sets of media descriptions that
   carry simulcast versions.

6.2.1.  Declarative Use

   When used in declarative SDP, SCR indicates the configured endpoint's
   required capability to recognize and receive a specified set of RTP
   streams as simulcast streams.  In the same fashion, SCS requests the
   endpoint to send a specified set of RTP streams as simulcast streams.
   SCR and SCS MAY be used independently and at the same time, and they
   need not specify the same media descriptions, or even the same number
   of media descriptions, in the group.

6.2.2.  Offer/Answer Use

   When used in an offer, SCS indicates the offering agent's intent to
   send simulcast using the particular set of media descriptions, and
   SCR indicates the agent's capability to receive simulcast streams
   within the configured set of media descriptions.  SCS and SCR MAY be
   used independently and at the same time, and they need not specify
   the same media descriptions, or even the same number of media
   descriptions, in the group.  The answerer MUST change SCS to SCR and
   SCR to SCS in the answer, provided that it has, and wants to use, the
   corresponding (reverse) capability.  An answerer not supporting the
   SCS or SCR direction, or not supporting the SCS or SCR grouping
   semantics at all, will remove that grouping attribute altogether,
   according to the grouping framework [RFC5888].  However, due to the
   proposed two-phase approach (Section 6.3), this case should be rare
   or not occur at all.  An offerer that receives an answer indicating
   lack of simulcast support in one or both directions, where the SCR
   and/or SCS grouping is removed, MUST NOT use simulcast in the
   unsupported direction(s).
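
   The direction swap for the grouping attribute can be sketched in
   Python as below.  The function name and the can_reverse flag, which
   stands for whether the answerer has and wants to use the reverse
   capability, are hypothetical; a None result models removal of the
   grouping as permitted by [RFC5888].

```python
SWAP = {"SCS": "SCR", "SCR": "SCS"}
PREFIX = "a=group:"


def answer_group(offer_line, can_reverse):
    """Map an offered SCS/SCR group line to the line used in the answer.

    Returns None when the answerer removes the grouping.
    """
    if not offer_line.startswith(PREFIX):
        raise ValueError("not an a=group attribute")
    semantics, *mids = offer_line[len(PREFIX):].split()
    if semantics not in SWAP:
        raise ValueError(f"not a simulcast grouping: {semantics}")
    if not can_reverse:
        return None  # grouping removed; the offerer then MUST NOT use simulcast here
    # Same media IDs, opposite semantics: SCR <-> SCS.
    return PREFIX + " ".join([SWAP[semantics], *mids])
```

   Applied to the server's second offer in Section 7
   ("a=group:SCR 2 3"), it yields "a=group:SCS 2 3", the line found in
   Alice's answer.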

6.3.  Two-Phase Negotiation

   The new "a=sim-send" and "a=sim-recv" attributes are proposed to be
   included in the SDP as the first phase of a two-phase approach, where
   the first phase is an SDP Offer/Answer exchange that only establishes
   simulcast capability at both the offerer and the answerer.  This has
   the additional advantage of avoiding sending media descriptions
   related to simulcast to an endpoint that does not support simulcast.
   It is also unlikely to incur any significant extra signaling round
   trips, given that many other recent SDP techniques also use two
   Offer/Answer exchanges, as long as this phased approach can run in
   parallel with those.  Such other two-phase techniques include ICE
   [RFC5245] and BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation].

   Thus, the first Offer/Answer SHOULD NOT include any simulcast-grouped
   media descriptions; these SHOULD instead be added in a second
   Offer/Answer phase.  This second phase SHOULD be initiated by the
   simulcast receiver, meaning that the endpoint that included
   "a=sim-recv" in the first-phase SDP SHOULD be the offerer in the
   second phase.  If both endpoints are simulcast receivers, there is no
   preferred offerer for the second phase, and either endpoint MAY send
   the offer, using regular Offer/Answer rules to handle race
   conditions.
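
   The choice of second-phase offerer can be summarized by the small
   Python function below.  Its name and return values are hypothetical;
   each argument is the set of simulcast attributes an endpoint included
   for a given media description in its first-phase SDP.

```python
def second_phase_offerer(local_attrs, remote_attrs):
    """Decide who should send the second-phase offer.

    local_attrs / remote_attrs: sets such as {"sim-send"} or
    {"sim-send", "sim-recv"} taken from each endpoint's first-phase SDP.
    Returns "local", "remote", "either" or None (no simulcast negotiated).
    """
    local_receives = "sim-recv" in local_attrs
    remote_receives = "sim-recv" in remote_attrs
    if local_receives and remote_receives:
        return "either"  # no preferred offerer; regular O/A race handling applies
    if local_receives:
        return "local"   # the simulcast receiver SHOULD offer
    if remote_receives:
        return "remote"
    return None
```

   In the Section 7 example, Alice's first offer carries only
   "a=sim-send" and the server's answer carries "a=sim-recv", so from
   Alice's point of view the function returns "remote", matching the
   server initiating the second phase.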

   The first, capability-establishing phase cannot be used with
   declarative SDP; in that case it SHALL be bypassed, using the
   second-phase media description grouping directly.

6.4.  Media Stream Requirements

   When doing simulcast, the media streams that are alternatives need to
   meet certain constraints to ensure that switching between alternative
   streams is as issue-free as possible.  The following constraints are
   needed:

   Same Clock Base:  To enable correct alignment of media packets on the
      source time-line, all alternative streams (SSRCs) MUST use the
      same underlying clock to relate their RTP timestamp values to the
      Network Time Protocol (NTP) formatted sender time in the RTCP
      Sender Reports.
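
   To illustrate why a common clock matters, the Python sketch below
   maps an RTP timestamp onto the sender's NTP time-line using the
   NTP/RTP timestamp pair in an RTCP Sender Report [RFC3550].  It
   simplifies the 64-bit NTP timestamp to float seconds and assumes a
   90 kHz video clock; the class and function names are hypothetical.
   When alternative streams share the same clock base, frames captured
   at the same instant map to the same sender time even though their RTP
   timestamps have independent random offsets.

```python
from dataclasses import dataclass

RTP_TS_MOD = 2 ** 32  # RTP timestamps are 32-bit and wrap around


@dataclass
class SenderReport:
    ntp_seconds: float  # SR NTP timestamp, simplified to seconds
    rtp_timestamp: int  # SR RTP timestamp for the same instant
    clock_rate: int     # RTP clock rate in Hz, e.g. 90000 for video


def to_sender_time(sr, rtp_timestamp):
    """Map an RTP timestamp to the sender's NTP-based time using an SR."""
    # Interpret the difference as a signed 32-bit value to survive wrap-around.
    delta = (rtp_timestamp - sr.rtp_timestamp) % RTP_TS_MOD
    if delta >= RTP_TS_MOD // 2:
        delta -= RTP_TS_MOD
    return sr.ntp_seconds + delta / sr.clock_rate


if __name__ == "__main__":
    # Two simulcast versions of one source with independent RTP offsets;
    # for simplicity both SRs are taken at the same instant of the shared clock.
    high = SenderReport(ntp_seconds=1000.0, rtp_timestamp=12345, clock_rate=90000)
    low = SenderReport(ntp_seconds=1000.0, rtp_timestamp=4294950000, clock_rate=90000)
    # A frame captured 0.5 s after the SR instant, stamped in each stream.
    high_ts = (12345 + 45000) % RTP_TS_MOD
    low_ts = (4294950000 + 45000) % RTP_TS_MOD  # wraps around to 27704
    print(to_sender_time(high, high_ts), to_sender_time(low, low_ts))  # 1000.5 1000.5
```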



6.5.  Relating Alternative Encodings

   To ensure that simulcast streams can also be related correctly at the
   RTP level, use of SDES SRCNAME
   [I-D.westerlund-avtext-rtcp-sdes-srcname] to label and relate
   simulcast versions belonging to the same media source is RECOMMENDED.

6.6.  Multiple Stream Handling

   When using multiple SSRCs in a single media description, for example
   when using simulcast for multiple independent media sources, the
   grouping semantics SCR and SCS SHOULD be combined with the SDP
   attributes "a=max-send-ssrc" and "a=max-recv-ssrc"
   [I-D.westerlund-mmusic-max-ssrc] to indicate the number of
   simultaneous streams of each encoding that may be sent or that can be
   handled in the receive direction.


7.  Simulcast Signaling Examples

   For brevity and clarity, the SDP in all examples below omits
   signaling for multiple streams, such as that related to RTP-level
   relations (Section 6.5) or multiple SSRC signaling (Section 6.6).

   This example covers a client connecting to a video conference service
   that uses a centralized media topology with an RTP mixer.  Alice and
   Bob call into a conference server for a conference call with audio
   and video sent to the RTP mixer; both clients are capable of sending
   a few video simulcast versions.  The conference server also dials out
   to Fred, who uses a legacy client, resulting in fallback behavior.
   When dialing out to Joe, more functionality is enabled, as Joe uses a
   client similar to Alice's.

                    +---+      +-----------+      +---+
                    | A |<---->|           |<---->| B |
                    +---+      |           |      +---+
                               |   Mixer   |
                    +---+      |           |      +---+
                    | F |<---->|           |<---->| J |
                    +---+      +-----------+      +---+

                Figure 4: Four-party Mixer-based Conference

   Example of the media plane for an RTP-mixer-based multi-party
   conference with four participants.

7.1.  Alice: Desktop Client

   Alice is calling in to the mixer with an audiovisual, single-stream
   desktop client.  Compared to a legacy client, it only adds the
   capability to send simulcast based on video resolution ("imageattr")
   [RFC6236] and framerate.  The first-phase offer from Alice looks as
   follows:

   v=0
   o=alice 2362969037 2362969040 IN IP4 192.0.2.156
   s=Simulcast enabled Desktop Client
   t=0 0
   c=IN IP4 192.0.2.156
   b=AS:665
   m=audio 49200 RTP/AVP 96 97 9 8
   b=AS:145
   a=rtpmap:96 G719/48000/2
   a=rtpmap:97 G719/48000
   a=rtpmap:9 G722/8000
   a=rtpmap:8 PCMA/8000
   m=video 49300 RTP/AVP 96
   b=AS:520
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c01e
   a=sim-send:imageattr=1.0 fmtp=0.8
   a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
   a=content:main


          Figure 5: Alice First Offer for a Simulcast Conference

   In this first phase, the only thing in the SDP that indicates
   simulcast capability is the line in the video media description
   containing the "sim-send" attribute.

   The answer from the server indicates both that it is simulcast
   capable and that it would like to use only video resolution
   ("imageattr") based simulcast.  Had the server not been simulcast
   capable, the "a=sim-recv" line would not have been present and
   communication would have started with the media negotiated in the
   SDP.

   v=0
   o=server 823479283 1209384938 IN IP4 192.0.2.2
   s=Answer to simulcast enabled Desktop Client
   t=0 0
   c=IN IP4 192.0.2.43
   b=AS:665
   m=audio 49200 RTP/AVP 96
   b=AS:145
   a=rtpmap:96 G719/48000/2
   m=video 49300 RTP/AVP 96
   b=AS:520
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c01e
   a=sim-recv:imageattr
   a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
   a=content:main


         Figure 6: Server First Answer for a Simulcast Conference

   Since the server is the simulcast media receiver, it immediately
   initiates a second Offer/Answer exchange that includes the simulcast
   versions.  The server also keeps the "a=sim-recv" attribute as an
   explicit simulcast capability indication in this second Offer/Answer
   round.  Note that the "non-simulcast" media can already be started
   now, before the second-phase Offer/Answer, with the only restriction
   being that the simulcast functionality is not yet established.

   v=0
   o=server 823479283 1209384939 IN IP4 192.0.2.2
   s=Server inviting simulcast enabled Desktop Client
   t=0 0
   c=IN IP4 192.0.2.43
   b=AS:825
   a=group:SCR 2 3
   m=audio 49200 RTP/AVP 96
   b=AS:145
   a=rtpmap:96 G719/48000/2
   a=mid:1
   m=video 49300 RTP/AVP 96
   b=AS:520
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c01e
   a=sim-recv:imageattr
   a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
   a=mid:2
   a=content:main
   m=video 49400 RTP/AVP 96
   b=AS:160
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c00d
   a=imageattr:96 recv [x=320,y=180]
   a=mid:3
   a=recvonly

         Figure 7: Server Second Offer for a Simulcast Conference

   The server has added one additional receive-only media description
   for a simulcast version that differs only in imageattr.  The SCR
   grouping tag together with the two media IDs (2 and 3) shows that
   the two media lines are simulcast versions.  The first video
   version, with media ID 2, prefers 360p resolution (signaled via
   imageattr), and the second video version, with media ID 3, prefers
   180p resolution.  The first video media line also carries the single
   video stream that the server sends (making that media line
   sendrecv), while the second video media line is only related to
   simulcast transmission and is thus offered recvonly.
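
   A receiver could recover this relation from the SDP as in the Python
   sketch below, which collects the SCS/SCR groups and, for each listed
   media ID, the m-line, direction and imageattr of that media
   description.  It is a simplified, hypothetical helper rather than a
   full SDP parser: it assumes sendrecv when no direction attribute is
   present in a media description and ignores session-level direction
   attributes.  The embedded SDP is an abridged copy of the server's
   second offer above.

```python
DIRECTIONS = {"a=sendrecv", "a=sendonly", "a=recvonly", "a=inactive"}


def simulcast_groups(sdp):
    """Return [(semantics, [media-info, ...]), ...] for SCS/SCR groups."""
    groups, by_mid, current = [], {}, None
    for line in (raw.strip() for raw in sdp.strip().splitlines()):
        if line.startswith("a=group:"):
            semantics, *mids = line[len("a=group:"):].split()
            if semantics in ("SCS", "SCR"):
                groups.append((semantics, mids))
        elif line.startswith("m="):
            # New media description; direction defaults to sendrecv.
            current = {"m": line, "direction": "sendrecv"}
        elif current is None:
            continue  # other session-level lines are ignored
        elif line.startswith("a=mid:"):
            by_mid[line[len("a=mid:"):]] = current
        elif line in DIRECTIONS:
            current["direction"] = line[len("a="):]
        elif line.startswith("a=imageattr:"):
            current["imageattr"] = line[len("a=imageattr:"):]
    return [(semantics, [by_mid[mid] for mid in mids]) for semantics, mids in groups]


SERVER_SECOND_OFFER = """
a=group:SCR 2 3
m=audio 49200 RTP/AVP 96
a=mid:1
m=video 49300 RTP/AVP 96
a=imageattr:* send [x=640,y=360] [x=320,y=180] recv [x=640,y=360]
a=mid:2
m=video 49400 RTP/AVP 96
a=imageattr:96 recv [x=320,y=180]
a=mid:3
a=recvonly
"""

if __name__ == "__main__":
    for semantics, versions in simulcast_groups(SERVER_SECOND_OFFER):
        for version in versions:
            print(semantics, version["m"], version["direction"])
```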

   The fact that the fmtp for this second video is also different
   (profile-level-id 42c00d instead of 42c01e) should be seen as a
   secondary effect of the change in resolution and does not create any
   kind of conflict.  The capabilities of Alice's client are well
   aligned with this, and the SDP answer is straightforward.

   v=0
   o=alice 2362969037 2362969041 IN IP4 192.0.2.156
   s=Final answer from simulcast enabled Desktop Client
   t=0 0
   c=IN IP4 192.0.2.156
   b=AS:825
   a=group:SCS 2 3
   m=audio 49200 RTP/AVP 96
   b=AS:145
   a=rtpmap:96 G719/48000/2
   a=mid:1
   m=video 49300 RTP/AVP 96
   b=AS:520
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c01e
   a=sim-send:imageattr
   a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
   a=mid:2
   a=content:main
   m=video 49400 RTP/AVP 96
   b=AS:160
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42c00d
   a=imageattr:96 send [x=320,y=180]
   a=mid:3
   a=sendonly

         Figure 8: Alice Second Answer for a Simulcast Conference


8.  IANA Considerations

   This document requests registration of two new SDP attributes,
   "sim-send" and "sim-recv", together with a new registry for their
   parameters, which are taken from existing SDP attributes, and of two
   new SDP grouping semantics, "SCS" and "SCR".

   Formal registrations to be written.


9.  Security Considerations

   The simulcast capability attributes and parameters are vulnerable to
   attacks in signaling.

   A falsely inserted simulcast attribute may result in the generation
   of a second-phase SDP that potentially contains a large number of
   unsupported media descriptions expressing simulcast alternatives.  A
   correct SDP implementation will, however, be able to reject any
   unsupported media descriptions, so the effect should be limited.

   A hostile removal of the simulcast attributes will result in the
   second-phase Offer/Answer being skipped and simulcast not being used.

   The simulcast grouping semantics are vulnerable to attacks in the
   signaling.

   A false grouping of non-simulcast streams as simulcast risks some
   streams being incorrectly ignored by simulcast-aware receivers that
   are not interested in the presumed simulcast streams.

   A hostile removal of simulcast grouping will prevent streams from
   being interpreted as simulcast, which obviously prevents use of the
   simulcast functionality.  It also risks the intended simulcast
   streams instead being presented to a receiver as separate,
   independent streams.

   None of the above is likely to have major consequences, and all of it
   can be mitigated by signaling that is at least integrity protected
   and source authenticated, preventing an attacker from changing it.


10.  Acknowledgements


11.  References

11.1.  Normative References

   [I-D.westerlund-avtext-rtcp-sdes-srcname]
              Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
              Item SRCNAME to Label Individual Sources",
              draft-westerlund-avtext-rtcp-sdes-srcname-00 (work in
              progress), October 2011.

   [I-D.westerlund-mmusic-max-ssrc]
              Holmberg, C., Westerlund, M., Burman, B., and F. Jansson,
              "Multiple Synchronization Sources (SSRC) in SDP Media
              Descriptions", draft-westerlund-mmusic-max-ssrc-00 (work
              in progress), September 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, July 2006.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
              Attributes in the Session Description Protocol (SDP)",
              RFC 6236, May 2011.

11.2.  Informative References

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Multiplexing Negotiation Using Session Description
              Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-03 (work in
              progress), February 2013.

   [I-D.lennox-avtcore-rtp-multi-stream]
              Lennox, J. and M. Westerlund, "Real-Time Transport
              Protocol (RTP) Considerations for Endpoints Sending
              Multiple Media Streams",
              draft-lennox-avtcore-rtp-multi-stream-01 (work in
              progress), October 2012.

   [I-D.westerlund-avtcore-multiplex-architecture]
              Westerlund, M., Burman, B., and C. Perkins, "RTP
              Multiplexing Architecture",
              draft-westerlund-avtcore-multiplex-architecture-00 (work
              in progress), October 2011.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M., "Multiple RTP Session on a Single Lower-
              Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-00 (work
              in progress), October 2011.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3569]  Bhattacharyya, S., "An Overview of Source-Specific
              Multicast (SSM)", RFC 3569, July 2003.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
              (ICE): A Protocol for Network Address Translator (NAT)
              Traversal for Offer/Answer Protocols", RFC 5245,
              April 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.


Authors' Addresses

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com


   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com


   Morgan Lindqvist
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 719 00 00
   Email: morgan.lindqvist@ericsson.com


   Fredrik Jansson
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 719 00 00
   Email: fredrik.k.jansson@ericsson.com
